A Beginner-Friendly Guide To Automated Machine Learning

What you need to know to leverage Auto ML for Business Intelligence and Analytics.

Read time: 9 minutes

Hey there,

Today's edition will dig deeper into Auto ML:

  • Why Auto ML is crucial for leveraging AI in Business Intelligence

  • How to set up Auto ML (e.g. with Microsoft Azure)

  • Do’s and Don’ts of Auto ML

Auto ML, or Automated Machine Learning, automates the process of building, refining, and deploying machine learning models.

Many people think you need to be (at least!) a senior data scientist to use stuff like this.

However, once you understand how Auto ML can actually make things easier instead of more complicated, you won't want to be without this tool again.

Let's get started!

PS: This post will be a good reference point for getting started with Auto ML. Make sure to bookmark this tutorial so you can go back whenever you need it!

Why Auto ML is crucial for leveraging AI in BI

Auto ML is important for implementing AI in BI because it allows you to build and deploy your own machine learning models without needing a data scientist in the loop - at least in the beginning.

This allows BI analysts to start working iteratively on use cases such as trend modeling, churn prediction, and sales forecasting, which makes their dashboards more valuable overall.

Now that we know what we can potentially do with Auto ML, it's finally time to get hands-on!

How to use Auto ML (on Azure)

I'll demonstrate how to use Auto ML with Azure Machine Learning Studio, as I find it the most user-friendly option and it's free to try on a new Azure account.

Note that similar principles apply to other Auto ML platforms from vendors such as Google, Amazon, DataRobot, H2O, Dataiku, etc.

Generally, to use Auto ML, you must understand that it has two phases: training and serving (also called inference or prediction).

Let's start with training.

Step 1: Set up Azure Machine Learning Studio

To use Auto ML, sign up for a free Azure account. Once you're greeted by the Azure Portal, create a new resource group.

A resource group will bundle all the resources you will use, making it easy to delete them later.

Next, search for "Machine Learning" in the Azure Portal and create a new Azure Machine Learning Workspace, assigning it to the resource group you just created. Give it a name and leave all other settings as default.

Then go to ml.azure.com to access your newly created ML Studio Workspace.

Setup complete - Hooray!

Step 2: Provide your dataset

To use Auto ML in Azure Machine Learning Studio, select "Automated ML" from the left menu and click "New Automated ML job."

In the following prompt, specify a dataset by uploading a file from your computer or importing from another source such as Azure blob storage or SQL.

Need a demo file? Choose “from Web files” and enter this URL which is a demo file hosted by Microsoft: 

https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv 

Note that almost all Auto ML services expect your dataset to be "tidy," which means it has three distinct characteristics:

  • Each row is one observation (e.g. one customer)

  • Each column is one variable (e.g. total customer revenue)

  • Each cell is one value (e.g. total revenue for a certain customer)

So, before you go ahead and upload your dataset for training, make sure it is tidy - otherwise Auto ML will not work properly!
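
If you want to check the three tidy rules before uploading, here is a minimal sketch using only Python's standard library. The function name `find_tidy_problems` and the sample data are made up for illustration, and treating ";" or "|" inside a cell as "several values in one cell" is just a rough heuristic I'm assuming here, not something Azure does for you.

```python
import csv
import io


def find_tidy_problems(csv_text):
    """Return a list of human-readable hints that the table is not tidy."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []

    # Each column is one variable: names must be present and unique.
    if any(not name.strip() for name in header):
        problems.append("blank column name")
    if len(set(header)) != len(header):
        problems.append("duplicate column name")

    # Each row is one observation: every row needs one cell per column.
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} cells, found {len(row)}"
            )

    # Each cell is one value: flag cells that seem to pack several values
    # together (assumed heuristic: a ';' or '|' inside the cell).
    for line_no, row in enumerate(data, start=2):
        for name, cell in zip(header, row):
            if ";" in cell or "|" in cell:
                problems.append(
                    f"line {line_no}, column {name!r}: several values in one cell"
                )
    return problems


# Hypothetical sample: line 3 packs two products into one cell,
# line 4 is missing a cell.
sample = "customer_id,revenue,products\n1,100,A\n2,250,A;B\n3,80\n"
for problem in find_tidy_problems(sample):
    print(problem)
```

An empty result doesn't prove your data is clean, but any hit is worth fixing before you start a job.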

In the next step, you define the schema of your data by selecting which columns (variables) to use to create the model. If you follow the demo file, exclude the column "day_of_week".

Make sure you set the data types correctly. For example, mark categorical data as string, otherwise it will confuse your model (e.g. a product code "246" isn't worth twice as much as the product code "123" - this is categorical data, even if it looks numeric).
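
The same idea applies if you prepare your file with a script first. The sketch below keeps a product code as text while converting revenue to a number. The column names and the `COLUMN_TYPES` mapping are hypothetical and only there to illustrate the point.

```python
import csv
import io

# Hypothetical schema: product_code is a label, revenue is a real quantity.
COLUMN_TYPES = {"product_code": str, "revenue": float}


def load_typed_rows(csv_text):
    """Parse CSV text and convert each cell according to COLUMN_TYPES."""
    typed_rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        typed_rows.append(
            {name: COLUMN_TYPES[name](value) for name, value in row.items()}
        )
    return typed_rows


rows = load_typed_rows("product_code,revenue\n123,19.99\n246,5.00\n")
for row in rows:
    # product_code stays a string, so nothing can treat "246" as twice "123".
    print(row)
```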

Also, ensure that all variables specified in training will be available during prediction.

If a variable is not available during prediction, don’t include it in training.
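
A quick way to catch this early is to compare the columns you train on with the columns you expect to have when you request a prediction. This is a small sketch with made-up column lists. The helper name `unavailable_features` and the assumption that "duration" is only known after the fact are mine, purely for illustration.

```python
def unavailable_features(training_columns, target, prediction_columns):
    """Return training features that will be missing at prediction time."""
    features = [column for column in training_columns if column != target]
    available = set(prediction_columns)
    return [column for column in features if column not in available]


# Hypothetical example: "duration" is only known after the event we want
# to predict, so it will not be there when we ask for a prediction.
training_columns = ["age", "job", "duration", "y"]
prediction_columns = ["age", "job"]

print(unavailable_features(training_columns, "y", prediction_columns))
```

Anything this returns should be excluded from training.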

Step 3: Configure the Auto ML job

An experiment in Auto ML helps to organize different runs with different settings, define the target column the service will try to predict, and allocate a compute resource for the computation.

The compute resource can be local or in the cloud. Cloud compute resources are billed by usage. For example, if you select a compute resource that costs $0.04 per hour and run it for four hours, you will be billed 4 × $0.04 = $0.16.
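
If you want to estimate costs before you start, the arithmetic is simple enough to script. This is a rough sketch only: the `nodes` parameter is my own addition for clusters with more than one machine, and storage costs are left out.

```python
def estimate_compute_cost(price_per_hour, hours, nodes=1):
    """Estimate the compute bill: hourly price x hours x number of nodes."""
    return price_per_hour * hours * nodes


# The example from the text: $0.04 per hour for four hours on one node.
print(f"${estimate_compute_cost(0.04, 4):.2f}")
```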

For demo purposes, create a new compute resource by selecting "New" and setting up a compute cluster with low priority.

In Azure, there is no service or license fee for the ML Studio, only a charge for the resources used (compute + storage in this case). Check the pricing calculator for more details.

Next is choosing the machine learning task.

Regression involves predicting a continuous numeric variable, classification assigns an observation to one of several classes, and time series forecasting involves predicting multiple steps into the future (to be honest, it's essentially regression with a different flavor).

Submit the Auto ML job by hitting Finish and let the magic happen!

Tip: When you’re just starting out, limit the maximum run time to 1 hour by selecting “View additional configuration settings” → “Exit criterion” → 1 hour.

Step 4: Let Auto ML find the best model

Before training any model, Azure Auto ML will do some quality checks on your data and prompt you if there are any major problems.

Note that this does not replace a proper exploratory data analysis and data cleaning!

Auto ML will now try to find the best model that can compute your target variable (also called the label) given all other variables (called features or predictors), so you can use the model to make predictions for new (unseen) data points. This process is called Supervised Machine Learning.
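
To make the vocabulary concrete, here is a tiny sketch of what "label" and "features" mean for a dataset. The rows are invented and I'm assuming the target column is called "y". Auto ML does this split for you once you pick the target column.

```python
def split_features_and_label(rows, target):
    """Separate each row into its features (inputs) and its label (the value to predict)."""
    features = [
        {name: value for name, value in row.items() if name != target}
        for row in rows
    ]
    labels = [row[target] for row in rows]
    return features, labels


# Invented rows; "y" is the assumed target column.
rows = [
    {"age": 41, "job": "technician", "y": "no"},
    {"age": 29, "job": "student", "y": "yes"},
]

features, labels = split_features_and_label(rows, target="y")
print(features)  # what the model sees
print(labels)    # what the model learns to predict
```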

Feel free to grab a coffee! ☕️ 

After some time, the process will finish, indicated by a green status bar.

First thing to do now: Stop the compute resource! (Yeah it's still running! Compute → Compute clusters → Stop/Delete)

Explore the different models generated by Auto ML, which are ranked based on an evaluation metric.

Evaluation metrics help you measure the performance of your machine learning model and there are different metrics for regression and classification tasks.
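
Just to give you a feel for it, here are three common textbook metrics written out by hand: accuracy for classification, and mean absolute error and root mean squared error for regression. This is an illustrative sketch with invented numbers. The metric your Auto ML job actually ranks by may be a different one.

```python
import math


def accuracy(actual, predicted):
    """Share of predictions that match the actual class (classification)."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def mean_absolute_error(actual, predicted):
    """Average size of the error, in the unit of the target (regression)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Like MAE, but large errors are penalized more heavily (regression)."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Classification: three of four predictions are right.
print(accuracy(["yes", "no", "no", "yes"], ["yes", "no", "yes", "yes"]))

# Regression: invented sales figures versus predictions.
actual_sales = [100.0, 200.0, 300.0]
predicted_sales = [110.0, 190.0, 330.0]
print(mean_absolute_error(actual_sales, predicted_sales))
print(root_mean_squared_error(actual_sales, predicted_sales))
```

For accuracy, higher is better. For the two error metrics, lower is better.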

Elaborating on them would be too much for now, but if you're keen to learn more check out Chapter 3 of my book AI-Powered Business Intelligence.

To be clear: All the models you're seeing here are ultimately trying to do the same thing (predicting your target variable given all other variables).

In the end, you'll need only one model - usually the best-performing one.
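
Conceptually, the ranking you see boils down to "try many candidates, score each one on data it hasn't seen, keep the best". Here is a deliberately simplified sketch of that loop. The three "models" are toy rules I made up, and so is the data. A real Auto ML service trains proper algorithms and tunes them, which this sketch does not attempt.

```python
def majority_class_trainer(train_rows):
    """Baseline: always predict the most frequent label seen in training."""
    labels = [label for _, label in train_rows]
    most_common = max(set(labels), key=labels.count)
    return lambda features: most_common


def threshold_trainer(feature_name, threshold):
    """Toy rule: predict 'yes' when one feature exceeds a fixed threshold."""
    def train(train_rows):
        return lambda features: "yes" if features[feature_name] > threshold else "no"
    return train


def accuracy(model, rows):
    """Share of rows where the model's prediction matches the label."""
    return sum(1 for features, label in rows if model(features) == label) / len(rows)


# Invented data: (features, label) pairs.
train_rows = [
    ({"balance": 50}, "no"), ({"balance": 900}, "yes"), ({"balance": 120}, "no"),
    ({"balance": 700}, "yes"), ({"balance": 300}, "no"),
]
validation_rows = [
    ({"balance": 80}, "no"), ({"balance": 650}, "yes"),
    ({"balance": 400}, "no"), ({"balance": 1000}, "yes"),
]

candidates = {
    "majority_class": majority_class_trainer,
    "balance_over_100": threshold_trainer("balance", 100),
    "balance_over_500": threshold_trainer("balance", 500),
}

# Train every candidate, score it on the validation rows, rank by score.
leaderboard = sorted(
    ((name, accuracy(trainer(train_rows), validation_rows))
     for name, trainer in candidates.items()),
    key=lambda entry: entry[1],
    reverse=True,
)

for name, score in leaderboard:
    print(f"{name}: {score:.2f}")

best_name, best_score = leaderboard[0]
print("best model:", best_name)
```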

If you want to get a better understanding of your model, check the “View explanation” tab to view things like the feature importance.

Congratulations, you have just completed the training phase!

Step 5: Deploy a model

A model is only useful if it's solving some problem, right?

So let’s make the best model available for other users or applications - a process called deployment.

You basically have two options to deploy your model:

  • Real-time (online) deployment: Lets you access your model over an API and get predictions back in real time (When in doubt, try this!)

  • Batch deployment: Lets you score large amounts of data and write the outputs to a flat file or database, typically at fixed time intervals (e.g. once a day).

In both cases you will typically have an HTTPS endpoint which you can access to request predictions. In our example, I deployed the model as a real-time web service.

Note that you can also download the model and deploy it locally, so no need to host it online if you don’t want to. See resources below for more details.

Step 6: Consume a model

Once you have deployed your model in Azure ML Studio, it appears under Endpoints.

Here, you can find the URL and some demo scripts to get predictions from your model.

You can also test it in the browser and see how it works!
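
To show roughly what such a script does, here is a minimal sketch using only Python's standard library. Please treat the details as assumptions: the URL and key are placeholders, and the `{"data": [...]}` payload shape is a guess on my part. The demo scripts on your endpoint's page show the exact format your model expects.

```python
import json
import urllib.request

# Placeholders: replace with the URL and key shown for your endpoint.
SCORING_URL = "https://example.invalid/score"
API_KEY = "REPLACE_WITH_YOUR_KEY"


def build_scoring_request(rows, url=SCORING_URL, api_key=API_KEY):
    """Build (but do not send) a POST request carrying the rows as JSON."""
    # Assumed payload shape; check your endpoint's demo script for the real one.
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def request_predictions(rows, url=SCORING_URL, api_key=API_KEY):
    """Send the rows to the endpoint and return the decoded JSON response."""
    request = build_scoring_request(rows, url, api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request works offline; sending it needs a real endpoint.
request = build_scoring_request([{"age": 35, "job": "technician"}])
print(request.get_method(), request.full_url)
```

Once you have filled in a real URL and key, `request_predictions(rows)` sends the rows and returns whatever JSON your endpoint responds with.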

We're almost there!

How can you integrate this model into your reports or dashboards?

Again, this depends on your deployment strategy:

  • Online prediction: You'll send data from your BI dashboard to the model endpoint whenever you need to, retrieve the predictions, and integrate them into your report in real time (this can be done with a small script in R/Python/Power Query, etc.)

  • Batch prediction: You let the model predict (score) your data e.g. once a day and save the results in a file or database. Then you load this data into your report. This has the advantage that you don't have to include scripts in your BI, but it requires some work on the backend/infrastructure side.

Which strategy you choose ultimately depends on your setup. I usually find it easiest to start with online prediction first and see how well the model works before building the infrastructure.
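
For comparison, the batch strategy can be sketched as a small script that reads a daily extract, adds a prediction column, and writes the result for your report to load. Everything here is illustrative: `score_customer` is a stand-in rule, not a real model, and in practice you would replace it with a call to your deployed model. The in-memory files stand in for real files or database tables.

```python
import csv
import io


def score_customer(row):
    """Stand-in for the deployed model; replace with a real prediction call."""
    return "yes" if float(row["balance"]) > 500 else "no"


def score_batch(input_file, output_file, predict=score_customer):
    """Copy rows from input_file to output_file, adding a 'prediction' column."""
    reader = csv.DictReader(input_file)
    writer = csv.DictWriter(output_file, fieldnames=reader.fieldnames + ["prediction"])
    writer.writeheader()
    count = 0
    for row in reader:
        row["prediction"] = predict(row)
        writer.writerow(row)
        count += 1
    return count


# Invented daily extract, kept in memory so the sketch runs anywhere.
daily_extract = io.StringIO("customer_id,balance\n1,80\n2,650\n")
scored = io.StringIO()

print(score_batch(daily_extract, scored), "rows scored")
print(scored.getvalue())
```

Your BI tool then only needs to load the scored file, with no scripting inside the report itself.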

Step 7: Clean up resources

If you no longer want to use the resources you created, delete them so you don't incur any costs: Navigate to "Resource Groups" in the Azure portal. Select the resource group you created → Delete Resource Group.

Do’s and Don’ts of Auto ML

Auto ML can be a great tool, but at the same time it's tempting to just throw your data into an Auto ML tool and call it a day.

Don't do that!

Check out my LinkedIn post, where I’ve written up some best practices for Auto ML, and join the discussion.

That’s it!

As always, thanks for reading.

Hit reply and let me know what you found most helpful this week—I’d love to hear from you!

See you next Friday,

Tobias

Resources

If you liked this content then check out my book AI-Powered Business Intelligence (O’Reilly). You can read it in full detail here: https://www.aipoweredbi.com