Getting Started with Azure AutoML for Machine Learning

Azure AutoML is a powerful tool that automates much of the machine learning process, saving you time and effort.

To get started with Azure AutoML, you'll need to create a workspace, which is the central hub for all your machine learning activities. This workspace stores your models, datasets, and other resources.

Automating machine learning with Azure AutoML is a great way to streamline your workflow and find a well-performing model faster. By automating the repetitive steps, you can focus on higher-level tasks and make data-driven decisions.

To start, you'll need to upload your dataset to Azure AutoML, which can be done directly from the Azure portal or through the Azure Machine Learning SDK.

Getting Started

Azure Auto ML is a powerful tool that can help you automate machine learning tasks. You can use it for free on a new Azure account.

To get started, you'll need to understand the two phases of Auto ML: training and serving. If a variable is not available during prediction, don't include it in training.
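
One way to apply this rule is to keep an explicit list of the columns that will exist at prediction time and drop everything else before training. The sketch below is a minimal illustration in plain Python. The column names and the helper `select_training_columns` are hypothetical and are not part of Azure AutoML.

```python
def select_training_columns(all_columns, available_at_prediction, target):
    """Return the feature columns that are safe to train on.

    A column is kept only if it will also be available when the model
    is serving predictions. The target column is never a feature.
    """
    available = set(available_at_prediction)
    return [c for c in all_columns if c in available and c != target]


# Hypothetical columns for a house-price model.
columns = ["area_sqm", "bedrooms", "final_sale_tax", "price"]
# "final_sale_tax" is only known after the sale, so it is left out here.
known_when_predicting = ["area_sqm", "bedrooms"]

features = select_training_columns(columns, known_when_predicting, target="price")
print(features)
```

Doing this before uploading the data keeps leaked columns out of every model the service tries.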

First, you'll need to configure the Auto ML job in Azure Machine Learning Studio. This is where you'll define the target column the service will try to predict and allocate a compute resource for the computation.

Cloud compute resources are billed by usage, so be mindful of the costs. For example, if you select a compute resource that costs $0.04 per hour and you're running it for four hours, you'll be billed $0.16.
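
The arithmetic above can be sketched as a small helper for estimating a bill before you start a job. The function name is hypothetical and the rate is the example figure from the text, not a quoted Azure price.

```python
from decimal import Decimal


def estimate_compute_cost(hourly_rate, hours):
    """Estimate the charge for a compute resource billed by the hour.

    Decimal is used so that currency values are not affected by
    binary floating-point rounding.
    """
    return Decimal(str(hourly_rate)) * Decimal(str(hours))


# Example from the text: $0.04 per hour for four hours.
cost = estimate_compute_cost("0.04", 4)
print(f"Estimated cost: ${cost}")
```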

In Azure, there's no service or license fee for the ML Studio, only a charge for the resources used. Check the pricing calculator for more details.

Next, you'll need to choose the machine learning task. You can choose from regression, classification, or time series, but keep in mind that time series is essentially regression with a different flavor.

To avoid overspending, consider limiting the maximum run time to 1 hour when you're just starting out.

Automated Machine Learning

Automated Machine Learning is a game-changer for data scientists and analysts who want to build accurate machine learning models without extensive manual trial and error. Azure Machine Learning Studio trains multiple models in parallel, saving time and identifying the best model for a particular use case.

With Azure Machine Learning Studio, you can automate machine learning using supervised models where you have training data and known labels. These models include classification, regression, and time series forecasting.

To run an automated machine learning algorithm, you need to specify the dataset with labels, configure the automated machine learning run, select the algorithm and settings, and review the best model generated. Azure Machine Learning Studio supports only supervised machine learning models.

To create an automated machine learning job, select your subscription and workspace, select Authoring > Automated ML, and then select New Automated ML job. AutoML is not unique to Azure: comparable tools are available from Google, Amazon, DataRobot, H2O, Dataiku, and others.

Auto ML has two phases: training and serving (also called inference or prediction). Keep both in mind while preparing training data: if a variable won't be available during prediction, don't include it in training.

When configuring the Auto ML job, you need to choose the machine learning task, which includes regression, classification, and time series. You can also limit the maximum run time to 1 hour by selecting "View additional configuration settings" → "Exit criterion" → 1 hour.

Azure Auto ML will do some quality checks on your data and prompt you if there are any major problems. It will then try to find the best model that can compute your target variable given all other variables.

To identify the best-performing model, you specify a primary metric, which is the metric you want the model optimized for. Azure Machine Learning studio ranks the trained models by that metric and reports the one with the best score.
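
Conceptually, selecting by primary metric is a ranking over the trained models. The following is a minimal sketch of that idea in plain Python, not the service's actual implementation. The trial records, model names, and scores are made-up illustrative values, and `pick_best_model` is a hypothetical helper.

```python
def pick_best_model(trials, primary_metric, higher_is_better=True):
    """Return the trial with the best value of the primary metric."""
    scored = [t for t in trials if primary_metric in t["metrics"]]
    if not scored:
        raise ValueError(f"No trial reports {primary_metric!r}")
    # Error-style metrics (for example a normalized RMSE) are minimized,
    # while score-style metrics (accuracy, AUC) are maximized.
    chooser = max if higher_is_better else min
    return chooser(scored, key=lambda t: t["metrics"][primary_metric])


# Illustrative values only, not results from a real run.
trials = [
    {"model": "LogisticRegression", "metrics": {"accuracy": 0.78, "AUC_weighted": 0.83}},
    {"model": "LightGBM", "metrics": {"accuracy": 0.81, "AUC_weighted": 0.88}},
    {"model": "RandomForest", "metrics": {"accuracy": 0.80, "AUC_weighted": 0.86}},
]

best = pick_best_model(trials, "AUC_weighted")
print(best["model"])
```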

Azure Machine Learning has some useful features that make things easier for a wider audience, including on-demand compute, data ingestion engine, workflow orchestration, machine learning model management, and model deployment.

Key Features

Azure AutoML offers a range of key features that make machine learning more accessible to a wider audience.

One of the standout features is on-demand compute that can be customized based on the workload, making it easier to manage resources.

The data ingestion engine in Azure AutoML accepts a wide variety of sources, which is a huge time-saver.

With Azure AutoML, workflow orchestration for machine learning is simple and requires little manual setup.

Azure AutoML has dedicated capabilities to manage machine learning model evaluation, making it easy to compare and select the best model.

Metrics and logs of all model training activities and services are readily available on the platform, providing valuable insights for optimization.

Here are some of the key features of Azure AutoML in a nutshell:

  • On-demand compute
  • Data ingestion engine
  • Workflow orchestration
  • Machine learning model management
  • Metrics & logs
  • Model deployment

Building and Running

Building and running Azure AutoML models involves several steps. You can choose from three approaches: Expert mode, Automated Machine Learning, and Designer mode. Expert mode allows you to use your programming knowledge to train machine learning models, while Automated Machine Learning uses Azure's machine learning studio to evaluate multiple models and return the best performing one.

To build a model in Azure AutoML, you can use the diabetes dataset, which is available on GitHub. The goal of this dataset is to determine if a person would be diabetic or not. You can use a logistic regression model for this prediction, as the outcome is a categorical measure.

You can create a resource instance in Azure AutoML and provide the requested information, such as the workspace name, region, and storage account. You can also create a compute instance and get the path of the CSV file that you uploaded as a dataset. Data files can be accessed using datastores, which store connection information to Azure storage services.

Here are the three approaches to building machine learning models in Azure AutoML:

  1. Expert mode: Use your programming knowledge to train machine learning models.
  2. Automated Machine Learning: Use Azure's machine learning studio to evaluate multiple models and return the best performing one.
  3. Designer mode: A graphical utility that works along the lines of the No-code paradigm.

In Azure AutoML, an experiment is a collection of trials that represent multiple model runs. You can run experiments with different data, code, and settings. A run represents a single trial for an experiment, and you can use the get_metrics method of the Run class to print metrics like regularization rate, AUC, and accuracy.

Create Notebook and Connect to Workspace

To create a notebook and connect to your workspace, you'll need the azureml-core package. Its azureml.core module lets you connect to the workspace and write code that uses its resources.

This package is a Python package that's essential for working with Azure Machine Learning. You'll use it to access your workspace and its resources.

In my experience, the datastore name is a crucial piece of information. In this case, the datastore name is 'workspaceblobstorage'. You can find all the data sources registered by going to Home > Datasets > Registered DataSets.

To access your workspace, you'll need to have an Azure Machine Learning workspace set up. An Azure Machine Learning workspace is a foundational resource in the cloud that you use to experiment, train, and deploy machine learning models.

To create a workspace, you'll need to provide some basic information, including the workspace name, subscription, resource group, and region.

Once you've created your workspace, you can create a notebook and connect to it using the azureml-core package.
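
As a rough sketch, connecting from a notebook could look like the following. It assumes the azureml-core package is installed and that a config.json file for your workspace is available (compute instances in the workspace normally provide one). The function name `connect_to_workspace` is hypothetical.

```python
def connect_to_workspace():
    """Connect to the workspace described by a local config.json file."""
    # Imported lazily so this cell can be loaded without the SDK installed.
    from azureml.core import Workspace

    # from_config() looks for config.json, which holds the subscription ID,
    # resource group, and workspace name.
    ws = Workspace.from_config()

    # The default datastore is where uploaded files usually land.
    datastore = ws.get_default_datastore()
    print(f"Workspace: {ws.name}, default datastore: {datastore.name}")
    return ws, datastore
```

Calling `connect_to_workspace()` returns both objects, so later cells can reuse them for datasets and experiments.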

Create and Load a Dataset

The dataset you create and load here is the training data for the Automated ML job.

Upload your data file to your workspace in the form of an Azure Machine Learning data asset. This ensures your data is formatted appropriately for your experiment.

In Task type & data, choose Classification as the task type. Under Select data, choose Create to create a new data asset.

Select your dataset from the list, then review it by selecting the data asset and looking at the preview tab. Ensure it doesn't include day_of_week and select Close.

Select Next to proceed to task settings.

Run the Script

To run the script, submit it as an experiment and pass the ScriptRunConfig details. Once the run completes, use the get_metrics method of the Run class to print metrics such as regularization rate, AUC, and accuracy.
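
A minimal sketch of that flow with the azureml-core package might look like this. The script name train.py, the source directory, and the experiment name are hypothetical placeholders, and the compute target is assumed to already exist in the workspace.

```python
def submit_training_script(ws, compute_name, experiment_name="diabetes-experiment"):
    """Submit a training script as an experiment run and print its metrics."""
    # Imported lazily so this cell can be loaded without the SDK installed.
    from azureml.core import Experiment, ScriptRunConfig

    experiment = Experiment(workspace=ws, name=experiment_name)
    config = ScriptRunConfig(
        source_directory=".",         # folder containing the script (assumed)
        script="train.py",            # hypothetical training script
        compute_target=compute_name,  # name of an existing compute target
    )

    run = experiment.submit(config)
    run.wait_for_completion(show_output=True)

    # get_metrics() returns a dict of everything the script logged.
    for name, value in run.get_metrics().items():
        print(name, value)
    return run
```

Only values that the script itself logs will show up in the returned dictionary.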

You can use the Azure Machine Learning studio to run the experiment. First, you need to create a new automated ML job and configure it with the required settings, including the Job name and Experiment name.

The compute resource can be local or in the cloud, and cloud compute resources are billed by usage. For example, if you select a compute resource that costs $0.04 per hour and you are running it for four hours, you will be billed $0.16.

To configure the Auto ML job, you need to choose the machine learning task, which can be regression, classification, or time series. You also need to allocate a compute resource for the computation.

Here are the steps to configure the Auto ML job:

  • Choose the machine learning task: regression, classification, or time series
  • Select and allocate a compute resource for the computation
  • Limit the maximum run time to 1 hour by selecting “View additional configuration settings” → “Exit criterion” → 1 hour

By following these steps, you can configure the Auto ML job and run the experiment to get the desired results.

Configure

Configure your Azure AutoML experiment by selecting the task type for training. You can choose from classification, regression, time series forecasting, natural language processing, or computer vision.

To select the task type, expand the Select task type dropdown menu and choose the desired option. For more information, see the descriptions of the supported task types.

The Additional configuration page shows default values based on your experiment selection and data. You can use the default values or configure the primary metric, enable ensemble stacking, use all supported models, and more.

The primary metric is used to score your model, and you can choose from various metrics such as accuracy, precision, recall, and more. For more information, see model metrics.

You can also enable ensemble stacking to improve machine learning results and predictive performance by combining multiple models. For more information, see ensemble models.

To use all supported models, select the Use all supported models option. This will allow you to configure the Blocked models setting, which allows you to exclude specific models from the training job.

The Blocked models setting is available when you select the Use all supported models option. You can use the dropdown list to select the models to exclude from the training job.

You can also configure featurization settings to perform actions on the data in preparation for training. Featurization is always enabled when your data contains non-numeric columns.

To configure featurization settings, select the Enable featurization option to allow configuration. You can then configure each available column, including feature type and impute with.

The featurization settings don't affect the input data needed for inferencing. If you exclude columns from training, the excluded columns are still required as input for inferencing on the model.

The available featurization customizations are the feature type and the imputation value ("impute with") for each column.

You can also configure your experiment settings, including the target column, primary metric, and exit criterion. The target column is the value the model learns to predict.

The exit criterion is used to determine when to stop the experiment. You can choose from various options, including a fixed number of iterations, a fixed time limit, or a specific accuracy threshold.

Here are example settings for a classification task:

  • Primary metric: accuracy
  • Exit criterion: 1 hour
  • Number of folds: 5
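
To make the "number of folds" setting concrete, here is a small plain-Python sketch of how k-fold cross-validation partitions rows. It is an illustration of the general technique only: it does not shuffle or stratify, and it is not how Azure AutoML implements its splits. The helper name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_rows, n_folds=5):
    """Yield (train_indices, validation_indices) for each fold.

    Every row appears in exactly one validation fold and in the
    training set of all the other folds.
    """
    if not 2 <= n_folds <= n_rows:
        raise ValueError("n_folds must be between 2 and n_rows")
    indices = list(range(n_rows))
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional row each.
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, n_folds=5))
for train, validation in folds:
    print("validate on", validation, "train on", len(train), "rows")
```

With five folds, each model candidate is trained five times, which is one reason a time-based exit criterion is useful.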

Metrics and Evaluation

Metrics and evaluation are crucial components of Azure AutoML. In the studio, you can configure settings that shape how models are trained and evaluated, such as Explainability of AI, Discard algorithms, Exit criteria, Data split for validation, and Parallel processing.

To evaluate the performance of your models, you can use metrics such as accuracy, AUC_weighted, average_precision_score_weighted, and precision_score_weighted. These metrics are suitable for multi-class classification scenarios. However, thresholded metrics such as accuracy and precision_score_weighted may optimize poorly for small datasets, for data with class imbalance, or when the expected metric value is close to 0.0 or 1.0. In those cases, AUC_weighted might be a better choice.

For multi-label text classification scenarios, accuracy is the only primary metric supported. For regression scenarios, you can use metrics such as r2_score, normalized_root_mean_squared_error, and normalized_mean_absolute_error to measure how closely predictions match the actual values.

Studio Metrics

In the Azure Machine Learning studio, you can configure various settings to help you evaluate and improve your models. One of the most important is the Explainability of AI, which generates feature importance explanations for the best model identified.

You can also configure discard algorithms: algorithms you exclude upfront so that the automated engine does not consider them, which helps save cloud costs. Additionally, you can set exit criteria, such as a maximum amount of time or a specific metric threshold, to stop the experiment.

The data split for validation is also configurable, allowing you to split the dataset between training data and validation data. This is crucial for evaluating the performance of your models.

To monitor and evaluate your training results, Automated ML offers options for you to explore models and metrics. You can view the performance charts and metrics provided for each run, as well as the featurization summary and what features were added to a particular model.

The following metrics are used for regression scenarios:

  • r2_score
  • normalized_root_mean_squared_error
  • normalized_mean_absolute_error

These metrics help you measure the performance of your models and identify areas for improvement. By understanding what each metric represents, you can choose the most suitable metric for your specific use case.
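
To show what the regression metrics represent, the sketch below computes them by hand in plain Python. It assumes the normalized variants divide the error by the range of the actual values (maximum minus minimum); treat that normalization as an assumption and check the service documentation for the exact definition. The function name and sample values are illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R2, normalized RMSE, and normalized MAE for two sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Assumption: normalize by the range of the actual values.
    value_range = max(y_true) - min(y_true)

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation

    return {
        "r2_score": 1 - ss_res / ss_tot,
        "normalized_root_mean_squared_error": rmse / value_range,
        "normalized_mean_absolute_error": mae / value_range,
    }


# Made-up values: three perfect predictions and one that is off by 2.
metrics = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For R2, higher is better. For the two normalized error metrics, lower is better.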

Automated ML also allows you to view the training job details, including performance metric charts, model properties, and associated code. This helps you drill down on completed models and evaluate their performance.

To recap, the settings you can configure in the Azure Machine Learning studio include the Explainability of AI, discard algorithms, exit criteria, data split for validation, and parallel processing.

By configuring these settings, you can save cloud costs, improve model performance, and make data-driven decisions.

Metrics for Multi-Class Classification

Metrics for Multi-Class Classification are crucial in determining the performance of a model. Accuracy is a widely used metric for classification scenarios, including image classification, sentiment analysis, and churn prediction.

For small datasets or those with large class skew, metrics like accuracy might not be the best choice. AUC_weighted is a better option in these cases, as seen in fraud detection, image classification, and anomaly detection/spam detection.

Average_precision_score_weighted is particularly useful in sentiment analysis, where it can provide a more accurate measure of a model's performance.

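Here's a summary of the metrics mentioned, along with their example use cases:

  • accuracy: image classification, sentiment analysis, churn prediction
  • AUC_weighted: fraud detection, image classification, anomaly detection/spam detection
  • average_precision_score_weighted: sentiment analysis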

The choice of metric ultimately depends on the specific needs of your business, and it's essential to choose the one that best suits your goals.
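
To see why accuracy can mislead on skewed data, consider a toy fraud dataset where a model never flags anything. The example below uses plain Python and made-up labels. It is only an illustration of the class-imbalance problem, not output from Azure AutoML.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    found = sum(1 for p in predictions_for_positives if p == positive)
    return found / len(predictions_for_positives)


# 95 legitimate transactions (0) and 5 fraudulent ones (1).
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts "legitimate".
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall on fraud={rec:.2f}")
```

The model scores 95% accuracy while catching no fraud at all, which is the situation where a ranking-based metric such as AUC_weighted is the safer primary metric.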

Metrics for Multi-Label Classification

In multi-label classification, accuracy is a crucial metric to evaluate model performance.

For text classification multi-label, accuracy is the only primary metric supported. In my experience, this is often sufficient for simple tasks, but it can be limiting for more complex scenarios.

In image classification multi-label, the primary metrics are defined in the ClassificationMultilabelPrimaryMetrics enum. This enum provides a standardized set of metrics to compare model performance.

The exact metrics supported for image classification multi-label are the members of that enum, so check it before choosing a primary metric.

Deployment and Scaling

Deployment and scaling are crucial steps in getting the most out of Azure AutoML. To deploy a model, you must register it to the workspace and then select the Deploy option in the studio, which can take about 20 minutes to complete.

You have two options for deployment: real-time (online) deployment, which lets you access your model over an API and get predictions back in real-time, and batch deployment, which lets you score large amounts of data and write the outputs to a flat file or database.

Automated ML supports distributed training for a limited set of models, including LightGBM for classification and regression tasks, and TCNForecaster for forecasting tasks, with data size limits of approximately 1 TB and 200 GB, respectively.

You can deploy a model as a web service using Automated ML, which allows you to integrate the model so it can predict on new data and identify potential areas of opportunity. For example, you can deploy the best model as a web service by selecting Deploy > Web service and populating the Deploy a model pane with the required information.

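Here's a summary of the deployment options:

  • Real-time (online) deployment: access the model over an API and get predictions back in real time
  • Batch deployment: score large amounts of data and write the outputs to a flat file or database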

By following these steps and options, you can deploy and scale your models effectively using Azure AutoML.

Deploy Your Model

Deploying your model is a crucial step in making it accessible to others. You can deploy a model generated via the automl package with the Python SDK by registering it to the workspace.

There are two main options for deploying a model: real-time (online) deployment and batch deployment. Real-time deployment lets you access your model over an API and get predictions back in real-time. Batch deployment, on the other hand, lets you score large amounts of data and write the outputs to a flat file or database.

Once a model is deployed in real time, you'll typically have an HTTPS endpoint that you can call to request predictions. You can also download the model and deploy it locally, so there's no need to host it online if you don't want to.
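
Calling such an endpoint usually means sending a JSON body over HTTPS with an authorization header. The sketch below builds the request using only the Python standard library. The URL, the key, the feature names, and the `{"data": [...]}` payload shape are all assumptions for illustration; the real input schema depends on how your model was deployed, so check the endpoint's own documentation.

```python
import json
import urllib.request


def build_scoring_request(scoring_url, api_key, rows):
    """Build (but do not send) a POST request for a scoring endpoint."""
    # Assumed payload shape; your deployment may expect different keys.
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(scoring_url, data=body, headers=headers, method="POST")


request = build_scoring_request(
    "https://example.invalid/score",  # placeholder, not a real endpoint
    "<your-key>",                     # placeholder credential
    [{"age": 50, "bmi": 31.2}],       # hypothetical feature values
)
print(request.get_method(), request.full_url)

# To actually send it against a real endpoint:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```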

Here are the steps to deploy a model:

  1. Initiate the deployment from the studio, for example by selecting Deploy > Web service for the best model.
  2. Populate the Deploy model pane with the required information, such as name, description, compute type, and authentication settings.
  3. Select Deploy and wait for the deployment to complete, which can take about 20 minutes.

The deployment process entails several steps, including registering the model, generating resources, and configuring them for the web service. You can monitor the deployment progress under the Deploy status section.

Once deployed, you can test the predictions by querying the service from the End-to-end AI samples in Microsoft Fabric.

Scaling AutoML

Scaling AutoML can be a game-changer for large data scenarios. Automated ML supports distributed training for a limited set of models, including LightGBM and TCNForecaster.

Data size limits apply to distributed training, with LightGBM supporting up to 1 TB of data and TCNForecaster supporting up to 200 GB.

Distributed training algorithms automatically partition and distribute your data across multiple compute nodes for model training.

Cross-validation, ensemble models, ONNX support, and code generation are not currently supported in the distributed training mode.

You'll need to set specific properties on the job object to use distributed training for forecasting tasks. These properties include training_mode, enable_dnn_training, max_nodes, and optionally max_concurrent_trials.

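Here's a breakdown of these properties:

  • training_mode: selects distributed or non-distributed training
  • enable_dnn_training: enables deep neural network models such as TCNForecaster
  • max_nodes: the total number of nodes available to the training job
  • max_concurrent_trials: the maximum number of trial models trained in parallel

Each trial model is trained on a number of nodes given by the formula max(2, floor(max_nodes / max_concurrent_trials)).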
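
The formula is easy to check with a couple of worked values. The helper below is a hypothetical illustration of the arithmetic only and does not call any Azure API.

```python
import math


def nodes_per_trial(max_nodes, max_concurrent_trials=1):
    """Apply max(2, floor(max_nodes / max_concurrent_trials))."""
    if max_concurrent_trials < 1:
        raise ValueError("max_concurrent_trials must be at least 1")
    return max(2, math.floor(max_nodes / max_concurrent_trials))


# 8 nodes shared by 2 concurrent trials: 4 nodes each.
print(nodes_per_trial(8, 2))
# 4 nodes and 4 concurrent trials: floor(4 / 4) = 1, raised to the minimum of 2.
print(nodes_per_trial(4, 4))
```

The second case shows that asking for many concurrent trials on few nodes still reserves two nodes per trial, so the trials cannot all run at once.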

Validate and Test

When you're working with Azure Automated ML, it's essential to validate and test your models to ensure they're accurate and reliable. The Validate and test section provides the necessary configuration options to achieve this.

You can specify the validation type to use for your training job, which will determine how Automated ML applies validation techniques based on the number of rows in your training data. If you don't explicitly specify a validation type, Automated ML will apply default techniques.

Automated ML applies a train/validation data split for datasets larger than 20,000 rows, taking 10% of the initial training data set as the validation set. For smaller datasets, a cross-validation approach is applied, with the default number of folds depending on the number of rows.

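Here's a breakdown of the default validation techniques used by Automated ML:

  • Larger than 20,000 rows: train/validation split, with 10% of the training data held out for validation
  • 1,000 to 20,000 rows: cross-validation with 3 folds
  • Fewer than 1,000 rows: cross-validation with 10 folds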
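
The default rules can be sketched as a small decision function. The 20,000-row threshold and the 10% split come from the text above. The fold counts (10 folds below 1,000 rows, 3 folds otherwise) reflect Microsoft's documentation as best understood at the time of writing and should be treated as an assumption. The function name and the returned dictionary keys are hypothetical.

```python
def default_validation(n_rows):
    """Return the validation technique applied by default for a dataset size."""
    if n_rows > 20_000:
        # Large datasets: hold out 10% of the training data.
        return {"technique": "train_validation_split", "validation_fraction": 0.10}
    if n_rows < 1_000:
        # Very small datasets: more folds to make better use of scarce data.
        return {"technique": "cross_validation", "folds": 10}
    # Assumed default for 1,000 to 20,000 rows.
    return {"technique": "cross_validation", "folds": 3}


for rows in (500, 5_000, 50_000):
    print(rows, default_validation(rows))
```

Specifying a validation type explicitly overrides these defaults.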

You can also provide a test dataset to evaluate the recommended model generated by Automated ML. This will trigger a test job at the end of your experiment, allowing you to view the test results and metrics.

By following these steps, you can ensure that your Azure Automated ML models are thoroughly validated and tested, giving you confidence in their accuracy and reliability.

Prerequisites and Resources

To get started with Azure AutoML, you'll need a few things.

First, you'll need an Azure subscription, which can be either free or paid. You can create one by following the instructions on the Quickstart: Get started with Azure Machine Learning page.

You'll also need an Azure Machine Learning workspace or compute instance, which can be prepared by following the same Quickstart guide.

To use Azure AutoML, you'll need to have a data asset to train the model on. This can be an existing data asset or one that you create from a data source like a local file, web URL, or datastore.

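Azure AutoML has two requirements for the training data:

  • The data must be in tabular form.
  • The value to predict (the target column) must be present in the data.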

Prerequisites

To get started with Azure Machine Learning, you'll need to meet a few prerequisites.

First and foremost, you'll need an Azure subscription, which can be either free or paid. Having a subscription will give you access to the resources you need to complete the tutorial.

Next, you'll need to create an Azure Machine Learning workspace or compute instance. This can be done by following the Quickstart guide, which will walk you through the process step by step.

You'll also need to have the data asset ready to use for the Automated ML training job. This can be an existing data asset or one that you create from a data source, such as a local file, web url, or datastore.

Here are the specific requirements for the training data:

  • The data must be in tabular form.
  • The value to predict (the target column) must be present in the data.

Frequently Asked Questions

What is Azure AutoML?

Azure AutoML is a process that automatically selects the best machine learning algorithm for your specific data, enabling fast model generation. Learn more about how it works and its benefits in the Overview of the Automated ML process.

Which AutoML is best?

There is no single "best" AutoML, as each tool has its strengths and weaknesses, such as H2O's ease of use and Google Cloud AutoML's neural network architecture. To determine the best AutoML for your project, consider factors like your data type, complexity, and desired level of customization.

What is the difference between AutoAI and AutoML?

AutoAI is a broader field that encompasses AutoML, which automates the process of building and optimizing machine learning models. In other words, AutoAI is the umbrella term, while AutoML is a specific subset that focuses on automating machine learning tasks.

What is the difference between ML and AutoML?

AutoML automates the machine learning process, while traditional ML requires manual steps like feature engineering and hyperparameter tuning. This key difference makes AutoML a more efficient and streamlined approach to solving real-world problems.

What is AutoML used for?

AutoML automates the machine learning model development process, saving time and effort. It streamlines the creation of accurate and efficient models for various applications, from data analysis to predictive modeling.

Landon Fanetti

Writer

Landon Fanetti is a prolific author with many years of experience writing blog posts. He has a keen interest in technology, finance, and politics, which are reflected in his writings. Landon's unique perspective on current events and his ability to communicate complex ideas in a simple manner make him a favorite among readers.
