
Operationalizing Machine Learning

This project is part of the Udacity Azure ML Nanodegree. Its primary aim is to operationalize machine learning: we create, deploy and consume an AutoML model, and we also create, publish and consume a machine learning pipeline.

Overview

This dataset ("https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv") contains data about 32950 individuals. The data includes their age, marital status, education, housing, loans, contact etc. We first use Azure AutoMl tool on the dataset provided to find the best model based on the metrics (like Accuracy). We then deploy the model using Azure Container Instances and enable Application Insights and Authentication and then consume it and check the performance using Application Insights. Later, we create, publish and consume a pipeline using Jupyter Notebook Azure ML Studio.

Architectural Diagram


Authentication

Authentication is crucial for the continuous flow of operations. Continuous Integration and Delivery (CI/CD) systems rely on uninterrupted flows; when authentication is not set up properly, human interaction is required and the flow is interrupted. Ideally, the system should never stop to wait for a user to enter a password, so whenever possible it is good to pair authentication with automation.

Authentication types

Key-based

Enabled by default for Azure Kubernetes Service; disabled by default for Azure Container Instances.

Token-based

Disabled by default for Azure Kubernetes Service; not supported by Azure Container Instances.

Interactive

Used for local deployment and experimentation (e.g. in a Jupyter notebook).
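
As an illustration, loading the workspace with a service principal avoids any interactive login, which is what keeps automated flows running. This is a minimal sketch only: the workspace details and credentials are placeholders, and the project environment may already have authentication configured for you.

```python
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Placeholder credentials; in practice they come from environment variables
# or a secret store, never from source code.
sp_auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>",
    service_principal_id="<client-id>",
    service_principal_password="<client-secret>",
)

# With a service principal there is no interactive login prompt,
# so CI/CD flows are not interrupted.
ws = Workspace.get(
    name="<workspace-name>",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    auth=sp_auth,
)
print(ws.name, ws.location)
```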

Azure AutoML

Automated machine learning, also referred to as automated ML or AutoML, is the process of automating the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity, all while sustaining model quality.

In this project we use AutoML to find the model that provides the most accurate results. In this case, it was a Voting Ensemble with 0.91958 accuracy.
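
For reference, an AutoML run like this one can also be configured from the Python SDK. The sketch below is illustrative rather than the project's exact configuration: the dataset name, compute cluster name, and timeout are assumptions, and "y" is the label column of the Bank Marketing dataset.

```python
from azureml.core import Workspace, Experiment, Dataset
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()

# The registered Bank Marketing dataset; the registration name is an assumption.
dataset = Dataset.get_by_name(ws, name="bankmarketing_train")

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",        # the metric used to pick the best model
    training_data=dataset,
    label_column_name="y",            # target column of the Bank Marketing data
    compute_target="cpu-cluster",     # assumed compute cluster name
    experiment_timeout_minutes=30,
    max_concurrent_iterations=4,
    n_cross_validations=5,
)

experiment = Experiment(ws, "automl-bankmarketing")
run = experiment.submit(automl_config, show_output=True)

# In this project the best model returned was a Voting Ensemble.
best_run, fitted_model = run.get_output()
```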

Deploy

Deployment is about delivering a trained model into production so that it can be consumed by others. Configuring deployment settings means making choices about cluster settings and other aspects of how a deployment is used. Having a good grasp of configuring production environments in Azure ML Studio and the Python SDK is key to getting robust deployments.

In this project, we deploy the best model, the Voting Ensemble, with Azure Container Instances and enable authentication.
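
The same deployment can be expressed with the SDK. The following is a minimal sketch, not necessarily the exact steps used in the project; the registered model name, entry script, and environment are assumptions, while the service name matches the deployed model shown later ('bank-marketing-model-deploy').

```python
from azureml.core import Workspace
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()

# Assumed names for illustration.
model = Model(ws, name="bank-marketing-model")
env = Environment.get(ws, name="AzureML-AutoML")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Azure Container Instances deployment with key-based auth enabled.
aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    auth_enabled=True,            # authentication for the scoring endpoint
    enable_app_insights=False,    # enabled later via logs.py
)

service = Model.deploy(
    workspace=ws,
    name="bank-marketing-model-deploy",
    models=[model],
    inference_config=inference_config,
    deployment_config=aci_config,
)
service.wait_for_deployment(show_output=True)
print(service.state)
```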

Enable Application Insights

Application Insights collects log, performance, and error data. By automatically detecting performance anomalies and providing powerful analytics tools, it makes it easier to diagnose issues and understand how a deployed service is used, so that its performance and usability can be continuously improved.

In this project we run one of the starter files, logs.py, to enable Application Insights for the deployed service.
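
The sketch below shows roughly what enabling Application Insights from the SDK looks like; the actual logs.py starter file may differ in its details, and the service name is the one used in the deployment step above.

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()

# The deployed ACI service from the previous step.
service = Webservice(ws, name="bank-marketing-model-deploy")

# Turn on Application Insights for the running service and print its logs,
# which is approximately what logs.py does.
service.update(enable_app_insights=True)
print(service.get_logs())
```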

Consume Endpoints

Swagger is a tool that helps build, document, and consume RESTful web services like the ones deployed from Azure ML Studio. It also documents the types of HTTP requests an API accepts, such as POST and GET.
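
As a hedged sketch of what the swagger.sh/serve.py workflow amounts to: every deployed service exposes a swagger.json at its swagger_uri, and serving that file alongside a local Swagger UI lets you browse the scoring API. The service name is an assumption, and the starter scripts may implement this differently.

```python
import json
import urllib.request

from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(ws, name="bank-marketing-model-deploy")

# Download the service's API description (swagger.json).
with urllib.request.urlopen(service.swagger_uri) as response:
    swagger = json.load(response)

# Save it locally so a Swagger UI container (started by swagger.sh)
# can render the POST /score contract of the model.
with open("swagger.json", "w") as f:
    json.dump(swagger, f, indent=2)
```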

You can consume a deployed service via an HTTP API. An HTTP API is a URL that is exposed over the network so that interaction with a trained model can happen via HTTP requests.

Users can initiate an input request, usually via an HTTP POST request. HTTP POST is a request method used to submit data. HTTP GET is another commonly used request method, used to retrieve information from a URL. The allowed request methods and the different URLs exposed by Azure create a bi-directional flow of information.

In this project we use the starter file endpoint.py to consume the endpoint of the deployed model. We send two input queries and get appropriate responses.
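
The sketch below illustrates the kind of request endpoint.py sends. The scoring URI and key are placeholders taken from the endpoint's Consume tab, and only one abbreviated record is shown here, whereas endpoint.py sends two complete ones.

```python
import json

import requests

# Placeholders: copy these from the endpoint's Consume tab in Azure ML Studio.
scoring_uri = "http://<aci-endpoint>.azurecontainer.io/score"
key = "<primary-key>"

# One abbreviated Bank Marketing record for illustration.
data = {
    "data": [
        {
            "age": 17,
            "job": "blue-collar",
            "marital": "married",
            "education": "university.degree",
            "default": "no",
            "housing": "yes",
            "loan": "yes",
            # ... remaining Bank Marketing features omitted for brevity
        },
    ]
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",   # key-based auth enabled at deployment
}

response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())   # e.g. a list of "yes"/"no" predictions
```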

Create a Pipeline

The Pipeline class is the most common Python SDK class you will see when dealing with pipelines. Aside from accepting a workspace and allowing multiple steps to be passed in, it takes a description that is useful for identifying the pipeline later.
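
A minimal sketch of the class in use is shown below. The step, script, and compute names are illustrative only; the project's pipeline, built in the provided Jupyter Notebook, likely wraps an AutoML training step instead of this placeholder script step.

```python
from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# A single illustrative step; script and compute names are hypothetical.
train_step = PythonScriptStep(
    name="train",
    script_name="train.py",
    compute_target="cpu-cluster",
    source_directory=".",
)

# The Pipeline class takes the workspace, the steps, and a description.
pipeline = Pipeline(
    workspace=ws,
    steps=[train_step],
    description="Bank Marketing training pipeline",
)

run = Experiment(ws, "pipeline-experiment").submit(pipeline)
run.wait_for_completion(show_output=True)
```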

Publish pipelines

Publishing a pipeline is the process of making a pipeline publicly available. You can publish pipelines in Azure Machine Learning Studio, but you can also do this with the Python SDK.

When a Pipeline is published, a public HTTP endpoint becomes available, allowing other services, including external ones, to interact with an Azure Pipeline.
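
A hedged sketch of publishing a pipeline and then triggering it over its REST endpoint follows, continuing from the `pipeline` object in the previous sketch; the pipeline and experiment names are assumptions.

```python
import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Publish the pipeline built in the previous sketch; names are assumptions.
published = pipeline.publish(
    name="bankmarketing-train",
    description="Bank Marketing AutoML training pipeline",
    version="1.0",
)
print("REST endpoint:", published.endpoint)

# Any authenticated client, including external services, can now start a run
# by POSTing to the published endpoint.
auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(
    published.endpoint,
    headers=auth_header,
    json={"ExperimentName": "pipeline-rest-endpoint"},
)
print(response.json())   # contains the id of the submitted pipeline run
```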

Pipelines can perform several other tasks aside from training a model. Some of these tasks, or steps, are:

  1. Data Preparation
  2. Validation
  3. Deployment
  4. Combined tasks

Key Steps

The key steps of the project are documented with screenshots:

  • Dataset: the registered Bank Marketing dataset
  • AutoML run
  • AutoML run: the best model
  • Voting Ensemble: the best model deployed using Azure Container Instances, with authentication enabled
  • Best model explanation
  • Best model explanation: aggregate plots
  • Explanation: feature importance
  • Explanation: datapoints
  • Deploy
  • Model deployed: basic consumption info
  • logs.py execution (used to enable Application Insights)
  • Application Insights enabled
  • Application Insights
  • Directory listing
  • Running swagger.sh and serve.py
  • endpoint.py and benchmark.sh execution
  • Performance tracking
  • Swagger on localhost
  • Swagger input description
  • Swagger responses
  • Pipeline RunDetails widget
  • Pipeline publishing
  • Published pipeline
  • Endpoint of the pipeline
  • Pipeline runs
  • Pipeline endpoints
  • Jupyter Notebook

Screen Recording

Link to the screencast: https://youtu.be/X4hyRzPFG3Y

The screencast highlights the significant aspects of the project. It starts by displaying the dataset in the Datasets tab. Then we check the Experiments tab and find the AutoML run. We find the best model to be VotingEnsemble and review the explanation provided and the conclusions drawn from the experiment. We then move to the Models section to find the deployed model 'bank-marketing-model-deploy'. We check that Application Insights is enabled and visit the link. We then run endpoint.py, benchmark.sh and serve.py in the terminal. We launch the localhost page to check the Swagger documentation of our model. Finally, we move to the Pipelines section, check the pipeline runs and endpoints, and run through the Jupyter Notebook provided.

Future Work

Working on this project has been a rewarding experience. I look forward to working with Azure and exploring even more of the salient features it has to offer. I will explore different techniques, such as regularization, cross-validation, data cleaning, and using only a subset of the features, and check how to deploy the best possible model in each case. Furthermore, I will study the exceptions in detail so as to publish a service that works and returns appropriate results for every query.
