
Deployment Options

Once the model is serialized, it can be deployed using different approaches, depending on the use case.

1. Deployment via REST APIs

Deploying a machine learning (ML) model as a REST API enables applications, users, or other services to send input data and receive model predictions over HTTP requests. This approach makes ML models accessible and scalable, allowing real-time inference in web applications, mobile apps, or automated pipelines.

Why Deploy ML Models as REST APIs?

  • Accessibility: Users can interact with the model without needing to understand the internal ML code.
  • Scalability: APIs allow models to be served to multiple clients simultaneously.
  • Flexibility: ML APIs can be integrated into various applications, including web apps, mobile apps, and IoT devices.

Common Frameworks for Building ML APIs

Several Python frameworks are used to deploy ML models as APIs:

  1. FastAPI – Lightweight, fast, and ideal for high-performance applications.
  2. Flask – Simple and widely used, best suited for small projects.
  3. Django – Robust and structured, great for large applications with built-in security features.

1. Deploying an ML Model Using FastAPI

FastAPI is a modern, high-performance web framework for building APIs with Python. It supports asynchronous request handling natively and generally outperforms Flask in I/O-bound workloads.

Installation

pip install fastapi uvicorn joblib numpy

Example: Deploying an ML Model Using FastAPI

from fastapi import FastAPI
import joblib
import numpy as np

# Initialize FastAPI app
app = FastAPI()

# Load trained ML model
model = joblib.load("model.joblib")

# Define prediction endpoint
@app.post("/predict/")
def predict(data: list[float]):
    # Reshape the flat feature list into a single-row 2D array for the model
    prediction = model.predict(np.array(data).reshape(1, -1))
    return {"prediction": prediction.tolist()}

# Run using: uvicorn filename:app --reload

Running the FastAPI Server

Run the following command to start the server:

uvicorn filename:app --reload

FastAPI provides automatic interactive API documentation at:

  • Swagger UI: http://127.0.0.1:8000/docs
  • Redoc: http://127.0.0.1:8000/redoc
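
Once the server is running, any HTTP client can request predictions. A minimal client sketch using the requests library, assuming the four-feature model above (the feature values are placeholders):

import requests

# The endpoint expects the request body to be a JSON array of features
response = requests.post(
    "http://127.0.0.1:8000/predict/",
    json=[5.1, 3.5, 1.4, 0.2],  # placeholder feature values
)
print(response.json())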

2. Deploying an ML Model Using Flask

Flask is a lightweight and widely used framework for creating APIs. It is easy to set up and well-suited for small-scale applications.

Installation

pip install flask joblib numpy

Example: Deploying an ML Model Using Flask

from flask import Flask, request, jsonify
import joblib
import numpy as np

# Initialize Flask app
app = Flask(__name__)

# Load trained model
model = joblib.load("model.joblib")

# Define prediction endpoint
@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()  # Get input data as JSON
    prediction = model.predict(np.array(data).reshape(1, -1))
    return jsonify({"prediction": prediction.tolist()})

# Run the server
if __name__ == "__main__":
    app.run(debug=True)

Running the Flask Server

Run the script with:

python filename.py

Access the endpoint:

POST http://127.0.0.1:5000/predict
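
For example, a minimal client sketch with the requests library (the endpoint above expects a bare JSON array; the feature values are placeholders):

import requests

response = requests.post(
    "http://127.0.0.1:5000/predict",
    json=[5.1, 3.5, 1.4, 0.2],  # placeholder feature values
)
print(response.json())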

3. Deploying an ML Model Using Django

Django is a full-stack web framework that includes built-in security and database integration. While Flask is more lightweight, Django is useful for large-scale applications.

Installation

pip install django joblib numpy djangorestframework

Steps to Deploy an ML Model with Django

  1. Create a new Django project and app:

    django-admin startproject ml_project
    cd ml_project
    django-admin startapp ml_api
    
  2. Add ml_api to INSTALLED_APPS in ml_project/settings.py:

    INSTALLED_APPS = [
        ...
        'rest_framework',
        'ml_api',
    ]
    
  3. Create API Endpoint in ml_api/views.py:

    from django.http import JsonResponse
    from rest_framework.decorators import api_view
    import joblib
    import numpy as np
    
    # Load the trained model once at import time (the path is relative
    # to the directory the server is started from)
    model = joblib.load("model.joblib")
    
    @api_view(["POST"])
    def predict(request):
        data = request.data.get("data")
        prediction = model.predict(np.array(data).reshape(1, -1))
        return JsonResponse({"prediction": prediction.tolist()})
    
  4. Define URL Path in ml_api/urls.py:

    from django.urls import path
    from .views import predict
    
    urlpatterns = [
        path("predict/", predict, name="predict"),
    ]
    
  5. Include API URLs in ml_project/urls.py:

    from django.contrib import admin
    from django.urls import path, include
    
    urlpatterns = [
        path("admin/", admin.site.urls),
        path("api/", include("ml_api.urls")),
    ]
    
  6. Run Django Server:

    python manage.py runserver
    
    • The API will be available at: http://127.0.0.1:8000/api/predict/
    • Send a POST request with JSON input:

      {"data": [5.1, 3.5, 1.4, 0.2]}
      
      

2. Cloud Deployment

For large-scale applications, deploying machine learning (ML) models on the cloud provides scalability, security, and performance optimization. Cloud platforms allow developers to train, deploy, and serve ML models without managing hardware resources, making it easy to handle high workloads and integrate with existing applications.

Popular Cloud Platforms for ML Deployment

Several cloud platforms provide specialized services for deploying machine learning models:

  1. AWS SageMaker – Amazon's fully managed service for building, training, and deploying ML models.
  2. Google AI Platform (Vertex AI) – Google's cloud-based ML service for training and serving models at scale.
  3. Azure Machine Learning – Microsoft's cloud service for building, deploying, and monitoring ML models.

Let’s explore each platform in detail with practical deployment examples.

1. Deploying ML Models on AWS SageMaker

Why Use AWS SageMaker?

  1. Fully managed ML service for training, tuning, and deploying models.
  2. Supports multiple ML frameworks like TensorFlow, PyTorch, and Scikit-learn.
  3. Offers built-in security and auto-scaling for high-performance workloads.

Steps to Deploy a Model on AWS SageMaker

  1. Train a model locally and save it as a joblib or pickle file:

    import joblib
    from sklearn.ensemble import RandomForestClassifier
    
    # Sample training
    model = RandomForestClassifier()
    model.fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0])
    
    # Save the trained model
    joblib.dump(model, "model.joblib")
    
  2. Package the model as a model.tar.gz archive (SageMaker's hosting containers expect model artifacts as a tar.gz) and upload it to an Amazon S3 bucket:

    tar -czf model.tar.gz model.joblib
    aws s3 cp model.tar.gz s3://your-bucket-name/
    
  3. Create an AWS SageMaker model using the S3 path (the image URI, account IDs, and role ARN below are placeholders):

    import boto3
    
    sagemaker = boto3.client("sagemaker")
    
    response = sagemaker.create_model(
        ModelName="MyMLModel",
        PrimaryContainer={
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/sklearn-inference:latest",
            "ModelDataUrl": "s3://your-bucket-name/model.tar.gz",
        },
        ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole"
    )
    
  4. Deploy the model as an endpoint. The endpoint configuration named here must exist first; a minimal sketch of creating it follows these steps:

    response = sagemaker.create_endpoint(
        EndpointName="MyMLModelEndpoint",
        EndpointConfigName="MyMLModelConfig"
    )
    
  5. Make predictions using the deployed model. SageMaker endpoints require AWS-signed requests, so invoke them through the sagemaker-runtime client rather than a plain HTTP call:

    import json
    import boto3
    
    runtime = boto3.client("sagemaker-runtime")
    
    # The exact payload format depends on the serving container
    response = runtime.invoke_endpoint(
        EndpointName="MyMLModelEndpoint",
        ContentType="application/json",
        Body=json.dumps({"data": [[5.1, 3.5, 1.4, 0.2]]}),
    )
    print(response["Body"].read().decode())
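
The endpoint configuration referenced in step 4 maps the model onto serving hardware. A minimal sketch, reusing the sagemaker client and names from the steps above (the instance type is a placeholder):

    response = sagemaker.create_endpoint_config(
        EndpointConfigName="MyMLModelConfig",
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": "MyMLModel",
                "InstanceType": "ml.m5.large",  # placeholder instance type
                "InitialInstanceCount": 1,
            }
        ],
    )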
    

2. Deploying ML Models on Google AI Platform (Vertex AI)

Google AI Platform (Vertex AI) provides a serverless environment for deploying ML models with auto-scaling capabilities.

Why Use Google AI Platform?

  1. Supports TensorFlow, Scikit-learn, and PyTorch models.
  2. Easy integration with Google Cloud Storage (GCS).
  3. Auto-scaling and real-time prediction support.

Steps to Deploy a Model on Google AI Platform

  1. Save your trained model using TensorFlow or Scikit-learn:

    import joblib
    from sklearn.ensemble import RandomForestClassifier
    
    model = RandomForestClassifier()
    model.fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0])
    joblib.dump(model, "model.joblib")
    
  2. Upload the model to Google Cloud Storage (GCS):

    gsutil cp model.joblib gs://your-bucket-name/
    
  3. Upload the model to Vertex AI using the GCS path. The gcloud subcommand is models upload, which also needs a serving container; the prebuilt scikit-learn image below is an assumption and its version tag may differ:

    gcloud ai models upload \
      --region=us-central1 \
      --display-name=my-ml-model \
      --artifact-uri=gs://your-bucket-name/ \
      --container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest
    
  4. Create a prediction endpoint and deploy the model to it (creating the endpoint alone does not serve the model; deploy-model attaches it):

    gcloud ai endpoints create --display-name=my-ml-endpoint --region=us-central1
    gcloud ai endpoints deploy-model YOUR_ENDPOINT_ID \
      --region=us-central1 \
      --model=YOUR_MODEL_ID \
      --display-name=my-ml-deployment
    
  5. Make predictions using the deployed model:

    from google.cloud import aiplatform
    
    endpoint = aiplatform.Endpoint("projects/YOUR_PROJECT_ID/locations/us-central1/endpoints/YOUR_ENDPOINT_ID")
    
    response = endpoint.predict(instances=[[5.1, 3.5, 1.4, 0.2]])
    print(response)
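
Alternatively, the google-cloud-aiplatform SDK can perform the upload and deployment steps from Python. A minimal sketch under the same placeholder names (the serving image and machine type are assumptions):

from google.cloud import aiplatform

aiplatform.init(project="YOUR_PROJECT_ID", location="us-central1")

# Upload the model artifact with a prebuilt sklearn serving container
model = aiplatform.Model.upload(
    display_name="my-ml-model",
    artifact_uri="gs://your-bucket-name/",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest",
)

# Deploy to a new endpoint (machine type is a placeholder)
endpoint = model.deploy(machine_type="n1-standard-2")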
    

3. Deploying ML Models on Azure Machine Learning

Azure Machine Learning (Azure ML) is Microsoft's cloud-based ML platform for training and deploying models.

Why Use Azure ML?

  1. Integrated with Microsoft tools like Power BI and Azure DevOps.
  2. Supports automated machine learning (AutoML).
  3. Provides secure and scalable inference endpoints.

Steps to Deploy a Model on Azure ML

  1. Save the trained model:

    import joblib
    from sklearn.ensemble import RandomForestClassifier
    
    model = RandomForestClassifier()
    model.fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0])
    joblib.dump(model, "model.joblib")
    
  2. Register the model on Azure ML. With the v2 CLI extension the subcommand is model create; the workspace and resource group names are placeholders:

    az ml model create --name my-ml-model --path model.joblib --workspace-name your-workspace --resource-group your-resource-group
    
  3. Create an Azure ML endpoint:

    az ml online-endpoint create --name my-ml-endpoint --auth-mode key
    
  4. Deploy the model to the endpoint. The v2 CLI typically reads the model, environment, and instance settings from a YAML spec file:

    az ml online-deployment create --name my-ml-deployment --endpoint-name my-ml-endpoint --file deployment.yml
    
  5. Make predictions using the endpoint. Managed online endpoints expose a scoring URI and require the endpoint key in the Authorization header (the URI and key below are placeholders):

    import requests
    
    response = requests.post(
        "https://my-ml-endpoint.your-region.inference.ml.azure.com/score",
        headers={"Authorization": "Bearer your-endpoint-key"},
        json={"data": [[5.1, 3.5, 1.4, 0.2]]}
    )
    
    print(response.json())
    

 

3. Edge Deployment

  • In edge computing, models run directly on mobile devices, IoT devices, or embedded systems instead of cloud servers.
  • Useful for applications where low latency and offline functionality are required (e.g., voice assistants, image recognition on phones).

Popular Tools for Edge Deployment

Tool             Best For                      Platform Support
TensorFlow Lite  Mobile & embedded AI models   Android, IoT
CoreML           iOS & macOS applications      Apple devices
ONNX Runtime     Cross-platform optimization   Windows, Linux, Android, iOS

1. TensorFlow Lite – Mobile Deployment (Android & IoT)

TensorFlow Lite (TFLite) is an optimized version of TensorFlow for mobile and edge devices. It reduces model size and improves inference speed on limited hardware.

Example: Converting a TensorFlow Model to TFLite

import tensorflow as tf

# Load the trained model
model = tf.keras.models.load_model("model.h5")

# Convert the model to TensorFlow Lite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the TFLite model
with open("model.tflite", "wb") as file:
    file.write(tflite_model)

print("Model converted successfully!")

Using the Converted Model in an Android App

import org.tensorflow.lite.Interpreter;

// loadModelFile() is a placeholder helper that memory-maps the .tflite
// file from the app's assets
Interpreter tflite = new Interpreter(loadModelFile());

float[][] input = {{5.1f, 3.5f, 1.4f, 0.2f}};
float[][] output = new float[1][1];

tflite.run(input, output);
System.out.println("Prediction: " + output[0][0]);

📌 TensorFlow Lite reduces model size and improves inference speed on mobile and IoT devices.

2. CoreML – iOS Deployment (Apple Devices)

CoreML is Apple’s framework for running machine learning models directly on iOS/macOS devices, optimized for efficiency.

Example: Converting a TensorFlow Model to CoreML

import coremltools as ct
import tensorflow as tf

# Load the trained Keras model
model = tf.keras.models.load_model("model.h5")

# Convert to CoreML format
mlmodel = ct.convert(model)

# Save the CoreML model
mlmodel.save("model.mlmodel")

print("Model converted successfully!")

Using the CoreML Model in an iOS App (Swift)

import CoreML

// "Model" is a placeholder for the class Xcode generates from the
// .mlmodel file; its name and prediction API match the model file
let model = try? Model()
let input = try? MLMultiArray(shape: [4], dataType: .float32)

// Assign values to the input array
input?[0] = 5.1
input?[1] = 3.5
input?[2] = 1.4
input?[3] = 0.2

if let prediction = try? model?.prediction(input: input!) {
    print("Prediction: \(prediction.output)")
}

📌 CoreML allows seamless ML model integration with iOS/macOS applications, enhancing user experience with AI-powered features.

3. ONNX Runtime – Cross-Platform Edge Deployment

ONNX (Open Neural Network Exchange) is a framework-independent format that allows models trained in TensorFlow, PyTorch, and Scikit-learn to run on multiple platforms, including Windows, Linux, Android, and iOS.

Example: Converting a TensorFlow Model to ONNX

import tf2onnx
import tensorflow as tf

# Load the trained model
model = tf.keras.models.load_model("model.h5")

# Convert the model to ONNX format
onnx_model, _ = tf2onnx.convert.from_keras(model)

# Save the ONNX model
with open("model.onnx", "wb") as file:
    file.write(onnx_model.SerializeToString())

print("Model converted successfully!")

Using the ONNX Model for Inference (Python)

import onnxruntime as ort
import numpy as np

# Load the ONNX model
session = ort.InferenceSession("model.onnx")

# Prepare input data (placeholder feature values)
input_data = np.array([[5.1, 3.5, 1.4, 0.2]], dtype=np.float32)

# Run inference; input names are model-specific, so read the actual
# name from the session rather than hard-coding "input"
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: input_data})
print("Prediction:", outputs[0])

📌 ONNX allows models to be deployed across different devices with optimized performance.
