Deployment Options
Once the model is serialized, it can be deployed using different approaches, depending on the use case.
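The examples that follow assume a scikit-learn model serialized with joblib and saved as model.joblib; a minimal sketch of that step (the estimator and training data here are placeholders) looks like this:
import joblib
from sklearn.ensemble import RandomForestClassifier

# Train a small placeholder model (replace with your own data and estimator)
model = RandomForestClassifier()
model.fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0])

# Serialize the trained model so the deployment examples below can load it
joblib.dump(model, "model.joblib")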
1. Deployment via REST APIs
Deploying a machine learning (ML) model as a REST API enables applications, users, or other services to send input data and receive model predictions over HTTP requests. This approach makes ML models accessible and scalable, allowing real-time inference in web applications, mobile apps, or automated pipelines.
Why Deploy ML Models as REST APIs?
- Accessibility: Users can interact with the model without needing to understand the internal ML code.
- Scalability: APIs allow models to be served to multiple clients simultaneously.
- Flexibility: ML APIs can be integrated into various applications, including web apps, mobile apps, and IoT devices.
Common Frameworks for Building ML APIs
Several Python frameworks are used to deploy ML models as APIs:
- FastAPI – Lightweight, fast, and ideal for high-performance applications.
- Flask – Simple and widely used, best suited for small projects.
- Django – Robust and structured, great for large applications with built-in security features.
1. Deploying an ML Model Using FastAPI
FastAPI is a modern, high-performance web framework for building APIs with Python. It supports asynchronous request handling natively and typically outperforms Flask under concurrent load.
Installation
pip install fastapi uvicorn joblib numpy
Example: Deploying an ML Model Using FastAPI
from fastapi import FastAPI
import joblib
import numpy as np
# Initialize FastAPI app
app = FastAPI()
# Load trained ML model
model = joblib.load("model.joblib")
# Define prediction endpoint
@app.post("/predict/")
def predict(data: list):
    prediction = model.predict(np.array(data).reshape(1, -1))
    return {"prediction": prediction.tolist()}
# Run using: uvicorn filename:app --reload
Running the FastAPI Server
Run the following command to start the server:
uvicorn filename:app --reload
FastAPI provides automatic interactive API documentation at:
- Swagger UI: http://127.0.0.1:8000/docs
- Redoc: http://127.0.0.1:8000/redoc
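Once the server is running, the endpoint can also be called from any HTTP client. A quick test with the requests library (assuming the default port and a four-feature model, as in the examples here) might look like this:
import requests

# Send a JSON array of feature values to the FastAPI endpoint
response = requests.post(
    "http://127.0.0.1:8000/predict/",
    json=[5.1, 3.5, 1.4, 0.2],
)
print(response.json())  # e.g. {"prediction": [...]}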
2. Deploying an ML Model Using Flask
Flask is a lightweight and widely used framework for creating APIs. It is easy to set up and well-suited for small-scale applications.
Installation
pip install flask joblib numpy
Example: Deploying an ML Model Using Flask
from flask import Flask, request, jsonify
import joblib
import numpy as np
# Initialize Flask app
app = Flask(__name__)
# Load trained model
model = joblib.load("model.joblib")
# Define prediction endpoint
@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()  # Get input data as JSON
    prediction = model.predict(np.array(data).reshape(1, -1))
    return jsonify({"prediction": prediction.tolist()})
# Run the server
if __name__ == "__main__":
    app.run(debug=True)
Running the Flask Server
Run the script with:
python filename.py
Access the endpoint:
POST http://127.0.0.1:5000/predict
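For example, a quick test with the requests library (assuming the server is running locally and the model expects four features) could look like this:
import requests

# Call the Flask /predict endpoint with a JSON list of feature values
response = requests.post(
    "http://127.0.0.1:5000/predict",
    json=[5.1, 3.5, 1.4, 0.2],
)
print(response.json())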
3. Deploying an ML Model Using Django
Django is a full-stack web framework that includes built-in security and database integration. While Flask is more lightweight, Django is useful for large-scale applications.
Installation
pip install django joblib numpy djangorestframework
Steps to Deploy an ML Model with Django
Create a new Django project and app:
django-admin startproject ml_project
cd ml_project
django-admin startapp ml_api
Add ml_api to INSTALLED_APPS in ml_project/settings.py:
INSTALLED_APPS = [
    ...
    'rest_framework',
    'ml_api',
]
Create API Endpoint in ml_api/views.py:
from django.http import JsonResponse
from rest_framework.decorators import api_view
import joblib
import numpy as np

# Load trained ML model
model = joblib.load("model.joblib")

@api_view(["POST"])
def predict(request):
    data = request.data.get("data")
    prediction = model.predict(np.array(data).reshape(1, -1))
    return JsonResponse({"prediction": prediction.tolist()})
Define URL Path in ml_api/urls.py:
from django.urls import path
from .views import predict

urlpatterns = [
    path("predict/", predict, name="predict"),
]
Include API URLs in ml_project/urls.py:
from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path("admin/", admin.site.urls),
    path("api/", include("ml_api.urls")),
]
Run Django Server:
python manage.py runserver
- The API will be available at: http://127.0.0.1:8000/api/predict/
Send a POST request with JSON input:
{"data": [5.1, 3.5, 1.4, 0.2]}
2. Cloud Deployment
For large-scale applications, deploying machine learning (ML) models on the cloud provides scalability, security, and performance optimization. Cloud platforms allow developers to train, deploy, and serve ML models without managing hardware resources, making it easy to handle high workloads and integrate with existing applications.
Popular Cloud Platforms for ML Deployment
Several cloud platforms provide specialized services for deploying machine learning models:
- AWS SageMaker – Amazon's fully managed service for building, training, and deploying ML models.
- Google AI Platform (Vertex AI) – Google's cloud-based ML service for training and serving models at scale.
- Azure Machine Learning – Microsoft's cloud service for building, deploying, and monitoring ML models.
Let’s explore each platform in detail with practical deployment examples.
1. Deploying ML Models on AWS SageMaker
Why Use AWS SageMaker?
- Fully managed ML service for training, tuning, and deploying models.
- Supports multiple ML frameworks like TensorFlow, PyTorch, and Scikit-learn.
- Offers built-in security and auto-scaling for high-performance workloads.
Steps to Deploy a Model on AWS SageMaker
Train a model locally and save it as a joblib or pickle file:
import joblib
from sklearn.ensemble import RandomForestClassifier

# Sample training
model = RandomForestClassifier()
model.fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0])

# Save the trained model
joblib.dump(model, "model.joblib")
Package the model artifact as a tar.gz archive (the format SageMaker expects for ModelDataUrl) and upload it to an Amazon S3 bucket:
tar -czf model.tar.gz model.joblib
aws s3 cp model.tar.gz s3://your-bucket-name/
Create an AWS SageMaker model using the S3 path:
import boto3

sagemaker = boto3.client("sagemaker")

response = sagemaker.create_model(
    ModelName="MyMLModel",
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/sklearn-inference:latest",
        "ModelDataUrl": "s3://your-bucket-name/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
)
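create_endpoint expects an existing endpoint configuration, so one has to be created first. A minimal sketch (the variant name and instance type are illustrative choices):
# Create an endpoint configuration that tells SageMaker how to host the model
response = sagemaker.create_endpoint_config(
    EndpointConfigName="MyMLModelConfig",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",    # name for this model variant
            "ModelName": "MyMLModel",       # the model created above
            "InstanceType": "ml.m5.large",  # illustrative instance type
            "InitialInstanceCount": 1,
        }
    ],
)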
Deploy the model as an endpoint:
response = sagemaker.create_endpoint(
    EndpointName="MyMLModelEndpoint",
    EndpointConfigName="MyMLModelConfig"
)
Make predictions using the deployed model. SageMaker endpoints are invoked through the sagemaker-runtime client, which signs requests with your AWS credentials; the exact payload format depends on the inference container:
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="MyMLModelEndpoint",
    ContentType="application/json",
    Body=json.dumps({"data": [[5.1, 3.5, 1.4, 0.2]]}),
)
print(response["Body"].read().decode())
2. Deploying ML Models on Google AI Platform (Vertex AI)
Google AI Platform (Vertex AI) provides a serverless environment for deploying ML models with auto-scaling capabilities.
Why Use Google AI Platform?
- Supports TensorFlow, Scikit-learn, and PyTorch models.
- Easy integration with Google Cloud Storage (GCS).
- Auto-scaling and real-time prediction support.
Steps to Deploy a Model on Google AI Platform
Save your trained model using TensorFlow or Scikit-learn:
import joblib
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0])

joblib.dump(model, "model.joblib")
Upload the model to Google Cloud Storage (GCS):
gsutil cp model.joblib gs://your-bucket-name/
Register the model on Vertex AI using the GCS path (gcloud ai models upload also requires a serving container image; the one shown is Google's prebuilt scikit-learn prediction container):
gcloud ai models upload --region=us-central1 --display-name=my-ml-model --artifact-uri=gs://your-bucket-name/ --container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest
Create an endpoint and deploy the model to it (ENDPOINT_ID and MODEL_ID are the IDs returned by the previous commands):
gcloud ai endpoints create --display-name=my-ml-endpoint --region=us-central1
gcloud ai endpoints deploy-model ENDPOINT_ID --region=us-central1 --model=MODEL_ID --display-name=my-ml-deployment
Make predictions using the deployed model:
from google.cloud import aiplatform

endpoint = aiplatform.Endpoint("projects/YOUR_PROJECT_ID/locations/us-central1/endpoints/YOUR_ENDPOINT_ID")
response = endpoint.predict(instances=[[5.1, 3.5, 1.4, 0.2]])
print(response)
3. Deploying ML Models on Azure Machine Learning
Azure Machine Learning (Azure ML) is Microsoft's cloud-based ML platform for training and deploying models.
Why Use Azure ML?
- Integrated with Microsoft tools like Power BI and Azure DevOps.
- Supports automated machine learning (AutoML).
- Provides secure and scalable inference endpoints.
Steps to Deploy a Model on Azure ML
Save the trained model:
import joblib
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0])

joblib.dump(model, "model.joblib")
Register the model on Azure ML:
az ml model create --name my-ml-model --path model.joblib --workspace-name your-workspace
Create an Azure ML endpoint:
az ml online-endpoint create --name my-ml-endpoint --auth-mode key
Deploy the model to the endpoint:
az ml online-deployment create --name my-ml-deployment --endpoint-name my-ml-endpoint --model my-ml-model
Make predictions using the endpoint. With key authentication, the endpoint key is passed in an Authorization header; the scoring URI and key are shown for the endpoint in Azure ML studio:
import requests

response = requests.post(
    "https://your-endpoint-scoring-uri",  # scoring URI of the deployed endpoint
    headers={"Authorization": "Bearer your-endpoint-key"},
    json={"data": [[5.1, 3.5, 1.4, 0.2]]},
)
print(response.json())
3. Edge Deployment
- In edge computing, models run directly on mobile devices, IoT devices, or embedded systems instead of cloud servers.
- Useful for applications where low latency and offline functionality are required (e.g., voice assistants, image recognition on phones).
Popular Tools for Edge Deployment
| Tool | Best For | Platform Support |
| --- | --- | --- |
| TensorFlow Lite | Mobile & embedded AI models | Android, IoT |
| CoreML | iOS & macOS applications | Apple devices |
| ONNX Runtime | Cross-platform optimization | Windows, Linux, Android, iOS |
1. TensorFlow Lite – Mobile Deployment (Android & IoT)
TensorFlow Lite (TFLite) is an optimized version of TensorFlow for mobile and edge devices. It reduces model size and improves inference speed on limited hardware.
Example: Converting a TensorFlow Model to TFLite
import tensorflow as tf
# Load the trained model
model = tf.keras.models.load_model("model.h5")
# Convert the model to TensorFlow Lite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
# Save the TFLite model
with open("model.tflite", "wb") as file:
file.write(tflite_model)
print("Model converted successfully!")
Using the Converted Model in an Android App
import org.tensorflow.lite.Interpreter;

// loadModelFile() is a helper that memory-maps the bundled model.tflite asset
Interpreter tflite = new Interpreter(loadModelFile());
// One sample with four features; the output shape must match the model
float[][] input = {{5.1f, 3.5f, 1.4f, 0.2f}};
float[][] output = new float[1][1];
tflite.run(input, output);
System.out.println("Prediction: " + output[0][0]);
📌 TensorFlow Lite reduces model size and improves inference speed on mobile and IoT devices.
2. CoreML – iOS Deployment (Apple Devices)
CoreML is Apple’s framework for running machine learning models directly on iOS/macOS devices, optimized for efficiency.
Example: Converting a TensorFlow Model to CoreML
import coremltools as ct
import tensorflow as tf
# Load the trained Keras model
model = tf.keras.models.load_model("model.h5")
# Convert to CoreML format
mlmodel = ct.convert(model)
# Save the CoreML model
mlmodel.save("model.mlmodel")
print("Model converted successfully!")
Using the CoreML Model in an iOS App (Swift)
import CoreML
// "Model" is the Swift class Xcode generates from the .mlmodel file;
// the class name and the "output" property depend on the model file and its metadata.
let model = try? Model() // Load the CoreML model
let input = try? MLMultiArray(shape: [4], dataType: .float32)
// Assign values to the input array
input?[0] = 5.1
input?[1] = 3.5
input?[2] = 1.4
input?[3] = 0.2
if let prediction = try? model?.prediction(input: input!) {
    print("Prediction: \(prediction.output)")
}
📌 CoreML allows seamless ML model integration with iOS/macOS applications, enhancing user experience with AI-powered features.
3. ONNX Runtime – Cross-Platform Edge Deployment
ONNX (Open Neural Network Exchange) is a framework-independent format that allows models trained in TensorFlow, PyTorch, and Scikit-learn to run on multiple platforms, including Windows, Linux, Android, and iOS.
Example: Converting a TensorFlow Model to ONNX
import tf2onnx
import tensorflow as tf
# Load the trained model
model = tf.keras.models.load_model("model.h5")
# Convert the model to ONNX format
onnx_model, _ = tf2onnx.convert.from_keras(model)
# Save the ONNX model
with open("model.onnx", "wb") as file:
file.write(onnx_model.SerializeToString())
print("Model converted successfully!")
Using the ONNX Model for Inference (Python)
import onnxruntime as ort
import numpy as np
# Load the ONNX model
session = ort.InferenceSession("model.onnx")
# Prepare input data
input_data = np.array([[5.1, 3.5, 1.4, 0.2]], dtype=np.float32)
# Run inference (look up the input name instead of hard-coding it)
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: input_data})
print("Prediction:", outputs[0])
📌 ONNX allows models to be deployed across different devices with optimized performance.