Deploying a Machine Learning model

Prasad
Aug 18, 2023


Once a model is trained, how do we deploy it for inference? Here I am giving a simple example. In a production environment this will be a little different depending on the deployment topology, but in essence the concept is the same.

ML inference deployment topology example

A typical production environment topology is shown above. The client here is a browser or any client device requesting an inference service. The request goes to an nginx-like load balancer service and gets routed to one of the microservices running the ML model REST API, which performs the inference and returns the response to the client.

The topology above is a bare-metal, scalable inference topology. There are variants of this based on how flexible or complex the system needs to be. The microservices could be running in a dockerized environment on a Kubernetes cluster; as the load increases, the cluster can spin up hundreds or thousands of such Docker containers. I am not going to elaborate on this topology here.

In this document I am going to elaborate on a simple web service, implemented as a Flask application, that serves the inference requests.

Here I am taking the widely popular iris dataset and training a model on it. Once the trained model is ready, it is converted into a serialized format using the pickle package. This saved model file is what is used for the ML deployment. Let's see the Python code which does this serialization.

import pickle

import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection

# Load the iris dataset bundled with scikit-learn
iris = sklearn.datasets.load_iris()

# Hold out 10% of the data for testing
train, test, labels_train, labels_test = sklearn.model_selection.train_test_split(
    iris.data, iris.target, train_size=0.90)

# Train a random forest classifier and serialize it to disk
rf = sklearn.ensemble.RandomForestClassifier(n_estimators=500)
rf.fit(train, labels_train)
with open('iris.pkl', 'wb') as f:
    pickle.dump(rf, f)
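
Before wiring this into a web service, it is worth a quick sanity check that the pickled model loads back and predicts sensibly. A minimal sketch, assuming the iris.pkl file and the held-out test split from the script above:

import pickle

# Load the serialized model back from disk
with open('iris.pkl', 'rb') as f:
    restored = pickle.load(f)

# Score the restored model on the held-out test split
print(restored.score(test, labels_test))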

Here the iris.pkl file is the trained model in a serialized format. Now this model is used for deployment in a Flask application as below.

import pickle

import numpy as np
from flask import Flask, request, jsonify

# Load the serialized model once at startup
model = pickle.load(open('iris.pkl', 'rb'))

app = Flask(__name__)

@app.route('/isAlive')
def index():
    return "true"

@app.route('/predict', methods=['POST'])
def get_prediction():
    # Get the data from the POST request.
    data = request.get_json(force=True)
    # Make a prediction using the model loaded from disk.
    predict_request = [[data['sl'], data['sw'], data['pl'], data['pw']]]
    predict_request = np.array(predict_request)
    print(predict_request)
    prediction = model.predict(predict_request)
    print(prediction)
    # Take the first value of the prediction
    output = prediction[0]
    print(output)
    return jsonify(int(output))


if __name__ == '__main__':
    #if os.environ['ENVIRONMENT'] == 'production':
    app.run(port=80, host='0.0.0.0')

This application will listen on port 80 for REST API calls, which are inference requests against our iris.pkl model. Now let's look at how the client application is written.

import requests
import json

url = "http://localhost:80/predict"
# Sepal length, sepal width, petal length, petal width
data = json.dumps({'sl': 3.2, 'sw': 7.3, 'pl': 4.5, 'pw': 2.1})
r = requests.post(url, data)
print(r.json())

Here you can see that the sepal length (3.2), sepal width (7.3), petal length (4.5) and petal width (2.1) are passed in a REST API call to the listening server application through port 80 of localhost (in this case I am running the server and client on the same device; in reality the URL would point to the actual server that is listening). The server does the inference, classifies the plant as class 2, and returns 2 to the client.
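
The prediction comes back as a numeric class index. If you want the human-readable species name, a small sketch like the one below maps the index back using the dataset's target names (assuming scikit-learn is available where you run it):

import sklearn.datasets

# The iris dataset defines the mapping from class index to species name
iris = sklearn.datasets.load_iris()
# Class index 2 maps to 'virginica'
print(iris.target_names[2])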

Now let's see the output at both the client and server command prompts.

Server app output

Client application output is given below.

Here the client got classification group 2 as the answer.

This is a simple example of how ML model inference is done.
