Note: Access to the NLP documentation here.
Last updated December 28th 2023.
Note: The true scope of this project involves the full implementation and integration of the classifier into a website. This repository details the machine learning backend.
This machine learning (ML) project was created for the purpose of deploying an ML-based API, using the cloud provider Render to set up the environment and host the Docker container. The Swagger documentation of the API is available here for viewing.
The objective of this repository is to serve a prediction model that accurately classifies texts as spam, using the FastAPI framework, and to provide an educational experience. Happy coding!
The original dataset can be found here.
Photo by Hannes Johnson on Unsplash
The "Spam Detection backend" project aims to provide a brief outline of Machine Learning Operations (MLOps) by focusing on the steps necessary to deploy your ML model. Below I have provided the helpful documentation that allowed me to complete this project.
- Machine Learning Mastery: Save and Load ML Models in Python
In this article you will discover how to save and load your model with pickle or joblib. You can then reuse the saved file to make predictions at this stage.
- Integrating ML classifier with FastAPI
Explanation of the overall backend pipeline for ML models. Granted, I did not follow this article much, but the main.py file provides a simple overview of what integrating your saved model with FastAPI looks like.
- Building a Machine Learning API in 15 Minutes
Very useful video on how an API project may be deployed! Additionally, you can run the application with uvicorn app:app --reload at this stage.
- FastAPI in Containers - Docker
Another helpful tutorial that demonstrates the purpose of Docker and details how to create a Dockerfile. You can build the Docker image and start the container at this stage.
- Share the Application - Docker
After building your Docker image, you can share it on Docker Hub. Sharing allows for easy integration into a cloud environment and demonstrates the portability of containers. You can run your application on a hosted site at this stage.
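The save-and-load workflow from the first resource can be sketched in a few lines. Here a plain dictionary stands in for the trained classifier (the project itself saves the real model as model/finalized_model.sav, and joblib.dump/joblib.load work the same way):

```python
import pickle

# A picklable stand-in for the trained spam classifier
model = {"weights": [0.1, 0.2], "threshold": 0.5}

# Save the model to disk
with open("finalized_model.sav", "wb") as f:
    pickle.dump(model, f)

# Later (e.g. in app.py): load it back and reuse it for predictions
with open("finalized_model.sav", "rb") as f:
    loaded = pickle.load(f)

print(loaded == model)  # → True
```

joblib is generally preferred over pickle for scikit-learn models with large NumPy arrays, which is why this project uses it.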
Note: You may ignore this section if you're only interested in deploying the model. The commands below can be copied and run in your terminal to easily simulate my project environment.
To set up the project environment locally, follow these steps:
- Cloning the Repository
git clone https://github.com/weezymatt/Spam-Detection-backend.git
cd Spam-Detection-backend
- Setting up Virtual Environment
- Windows
python -m venv <virtual-environment-name>
<virtual-environment-name>\Scripts\activate
- Linux and MacOS
python3 -m venv <virtual-environment-name>
source <virtual-environment-name>/bin/activate
- Installing the Required Dependencies
The virtual environment will make use of its own pip, so you don't need to use pip3.
pip install -r requirements.txt
Note: You may ignore this section and go to Deployment if you're knowledgeable about APIs. The Jupyter Notebook is useful for testing predictions with your model before making the app.py file.
There are a few steps required to build your FastAPI app and capture the essence of your model. Here we briefly discuss how to write the app.py code and the Dockerfile.
Following the initialization of your virtual environment, we will write the app.py file and initialize an instance of FastAPI.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Ham or Spam API", description="API to predict SMS spam")
Load the saved model and vectorizer with joblib. The vectorizer is loaded so that incoming text follows the same processing steps as in the Jupyter Notebook.
import joblib

model = joblib.load("model/finalized_model.sav")
vectorizer = joblib.load("model/vectorizer.sav")
Define the data format for incoming input.
from pydantic import BaseModel

class request_body(BaseModel):
    message: str  # e.g. "A free service for you ONLY!! Please click on the link now!"
Process the input sent by the user.
def process_msg(msg):
"""
Replace email address with 'email'
Replace URLS with 'http'
Replace currency symbols with 'moneysymb'
Replace phone numbers with 'phonenumb'
Replace numbers with 'numb'
"""
...
return clean_input
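The elided body could look something like the sketch below, using Python's re module. The exact patterns are an assumption on my part (they live in the Jupyter Notebook), and in the real app.py the cleaned text is presumably also passed through the loaded vectorizer before being returned:

```python
import re

def process_msg(msg):
    """Normalize a raw message with placeholder tokens, mirroring the
    docstring above (patterns are illustrative assumptions)."""
    msg = re.sub(r"\S+@\S+", "email", msg)               # email addresses
    msg = re.sub(r"https?://\S+|www\.\S+", "http", msg)  # URLs
    msg = re.sub(r"[$£€]", "moneysymb", msg)             # currency symbols
    msg = re.sub(r"\b\d{7,}\b", "phonenumb", msg)        # long digit runs as phone numbers
    msg = re.sub(r"\d+", "numb", msg)                    # any remaining numbers
    # In the actual app, the result would then be vectorized, e.g.:
    # return vectorizer.transform([msg]).todense()
    return msg

print(process_msg("Call 08001234567 to claim £2000"))
# → Call phonenumb to claim moneysymbnumb
```

Note that the order of substitutions matters: phone numbers must be replaced before the generic digit rule, or every phone number would collapse into numb.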
Define the GET method.
@app.get('/')
def Welcome():
    return {'message': 'Welcome to the Spam classifier API!'}
Create the POST method (this is the meat of your API).
@app.post('/api_predict')
def classify_msg(msg: request_body):
    if not msg.message:
        raise HTTPException(status_code=400, detail="Please provide a valid message")

    # Process the message to fit the model's input format
    dense = process_msg(msg.message)

    # Classification results
    label = model.predict(dense)[0]
    # proba = model.predict_proba(dense)  # check again after test

    # Extract the corresponding information
    if label == 0:
        return {'Answer': "This is a Ham email!"}
    else:
        return {'Answer': "This is a Spam email!"}
Realistically, you will have a virtual environment ready, install your dependencies throughout the project, and then freeze them into a text file. The requirements.txt file enables us to recreate all the modules necessary for our application. This is crucial when we write our Dockerfile later on.
pip freeze > requirements.txt
Deactivate your virtual environment.
deactivate
Here we develop the deployment in stages until we reach the container step, where we are able to display the webpage.
- Open the terminal and navigate to the directory where your app.py file is located.
- Run the FastAPI application by using the uvicorn command, specifying the application name. The --reload flag is useful so that changes are automatically reflected.
uvicorn <application-file>:app --reload
- After running the uvicorn command, the FastAPI application is up and running at the address listed (i.e. http://localhost:8000). This address represents the API endpoint where we can access our application. We will see the importance of this endpoint during the front-end part of the project.
- You may open your browser to interact with your deployed FastAPI application. The endpoint acts as an intermediary between requests and responses (press CTRL+C to quit).
The FastAPI documentation details the available endpoints, JSON request and response formats, and information specified in your app.py file. You can access this documentation by appending /docs to the server address (http://localhost:8000/docs).
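You can also exercise the prediction endpoint from a second terminal once the server is running; this assumes the default localhost:8000 address and the /api_predict route defined earlier:

```shell
curl -X POST "http://localhost:8000/api_predict" \
  -H "Content-Type: application/json" \
  -d '{"message": "A free service for you ONLY!! Please click on the link now!"}'
```

The response is the JSON answer returned by classify_msg, e.g. a "Spam email" verdict for a message like this one.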
When deploying an API, a common approach is to build a container image. We will need to write a Dockerfile for the application.
Dependency Issue: For the Docker container to properly run, an additional file initializing the NLTK stopwords was incorporated into the workflow. This may not be necessary in your process.
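I can only sketch what such a file might contain, but a minimal initialize.py along these lines would download the stopwords at build time, so the container needs no network access at runtime:

```python
# initialize.py -- assumed contents (sketch): fetch the NLTK data the
# text-processing pipeline depends on, baking it into the Docker image.
import nltk

nltk.download("stopwords")
```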
FROM python:3.11.5
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
COPY ./initialize.py /code/initialize.py
RUN python3 /code/initialize.py
COPY . .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "80"]
If you are using an ARM-based Mac with Apple Silicon, you will need to rebuild the image to be compatible and push the new image to your repository on Docker Hub. Otherwise the process is rather straightforward. Click here for a solution on Stack Overflow.
Let's build the container image, then use the docker tag command to give the image a name that matches your Docker Hub repository.
docker build -t <image-title> .
docker tag <image-title> YOUR-USERNAME/<dockerhub-repo>
Switch to a new driver before your build (we will be following the process for an M1+).
docker buildx create --use
Launch the following command to build your Docker image.
docker buildx build --platform linux/amd64,linux/arm64 -t <tag> .
Push the image to Docker Hub.
docker buildx build --push --platform linux/amd64,linux/arm64 -t docker.io/YOUR-USERNAME/<dockerhub-repo>:latest .
You may run your image to verify it is working and visit the server (http://localhost:8000/docs).
docker run -d --name mycontainer -p 8000:80 <image-title>
There you have it! You can use your saved image on Docker Hub with your cloud environment of choice and start the next step of your application. For the second part of this project involving the front-end piece, please click here.