Basic deployment of a TensorFlow model on Google Kubernetes Engine with TensorFlow Serving

Arslan Ashraf

March 2024

In this guide, we will package a (trivial) TensorFlow model into a Docker image and deploy it on Google Kubernetes Engine (GKE). The purpose of this guide is to demonstrate model deployment on Kubernetes. All of the following commands are intended to be run on Google Cloud Platform (GCP).

The model is trained to learn the function f(x) = 3/2 * (x^5) + 5 on the domain [-1, 1], so both the input and the output are a single number. The demonstration in this guide will be done in GCP's Cloud Shell environment, which comes preinstalled with the Google Cloud CLI as well as kubectl. We will assume the reader is already familiar with the basics of Kubernetes and TensorFlow.

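TensorFlow Serving loads models from a versioned SavedModel directory. The export code is outside the scope of this guide, but the model directory we will bake into the Docker image is assumed to look roughly like this (as produced, for example, by tf.saved_model.save(model, "basic_model/1")):

basic_model/
    1/
        saved_model.pb
        variables/
            variables.data-00000-of-00001
            variables.index
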
Our first step will be to create a repository on Google Artifact Registry (GAR) to house our Docker image. We will export some variables and enable the required Google APIs.

#   export the PROJECT environment variable as we will use it later
export PROJECT=$(gcloud config list project --format "value(core.project)")

Below, we create a Docker repository on Artifact Registry.

#   enable the Artifact Registry API
gcloud services enable artifactregistry.googleapis.com
#
#   export the REPOSITORY_NAME environment variable
export REPOSITORY_NAME=gar-model-serving-repo
#
gcloud artifacts repositories create $REPOSITORY_NAME \
--repository-format=docker \
--location=us-east1 \
--description="Repository for storing model serving image" \
--project=$PROJECT
#
#   verify that the artifacts repository was created
gcloud artifacts repositories list

Next, we will build the Docker image and push it to GAR.

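The guide assumes a Dockerfile already exists in the working directory; it is not reproduced in the original commands. A minimal sketch, assuming the versioned SavedModel sits in a local basic_model/ directory as described earlier, could be written straight from Cloud Shell:

#   write a minimal Dockerfile based on the official TensorFlow Serving image
cat <<'EOF' > Dockerfile
# start from the official TensorFlow Serving image (serves REST on port 8501)
FROM tensorflow/serving
# copy the versioned SavedModel into the directory the model server scans
COPY basic_model /models/basic_model
# tell the model server which model to load
ENV MODEL_NAME=basic_model
EOF
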
#   export the image uri environment variable
export IMAGE_URI=us-east1-docker.pkg.dev/$PROJECT/$REPOSITORY_NAME/model-server
#
#   build the image containing the TensorFlow model
docker build -t $IMAGE_URI .
#
#   as a sanity check to ensure the Docker image works as expected
#   we will run the image and send a curl request for inference
docker run --rm -it -p 8501:8501 $IMAGE_URI
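#
#   note: with -it the container holds this terminal; run the curl below from a
#   second Cloud Shell tab, or start the container detached with -d instead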
#
#   the model takes a single number per instance, e.g. [0.7], and the model name is "basic_model"
curl -d '{"instances": [[0.7], [-0.99]]}' -X POST \
http://localhost:8501/v1/models/basic_model:predict
#
#   output should be
{ "predictions": [[5.34325171], [3.96698904]] }
#
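#   if Docker is not yet authorized to push to Artifact Registry from this
#   environment, configure the credential helper first (safe to re-run)
gcloud auth configure-docker us-east1-docker.pkg.dev
#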
#   let's push this image to Artifact Registry
docker push $IMAGE_URI

With the Docker image on Artifact Registry, we will spin up a Google-managed Kubernetes cluster. For simplicity, the cluster will have only one small node.

#   enable the compute and container APIs
gcloud services enable compute.googleapis.com
gcloud services enable container.googleapis.com
#
#   set the default compute zone
gcloud config set compute/zone us-east1-b
#
#   export GKE cluster name variable
export GKE_CLUSTER_NAME=gke-serving-tensorflow-model
#
#   create GKE cluster, note that this takes a few minutes
gcloud container clusters create $GKE_CLUSTER_NAME \
--num-nodes=1 \
--machine-type=e2-small
#
#   verify which compute instances are running
gcloud compute instances list
#
#   when you're finished, remember to clean up; the GKE cluster can be deleted as follows (we do this at the end of the guide)
gcloud container clusters delete $GKE_CLUSTER_NAME

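Creating the cluster from Cloud Shell normally points kubectl at it automatically; if kubectl cannot reach the cluster, the credentials can be fetched explicitly.

#   configure kubectl credentials for the new cluster (usually done automatically)
gcloud container clusters get-credentials $GKE_CLUSTER_NAME
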
Once the Kubernetes cluster is up, we can create the Kubernetes objects needed to deploy our model. We will write two YAML files, one for the Deployment object called deployment.yaml and one for the Service object called service.yaml, both sketched below.

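The contents of these two files are not reproduced in the original guide, so below is a minimal sketch of what they might contain. The object and label names (basic-model-deployment, basic-model, basic-model-service) are assumptions; the image URI and port 8501 match the image built above. The files are written as heredocs so they can be pasted straight into Cloud Shell ($IMAGE_URI expands when the file is created):

#   deployment.yaml -- three replicas of the model server
cat <<EOF > deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: basic-model-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: basic-model
  template:
    metadata:
      labels:
        app: basic-model
    spec:
      containers:
      - name: model-server
        image: ${IMAGE_URI}
        ports:
        - containerPort: 8501
EOF
#
#   service.yaml -- a LoadBalancer Service exposing the pods on port 8501
cat <<EOF > service.yaml
apiVersion: v1
kind: Service
metadata:
  name: basic-model-service
spec:
  type: LoadBalancer
  selector:
    app: basic-model
  ports:
  - port: 8501
    targetPort: 8501
EOF
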
Now let's apply these files.

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

We should now verify that our Kubernetes Deployment object has been created with three pods, as specified in the deployment.yaml file.

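One quick way to check from the command line:

#   the Deployment should report 3/3 ready once the pods are up
kubectl get deployments
kubectl get pods
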
Let's also verify that our Service object was created and that Kubernetes is indeed serving our TensorFlow model. In the GKE console, head over to Gateways, Services & Ingress in the left pane; under the Services tab, we should see our LoadBalancer Service. Under Endpoints, we also see the IP address on which the Service is serving requests.

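The same information is available from kubectl; note that the EXTERNAL-IP column may show "pending" for a minute or two while GCP provisions the load balancer.

#   list Services along with their external IP addresses
kubectl get services
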
Let's send a curl request to the IP address listed and see if we get our predictions. Notice that this time we don't use localhost; instead, we use the external IP address the Kubernetes Service is exposing (the address below is from this demo, so substitute your own).

curl -d '{"instances": [[0.7], [-0.99]]}' -X POST \
http://34.75.33.96:8501/v1/models/basic_model:predict

It works! Since this was a demo, let's shut down our resources. First, delete the Kubernetes cluster.

gcloud container clusters delete $GKE_CLUSTER_NAME

Next, delete the Artifact Registry repository with the following command:

gcloud artifacts repositories delete $REPOSITORY_NAME --location=us-east1