Basic deployment of a TensorFlow model on Google Kubernetes Engine with TensorFlow Serving
Arslan Ashraf
March 2024
In this guide we will package up a (trivial) TensorFlow model in a Docker image and deploy it on Google Kubernetes Engine (GKE). The purpose of this guide is to demonstrate model deployment on Kubernetes. All of the following commands are intended to be run on Google Cloud Platform (GCP).
The model is trained to learn the function f(x) = 3/2 * x^5 + 5 on the domain [-1, 1], so both the input and the output are a single number. The demonstration in this guide will be done in GCP's Cloud Shell environment, which comes preinstalled with the Google Cloud CLI as well as kubectl. We will assume the reader is already familiar with the basics of Kubernetes and TensorFlow.
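As a point of reference, here is a minimal sketch of how such a model might be trained and exported. The network size, training settings, and the saved_model/1 export path are illustrative assumptions, not the exact code used:

```python
import numpy as np
import tensorflow as tf

# Synthetic training data for f(x) = 3/2 * x^5 + 5 on [-1, 1]
x = np.random.uniform(-1.0, 1.0, size=(10_000, 1)).astype("float32")
y = 1.5 * x**5 + 5.0

# A small fully connected network is plenty for this function
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=10, batch_size=32)

# Export in the SavedModel format that TensorFlow Serving expects;
# the version subdirectory "1" is part of Serving's directory convention.
tf.saved_model.save(model, "saved_model/1")
```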
Our first step will be to create a repository on Google Artifact Registry (GAR) to house our Docker image. We will export some variables and enable the required Google APIs.
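A sketch of this setup; the region, zone, and resource names below are placeholder assumptions, so adjust them to your project:

```bash
# Placeholder names -- change these to suit your project
export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
export ZONE=us-central1-a
export REPO_NAME=tf-serving-repo
export IMAGE_NAME=tf-serving-model
export CLUSTER_NAME=tf-serving-cluster

# Enable the Artifact Registry and Kubernetes Engine APIs
gcloud services enable artifactregistry.googleapis.com container.googleapis.com
```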
Below, we create a Docker repository on Artifact Registry.
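For example, using the variables exported above:

```bash
# Create a Docker-format repository in Artifact Registry
gcloud artifacts repositories create $REPO_NAME \
    --repository-format=docker \
    --location=$REGION \
    --description="Repository for the TensorFlow Serving demo image"
```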
Next, we will build the Docker image and push it to GAR.
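One common approach, sketched below, is to base the image on the official tensorflow/serving image and copy the exported SavedModel into it. The Dockerfile contents assume the saved_model/1 export directory from earlier:

```bash
# Write a minimal Dockerfile (assumes the model was exported to ./saved_model/1)
cat > Dockerfile <<'EOF'
FROM tensorflow/serving
# TensorFlow Serving looks for models under /models/<MODEL_NAME>/<version>
COPY saved_model /models/model
ENV MODEL_NAME=model
EOF

# Let Docker authenticate to Artifact Registry, then build and push
gcloud auth configure-docker ${REGION}-docker.pkg.dev
docker build -t ${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/${IMAGE_NAME}:v1 .
docker push ${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/${IMAGE_NAME}:v1
```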
With the Docker image on Artifact Registry, we will spin up a Google-managed Kubernetes cluster. For simplicity, we will set up just one small node in the cluster.
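For example (the machine type here is an assumption; any small instance type should do):

```bash
# A single-node cluster is enough for this demo
gcloud container clusters create $CLUSTER_NAME \
    --zone $ZONE \
    --num-nodes=1 \
    --machine-type=e2-medium

# Point kubectl at the new cluster
gcloud container clusters get-credentials $CLUSTER_NAME --zone $ZONE
```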
Once the Kubernetes cluster is up, we can create the Kubernetes objects needed to deploy our model. We will write two YAML files: one for the Deployment object, called deployment.yaml, and one for the Service object, called service.yaml.
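A sketch of what these two files might look like; the object names and labels are illustrative, and the image path should match the one pushed earlier:

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tf-serving
  template:
    metadata:
      labels:
        app: tf-serving
    spec:
      containers:
      - name: tf-serving
        # Replace PROJECT_ID with your actual project id
        image: us-central1-docker.pkg.dev/PROJECT_ID/tf-serving-repo/tf-serving-model:v1
        ports:
        - containerPort: 8501  # TensorFlow Serving's REST API port
```

```yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: tf-serving-service
spec:
  type: LoadBalancer
  selector:
    app: tf-serving
  ports:
  - port: 8501
    targetPort: 8501
```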
Now let's apply these files.
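Assuming both files are in the current directory:

```bash
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
```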
We should now verify that our Kubernetes Deployment object has been created with three pods, as we specified in the deployment.yaml file.
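For example:

```bash
# The Deployment should report 3/3 ready replicas once the pods start
kubectl get deployments
kubectl get pods
```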
Let's also verify that our Service object was created and that Kubernetes is indeed serving our TensorFlow model. In the GCP console, head over to the Gateways, Services & Ingress tab in the left pane; under the Services tab, we should see our LoadBalancer object. Under Endpoints, we also see the IP address on which the Service object is serving requests.
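The same information is also available from the command line:

```bash
# EXTERNAL-IP will show <pending> until GCP provisions the load balancer
kubectl get service tf-serving-service
```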
Let's send a curl request to the IP address listed and see if we get our predictions. Notice that this time we don't use localhost; instead, we use the external IP address that the Kubernetes Service object exposes.
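Assuming the model is served under the default name model, as in the Dockerfile sketch above, the request might look like this (replace EXTERNAL_IP with the address from the previous step):

```bash
curl -X POST http://EXTERNAL_IP:8501/v1/models/model:predict \
    -d '{"instances": [[0.5], [1.0]]}'
# Since the network only approximates f(x) = 3/2 * x^5 + 5, the response
# should look roughly like: {"predictions": [[5.05], [6.5]]}
```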
It works! Since this was a demo, let's shut down our resources. First, delete the Kubernetes cluster.
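Using the variables exported earlier:

```bash
gcloud container clusters delete $CLUSTER_NAME --zone $ZONE --quiet
```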
Next, delete the artifact repository with the following command:
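```bash
gcloud artifacts repositories delete $REPO_NAME --location=$REGION --quiet
```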