Building Machine Learning Services in Kubernetes

In our last post, we stepped through the process of building a YouTube video scraper and YOLO video inference APIs to try to reproduce Christian Marclay’s “The Clock”.

Today, we’ll explore how to build scalable, reliable deep learning APIs using Kubernetes and containers.

The Case for Running Deep Learning Projects in Containers

Many of the most exciting machine learning projects have wildly different requirements for setting things up.

Often, academic papers and projects have very specific requirements to build their models. Setting up these requirements can take hours, and figuring out which version of which library conflicts with what quickly becomes overwhelming.

Containers help address this problem.

Rather than installing the libraries and requirements on your system, containers allow you to list all the specific steps necessary to generate your project’s environment.

So, once you’ve written those instructions, anyone else who wants to use your project can just do a docker run and have your entire setup built for them locally, isolated from the rest of their system.

But one of the trickier things with machine learning has been getting access to the GPU for training. Here, things have dramatically improved in the past year.

NVIDIA now releases their own container runtime for Docker, along with base Docker images for the most common deep learning libraries.

In many cases, using NVIDIA’s base image allows you to just add a project’s GitHub repo on top and start running models.
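
Before wiring anything into Kubernetes, it’s worth sanity-checking that a container can actually see the GPU. A minimal check, assuming a PyTorch-based image like the one we’ll use later, looks like this:

import torch

# If the NVIDIA container runtime is wired up correctly, PyTorch inside the
# container should report the GPU; otherwise it silently falls back to CPU.
if torch.cuda.is_available():
    print(f"CUDA available: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU visible inside this container")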

The Case for Running Kubernetes

Kubernetes Orchestrating Containers

Kubernetes is a platform for running distributed systems.

It allows you to abstract away the idea of computers, and instead pretend you have an infinite amount of computation available.

It’s especially useful for building systems that might suddenly need to serve a lot of people, or that have strong reliability requirements.

But that scalability and reliability come with a tradeoff. And that tradeoff is more complexity.

If you look for Kubernetes machine learning projects, you’ll see things like Kubeflow, and its long list of 16 (!) potential components.

You’ll quickly see why people assume you’re going to do all your Kubernetes development in the cloud: the project comes with a lot of services.

And it can be overwhelming to learn even a single new piece of software, let alone to pick and choose among 16 components before you even begin.

And that’s really the biggest drawback to using Kubernetes in general. It adds a lot of moving pieces. This is because Kubernetes was built to live in a world with infinite (cloud-level) computational resources available.

So let’s begin with something simple: building and running machine learning services locally.

For that, we’ll use Ubuntu’s microk8s.

Developing Machine Learning Services Locally with microk8s

By default, Kubernetes is meant to be run across a cluster of machines. Its job is to collect definitions of what services should be running, and then schedule them across the cluster in a semi-intelligent way.

So for development, you need a version of Kubernetes meant to run on a single machine. The best choice for local development, including GPU acceleration, is Ubuntu’s microk8s.

Unlike other platforms, microk8s installs via a snap and runs natively. It keeps both Docker and Kubernetes namespaced and isolated from the rest of your environment.

For example, to bring up the microk8s cluster and build an image:

$ microk8s.start
Started.
$ cd ~/Development/firefox-splinter-docker
$ microk8s.docker build -t firefox-splinter:latest .

This builds an image from the firefox-splinter-docker directory, tags it firefox-splinter:latest, and makes it available to Kubernetes.

You can then define a Kubernetes Deployment and Service for it in a YAML file, and run it. Notice how the image name in the Deployment matches the tag we just built:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: scraperapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: scraperapp
  template:
    metadata:
      labels:
        app: scraperapp
      annotations:
        ad.datadoghq.com/scraperapp.logs: '[{"source": "python", "service": "scraper-service"}]'
    spec:
      containers:
      - name: scraperapp
        image: firefox-splinter:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5005
        volumeMounts:
        - name: videos
          mountPath: /downloads
        env:
        - name: DD_AGENT_HOST
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: DOGSTATSD_HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: DD_LOGS_INJECTION
          value: 'true'
        - name: DATADOG_SERVICE_NAME
          value: 'scraper-service'
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-user
              key: token
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-password
              key: token
      volumes:
        - hostPath:
            path: /media/stankley/Samsung_T5/downloaded-videos
          name: videos
---
apiVersion: v1
kind: Service
metadata:
  name: scraperapp
spec:
  selector:
    app: scraperapp
  ports:
  - name: http
    protocol: TCP
    port: 5005
  type: NodePort

Now, there are a few things going on in the YAML defined above. We’ve defined the ports that need to be opened on the host (or Node), mounted a volume to keep files on, defined secrets to pull from, and added Datadog environment variables for observability.

(I work at Datadog, a platform to monitor systems and their health. I’ve added these to help with the development workflow. Datadog gives me logs, traces, and metrics on the software as I build it locally. We’ll get more into that later.)

One thing to point out is the valueFrom: secretKeyRef. In Kubernetes, you can define secrets. In this case, we want a username and password available to our application, but we don’t want them stored in our code, where we could accidentally commit them to source control.

Kubernetes lets you define named secrets like this:

$ microk8s.kubectl create secret generic postgres-user --from-literal=token=<POSTGRES_USER>

This way, your entire cluster has the secrets available, and you can just pull directly from them.
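
Inside the Pod, those secrets show up as ordinary environment variables, thanks to the valueFrom: secretKeyRef entries in the Deployment above. A rough sketch of how the scraper code might read them (the database name here is only a placeholder):

import os

# Injected by Kubernetes from the postgres-user and postgres-password secrets.
postgres_user = os.environ["POSTGRES_USER"]
postgres_password = os.environ["POSTGRES_PASSWORD"]

# The host comes from the postgres Service; "videos" is a placeholder database name.
postgres_host = os.environ.get("POSTGRES_SERVICE_HOST", "postgres")
database_url = f"postgresql://{postgres_user}:{postgres_password}@{postgres_host}:5432/videos"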

Deploying Applications in Kubernetes

Now that we’ve seen that each of our services is both a container image and a Kubernetes definition, we can look at what the development process actually looks like.

I mentioned using Datadog, but you can also use a free, open source alternative like Prometheus to get insights into your cluster. Any sort of observability becomes crucial as we move from one service to many, as in a Kubernetes cluster. Otherwise, there quickly become too many moving pieces to model the system in your head.

So, from scratch, the first thing we do is check out our two services from GitHub, and bring up our databases and observability systems. Next, we build the images for our containers, and then apply their YAML files:

$ microk8s.kubectl apply -f postgres_deploy.yaml
$ microk8s.kubectl apply -f datadog-agent.yaml
$ microk8s.kubectl apply -f scraper_service.yaml
$ microk8s.kubectl apply -f node-exporter-daemonset.yaml
$ microk8s.kubectl apply -f inference_service.yaml

Let’s see what our containers look like after applying those services:

$ microk8s.kubectl get pods
NAME                            READY   STATUS                       RESTARTS   AGE
datadog-agent-wg8pq             0/1     CreateContainerConfigError   0          19s
inferenceapp-577474547b-fqdw9   0/1     ImagePullBackOff             0          31s
postgres-5f857bc8d4-9h7gb       0/1     CreateContainerConfigError   0          13s
scraperapp-6dbc864566-dqmf8     0/1     ErrImagePull                 0          8s

Oh no! It looks like Kubernetes can’t pull the images because we haven’t built them yet. Let’s do that now. And it seems the Datadog Agent and PostgreSQL containers need our secrets set:

$ cd ffmpeg-pytorch/
$ microk8s.docker build -t ffmpegpytorch:latest .
$ cd ../firefox-splinter
$ microk8s.docker build -t firefox-splinter:latest .
$ microk8s.kubectl create secret generic postgres-user --from-literal=token=<POSTGRES_USER>
$ microk8s.kubectl create secret generic postgres-password --from-literal=token=<POSTGRES_PASS>
$ microk8s.kubectl create secret generic datadog-api --from-literal=token=<DATADOG_API_KEY>

Both builds will take a while on the first run. But luckily, Docker caches the layer produced by each line in your Dockerfile, so if you make edits towards the end of the Dockerfile, the already-built earlier layers get reused.

Let’s see if our services now come up. We should see everything we’ve applied:

$ microk8s.kubectl get services
NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
inferenceapp   NodePort    10.152.183.139   <none>        5007:30241/TCP   43d
kubernetes     ClusterIP   10.152.183.1     <none>        443/TCP          63d
postgres       ClusterIP   10.152.183.68    <none>        5432/TCP         39d
scraperapp     NodePort    10.152.183.141   <none>        5005:32301/TCP   62d

We’re still missing the NVIDIA statistics exporter service. That’s because we haven’t labeled our node as being a GPU instance. So let’s do that now:

$ microk8s.kubectl get nodes
NAME       STATUS   ROLES    AGE     VERSION
stankley   Ready    <none>   4h40m   v1.13.4
$ microk8s.kubectl label nodes stankley hardware-type=NVIDIAGPU
$ microk8s.kubectl get pods
NAME                            READY   STATUS              RESTARTS   AGE
datadog-agent-wg8pq             1/1     Running             0          4h30m
inferenceapp-577474547b-fqdw9   1/1     Running             0          4h30m
node-exporter-rcvv8             0/2     ContainerCreating   0          6s
postgres-5f857bc8d4-9h7gb       1/1     Running             0          4h30m
scraperapp-6dbc864566-dqmf8     1/1     Running             0          4h29m

Great! Now we’ve got our full cluster up and running. Let’s look at the services we’ve defined:

$ microk8s.kubectl get services
NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
inferenceapp   NodePort    10.152.183.183   <none>        5007:31718/TCP   4h33m
kubernetes     ClusterIP   10.152.183.1     <none>        443/TCP          4h44m
postgres       ClusterIP   10.152.183.159   <none>        5432/TCP         4h33m
scraperapp     NodePort    10.152.183.232   <none>        5005:30638/TCP   4h33m

Because of our defined ports, we can now visit http://10.152.183.183:5007 and http://10.152.183.232:5005, and see our running services.

Out of the box, the way we get our services to talk to one another is via environment variables set within our running services. Remember that our services themselves are running in microk8s’ Docker.

Let’s try connecting to a running pod, and see what environment variables are set.

$ microk8s.docker ps | grep scraperapp
3fc81e553d46        361bdaeac05e                       "ddtrace-run flask r…"   4 hours ago         Up 4 hours                              k8s_scraperapp_scraperapp-6dbc864566-dqmf8_default_ddcc57a1-4d79-11e9-9986-74d435e3d1c3_0
d0cd931d00da        k8s.gcr.io/pause:3.1               "/pause"                 5 hours ago         Up 5 hours                              k8s_POD_scraperapp-6dbc864566-dqmf8_default_ddcc57a1-4d79-11e9-9986-74d435e3d1c3_
$ microk8s.docker exec -it 3fc81e553d46 /bin/bash
root@scraperapp-6dbc864566-dqmf8:/#  env | grep INFERENCE
INFERENCEAPP_PORT_5007_TCP_ADDR=10.152.183.183
INFERENCEAPP_PORT=tcp://10.152.183.183:5007
INFERENCEAPP_SERVICE_PORT=5007
INFERENCEAPP_PORT_5007_TCP_PORT=5007
INFERENCEAPP_SERVICE_HOST=10.152.183.183
INFERENCEAPP_PORT_5007_TCP=tcp://10.152.183.183:5007
INFERENCEAPP_PORT_5007_TCP_PROTO=tcp
INFERENCEAPP_SERVICE_PORT_HTTP=5007

Great! We can see that Kubernetes has added environment variables pointing to the IP address of the Inference service. So within our web application, we can read those variables to talk to the Pods running behind that Service.

In Python, the code to grab a URL for the Inference server might look like this:

import os

inference_url = f"http://{os.environ['INFERENCEAPP_SERVICE_HOST']}:{os.environ['INFERENCEAPP_SERVICE_PORT_HTTP']}"
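
From there, calling the inference service is just an ordinary HTTP request. The /detect route and payload below are made up for illustration; the real endpoints are whatever the inference service defines:

import requests

# Hypothetical route and payload; adjust to the inference service's actual API.
with open("frame.jpg", "rb") as frame:
    response = requests.post(f"{inference_url}/detect", files={"image": frame})

detections = response.json()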

But now that we know how to connect our pods, how do we go about actually doing development? How do we “push” code changes into our Kubernetes cluster?

Pushing Code Changes By Building New Containers

Back when we first spun up our cluster, Kubernetes couldn’t find our Docker images, so it threw an error and couldn’t start our Pods.

Once we created an image with the proper tag, Kubernetes noticed it, picked it up, and deployed it.

For local development, we can rely on the imagePullPolicy being set to IfNotPresent: since we build images directly into microk8s’ Docker, a newly started Pod uses the freshly built local image rather than pulling from a registry.

We then simply do a Docker build of our image with the new code, followed by deleting the existing pod, so the Deployment replaces it with one running the new image.

$ microk8s.docker build -t ffmpegpytorch:latest .
$ microk8s.kubectl get pods
NAME                            READY   STATUS             RESTARTS   AGE
datadog-agent-wg8pq             1/1     Running            0          5h30m
inferenceapp-577474547b-fqdw9   1/1     Running            0          5h30m
node-exporter-rcvv8             2/2     Running            16         60m
postgres-5f857bc8d4-9h7gb       1/1     Running            0          5h30m
scraperapp-6dbc864566-dqmf8     1/1     Running            0          5h30m
$ microk8s.kubectl delete pod inferenceapp-577474547b-fqdw9
pod "inferenceapp-577474547b-fqdw9" deleted
$ microk8s.kubectl get pods | grep inference
inferenceapp-577474547b-k5wdp   1/1     Running            0          38s

We should now be able to hit our service URL and see our code changes live. But building new images adds latency to the development cycle, so there are projects like Telepresence that let you connect to a running cluster and run your local code as part of it.

Thinking Through Machine Learning Architectures

So we’ve seen a bunch of tools, and so far, they’ve only seemed to create more questions and complexity than they answer.

And that’s largely the experience of working with Kubernetes for now. As you start building things, you add more components to abstract away ideas or to add features.

For example, we’ve seen how to create and deploy traditional APIs, but what about graphs of computation? In our Recreating “The Clock” post, we suggested first running video inference to detect where clocks appear, followed by detecting the actual times shown on those clocks.

How do we go about coordinating doing one thing, and then another?

For that, we can use something like Kubeflow’s Pipelines, which let you build a graph of everything you want to happen to your data.

And inside of Kubeflow, there are a few tools to just deploy a trained model to Kubernetes and get back an API. So we wouldn’t even need to think about building APIs; we’d just send an image and receive an inference back.

Seldon Core takes this a step further, and gives you A/B testing, along with outlier detection, so you can do more analytics on the results of your inferences.

The idea of trained models just doing inference on images is very similar to lambdas, or serverless functions. This sort of workflow makes a lot of sense, passing messages from one service to the next, and saving data elsewhere.
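
To make that concrete, an inference “function” in this style can be as small as a single endpoint that loads a trained model once and answers requests. This is only a sketch, with a placeholder model and route, not the actual inference service:

from flask import Flask, jsonify, request

app = Flask(__name__)

def load_model():
    # Placeholder: the real service would load trained YOLO weights once, at startup.
    return lambda image_bytes: [{"label": "clock", "confidence": 0.0}]

model = load_model()

@app.route("/detect", methods=["POST"])
def detect():
    # Receive an image, run inference on it, and hand the detections back as JSON.
    image_bytes = request.files["image"].read()
    return jsonify(model(image_bytes))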

Where to Go From Here

Although this post touched on a lot of different things, I hope it gave you an idea of how much the development process changes within Kubernetes, and what some of the benefits might be.

I like to think of Kubernetes as a platform that moves us from Software As a Service to Software As a Utility. We’re moving from the software equivalent of a burger stand to an electrical utility for a county. The complexity tradeoffs rise substantially, but so does the potential for building things on top of your software.

Remember, the repos to build this Kubernetes cluster are both available on GitHub. Feel free to create an issue if something comes up for you, or if you have a question.

In the next post, we’ll build the final part of our Deep Learning service, and try to detect clock times based upon clock images. If you’re interested in following along, I suggest subscribing below.

If you’re still learning Python and Pygame, or you want a visual introduction to programming, check out my book, Make Art with Python. The first three chapters are free and online here.

Finally, feel free to share this post with your friends. It helps me continue making these tutorials.
