Kubeflow Intro — Machine Learning on Kubernetes

Daniel Chernenkov
5 min read · Dec 14, 2021


If you’ve been following the tech news, it’s a little hard to miss the buzz around Kubernetes and Kubeflow.
In this short guide, we are going to discuss some of Kubernetes’s and Kubeflow’s basic concepts, what sort of problems they solve, and how you can use them in your development endeavors.

Bigabid relies on huge datasets with multiple structures that grow every day, and this growth has become a bottleneck for training our machine learning models.
As the data grows, so does the time it takes to build our models. As part of our solution, we started looking for a new way to distribute the building blocks of our machine learning models, one that would let us build highly available, scalable software that’s also easy to manage and update.

Kubeflow: Machine Learning on Kubernetes — YouTube

Introduction to Kubeflow — YouTube

What is Kubernetes

Kubernetes, also known as K8s, is a portable, extensible, open-source platform for managing containerized workloads and services. It automates the deployment, scaling, and management of containerized applications and has a large, continually growing ecosystem. Google open-sourced the project in 2014, and it quickly became one of the most promising projects in the recent wave of progress in the DevOps scene.
To fully understand its utility, let’s take a short trip down memory lane to the early days of deployment.

In the traditional deployment age, organizations ran applications on physical servers, which made it impossible to define resource boundaries. When multiple apps ran on one physical server, one of them could take up most of the resources and cause the others to perform poorly. One solution was to run each application on a separate physical server, but that generated huge costs.

Then, in the virtualized deployment age, came a solution that allowed running multiple Virtual Machines on a single physical server’s CPU. Virtualization isolated applications in separate Virtual Machines, so one application could not freely access another application’s information. This led to better use of resources, better scalability, and reduced hardware costs.

Then came the container deployment age. Containers resemble VMs, but they share the host Operating System among applications, which makes them lightweight. Each container has its own memory, file system, and CPU share, and is portable across OS distributions and clouds. Containers also bring a wide array of extra benefits:

  • They allow continuous development, integration, and deployment;
  • They allow agile application creation and deployment;
  • They isolate resources and improve efficiency;
  • They allow breaking down applications into independent pieces that can be managed and deployed dynamically;
  • They allow cloud and OS distribution portability.

Kubernetes provides developers with a framework that enables them to run distributed systems smoothly, handling the scaling and failover, providing deployment patterns, and streamlining the process from start to finish. Some of the areas where Kubernetes is extremely useful are:

  • Automating rollouts and rollbacks;
  • Orchestrating the storage;
  • Load balancing and service discovery;
  • Automating bin packing;
  • Self-healing containers;
  • Secret and configuration management.
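
To make a few of these items more concrete, here is a minimal sketch that builds a Kubernetes Deployment manifest as a plain Python dictionary. The name `demo-api`, the image `example.com/demo-api:1.0`, and the resource numbers are hypothetical placeholders, not values from any real project. The comments mark which fields relate to self-healing, automated rollouts, and bin packing.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 3) -> dict:
    """Return a Kubernetes Deployment manifest as a plain dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Self-healing: the controller keeps this many pods running.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Automated rollouts: pods are replaced gradually on update.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 1, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Bin packing: the scheduler uses the requests
                            # to decide which node has room for the pod.
                            "resources": {
                                "requests": {"cpu": "500m", "memory": "256Mi"},
                                "limits": {"cpu": "1", "memory": "512Mi"},
                            },
                        }
                    ]
                },
            },
        },
    }


# Hypothetical application name and image, used only for illustration.
deployment = build_deployment("demo-api", "example.com/demo-api:1.0")
print(json.dumps(deployment, indent=2))
```

Since `kubectl apply -f` accepts JSON as well as YAML, the printed output could be saved to a file and applied to a cluster, assuming the image actually exists in a registry the cluster can reach.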

What is Kubeflow and what is it used for

Kubeflow was initially created to provide a more straightforward way to run TensorFlow jobs on Kubernetes, and it was based on a TensorFlow Extended pipeline. It was then extended to support various architectures and clouds so it could be used as a machine learning pipeline framework. In short, it’s the machine learning toolkit for Kubernetes.

Kubeflow was launched as a response to two huge IT trends that were beginning to earn the spotlight: cloud-native architectures and data science/machine learning. Kubeflow runs on Kubernetes clusters locally or in the cloud, harnessing the power of training machine learning models on multiple devices and speeding up the process.

Kubeflow’s tools can help a developer or engineer build machine learning models, analyze their performance, tune hyperparameters, deploy models to production, and manage compute power.

Kubeflow is built around three main principles that simplify machine learning: composability, scalability, and portability. Of course, developers can build containerized machine learning pipelines on Kubernetes without Kubeflow, but Kubeflow helps standardize the process and make it more efficient. It also makes it easier to configure workloads to use hardware accelerators without changing the model code.
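
As a rough illustration of the hardware accelerator point, the sketch below builds a plain Kubernetes Pod spec for a training job and adds a GPU only through configuration. This is ordinary Kubernetes rather than a Kubeflow-specific API, and it assumes the cluster has the NVIDIA device plugin installed so that the `nvidia.com/gpu` resource is available. The pod name and image are hypothetical.

```python
import json


def training_pod(name: str, image: str, gpus: int = 0) -> dict:
    """Return a Pod manifest for a one-off training job."""
    container = {"name": "trainer", "image": image}
    if gpus > 0:
        # The accelerator is requested in the manifest; the training
        # code inside the image does not need to change.
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"restartPolicy": "Never", "containers": [container]},
    }


# Same hypothetical image, once on CPU only and once with a single GPU.
cpu_pod = training_pod("train-cpu", "example.com/trainer:1.0")
gpu_pod = training_pod("train-gpu", "example.com/trainer:1.0", gpus=1)
print(json.dumps(gpu_pod, indent=2))
```

The two manifests differ only in the `resources` block, which is the sense in which the accelerator is a configuration choice rather than a code change.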

Kubeflow can serve as a multi-cloud framework, and it covers monitoring tools, workflow management, documentation, and model deployment.

Kubeflow integrates TensorBoard, which provides tools for visualizing the training process and spotting problems early. Being able to monitor training lets developers and engineers refine the model’s parameters in real time, save resources, and shorten build time through rapid iteration.

Image source: What is Kubeflow? Machine Learning Basics with Kubeflow — BMC Software | Blogs

There are several ways to deploy a model via Kubeflow, starting with the KFServing tool, which supports multiple machine learning frameworks such as PyTorch, scikit-learn, and TensorFlow.
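
For a sense of what a KFServing deployment looks like, here is a hedged sketch that builds an `InferenceService` manifest in Python. The field layout follows the `serving.kubeflow.org/v1beta1` API as I understand it, so check it against the version installed in your cluster. The service name and the `gs://example-bucket/...` storage path are hypothetical placeholders.

```python
import json

# Predictor keys for the three frameworks mentioned above.
SUPPORTED_FRAMEWORKS = {"pytorch", "sklearn", "tensorflow"}


def inference_service(name: str, framework: str, storage_uri: str) -> dict:
    """Return a KFServing InferenceService manifest as a dictionary."""
    if framework not in SUPPORTED_FRAMEWORKS:
        raise ValueError(f"unsupported framework: {framework}")
    return {
        "apiVersion": "serving.kubeflow.org/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                # The key selects the model server; storageUri points
                # to where the trained model artifact is stored.
                framework: {"storageUri": storage_uri}
            }
        },
    }


# Hypothetical model location, used only for illustration.
service = inference_service(
    "demo-sklearn", "sklearn", "gs://example-bucket/models/demo"
)
print(json.dumps(service, indent=2))
```

Switching frameworks only changes the predictor key, which is what makes one tool usable across PyTorch, scikit-learn, and TensorFlow models.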

How Kubeflow elevates an ML workflow

Constructing, training and deploying ML systems is an iterative process consisting of several stages. One needs to evaluate the output of various stages of the ML workflow and apply changes to the model and parameters when necessary. The diagram below showcases how Kubeflow contributes to each stage:

The next diagram shows an actual example of a specific ML workflow that can be used to train and serve a model trained on the MNIST dataset:


In the race to build highly available, scalable software that’s also easy to manage and update, containers are quickly becoming a viable option for more and more companies across many different types of industries.
As more organizations look toward container-driven infrastructure, open-source tools like Kubernetes (K8s) have rapidly become industry standards for automating containerized applications’ deployment, management, and scaling.

This surge of interest in K8s has spawned newer platforms like Kubeflow, which aims to make working with K8s even easier by abstracting most of the work away from users and letting them focus on their data science projects.



