Thursday, July 07, 2016
Updates to Performance and Scalability in Kubernetes 1.3 -- 2,000 node 60,000 pod clusters
We are proud to announce that with the release of version 1.3, Kubernetes now supports 2000-node clusters with even better end-to-end pod startup time. The latency of our API calls is within our one-second Service Level Objective (SLO), and most of them are even an order of magnitude better than that. It is possible to run larger deployments than a 2,000-node cluster, but performance may be degraded and may not meet our strict SLO.
In this blog post we discuss the detailed performance results from Kubernetes 1.3 and what changes we made from version 1.2 to achieve these results. We also describe Kubemark, a performance testing tool that we’ve integrated into our continuous testing framework to detect performance and scalability regressions.
Evaluation Methodology
We have described our test scenarios in a previous blog post. The biggest change since the 1.2 release is that in our API responsiveness tests we now create and use multiple namespaces. In particular, for the 2000-node/60,000-pod cluster tests we create 8 namespaces. We made this change because we believe that users of such very large clusters are likely to use many namespaces, certainly at least 8 in total.
Metrics from Kubernetes 1.3
So, what is the performance of Kubernetes version 1.3? The following graph shows end-to-end pod startup latency for 2000-node and 1000-node clusters. For comparison, we show the same metric from Kubernetes 1.2 with a 1000-node cluster.
The next graphs show API response latency for a v1.3 2000-node cluster.
How did we achieve these improvements?
The biggest change that we made for scalability in Kubernetes 1.3 was adding an efficient Protocol Buffer-based serialization format to the API as an alternative to JSON. It is primarily intended for communication between Kubernetes control plane components, but all API server clients can use this format. All Kubernetes control plane components now use it for their communication, but the system continues to support JSON for backward compatibility.
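For clients written in Go, opting into the protobuf wire format is just a matter of setting the content-type fields on the client configuration. The sketch below is a minimal illustration using today's client-go API (which postdates the 1.3 release); the kubeconfig path is a placeholder, while "application/vnd.kubernetes.protobuf" is the content type for the format described above, with JSON kept as a fallback.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load a standard kubeconfig (the path is a placeholder).
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}

	// Prefer Protocol Buffers on the wire, falling back to JSON for
	// resources that have no protobuf encoding.
	config.AcceptContentTypes = "application/vnd.kubernetes.protobuf,application/json"
	config.ContentType = "application/vnd.kubernetes.protobuf"

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Any subsequent request, e.g. listing nodes, now uses protobuf.
	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("cluster has %d nodes\n", len(nodes.Items))
}
```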
We didn’t change the format in which we store cluster state in etcd to Protocol Buffers yet, as we’re still working on the upgrade mechanism. But we’re very close to having this ready, and we expect to switch the storage format to Protocol Buffers in Kubernetes 1.4. Our experiments show that this should reduce pod startup end-to-end latency by another 30%.
How do we test Kubernetes at scale?
Spawning clusters with 2000 nodes is expensive and time-consuming. While we need to do this at least once per release to collect real-world performance and scalability data, we also need a lighter-weight mechanism that allows us to quickly evaluate ideas for different performance improvements, and that we can run continuously to detect performance regressions. To address this need we created a tool called “Kubemark.”
What is “Kubemark”?
Kubemark is a performance testing tool which allows users to run experiments on emulated clusters. We use it for measuring performance in large clusters.
A Kubemark cluster consists of two parts: a real master node running the normal master components, and a set of “hollow” nodes. The prefix “hollow” means an implementation of a component with some of its “moving parts” mocked out. The best example is hollow-kubelet, which pretends to be an ordinary Kubelet but doesn’t start any containers or mount any volumes. It just claims that it does, so from the master components’ perspective it behaves like a real Kubelet.
Since we want a Kubemark cluster to be as similar to a real cluster as possible, we use the real Kubelet code with an injected fake Docker client. Similarly, hollow-proxy (the KubeProxy equivalent) reuses the real KubeProxy code with an injected no-op Proxier implementation (to avoid mutating iptables).
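To make the injection idea concrete, here is a purely illustrative sketch (the names are hypothetical, not the actual Kubelet or Kubemark code): the same sync loop runs in a real and a hollow node, and only the injected runtime implementation differs.

```go
package main

import "fmt"

// ContainerRuntime is a hypothetical stand-in for the runtime interface the
// Kubelet code calls into; the real Kubemark injects a fake Docker client
// into the unmodified Kubelet code in the same spirit.
type ContainerRuntime interface {
	StartContainer(name string) error
}

// hollowRuntime only pretends to start containers: it records the request and
// reports success, so the control plane sees what looks like a healthy node.
type hollowRuntime struct {
	started []string
}

func (h *hollowRuntime) StartContainer(name string) error {
	h.started = append(h.started, name)
	return nil
}

// syncPods is a stand-in for the real Kubelet sync loop: the same code runs
// whether the injected runtime is real or hollow.
func syncPods(rt ContainerRuntime, pods []string) {
	for _, p := range pods {
		if err := rt.StartContainer(p); err != nil {
			fmt.Println("failed to start", p, ":", err)
			continue
		}
		fmt.Println("reported", p, "as running")
	}
}

func main() {
	// A hollow-node wires the fake runtime into the real loop.
	syncPods(&hollowRuntime{}, []string{"pod-a", "pod-b"})
}
```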
Thanks to these changes:
- many hollow-nodes can run on a single machine, because they do not modify the environment in which they run
- with no real containers running and no need for a container runtime (e.g., Docker), we can run up to 14 hollow-nodes on a 1-core machine
- yet hollow-nodes generate roughly the same load on the API server as their “whole” counterparts, so they provide a realistic load for performance testing [the only fundamental difference is that we are not simulating any errors that can happen in reality (e.g., failing containers); adding support for this is a potential future extension to the framework]
How do we set up Kubemark clusters?
To create a Kubemark cluster we use the power that Kubernetes itself gives us: we run Kubemark clusters on Kubernetes. Let’s describe this in detail.
In order to create an N-node Kubemark cluster, we:
- create a regular Kubernetes cluster where we can run N hollow-nodes [e.g., to create a 2000-node Kubemark cluster, we create a regular Kubernetes cluster with 22 8-core nodes]
- create a dedicated VM, where we start all master components for our Kubemark cluster (etcd, apiserver, controllers, scheduler, …).
- schedule N “hollow-node” pods on the base Kubernetes cluster. Those hollow-nodes are configured to talk to the Kubemark API server running on the dedicated VM
- finally, we create addon pods (currently just Heapster) by scheduling them on the base cluster and configuring them to talk to the Kubemark API server
Once this is done, you have a usable Kubemark cluster that you can run your (performance) tests on. We have scripts for doing all of this on Google Compute Engine (GCE). For more details, take a look at our guide.
One thing worth mentioning is that while running Kubemark, we are also implicitly testing Kubernetes correctness: your Kubemark cluster will not work correctly if the base Kubernetes cluster under it doesn’t work.
Performance measured in real clusters vs Kubemark
Crucially, the performance of Kubemark clusters is mostly similar to the performance of real clusters. For the pod startup end-to-end latency, as shown in the graph below, the difference is negligible:
For API responsiveness, the differences are larger, though generally less than 2x. However, the trends are exactly the same: an improvement or regression in a real cluster is visible as a similar percentage drop or increase in Kubemark metrics.
Conclusion
We continue to improve the performance and scalability of Kubernetes. In this blog post we:
- showed that the 1.3 release scales to 2000 nodes while meeting our responsiveness SLOs,
- explained the major change we made to improve scalability since the 1.2 release, and
- described Kubemark, our emulation framework that allows us to quickly evaluate the performance impact of code changes, both when experimenting with performance improvement ideas and when detecting regressions as part of our continuous testing infrastructure.
Please join our community and help us build the future of Kubernetes! If you’re particularly interested in scalability, participate by:
- chatting with us on our Slack channel
- joining the scalability Special Interest Group, which meets every Thursday at 9 AM Pacific Time on this SIG-Scale Hangout
For more information about the Kubernetes project, visit kubernetes.io and follow us on Twitter @Kubernetesio.
– Wojciech Tyczynski, Software Engineer, Google