Thursday, February 11, 2016
ShareThis: Kubernetes In Production
Today’s guest blog post is by Juan Valencia, Technical Lead at ShareThis, a service that helps website publishers drive engagement and consumer sharing behavior across social networks.
ShareThis has grown tremendously since its first days as a tiny widget that allowed you to share to your favorite social services. It now serves over 4.5 million domains per month, helping publishers create a more authentic digital experience.
Fast growth came with a price. We leveraged technical debt to scale fast and to grow our products, particularly when it came to infrastructure. As our company expanded, the infrastructure costs mounted as well - both in terms of inefficient utilization and in terms of people costs. About 1 year ago, it became clear something needed to change.
TL;DRKubernetes has been a key component for us to reduce technical debt in our infrastructure by:
- Fostering the Adoption of Docker
- Simplifying Container Management
- Onboarding Developers On Infrastructure
- Unlocking Continuous Integration and Delivery
We accomplished this by radically adopting Kubernetes and switching our DevOps team to a Cloud Platform team that worked in terms of containers and microservices. This included creating some tools to get around our own legacy debt.
The Problem
Alas, the cloud was new and we were young. We started with a traditional data-center mindset. We managed all of our own services: MySQL, Cassandra, Aerospike, Memcache, you name it. We set up VM’s just like you would traditional servers, installed our applications on them, and managed them in Nagios or Ganglia.
Unfortunately, this way of thinking was antithetical to a cloud-centric approach. Instead of thinking in terms of services, we were thinking in terms of servers. Instead of using modern cloud approaches such as autoscaling, microservices, or even managed VM’s, we were thinking in terms of scripted setups, server deployments, and avoiding vendor lock-in.
These ways of thinking were not bad per se, they were simply inefficient. They weren’t taking advantage of the changes to the cloud that were happening very quickly. It also meant that when changes needed to take place, we were treating those changes as big slow changes to a datacenter, rather than small fast changes to the cloud.
The Solution
Kubernetes As A Tool To Foster Docker Adoption
As Docker became more of a force in our industry, engineers at ShareThis also started experimenting with it to good effect. It soon became obvious that we needed to have a working container for every app in our company just so we could simplify testing in our development environment.
Some apps moved quickly into Docker because they were simple and had few dependencies. For those that had small dependencies, we were able to manage using Fig (Fig was the original name of Docker Compose). Still, many of our data pipelines or interdependent apps were too gnarly to be directly dockerized. We still wanted to do it, but Docker was not enough.
In late 2015, we were frustrated enough with our legacy infrastructure that we finally bit the bullet. We evaluated Docker’s tools, ECS, Kubernetes, and Mesosphere. It was quickly obvious that Kubernetes was in a more stable and user friendly state than its competitors for our infrastructure. As a company, we could solidify our infrastructure on Docker by simply setting the goal of having all of our infrastructure on Kubernetes.
Engineers were skeptical at first. However, once they saw applications scale effortlessly into hundreds of instances per application, they were hooked. Now, not only was there the pain points driving us forward into Docker and by extension Kubernetes, but there was genuine excitement for the technology pulling us in. This has allowed us to make an incredibly difficult migration fairly quickly. We now run Kubernetes in multiple regions on about 65 large VMs and increasing to over 100 in the next couple months. Our Kubernetes cluster currently processes 800 million requests per day with the plan to process over 2 billion requests per day in the coming months.
Kubernetes As A Tool To Manage Containers
Our earliest use of Docker was promising for development, but not so much so for production. The biggest friction point was the inability to manage Docker components at scale. Knowing which containers were running where, what version of a deployment was running, what state an app was in, how to manage subnets and VPCs, etc, plagued any chance of it going to production. The tooling required would have been substantial.
When you look at Kubernetes, there are several key features that were immediately attractive:
- It is easy to install on AWS (where all our apps were running)
- There is a direct path from a Dockerfile to a replication controller through a yaml/json file
- Pods are able to scale in number easily
- We can easily scale the number of VM’s running on AWS in a Kubernetes cluster
- Rolling deployments and rollback are built into the tooling
- Each pod gets monitored through health checks
- Service endpoints are managed by the tool
- There is an active and vibrant community
Unfortunately, one of the biggest pain points was that the tooling didn’t solve our existing legacy infrastructure, it just provided an infrastructure to move onto. There were still a variety of network quirks which disallowed us from directly moving our applications onto a new VPC. In addition, the reworking of so many applications required developers to jump onto problems that have classically been solved by sys admins and operations teams.
Kubernetes As A Tool For Onboarding Developers On Infrastructure
When we decided to make the switch from what was essentially a Chef-run setup to Kubernetes, I do not think we understood all of the pain points that we would hit. We ran our servers in a variety of different ways in a variety of different network configurations that were considerably different than the clean setup that you find on a fresh Kubernetes VPC.
In production we ran in both AWS VPCs and AWS classic across multiple regions. This means that we managed several subnets with different access controls across different applications. Our most recent applications were also very secure, having no public endpoints. This meant that we had a combination of VPC peering, network address translation (NAT), and proxies running in varied configurations.
In the Kubernetes world, there’s only the VPC. All the pods can theoretically talk to each other, and services endpoints are explicitly defined. It’s easy for the developer to gloss over some of the details and it removes the need for operations (mostly).
We made the decision to convert all of our infrastructure / DevOps developers into application developers (really!). We had already started hiring them on the basis of their development skills rather than their operational skills anyway, so perhaps that is not as wild as it sounds.
We then made the decision to onboard our entire engineering organization onto Operations. Developers are flexible, they enjoy challenges, and they enjoy learning. It was remarkable. After 1 month, our organization went from having a few DevOps folks, to having every engineer capable of modifying our architecture.
The training ground for onboarding on networking, productionization, problem solving, root cause analysis, etc, was getting Kubernetes into prod at scale. After the first month, I was biting my nails and worrying about our choices. After 2 months, it looked like it might some day be viable. After 3 months, we were deploying 10 times per week. After 4 months, 40 apps per week. Only 30% of our apps have been migrated, yet the gains are not only remarkable, they are astounding. Kubernetes allowed us to go from an infrastructure-is-slowing-us-down-ugh! organization, to an infrastructure-is-speeding-us-up-yay! organization.
Kubernetes As A Means To Unlock Continuous Integration And Delivery
How did we get to 40+ deployments per week? Put simply, continuous integration and deployment (CI/CD) came as a byproduct of our migration. Our first application in Kubernetes was Jenkins, and every app that went in also was added to Jenkins. As we moved forward, we made Jenkins more automatic until pods were being added and taken from Kubernetes faster than we could keep track.
Interestingly, our problems with scaling are now about wanting to push out too many changes at once and people having to wait until their turn. Our goal is to get 100 deployments per week through the new infrastructure. This is achievable if we can continue to execute on our migration and on our commitment to a CI/CD process on Kubernetes and Jenkins.
Next Steps
We need to finish our migration. At this point the problems are mostly solved, the biggest difficulties are in the tedium of the task at hand. To move things out of our legacy infrastructure meant changing the network configurations to allow access to and from the Kubernetes VPC and across the regions. This is still a very real pain, and one we continue to address.
Some services do not play well in Kubernetes – think stateful distributed databases. Luckily, we can usually migrate those to a 3rd party who will manage it for us. At the end of this migration, we will only be running pods on Kubernetes. Our infrastructure will become much simpler.
All these changes do not come for free; committing our entire infrastructure to Kubernetes means that we need to have Kubernetes experts. Our team has been unblocked in terms of infrastructure and they are busy adding business value through application development (as they should). However, we do not (yet) have committed engineers to stay up to date with changes to Kubernetes and cloud computing.
As such, we have transferred one engineer to a new “cloud platform team” and will hire a couple of others (have I mentioned we’re hiring!). They will be responsible for developing tools that we can use to interface well with Kubernetes and manage all of our cloud resources. In addition, they will be working in the Kubernetes source code, part of Kubernetes SIGs, and ideally, pushing code into the open source project.
Summary
All in all, while the move to Kubernetes initially seemed daunting, it was far less complicated and disruptive than we thought. And the reward at the other end was a company that could respond as fast as our customers wanted.Editor’s note: at a recent Kubernetes meetup, the team at ShareThis gave a talk about their production use of Kubernetes. Video is embedded below.
- Introducing kustomize; Template-free Configuration Customization for Kubernetes May 29
- Getting to Know Kubevirt May 22
- Gardener - The Kubernetes Botanist May 17
- Docs are Migrating from Jekyll to Hugo May 5
- Announcing Kubeflow 0.1 May 4
- Current State of Policy in Kubernetes May 2
- Developing on Kubernetes May 1
- Zero-downtime Deployment in Kubernetes with Jenkins Apr 30
- Kubernetes Community - Top of the Open Source Charts in 2017 Apr 25
- Local Persistent Volumes for Kubernetes Goes Beta Apr 13
- Container Storage Interface (CSI) for Kubernetes Goes Beta Apr 10
- Fixing the Subpath Volume Vulnerability in Kubernetes Apr 4
- Principles of Container-based Application Design Mar 15
- Expanding User Support with Office Hours Mar 14
- How to Integrate RollingUpdate Strategy for TPR in Kubernetes Mar 13
- Apache Spark 2.3 with Native Kubernetes Support Mar 6
- Kubernetes: First Beta Version of Kubernetes 1.10 is Here Mar 2
- Reporting Errors from Control Plane to Applications Using Kubernetes Events Jan 25
- Core Workloads API GA Jan 15
- Introducing client-go version 6 Jan 12
- Extensible Admission is Beta Jan 11
- Introducing Container Storage Interface (CSI) Alpha for Kubernetes Jan 10
- Kubernetes v1.9 releases beta support for Windows Server Containers Jan 9
- Five Days of Kubernetes 1.9 Jan 8
- Introducing Kubeflow - A Composable, Portable, Scalable ML Stack Built for Kubernetes Dec 21
- Kubernetes 1.9: Apps Workloads GA and Expanded Ecosystem Dec 15
- Using eBPF in Kubernetes Dec 7
- PaddlePaddle Fluid: Elastic Deep Learning on Kubernetes Dec 6
- Autoscaling in Kubernetes Nov 17
- Certified Kubernetes Conformance Program: Launch Celebration Round Up Nov 16
- Kubernetes is Still Hard (for Developers) Nov 15
- Securing Software Supply Chain with Grafeas Nov 3
- Containerd Brings More Container Runtime Options for Kubernetes Nov 2
- Kubernetes the Easy Way Nov 1
- Enforcing Network Policies in Kubernetes Oct 30
- Using RBAC, Generally Available in Kubernetes v1.8 Oct 28
- It Takes a Village to Raise a Kubernetes Oct 26
- kubeadm v1.8 Released: Introducing Easy Upgrades for Kubernetes Clusters Oct 25
- Five Days of Kubernetes 1.8 Oct 24
- Introducing Software Certification for Kubernetes Oct 19
- Request Routing and Policy Management with the Istio Service Mesh Oct 10
- Kubernetes Community Steering Committee Election Results Oct 5
- Kubernetes 1.8: Security, Workloads and Feature Depth Sep 29
- Kubernetes StatefulSets & DaemonSets Updates Sep 27
- Introducing the Resource Management Working Group Sep 21
- Windows Networking at Parity with Linux for Kubernetes Sep 8
- Kubernetes Meets High-Performance Computing Aug 22
- High Performance Networking with EC2 Virtual Private Clouds Aug 11
- Kompose Helps Developers Move Docker Compose Files to Kubernetes Aug 10
- Happy Second Birthday: A Kubernetes Retrospective Jul 28
- How Watson Health Cloud Deploys Applications with Kubernetes Jul 14
- Kubernetes 1.7: Security Hardening, Stateful Application Updates and Extensibility Jun 30
- Draft: Kubernetes container development made easy May 31
- Managing microservices with the Istio service mesh May 31
- Kubespray Ansible Playbooks foster Collaborative Kubernetes Ops May 19
- Kubernetes: a monitoring guide May 19
- Dancing at the Lip of a Volcano: The Kubernetes Security Process - Explained May 18
- How Bitmovin is Doing Multi-Stage Canary Deployments with Kubernetes in the Cloud and On-Prem Apr 21
- RBAC Support in Kubernetes Apr 6
- Configuring Private DNS Zones and Upstream Nameservers in Kubernetes Apr 4
- Advanced Scheduling in Kubernetes Mar 31
- Scalability updates in Kubernetes 1.6: 5,000 node and 150,000 pod clusters Mar 30
- Five Days of Kubernetes 1.6 Mar 29
- Dynamic Provisioning and Storage Classes in Kubernetes Mar 29
- Kubernetes 1.6: Multi-user, Multi-workloads at Scale Mar 28
- The K8sPort: Engaging Kubernetes Community One Activity at a Time Mar 24
- Deploying PostgreSQL Clusters using StatefulSets Feb 24
- Containers as a Service, the foundation for next generation PaaS Feb 21
- Inside JD.com's Shift to Kubernetes from OpenStack Feb 10
- Run Deep Learning with PaddlePaddle on Kubernetes Feb 8
- Highly Available Kubernetes Clusters Feb 2
- Running MongoDB on Kubernetes with StatefulSets Jan 30
- Fission: Serverless Functions as a Service for Kubernetes Jan 30
- How we run Kubernetes in Kubernetes aka Kubeception Jan 20
- Scaling Kubernetes deployments with Policy-Based Networking Jan 19
- A Stronger Foundation for Creating and Managing Kubernetes Clusters Jan 12
- Kubernetes UX Survey Infographic Jan 9
- Kubernetes supports OpenAPI Dec 23
- Cluster Federation in Kubernetes 1.5 Dec 22
- Windows Server Support Comes to Kubernetes Dec 21
- StatefulSet: Run and Scale Stateful Applications Easily in Kubernetes Dec 20
- Introducing Container Runtime Interface (CRI) in Kubernetes Dec 19
- Five Days of Kubernetes 1.5 Dec 19
- Kubernetes 1.5: Supporting Production Workloads Dec 13
- From Network Policies to Security Policies Dec 8
- Kompose: a tool to go from Docker-compose to Kubernetes Nov 22
- Kubernetes Containers Logging and Monitoring with Sematext Nov 18
- Visualize Kubelet Performance with Node Dashboard Nov 17
- CNCF Partners With The Linux Foundation To Launch New Kubernetes Certification, Training and Managed Service Provider Program Nov 8
- Modernizing the Skytap Cloud Micro-Service Architecture with Kubernetes Nov 7
- Bringing Kubernetes Support to Azure Container Service Nov 7
- Tail Kubernetes with Stern Oct 31
- Introducing Kubernetes Service Partners program and a redesigned Partners page Oct 31
- How We Architected and Run Kubernetes on OpenStack at Scale at Yahoo! JAPAN Oct 24
- Building Globally Distributed Services using Kubernetes Cluster Federation Oct 14
- Helm Charts: making it simple to package and deploy common applications on Kubernetes Oct 10
- Dynamic Provisioning and Storage Classes in Kubernetes Oct 7
- How we improved Kubernetes Dashboard UI in 1.4 for your production needs Oct 3
- How we made Kubernetes insanely easy to install Sep 28
- How Qbox Saved 50% per Month on AWS Bills Using Kubernetes and Supergiant Sep 27
- Kubernetes 1.4: Making it easy to run on Kubernetes anywhere Sep 26
- High performance network policies in Kubernetes clusters Sep 21
- Creating a PostgreSQL Cluster using Helm Sep 9
- Deploying to Multiple Kubernetes Clusters with kit Sep 6
- Cloud Native Application Interfaces Sep 1
- Security Best Practices for Kubernetes Deployment Aug 31
- Scaling Stateful Applications using Kubernetes Pet Sets and FlexVolumes with Datera Elastic Data Fabric Aug 29
- SIG Apps: build apps for and operate them in Kubernetes Aug 16
- Kubernetes Namespaces: use cases and insights Aug 16
- Create a Couchbase cluster using Kubernetes Aug 15
- Challenges of a Remotely Managed, On-Premises, Bare-Metal Kubernetes Cluster Aug 2
- Why OpenStack's embrace of Kubernetes is great for both communities Jul 26
- The Bet on Kubernetes, a Red Hat Perspective Jul 21
- Happy Birthday Kubernetes. Oh, the places you’ll go! Jul 21
- A Very Happy Birthday Kubernetes Jul 21
- Bringing End-to-End Kubernetes Testing to Azure (Part 2) Jul 18
- Steering an Automation Platform at Wercker with Kubernetes Jul 15
- Dashboard - Full Featured Web Interface for Kubernetes Jul 15
- Cross Cluster Services - Achieving Higher Availability for your Kubernetes Applications Jul 14
- Citrix + Kubernetes = A Home Run Jul 14
- Thousand Instances of Cassandra using Kubernetes Pet Set Jul 13
- Stateful Applications in Containers!? Kubernetes 1.3 Says “Yes!” Jul 13
- Kubernetes in Rancher: the further evolution Jul 12
- Autoscaling in Kubernetes Jul 12
- rktnetes brings rkt container engine to Kubernetes Jul 11
- Minikube: easily run Kubernetes locally Jul 11
- Five Days of Kubernetes 1.3 Jul 11
- Updates to Performance and Scalability in Kubernetes 1.3 -- 2,000 node 60,000 pod clusters Jul 7
- Kubernetes 1.3: Bridging Cloud Native and Enterprise Workloads Jul 6
- Container Design Patterns Jun 21
- The Illustrated Children's Guide to Kubernetes Jun 9
- Bringing End-to-End Kubernetes Testing to Azure (Part 1) Jun 6
- Hypernetes: Bringing Security and Multi-tenancy to Kubernetes May 24
- CoreOS Fest 2016: CoreOS and Kubernetes Community meet in Berlin (& San Francisco) May 3
- Introducing the Kubernetes OpenStack Special Interest Group Apr 22
- SIG-UI: the place for building awesome user interfaces for Kubernetes Apr 20
- SIG-ClusterOps: Promote operability and interoperability of Kubernetes clusters Apr 19
- SIG-Networking: Kubernetes Network Policy APIs Coming in 1.3 Apr 18
- How to deploy secure, auditable, and reproducible Kubernetes clusters on AWS Apr 15
- Container survey results - March 2016 Apr 8
- Adding Support for Kubernetes in Rancher Apr 8
- Configuration management with Containers Apr 4
- Using Deployment objects with Kubernetes 1.2 Apr 1
- Kubernetes 1.2 and simplifying advanced networking with Ingress Mar 31
- Using Spark and Zeppelin to process big data on Kubernetes 1.2 Mar 30
- Building highly available applications using Kubernetes new multi-zone clusters (a.k.a. 'Ubernetes Lite') Mar 29
- AppFormix: Helping Enterprises Operationalize Kubernetes Mar 29
- How container metadata changes your point of view Mar 28
- Five Days of Kubernetes 1.2 Mar 28
- 1000 nodes and beyond: updates to Kubernetes performance and scalability in 1.2 Mar 28
- Scaling neural network image classification using Kubernetes with TensorFlow Serving Mar 23
- Kubernetes 1.2: Even more performance upgrades, plus easier application deployment and management Mar 17
- Kubernetes in the Enterprise with Fujitsu’s Cloud Load Control Mar 11
- ElasticBox introduces ElasticKube to help manage Kubernetes within the enterprise Mar 11
- State of the Container World, February 2016 Mar 1
- Kubernetes Community Meeting Notes - 20160225 Mar 1
- KubeCon EU 2016: Kubernetes Community in London Feb 24
- Kubernetes Community Meeting Notes - 20160218 Feb 23
- Kubernetes Community Meeting Notes - 20160211 Feb 16
- ShareThis: Kubernetes In Production Feb 11
- Kubernetes Community Meeting Notes - 20160204 Feb 9
- Kubernetes Community Meeting Notes - 20160128 Feb 2
- State of the Container World, January 2016 Feb 1
- Kubernetes Community Meeting Notes - 20160121 Jan 28
- Kubernetes Community Meeting Notes - 20160114 Jan 28
- Why Kubernetes doesn’t use libnetwork Jan 14
- Simple leader election with Kubernetes and Docker Jan 11
- Creating a Raspberry Pi cluster running Kubernetes, the installation (Part 2) Dec 22
- Managing Kubernetes Pods, Services and Replication Controllers with Puppet Dec 17
- How Weave built a multi-deployment solution for Scope using Kubernetes Dec 12
- Creating a Raspberry Pi cluster running Kubernetes, the shopping list (Part 1) Nov 25
- Monitoring Kubernetes with Sysdig Nov 19
- One million requests per second: Dependable and dynamic distributed systems at scale Nov 11
- Kubernetes 1.1 Performance upgrades, improved tooling and a growing community Nov 9
- Kubernetes as Foundation for Cloud Native PaaS Nov 3
- Some things you didn’t know about kubectl Oct 28
- Kubernetes Performance Measurements and Roadmap Sep 10
- Using Kubernetes Namespaces to Manage Environments Aug 28
- Weekly Kubernetes Community Hangout Notes - July 31 2015 Aug 4
- The Growing Kubernetes Ecosystem Jul 24
- Weekly Kubernetes Community Hangout Notes - July 17 2015 Jul 23
- Strong, Simple SSL for Kubernetes Services Jul 14
- Weekly Kubernetes Community Hangout Notes - July 10 2015 Jul 13
- Announcing the First Kubernetes Enterprise Training Course Jul 8
- Kubernetes 1.0 Launch Event at OSCON Jul 2
- How did the Quake demo from DockerCon Work? Jul 2
- The Distributed System ToolKit: Patterns for Composite Containers Jun 29
- Slides: Cluster Management with Kubernetes, talk given at the University of Edinburgh Jun 26
- Cluster Level Logging with Kubernetes Jun 11
- Weekly Kubernetes Community Hangout Notes - May 22 2015 Jun 2
- Kubernetes on OpenStack May 19
- Weekly Kubernetes Community Hangout Notes - May 15 2015 May 18
- Docker and Kubernetes and AppC May 18
- Kubernetes Release: 0.17.0 May 15
- Resource Usage Monitoring in Kubernetes May 12
- Weekly Kubernetes Community Hangout Notes - May 1 2015 May 11
- Kubernetes Release: 0.16.0 May 11
- AppC Support for Kubernetes through RKT May 4
- Weekly Kubernetes Community Hangout Notes - April 24 2015 Apr 30
- Borg: The Predecessor to Kubernetes Apr 23
- Kubernetes and the Mesosphere DCOS Apr 22
- Weekly Kubernetes Community Hangout Notes - April 17 2015 Apr 17
- Kubernetes Release: 0.15.0 Apr 16
- Introducing Kubernetes API Version v1beta3 Apr 16
- Weekly Kubernetes Community Hangout Notes - April 10 2015 Apr 11
- Faster than a speeding Latte Apr 6
- Weekly Kubernetes Community Hangout Notes - April 3 2015 Apr 4
- Paricipate in a Kubernetes User Experience Study Mar 31
- Weekly Kubernetes Community Hangout Notes - March 27 2015 Mar 28
- Kubernetes Gathering Videos Mar 23
- Welcome to the Kubernetes Blog! Mar 20