What’s the best way to increase performance and reduce complexity in the next generation of cloud system software? Is it time for a new cloud OS?
The world of cloud system software is going through an evolution right now, with VMs being joined by containers and unikernels as packaging and deployment mechanisms. At the data center management level, OpenStack is starting to see some deployment. Meanwhile, Mesosphere is developing Mesos into a so-called “Data Center Operating System,” while Kubernetes and Docker’s brand-new, eponymously named “Datacenter” both offer “container management.” There are also many small startups in the container management space. None of these efforts (with the possible exception of unikernels) is looking at fundamentally changing the underlying system software architecture. They are all based on layering “management” software that runs as an application on top of the Linux-hypervisor stack.
While both Linux and hypervisors have evolved over the years, neither was designed with current server hardware in mind. Today’s servers are distributed systems in miniature, with tens (soon to be hundreds) of cores connected through memory and I/O buses, sometimes through switches, to memory and NICs, and next-generation NICs will run at 100 Gbps. Soon, nonvolatile memory like Intel’s 3D XPoint will require rethinking the relationship between memory and storage at the system software level too. Ericsson’s planned HDS 8000 series takes this trend to the next level, removing networking from the rack and pooling NICs. Add to that the high-performance, low-latency data center network and hyperconverged storage, and you have a situation where many complex abstractions are needed and, more importantly, are often exposed to the programmer. If infrastructure is defined in software, then programmers will need to deal with programming infrastructure.
Containers were supposed to simplify cloud software development: just download Docker, develop your microservice using components from Docker Hub, and deploy into the cloud with one command. But networking has never been a strong point of containers, and inserting virtual networks and all the other machinery exposed to the programmer in multitenant data centers is likely to lead to complication. While the new crop of container management frameworks will help, they are still an extra layer of overhead that the programmer must deal with, one they didn’t face when deploying in an enterprise data center or on a single server.
Given all this activity, maybe now is a good time to ask the question: do we need a new cloud operating system (cloudOS)? The arguments for a new cloudOS are mainly based on performance, scalability, and ease of programming and deployment. On the performance and scalability side, Belay has shown with his IX server operating system that it is possible to achieve a more than 10x improvement over Linux in connection scalability, as measured by the number of messages per second as a function of the number of open connections. In fact, Figure 4 from this paper shows both the 10 Gbps and 40 Gbps NICs maxing out at around the same level, so in practice a server would not get any better connection scalability with a 40 Gbps NIC than with a 10 Gbps one. These numbers are ominous given that the next generation of 100 Gbps NICs is already on the horizon. Similar order-of-magnitude performance improvements are seen for multi-core scalability and connection churn.
The ease-of-programming-and-deployment argument was supposed to be answered by containers, but that has changed as containers have moved into multitenant data centers. The proliferation of container management frameworks suggests that container deployment will require a level of software-defined infrastructure programming similar to VMs. Ideally, it would be nice to have a platform that did for distributed applications in multitenant data centers what containers did for single-server applications in the enterprise. Belay’s IX work focused on replacing the hypervisor-guest OS or single-server Linux operating system with a more performant single-server operating system. A complementary approach is a distributed operating system, in which applications don’t need to know whether they are deployed on a single server or across a distributed, multi-data center cloud.
Distributed operating systems were a hot research topic in academia 25 years ago, in the late 1980s and early 1990s; perhaps the best known is Tanenbaum’s Amoeba. Distributed operating systems feature distribution-transparent abstractions, both at the system level and at the application level. At the system level, distributed virtual memory and process migration (similar to VM migration) hide distribution from the higher levels of the operating system software. At the application level, remote procedure calls and interprocess messaging hide distribution from applications. Distribution transparency only works well where latency is low enough; if the parts of an application happen to be distributed across multiple data centers and communication takes tens to hundreds of milliseconds, the application programmer will probably want some control over the communication. And, of course, some application classes, such as NFV, require the ability to control where the parts of the application are located, because those parts construct a location-dependent network topology. Distributed operating systems replace virtual private clouds and tenant networks with the abstraction of a time-share account, in which each tenant sees the data center as one gigantic computer on which they have an account with compute, networking, and storage quotas.
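The distribution-transparency idea can be sketched in a few lines of Python: the caller uses exactly the same interface whether the service happens to be local or sits behind an RPC stub. The names here (`KVStore`, `StoreProxy`) are hypothetical illustrations, not APIs from Amoeba or any real distributed OS, and the “remote” path only simulates what a real stub would do with marshalling and a network hop.

```python
class KVStore:
    """A toy service that might run locally or on a remote node."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

class StoreProxy:
    """Location-transparent proxy: callers write identical code whether
    the backing store is local or remote; routing is the proxy's job."""
    def __init__(self, backend, remote=False):
        self._backend = backend
        self._remote = remote
    def __getattr__(self, name):
        method = getattr(self._backend, name)
        if not self._remote:
            return method  # direct local call, no overhead
        def rpc_stub(*args):
            # A real stub would marshal args, send them over the network,
            # and unmarshal the reply; here we simply forward the call.
            return method(*args)
        return rpc_stub

# The caller's code is identical for both deployments:
for store in (StoreProxy(KVStore()), StoreProxy(KVStore(), remote=True)):
    store.put("x", 42)
    assert store.get("x") == 42
```

The point of the sketch is the caller’s last three lines: nothing in them reveals where the store lives, which is precisely the property that breaks down once cross-data-center latency makes the programmer want control back.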
Schwarzkopf discusses a few reasons why distributed operating systems never caught on commercially in the 1990s despite the enormous amount of academic research done on them. One reason was that the application mix of word processors and spreadsheets in the 1990s wasn’t inherently distributed, unlike Hadoop, map-reduce, and other data center applications today. Another was that in the 1990s, processor performance was scaling rapidly while networking performance was lagging. Processor performance scaling essentially maxed out in the mid-2000s due to thermal concerns, but networking performance continues to scale as photonics penetrates deeper into the server. Finally, as discussed above, the servers themselves are considerably more complex than the scaled-up PCs of the 1990s.
Regardless of how the container management situation pans out, my feeling is that container management is unlikely to be the last word in cloud system software. The performance gap between what 40 and 100 Gbps NICs require and what the Linux networking stack can deliver, together with support for next-generation nonvolatile memory technologies, indicates that Linux and the hypervisors face a tall order in the coming years. Looking at how to rearchitect them to better support these next-generation technologies is a minimum requirement. Extending that work to the application programming environment, perhaps returning to the lessons of distributed OS work from the early 1990s to simplify application deployment and reduce the need for programmers to program infrastructure, would be an additional plus. Just as microservices and containers have led to a movement toward cloud-native applications, perhaps now is the time to look at a cloud-native operating system.