With the size of centralized cloud deployments continuing to increase and with distributed and edge cloud deployments becoming a reality, how can cloud management platforms scale regardless of infrastructure size and shape? Can performance levels be maintained as scale increases? Centralized and hierarchical approaches have historically dominated, but distributed approaches based on peer-to-peer (P2P) technologies deserve a place in the conversation.
With the increase in datacenter capacity and the development of distributed cloud deployment scenarios like edge computing, the scalability requirements on cloud management software have become more pronounced. Already today, a single service provider environment can easily involve tens of thousands of servers in over 1000 locations, such as central offices, radio base stations and CPEs (Customer-Premises Equipment).
With this in mind, cloud management software needs to scale with the number of resources it manages and the users it serves while maintaining the required performance levels, for example in service deployment time and availability.
The scalability problem
Most commercially available cloud management software is centralized from a software architecture point of view, which often restricts the number of resources the software can manage while maintaining reasonable performance. Performance studies have shown that widely adopted cloud platforms, despite ongoing efforts, keep running into scalability challenges. For example, according to VMware, VMware vCenter Server does not seem to be able to manage more than 1000 compute servers. OpenStack and Kubernetes are also known to have faced challenges with regard to massive scalability.
The scalability limitations of such centralized architectures are well understood, and approaches to handle them are already in the works. For instance, Kubernetes is working on a solution that allows up to 100 clusters to be federated. In OpenStack, hierarchical approaches have been applied to achieve further scalability. These include the concepts of Availability Zones, Regions and Cells, which may be used to increase scalability, depending on the specific deployment scenario.
The P2P approach
Though centralized and hierarchical approaches have been dominant, we believe that distributed approaches based on P2P technologies can make significant contributions towards building scalable systems. Figure 1 lists some of the fundamental pros and cons of hierarchical and P2P approaches. To this end, we have been exploring a P2P model of cloud deployment for scaling cloud management software.
Figure 1: Hierarchical vs P2P – some fundamental pros and cons
We started off with two design principles:
- Avoid centralized components as much as possible
- Minimize or, if possible, avoid any changes to the existing software
With these principles in mind, first we divide up the cloud into several small but independent and self-sufficient deployments – we refer to these as cloudlets. We then associate an agent with each cloudlet that is capable of transparently giving access to the resources of the entire cloud through the use of smart P2P techniques. This results in a solution that abstracts multiple cloud management software instances as a single instance to the users of the cloud. Figure 2 depicts this scenario.
Figure 2: Cloudlets with associated agent providing access to remote cloudlets through P2P techniques
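To make the idea of an agent per cloudlet more concrete, here is a minimal sketch in Python. It is not the implementation used in this work: the `Cloudlet` and `Agent` classes, the ring topology and the hop-limited forwarding strategy are all assumptions chosen for illustration, standing in for whatever P2P techniques a real agent would use. It only shows how a request sent to one agent can be served by a remote cloudlet without the user noticing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cloudlet:
    """A small, self-sufficient deployment with a fixed VM capacity (hypothetical model)."""
    name: str
    capacity: int
    running: int = 0

    def boot_vm(self) -> bool:
        """Boot one VM locally if there is room; report success or failure."""
        if self.running < self.capacity:
            self.running += 1
            return True
        return False


@dataclass
class Agent:
    """Hypothetical agent fronting one cloudlet and knowing a few peer agents."""
    cloudlet: Cloudlet
    peers: list["Agent"] = field(default_factory=list)

    def handle_boot(self, hops_left: int = 3, visited: Optional[set] = None) -> Optional[str]:
        """Try the local cloudlet first, then forward to peers not yet visited.

        Returns the name of the cloudlet that booted the VM, or None if no
        cloudlet within the hop limit had capacity.
        """
        visited = set() if visited is None else visited
        visited.add(self.cloudlet.name)
        if self.cloudlet.boot_vm():
            return self.cloudlet.name
        if hops_left == 0:
            return None
        for peer in self.peers:
            if peer.cloudlet.name not in visited:
                placed = peer.handle_boot(hops_left - 1, visited)
                if placed is not None:
                    return placed
        return None


def build_ring(n: int, capacity: int) -> list[Agent]:
    """Connect n agents in a ring, each knowing only its two neighbours (assumed topology)."""
    agents = [Agent(Cloudlet(f"cloudlet-{i}", capacity)) for i in range(n)]
    for i, agent in enumerate(agents):
        agent.peers = [agents[(i - 1) % n], agents[(i + 1) % n]]
    return agents


if __name__ == "__main__":
    agents = build_ring(n=4, capacity=1)
    entry = agents[0]  # the user only ever talks to this one agent
    print([entry.handle_boot() for _ in range(5)])
```

In this toy setup the user always talks to the same agent, yet the first four requests are placed on four different cloudlets. The fifth returns `None` because the whole system is full. There is no central scheduler anywhere in the sketch, which is in line with the first design principle above.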
The solution is well suited both to centralized cloud deployment scenarios, where one needs to manage hundreds or thousands of servers within one geographical location, and to distributed cloud deployment scenarios, where the cloud deployment consists of large numbers of geographically distributed servers.
The OpenStack use case
Conceptually, the solution can be used to scale many of the cloud platforms on the market. As a proof of concept, we have applied it to the OpenStack cloud platform.
We performed tests for a system size of 64 cloudlets (1 cloudlet = 1 OpenStack deployment with 1 controller node and 1 compute node) while increasing the number of concurrent Virtual Machine (VM) boot requests. With this, we intended to understand how the system performs as the demand on it increases. As a yardstick, we use a single standard OpenStack setup with compute capacity equivalent to that of the P2P system.
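For readers who want to try a similar measurement, the following Python sketch shows the general shape of such a test. It is not the harness we used: `run_boot_test` and `fake_boot` are hypothetical names, and `fake_boot` is a stand-in with made-up latency and failure behaviour that would have to be replaced by a real boot call against the system under test. The sketch fires a given number of boot requests concurrently and reports the number of VMs booted, the error rate and the elapsed time.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_boot_test(boot_vm: Callable[[int], bool], concurrency: int) -> dict:
    """Issue `concurrency` boot requests at once and summarize the outcome."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Each call returns True if the VM booted, False on error.
        outcomes = list(pool.map(boot_vm, range(concurrency)))
    elapsed = time.perf_counter() - start
    succeeded = sum(outcomes)
    return {
        "requests": concurrency,
        "booted": succeeded,
        "error_rate": (concurrency - succeeded) / concurrency,
        "elapsed_s": elapsed,
    }


def fake_boot(request_id: int) -> bool:
    """Stand-in for a real boot call; fails every tenth request (made-up behaviour)."""
    time.sleep(0.01)  # pretend the boot takes some time
    return request_id % 10 != 9


if __name__ == "__main__":
    # Step the load upwards, as in the test description above.
    for concurrency in (10, 20, 40):
        print(run_boot_test(fake_boot, concurrency))
```

Running the same loop against both the P2P system and the standard setup, with the real boot call plugged in, gives comparable figures for each load level.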
Results show us that, at the performance limits of both systems, the P2P system can boot four times more VMs in about half the time needed for the standard system, while maintaining an error rate that is less than one-third of the standard.
Side note: We do know that 64 cloudlets is not an impressive number to show. This number reflects the amount of resources that were available to us at the time of testing. We are working on more interesting numbers. Nevertheless, this number still proves the main point: the P2P approach is feasible, scales very well, and outperforms the standard platform on several fronts.
As the cloud continues to expand in shape and size, new ways to manage it need to be explored. We believe that distributed approaches, and P2P technologies in particular, have a lot to contribute to how current and future cloud setups are managed in a scalable and robust manner.
Further details behind this work can be found in the paper “Re-designing Cloud Platforms for Massive Scale using a P2P Architecture”.
And of course, feel free to reach out to us if you want to know more, either here in the comments field or in the Ericsson Research LinkedIn group.