Paving the way to telco-grade PaaS

Most of the challenges that need to be overcome to make PaaS suitable for telco relate to the need for additional features that telco applications typically require in terms of latency, security, and availability. PaaS provides the building blocks for automated testing, continuous deployment, as well as supporting the DevOps approach, and as such simplifies the development process and reduces time to market.

2016-05-26

Authors: Edvard Drake, Ibtissam El Khayat, Raphaël Quinet, Einar Wennmyr, and Jacky Wu

IaaS – infrastructure as a service
MMTEL – multimedia telephony
PaaS – platform as a service
SaaS – software as a service
SCTP – Stream Control Transmission Protocol
UDP – User Datagram Protocol
VNF – Virtualized Network Function

Independent of business, ways of working, or even technology adoption, the pressure on modern industries to shorten time to market through rapid development cycles is constant. The concepts of platform as a service (PaaS) and microservices – which have been gaining traction in the IT world – are deeply rooted in this need to cut development times. And the benefits are equally important in the telco domain. But there are gaps that need to be closed before PaaS is suitable for telco. Most of the challenges relate to the need for additional features that telco applications typically require. Once PaaS is telco approved, new applications will need to follow a number of design patterns, so that the full advantages of the platform-as-a-service approach can be realized.

PaaS is a cloud service model that allows developers to build, run, and manage applications in a way that best suits their business needs, and most significantly, in a way that is independent of the underlying hardware or software infrastructure. Typically, PaaS enables developers to deploy code on top of a software stack that includes a runtime environment for one or several programming languages, an operating system, and basic services to build upon. PaaS provides the building blocks for automated testing, continuous deployment, as well as supporting the DevOps approach, and as such simplifies the development process and reduces time to market. In the stack of cloud service models, shown in Figure 1, PaaS fits in between software as a service (SaaS) (which targets users with licensed software offerings) and infrastructure as a service (IaaS) (which addresses the management and sharing of hardware resources).

Figure 1: Cloud service models (from the point of view of the service consumer)

PaaS works with various cloud models: public, private, or hybrid. The hybrid model can, for example, be used by enterprises and telecom service providers to optimally combine the different handling needs of sensitive and non-sensitive workloads, where the common management interface enables some to be deployed on a private cloud and others on a public cloud – as shown in Figure 2. Latency-sensitive workloads, for example, or tasks that require security or control for proprietary data can be deployed on premises in a private cloud, while non-sensitive workloads can be deployed in a public cloud, maximizing agility and optimizing costs.
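The placement rule described above can be sketched as a small policy function. This is only an illustration: the `Workload` fields and the `choose_cloud` function are hypothetical names, not part of any PaaS product, and a real placement decision would weigh more criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    # Hypothetical description of a workload, for illustration only.
    name: str
    latency_sensitive: bool = False
    handles_proprietary_data: bool = False


def choose_cloud(workload: Workload) -> str:
    """Return 'private' for sensitive workloads and 'public' otherwise."""
    if workload.latency_sensitive or workload.handles_proprietary_data:
        return "private"
    return "public"


workloads = [
    Workload("call-control", latency_sensitive=True),
    Workload("billing-records", handles_proprietary_data=True),
    Workload("marketing-portal"),
]
placement = {w.name: choose_cloud(w) for w in workloads}
```

With a common management interface, the same deployment request could then be routed to either cloud based on the returned value.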

Figure 2: Deploying PaaS workloads in different types of clouds

Depending on the level of automation and integration provided, PaaS solutions can be further divided into two categories: structured and unstructured. Unstructured platforms leverage basic container technologies or public PaaS offerings and are usually managed or monitored with homegrown tools. Technology-centric companies tend to favor such unstructured platforms, as they facilitate development and maintenance of solutions customized to meet business needs.

Structured platforms, on the other hand, come with built-in features such as orchestration, monitoring, governance, load balancing, and high availability. These characteristics make structured platforms suitable for enterprises or telecom service providers, and are the reason behind Ericsson’s focus on structured PaaS.

The benefits brought by PaaS

The benefits that PaaS can offer vary from business to business and from one application to the next, depending on whether the application has been specifically designed for PaaS or whether it simply runs in a PaaS environment. The PaaS approach is well suited to application developers and vendors, but it can also be of great value to other users such as system integrators and service operators. Some of the concepts used in PaaS, such as multiple application instances and component-based architecture, are established approaches in the telco domain. To keep the complexity of components at a manageable level, the telco domain has a long-standing tradition of modular design. However, designing applications specifically for PaaS increases the number of benefits for the different user groups.

Benefits for application developers

PaaS enables developers to focus on the business logic of their applications, as it frees them from the concerns associated with setting up the necessary foundation for deployment, testing, adaptation, and rollout. In doing so, PaaS enables innovation acceleration and rapid time to market.

Applications designed to run in a PaaS environment are likely to be less complex and consume fewer resources than their traditionally programmed counterparts, as they do not need to re-implement the services that are provided by the platform. As a result, a PaaS application takes less time to start up than applications deployed on a full software stack. The simplified nature of PaaS applications brings benefits in terms of scalability, especially for those that are stateless. Designing an application for PaaS with loosely coupled internal and external interfaces makes it easier to manage life cycles for the components of an application and for the services they use in an independent manner. Deploying components that are loosely coupled not only simplifies an upgrade, it also reduces the complexity of validating an upgrade. Combined with the freedom to choose the programming language and runtime environment best suited to the task at hand, loose coupling enables components to be replaced at any time with a different implementation – even in a different language – which in turn supports the gradual introduction of new technologies. The PaaS framework provides common ways to expose and bind to services, which simplifies the deployment of new services. Service gateways and brokers can also expose external services, so they can be used by applications running inside or outside the PaaS environment. The ease of integration of new services brought about by PaaS contributes to faster innovation, which is one of the model’s primary benefits.

Benefits for system integrators

Some of the benefits that apply to developers also apply to system integrators. Loosely coupled services and independent life cycles, for example, can simplify the testing and upgrade of components, as these tasks can be carried out separately. And the common binding and service exposure framework facilitates the integration of new services.

Benefits for service operators

A PaaS-designed application can scale quickly and easily with flexible workload deployment, which leads to optimal use of hardware resources. However, care should be taken when dealing with applications designed with large numbers of lightweight components that need to communicate with each other, to ensure that workload deployments do not negatively impact performance.

In general, security assurance and governance both benefit when applications run on a common framework that provides collective application management and supports intra-service communication. For example, the platform approach removes the need to manage masses of ad hoc security solutions and the rules governing how they apply to applications.

How do microservices contribute?

The software industry is currently experiencing a rise in the use of microservices and microservices architecture. And while PaaS and microservices are two separate concepts, viewing PaaS in combination with microservices and other concepts like containers and DevOps can substantially increase the leverage of each of them.

Microservices is an architectural pattern and an approach to development. Essentially, this approach builds applications from (or deconstructs existing applications into) small parts – each with a single and well-defined purpose. To communicate, the parts (or microservices) use language- and technology-agnostic network protocols, and each part can be developed, maintained, deployed, executed, upgraded, and scaled independently. Technology choices are specific to the microservice and each microservice should be owned by a small team of developers to minimize the overhead of intra-team communication.

Overall, the ability to develop parts in an independent way enables rapid progress, allowing development to keep pace with market demands, and facilitates scaling of development. Decoupling and independence between microservices are fundamental to a microservices architecture. Independence supports scaling over multiple teams because it enables many small teams to work in parallel, with clear responsibilities, a large degree of freedom, and minimal interaction. Decoupling also enables the different parts of the system to evolve at their own pace. Avoiding dependencies enables technology choices to be made on a per-microservice basis. As new technologies become available, they can be implemented appropriately without the need for a synchronized cross-microservice upgrade. As a result, each microservice can evolve at the right pace in a way that is most appropriate for a particular service: an efficient system that lends itself to the creation of ever-improving services.

While the advantages of a microservices architecture are apparent, in practice, this approach poses a number of significant challenges. To start with, the well-known fallacies of distributed computing [1] should be avoided: assuming, for example, that bandwidth is infinite or that latency is zero can result in costly redesign work. To perform a given task, a number of microservices are invoked sequentially, each of which contributes significantly to overall latency, making the overall latency of a service more difficult to predict. Further challenges include the overall complexity, both in development and at runtime, of large, highly distributed systems. Testing such a system is equally challenging, particularly when it comes to the additional complex failure scenarios.

One way – and maybe the only way – to overcome the challenges surrounding latency is to accept that some parts of the system need to be designed with miniservices. These are larger than microservices, so that sequential communication is minimized and latency is kept within the required boundaries. When this approach is applied to appropriate parts of an application, the scope of what can be addressed with a microservices architecture can be broadened. And even if the application of miniservices can compromise some targets, the overall advantages gained tend to be fairly considerable.
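The effect of sequential invocation on latency can be illustrated with a rough model in which every invoked service adds its own processing time plus one network round trip. The figures below (round-trip time, processing times, and the latency budget) are assumed values chosen only to show the trend; they are not measurements.

```python
def chain_latency_ms(processing_ms, hop_rtt_ms):
    """Latency of services invoked one after another.

    Each service contributes its processing time plus one network
    round trip (a deliberately simplified model).
    """
    return sum(processing_ms) + hop_rtt_ms * len(processing_ms)


HOP_RTT_MS = 4.0   # assumed network round trip per invocation
BUDGET_MS = 50.0   # assumed latency budget for the whole task

# Ten microservices with 2 ms of processing each: 20 ms + 10 hops.
micro_ms = chain_latency_ms([2.0] * 10, HOP_RTT_MS)

# The same 20 ms of work grouped into three miniservices: 20 ms + 3 hops.
mini_ms = chain_latency_ms([6.0, 6.0, 8.0], HOP_RTT_MS)
```

In this model the microservice chain needs 60 ms and exceeds the budget, while the miniservice variant needs 32 ms, because the total processing time is unchanged and only the number of network hops drops.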

Telco-grade and PaaS support

Existing PaaS environments have been designed to suit commodity IT applications, and as such, have not been built to meet the availability and latency requirements that are characteristic of telco applications. But as new types of IT applications emerge, such as IoT for vehicles, placing stringent latency demands on communication links, telco-like requirements are beginning to appear in the IT space. Enabling PaaS environments to meet telco requirements will benefit emerging IT applications, telco applications, as well as traditional IT applications – as long as the cost of providing common IT applications does not rise. In other words, adding a telco requirement, like five-nines availability, to the PaaS environment needs to be done in such a way that it doesn’t impact performance or increase the setup complexity for other applications. What then are typical telco requirements? Without constituting a formal definition, telco applications tend to be characterized by latency and availability (TL 9000 for example).

When it comes to latency, a requirement labelled soft real-time (outside of TL 9000) has been set for telco applications. This requirement is usually set to ensure that 95 percent of operations can be handled within a predefined latency budget – which for IMS is typically around 50 ms per node in the traffic flow.
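As a minimal sketch, the soft real-time criterion can be checked against a set of measured latencies as follows. The function name `meets_soft_real_time` and the sample values are illustrative assumptions; the 95 percent and 50 ms defaults come from the text above.

```python
def meets_soft_real_time(latencies_ms, budget_ms=50.0, required_fraction=0.95):
    """True if at least `required_fraction` of samples are within the budget."""
    if not latencies_ms:
        raise ValueError("no latency samples to evaluate")
    within = sum(1 for value in latencies_ms if value <= budget_ms)
    return within / len(latencies_ms) >= required_fraction


# 96 of 100 operations finish within 50 ms: the requirement is met.
passing = meets_soft_real_time([12.0] * 96 + [80.0] * 4)

# Only 90 of 100 operations finish within 50 ms: the requirement is missed.
failing = meets_soft_real_time([12.0] * 90 + [80.0] * 10)
```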

When it comes to availability, telco applications require five-nines availability. For PaaS environments, this level of availability implies that applications must be available 99.999 percent of the time, which corresponds to a maximum of roughly five minutes (about 5.26 minutes) of downtime in a year, including time for upgrades and other administrative operations.
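The downtime budget follows directly from the availability target. The short calculation below assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent):
    """Maximum yearly downtime, in minutes, for a given availability target."""
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


five_nines = downtime_budget_minutes(99.999)   # about 5.26 minutes
four_nines = downtime_budget_minutes(99.99)    # about 52.6 minutes
```

Each additional nine divides the budget by ten, which is why upgrades and administrative operations have to be counted against it as well.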

When it comes to measuring availability, an application is deemed partly unavailable if it is out of service for more than 15 s, with a loss of capacity in excess of 20 percent. If O&M functionality is lost, the loss is counted as 10 percent downtime for the whole node, even if the traffic part of the node is working.
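These measurement rules can be expressed as two small helper functions. This is a simplified reading of the rules stated above, with hypothetical function names; it does not attempt to reproduce any formal measurement procedure.

```python
MIN_OUTAGE_S = 15.0        # outages up to this length are not counted
MIN_CAPACITY_LOSS = 0.20   # capacity loss must exceed this fraction
OM_WEIGHT = 0.10           # O&M loss counts as 10 percent downtime


def is_partial_outage(duration_s, capacity_loss):
    """True if an event counts as partial unavailability."""
    return duration_s > MIN_OUTAGE_S and capacity_loss > MIN_CAPACITY_LOSS


def om_downtime_s(duration_s):
    """Downtime charged to the whole node while only O&M is lost."""
    return OM_WEIGHT * duration_s


counted = is_partial_outage(duration_s=30.0, capacity_loss=0.25)   # True
too_short = is_partial_outage(duration_s=10.0, capacity_loss=0.50)  # False
om_charge = om_downtime_s(600.0)  # ten minutes without O&M
```

Under these rules, ten minutes without O&M functionality is charged as about one minute of downtime, a sizeable share of the yearly five-nines budget.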

To meet availability requirements, applications must be able to handle internal software and hardware faults without crashing. This is usually referred to as high availability support. A number of other factors also affect an application’s availability, helping to determine whether or not it will attain the five-nines goal. Today’s PaaS environments provide the necessary support in some cases, whereas development is still required in others.

To make PaaS telco grade, meeting availability requirements is not the only gap that needs to be closed; a number of others including logging support and security also need to be addressed:

Automation of management operations

System outages are often caused by human error during manual operation. Automating procedures minimizes this type of error, leading to improved availability. Many management operations in modern PaaS environments are already being automated, but further improvements are possible – such as single-command installation of an application that consists of a number of microservices.

Monitoring support

Through continuous monitoring of applications and the infrastructure, potential problems can be detected and rectified before they cause an outage. Today’s PaaS environments include some basic monitoring support (typically checking whether or not an application is executing) but more advanced mechanisms are needed, especially for a microservice architecture, to handle scenarios where applications are in deadlock, or to cope with an internal crash, for example.
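One common way to go beyond checking that a process exists is to have each instance report a heartbeat from its main processing loop, so that a deadlocked instance is noticed when its heartbeats stop. The sketch below illustrates the idea; the `HeartbeatMonitor` class is hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per instance and reports stalled ones."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_beat = {}

    def beat(self, instance, now):
        """Record that `instance` made progress at time `now` (seconds)."""
        self._last_beat[instance] = now

    def stalled(self, now):
        """Instances whose last heartbeat is older than the timeout."""
        return sorted(
            name
            for name, last in self._last_beat.items()
            if now - last > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.beat("session-a", now=0.0)
monitor.beat("session-b", now=0.0)
monitor.beat("session-a", now=4.0)   # session-b stops reporting

suspects = monitor.stalled(now=6.0)
```

Here `session-b` is still running as a process but is flagged because it has not reported progress for more than five seconds.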

Logging support

When a fault does occur, the outage time until normal operations can be regained has a significant impact on overall availability. Access to information about the error is the key to rapid resolution. A presentation of the logs from a PaaS application – including services used – is needed to facilitate proper fault analysis. Existing PaaS solutions do not yet support such a feature.
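A combined presentation of logs could, for example, interleave the entries of the application and the services it uses by timestamp. The following sketch assumes that each source already delivers its entries in time order as `(timestamp, source, message)` tuples; the log contents are invented for illustration.

```python
import heapq

app_log = [
    (1.0, "app", "request received"),
    (3.5, "app", "database call failed"),
]
db_service_log = [
    (2.0, "db-service", "connection pool exhausted"),
    (3.4, "db-service", "query rejected"),
]


def merged_view(*streams):
    """Interleave already-sorted log streams into one timeline."""
    return list(heapq.merge(*streams, key=lambda entry: entry[0]))


timeline = merged_view(app_log, db_service_log)
```

In the merged timeline, the service-side entries appear directly before the application error, which is the kind of context needed for fault analysis.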

Security

The use of role-based access to restrict actions on applications and the infrastructure reduces outages caused by a lack of the right kind of competence. This type of security measure also protects applications from malicious access, which reduces the risk of downtime. Current PaaS environments provide a degree of support in terms of access control, but additional backing is needed in the form of data encryption solutions, protocols that guarantee separation of traffic, and certificate handling, for example.
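Role-based access can be reduced to a lookup from roles to permitted actions, as in the minimal sketch below. The role names and actions are assumptions made for the example and do not reflect any specific PaaS product.

```python
# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS = {
    "operator": {"view", "restart"},
    "developer": {"view", "deploy"},
    "auditor": {"view"},
}


def is_allowed(role, action):
    """True if `role` may perform `action`; unknown roles get no access."""
    return action in ROLE_PERMISSIONS.get(role, set())


can_restart = is_allowed("operator", "restart")   # True
can_deploy = is_allowed("auditor", "deploy")      # False
```

Denying unknown roles by default keeps an incomplete configuration from granting access by accident.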

Tenant isolation

Strong support for tenant isolation covering all aspects of the infrastructure (compute, network, and storage) minimizes the impact a fault in one tenant application has on other tenants sharing the same physical resources. Current PaaS environments have sufficient support for tenant isolation.

Upgrade support

To ensure that capacity losses are kept below the 20 percent limit, application and infrastructure upgrades need to be carried out without affecting operation. For stateless applications, upgrades to individual instances of the application can take place without causing a disturbance. It must also be possible to upgrade the PaaS environment itself without affecting running applications.
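For a stateless application running in several instances, one way to respect the capacity limit is a rolling upgrade that takes only a limited share of instances out of service at a time. The sketch below assumes that all instances contribute equal capacity; the function name and instance names are illustrative.

```python
def upgrade_batches(instances, max_loss_percent=20):
    """Split instances into batches that each stay within the loss limit.

    Assumes every instance provides the same share of total capacity.
    """
    batch_size = len(instances) * max_loss_percent // 100
    if batch_size == 0:
        raise ValueError(
            "too few instances to upgrade in place within the loss limit"
        )
    return [
        instances[start:start + batch_size]
        for start in range(0, len(instances), batch_size)
    ]


instances = [f"instance-{i}" for i in range(10)]
batches = upgrade_batches(instances)
```

With ten instances this yields five batches of two, so no more than 20 percent of capacity is offline at any step. With fewer than five instances the function refuses, since even one instance would exceed the limit; adding a temporary extra instance before upgrading would be one way around that.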

Backup and restore

Saving the state of an application and corresponding software version enables an application to be restored quickly following a major failure, minimizing downtime.

Quick restart time

The ability to restart quickly reduces the outage time following a major problem. This capability is currently supported for stateless PaaS applications.

Independent restart

To maintain independence of infrastructure and application availability, restarting a PaaS environment in the event of an internal fault, upgrade, or other administrative process should not necessitate the restart of applications.

Network protocol support

Telco applications require support for a wider set of network protocols than typical IT applications. For example, UDP and SCTP are typical of the protocols that telco applications rely on, but support for them is often lacking in PaaS environments designed for traditional IT applications. Adding support for such protocols will expand the set of applications that can be deployed.

Alarms

Longer outages can be avoided by issuing an alarm indicating both that a problem has arisen and how to address it. This information needs to be collected in a consistent way from the application and the PaaS services being used.

Performance counters

A system that supports a rich set of performance counters makes it possible to detect potential problems before they arise and to take proactive measures to avoid outages. This information needs to be collected in a consistent way from the application and the PaaS services being used.

Trace support

If a fault occurs during operation, troubleshooting needs to be fast to minimize outage time. The trace output of applications is needed to facilitate proper troubleshooting. This information needs to be collected in a consistent way from the application and the PaaS services being used.

Soft real-time

The networking solution in the PaaS environment must be highly efficient with short round-trip times to fulfill the latency requirements of telco applications. Closing this gap is a significant improvement as doing so will increase the possibility of deploying a microservices architecture (one of the biggest concerns over such an architecture is the latency induced by multiple network hops).

Achieving telco on PaaS

This list of wanted features highlights the gaps that need to be closed to make PaaS suitable for all telco applications. But once these features have been added, what opportunities are likely to arise?

Building a new PaaS application

When writing a new application based on microservices architecture to run on PaaS, new design principles are needed to maximize the benefits. Some of these principles are borrowed from the twelve-factor app [2]. Among these principles, telco microservices need to:

encapsulate a well-bound and well-defined function with a clear business need
have separate life cycles – so they can be developed, delivered, installed, and upgraded independently
be stateless – by storing data that needs to persist in a stateful backing service
adhere to decoupled communication – using well-defined interfaces, network protocols, and asynchronous messages
adhere to the share-nothing principle – data is not shared between the different instances of a microservice, enabling microservices to be scaled out by adding more processes
be built for failure – to manage service failures or underperformance by supervising responses and throttling the requests, for instance
discover peers dynamically – by using the platform service discovery function to discover other microservices during runtime
be technology agnostic – so that the choice of technology adopted for one microservice does not affect the choice of another
fail fast – instead of implementing complex recovery mechanisms when severe errors occur, mini/microservice instances should fail with a clear error message stating the reason for the failure
achieve high availability – by running in multiple instances, allowing one instance to take over if another one fails, and reading up state from the external store.

These principles should be complemented by state-of-the-art microservices design patterns [2].
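Several of these principles can be seen together in a small sketch: stateless instances that keep persistent data in a backing store, fail fast with a clear message, and can take over from one another. All class and function names here are hypothetical, and the in-memory `SessionStore` merely stands in for a real stateful backing service.

```python
class SessionStore:
    """Stand-in for a stateful backing service such as a database."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key, 0)

    def put(self, key, value):
        self._data[key] = value


class CounterService:
    """Stateless instance: all persistent data lives in the store."""

    def __init__(self, name, store):
        self.name = name
        self.healthy = True
        self._store = store

    def handle(self, session_id):
        if not self.healthy:
            # Fail fast with a clear reason instead of attempting recovery.
            raise RuntimeError(f"{self.name}: instance unavailable")
        count = self._store.get(session_id) + 1
        self._store.put(session_id, count)
        return count


def dispatch(instances, session_id):
    """Send the request to the first instance that can serve it."""
    for instance in instances:
        try:
            return instance.handle(session_id)
        except RuntimeError:
            continue  # try the next instance
    raise RuntimeError("no healthy instance available")


store = SessionStore()
first_instance = CounterService("counter-1", store)
second_instance = CounterService("counter-2", store)
pool = [first_instance, second_instance]

first = dispatch(pool, "session-42")    # served by counter-1, returns 1
second = dispatch(pool, "session-42")   # served by counter-1, returns 2
first_instance.healthy = False          # simulate a failed instance
third = dispatch(pool, "session-42")    # counter-2 takes over, returns 3
```

Because the count is read from the external store, the second instance continues the session where the failed one left off.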

Extending mature applications

The issue of handling legacy applications inevitably arises during discussions over the design of new applications. When evaluating the portability of mature applications to PaaS, a number of considerations need to be taken into account. Generally speaking, rewriting applications is not the best approach because it tends to be costly and time-consuming, it blocks the addition of new features, and it opens the door for competitors to gain market share in the meantime. If the main reason for the adoption of PaaS is to gain development speed, redesign will not generate any market value if no new features need to be developed.

There are certain opportunities that PaaS adoption can take advantage of without the need for application redesign. If a mature application requires a new feature or if a market adaptation is called for, the feature could be implemented as a microservice on PaaS and accessed from the legacy application. This is indeed the recommended approach for shifting monolithic applications to a microservices architecture: to break out individual services one by one. When this approach can be taken, the best of both the IaaS and PaaS worlds can be captured: the performance of the existing application remains stable, while new features benefit from the rapid development PaaS offers, freeing up the design team to develop new features with additional business value – as illustrated in Figure 3.

Figure 3: Extension of telco applications in PaaS

Naturally, communication needs to be enabled between the main application and the new microservices. Some telco applications already have predefined interfaces that can be used to extend the node functionalities, such as the Parlay interface for the MMTEL application server. However, when no such interface exists, the main application and the microservice should be decoupled using an anticorruption layer. All patterns used for migrating from monolithic to microservices such as facades, adapters, and translators can be applied in this process [3]. Such new satellite microservices need to be managed through the application to avoid fragmentation – which would incur additional costs.
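As an illustration of an anticorruption layer, the sketch below places an adapter between a legacy interface and a new microservice, so that legacy field names and codes do not leak into the new code. The legacy record layout, the class names, and the field values are all invented for the example.

```python
class LegacySubscriberDb:
    """Stand-in for an internal interface of the mature application."""

    def lookup(self, msisdn):
        # Invented legacy record layout with terse field names and codes.
        return {"MSISDN": msisdn, "SUBSCR_ST": "A", "PLAN_CD": "07"}


class SubscriberAdapter:
    """Anticorruption layer translating legacy records for the microservice."""

    _STATUS = {"A": "active", "S": "suspended"}

    def __init__(self, legacy):
        self._legacy = legacy

    def get_subscriber(self, number):
        raw = self._legacy.lookup(number)
        return {
            "number": raw["MSISDN"],
            "status": self._STATUS.get(raw["SUBSCR_ST"], "unknown"),
            "plan": int(raw["PLAN_CD"]),
        }


adapter = SubscriberAdapter(LegacySubscriberDb())
subscriber = adapter.get_subscriber("46700000000")
```

The microservice only depends on the adapter's model, so a later change in the legacy interface is absorbed in one place.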

The proposed approach brings all the benefits of PaaS and microservices without the transformation cost. It also has the advantage of gradually allowing teams to experience new technologies and new ways of working. In case of non-adoption owing to a performance degradation or a mismatch with the way of working in the organization, reverting to the original development method would have little impact. Going to market with such an approach would also create the opportunity for early feedback from the customer.

The next steps

Telco applications deployed in PaaS – in particular those specifically designed for PaaS environments – can benefit from shorter development, deployment, and testing cycles than applications deployed on a traditional software stack. Telco PaaS is characterized by short time to market and rapid innovation, enabling developers to focus on the business logic of their applications and to build lightweight software modules that maximize reuse of platform features and available services. Application scalability and the introduction of new technologies are both facilitated by the independent life cycles of application components and the services they use. PaaS increases the portability of the applications to several IaaS solutions, and thus helps reduce the number of cloud execution platforms that need to be supported.

Improving PaaS solutions to support the additional requirements of telco applications can bring benefits to users outside the telco domain, making PaaS more suitable for other domains that have stringent requirements in terms of, say, latency and security – such as in the banking or medical sectors.

However, as outlined, a number of gaps and challenges need to be overcome before the combination of PaaS and microservices can reveal its full potential. Issues like the maturity of existing PaaS platforms, the possibility to perform troubleshooting in a highly distributed system, testing that involves a large number of independent services, the ability to predict latency (or maintain it within set limits), management support, and closing the gaps relating to telco-grade applications are just some of the challenges that need to be overcome.

A microservice-based application designed for PaaS needs to follow a number of recommended design patterns to be robust and functional. PaaS can also be leveraged without a full redesign and rewrite of current telco applications, by extending an existing application with new features deployed in PaaS while the rest of the application remains outside.

The potential benefits of a mini- or microservice-based architecture deployed in a PaaS environment are significant. As such, Ericsson will follow the steps along the path to telco-grade PaaS.

References

[1] The fallacies of distributed computing.
[2] Sam Newman, 2015, Building Microservices (O'Reilly).
[3] Matt Stine, 2015, Migrating to Cloud-Native Application Architectures (O'Reilly).

Edvard Drake

is an expert in the area of hardware and software platform technologies and an OSS/BSS Implementation Architect in Business Unit Support Solutions (BUSS). He has more than 20 years of experience at Ericsson, ranging from AXE-10 exchanges to open source and commercial innovation. He holds a B.Sc. in software engineering from Umeå University, Sweden.

Ibtissam El Khayat

joined Ericsson in 2008 after having been a researcher in academia and consultant in the telecom industry. Over the years, she has worked in different areas such as communication protocol design, utility and transport areas, and eMBMS. Currently, she works with 5G and cloud technology in her role at BUCI DUNC S&T. She holds a Ph.D. in computer science from University of Liège, Belgium.

Raphaël Quinet

is a master systems designer at Development Unit Network Functions & Cloud, Systems & Technology. He has more than 20 years of experience at Ericsson, starting in Research, optimizing the performance of web traffic over mobile networks, then service-oriented architecture and since 2010 cloud management, virtualization and containers for telco services. He holds a degree in electrical engineering from the University of Liège, Belgium.

Einar Wennmyr

is an expert in implementation architecture and is Chief Architect for TEA Implementation Architecture at Group Function Technology (GFT). He has about 35 years of experience at Ericsson, ranging from AXE-10, AXE-N, TSP Dicos, CBA, ETOS and lately with cloud technology and the impacts it has on software architecture. He graduated from Chalmers University of Technology and also holds an M.Sc. from the University of Southern California.

Jacky Wu

is a senior specialist at Development Unit Network Functions & Cloud, Systems & Technology. He has around 18 years of experience at Ericsson, ranging from Mobile Softswitch, CBA, cloud technology and the impacts it has on telco products. He holds a degree in electronic engineering from Shanghai Jiao Tong University, China.