Disruptive telecom concepts like NFV promise agility and a high velocity of service innovation and deployment. These requirements call for a novel management paradigm beyond traditional telco workflows and processes. Together with our partners in the EU-FP7-funded UNIFY project, we tackle these challenges by drawing inspiration from DevOps ideas widely applied at scale in data centers. Discuss with us our recently released SP-DevOps Toolkit, and how it helps showcase ideas that could usher in a next generation of tools for telecom services in software-defined infrastructures.
A few weeks ago, the UNIFY project announced the first release of the SP-DevOps Toolkit at the Network Function Virtualization Research Group (NFVRG) meeting held at IETF94 in Yokohama. For us at ER Cloud Technology, this is quite significant, since it is the first shareable result from the work package we have coordinated since the inception of the project.
What is it all about?
The SP-DevOps Toolkit is an assembly of tools and support functions that enable developers and operators to efficiently observe, verify and troubleshoot software-defined infrastructures. Releasing these tools as open source allows the community at large to experiment with the solutions developed in the project. Specifically, this first release includes the following components developed by partners in the UNIFY project, each addressing challenges identified in the project: DoubleDecker, a flexible and scalable communication service for virtual monitoring functions; EPOXIDE, a framework for (semi-)automated troubleshooting tasks; RateMon, a scalable distributed congestion detector; and AutoTPG, a data plane verification tool for SDN/OpenFlow.
Now, what does this have to do with DevOps?
To explain how this collection of tools will ease actual pain points of operators, allow me to go back to our original motivation for this work. Considering the operational requirements of upcoming telecom concepts like NFV (powered by SDN and cloud virtualization), we realized that current management frameworks (e.g., eTOM and ITIL) will not suffice. In addition to the traditional carrier-grade requirements posed on telecommunication networks (high availability and fast recovery, scalability, quality of service), software-defined infrastructures promise higher service dynamicity (e.g., scaling or migration of VNFs) and higher service velocity (e.g., increased deployment frequencies and faster time-to-market). Meeting these additional requirements calls for novel fulfillment and assurance mechanisms. Besides a common orchestration framework for cloud and carrier network resources, it will be necessary to rely on programmable components and integrated quality assurance steps, which make it possible to minimize failure rates and to maximize the predictability and efficiency of automated operational processes.
Since many web companies successfully tackle comparable requirements in their software development within data centers, we took IT DevOps methods as inspiration for our work. Unfortunately, “DevOps” cannot be broken down into a single method that can simply be applied in a new setting. We rather interpret DevOps as a set of guiding principles, which need to be tailored to service provider environments. We argue that telecommunication resources, compared to data centers, exhibit a higher degree of distribution, i.e., they are spread over wide geographical areas. This also leads to lower levels of redundancy, making it harder to meet carrier-grade requirements such as high availability or strict latency bounds. Furthermore, telecom resources span multiple network and data center domains, and typically consist of highly heterogeneous hardware and software. Finally, DevOps involves changed ways of working by bringing development, operations and quality assurance teams together. Here IT players differ significantly from service providers, which are likely to have business boundaries between these teams (network functions are developed by vendors, parts of the network might be operated by sub-contractors, etc.) and thus need to find more indirect ways to bring these “silos” closer to each other.
So, what is the resulting SP-DevOps concept?
In our activities in UNIFY, we focus on technical processes shared by different roles during the service lifecycle in a virtualized telecom network: Observability, Troubleshooting, and Verification. As SP-DevOps roles, we identified the Service Developer, who assembles the service graph for a particular category of services, similar to the more classical operator role; the VNF Developer, who programs virtual network functions and corresponds to the classical equipment vendor role; and the Operator, whose role is to ensure that a set of performance indicators associated with a service are met when the service is deployed on a virtual infrastructure within the domain of a telecom provider.
And what is the role of the SP-DevOps Toolkit?
We compared various data center DevOps tools (from Chef and Puppet to Splunk and Elasticsearch) against the main challenges we identified, and found shortcomings. Some tools are good at automating configuration tasks that could be performed from a command line interface, but they cannot integrate other tools that require interaction in a terminal. Others are good at moving around and analyzing large amounts of data, but we found that not all of that data needs to be transported and stored all the time. We also found that, when it comes to OpenFlow-compatible software-defined networks, certain types of functionality, in particular related to the verification of flow rules and network functions, are still lacking in both academia and industry.
With the tools released in the SP-DevOps Toolkit, UNIFY project partners address all the above issues and give a glimpse of how next-generation DevOps tools could improve the life of developers and operators:
- DoubleDecker is a support function that provides a flexible and scalable communication service for virtual monitoring functions. It allows observability data to be disseminated to network controllers, orchestrators, network management systems and, for the first time, directly to virtual network functions that could make use of it. It also allows defining aggregation and filtering rules, and it isolates tenants from each other, so that data put on the bus by one tenant cannot be subscribed to by its neighbors.
- EPOXIDE enables integrating and automating troubleshooting tasks by means of troubleshooting graphs. It enables developers to define sequences of tools to be used for debugging particular problems in the network or in simple virtual network functions, while having the results presented in the same terminal window.
- RateMon is a scalable distributed congestion detector for software-defined networks. It shows how short-term congestion could be predicted at data center scale with just a few Mb of traffic, access to switch port counters, and clever software that can notify other network functions through DoubleDecker.
- AutoTPG is a verification tool that actively probes the FlowMatch part of OpenFlow rules. When aggregate flow descriptors are used, it takes care of the tedious work of identifying bugs in the forwarding path by automatically generating test cases tailored to the rules under verification, executing the tests and reporting the results.
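To make the tenant isolation described for DoubleDecker a bit more concrete, here is a minimal Python sketch of the idea. It is not DoubleDecker's API or implementation, just a toy in-memory bus assumed for illustration: subscriptions are keyed by tenant and topic, so a message published by one tenant can only reach that tenant's subscribers, and each subscription may carry a filtering rule. The names `TenantIsolatedBus`, `subscribe`, `publish`, the `keep` filter and the `cpu_load` topic are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Optional, Tuple

# Hypothetical types for this sketch only (not DoubleDecker's API).
Handler = Callable[[str, dict], None]   # called with (topic, payload)
Filter = Callable[[dict], bool]         # True if the payload should be delivered


class TenantIsolatedBus:
    """Toy in-memory publish/subscribe bus illustrating tenant isolation.

    Subscriptions are stored under a (tenant, topic) key, so a message
    published by one tenant can only reach subscribers of that same tenant.
    """

    def __init__(self) -> None:
        self._subs: DefaultDict[Tuple[str, str], List[Tuple[Handler, Filter]]] = defaultdict(list)

    def subscribe(self, tenant: str, topic: str, handler: Handler,
                  keep: Optional[Filter] = None) -> None:
        """Register a handler, optionally with a per-subscription filter rule."""
        self._subs[(tenant, topic)].append((handler, keep or (lambda _payload: True)))

    def publish(self, tenant: str, topic: str, payload: dict) -> int:
        """Deliver payload to matching subscribers; return how many received it."""
        delivered = 0
        # .get() avoids creating empty entries for topics nobody subscribed to.
        for handler, keep in self._subs.get((tenant, topic), []):
            if keep(payload):
                handler(topic, payload)
                delivered += 1
        return delivered


# Usage: two tenants subscribe to the same topic name.
bus = TenantIsolatedBus()
received_a: List[dict] = []
received_b: List[dict] = []
bus.subscribe("tenant-a", "cpu_load", lambda topic, p: received_a.append(p),
              keep=lambda p: p["value"] > 0.8)  # filter: only high-load samples
bus.subscribe("tenant-b", "cpu_load", lambda topic, p: received_b.append(p))

n_high = bus.publish("tenant-a", "cpu_load", {"vnf": "fw1", "value": 0.95})  # reaches tenant-a only
n_low = bus.publish("tenant-a", "cpu_load", {"vnf": "fw1", "value": 0.20})   # dropped by the filter
```

Keying every subscription on the tenant makes cross-tenant delivery impossible by construction, rather than relying on per-message access checks. Transport, authentication and aggregation are deliberately left out of the sketch.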
Ok! So what’s next?
In addition to the tools of the first SP-DevOps Toolkit release, we are developing further tools and functions targeting more of the challenges we identified. Some of these tools are developed by ER Cloud Technologies and will be available only internally within Ericsson; others are developed by ER CT together with UNIFY partners and will be part of a second SP-DevOps Toolkit release in April 2016. Some notable examples are listed below, and are also depicted in the figure above:
- Recursive Query Engine, a support function that interprets a specifically defined query language to efficiently collect and aggregate monitoring information from distributed software-defined infrastructures.
- MEASURE, a generic syntax for defining abstract monitoring requests (intents) and translating them into concrete configurations and placements of monitoring functions.
- EPLE, an efficient packet loss estimator for SDN, realized using OpenFlow.
- Service verification, a set of modules integrated into deployment and orchestration processes, enabling formal verification of service graph definitions and configurations.
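EPLE's exact estimation method is not described here, so as a rough, hedged illustration of the underlying idea, the sketch below shows a generic counter-difference approach to packet loss estimation with OpenFlow: compare how many packets the rules matching the same flow counted at the ingress and at the egress switch during one polling interval. This is not necessarily how EPLE works; `CounterSample` and `estimate_loss` are hypothetical names, and the counter readings are made-up example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """Cumulative packet counters of one flow, read at a single polling instant.

    ingress_pkts: packet count of the rule matching the flow at the ingress switch.
    egress_pkts:  packet count of the rule matching the flow at the egress switch.
    """
    ingress_pkts: int
    egress_pkts: int


def estimate_loss(prev: CounterSample, curr: CounterSample) -> float:
    """Estimate the fraction of the flow's packets lost between two polls.

    Flow counters are cumulative, so the estimate works on per-interval deltas.
    The two switches are never polled at exactly the same time and packets
    may be in flight, so the result is clamped to the range [0, 1].
    Counter wrap-around and rule re-installation are not handled here.
    """
    sent = curr.ingress_pkts - prev.ingress_pkts
    received = curr.egress_pkts - prev.egress_pkts
    if sent <= 0:
        return 0.0  # no traffic entered in this interval: nothing to estimate
    lost = max(sent - received, 0)
    return min(lost / sent, 1.0)


# Usage with illustrative (made-up) counter readings from two polling rounds.
t0 = CounterSample(ingress_pkts=10_000, egress_pkts=9_990)
t1 = CounterSample(ingress_pkts=20_000, egress_pkts=19_890)
loss = estimate_loss(t0, t1)  # 10,000 packets sent, 9,900 received in the interval
```

Polling the per-flow counters of only two switches keeps the monitoring overhead low; the price is that the estimate says nothing about where along the path the packets were lost.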
As you may have noticed, each tool in isolation solves only one or a few specific problems in the vast space of operations and management. Taken together and integrated into automated processes, however, they will form viable technical solutions for service developers and operators, bringing them at least one step closer to actually harvesting the service agility, dynamicity and velocity promised by software-defined infrastructures.
We are currently defining and implementing concrete SP-DevOps processes for verification, observability and troubleshooting tasks, each utilizing combinations of the tools discussed in this blog (see also the architecture figure above). These SP-DevOps processes will be showcased at the end of the UNIFY project in April 2016, so stay tuned to see SP-DevOps getting real!
Wolfgang John and the UNIFY SP-DevOps team at Ericsson Research and across Europe