How to overcome the challenge with probing in cloud native 5G Core
They say that data is the new oil. Gaining access to actionable data is becoming increasingly important for service providers: it is key to managing and troubleshooting the network more efficiently, and it’s also the foundation for creating customer insights for marketing and sales, for automation and for the journey towards zero-touch networks. This is the story of how software (SW) probes were born in Ericsson’s dual-mode 5G Core.
We have provided data for data consumers for a long time
In Ericsson packet core products, we’ve been providing quality data connected to subscriber behavior for a long time. As early as 2008 we started developing event data: metadata that describes the details of a specific mobility, session or payload session at per-subscriber granularity.
We promoted events in the feature Event Based Monitoring (EBM), and we also developed interface captures with the feature Integrated Traffic Capture (ITC). These were static captures (files with the extension .pcap) used as source data for troubleshooting. We even developed a tool, Core Network Operations Manager (CNOM), to steer and monitor those features.
The whole area has been a success for packet core, so other core products have joined over the years to develop event data and join the CNOM family for efficient troubleshooting. So we had all the data available, but the probing business went elsewhere. I think one major reason why service providers contract probe vendors is that those vendors offer a complete end-to-end system with good support for a multi-vendor environment.
Probing is increasingly becoming a burden
As a Strategic Product Manager responsible for the packet core operations and management area, I started to hear stories a year or so ago from some of our pioneering 5G customers that their probe costs were increasing heavily in their existing 4G networks. The increase was mainly footprint related, driven by growing traffic in the networks, but it also came from managing the equipment and handling encrypted interface traffic.
With networks moving into the cloud and introducing resource orchestration and automation, it was unsustainable to integrate tapping and probe equipment that was not adapted to cloud orchestration, with its instantiation, scaling and load sharing of VNFs. With a more dynamic network, it was becoming difficult to auto-detect interfaces and to handle load balancing and scaling of the probes.
A new approach to probing
This spurred an idea within our team: why not investigate whether we could meet the customer need in this area with a different approach? We knew we had all the data available, so why not provide it tailored for easy integration? Service providers could then get rid of interface tapping and probing equipment, but keep the assurance system that they’ve put effort into integrating and training their personnel on. We did a quick study, mainly looking at the technical aspects of combining subscriber data from events with 24/7 streamed raw interface packet data, and at our ability to provide it to any data consumer of the service provider’s choice.
The conclusion was that it was feasible, and we felt that the stars were aligning. There was already a general problem caused by the sharp increase in signaling and payload that came with LTE and smartphones. We also knew that 5G was expected to bring ten times more payload compared with LTE, as well as increased signaling brought by the service based architecture (SBA) concept. In addition, the new 5G architecture, with its strong focus on security enhancements and encrypted interfaces, and cloud native products pose completely new challenges for capturing data without breaching security.
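To make the idea of combining event data with streamed packet data more concrete, here is a minimal sketch in Python. It is not how the Ericsson study or product does it: the record types, field names and the use of a shared `session_id` as the join key are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: metadata about one subscriber session."""
    session_id: str
    subscriber: str
    kind: str


@dataclass(frozen=True)
class Packet:
    """Hypothetical raw packet captured from an interface."""
    session_id: str
    interface: str
    size: int


def correlate(events: Iterable[Event], packets: Iterable[Packet]) -> Dict[str, dict]:
    """Attach packet counters to the subscriber context given by events."""
    sessions: Dict[str, dict] = {}
    for event in events:
        entry = sessions.setdefault(
            event.session_id,
            {"subscriber": event.subscriber, "events": [], "packets": 0, "bytes": 0},
        )
        entry["events"].append(event.kind)
    for packet in packets:
        entry = sessions.get(packet.session_id)
        if entry is None:
            continue  # no event context for this packet, so it is skipped here
        entry["packets"] += 1
        entry["bytes"] += packet.size
    return sessions
```

A data consumer could then look up a session and see both what happened (the events) and how much traffic it carried, without tapping the interface itself.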
Testing our approach
A move into the probing business would be something new. Based on our new ideas on how to resolve current and upcoming challenges, we approached some of our customers to gauge their interest in discussing them. The reaction was positive, and we started a Proof of Concept (POC) together with a large service provider in North America and their main provider of probes. We simply modified our MME node to provide a constant stream of raw data from some interfaces and pushed it towards the third-party probe vendor’s correlator in the network. The streamed data was captured successfully and fed into real-time monitoring. The customer was very happy with the result, and the cooperation with the third-party probe vendor was a good exercise for evaluating what our solution needed. We learnt that probe vendors recognize the challenge of tapping 5G networks and are eager to cooperate with core vendors.
Verifying our approach
We now felt we had enough proof points to start a deeper study into all the relevant areas, ranging from business cases to solution technology, security and product considerations. We could see a business case, and from a product perspective it was a good time to review data collection, since the whole Ericsson Telecom Core area is in transition from virtual (VM based) products to container based products. Kubernetes itself changes how product internal networking is designed, which affects interface tapping. As a result, we had plenty of study items. The whole probing area later became a real challenge that needed to be solved from an industry perspective.
During the study we kept in constant contact with the potential consumers of our SW probe data. This was enabled by cooperation with some selected customers and their current assurance system partners. These included Ericsson’s own system, Telecom Analytics, but also several of the larger players in this area, the third-party (3P) probe vendors.
We learnt how the different technologies worked in a network and what the weak points were in the current solutions. We soon realized that existing probe vendors faced some challenges. They openly proposed that their tapping agents could be located as sidecars to business logic pods, that is, as companions that run alongside an existing business logic pod and share its memory and processing capabilities in a Kubernetes cluster. The main reason was that they had realized that SBA signaling between network functions (for example, AMF towards SMF) is encrypted with transport layer security (TLS), meaning it is not possible to tap the data and extract readable signaling messages without access to the decrypted messages in the pod itself.
Everyone we talked to came with the same solution: “If we can be a sidecar agent to your pods, we will take care of everything as before.” The security improvements are fundamental in 5G, so we started to look into what it would mean to have a third-party agent running in our business logic cluster, and what it would mean for lifecycle management, robustness and security. We simply couldn’t see a workable solution that allowed that approach. What if the agent crashed, or consumed too much memory? It would impact the stability of the core node. And if we agreed to integrate agents, we would need to integrate and test over 20 agents in our systems. This was our first important finding. We decided, and made clear to everyone, that we wanted to cooperate around the data consumer interface, allowing third parties to be data consumers but not agents within our products.
Developing the solution
From this point we started to define the feature SW probe and created a simple architecture with three blocks: first, the data source functions located in the business logic pods; second, a central probe controller located in each cloud native product; and third, a processing function that hosts brokering features and the interface towards the consumers. We also created new names for the data sources to switch focus from pure troubleshooting to fulfilling the need for a complete assurance and monitoring system.
- Evolution of the EBM feature, which we now call Event reporting
- Evolution of ITC, which we now call vTap
- Data collector/aggregator function
- Configuration handler
- Filtering, masking, encryption, data correlation
- Interface (multiple data streams)
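The three blocks can be sketched roughly as follows. This is only an illustration of the division of responsibilities described above, not Ericsson’s implementation: the class names, the per-stream interface filter, the hash-based masking and the interface labels used in the usage note are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field, replace
from typing import List, Set


@dataclass(frozen=True)
class Record:
    """Hypothetical record emitted by a data source in a business logic pod."""
    source: str      # "event" (Event reporting) or "vtap" (vTap)
    interface: str   # interface label, illustrative only
    subscriber: str  # subscriber identifier
    payload: str


def mask_subscriber(record: Record) -> Record:
    """Masking step: replace the subscriber identifier with a short hash."""
    digest = hashlib.sha256(record.subscriber.encode()).hexdigest()[:12]
    return replace(record, subscriber=digest)


@dataclass
class ConsumerStream:
    """One outgoing data stream towards a data consumer."""
    name: str
    interfaces: Set[str]
    mask: bool = True
    delivered: List[Record] = field(default_factory=list)


class ProbeController:
    """Central controller: holds the configuration of consumer streams."""

    def __init__(self) -> None:
        self.streams: List[ConsumerStream] = []

    def configure_stream(self, name: str, interfaces: Set[str], mask: bool = True) -> ConsumerStream:
        stream = ConsumerStream(name, set(interfaces), mask)
        self.streams.append(stream)
        return stream


class Broker:
    """Processing function: filters, masks and fans out records."""

    def __init__(self, controller: ProbeController) -> None:
        self.controller = controller

    def publish(self, record: Record) -> None:
        for stream in self.controller.streams:
            if record.interface not in stream.interfaces:
                continue  # filtering: this consumer did not ask for the interface
            out = mask_subscriber(record) if stream.mask else record
            stream.delivered.append(out)
```

In this sketch, a stream configured with `controller.configure_stream("assurance", {"N11"})` would receive only records from that interface, with subscriber identifiers masked, while a second stream with `mask=False` could receive the same records unmasked. The point of the design is that consumers only ever see what the broker hands them, and no third-party code runs inside the business logic pods.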
Efficient tapping of interface data is very challenging in a 5G network running in a cloud native deployment. We therefore realized that the network function vendor has a large responsibility for providing data to the assurance and monitoring systems, since no one else can do this in a secure and cost-efficient way when aspects like footprint, elasticity and bandwidth savings are considered. The solution must meet the market need from the start and support smooth integration.
Bringing the solution to market
That is a little of the story of how we came up with the idea of Ericsson SW probes, which we are now launching as part of value packages for the different products in Ericsson dual-mode 5G Core. We believe this is the way forward to ensure secure and efficient access to actionable data in a cloud native 5G/4G Core, with multiple benefits. To mention just one, our guidance for Total Cost of Ownership (TCO) improvements is reductions of up to 60 percent in CAPEX and up to 90 percent in OPEX, compared with current solutions.
Interested in learning more about Ericsson SW probes? Download our “Securing the 5G experience with Software probes” paper.