MAY 22, 2025

Authors

H. Bergström, J. Sachs, L. Boström, R. Wang, S. Sorrentino, J. Vikberg

Dependable networks: from best-effort to guaranteed performance

Growing reliance on mobile networks in the society-, mission- and business-critical segments, each of which includes a range of different use cases, is generating a variety of new requirements on connectivity, particularly with respect to time criticality and resilience to failures.

CTO Erik Ekudden on network dependability:

The CTO’s view

After several decades of global mobile network deployments, most of our planet is now covered by mobile network services both outdoors and indoors. At the same time, the capabilities of mobile networks have seen an unparalleled development; over the last 30 years the achievable data rate of mobile devices has increased by a factor of 1 million. Approximately every 10 years a new generation of mobile network technology has been developed and standardized, sparking a fresh cycle of innovation.

Over time, the types of mobile-connected devices have also diversified significantly. Mobile devices today range from smartphones to connected sensors, vehicles, remote-operated machines and devices carried by first responders. Users expect them to work anywhere and anytime, putting increasingly stringent demands on connectivity.

A wide range of applications run over mobile networks, including immersive multi-modal interactions, situational assessment of critical situations, and the monitoring and control of critical infrastructure. The need for dependable connectivity services will only grow as society and enterprises adopt cyber-physical interactions [1] and critical applications ranging from public safety and rail to defense and utilities increasingly rely on mobile network support.

Dependable networks will also support the delivery of new monetizable services such as dependable differentiated connectivity.

Definition of a dependable network

A dependable network has the ability to provide service-delivery guarantees ensuring that it will perform as and when required [2]. More specifically, according to DETERMINISTIC6G, a dependable network shall quantitatively ascertain that it will deliver the required service performance for the connectivity services according to the key performance indicator(s) that were agreed [3].

Dependable network characteristics

Dependable networks provide quantifiably dependable connectivity services for critical applications. Critical applications are typically distributed and networked applications that require a minimum connectivity service performance level for the applications to fulfill their tasks. A critical application is characterized by having one or more critical key performance indicators (KPIs) – that is, performance metrics for which best-effort service delivery by the network is insufficient. If the minimum connectivity service performance (critical KPI) is not met, the application may perceive this as a network failure and may fail in its operation, which could lead to the user experiencing potentially severe financial or reputational loss.

Many critical services are also time-critical, which means that the consistent and timely delivery of a message from the sender to the receiver within a delay bound is required in order for the application to perform. Control applications where delayed control commands or sensor reports lead to control-loop instability or to the control application failing to achieve its critical control task are good examples of time-critical applications. Such failures can, for example, cause a remote-controlled machine to not perform as desired or cause a safety function to trigger an emergency stop of a control system.

A wide variety of organizations, including police forces, media organizations and corporate professionals, also require dependable connectivity services to operate effectively. For example, field reporters need guaranteed uplink (UL) capacity and quality for video streams. With the help of a network application programming interface (API), they can request the required levels of data rate, latency, availability and reliability from the network dynamically through a network slice.

In a network that provides best-effort service delivery, it is difficult for users to estimate the extent to which they can rely on the network to deliver the required performance for a critical application to work. This motivates certain types of subscribers such as commercial enterprises and governmental organizations to establish Service Level Agreements (SLAs) with communication service providers (CSPs) to guarantee satisfactory application performance by ensuring that minimum performance levels are achieved.

Service Level Agreements

An SLA is a contractual agreement between two parties: a service provider and a service consumer. The purpose of an SLA is to define the conditions of the service performance that a service provider is obligated to meet in relation to one or more agreed KPIs. The specific, measurable targets in the SLA are called Service Level Objectives (SLOs); they set out what the customer can expect from the service and help the service provider set internal goals for service delivery. The quantifiable metrics used to measure the SLOs are called Service Level Indicators (SLIs), which can be different well-defined KPIs of the service performance.

The table in Figure 1 explains the relationship between SLAs, SLOs and SLIs.

	SLA	SLO	SLI
Definition	A formal agreement between a service provider and a customer	The specific, measurable target value for a service level	The quantifiable metrics used to measure the SLO
Accountability	The service provider and customer are both accountable for meeting the SLA	The service provider is responsible for meeting the SLOs	Technical teams and monitoring systems track and analyze SLIs
Example	Connectivity service availability commitments with specified data rates and latency	Achieving 99.9% connectivity service availability, at the agreed data rate and latency	Data rate, latency, availability metrics

Figure 1: Definition of SLAs, SLOs and SLIs
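The following Python sketch makes the relationship in Figure 1 concrete. It uses the example SLO from the table: 99.9% availability at an agreed data rate and latency. It also assumes a simple interval-based measurement model, in which an interval counts as "available" only if both the measured data rate and the measured latency meet the agreed levels. The names `IntervalSample`, `Slo`, `availability_sli` and `slo_met`, as well as the numeric thresholds, are hypothetical. They do not come from any product or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalSample:
    """Measured SLIs for one measurement interval of a connectivity service."""
    data_rate_mbps: float
    latency_ms: float


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO: an availability target at an agreed data rate and latency."""
    availability_target: float  # e.g. 0.999 for 99.9%
    min_data_rate_mbps: float
    max_latency_ms: float

    def interval_ok(self, s: IntervalSample) -> bool:
        # An interval counts as available only if BOTH agreed QoS levels are met.
        return (s.data_rate_mbps >= self.min_data_rate_mbps
                and s.latency_ms <= self.max_latency_ms)


def availability_sli(samples: list[IntervalSample], slo: Slo) -> float:
    """Availability SLI: fraction of intervals in which the agreed QoS was delivered."""
    if not samples:
        raise ValueError("no measurement intervals")
    return sum(slo.interval_ok(s) for s in samples) / len(samples)


def slo_met(samples: list[IntervalSample], slo: Slo) -> bool:
    """The SLO is met if the measured availability reaches the target."""
    return availability_sli(samples, slo) >= slo.availability_target


# Usage: 10,000 one-minute intervals, 5 of which exceeded the agreed latency.
slo = Slo(availability_target=0.999, min_data_rate_mbps=10.0, max_latency_ms=20.0)
samples = [IntervalSample(25.0, 12.0)] * 9995 + [IntervalSample(25.0, 45.0)] * 5
print(availability_sli(samples, slo))  # 0.9995
print(slo_met(samples, slo))           # True
```

In this sketch the SLI is a measured quantity and the SLO is a target applied to it. Keeping the two separate lets the same measurements be checked against different customers' SLOs.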

Availability and reliability

For a critical application, a service request could include a definition of the traffic characteristics of the application, the required quality of service (QoS) performance levels (such as latency and data rate), as well as the required availability and reliability levels. Availability and reliability go hand in hand: availability reflects the network’s ability to meet the agreed QoS performance levels under given conditions, while reliability reflects how consistently the network succeeds in meeting those requirements.

From a dependability perspective, reliability is just as important as availability, as experiencing frequent shorter network failures could potentially be worse for a critical application than experiencing fewer and longer outages, even though the experienced cumulative service downtime (as reflected by the availability level) is the same. This is especially relevant for a critical application that includes safety precautions such as triggering an emergency stop in case of service downtime. For example, even if the QoS performance level is quickly reestablished, a device controlled by a critical application may remain stopped for longer because of the extra time required to recover from the emergency stop.
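A small worked example illustrates the point above. Over a 30-day period it compares one 60 s outage with sixty 1 s outages, which give the same cumulative network downtime. It assumes, hypothetically, that the application triggers an emergency stop on every outage and needs a fixed 30 s to recover. Mean time between failures is used here as one simple reliability proxy. All numbers are illustrative assumptions.

```python
PERIOD_S = 30 * 24 * 3600  # observation period: 30 days, in seconds
RECOVERY_S = 30.0          # assumed restart time after each emergency stop


def network_availability(outages_s: list[float]) -> float:
    """Availability from the network's view: 1 - cumulative downtime / period."""
    return 1 - sum(outages_s) / PERIOD_S


def application_availability(outages_s: list[float], recovery_s: float = RECOVERY_S) -> float:
    """Availability seen by an application that triggers an emergency stop on
    every outage and needs recovery_s extra seconds to resume afterwards."""
    return 1 - sum(o + recovery_s for o in outages_s) / PERIOD_S


def mean_time_between_failures_h(outages_s: list[float]) -> float:
    """A simple reliability proxy: observation period divided by failure count."""
    return PERIOD_S / len(outages_s) / 3600


few_long = [60.0]        # one 60 s outage
many_short = [1.0] * 60  # sixty 1 s outages, same 60 s cumulative downtime

for name, outages in (("few_long", few_long), ("many_short", many_short)):
    print(name,
          round(network_availability(outages), 6),
          round(application_availability(outages), 6),
          mean_time_between_failures_h(outages))
# few_long 0.999977 0.999965 720.0
# many_short 0.999977 0.999282 12.0
```

The network availability is identical in both cases. Under the stated recovery assumption, however, the frequent short outages cost the application roughly 20 times more effective downtime.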

Dependable differentiated connectivity

A dependable network provides a dependable connectivity service, which means that it must ensure that traffic is handled according to the agreed SLA so that the required performance is achieved, including during periods of high load and when other users share the network. When the demands on availability and reliability are high, the need for network robustness [4] increases, as the network must also be able to swiftly mitigate or prevent failures stemming from network components, power supply and so on, which can cause connectivity service interruption. This approach ensures that critical users can depend on the connectivity service even in more challenging times.

Differentiated connectivity [5] is a framework proposal that enables mass-market and enterprise applications to make better use of the capabilities of a 5G standalone network. Differentiated connectivity leverages a dependable connectivity service for the traffic associated with the critical applications with specific connectivity needs, as depicted in Figure 2. Users and applications have the opportunity to request a certain performance (data rate and/or latency) from the network. Such a service request may be restricted to a certain time period and/or a certain geographical area. The network has the ability to provide a targeted/agreed performance level to specific individual users and applications based on an SLA for a requested service (using a network API, for example) or through a subscription with their CSP. The wide range of possibilities provided by differentiated connectivity will spur application creators to develop applications with specific performance requirements.
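The next sketch shows one possible shape for such a service request: a data rate and/or latency, optionally restricted to a time window and a geographical area. The `ServiceRequest` and `Area` types, the rectangular lat/lon area model and all field names are hypothetical. This is not a standardized network API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Area:
    """Hypothetical rectangular service area (latitude/longitude bounding box)."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.lat_min <= lat <= self.lat_max and self.lon_min <= lon <= self.lon_max


@dataclass(frozen=True)
class ServiceRequest:
    """Hypothetical differentiated-connectivity request (not a standardized API)."""
    ue_id: str
    min_data_rate_mbps: Optional[float] = None  # data rate and/or latency may be requested
    max_latency_ms: Optional[float] = None
    start: Optional[datetime] = None            # None means no time restriction
    end: Optional[datetime] = None
    area: Optional[Area] = None                 # None means no geographic restriction

    def __post_init__(self):
        if self.min_data_rate_mbps is None and self.max_latency_ms is None:
            raise ValueError("request must specify a data rate and/or a latency")

    def applies(self, when: datetime, lat: float, lon: float) -> bool:
        """True if the requested performance applies at this time and place."""
        if self.start is not None and when < self.start:
            return False
        if self.end is not None and when >= self.end:
            return False
        if self.area is not None and not self.area.contains(lat, lon):
            return False
        return True


# Usage: a field reporter books 20 Mbps for a two-hour event in a small area.
start = datetime(2025, 5, 22, 18, 0, tzinfo=timezone.utc)
req = ServiceRequest(
    ue_id="reporter-cam-1",
    min_data_rate_mbps=20.0,
    start=start,
    end=start + timedelta(hours=2),
    area=Area(59.36, 59.38, 18.00, 18.02),
)
print(req.applies(start + timedelta(hours=1), 59.37, 18.01))  # True
print(req.applies(start + timedelta(hours=3), 59.37, 18.01))  # False: outside time window
print(req.applies(start + timedelta(hours=1), 59.40, 18.01))  # False: outside area
```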

Figure 2: Dependable differentiated connectivity

Network support for time-critical communication (TCC) [6] also provides important capabilities to enable service differentiation. The right combination of TCC tools, resilient deployment, resiliency tools and observability is expected to play an important role in enabling CSPs to offer dependable connectivity services defined by SLAs associated with network APIs or based on premium subscriptions.

Achieving network dependability – a comprehensive approach

Dependability is a system property that cannot be achieved by individual features, but rather requires a composite toolbox. Figure 3 illustrates the important functional areas in a mobile network, with examples of dependability-related functionalities, as well as the importance of the resilient network planning and deployment that are necessary to achieve dependability.

Figure 3: Functional areas in a mobile network involved in delivering E2E application critical connectivity

Observability

When providing a dependable network, it is not only important to guarantee a certain service, but also to be able to provide the means to validate that the service is delivered as promised. While observability can be used for multiple purposes, in this context we will focus on using it to monitor SLA fulfillment for the more critical connectivity services. Network observability can facilitate intent-based management and automation of the connectivity services (based on artificial intelligence and machine learning, for example, and used for digital twins), verification of user obligations as defined in the SLA (such as device capabilities and geographical limitations), troubleshooting and performance predictions in the CSP’s network.

With respect to SLA fulfillment, one of our key findings is that the current industry focus on per-node in-service performance (referred to as SONE – system outage network element – in TL 9000 [7] and marked as such in Figure 3) needs to be expanded with new observability solutions that reflect the performance of a specific connectivity service for a specific user equipment (UE) and QoS flow. The scope of this observability is the same as the scope of the CSP-provided connectivity service; that is, from the UE modem to the user plane (UP) function in the CSP network for the critical applications. The critical KPIs to be observed differ between connectivity services and depend on what is included in the SLAs between the CSP and its customers. They may include downlink (DL)/UL data rate, DL/UL latency, connectivity service availability and reliability, and for TCC use cases also bounded latency and delay variation. The KPIs must also be observed when failures occur within the network.
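The following Python sketch illustrates observability at the UE and QoS flow level rather than per node. It assumes hypothetical per-packet delay records tagged with a UE identifier and a QoS flow identifier. From these it computes a bounded-latency KPI (the share of packets delivered within a delay bound, with lost packets counted as late) and a delay-variation KPI. It then lists the flows that fall short of an SLA target. The record format and the choice of "maximum minus minimum delay" as the delay-variation definition are simplifying assumptions; other definitions, such as percentile-based ones, are also common.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional

# Hypothetical per-packet record: (UE id, QoS flow id, one-way delay in ms or None if lost)
PacketRecord = tuple[str, int, Optional[float]]


def flow_kpis(records: list[PacketRecord], delay_bound_ms: float) -> dict:
    """Compute illustrative TCC KPIs per (UE, QoS flow)."""
    per_flow = defaultdict(list)
    for ue_id, flow_id, delay in records:
        per_flow[(ue_id, flow_id)].append(delay)

    kpis = {}
    for key, delays in per_flow.items():
        delivered = [d for d in delays if d is not None]
        on_time = sum(1 for d in delivered if d <= delay_bound_ms)
        kpis[key] = {
            "packets": len(delays),
            # Lost packets count as late, since the bound was not met for them.
            "on_time_ratio": on_time / len(delays),
            "mean_delay_ms": mean(delivered) if delivered else None,
            # Simple delay-variation definition: spread between fastest and slowest packet.
            "delay_variation_ms": max(delivered) - min(delivered) if delivered else None,
        }
    return kpis


def violating_flows(kpis: dict, min_on_time_ratio: float) -> list[tuple[str, int]]:
    """Flows whose bounded-latency KPI falls short of the SLA target."""
    return sorted(k for k, v in kpis.items() if v["on_time_ratio"] < min_on_time_ratio)


records = [
    ("ue-1", 5, 4.0), ("ue-1", 5, 5.0), ("ue-1", 5, 6.0), ("ue-1", 5, None),
    ("ue-2", 1, 3.0), ("ue-2", 1, 12.0),
]
kpis = flow_kpis(records, delay_bound_ms=10.0)
print(kpis[("ue-1", 5)])
print(violating_flows(kpis, min_on_time_ratio=0.6))  # [('ue-2', 1)]
```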

Because many critical KPIs are UP- and QoS-related, the UP for the different connectivity services in the radio access network (RAN) and the core network (CN), together with the related traffic-handling network functions (NFs), is central to the observability of SLA fulfillment. It is also important, however, to observe the control plane (CP) in the network, as it may impact UP availability and reliability, as well as other UP KPIs. It is beneficial for UEs to support network observability, particularly for UL performance, radio-coverage-related measurements and UE positioning. In addition to the RAN and CN, observation of the transport network and cloud infrastructure/platform is also needed, mainly in the context of their impact on the connectivity services.

If a customer complains that a specific connectivity service provided by a CSP is not performing as promised in an SLA (based on application-level measurements, for example), the CSP can troubleshoot using observability at UE and QoS flow level. The results will show whether or not the network delivered dependable connectivity for the application according to the agreed SLA. Certain types of observability can be achieved by means of proprietary algorithms, while the scope, complexity and accuracy of some measurements can be improved by means of standard enablers. Thus, 6G standardization must include observability enhancements to support dependable connectivity services.

Observability on UE and QoS flow level can also be used for intent-based management and automation of connectivity services. In this case, the observability applies to the fulfillment of intents that describe the connectivity services and the relations between them, and the determination of whether it is necessary to dynamically change the related network configuration.

In addition to determining the level of performance delivered by the network, observability for performance prediction is also desirable. Performance prediction provides an estimate that is useful when deciding if new requests for connectivity or SLAs can be accepted, as well as the levels of availability and reliability that can be expected for a certain performance level in a certain geographical area. Predicted performance further helps to ensure that ongoing QoS flows achieve their performance targets via appropriate network configuration and resource management.
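The sketch below shows how a prediction could feed the decision to accept a new request. It uses a deliberately naive, hypothetical predictor: the share of comparable past intervals in which spare capacity would have covered the requested data rate. The request is admitted only if that predicted availability meets the required level. A real predictor would also need to account for the new request's own load, existing SLAs and failure scenarios. The function names and the history data are illustrative assumptions.

```python
def predicted_availability(spare_capacity_mbps: list[float], requested_mbps: float) -> float:
    """Empirical estimate: share of comparable past intervals (same area and
    time of day, say) whose spare capacity would have covered the request."""
    if not spare_capacity_mbps:
        raise ValueError("no history for this area")
    covered = sum(1 for c in spare_capacity_mbps if c >= requested_mbps)
    return covered / len(spare_capacity_mbps)


def admit(spare_capacity_mbps: list[float], requested_mbps: float,
          required_availability: float) -> tuple[bool, float]:
    """Accept the request only if the predicted availability meets the requirement."""
    p = predicted_availability(spare_capacity_mbps, requested_mbps)
    return p >= required_availability, p


history = [50.0] * 995 + [15.0] * 5  # hypothetical spare capacity per interval, in Mbps
print(admit(history, requested_mbps=20.0, required_availability=0.99))   # (True, 0.995)
print(admit(history, requested_mbps=20.0, required_availability=0.999))  # (False, 0.995)
print(admit(history, requested_mbps=10.0, required_availability=0.999))  # (True, 1.0)
```

The same prediction also answers the inverse question mentioned above: which availability level can be offered for a given performance level in a given area.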

Time-critical communication tools

The network may apply TCC tools to mitigate different sources of delay and interruption to ensure timely delivery of packets in a healthy 5G network [6]. These tools will continue to improve in 5G Advanced and 6G. Depending on the targeted performance and the required availability and reliability, TCC tools may also need to be complemented by selected resiliency tools to ensure that the connectivity service maintains sufficient performance during network degradations or failures.

Network planning and resilient deployment

Most, if not all, resiliency tools require planning to ensure proper resource provisioning with appropriate margins for redundancy [4]. Depending on the network characteristics and target dependability requirements, redundancy may be required in radio resources, in RAN infrastructure, in the core infrastructure, in the transport network and/or in the cloud infrastructure on which those functions are executed, as shown in Figure 4. Redundancy and resilient resource provisioning are typically also required in accessory functions such as power supplies.

Figure content: RAN (coverage and radio resource redundancy, resilience); core (session resilience and network redundancy); transport (flexible mesh topology); compute (cloud-native deployment, redundant infrastructure); plus deployment planning (network digital twins, SLA planning tools).

Figure 4: The most important aspects of redundancy for resilient deployment
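Standard series/parallel availability arithmetic gives a first-order sense of why redundancy matters in every segment. A path through RAN, core, transport and compute works only if every segment works (series). A segment with n redundant instances works if any one of them works (parallel). The sketch assumes independent failures, and the per-instance availabilities of 0.999 and the shared power supply's 0.9995 are hypothetical. The last line shows that a shared component, such as a single power supply, sits in series with everything and caps the benefit of duplication. This is why redundancy also matters for accessory functions.

```python
from math import prod


def parallel(a: float, n: int) -> float:
    """Availability of n redundant instances where any single one suffices.
    Assumes independent failures (no shared power, site or software fault)."""
    return 1 - (1 - a) ** n


def series(*availabilities: float) -> float:
    """Availability of a chain in which every segment must be working."""
    return prod(availabilities)


# Hypothetical per-instance availability of each segment on the path.
segments = {"RAN": 0.999, "core": 0.999, "transport": 0.999, "compute": 0.999}

no_redundancy = series(*segments.values())
duplicated = series(*(parallel(a, 2) for a in segments.values()))
# A single shared power supply (assumed 0.9995) is a common-mode failure:
# it sits in series with everything and caps the gain from duplication.
with_shared_power = series(duplicated, 0.9995)

print(f"{no_redundancy:.6f} {duplicated:.6f} {with_shared_power:.6f}")
# 0.996006 0.999996 0.999496
```

Network digital twins and dependability-aware planning tools can be seen as far richer versions of this calculation. They account for correlated failures, capacity margins and recovery times that this sketch ignores.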

The best deployment redundancy solution for a given scenario depends on local constraints and on performance, availability and reliability requirements. For example, full network redundancy may be suitable for a confined deployment, while wide-area deployments may need to utilize other tools that make greater use of resource mobility and automation. Different solutions for compute redundancy can be made available depending on the scenario, such as providing a redundant NF at another compute location by utilizing redundant data centers. Planning tool capabilities will need to evolve to support dependability-aware dimensioning and planning of network infrastructure. This could also be achieved by using network digital twins that enable the evaluation of network capabilities with high fidelity before they are actually deployed.

Resiliency tools in the radio-access, core and transport networks

High levels of availability and reliability can only be achieved with deployment resilience in the RAN, the CN and the transport network. Resiliency tools need to be applied across relevant segments of the CSP network related to the dependable connectivity service. Different network entities will require different solutions. In certain cases, an NF can be duplicated in an active-active fashion, delivering high availability while balancing it against cost, energy and resource efficiency. In other cases, it is possible to deliver a finer and more efficient level of redundancy (by duplicating only specific instances in a cloud-native application, for example) or to utilize resource mobility and automated recovery to avoid service unavailability.

Radio resource redundancy is a prerequisite for many resilience solutions that may prevent, mitigate and/or enable recovery from network failures. A deployment may provide radio redundancy by means of different frequencies, such as by using an overlapping combination of a low-band frequency layer deployed to provide coverage with a denser mid-band frequency layer deployed to provide capacity in localized areas. The UE may be able to use different frequency layers alternatively, triggering mobility to change from one carrier to another when a failure is detected or predicted, for example.

From a RAN resilience perspective, it is desirable for a UE to be within coverage of multiple radio units that are geographically separated and served by different basebands. This improves protection against radio unit and baseband restarts, as well as against RAN site failures. This is important regardless of whether the frequency layers are used alternatively or simultaneously.
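The following sketch shows the failure-domain idea in code. Given the cells a UE currently detects, it keeps only those that would survive a failure of the serving cell's baseband or site: they must be served by a different baseband and hosted at a different site. It then ranks them by signal strength. The `Cell` fields, the RSRP threshold of -110 dBm and the selection rule are hypothetical simplifications, not a described Ericsson or 3GPP algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    cell_id: str
    layer: str       # e.g. "low-band" coverage layer or "mid-band" capacity layer
    baseband: str    # baseband unit serving the cell
    site: str        # RAN site hosting the radio unit
    rsrp_dbm: float  # signal strength measured by the UE


def independent_fallbacks(serving: Cell, detected: list[Cell],
                          min_rsrp_dbm: float = -110.0) -> list[Cell]:
    """Cells still usable if the serving cell's baseband or site fails, strongest first."""
    candidates = [
        c for c in detected
        if c.cell_id != serving.cell_id
        and c.baseband != serving.baseband  # different baseband: survives baseband restart
        and c.site != serving.site          # different site: survives RAN site failure
        and c.rsrp_dbm >= min_rsrp_dbm      # usable coverage
    ]
    return sorted(candidates, key=lambda c: c.rsrp_dbm, reverse=True)


serving = Cell("A", "mid-band", "bb1", "site-x", -85.0)
detected = [
    serving,
    Cell("B", "low-band", "bb1", "site-x", -80.0),   # same baseband and site: excluded
    Cell("C", "low-band", "bb2", "site-y", -95.0),
    Cell("D", "mid-band", "bb3", "site-z", -115.0),  # too weak: excluded
    Cell("E", "mid-band", "bb3", "site-z", -100.0),
]
print([c.cell_id for c in independent_fallbacks(serving, detected)])  # ['C', 'E']
```

Cell B is the strongest cell, but it gives no protection against the failures discussed above because it shares both baseband and site with the serving cell.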

Resilient mobility solutions will enable swift transition between functioning resources, including times when the network is operating in a degraded mode and some nodes/components are experiencing issues. As a result, critical users will be able to avoid the long interruption times that can otherwise be caused by the resource changes required to recover from a service-impacting event.

The 3GPP (3rd Generation Partnership Project) 5G standard provides mechanisms such as conditional handover and conditional lower-layer triggered mobility that can be useful in mitigating certain failures. 6G standardization is expected to include both network-triggered and UE-triggered solutions in the first release. Resilient mobility triggers should encompass relevant network failure cases and service degradations as well as failure predictions. This, in combination with a service interruption similar to a normal handover, will lead to a more predictable connectivity service availability than having to wait for radio link failure and cell reselection procedures to be triggered after an issue has occurred.
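A resilient mobility trigger can be sketched as a decision function that covers the three cases named above: a detected failure, a predicted failure and service degradation. It is loosely modeled on the conditional-handover idea, in which the target is prepared in advance and the handover is executed when a condition holds. The specific conditions, the SINR-based degradation check, and the thresholds (failure probability 0.2, 0 dB, 3 dB offset) are illustrative assumptions. They are not 3GPP-defined events or parameters.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trigger(Enum):
    FAILURE_DETECTED = "failure detected"
    FAILURE_PREDICTED = "failure predicted"
    DEGRADED = "service degraded"


@dataclass(frozen=True)
class LinkState:
    failure_detected: bool      # e.g. alarm for the serving baseband or radio unit
    failure_probability: float  # output of some failure predictor, 0..1
    serving_sinr_db: float
    target_sinr_db: float       # quality of the prepared (conditional) target cell


def mobility_trigger(s: LinkState, p_threshold: float = 0.2,
                     degraded_sinr_db: float = 0.0,
                     offset_db: float = 3.0) -> Optional[Trigger]:
    """Decide whether to execute an already prepared handover, and why.
    Returns None to stay on the serving cell."""
    if s.failure_detected:
        return Trigger.FAILURE_DETECTED
    if s.failure_probability >= p_threshold:
        # Move proactively instead of waiting for radio link failure.
        return Trigger.FAILURE_PREDICTED
    if s.serving_sinr_db < degraded_sinr_db and s.target_sinr_db >= s.serving_sinr_db + offset_db:
        return Trigger.DEGRADED
    return None


print(mobility_trigger(LinkState(False, 0.5, 12.0, 8.0)))  # Trigger.FAILURE_PREDICTED
print(mobility_trigger(LinkState(False, 0.0, 12.0, 8.0)))  # None
```

The point of the sketch is the ordering. Failure and prediction triggers fire even when radio conditions look good, which is what lets the UE avoid waiting for radio link failure detection.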

For baseband resilience, the key capabilities are implementation-specific, based on factors such as node internal resilience and cloud-native design principles (shown in Figure 4), as well as early detection and prediction of failures to trigger resilient mobility for the critical users.

In the CN, NFs are normally deployed in geographical or local redundancy pools with active-active or active-standby mode to provide network redundancy. 3GPP has defined an NF set function for the CP of core NFs to provide session resilience in case of an NF-level outage. The CP NFs that support the NF set function can be deployed across geographically separated data centers. If one NF instance in the NF set fails, the active sessions can be taken over by alternative NF instances in the same NF set, and the connectivity service can continue. There is no such NF set concept for 3GPP UP functions. UP session resilience can, however, be realized by appropriate implementation according to the active-standby model; that is, the IP sessions can be taken over by the standby instance, which becomes the new active node when UP failover happens.
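The two takeover models described above can be contrasted in a toy model. In the CP NF set, any healthy member can take over a failed member's sessions; here they are redistributed to the least-loaded members, which is one possible policy. In the UP active-standby pair, the standby simply becomes the new active node and keeps the replicated sessions. Both classes, the instance names and the assumption that session context survives outside the failed instance are hypothetical simplifications, not 3GPP procedures or product behavior.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NfInstance:
    name: str
    healthy: bool = True
    sessions: set[str] = field(default_factory=set)


class NfSet:
    """Hypothetical CP NF set: any healthy member can take over any session,
    assuming session context is stored outside the failed instance."""

    def __init__(self, names: list[str]):
        self.instances = {n: NfInstance(n) for n in names}

    def add_session(self, session_id: str) -> str:
        healthy = [i for i in self.instances.values() if i.healthy]
        if not healthy:
            raise RuntimeError("no healthy NF instance left in the set")
        target = min(healthy, key=lambda i: len(i.sessions))  # least-loaded member
        target.sessions.add(session_id)
        return target.name

    def fail(self, name: str) -> None:
        """Mark an instance as failed and hand its sessions to the rest of the set."""
        failed = self.instances[name]
        failed.healthy = False
        orphaned, failed.sessions = failed.sessions, set()
        for session_id in sorted(orphaned):
            self.add_session(session_id)


@dataclass
class UpActiveStandby:
    """Hypothetical UP pair whose session state is replicated to the standby."""
    active: str
    standby: Optional[str]
    sessions: set[str] = field(default_factory=set)

    def failover(self) -> str:
        if self.standby is None:
            raise RuntimeError("no standby available")
        # The standby becomes the new active node; replicated sessions are kept.
        self.active, self.standby = self.standby, None
        return self.active


nf_set = NfSet(["cp-nf-1", "cp-nf-2", "cp-nf-3"])
for i in range(1, 7):
    nf_set.add_session(f"s{i}")
nf_set.fail("cp-nf-1")
print({n: sorted(i.sessions) for n, i in nf_set.instances.items()})
# {'cp-nf-1': [], 'cp-nf-2': ['s1', 's2', 's5'], 'cp-nf-3': ['s3', 's4', 's6']}

up = UpActiveStandby("upf-a", "upf-b", {"s1", "s2"})
print(up.failover(), sorted(up.sessions))  # upf-b ['s1', 's2']
```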

Besides network-level redundancy, NFs designed according to cloud-native principles have better internal resilience for software failure or infrastructure disturbance (see Figure 4), thus improving overall network robustness. Resiliency tools are also needed in the transport network, which interconnects mobile NFs, nodes and sites. The transport network is divided into two sub-domains: the backhaul between RAN sites and CN sites, and the fronthaul between antenna sites and RAN sites. The routers, switches and cables for each connection of transport domains must be deployed in a redundant manner. Satellite and microwave links can also be used to improve transport redundancy.
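Whether a transport topology is actually redundant can be checked mechanically. The sketch below finds every router and every link whose individual failure would disconnect a RAN site from a CN site. It does this by removing one element at a time and running a breadth-first search. Parallel cables are modeled as separate link entries. The topology and node names are hypothetical. The check covers single failures only; it does not model double failures or shared risks such as two cables in the same duct.

```python
from collections import deque
from typing import Optional

Link = tuple[str, str]  # one physical connection; parallel cables are separate entries


def reachable(links: list[Link], src: str, dst: str,
              removed_node: Optional[str] = None,
              removed_index: Optional[int] = None) -> bool:
    """Breadth-first search from src to dst, ignoring one removed node or link."""
    adj: dict[str, list[str]] = {}
    for idx, (a, b) in enumerate(links):
        if idx == removed_index or removed_node in (a, b):
            continue
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(links: list[Link], src: str, dst: str):
    """Nodes and links whose individual failure disconnects src from dst."""
    if not reachable(links, src, dst):
        raise ValueError(f"{src} and {dst} are not connected at all")
    transit = {n for link in links for n in link} - {src, dst}
    nodes = sorted(n for n in transit if not reachable(links, src, dst, removed_node=n))
    link_spofs = [link for i, link in enumerate(links)
                  if not reachable(links, src, dst, removed_index=i)]
    return nodes, link_spofs


links = [("ran-site", "r1"), ("r1", "r2"), ("r2", "cn-site"),
         ("r1", "r3"), ("r3", "cn-site")]
print(single_points_of_failure(links, "ran-site", "cn-site"))
# (['r1'], [('ran-site', 'r1')])
links += [("ran-site", "r4"), ("r4", "r3")]  # second uplink from the RAN site
print(single_points_of_failure(links, "ran-site", "cn-site"))  # ([], [])
```

The same check applies to the backhaul (RAN site to CN site) and to the fronthaul (antenna site to RAN site). A satellite or microwave link would simply appear as one more entry in `links`.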

Conclusion

Mobile connectivity has become a core characteristic of society and business all around the globe. Regardless of whether it is used by a human or an autonomous machine, virtually any application today requires a connectivity link in order to perform effectively. In cases where the application is performing a business-, mission- or society-critical task, assurance is needed that the connectivity service will deliver the minimum level of performance required by the application based on critical KPIs on latency and data rate. Dependable networks are also an important enabler of new services such as dependable differentiated connectivity. With this in mind, Ericsson believes that a comprehensive approach to ensuring network dependability is a critically important aspect of our work to develop the networks of the future.

Terms and abbreviations

3GPP – 3rd Generation Partnership Project
API – Application Programming Interface
CN – Core Network
CP – Control Plane
CSP – Communication Service Provider
DL – Downlink
E2E – End-to-End
KPI – Key Performance Indicator
NF – Network Function
QoS – Quality of Service
RAN – Radio Access Network
SLA – Service Level Agreement
SLI – Service Level Indicator
SLO – Service Level Objective
SONE – System Outage Network Element
TCC – Time-Critical Communication
UE – User Equipment
UL – Uplink
UP – User Plane

References

[1] Ericsson white paper, Co-creating a cyber-physical world, July 2024.

[2] International Electrotechnical Commission (IEC), dependability, IEV ref 192-01-22, International Electrotechnical Vocabulary Online, accessed January 15, 2025.

[3] DETERMINISTIC6G, D1.3 Report on dependable service design, project report, December 2024.

[4] Ericsson Technology Review, Robustness evolution: Building robust critical networks with the 5G System, November 11, 2021, Vikberg, J.; Hall, G.; Cagenius, T.; Wang, R.; Schultz, J.

[5] Ericsson white paper, Differentiated connectivity: Unleashing the full potential of 5G, February 2025.

[6] Ericsson Technology Review, Critical IoT connectivity: Ideal for time-critical communications, June 2, 2020, Alriksson, F.; Boström, L.; Sachs, J.; Wang, E.; Zaidi, A.

[7] The Telecom Quality Management System, TL 9000.

Authors

Hans Bergström

joined Ericsson in 1989. He currently serves as the company’s director of architecture evolution, researching the emerging trend of network availability and resilient high-performing networks. With more than 30 years of global experience in different capacities from customer engagements to architectural and systems design, his background spans segments including business communication, networks, data centers and wireless communication from 3G to 6G. Bergström holds a B.Sc. in computer science from Linnaeus University in Växjö, Sweden.

Joachim Sachs

is a senior expert at Ericsson Research with more than 25 years of experience in mobile telecommunication from 2G to 6G since joining Ericsson in 1997. His research interests include 5G and 6G mobile networks for enterprise and industrial use cases, including cross-industry research collaborations. Sachs serves as the co-chair of the technical committee on communication networks and systems of the German VDE information technology society and co-chair of the working group resilient-by-design of the 6G platform Germany. He holds a Ph.D. in electrical engineering from Technical University Berlin in Germany.

Lisa Boström

joined Ericsson in 2006. She works as a researcher in Ericsson Radio Networks, where she does research and concept development for 5G and 6G, with a primary focus on industry verticals and critical communications. She also has a background within radio networks engineering and standardization. Boström holds an M.Sc. in media engineering from Luleå University of Technology in Sweden.

Richard Wang

is an expert in core network robustness. He joined Ericsson in 2009 and has worked in different technology areas such as mobile-services switching, Evolved Packet Core (EPC), virtual EPC solution, the 5G Core (5GC) solution and core network robustness. He currently focuses on 5G core network robustness and 6G long-term evolution. Wang holds a Ph.D. in control theory and control engineering from Shanghai Jiao Tong University in China.

Stefano Sorrentino

joined Ericsson in 2008. He currently works as a principal researcher at Ericsson Research, where he is technical coordinator for a 6G-focused project dealing with connectivity differentiation and networks dependability. He has previously worked on mobile network support for verticals, including automotive and public safety. Sorrentino holds an M.Sc. in telecommunications from Politecnico di Milano in Italy.

Jari Vikberg

is a senior expert in network architecture and the chief network architect at Ericsson’s CTO office. He joined the company in 1993 and has both broad and deep technology competence covering network architectures for all generations of radio access and packet-core networks. Vikberg holds an M.Sc. in computer science from the University of Helsinki in Finland.
