The network platform is characterized by an architecture that can instantaneously meet any application need.
Instead of the prevailing notion of a single, monolithic network serving multiple purposes, network slicing allows logical networks to be built on top of a common and shared infrastructure layer. Although network slicing is often described as a “5G concept” in the industry, it is not tied to 5G, and variants of network slicing are already feasible with earlier technologies.
The network slicing paradigm relies on a few enabling technologies. Most important are Network Function Virtualization (NFV), Distributed Cloud, and Software Defined Networks (SDN). A high degree of automation and orchestration is essential, along with equal flexibility in the BSS and service exposure/enablement layer, to create a matching business flexibility.
In addition, network slicing relies on enablers in the underlying network. In EPC-based networks (either 4G or 5G NSA), slicing can, for example, make use of traditional mechanisms such as PLMN IDs and APNs (Access Point Names), which are supported in 2G through 4G. With 4G/EPC, the DECOR (Dedicated Core) standard can be used for better control and scalability of network slicing. These can be combined with functions such as Radio Resource Partitioning, based on SPID and/or PLMN-ID, to achieve entry-level “end to end” network slicing.
The 5G Core standards, i.e. those used with 5G SA, bring additional value. With 5G Core, a device can be connected to multiple slices simultaneously, supporting new use cases. Furthermore, 5G Core comes with new signaling functions (Network Slice Selection Function) and procedures that extend to the device. These allow better control of how devices connect to slices and e2e monitoring using common identifiers in RAN and core. Network slicing also extends into IMS and UDM, allowing these to be partitioned per slice.
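The idea of a device attaching to several slices at once can be illustrated with a minimal sketch. It assumes that slices are identified by an S-NSSAI-style pair (a slice/service type plus an optional differentiator) and that a slice is allowed when it is requested, subscribed and served by the network. The function name, the differentiator values and the selection rule are illustrative assumptions, not the standardized selection procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SNSSAI:
    """Slice identifier: slice/service type (SST) plus optional differentiator (SD)."""
    sst: int
    sd: Optional[str] = None


def allowed_slices(requested, subscribed, served_by_network):
    """Keep the requested slices that are both subscribed and served, in request order."""
    return [s for s in requested if s in subscribed and s in served_by_network]


# SST 1, 2 and 3 are used here for broadband, low-latency and IoT slices;
# the SD values are hypothetical.
EMBB = SNSSAI(sst=1)
URLLC = SNSSAI(sst=2, sd="0000b2")
IOT = SNSSAI(sst=3, sd="0000a1")

requested = [EMBB, URLLC, IOT]
subscribed = {EMBB, IOT}
served = {EMBB, URLLC, IOT}

# The device ends up connected to two slices at the same time.
allowed = allowed_slices(requested, subscribed, served)
```

Because the identifier is shared by all domains, the same value could in principle be used as a correlation key when monitoring a slice end to end.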
Network slices and 5G services typically involve multiple network domains:
- Order Management & Monetization in the BSS domain makes it possible to monetize the Network Slices and the hosted services. It provides a flexible layer of capabilities for billing, exposure, ordering etc., enabling new business models customized to the needs of 5G customers.
- Service Orchestration, Assurance and Analytics, Resource orchestration and VNF Management, together with Domain Managers, provide the fundamental capabilities for orchestration and automation, enabled by model driven orchestration and observability, necessary to reap the business benefits of Network Slicing, i.e. TTM/TTC as well as OPEX. They also have a crucial role in service assurance.
- Connectivity, EPC and 5G Core are key modules/building blocks of network slices deployed to suit individual business/service needs. They also provide key capabilities for steering traffic and sessions (into Slices) and for controlling access to Slices. In LTE and NG-RAN, a NW Slice provides additional information on the policy for handling traffic in addition to the regular QoS for radio bearers. LTE and NG-RAN may serve several NW Slices and each NW Slice may (or may not) contain users with different QoS requirements. RAN may choose to map a single NW Slice to a unique RAN Partition or map several NW Slices to the same RAN Partition. The decision on which resources to isolate and which to share between the RAN Partitions impacts how the RAN service requirement for the NW Slice can be fulfilled.
- Cloud Infrastructure and Transport together with VIMs and SDNs are key enablers providing the fundamental capabilities for infrastructure sharing and programmability.
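The RAN partition options described in the list above (one slice per partition, or several slices sharing a partition) can be sketched as a simple mapping. The slice names, partition names and resource shares below are hypothetical and only illustrate the isolate-versus-share decision.

```python
from collections import defaultdict

# Hypothetical mapping: two slices share one partition,
# one slice gets a partition of its own.
SLICE_TO_PARTITION = {
    "mbb": "shared",
    "massive-mtc": "shared",
    "critical-mtc": "dedicated",
}

# Hypothetical share of radio resources reserved per partition (percent).
PARTITION_SHARE = {"shared": 70, "dedicated": 30}


def slices_per_partition(mapping):
    """Invert the mapping: partition name -> sorted list of slices it serves."""
    partitions = defaultdict(list)
    for slice_name, partition in mapping.items():
        partitions[partition].append(slice_name)
    return {name: sorted(members) for name, members in partitions.items()}


def validate_shares(shares):
    """Reserved shares must not exceed the total radio capacity."""
    return sum(shares.values()) <= 100


layout = slices_per_partition(SLICE_TO_PARTITION)
```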
Different functions, including those that provide the selection mechanism between network slices, can be deployed and combined to support the required use cases, as exemplified in Figure 4. Several hybrid cases will exist based on the specifics of the use case to be supported. The upper slice shown supports the massive machine type communication (MTC) 4G case, which remains viable for the foreseeable future. The slice supporting local critical MTC using 5GS will certainly be expanded in many deployments with LTE and EPC.
The networked society will be built on unprecedented connectivity and the ability to access cloud services from anywhere in the world. This connectivity brings with it an evolving threat landscape including an increased risk from a myriad of low tech IoT (Internet of Things) devices which can threaten the network integrity.
Security can be divided into six general functions that are used to protect a system's assets.
The six security functions are managed by either the security management function and/or the identity management function. The functions contain the following functionality:
- Data protection - Confidentiality and Integrity protection of data in Transit, in Process and at Rest
- Network Security - Firewall, Security Gateway, Traffic separation, Intrusion Detection
- Hardware and Platform Security - Trusted execution and storage
- Logging, Monitoring and Analytics - Generation and collection of security event logs used for monitoring and analytics
- Key and Certificate Management - Secure key and certificate generation, provisioning and deletion
- Authentication and Authorization - Identification, authentication and authorization of humans and machines (services, containers, resources, …)
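As one concrete illustration of the data protection function listed above, the following minimal sketch adds integrity protection to a message in transit using an HMAC from the Python standard library. It covers integrity only, not confidentiality, and the hard-coded key is an assumption for illustration; in the architecture described here the key would come from the key and certificate management function.

```python
import hashlib
import hmac

TAG_LENGTH = 32  # size of an HMAC-SHA256 tag in bytes


def protect(key: bytes, message: bytes) -> bytes:
    """Prefix the message with an HMAC-SHA256 tag."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return tag + message


def verify(key: bytes, protected: bytes) -> bytes:
    """Return the message if its tag is valid, otherwise raise ValueError."""
    tag, message = protected[:TAG_LENGTH], protected[TAG_LENGTH:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return message


key = b"illustrative-key-only"  # hypothetical key, never hard-code in practice
packet = protect(key, b"cell-17 load=0.42")
```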
Security is needed in all network domains and use case areas. Below are the illustrations describing how the security functions are used in the 5G RAN deployment architecture and in the 5G core Service Based Architecture.
The 5G RAN, in the above deployment view, is divided between radio and local access sites, with the F1 interfaces deployed between sites. The security functions are deployed based on what the standards require and on the threats and risks facing the RAN deployment.
The security functions are shown in two of the 5G core functions for simplicity. Some of the security functions are implemented per node (from bare metal to virtualized like VM/container).
The cloud security architecture addresses security challenges derived from multi-tenancy, divided responsibility and the dynamic environment. The security management & orchestration layer dynamically deploys and adjusts security functions, policies and related configurations of customers and providers, e.g. in IT cloud and telecom cloud deployments.
The historical (and current) way of implementing security is to protect against the identified threats to the system. A risk-based approach must still be taken, mitigating the risks that can’t be accepted.
There is an architectural aspect to the National Institute of Standards and Technology (NIST) cybersecurity framework: where and how the different mechanisms are implemented.
Instead of only trying to protect against the known threats, systems shall also be able to detect whether they are under attack. Based on the attack, the system shall respond and, after mitigation, recover to its normal state.
Detecting whether a system is under attack can be done in different places in the architecture, depending on what information is needed to detect the attack. A node itself can have functionality to identify an attack, but if the node is under attack the detection mechanism may not work properly. The advantage of being able to detect in the node itself is that logs and other data don’t have to be sent to an external system for MI/analytics.
In some cases, an attack can’t be detected based on information from one source only, and an external system will have to collect logs and other data feeds from different sources to perform MI/analytics. The architecture below describes a system/function internal and external detect/respond loop.
To optimize the network, one must have detection functions on different levels, from a single node/function/product to the domain level and the central Security Management level. Logs and other data feeds used to detect attacks must always be sent to a collection system, but they can be pre-processed to limit the amount of data to be transferred and stored.
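A minimal sketch of this two-level detection, assuming a simple threshold rule: each node pre-processes its own events into a small summary, and a central function correlates the summaries to spot a source that no single node could classify on its own. The event format, thresholds and function names are hypothetical.

```python
from collections import Counter


def summarize_events(events, threshold=3):
    """Node-level pre-processing: keep only sources with repeated auth failures."""
    failures = Counter(e["source"] for e in events if e["type"] == "auth_failure")
    return {src: count for src, count in failures.items() if count >= threshold}


def correlate(summaries, min_nodes=2):
    """Central detection: flag sources reported by at least `min_nodes` nodes."""
    seen = Counter()
    for summary in summaries:
        seen.update(list(summary))  # count each reporting node once per source
    return sorted(src for src, nodes in seen.items() if nodes >= min_nodes)


# Addresses come from the documentation range 192.0.2.0/24.
node_a_events = (
    [{"source": "192.0.2.7", "type": "auth_failure"}] * 4
    + [{"source": "192.0.2.9", "type": "auth_failure"}]
    + [{"source": "192.0.2.7", "type": "login_ok"}]
)
node_b_events = [{"source": "192.0.2.7", "type": "auth_failure"}] * 3

summary_a = summarize_events(node_a_events)
summary_b = summarize_events(node_b_events)
suspects = correlate([summary_a, summary_b])
```

Only the summaries leave the nodes, which is what limits the data to be transferred and stored.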
Many of the use cases seen in different industries are driving a need to deploy more and more software functions close to the device or the data source, thereby enabling computing at the edge of the networks. Those software functions include both operator network functions that are part of the network transformation and non-operator applications that implement those use cases. The location depends on the use case and the business value it provides. The main drivers of edge computing are low latency, high bandwidth, self-contained private or dedicated deployments, and resilient, data-governed computing and storage.
The distributed cloud provides an execution environment for cloud application optimization across multiple sites, managed as one solution and perceived as such by the applications. It is based on SDN, NFV and 3GPP edge computing. The execution environment may be formed by multiple heterogeneous instances of software with end-to-end service orchestration and external exposure and management interfaces. This enables multi-access and multi-cloud capabilities and unlocks networks as open platforms for application innovations.
A layered network architecture is applied inside and between data centers utilizing software defined networking capabilities of the solution.
Management includes orchestration and placement support as well as monitoring and automation. The same orchestration functions, supporting information model and descriptors are used for operator network functions and non-operator applications. Orchestration covers the request and allocation of resources (placement support), compute and networking, the deployment of the SW components and the establishment of connectivity according to the constraints and requirements of the applications. A smart resource inventory is required to enable proper application deployment in a distributed cloud. It is the common topology awareness at OSS level, shared with assurance, that enables closed loop automation and smart placement functions.
Application connectivity is supported for both EPS and 5GS. When applications need to be placed below existing SGi/N6 anchor points, local breakout (LBO) of traffic flows related to the application is supported. This can be achieved by distributing SGi/N6 and deploying a core network user plane function supporting LBO.
The transport network across sites is set up as part of the planning process, while connectivity between the required instances is created as a dynamic overlay on top of that networking infrastructure at instantiation time, depending on the service definition.
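Placement support against a resource inventory can be sketched as follows. The site names, latencies and capacities are hypothetical, and the selection rule (choose the most central site that still meets the latency budget, so that scarce edge capacity is kept free) is an assumption made for illustration, not a rule stated in this document.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    latency_ms: float   # latency from the device or data source to this site
    free_vcpus: int     # remaining compute capacity in the inventory


def place(sites, max_latency_ms, vcpus):
    """Pick a site that satisfies the constraints and reserve capacity on it."""
    candidates = [
        s for s in sites
        if s.latency_ms <= max_latency_ms and s.free_vcpus >= vcpus
    ]
    if not candidates:
        return None
    # Assumed rule: prefer the most central site that is still fast enough.
    chosen = max(candidates, key=lambda s: s.latency_ms)
    chosen.free_vcpus -= vcpus
    return chosen.name


inventory = [
    Site("edge", latency_ms=2, free_vcpus=8),
    Site("regional", latency_ms=10, free_vcpus=64),
    Site("central", latency_ms=40, free_vcpus=512),
]

first = place(inventory, max_latency_ms=15, vcpus=4)
second = place(inventory, max_latency_ms=5, vcpus=4)
third = place(inventory, max_latency_ms=5, vcpus=8)
```

The third request fails because the inventory reflects the capacity already reserved at the edge, which is the kind of awareness that smart placement depends on.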
To make assets consumable to the applications, three APIs are exposed (and made available through an SDK for application development):
- infrastructure management APIs, responsible for requesting runtime resources (which may include network slices), tenant management, etc., and supporting different infrastructure stacks.
- application LCM APIs, including connectivity interaction (LBO, UPF anchoring point, etc.). Applications shall not interact directly with the LBO APIs but will instead state requirements (in the application descriptor or at execution time) that the service orchestrator translates into placement and network connectivity, including LBO configuration.
- and network/application APIs based on network exposure functions.
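The indirection described for the application LCM APIs, where an application states requirements and the orchestrator derives the LBO configuration, might look like the sketch below. The descriptor fields, the latency figure for the central anchor point and the decision rule are hypothetical.

```python
def plan_connectivity(descriptor, central_anchor_latency_ms):
    """Translate declared requirements into a connectivity plan.

    Local breakout is configured only when the central anchor point
    cannot meet the application's latency requirement.
    """
    needs_lbo = descriptor["max_latency_ms"] < central_anchor_latency_ms
    return {
        "app": descriptor["name"],
        "local_breakout": needs_lbo,
        "user_plane_site": "edge" if needs_lbo else "central",
    }


# The applications only declare what they need, never how it is achieved.
video_analytics = {"name": "video-analytics", "max_latency_ms": 10}
billing_portal = {"name": "billing-portal", "max_latency_ms": 200}

edge_plan = plan_connectivity(video_analytics, central_anchor_latency_ms=40)
central_plan = plan_connectivity(billing_portal, central_anchor_latency_ms=40)
```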
A distributed cloud comprises cloud instances with various levels of independence that are managed efficiently and perceived by the user as one cloud instance, although internally it consists of a multitude of virtual infrastructure managers.
Support for the virtualization layers (containers and VMs) and the multitude of deployment options for them imposes a rationalization of combinations and options. VMs and containers will be deployed side by side with support for strict tenant separation. Interworking between the virtualization solutions will be supported and will take place on the boundaries between the different instances.
Most sites will host at least one full stack of a virtual infrastructure manager providing IaaS (Infrastructure as a Service) or CaaS (Container as a Service). Very small sites with strict resource limitations may not contain the full management stack but instead management agents that support remote control, for example using federation, distribution, or other approaches.
Service exposure will be critical to the success of solutions that rely on edge computing, network slicing and distributed cloud. Without it, the growing number of functions, nodes, configurations and individual offerings that those solutions entail represents a significant risk of increased operational expenditure. The key benefit of service exposure in this respect is that it makes it possible to use application programming interfaces (APIs) to connect automation flows and artificial intelligence (AI) processes across organizational, technology, business-to-business (B2B) and other borders, thereby avoiding costly manual handling. AI and analytics-based services are particularly good candidates for exposure and external monetization.
Service exposure is also a crucial part of automation, making it possible to automate processes that run across borders, e.g. processes that run over B2B interfaces, across organizational borders in an enterprise or across different technology domains.
There are three main types of service exposure in a telecom environment:
- Network monitoring
Examples include the network publishing information such as real-time statuses, event streams, reports, statistics, analytic insights and so on.
- Network control and configuration
Involves requesting control services that directly interact with the network traffic or request configuration changes.
- Payload interfaces
Examples include messaging and local breakout, which interact with the data streams for optimization.
The functional architecture for service exposure is built around four customer scenarios. In the case of internal consumers, applications for monitoring, optimization and internal information sharing operate under the control and ownership of the enterprise itself. In the case of B2C, consumers directly use services via web or app support. B2C examples include call control and self-service management of preferences and subscriptions. The B2B scenario consists of partners that use services such as messaging and IoT communication to support their business. The B2B2X scenario is made up of more complex value chains such as mobile virtual network operators, web scale, gaming, automotive and telco cloud through web-scale APIs.
The functional architecture for service exposure is divided into three layers that each act as a framework for the realization. Domain-specific functionality and knowledge are applied and added to the framework as configurations, scripts, plug-ins, models and so on. For example, the access control framework delivers the building blocks for specializing the access controls for a specific area.
The abstraction and resource layer is responsible for communicating with the assets. If some assets are located outside the enterprise – at a supplier or partner facility in a federation scenario, for example – B2B functionality will also be included in this layer.
The business and service logic layer is responsible for transformation and composition – that is, when there is a need to raise the abstraction level of a service to create combined services.
The exposed service execution APIs and exposed management layer are responsible for making the service discoverable and reachable for the consumer. This is done through the API gateway, with the support of portal, SDK and API management.
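The three layers can be sketched as three small classes: a resource layer that talks to assets, a service logic layer that composes a higher-level service from them, and a gateway that makes the service reachable under access control. All class names, asset names, keys and the health rule are hypothetical; the sketch shows the layering only, not a real exposure product.

```python
class ResourceLayer:
    """Abstraction and resource layer: communicates with the assets."""

    def __init__(self, assets):
        self._assets = assets

    def read(self, asset, key):
        return self._assets[asset][key]


class ServiceLogic:
    """Business and service logic layer: composes a combined service."""

    def __init__(self, resources):
        self._resources = resources

    def cell_health(self, cell_id):
        load = self._resources.read("ran", cell_id)
        alarms = self._resources.read("oss", cell_id)
        healthy = load < 0.8 and alarms == 0  # assumed health rule
        return {"cell": cell_id, "load": load, "alarms": alarms, "healthy": healthy}


class ApiGateway:
    """Exposure layer: makes services reachable, subject to access control."""

    def __init__(self, logic, grants):
        self._logic = logic
        self._grants = grants  # API key -> set of services it may call

    def call(self, api_key, service, *args):
        if service not in self._grants.get(api_key, set()):
            raise PermissionError(f"{service} not granted")
        return getattr(self._logic, service)(*args)


assets = {"ran": {"cell-1": 0.42}, "oss": {"cell-1": 0}}
gateway = ApiGateway(ServiceLogic(ResourceLayer(assets)), {"partner-key": {"cell_health"}})
report = gateway.call("partner-key", "cell_health", "cell-1")
```

Adding a new exposed service in this sketch means adding a method and a grant, which loosely mirrors adding domain-specific functionality to a framework through configuration and plug-ins.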
Business support systems (BSS) and operations support systems (OSS) play a double role in this architecture. Firstly, they serve as resources that can expose their values – OSS can provide analytics insights, for example, and BSS can provide “charging on behalf of” functionality. At the same time, OSS are responsible for managing service exposure in all assurance, configuration, accounting, performance, security and LCM aspects, such as the discovery, ordering and charging of a service.
One of the key characteristics of the architecture presented is that the service exposure framework life cycle is decoupled from the exposed services, which makes it possible to support both short- and long-tail exposed services. This is realized through the inclusion and exposure of new services through configuration, plug-ins and the possibility to extend the framework.
Another key characteristic to note is that it is possible to deploy common exposure functions both in a distributed way and individually – in combination with other microservices for efficiency reasons, for example. Typical cases are distributed cloud with edge computing and web-scale scenarios such as download/upload/streaming where the edge site and terminal are involved in the optimization.
In the Operator B case in the picture, Service Exposure is used to expose services in a full B2B context. Here, BSS integration and support are required to handle all commercial aspects of the exposure and the LCM of Customer, Contracts, Orders, Services etc., as well as Charging and Billing. Operator B also acquires services from a Supplier; here too, B2B commercial support is deployed.
Operator A deploys Service Exposure both at the Central Site and at the Edge Site, e.g. for latency or payload requirements. Services are exposed to Operator A's own APP/VNFs, limiting the need for B2B support. However, some of the APPs are owned by an external partner in a hosting scenario, both centrally and at the edge; this drives the full B2B support deployed for the externally owned APPs.
The Aggregator in the picture deploys Service Exposure to create services that combine offerings from more than one supplier. UDN and Web-Scale integration both fall into this category. This is also a B2B interface with specific requirements, as exposure to the customer is done through the aggregator, e.g. advertising or discovery of services is done through the Web-Scalers' portals.
Operator A and Operator B also have a federation relationship handled by Service Exposure, where both parties are on the same level in the value chain of the ecosystem. Here, a subset of the B2B support is deployed.
AI/ML technologies open new possibilities for the network to provide machine-created intelligence and to use this intelligence for system automation and optimization as well as insight creation, both on node and network level and on higher OSS/BSS levels. New use cases based on AI/ML will have an impact on the network functionality and on the network architecture. The two main areas of AI/ML that concern the network are Machine Learning and Machine Reasoning, each with a different purpose and different system requirements.
Machine learning and machine reasoning can both be used to build intelligent logic, but they have different approaches. Machine learning is typically used for learning a complex function from vast amounts of data – for example system anomaly detection, KPI trend forecast, user behavior clustering, alarm/event correlation using supervised and unsupervised learning, as well as policy optimization using reinforcement learning.
Machine reasoning, on the other hand, implements abstract thinking as a computational system. Such a system contains a knowledge base storing declarative and procedural knowledge and a reasoning engine employing logical techniques such as deduction and induction to generate conclusions.
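A knowledge base with a reasoning engine can be sketched with forward chaining, one simple form of deduction: rules are applied to known facts until no new conclusion can be derived. The facts and rules below are hypothetical, and the sketch covers deduction only, not induction.

```python
def forward_chain(facts, rules):
    """Derive every conclusion that follows from the facts under the rules.

    Each rule is a (premises, conclusion) pair; a rule fires when all of
    its premises are already known.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical declarative knowledge about a network fault.
RULES = [
    (("link_down",), "cell_unreachable"),
    (("cell_unreachable", "no_backup_path"), "service_outage"),
    (("service_outage",), "dispatch_technician"),
]

with_backup = forward_chain({"link_down"}, RULES)
without_backup = forward_chain({"link_down", "no_backup_path"}, RULES)
```

Unlike a learned model, every conclusion here can be traced back to the rules that produced it.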
The architecture and logical placement of functionality in the network is influenced by several factors, such as who owns the equipment and where the corresponding equipment needs to be placed due to bandwidth or regulatory requirements. It may also be influenced by bandwidth limitations for transfer of data. Different deployment options, such as centralized or decentralized inference, model training and update mechanisms, will impact the architecture.
Data management, from data collection, ingest, transport and storage to parsing, cleaning and feature selection, is an essential part of the development, training and updating of ML models and therefore an important aspect of the network architecture. 3GPP is standardizing new functions in the Service Based Architecture, the MDAF (SA5) and NWDAF (SA2).
In general, there are many data sources. Management systems and functions such as service exposure are examples of data sources.
The data sources generate data or insights which shall be brought in for further analysis or presentation. To ensure that data is only collected once, a data routing and distribution mechanism which can feed data from one source to multiple consumers is needed. The ingest and transport function is followed by the functions needed for storage, processing and presentation/visualization.
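The collect-once, distribute-to-many requirement can be sketched with a small publish/subscribe router. The class name, topic name and record format are hypothetical, and the sketch is in-process only; a real pipeline would use a message bus with persistence and access control.

```python
from collections import defaultdict


class DataRouter:
    """Feeds each collected record to every consumer subscribed to its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, record):
        """Deliver one record to all subscribers and return how many received it."""
        consumers = self._subscribers.get(topic, [])
        for callback in consumers:
            callback(record)
        return len(consumers)


router = DataRouter()
storage = []      # stands in for the storage function
dashboard = []    # stands in for presentation/visualization

router.subscribe("cell-kpi", storage.append)
router.subscribe("cell-kpi", lambda record: dashboard.append(record["load"]))

# The source is read once; the router fans the record out to both consumers.
delivered = router.publish("cell-kpi", {"cell": "cell-1", "load": 0.42})
```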
Data in transit always needs to be encrypted and may also need to be encrypted at rest depending on the type of data. The integrity of the data needs to be ensured, and data will need to be transferred through gateways and firewalls. Access to data pipeline data and functions will be protected by secure keys and certificates, and subject to authentication and authorization.
In a large distributed system, decisions are made at different places and different granular levels. Some decisions are based on local data and governed by tight control loops with low latency. Other decisions are more strategic, affect the system globally and are made based on data collected from many different sources. Decisions made at a higher global level may also need real time response in critical cases such as power-grid failures, cascading node failures, and so on. An intelligent system that automates such large and complex systems must necessarily reflect the distributed nature and support the management topology.
Data generated at the edge, in a device or network edge node, will at times need to be processed in place. Data may not economically be transferred to a centralized cloud; there may be laws governing where data can reside and there can be privacy or security implications of data transfer. The scale of decisions in these cases is restricted to a small domain, so the algorithms and computing power necessary are usually fast and light.
However, local models can be based on incomplete and biased statistics, which may lead to loss of performance. There is a need to leverage the scale of distribution, make appropriate abstractions of local models and transfer the insight to other local models. Learning about global data patterns from multiple networked devices or nodes without access to the actual data is also possible. A recent approach is so-called federated learning: learn local models based on local data patterns, send the local models to a centralized cloud, average them and send the averaged model back to all devices.
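The federated learning loop can be sketched for a toy linear model y = w*x + b. Each node fits the model on its own data, and only the model parameters travel to the cloud, where they are averaged and sent back. The data, learning rate and round counts are hypothetical, and weighting the average by sample count is an assumption; with equally sized nodes it reduces to the plain average described in the text.

```python
def local_update(model, data, lr=0.05, epochs=20):
    """Improve (w, b) by gradient descent on one node's local data only."""
    w, b = model
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(models, sizes):
    """Average local models in the cloud, weighted by local sample count."""
    total = sum(sizes)
    return tuple(
        sum(model[i] * size for model, size in zip(models, sizes)) / total
        for i in range(len(models[0]))
    )


# Two nodes observe different input ranges of the same relation y = 2x + 1.
node_data = [
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],
]

global_model = (0.0, 0.0)
for _ in range(30):  # communication rounds
    local_models = [local_update(global_model, data) for data in node_data]
    global_model = federated_average(local_models, [len(d) for d in node_data])
```

The raw samples in `node_data` never leave their node in this loop; only the two parameters per node are exchanged in each round.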
Global decision making, on the other hand, relies on knowledge of the global state and global data patterns – that is, patterns of events occurring across local nodes. The global state is built by combining the knowledge from multiple components and using it to make decisions.
A common distributed and decentralized paradigm is required to make the best use of local and global data and models and determine how to distribute learning and reasoning across nodes to fulfill extreme latency requirements. Such paradigms themselves may be built using artificial intelligence to incorporate features of self-configuration, self-optimization and self-healing to support network planning, provisioning and assurance.
Our vision is a network that is easily controlled through higher level business intents, has a high degree of automation and self-optimization, and continuously learns to improve over time. The network is robust and resilient to unforeseen events and provides the highest level of security to all its users. We call this the zero-touch network.
This changes the approach to management, from managing networks manually to managing automation.
A key part of managing automation is the control loop paradigm, as shown in Figure 16. This is achieved by designing the insights and policies required for a use case, and the decisions that need to be made. Insights are the outcome when data is processed by analytics. Policies represent the rules governing the decisions to be made by the control loop. The decisions are actuated by COM (Control, Orchestration, Management), which interacts with production domains by requesting actions such as update, configure or heal. The control loops are implemented by multiple OSS functions, can act in a hierarchical way and can even delegate to the cloud infrastructure, which is becoming more capable.
By leveraging analytics, AI and policy, control loops become adaptive and are central to assuring and optimizing the deployed services and resources.
Figure 17 describes the major functional components that shall be used to realize automation. This view captures a design-time set of functions (studio-suite) and a run-time set of functions (all others) that will interwork to realize the automated life-cycle management of network functions and services. Machine learning capabilities are present in the assurance and analytics domain.
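One pass of such a control loop can be sketched as analytics producing an insight, policies turning the insight into a decision, and COM actuating the decision. The latency thresholds, action names and the `Com` class are hypothetical placeholders for the real production-domain interfaces.

```python
def analyze(samples):
    """Analytics: turn raw latency samples into an insight."""
    return {"avg_latency_ms": sum(samples) / len(samples)}


# Policies: ordered (condition, action) rules governing the decision.
POLICIES = [
    (lambda insight: insight["avg_latency_ms"] > 20, "scale_out"),
    (lambda insight: insight["avg_latency_ms"] < 5, "scale_in"),
]


def decide(insight, policies):
    """Return the action of the first policy whose condition holds, if any."""
    for condition, action in policies:
        if condition(insight):
            return action
    return None


class Com:
    """Stand-in for Control, Orchestration, Management: applies actions."""

    def __init__(self, instances=2):
        self.instances = instances

    def actuate(self, action):
        if action == "scale_out":
            self.instances += 1
        elif action == "scale_in" and self.instances > 1:
            self.instances -= 1


def control_loop(samples, policies, com):
    """Run one pass: data -> insight -> decision -> actuation."""
    action = decide(analyze(samples), policies)
    if action is not None:
        com.actuate(action)
    return action


com = Com()
actions = [
    control_loop([25, 30, 35], POLICIES, com),  # congested
    control_loop([10, 12], POLICIES, com),      # within target
    control_loop([2, 3], POLICIES, com),        # over-provisioned
]
```

Making the loop adaptive would mean replacing the fixed thresholds in `POLICIES` with values learned or tuned from the observed data.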
The drive for automation will address several aspects of the business processes spanning from automating parts of or complete flows to removal of complete steps by providing automation in the network domains. Machine learning will be an essential part as well as advanced analytics.
Automated management in Figure 18 is addressed in a pattern which is described through the application of the tools of Control, Orchestration, Management, Analytics and Policy to closed loops in:
- Infrastructure incl. at least transport, compute and storage
- VNF and cloud runtime environment
- E2e service and network slicing
- User experience
The three key values are:
- Rapid service introduction and lifecycle management
- Dynamic lifecycle management of networks
- Automation of business and operational processes.
These values are achieved through:
- An architectural pattern for advanced automated management.
- A wide policy framework for policy definition, deployment and execution of service and resource policies
- Analytics based on a common architecture for policy and AI/ML enabled assurance.