The AI standard for 5G RAN: what it is, why it’s needed, and how to get there
- Cross-vendor interoperability of AI/ML technologies in 5G RAN is key to unleashing the future of AI across networks, a process made difficult by the black-box nature of AI technologies.
- A new, ground-breaking 3GPP work item now takes us a crucial step closer to fully normative specifications for AI/ML processes in 5G RAN.
The impact of artificial intelligence and machine learning (AI/ML) is growing across every field of technology. Yet very little is said about how these techniques work where multiple machines collaborate across different vendors.
For example, how does such a multi-vendor setup impact the requirements placed on interfaces and data exchanges? And can we be sure that the output of AI/ML algorithms is interpretable by each machine receiving it?
At first glance, this may not seem so important in systems where the use of AI/ML is limited to a machine, also known as a network node, that issues a simple query and receives an “augmented” AI/ML response in return.
But imagine if that system is a 5G network. In this case, that system brings together multiple data sources, AI/ML inference functions, and AI/ML training functions all from different vendors. This requires precise rules that can regulate the communication and behavior of each party involved in the creation and consumption of AI/ML byproducts.
3GPP, the standards organization for mobile networks, has tackled this problem by working to specify new procedures that can support AI/ML in 5G RAN systems. Today, this is being explored by the working group responsible for 5G RAN architecture.
In this blog post we take a closer look at this pioneering work, something that is the first of its kind in 3GPP.
Standardizing AI/ML in 5G RAN: the journey begins
For the first time in its history, 3GPP recently issued a ground-breaking technical report that lays out which information would need to be exchanged among nodes and functions of a 5G RAN to support AI/ML based optimization. The report focuses on three reference use cases: load balancing, mobility optimization and network energy saving.
The report also reflects the agreements in the 3GPP RAN3 working group on issues such as:
- Inputs – which sets of inputs would an AI/ML model need to be able to infer actions and predictions that would enable optimal performance
- Feedback – which feedback would such models need to receive to perfect their inference process and retrain models to achieve higher inference accuracy
- Outputs – which outputs would AI/ML models generate to enable the receivers to take actions for optimization and failure prevention
A journey to open the black box…at least partially
AI/ML technologies are infamously regarded as black box technologies, and their processes are often self-contained. This means that each implementation will often use its own ‘secret’ ingredients to optimize its performance.
And while it may seem absurd to map such black-box processes, particularly at the output stage, the aim of the study was to explore how to make at least parts of the AI/ML ‘recipe’ known and acknowledged by all major parties in the industry. This could then facilitate improved cross-vendor AI/ML performance.
The reason for such a bold move is the ambition to achieve a fully interoperable 5G RAN infrastructure where different nodes and functions can cooperate to achieve AI/ML supported processes. A rather complex task if we consider that it requires each piece of information involved in the AI/ML process to be ‘explainable’ and unequivocally interpreted by the nodes and functions that need it for their correct operations.
The final goal would be to enable AI/ML to predict how conditions, performance, measurements and other factors will change in the future and to derive actions that contribute towards failure avoidance and system optimization for conditions yet to materialize. This is quite a big paradigm shift compared to the rule-based techniques historically assumed when developing 3GPP specifications.
With the end of the study, 3GPP set a tough and ambitious journey ahead of itself:
to create normative specifications to be followed across the globe that would enable AI/ML processes to be supported with full interoperability among all the components of a 5G RAN system.
The 3GPP RAN3 working group is currently developing a work item to derive such specifications.
The road to a new standard: where we are today
Formal activities to develop the new 3GPP work item on AI/ML support in 5G RAN began at the start of 3GPP Rel-18, a standard expected to be released later this year. Being the first of its kind, the work item has brought a whole new set of skills into 3GPP.
New delegates working with and in larger feature teams have been populating the 3GPP meetings to participate in a topic that has attracted high interest. More than 20 companies, including chipset vendors, network vendors, network operators, and research institutions, have been actively taking part in discussions on AI/ML. Whenever the topic is discussed in a 3GPP meeting room, an atmosphere of both trepidation and excitement takes over the room.
3GPP explained: the difference between items, specifications and standards
Study item: | This constitutes the research phase of a project, where 3GPP members explore whether a new idea or technology is feasible and worth pursuing. |
Work item: | In simple terms, this is the 3GPP version of a to-do list, comprising all the tasks, challenges and projects that need to be overcome in order to develop the technical solution. |
Technical specification: | This comprises a detailed set of instructions or rules that describes how a particular 3GPP technology or system should be designed and implemented. |
Standard: | The final, agreed-upon version of a set of 3GPP specifications is transposed into a standard by 3GPP’s organizational partners (for example ETSI). All members must adhere to the standard to ensure that different devices and networks all over the world can work together seamlessly. In summary, 3GPP starts with ideas in ‘study items’, turns those ideas into detailed plans in ‘technical specifications’, and when everyone agrees on how things should work, it becomes a ‘3GPP standard’ that everyone follows. ‘Work items’ comprise all ongoing active tasks that relate to any given technical ambition. |
Common principles that could define the new standard
The work item was framed around the agreements taken during the previous study item. These agreements laid a number of ‘High Level Principles’, setting the boundaries for the processes and the level of details on which the specifications should focus. During the work item phase, a few other principles were added to create a solid framework, including:
- AI/ML algorithms and models are implementation specific and out of standardization scope: this principle was agreed to make sure that, despite a standardized AI/ML framework, R&D on the main engine of AI/ML (namely AI/ML algorithms) would not be constrained. The latter ensures the establishment of a framework that fosters competition among solution providers.
- Solutions development should focus on use case specific AI/ML functionalities and on the definition of the corresponding types of inputs/outputs/feedback information required: namely, the central focus of the work is on identifying a commonly agreed minimum set of data that is needed or produced by an AI/ML algorithm to fulfil the functional requirements of the pertinent use cases, as well as on how to transfer the needed information among AI/ML functions, so as to enable the desired AI/ML functionality.
- Model training and model inference functions should be able to request data needed for AI/ML purposes, if needed: this principle establishes a subscription-based framework where data is signaled only if requested, therefore minimizing signaling and processing load and containing data ingestion.
- Signaling procedures used for the exchange of AI/ML related information are use case and data type agnostic: namely, the intended usage of the data exchanged via these procedures (for example: input, output, feedback) and the use case for which the data is exchanged are not indicated in the procedures and remain a matter of implementation.
Defining data types of input, output, and feedback
From the same study item, the work item on AI/ML for 5G RAN inherited to a large extent, and further developed, the list of inputs/outputs/feedback information identified for each use case. The set of inputs/outputs/feedback information in question does not constitute a fixed set that each implementation should conform to, but rather a minimum set that may be used by AI/ML models, without limiting the possibility of using any other available data.
For the reference use cases, which as a reminder are load balancing, mobility optimization and network energy saving, some of the inputs/outputs/feedback data types agreed to be introduced so far are:
Possible inputs for the AI/ML model
- Measured and predicted resource status information per neighbor cells and own cells: measured/predicted PRB utilization, measured/predicted Number of Active UEs, measured/predicted Number of RRC connections.
These metrics are used to understand the current resource status of cells in a neighborhood as well as the resource utilization prediction at such cells. These inputs may help an AI/ML model to infer appropriate load balancing, energy saving (for example: offloading for cell deactivation) and mobility decisions.
- Measured own and neighbors’ energy cost: namely an indexed representation of a metric mapping to the energy consumed by a RAN node.
These metrics enable a next generation RAN (NG-RAN) node to monitor both its own node level energy consumption and that of its neighbors, and therefore to infer how actions such as traffic offloading or cell shaping affect energy consumption in a neighborhood of cells. The objective is to infer actions that minimize energy costs across a wide neighborhood of NG-RAN nodes.
- Predicted UE trajectory: representing the predicted series of cells the UE will move through within a mobility target RAN node.
Predicted UE trajectories help in inferring optimal mobility decisions and radio resource management policies. The network becomes capable of adapting its resource allocation and handover decisions on the basis of where UEs are foreseen to move. Measured UE trajectories allow for AI/ML techniques like reinforcement learning, where trajectory prediction accuracy can be improved by adjusting models on the basis of the measured accuracy of the prediction.
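One way to read the ‘measured accuracy of the prediction’ is as a comparison between the predicted and the measured series of cells. The sketch below is a minimal illustration under that assumption. The position-wise match metric, the function name `trajectory_accuracy` and the cell identifiers are all hypothetical and are not taken from any 3GPP specification.

```python
from typing import Sequence


def trajectory_accuracy(predicted: Sequence[str], measured: Sequence[str]) -> float:
    """Fraction of positions where the predicted cell matches the measured cell.

    Hypothetical metric for illustration only. The score is normalized by the
    longer of the two sequences, so missing or extra cells count as misses.
    """
    longest = max(len(predicted), len(measured))
    if longest == 0:
        return 1.0  # nothing predicted and nothing measured: trivially correct
    matches = sum(1 for p, m in zip(predicted, measured) if p == m)
    return matches / longest


# Made-up example: the third predicted cell differs from the measured one.
predicted_cells = ["cell-A", "cell-B", "cell-C", "cell-D"]
measured_cells = ["cell-A", "cell-B", "cell-E", "cell-D"]
accuracy = trajectory_accuracy(predicted_cells, measured_cells)
print(f"trajectory accuracy: {accuracy:.2f}")
```

A score like this could serve as the signal that tells a model training function whether the trajectory model needs adjusting. How the accuracy is actually computed remains a matter of implementation.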
Possible (non-limiting) outputs inferred by the AI/ML model
- Predicted resource status information per neighbor cells and own cells: predicted physical resource block (PRB) utilization, predicted number of active user equipment (UEs), predicted number of radio resource control (RRC) connections.
- Predicted UE trajectory: representing the predicted series of cells the UE will move through within a mobility target RAN node.
The two metrics above were listed as inputs, but they could be the result of AI/ML inference and therefore classified as outputs. It is interesting at this point to highlight how a piece of information can have a multifaceted role. A prediction can be derived by AI/ML inference, hence being an AI/ML output, but it can serve as an input to another AI/ML model receiving it, hence it would be an input. This logic is at the basis of the principle stating that signaling procedures are data type agnostic.
- Mobility actions (for example: handovers) planned to optimize for example load distribution or energy efficiency.
- Energy saving actions (for example: traffic offloading, cell de-activation) planned to optimize energy consumption within a neighborhood of RAN nodes/cells.
Possible (non-limiting) feedback information used by the AI/ML model to optimize its inferring process
- UE performance feedback: namely UE throughput, packet delay and packet error rate measured at a mobility target cell and reported to the source RAN node after a handover occurs.
- Measured UE trajectory: namely the series of cells the UE moved through within a mobility target RAN node, collected by the mobility target and reported to the source RAN node.
The information above can be considered as ‘feedback’ in AI/ML learning techniques where models are adapted on the basis of a reward derived from the measured goodness of the inferred actions or predictions.
As an example, the UE performance feedback could act as a reward for mobility actions: if the UE performance at the target cell is satisfactory, the reward for the mobility action is positive; otherwise it is negative, calling for, for example, model retraining. It should however be noted that the UE performance feedback could also be used as an input. For example, an AI/ML model that needs to infer whether it is beneficial to deactivate a cell and offload all its UEs to a neighbor cell may take the UE performance feedback from the neighbor cell into account as an input, if both energy consumption and UE performance are to be optimized.
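The reward logic described above can be sketched in a few lines. This is an illustration only: the class names, the performance targets and the choice of a +1/-1 reward are assumptions made for the example, since what counts as ‘satisfactory’ is left to the implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UEPerformanceFeedback:
    """UE performance measured at the mobility target cell after a handover."""
    throughput_mbps: float
    packet_delay_ms: float
    packet_error_rate: float


@dataclass(frozen=True)
class PerformanceTargets:
    """Hypothetical targets that define 'satisfactory' performance."""
    min_throughput_mbps: float
    max_packet_delay_ms: float
    max_packet_error_rate: float


def handover_reward(feedback: UEPerformanceFeedback, targets: PerformanceTargets) -> int:
    """Return +1 if every target is met at the target cell, otherwise -1."""
    satisfactory = (
        feedback.throughput_mbps >= targets.min_throughput_mbps
        and feedback.packet_delay_ms <= targets.max_packet_delay_ms
        and feedback.packet_error_rate <= targets.max_packet_error_rate
    )
    return 1 if satisfactory else -1


# Made-up values for illustration.
targets = PerformanceTargets(min_throughput_mbps=50.0, max_packet_delay_ms=20.0,
                             max_packet_error_rate=0.01)
good = UEPerformanceFeedback(throughput_mbps=80.0, packet_delay_ms=12.0, packet_error_rate=0.001)
poor = UEPerformanceFeedback(throughput_mbps=30.0, packet_delay_ms=12.0, packet_error_rate=0.001)
print(handover_reward(good, targets), handover_reward(poor, targets))
```

A run of negative rewards for a given type of mobility action could then be used as the trigger for model retraining.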
The encoding and semantics of all of the above, and of further data types, are currently being defined within 3GPP. They are at the heart of enabling AI/ML algorithms to function properly.
Defining data collection, distribution, consumption, and training
With regard to data collection, distribution, consumption, and training, the framework on AI/ML has taken inspiration from procedures defined already in 4G for the reporting of resource status information (for example: radio resource utilization, transport network capacity) and it has enhanced them to fit the needs of AI/ML processes.
Ericsson has been one of the main promoters of the signaling procedures adopted so far in the work on AI/ML. The solution consists of a handshake procedure (also known as a Class 1 procedure) where a 5G RAN node (that consumes the data) can request other 5G RAN nodes to report data, for example measurements, predictions or feedback information. While this type of procedure was used in the past to configure periodic reporting of metrics, the novelty of the procedure adopted is that it can also be used to configure event-based reporting. Namely, such a ‘handshake’ would be able, at the same time, to configure periodic reporting of certain data as well as to define the events upon which other data would be reported to deduce the impacts of specific actions (for example: handovers) or events on the overall system.
Together with this Class 1 procedure, a Class 2 procedure was introduced. This procedure serves the purpose of reporting the requested data from the node that produces the data to that which consumes the data. Data will be reported periodically, if requested so, or they will be reported after a given event and for the duration of a specific time window, if the consuming node has so requested.
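The interplay between the two procedures can be sketched as a small simulation. This is not the 3GPP message format: the class names, the fields, the time units and the `after:` label are hypothetical and chosen only for illustration. In line with the data type agnostic principle, nothing in the messages says whether an item is used as an input, an output or feedback.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DataRequest:
    """Class 1 style request from the consuming node (hypothetical fields)."""
    periodic: Dict[str, int] = field(default_factory=dict)     # data item -> period in seconds (> 0)
    event_based: Dict[str, int] = field(default_factory=dict)  # event name -> reporting window in seconds


@dataclass
class DataReport:
    """Class 2 style report from the producing node."""
    time_s: int
    items: List[str]


class ProducerNode:
    def __init__(self) -> None:
        self.request: Optional[DataRequest] = None
        self.windows: Dict[str, Tuple[int, int]] = {}  # event name -> (start, end) of reporting window

    def handle_request(self, request: DataRequest) -> bool:
        """Store the configuration; the return value stands in for the response message."""
        self.request = request
        return True

    def on_event(self, event: str, now_s: int) -> None:
        """Open a reporting window, but only for events the consumer asked about."""
        if self.request is not None and event in self.request.event_based:
            self.windows[event] = (now_s, now_s + self.request.event_based[event])

    def report(self, now_s: int) -> Optional[DataReport]:
        """Build the report due at now_s, or return None if nothing is due."""
        if self.request is None:
            return None
        items = [name for name, period in self.request.periodic.items() if now_s % period == 0]
        items += [f"after:{event}" for event, (start, end) in self.windows.items()
                  if start <= now_s <= end]
        return DataReport(now_s, items) if items else None


producer = ProducerNode()
accepted = producer.handle_request(
    DataRequest(periodic={"prb_utilization": 10}, event_based={"handover": 5})
)
producer.on_event("handover", now_s=12)
reports = [producer.report(t) for t in (10, 12, 15, 18, 20)]
for r in reports:
    print(r)
```

In this run the periodic item is reported at 10 s and 20 s, the event-based data is reported at 12 s and 15 s while the window opened by the handover is active, and nothing is sent at 18 s.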
As can be appreciated, the procedures put in place by RAN3 in support of AI/ML are not trivial. Many aspects remain to be clarified and further explored.
One such aspect is whether the procedures agreed so far should be enhanced with the possibility of configuring multiple events of different kinds. This would enable the configuration of a single measurement context between the data consumer node and the data producer node, within which all needed data can be reported. An alternative approach would be to set up a measurement context for every data item the consumer might need. This, however, would require maintaining numerous contexts and would trigger more signaling, resulting in memory and processing overload.
Other aspects currently under intense discussion in RAN3 concern how to achieve continuous time series of data collected from the same UE across mobility and RRC state transitions.
In the area of training, our vision is that it should be possible to generate training data that consistently represent the network performance as seen through the ‘eyes’ of a single UE. According to this line of thought, multiple data sets, each from one UE, would be available as training data to be fed to a model training function. This would remove the need for complex data normalization ahead of model training, as each time series would be collected from the same UE and therefore not be subject to the measurement ‘jumps’ that would occur if data were collected by different UEs with different capabilities and implementations.
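As a rough illustration of the per-UE view, the sketch below groups collected samples into one time-ordered series per UE. It assumes that a stable identifier is available to correlate samples from the same UE. How that correlation is achieved across mobility and state transitions is exactly the kind of question still under discussion, so the `ue_id` key, the tuple layout and the values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[str, int, float]  # (ue_id, timestamp_s, measurement); hypothetical layout


def per_ue_time_series(samples: Iterable[Sample]) -> Dict[str, List[Tuple[int, float]]]:
    """Group samples by UE and sort each group by timestamp.

    Each resulting series comes from a single UE and can be handed to a
    training function as its own data set.
    """
    series: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for ue_id, timestamp_s, value in samples:
        series[ue_id].append((timestamp_s, value))
    return {ue_id: sorted(points) for ue_id, points in series.items()}


# Made-up samples, arriving out of order and interleaved across two UEs.
samples = [
    ("ue-1", 20, 48.0),
    ("ue-2", 10, 12.5),
    ("ue-1", 10, 50.0),
    ("ue-2", 20, 13.0),
]
training_sets = per_ue_time_series(samples)
for ue_id, points in training_sets.items():
    print(ue_id, points)
```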
As we described above, the new procedure can be used to configure event-based reporting. To provide the flexibility needed to selectively obtain data when required, one easy way would be to support events based on threshold conditions to trigger data collection. Such events are easy to define and also enable efficient collection of data by avoiding unnecessary data exchange. They can also reduce the processing and storage requirements of the node by allowing selective retrieval of the information needed for AI/ML model optimization. This is because the data is collected only when the conditions of the defined events are fulfilled, so the resources needed to transfer and process such data are used efficiently.
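A threshold-based event of this kind is simple to express. The sketch below is a minimal illustration, assuming an ‘at or above threshold’ condition on a single metric. The class name, the metric name and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ThresholdEvent:
    """Hypothetical event: fulfilled when the metric is at or above the threshold."""
    metric: str
    threshold: float

    def is_met(self, value: float) -> bool:
        return value >= self.threshold


def collect_on_event(event: ThresholdEvent, values: Iterable[float]) -> List[float]:
    """Keep only the values observed while the event condition is fulfilled."""
    return [v for v in values if event.is_met(v)]


# Made-up PRB utilization samples in percent.
high_load = ThresholdEvent(metric="prb_utilization_percent", threshold=80.0)
observed = [35.0, 62.5, 81.0, 90.5, 40.0]
reported = collect_on_event(high_load, observed)
print(f"reported {len(reported)} of {len(observed)} samples: {reported}")
```

Here only two of the five samples would be transferred and processed, which is the saving the event-based approach is after.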
What to expect in the coming years
The ongoing work surely constitutes a milestone in defining the next stage of 5G, namely 5G Advanced. It is also setting the groundwork for future mobile networks standardization and will surely trigger more work to extend solutions based on AI/ML in different directions.
It is clear that the objectives of the work 3GPP is carrying out are very ambitious. It is also likely that some of the goals within the Rel-18 work item will not be achieved in full before its release later this year and that they will naturally be moved to the next coming release, namely Rel-19.
Possible use cases for future work items
This includes enhancements of the AI/ML support for energy saving, where Rel-19 may tackle solutions to enable predictions of energy costs as a consequence of intended energy saving actions, or AI/ML support for mobility optimization, where optimization involving dual connectivity may also be taken into account.
Rel-19 may also host new studies concerning new use cases to be supported by AI/ML. Among them, one interesting use case is dynamic cell shaping, where cells are dynamically shaped to achieve optimal radio performance, load distribution and energy efficiency. Cell shape changes may have a cascade impact on the network configuration and performance. For example, a change of cell shape may change the neighbor cell relations used to trigger UE mobility, or it may require adaptation of cell coverage in other cells to avoid coverage holes or high inter-cell interference. For these reasons this use case would benefit from AI/ML support. AI/ML algorithms may be able to take multiple inputs into account to infer the cell shaping actions that maximize network performance.
If we broaden our view, another step towards system-wide performance optimization would be to use AI/ML not only for RAN-level optimization but also for UE performance optimization. A clear example of how this could occur would be to enhance the network energy saving use case into a use case that also optimizes the UE energy consumption. In line with this view, UEs could provide a measure of their consumed energy, for example as a consequence of specific NG-RAN reconfiguration actions. This would lead to a process where the impact of each energy saving action is scored also on the basis of its impact on the UE. The advantage could be longer battery life for end users, while maintaining optimal performance and energy consumption on the network side. There are of course many more areas towards which the work on AI/ML in 3GPP RAN3 could be directed.
An exciting time for everyone in tech
The enthusiasm and focus that all parties in 3GPP RAN3 are showing towards this topic is surely a sign of the fact that the only limitation to develop the topic further is time. Indeed, the introduction of AI/ML in NG-RAN systems is considerably complex and subject to many different interpretations, depending on the design choice and requirements taken. Developing AI/ML based solutions for any use case has proven to be time consuming and technically challenging.
For these reasons, work in 3GPP will have to be directed towards those use cases where AI/ML can deliver considerable and tangible benefits when compared to existing techniques. Equally important would be to achieve a standard that is based on a common and solid understanding of how the standardized procedures work. It would be better to converge on ‘simpler’ solutions, which are fully interoperable across multi-vendor networks, than on more complex but less interoperable solutions. The latter would lead to a surge of cases where AI/ML based solutions may deliver worse performance than legacy methods, therefore making AI/ML less trustworthy and less reliable.
The road ahead is long and bumpy, but if you are an engineer, it is truly a lot of fun.
We should walk this path with the curious eyes of a child and with the steady pace of an experienced mountaineer.