Artificial intelligence in next-generation connected systems

The next generation of networks will be able to sense, compute, learn, reason, and act on business intent almost autonomously, and to manage the ongoing explosion of data from an ever-increasing number of connected intelligent devices. Artificial intelligence (AI) plays a key role in enabling automation, managing complexity and scalability, and leveraging data from distributed systems in real time. This white paper reflects on the technical challenges the R&D community needs to address in order to help communications service providers (CSPs) and other industry players fully capitalize on the potential of AI.

Introduction

The combination of billions of connected devices, petascale computing, and advanced communication capabilities that enable real-time interactions is leading to the creation of systems at a scale and level of complexity beyond the ability of humans to fully comprehend and control. The management and operation of these systems require an extremely high degree of intelligent automation.

An upside of the growing scale and the multitude of interactions driving system complexity is that they make it possible to gather vast amounts of data from different parts of the system. This data can be fed into models and methods that derive deep insights, which in turn can be used to optimize and tailor the system's behavior. Ultimately, the system is able to learn from the outcome of its own behavior.

We are convinced that machine learning, machine reasoning, and other technologies from the field of AI offer the best opportunity to achieve the high levels of automation necessary to manage the complexity and optimization of system performance. Some of this work has already begun as we can see with initiatives to support AI in standards development organizations such as the 3rd Generation Partnership Project (3GPP) and the Open Radio Access Network (O-RAN) Alliance. However, to fully capitalize on the potential of AI, we must first overcome the five technical challenges outlined in this white paper.

Intelligent networks for industries

While it is certain that AI will play a key role in the automation of next-generation systems in a wide variety of industry sectors, it is particularly relevant to increasing the level of autonomy in telecom networks, which in turn enables transformation in other industries such as manufacturing and transportation.

Zero-touch cognitive networks

Across the globe, CSPs are deploying the fifth generation of 3GPP mobile networks (in other words, 5G) and also getting ready for the next generation, which will be able to sense, compute, learn, reason, and act on business intent almost autonomously, thus moving toward zero-touch cognitive networks. Future networks will need to manage the ongoing explosion of data from an ever-increasing number of connected, intelligent devices and a multiplicity of new use cases at the network edge and in the cloud, along with complicated network topologies. This will bring new challenges in the areas of complexity, scalability, robustness, and trustworthiness to network operations, design, and rollout [1].

Traditionally, a mobile network is designed and managed by telecom experts who rely heavily on their extensive knowledge of the network topology, the subscribers’ mobility and usage patterns, and the radio propagation models to design and configure the policies that continuously orchestrate the network. With 5G, these topologies have grown more complex due to smaller cells and more advanced radio technology. Usage patterns have become less predictable for humans alone, and the radio propagation models have become harder to compute with new radio spectrum bands and denser topologies. As a result, AI is essential in assisting CSPs in laying out and operating 5G networks. To move toward the realization of a zero-touch paradigm of cognitive networks, and to reason and decide in an automated fashion across complex dependencies and broad domains, AI will have to become an even more integral part of the networks.

A mobile network is a highly distributed and decentralized system. As shown in Figure 1, AI capabilities are added to the network architecture to enable data processing for various purposes, both locally (close to where data is created) and centrally (where data can be consolidated). Local learning and decision-making take place at distributed sites, while data and knowledge are blended across sites for a comprehensive global understanding of networks, services, and functions.

Figure 1. Local and global learning and automated decision-making in large distributed networks

Each local site is a rich source of data on the state of the different components, the time series of events, and associated contextual information. This information can be used to build models of local behavior, whereas reasoning is required to process the knowledge gathered across sites in order to infer system-wide insights. Ideally, the knowledge and insights derived at one site can be used at other sites for better predictions. The network infrastructure is evolving to support the increasing need for real-time capabilities: programmable data pipelines that can handle the volume, velocity, and variety of real-time data, and algorithms capable of real-time decision-making.

Adding more intelligence to networks, services, and business processes will allow a transition to data-driven operation that enables a higher degree of automation, performance, and efficiency. With an increasing level of autonomy, the role of the CSP will change to steering the network through intents and supervising the automation while remaining in control. Intelligent functions can be customized for each of these services, allowing them to operate in a more resilient, trustworthy, and secure manner and taking mobile networks to a new level of innovation for the benefit of industry and society [2].

Connected automated industries

The transformation toward Industry 4.0 is in full progress. Data from connected production units and items in production enables a level of efficiency and flexibility that was not possible before. Anomaly detection and root-cause analysis are used to remove bottlenecks and increase the production yield. Insights from prediction algorithms contribute to planned maintenance and increase the level of automation.

5G mobile systems are being designed according to requirements tailored for industries and will include support for bounded latency with high reliability, which will further contribute to the transformation [3]. For example, control logic for individual production units that was previously contained in ruggedized devices on the factory floor can, with a 5G connection, be offloaded to on-premises IT infrastructure, becoming an integral part of the factory operations control.

By offloading software (SW) from the device, the wireless connection adds flexibility, increases efficiency, and reduces the need for device software management. Mobile connected units such as automated guided vehicles (AGVs), together with collaborating robots controlled from an edge cloud infrastructure, can now facilitate complex automated production in real time, assisted by AI in the cloud. Another example is the introduction of wirelessly connected AR/VR devices: specialists can remotely assist local maintenance or even control delicate operations with haptic feedback. In addition, with software offloaded from the device, batteries will last longer and the user can benefit from more powerful AR/VR applications.

While AI promises to connect and automate industry, making room for disruptive innovation, it also poses many challenges: data is often heterogeneous and distributed, and there are hard requirements on real-time autonomous decision-making. Furthermore, the safety-criticality of many industrial use cases, most of which arise during human-machine collaboration, sets stringent boundaries on the requirements, which adds to the complexity.

Automotive and transportation industries

Advances in the automotive and transportation industries are leading to the introduction of vehicles with increasing levels of automation, targeting different purposes such as the transportation of people and goods, and utility and infrastructure management. These vehicles are also associated with different operational domains, ranging from confined, controlled environments (for example, harbors, mines, and urban areas) to wide areas of public roads.

Connectivity can play different roles, depending on timing and geographical scale. From an individual vehicle's perspective, real-time data sharing allows increased situational awareness and potentially agreement-seeking interactions with other vehicles. Also, edge- and cloud-based AI instances can process real-time data from road infrastructure and off-board sensors to supervise and extend the capabilities enabled by the vehicle's on-board sensors. Furthermore, the enormous amounts of sensor data generated by vehicles can be exploited centrally, possibly offline, in order to continuously improve operational safety and efficiency.

Connectivity and data analytics are also crucial for system-level traffic optimization, which entails different geographical and temporal horizons depending on the type of optimization, from local traffic adaptations that accommodate unexpected local emergencies to long-term flow optimizations. All of these optimizations benefit from the increasing real-time data availability in the connected IoT ecosystem, gathering cross-domain insights on the status of connectivity, mobility, and infrastructure.

Safety is clearly a prime concern for the automotive and transportation industry. While on-board sensors will continue to be essential to guaranteeing safety on public roads, central analytics and connectivity are gradually taking on a crucial role in safe vehicle operations in confined areas, as well as in vehicle and system performance on public roads. AI-based predictability of networks, off-board sensors, and system performance will be an important ingredient in automated vehicle operations, while automation will be essential for timely fault identification and mitigation at all levels of the system. In essence, vehicle safety design will take network and cloud-based functionalities into account, and data-based methods will be part of the solution, enabling a safe-by-design system.

It is well understood that AI is applicable to almost all operations across many industries and can help achieve efficiency through intelligent and adaptive automation. The applications presented above illustrate several facets of connectivity and AI in large industries.

Addressing the challenges

There are five main challenges that must be addressed to ensure the acceptance of AI as a viable approach for the intelligent automation of complex systems operating in or close to real time. We believe that solution approaches will require strong domain knowledge and a deep understanding of the underlying connectivity and communication aspects.

Systems that can make decisions autonomously

Zero-touch operations mean a reduced level of human involvement and, thus, an increased level of network autonomy to cope with growing system complexity and size and with shrinking decision-making lead times. One of the key differentiators of zero-touch operations is self-adapting automation, where unforeseen situations, intents, and requirements can be addressed without human intervention.

Transforming network operations in which decisions are made by humans and then executed by machines, as well as those where decisions are made solely by machines, requires the machines to understand the perceived situation, that is, to relate it to existing background knowledge. We need to give machines more autonomy by moving from imperative (what to do) to declarative (what to achieve) objective specifications and give them the means of provisioning relevant domain knowledge. We call such declarative objective specifications intents. Intents can express functional, nonfunctional, and operational goals, constraints, and requirements [4].
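To make the distinction concrete, the following is a minimal sketch in Python that contrasts a declarative intent with an imperative specification. The intent schema, metric names, and thresholds are illustrative assumptions, not a standardized intent model.

```python
# A minimal sketch of an intent: a declarative objective specification.
# All field names and thresholds below are illustrative assumptions.

intent = {
    "id": "intent-001",
    "goals": [  # functional and nonfunctional goals to achieve
        {"metric": "latency_ms", "target": "<= 10", "scope": "slice:urllc-1"},
        {"metric": "availability", "target": ">= 0.9999", "scope": "slice:urllc-1"},
    ],
    "constraints": [  # operational constraints to respect while doing so
        {"metric": "energy_kwh_per_day", "target": "<= 500", "scope": "site:*"},
    ],
}

# Contrast with an imperative specification, which prescribes what to do
# rather than what to achieve:
imperative = ["scale_out(vnf='upf', replicas=3)", "set_power(cell='A7', dbm=40)"]
```

An intent of this kind leaves the system free to choose any combination of actions that satisfies the stated goals and constraints.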

Knowledge-intensive systems use formal models to interpret perceived situations and execute actions based on autonomously made decisions. Such models can be obtained in different ways: learned from data using a machine-learning (ML) approach, provided by domain experts, or discovered automatically by search during logical inferencing.

Applying a solely traditional data science approach, where a model is trained on all available data, has several disadvantages. First, such an approach may not scale and can thereby become infeasible: the state vector can contain a huge number of features, and the potential combinations of situations encountered at training and inferencing time can be countless. Second, such an approach cannot handle unexpected situations, as there is no data to train on. Instead, we need to divide the problem into subproblems and orchestrate a variety of more specific methods, using a so-called agent-based approach. Agents can be traditional ML components, analytical models, expert rules, and so on. The orchestration of agents can be driven by a continuous optimization loop that tries to bridge the gap between the current state and the desired state (specified by intent), according to the system's understanding of the network state.
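The sketch below illustrates such an orchestration loop under simplifying assumptions: the gap() scoring function, the toy agents, and the actions are hypothetical placeholders for real network functions and models.

```python
# A minimal sketch of an agent-based orchestration loop. Agents may wrap
# ML models, analytical models, or expert rules; the orchestrator keeps
# selecting the most promising proposal until the intent is fulfilled.

import random

class RuleAgent:
    """An expert-rule agent: proposes an action when its condition fires."""
    def propose(self, state):
        if state["load"] > 0.8:
            return {"action": "scale_out", "expected_gain": 0.3}
        return None

class MLAgent:
    """Stand-in for a learned model; here it just guesses an expected gain."""
    def propose(self, state):
        return {"action": "retune_params", "expected_gain": random.random() * 0.2}

def gap(state, intent):
    """Hypothetical score of the distance to the intended state (0 = fulfilled)."""
    return max(0.0, state["load"] - intent["max_load"])

def apply_action(state, action):
    """Hypothetical actuation: scaling out reduces load in this toy model."""
    return {"load": state["load"] - 0.25} if action == "scale_out" else state

def orchestrate(state, intent, agents):
    """Continuously bridge the gap between the current and desired states."""
    while gap(state, intent) > 0:
        proposals = [p for a in agents if (p := a.propose(state)) is not None]
        if not proposals:
            break  # no agent can help: escalate to a human operator
        best = max(proposals, key=lambda p: p["expected_gain"])
        state = apply_action(state, best["action"])  # actuate on the network
    return state

print(orchestrate({"load": 0.9}, {"max_load": 0.7}, [RuleAgent(), MLAgent()]))
```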

Figure 2. Properties of future systems that take automation to the next level

We see that hybrid approaches will be useful in next-generation intelligent systems where robust learning of complex models is combined with symbolic logic that provides knowledge representation, reasoning, and explanation facilities. The knowledge could be, for example, universal laws of physics or the best-known methods in a specific domain.

Intelligent systems must be endowed with the ability to make decisions autonomously to fulfill given objectives, the robustness to solve a problem in several different ways, and flexibility in decision-making that utilizes various pieces of both prepopulated and learned knowledge.

Distributed and decentralized intelligence

In a large distributed system, decisions are made at different locations and levels. Some decisions are based on local data and governed by tight control loops with low latency. Other decisions are more strategic, affect the system globally, and are made based on data collected from many different sources. Decisions made at a higher global level may also require real-time responses in critical cases such as power-grid failures, cascading node failures, and so on. The intelligence that automates such large and complex systems must reflect their distributed nature and support the management topology.

Data generated at the edge, in a device or network edge node, will at times need to be processed in place. It may not always be feasible to transfer data to a centralized cloud; there may be laws governing where data can reside, as well as privacy or security implications of data transfer. The scale of decisions in these cases is restricted to a small domain, so the necessary algorithms and computing power are usually fast and light. However, local models may be based on incomplete and biased statistics, which can lead to a loss of performance. There is a need to leverage the scale of distribution, make appropriate abstractions of local models, and transfer the insights gained to other local models.

Learning about global data patterns from multiple networked devices or nodes without having access to the actual data is also possible. Federated learning has paved the way on this front, and more distributed training patterns such as vertical federated learning and split learning have emerged. These architectures allow machine-learning models to adapt their deployment to the requirements they must fulfill in terms of data transfer, compute, memory, and network resource consumption while maintaining strong performance guarantees. However, more research is needed, in particular to cater to different kinds of models and model combinations and to provide stronger privacy guarantees.
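As a concrete illustration of the basic (horizontal) federated learning idea, the sketch below implements federated averaging on a toy linear-regression problem: each site trains on its own data, and only model weights leave the site. A real deployment would use a federated-learning framework and add secure aggregation and privacy mechanisms, all of which this sketch omits.

```python
# A minimal sketch of federated averaging (FedAvg) with plain NumPy.

import numpy as np

def local_update(w, X, y, lr=0.02, epochs=10):
    """A few steps of local gradient descent on one site's private data."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the MSE loss
        w = w - lr * grad
    return w

def fedavg(sites, w, rounds=20):
    """Average locally trained weights, weighted by each site's sample count."""
    for _ in range(rounds):
        updates = [(local_update(w, X, y), len(y)) for X, y in sites]
        total = sum(n for _, n in updates)
        w = sum(n * wk for wk, n in updates) / total
    return w

rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0])
sites = []
for shift in (0.0, 3.0):  # two sites with differently distributed (non-IID) data
    X = rng.normal(shift, 1.0, size=(100, 2))
    sites.append((X, X @ true_w + rng.normal(0, 0.1, size=100)))

print(fedavg(sites, w=np.zeros(2)))  # approaches [1.5, -2.0]
```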

A common distributed and decentralized paradigm is required to make the best use of local and global data and models, as well as to determine how to distribute learning and reasoning across nodes to fulfill extreme latency requirements. Such paradigms may themselves be built using machine learning and other AI techniques to incorporate features of self-management, self-optimization, and self-evolution.

Trustworthy AI

AI-based autonomous systems comprise complex models and algorithms; moreover, these models evolve over time with new data and knowledge without manual intervention. The dependence on data, the complexity of algorithms, and the possibility of unexpected emergent behavior of AI-based systems require new methodologies to guarantee transparency, explainability, technical robustness and safety, privacy and data governance, nondiscrimination and fairness, human agency and oversight, societal and environmental wellbeing, and accountability. These elements are crucial for ensuring that humans can understand and, consequently, establish calibrated trust in AI-based systems [5].

Explainable AI (XAI) is used to achieve transparency of AI-based systems by explaining to stakeholders why and how an AI algorithm arrived at a specific decision. The methods are applicable to multiple AI techniques, such as supervised learning, reinforcement learning (RL), and machine reasoning [5]. XAI is acknowledged as a crucial feature for the practical deployment of AI models in systems and for satisfying the fundamental rights of AI users related to AI decision-making; it is essential in telecommunications, where standardization bodies such as ETSI and IEEE emphasize the need for XAI for the trustworthiness of intelligent communication systems.
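As a concrete illustration, the sketch below applies one simple model-agnostic XAI technique, permutation feature importance, to a toy classifier: a feature matters if shuffling its column hurts accuracy. The telecom-flavored feature names are illustrative assumptions.

```python
# A minimal sketch of permutation feature importance.

import numpy as np

def permutation_importance(model, X, y, feature_names, rng):
    """Score each feature by the accuracy lost when its column is permuted."""
    baseline = np.mean(model(X) == y)
    scores = {}
    for j, name in enumerate(feature_names):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])  # break the feature's relation to the target
        scores[name] = baseline - np.mean(model(Xp) == y)
    return scores

rng = np.random.default_rng(1)
features = ["signal_strength", "cell_load", "time_of_day"]
X = rng.normal(size=(500, 3))
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # only the first two features matter

model = lambda X: (X[:, 0] - X[:, 1] > 0).astype(int)  # a "perfect" toy model
print(permutation_importance(model, X, y, features, rng))
# signal_strength and cell_load score high; time_of_day stays near zero.
```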

The evolving nature of AI models requires either new approaches or extensions to existing approaches to ensure the robustness and safety of AI models during both training and deployment in the real world. Along with the statistical guarantees provided by adversarial robustness, formal verification techniques could be tailored to give deterministic guarantees for safety-critical AI-based systems. Security is one contributor to robustness: both data and models must be protected from malicious attacks. The privacy of the data, that is, its source and intended use, must be preserved, and the models themselves must not leak private information. Furthermore, data should be validated against fairness and domain expectations because of the bias it can introduce into AI decisions.
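As a minimal illustration of adversarial-robustness testing, the sketch below applies a perturbation in the style of the fast gradient sign method (FGSM) to a toy logistic model and measures the resulting accuracy drop; the model and data are stand-ins, not a production system.

```python
# A minimal sketch of an FGSM-style adversarial-robustness check.

import numpy as np

def fgsm(w, b, X, y, eps):
    """Worst-case L-infinity perturbation of size eps for a logistic model."""
    p = 1 / (1 + np.exp(-(X @ w + b)))      # predicted probabilities
    grad_x = (p - y)[:, None] * w[None, :]  # gradient of the loss w.r.t. inputs
    return X + eps * np.sign(grad_x)        # step that most increases the loss

rng = np.random.default_rng(2)
w, b = np.array([2.0, -1.0]), 0.0
X = rng.normal(size=(1000, 2))
y = (X @ w + b > 0).astype(float)  # labels produced by the model itself

for eps in (0.0, 0.1, 0.3):
    Xa = fgsm(w, b, X, y, eps)
    acc = np.mean(((Xa @ w + b) > 0).astype(float) == y)
    print(f"eps={eps:.1f}  accuracy={acc:.3f}")  # accuracy drops as eps grows
```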

Since the stakeholders of AI systems are ultimately humans, methods such as those based on causal reasoning and data provenance need to be developed to provide accountability for decisions. The systems should be designed to continuously learn and refine the stakeholder requirements they are set to meet, and to escalate to a higher level of automated decision-making, or ultimately to a human, when they do not have sufficient confidence in certain decisions.
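A minimal sketch of such confidence-gated escalation is shown below; the threshold and actions are illustrative assumptions.

```python
# A minimal sketch of confidence-gated escalation: act autonomously only
# when confidence is high enough, otherwise hand the decision upward.

def decide(action, confidence, threshold=0.9):
    if confidence >= threshold:
        return f"execute: {action}"
    return f"escalate: {action} (confidence {confidence:.2f} < {threshold})"

print(decide("reroute traffic", confidence=0.97))  # executed autonomously
print(decide("shut down cell", confidence=0.62))   # escalated for review
```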

Human-machine collaboration

Connected, intelligent machines of varied types are becoming more present in our lives, ranging from virtual assistants to collaborative robots, or cobots [6]. For proper collaboration, it is essential that these machines can understand human needs and intents accurately. Furthermore, all data related to these machines should be available for situational awareness. AI is fundamental throughout this process to enhance the capabilities and collaboration of humans and machines.

Advances in natural language processing and computer vision have made it possible for machines to interpret human inputs more accurately. This is enhanced by considering nonverbal communication, such as body language and tone of voice. The accurate detection of emotions is now evolving and can support the identification of more complex behaviors, such as tiredness and distraction. In addition, progress in areas such as scene understanding and semantic-information extraction is crucial to having a complete knowledge representation of the environment (see Figure 3). All this perceptual information should be used by the machine to determine the optimum action that maximizes the collaboration. Reinforcement learning (RL), in which a policy is trained to take the best action given the current state and observation of the environment, is receiving increasingly more attention [6]. To avoid unsafe situations, strategies like safe AI are under investigation to ensure safety along the RL model life cycle. RL is described in more detail in the next section.

AI has also enabled a more complete understanding of how machines operate, with the aid of digital twins. Extended reality (XR) devices are increasingly used in mixed-reality setups to visualize detailed machine data while interacting with digital twins. This increases human understanding of how machines are operating and helps anticipate their actions. In combination with the XR interface, XAI can be applied to provide the reasons for a decision taken by the machine.

To make collaboration happen, it is also important that machines respond and interact with humans in a timely manner. As the AI methods involved in a collaborative setup can have high computational complexity and machines may have constrained hardware resources, a distributed intelligence solution is required to achieve real-time responses. This means that the communication infrastructure plays a key role in the whole process by supporting ultra-reliable low-latency communication.

Figure 3. Properties needed for human-machine collaboration

Bringing games and simulations to industrial scale

RL is a machine-learning technique that trains a decision-making model from experience in a data-driven way. An RL agent learns by taking actions in an environment: it both exploits its current knowledge and explores the environment with random actions to gain new knowledge. Using the outcomes of these actions, for example rewards, the agent updates its policy with the goal of maximizing some notion of cumulative reward.
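The sketch below renders this loop as tabular Q-learning with epsilon-greedy exploration on a toy chain environment; the environment and hyperparameters are illustrative and not drawn from any telecom system.

```python
# A minimal sketch of tabular Q-learning with epsilon-greedy exploration.

import random

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:
                a = random.randrange(n_actions)  # explore with a random action
            else:
                best = max(Q[s])                 # exploit (ties broken randomly)
                a = random.choice([i for i, q in enumerate(Q[s]) if q == best])
            s2, r, done = env.step(a)
            # Move Q(s, a) toward the reward plus the discounted future value.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

class ChainEnv:
    """Toy environment: walk right along a 5-state chain to reach the reward."""
    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):  # action 0: left, action 1: right
        self.s = min(self.s + 1, 4) if a == 1 else max(self.s - 1, 0)
        done = self.s == 4
        return self.s, (1.0 if done else 0.0), done

Q = q_learning(ChainEnv(), n_states=5, n_actions=2)
print([max(range(2), key=lambda a: Q[s][a]) for s in range(4)])  # typically [1, 1, 1, 1]
```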

One key success factor in applying RL in virtual environments has been to train agents competing against each other, exploring the range of alternative actions. In this regard, more research is required to figure out how to fully utilize the strengths of RL and other state-of-the-art algorithms and tailor them to industrial settings and constraints. One avenue is to develop a safe exploration strategy that explores the environment in a controlled and safe manner, for example by identifying state-action regions or temporal slices where some actions are permitted and confining exploration to them. Another option is to pretrain an agent in a virtual environment such as a simulator or emulator [7], where the agent can explore without limits. This entails a principled way of transferring to a real environment, referred to as Sim2Real. Two recent approaches to Sim2Real are domain randomization, where the simulation parameters are randomized for generalized training, and domain adaptation, where the simulation-trained model is shifted closer to the real domain. A third option is offline RL, in which an agent learns from a static dataset that includes trajectories of states, actions, and rewards generated by a logging policy. As the agent cannot interact with the environment, a critical challenge during training is to avoid being biased toward the dataset, which usually contains limited and biased knowledge.
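As a minimal illustration of the domain-randomization idea, the sketch below evaluates candidate control policies across simulator instances with randomized dynamics and deploys the one that performs best on average. The one-parameter policy (a controller gain) and the toy plant are simplifying assumptions.

```python
# A minimal sketch of domain randomization for Sim2Real.

import random

def simulate(gain, plant, steps=50, dt=0.1):
    """Toy simulator: drive state x to zero; 'plant' scales the actuation."""
    x, cost = 1.0, 0.0
    for _ in range(steps):
        u = -gain * x           # proportional control policy
        x = x + dt * plant * u  # plant dynamics with unknown scaling
        cost += x * x           # penalize distance from the setpoint
    return cost

def train_with_randomization(candidates, n_domains=200, seed=0):
    rng = random.Random(seed)
    domains = [rng.uniform(0.5, 2.0) for _ in range(n_domains)]  # random plants
    # Deploy the gain with the lowest average cost across randomized domains.
    return min(candidates, key=lambda k: sum(simulate(k, p) for p in domains))

gain = train_with_randomization(candidates=[0.5, 1.0, 2.0, 4.0, 8.0])
print(gain, simulate(gain, plant=1.3))  # evaluate on an unseen "real" plant
```

A policy trained this way trades peak performance on any single configuration for robustness across the whole randomized family, which is what allows it to transfer to the real, unknown configuration.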

For industrial systems, including telecom networks, however, it is not always possible to explore all actions while the system is operational, because an undesirable action-state combination could lead to degraded performance. With more research focused on these techniques, which are not exclusive but complementary, we believe that the exploration challenge can be addressed and that the untapped power of RL can successfully be brought to industrial systems.

Conclusion

Based on our insights about intelligent mobile networks of the future and ongoing developments in industry automation, this white paper has highlighted five technical challenges that we believe must be addressed in order to fully capitalize on the potential of AI technologies. The technical challenges involve

  1. systems that can make decisions autonomously,
  2. distributed and decentralized intelligence,
  3. trustworthy AI,
  4. human-machine collaboration, and
  5. bringing games and simulations to industrial scale.

We believe these challenges are best addressed through approaches based on strong domain knowledge, including a deep understanding of the underlying connectivity and communication aspects. We are working to address all of these challenges, and we urge the larger research and development community to join us in finding solutions.

Abbreviations

3GPP 3rd Generation Partnership Project
AI Artificial Intelligence
CSP Communications Service Provider
ML Machine Learning
O-RAN Open Radio Access Network
RL Reinforcement Learning
XAI Explainable AI
XR Extended Reality

References


Authors

Viktor Berggren

Viktor Berggren is a Principal Researcher in artificial intelligence at Ericsson Research. He holds an M.Sc. in industrial and management engineering from Linköping Institute of Technology, Sweden. His current focus is on artificial intelligence and coming generations of mobile networks. Viktor has more than 30 years of experience at Ericsson in leading positions within research, product management, and systems management.

Rafia Inam

Rafia Inam is a Senior Project Manager at Ericsson Research in the area of AI. She has conducted research for Ericsson for the past six years on 5G for industries, network slice management, AI for automation, and intelligent transport systems. She specializes in automation and safety for cyber-physical systems (CPS) and collaborative robots, trustworthy AI, XAI, and risk assessment and mitigation using AI methods. Rafia received her Ph.D. in predictable real-time embedded software from Mälardalen University in 2014. She has co-authored 40+ refereed scientific publications and 50+ patent families and is a program committee member, referee, and guest editor for several international conferences and journals.

Leonid Mokrushin

Leonid Mokrushin is a principal researcher at Ericsson Research. With a background in computer science and formal methods, he is currently focusing on knowledge-intensive symbolic AI systems and their practical applications in the telecom domain. He joined Ericsson in 2007 after postgraduate studies at Uppsala University in Sweden, where he specialized in the formal verification of real-time systems. He holds an M.S. in software engineering from Peter the Great St. Petersburg Polytechnic University, Russia.

Alberto Hata

Alberto Hata is a senior researcher in artificial intelligence at Ericsson Research. He has been working on topics related to Industry 4.0, human-machine collaboration, edge computing, and computer vision. He obtained his Ph.D. degree in 2016 from the University of São Paulo (USP), Brazil, with a thesis on localization, mapping, and perception for autonomous vehicles and autonomous mobile robots.

Jaeseong Jeong

Jaeseong Jeong is a senior specialist in AI for optimization at Ericsson Research. His research interests include machine learning, large-scale optimization, and telecom data analytics. He received his Ph.D. degree from the Korea Advanced Institute of Science and Technology (KAIST) in 2014. Before joining Ericsson, he was a postdoctoral researcher in the Automatic Control Department at the KTH Royal Institute of Technology, Sweden.

Swarup Kumar Mohalik

Swarup Kumar Mohalik is a Principal Researcher at Ericsson Research. His expertise is in the areas of AI and formal methods, and his work primarily focuses on applying them to telecommunications, service automation, and the Internet of Things (IoT). He has research experience in the formal specification and verification of real-time embedded software and in AI planning techniques. Swarup holds a Ph.D. in computer science from the Institute of Mathematical Sciences, Chennai, India, and completed a postdoctoral fellowship at LaBRI, University of Bordeaux, France.

Julien Forgeat

Julien Forgeat is an artificial intelligence Principal Researcher at Ericsson Research. He joined Ericsson in 2010 after spending several years working on network analysis and optimization. He holds an M.Eng. in computer science from the National Institute of Applied Sciences in Lyon, France. At Ericsson, Julien worked on mobile learning, the Internet of Things (IoT), and big data analytics before specializing in machine learning and AI infrastructure. His current research focuses on the software components required to run AI and machine learning workloads on distributed infrastructures, as well as the algorithmic approaches best suited for complex distributed and decentralized use cases.

Stefano Sorrentino

Stefano Sorrentino is a Principal Researcher at Ericsson Research in Stockholm, Sweden, coordinating research activities on automotive, transportation, and public safety. He has years of experience as a 3GPP RAN1 delegate and has contributed to the creation of new systems for public safety and automotive. Stefano is the author of several IEEE conference and journal papers, and he received the Ericsson Inventor of the Year Award in 2018. More recently, Stefano has been a board member and Chairman of the System Architecture group in the 5G Automotive Association (5GAA).