Symbolic reinforcement learning for safe RAN control

Video

Demonstrating "Symbolic reinforcement learning for safe RAN control", part of a paper accepted at AAMAS 2021, the 20th International Conference on Autonomous Agents and Multiagent Systems.

Abstract

Reinforcement learning (RL) is a powerful approach when optimal decisions must be made while interacting with uncertain environments. RL performs well when optimizing a given criterion encoded via a reward function, and has been applied in many use cases such as robotics, autonomous driving, and network optimization. However, the large-scale exploration performed by RL algorithms can sometimes take the system to unsafe states.

We present our work on a Symbolic Reinforcement Learning (SRL) based architecture for safe control in Radio Access Network (RAN) applications.

In this work, we address the problem of optimizing Remote Electrical Tilt (RET) in a network of fixed Base Stations (BSs) equipped with antennas whose vertical tilt angle can be adjusted, affecting the coverage area, signal quality, and related Key Performance Indicators (KPIs).
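To make the setting concrete, the RET control loop can be viewed as an RL environment: the action adjusts the antenna tilt, and the resulting KPIs define the reward. The sketch below is purely illustrative; the `RETEnv` class, its tilt range, and its toy KPI model (coverage shrinking and signal quality growing with downtilt) are our assumptions, not the paper's simulator.

```python
# Hypothetical sketch of RET control as an RL environment.
# The tilt/KPI dynamics are illustrative placeholders only.

class RETEnv:
    """One base-station antenna whose vertical tilt angle we adjust."""

    TILT_MIN, TILT_MAX = 0.0, 15.0  # degrees (assumed range)

    def __init__(self, tilt=7.0):
        self.tilt = tilt

    def _kpis(self):
        # Toy KPI model: coverage shrinks, signal quality grows with downtilt.
        coverage = 1.0 - self.tilt / self.TILT_MAX
        quality = self.tilt / self.TILT_MAX
        return {"coverage": coverage, "quality": quality}

    def step(self, action):
        """action in {-1, 0, +1}: decrease, keep, or increase tilt (degrees)."""
        self.tilt = min(self.TILT_MAX, max(self.TILT_MIN, self.tilt + action))
        kpis = self._kpis()
        # Reward trades off the two KPIs; the real criterion is richer.
        reward = 0.3 * kpis["coverage"] + 0.7 * kpis["quality"]
        return kpis, reward
```

An agent trained against such an environment proposes tilt changes step by step; the safety question is which of those proposals the network should actually be allowed to execute.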

Recent research has addressed RET optimization with RL strategies, valued for their self-learning capability and adaptability in uncertain environments; however, the exploration these strategies perform can drive the network into unsafe states.

Here, safety refers to keeping the network KPIs within specified bounds, guaranteeing that performance is maintained when the algorithms are deployed in a live network.

We propose a fully automated procedure in which a user specifies high-level logical safety requirements for a given cellular network topology, so that the network achieves optimal performance, measured through KPIs, while remaining safe.

Using our automated tool, a user can shield an RL agent that optimizes performance in a given cellular network by selecting a high-level safety specification, expressed in Linear Temporal Logic (LTL) over certain KPIs.
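The shielding idea can be sketched as a filter between the agent and the network: before an action is executed, a safety monitor (in the paper, derived by model checking from the LTL specification) decides whether the action can violate the KPI bounds. The functions below are our illustrative assumptions; `safe_next` stands in for the model-checked automaton for a specification such as G(coverage >= 0.3), using a toy coverage model.

```python
# Hypothetical sketch of a shield filtering an RL agent's actions.
# safe_next() stands in for the automaton obtained by model checking
# an LTL specification such as G(coverage >= 0.3).

def safe_next(tilt, action, tilt_max=15.0, min_coverage=0.3):
    """Would this tilt change keep the (toy) coverage KPI within bounds?"""
    next_tilt = min(tilt_max, max(0.0, tilt + action))
    coverage = 1.0 - next_tilt / tilt_max  # toy coverage model
    return coverage >= min_coverage

def shielded_action(tilt, proposed, fallback=0):
    """Pass the agent's proposal through unless the shield blocks it."""
    if safe_next(tilt, proposed):
        return proposed
    return fallback  # a known-safe default, e.g. keep the current tilt
```

For example, with the toy model a downtilt step from 7 degrees is allowed, while the same step from 10 degrees would push coverage below the bound and is replaced by the safe fallback.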

Safety is ensured through model-checking techniques applied to combined discrete system models (automata) abstracted from the reinforcement learning process.

The video demonstrates the user interface (UI), which helps the user set intent specifications for the system and inspect how the agent's proposed actions differ with and without the shield, which allows or blocks actions according to the safety specification.

arXiv preprint