How to automate resource dimensioning in cloud

Resource dimensioning is one of the key problem areas in resource allocation in telecom and cloud environments. This blog post explains some key challenges and presents a way of resolving them using a Reinforcement Learning-based approach.

Senior Researcher, cloud intelligence

Master Researcher, Cloud intelligence

What is resource dimensioning in cloud?

Resource dimensioning refers to the process of identifying the amount of resources needed to accommodate Network Functions (NFs) in a cost-effective fashion. It involves aligning the performance criteria defined in a Service Level Agreement (SLA) with the resources allocated within the infrastructure for each Network Function (NF). This alignment ensures the desired application performance levels are achieved while optimizing the utilization of Network Functions Virtualization Infrastructure (NFVI) or cloud resources. The same is true for microservices where the challenge is to determine the resource needs of individual microservices to achieve a given KPI for the entire application, or alternatively, for one or more specific microservices.

A Network Service (NS) generally contains a set of NFs with some dependencies. The dependency relationship among the NFs can be represented as a chain of NFs (in the case of Service Function Chain (SFC)) or by a more general graph topology. In cloud computing, a similar concept can be found in microservices whereby an application is composed of a set of microservices that interact with each other to implement the functionality of the application.

An NS is often accompanied by KPI requirements that are specified in its SLA. These KPI requirements indicate the expected level of performance for the NS. Examples include throughput, average latency, or a maximum allowed level of jitter.

Resource dimensioning is one of the key problem areas in resource allocation in telecom and cloud environments. In SFCs, for instance, there are clear dependencies between (physical/virtual) NFs: the output of the first NF is the input of the second, and so on. The performance of a given NF depends not only on the type and amount of resources available to it, but also on the performance of other NFs in the chain (for example, the preceding NFs). This becomes even more complicated in topologies such as split and merge topologies, which makes developing accurate solutions for resource dimensioning a challenging task.

Use case

Let us consider the 5G Core (5GC) network architecture shown in Figure 1. The figure shows the main NFs in 5GC associated with the packet core and user data management domain. In this architecture, each NF offers one or more services to other NFs through various interfaces. These NFs communicate with each other for different procedures and each procedure triggers different NFs. Therefore, the performance of a given NF depends on the performance of other NFs as well. Understanding these interactions and interdependencies is important to achieve efficient resource dimensioning.

Figure 1: 5GC Network Architecture

Figure 2 shows Ericsson’s dual-mode 5GC portfolio as one realization of 5GC covering both Evolved Packet Core (EPC) and 5GC capabilities. The figure depicts the components of 5GC/EPC NFs grouped into a number of products. The green boxes indicate existing EPC NFs and the orange boxes show 5GC NFs. The software architecture is based on cloud-native design principles. These cloud-native NFs can be virtualized or physical NFs (VNFs and PNFs).

The purpose of resource dimensioning is to ensure that the NFs within 5GC are allocated the correct amount of resources so that they can achieve the expected performance of the application with respect to various operation requirements.

One example of a procedure in 5GC is the registration procedure of the user equipment (UE). The registration procedure is initiated by the UE and communicated to the 5GC through the 5G base station, the gNodeB (gNB). When the registration request is received by the Access and Mobility Management Function (AMF) in the 5GC, it initiates the default Protocol Data Unit (PDU) session setup. At this stage, different 5GC NFs are involved in handling the registration request, including the Session Management Function (SMF), Policy Control Function (PCF), and Authentication Server Function (AUSF). The AMF authenticates the UE after obtaining the keys from the AUSF and then obtains subscription data from the Unified Data Management (UDM). It then creates a policy association with the PCF. The PCF registers with the AMF so that it can be notified of events like location changes and communication failures. The AMF then updates the SMF context and sends an initial context setup request to activate the default PDU session. The SMF assigns an IP address and the tunnel ID to be used for sending uplink data, and selects the User Plane Function (UPF) to be used for the session. The AMF notifies the SMF when the session is ready for uplink and downlink data transfer. Once the session is ready, the gNB enables security between the UE and itself and activates the default PDU session. At this point, downlink and uplink data streams are ready to flow between the UE and the internet.

The registration procedure can have requirements in terms of, for instance, completion time (the time it takes for the registration of the UE with the 5GC to complete) or registration rate (the number of registrations the 5GC can complete per unit time). For a given set of requirements, we need to understand how many resources to allocate to individual NFs. If the resources of an individual NF introduce a bottleneck, it can degrade the performance of the entire 5GC.

A key challenge is to know how many resources to allocate to individual NFs while considering the interdependencies between the NFs. Today, this is often a manual task where an expert determines beforehand the amount of resources needed for each NF to ensure a specific level of performance for the entire application. To automate this process, we propose a Reinforcement Learning (RL)-based approach that takes in the KPI requirements for an NS and outputs the resource dimensions (amount of resources to be allocated) for each NF in the NS. Our RL-based approach can be trained for different conditions and different requirements. It can even be trained to provide resource dimensions for different NSs with different NFs and different dependencies among the NFs.

Figure 2: Ericsson’s dual-mode 5G Core portfolio

Our RL-based solution addresses the following challenges for resource dimensioning in cloud

As shown in the previous section, NFs communicate with each other during different procedures. Capturing these interdependencies between NFs is crucial, since the performance of an NF does not depend only on the NF itself. Accordingly, one must dimension the resources for each NF taking into consideration its interactions with other NFs.

In addition, classical solutions based on profiling have high resource consumption. This is because they need to run the application/NS with different combinations of resources for the NFs. Unless something is done to limit this, the number of combinations grows exponentially with the number of NFs in the NS. Trying out all possible combinations of resources would be very costly and take a very long time.
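To give a feel for this growth, here is a back-of-the-envelope sketch. The number of resource levels per NF (10) is purely an illustrative assumption, and the function name is hypothetical.

```python
def profiling_runs(num_nfs: int, levels_per_nf: int) -> int:
    """Number of resource combinations an exhaustive profiler would have to run."""
    return levels_per_nf ** num_nfs


# Assumed: every NF can be given one of 10 discrete resource levels.
for num_nfs in (2, 5, 13):
    print(num_nfs, "NFs ->", profiling_runs(num_nfs, levels_per_nf=10), "runs")
```

Even with a coarse grid per NF, the number of runs quickly moves out of reach as NFs are added.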

Yet another challenge is that systems change over time, so models become invalid or have to be updated. Existing data might no longer reflect the actual state of the real system, in which case the accuracy of the model degrades.

How Reinforcement Learning works

Reinforcement Learning (RL) is a branch of AI in which an RL agent learns to take the correct action for a given system state through trial and error. The agent does this by repeatedly performing the following steps, as shown in Figure 3.

  1. interacting with the environment,
  2. observing how the system responds, and
  3. taking actions to maximize a long-term goal based on the knowledge available to the agent at each stage.

Figure 3: Reinforcement Learning Block Diagram

Actions are followed by feedback from the environment, either negative values (punishment) or positive values (reward), which allows the agent to make better decisions in the future. In RL, the environment can be a simulation of a system or a real system.

The training phase in RL is divided into episodes. In each episode, the RL agent repeatedly chooses actions based on its current state with the purpose of reaching some target state. A commonly used strategy is the epsilon-greedy policy, which balances exploration and exploitation: with a probability epsilon the agent takes a random action, and otherwise it takes the action with the highest estimated reward. At the beginning of the training, the agent mostly explores, attempting to discover the behavior of the environment by taking random actions without knowing what is going to happen. This allows the agent to improve its knowledge of the effects of each action in a given state. Such episodes often end without reaching the target state. Gradually, the agent exploits its current knowledge and prefers actions that have been found to produce good rewards. Such episodes are more likely to reach the target state once the model is well trained. At the end of the training, the agent should have enough knowledge to make a good decision for each state of the environment it is likely to encounter.
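The standard epsilon-greedy rule can be sketched in a few lines. This is a generic illustration rather than the exact policy used in our solution; the decay schedule and its constants are assumptions chosen only to show the gradual shift from exploration to exploitation.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Pick an action index: random with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))  # explore
    return max(range(len(action_values)), key=lambda a: action_values[a])  # exploit


def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.98):
    """Hypothetical schedule: epsilon shrinks geometrically but never drops below `end`."""
    return max(end, start * decay ** episode)
```

With `epsilon` close to 1 the agent behaves almost randomly, and with `epsilon` close to 0 it almost always picks the action it currently believes is best.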

The RL problem can be modeled as a Markov Decision Process (MDP), a mathematical framework for decision-making problems in which the outcomes depend on both the actions taken and the state of the underlying system. The MDP formalizes the RL agent’s problem and stores the knowledge about the managed system. This knowledge is defined by:

  • a set of states representing the position of the agent at a specific time-step in the environment
  • a set of actions representing the decisions we want the agent to learn
  • the transition probability from one state to the next state
  • an immediate reward obtained after executing an action, reflecting how good the executed action was.

The purpose of the RL agent is to obtain this knowledge through its interactions with the system.
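In the usual textbook notation, this can be written compactly as follows. The symbols, and in particular the discount factor \gamma, follow common convention and are not taken from our implementation.

```latex
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R), \qquad
P(s' \mid s, a) = \Pr\left(s_{t+1} = s' \mid s_t = s,\; a_t = a\right), \qquad
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \right]
```

Here \mathcal{S} is the set of states, \mathcal{A} the set of actions, P the transition probability, R the immediate reward, and \pi^{*} the policy that maximizes the expected long-term reward.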

RL is one way to solve the problem of resource dimensioning in cloud environments. As such, it can become a mechanism for service providers to fulfill their SLA, and for cloud infrastructure providers to efficiently use their resources. In the next section, we present how we adopt RL to solve the automated resource dimensioning problem.

Reinforcement Learning for automated resource dimensioning

When using RL to solve the problem of resource dimensioning in a cloud environment, the RL agent is trained for a number of episodes to produce a model that maps the requirements of the application (for example, target performance) to resource amounts. The goal of the RL agent is to identify the set of actions that results in each NF getting the amount of resources it needs to fulfill the application’s KPI or requirements.

The state space is composed of the current resources of the NFs and their current performance.

Each action in the action space increases or decreases a resource of an NF, within predefined minimum and maximum values.

The reward function is designed such that a positive reward is given when the performance of an NF is improving, a negative value when it is degrading, and a value of 0 when the performance does not change.
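A minimal sketch of these three ingredients for a single NF could look like this. The resource type (CPU millicores), the step size, the bounds, and the reward magnitudes of +1 and -1 are illustrative assumptions. The text above only fixes the sign of the reward.

```python
from dataclasses import dataclass


@dataclass
class NFState:
    cpu_millicores: int  # current resource allocation (assumed resource type and unit)
    throughput: float    # current measured performance (assumed KPI)


STEP, MIN_CPU, MAX_CPU = 100, 100, 4000  # assumed step size and min/max bounds


def apply_action(state: NFState, action: int) -> int:
    """action is +1 (increase) or -1 (decrease); the result is clamped to the bounds."""
    return min(MAX_CPU, max(MIN_CPU, state.cpu_millicores + action * STEP))


def reward(previous_perf: float, current_perf: float) -> int:
    """Positive if performance improved, negative if it degraded, 0 if unchanged."""
    if current_perf > previous_perf:
        return 1
    if current_perf < previous_perf:
        return -1
    return 0
```

Clamping keeps the agent from proposing allocations outside the allowed range, whatever sequence of actions it chooses.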

The state space and the action space in the resource dimensioning problem are too large to be explored exhaustively, so our approach is to use Deep RL (DRL) to solve the problem. DRL can take in very large inputs and decide what actions to perform to maximize the reward. This is done by using a Deep Neural Network (DNN) in the RL agent as a function approximator. With this, the DRL agent can estimate its preferred actions in different states based on a limited set of observations.
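As a rough illustration of a DNN used as a function approximator, the sketch below maps a state vector to one estimated value per action in plain Python. The hidden-layer sizes and the number of NFs follow the evaluation setup described later in this post. The input and output layout, the ReLU activation, and the random initial weights are assumptions, and training is omitted entirely.

```python
import random

N_NFS = 13              # number of NFs/microservices, as in the evaluation
STATE_SIZE = 2 * N_NFS  # assumed layout: (resources, performance) per NF
N_ACTIONS = 2 * N_NFS   # assumed layout: (increase, decrease) per NF
HIDDEN = [78, 78, 78]   # three hidden layers of 78 neurons each


def init_network(sizes, rng):
    """Create one (weights, biases) pair per layer with small random weights."""
    return [
        ([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def action_values(network, state):
    """Forward pass: state vector in, one estimated value per action out."""
    x = list(state)
    for index, (weights, biases) in enumerate(network):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if index < len(network) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers (an assumption)
    return x


network = init_network([STATE_SIZE, *HIDDEN, N_ACTIONS], random.Random(0))
values = action_values(network, [0.5] * STATE_SIZE)
print(len(values), "action values estimated for one state")
```

In practice a deep learning framework would handle both the forward pass and the training. The point here is only the shape of the mapping from state to action values.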

Figure 4 explains how our proposed RL-based approach works. The environment simulator can be implemented using existing tools, such as OpenAI Gym.

Figure 4: RL for Automated Resource Dimensioning

Step 1: The environment simulator needs to be instantiated with workload and performance-related parameters from the actual cloud system. To achieve that, the application of interest is deployed in some cloud environment, for example, Kubernetes.

Step 2: In this step, we monitor the resources consumed by the NFs (or microservices) from the hosting nodes and the performance of each NF for a time period. With that, we can extract and estimate some key workload and performance-related parameters, for example, relative workload distribution, relative service dependency, and relative service performance metrics per unit resources. For example, the relative service dependency represents the weights or the percentage of how much an NF’s performance depends on itself and on other dependent NFs.

Step 3: The environment simulator is then initialized using the estimations generated in step 2 to capture the behavior and the characteristics of the workload and the real system.
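One simple way to picture such a simulator is as a performance model driven by the parameters estimated in Step 2. The weighted-sum form below is an assumption made for illustration and not necessarily the model used in our simulator. All function and parameter names are hypothetical.

```python
def simulated_performance(resources, perf_per_unit, dependency):
    """Estimate each NF's performance from its own capacity and that of the NFs it depends on.

    resources[j]     -- resource units allocated to NF j
    perf_per_unit[j] -- estimated performance per resource unit of NF j
    dependency[i][j] -- weight of NF j in NF i's performance (each row sums to 1)
    """
    capacity = [r * p for r, p in zip(resources, perf_per_unit)]
    return [sum(w * c for w, c in zip(row, capacity)) for row in dependency]


# Two NFs: the first depends only on itself, the second equally on itself and the first.
print(simulated_performance(
    resources=[2, 2],
    perf_per_unit=[100.0, 50.0],
    dependency=[[1.0, 0.0], [0.5, 0.5]],
))
```

Because the simulator is parameterized only by these estimates, it can be re-initialized cheaply whenever fresh measurements become available.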

Step 4: After initializing the environment simulator, the training of the deep reinforcement learning agent starts. The training is done for a number of episodes. At the beginning of each episode, the agent starts with a random resource dimensioning and generates a random target performance indicating the expected performance of the application.

The resource dimensions are generated using a DNN, which generalizes from previously experienced states to new states through function approximation. The state is given as input and the value of all possible actions is generated as output. The input state of the DNN consists of the current resources of the NFs and their current performance. The generated resource dimensions are applied as actions on the environment simulator and a reward is calculated. The reward reflects whether the performance of the NFs is improving or degrading.

The actions are selected using a modified version of the epsilon-greedy policy. In each episode, we select between exploration and exploitation per NF. This means that for a graph of NFs, in a single episode, the agent explores for some NFs and exploits for others. In addition, the exploration actions for each NF are not drawn uniformly at random. Instead, the predicted values of the actions are taken, a probability distribution over the actions is calculated from them, and finally an action is sampled according to these probabilities.
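The per-NF selection described above can be sketched as follows. Using a softmax to turn predicted action values into probabilities is an assumption, since the text does not name the exact distribution, and the function names are hypothetical.

```python
import math
import random


def softmax(values):
    """Turn predicted action values into a probability distribution."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def select_actions(values_per_nf, epsilon, rng=random):
    """Choose one action index per NF, deciding explore vs. exploit independently per NF."""
    chosen = []
    for values in values_per_nf:
        if rng.random() < epsilon:
            # Explore: sample an action in proportion to its probability.
            action = rng.choices(range(len(values)), weights=softmax(values))[0]
        else:
            # Exploit: take the action with the highest predicted value.
            action = max(range(len(values)), key=lambda a: values[a])
        chosen.append(action)
    return chosen
```

Sampling in proportion to the predicted values keeps exploration focused on actions the model already considers promising, while still leaving some chance of trying the others.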

Once the RL agent is trained, the model is able to dimension resources for any target performance, given any starting resource dimensions.

Step 5: At this step, the solution receives a target KPI (or expected performance) for the application. Its responsibility is to generate resource dimensions for the NFs such that the application meets its KPI. To achieve this, the solution initializes the simulated environment with the latest estimated parameters obtained from the cloud environment (Step 3). It then dimensions resources using the trained RL model.

The solution then checks whether the application meets the target KPI. If it does, the process ends. This step might involve waiting for a predetermined amount of time to ascertain that the behavior of the application has stabilized following the change in the resources.

If the target KPI is not met, the solution goes back to Step 2 and extracts the latest workload and performance-related parameters from the cloud environment. The environment simulator is then updated and initialized again (Step 3) with these parameters. Once the environment simulator is initialized, we return to Step 5 to perform inference and generate resource dimensions for the NFs. These steps are repeated until the target KPI is met.
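The loop formed by Steps 2, 3 and 5 can be summarized in a short sketch. The four callables are hypothetical hooks into the monitoring stack, the simulator, the trained model and the cluster. The `max_rounds` safety limit and the `>=` comparison (which assumes a higher-is-better KPI such as throughput) are assumptions as well.

```python
def dimension_until_kpi_met(target_kpi, extract_parameters, init_simulator,
                            infer_dimensions, apply_and_measure, max_rounds=10):
    """Repeat Steps 2, 3 and 5 until the measured KPI reaches the target."""
    for round_number in range(1, max_rounds + 1):
        params = extract_parameters()                         # Step 2: latest estimates
        simulator = init_simulator(params)                    # Step 3: (re)initialize
        dimensions = infer_dimensions(simulator, target_kpi)  # Step 5: trained model
        measured = apply_and_measure(dimensions)              # apply, wait, then measure
        if measured >= target_kpi:
            return dimensions, round_number
    raise RuntimeError("target KPI not met within max_rounds")
```

Each round uses fresh measurements, so the simulator tracks the real system more closely every time the target is missed.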

The approach explained above has limitations: the accuracy of the model degrades when the characteristics of the application change due to factors like a software upgrade, a change in hardware, changes in workload characteristics, or even changes in the forwarding graph of the application. There are ways to address these limitations:

  • To capture different applications, the model can be trained for random service dependencies until it is accurate enough.
  • To capture the behavior of the system under different workload profiles, the model can be trained for random service performance metrics per unit resources until it is accurate enough.

These training phases can be run one after another, in any order. Once the model is accurate enough for a specific training purpose (capturing different applications, for example), it can be trained for another purpose (like capturing the behavior of the system under different workload profiles).

Evaluating the solution

We evaluated our proposal for a microservice-based application with 13 microservices.

  • Setup: The application was deployed in Kubernetes. We monitored the application’s metrics using Prometheus. The RL agent was trained for 200 episodes, with each episode limited to a maximum of 500 iterations. The DNN consisted of three hidden layers with 78 neurons each.
  • Metrics: We considered throughput as the target KPI. We extracted performance and workload-related parameters and used them to initialize the environment simulator, which was implemented with OpenAI Gym.
  • Results: We evaluated the convergence of the RL training: the RL agent started learning after 55 episodes. Training the RL model on an Nvidia A100 GPU took about eight hours. We also evaluated the inference process. In this evaluation, a trained RL agent generated the necessary resource dimensions after nine steps, achieving the targeted KPIs for the application. These nine inference steps took about one second.


In this blog post, we have presented the challenges associated with resource dimensioning considering interdependencies between NFs. This is a key problem because the amount of resources allocated to NFs determines how much performance we can get from the system. In addition, these resources need to be dimensioned efficiently. We presented our solution to overcome these challenges using an RL-based approach. We have also evaluated our solution to assess its performance.

In the future, we will look into a Multi-Agent Reinforcement Learning (MARL) based approach, where a team of agents collaborates to maximize a global reward based on local and remote observations. MARL allows computation to run in parallel, which can give the system a higher degree of scalability. We plan to do this by associating an RL agent with each NF. This will allow us to train a per-NF agent that learns the resource needs of its NF during training.

Learn more

Ericsson 5G Core (5GC)
