How AI-powered capacity planning will impact network expansion decisions

An explosion in data consumption calls for an efficiency-focused approach to networks, with reducing total cost of ownership as a key focus. So how can communication service providers leverage AI to improve network capacity planning, improve subscriber experience, and reduce TCO?

Dec 18, 2020 | 7 min.

Eugene Pang

Network complexity is ever increasing. The introduction of 5G on top of legacy 2G, 3G and 4G networks, coupled with subscribers’ increasing expectations of a mobile experience close to fiber broadband, puts tremendous pressure on the communication service providers managing day-to-day operations. Service providers also face immense financial challenges due to decreasing revenue per gigabyte and market saturation, making it critical for survival to ensure maximum return on network investment decisions.

How can we leverage AI to transform our approach to network investment decisions, in order to make it faster, more granular, and able to quickly assess a variety of precise what-if scenarios that take traffic forecasts, user experience and revenue potential into consideration?

From slow and manual to fast, scalable and flexible

A typical capacity planning exercise starts with the planning strategy phase. This is where network performance, user experience expectations, market segmentation and business strategy are analyzed to form the basis of the input to the capacity planning phase.

Figure 1: Planning phases

In the capacity planning phase, spreadsheets are often used in the absence of a more sophisticated approach. The pressure falls on planning engineers to use their best knowledge to build spreadsheets that perform traffic forecasting and performance prediction.

The legacy manual way of planning requires substantial resources and time. A typical medium-sized network of 10,000 sites requires weeks of labor-intensive tasks such as data collection, data aggregation, forecasting, triggering, dimensioning and prioritization. After such immense effort, only a few scenarios are produced for business case evaluation before producing the Bill of Quantity (BoQ) that feeds the TCO assessment. Based on our experience, 60 percent of the time is devoted to data collection and low-value manual data processing using spreadsheets, leaving about 20 percent for the planning strategy phase and 20 percent for the scenario evolution phase, which are the crucial phases where engineers can add a lot of value.

Because of this inefficient approach, manual planning typically does not analyze the network holistically. Instead, it looks at only a few KPIs to produce a final proposal and applies a “one rule fits all” principle that assumes all cells are equal, so it cannot address the needs of individual cells in a surgical manner. Manual planning is also prone to human error, so results can vary from time to time.

Figure 2: Manual execution vs. machine-learning-powered execution

Cognitive planning automates complex data collection, aggregation, forecasting, dimensioning triggers and prioritization, reducing a real planning scenario analysis from weeks to five minutes in an LTE network of 1,800 sites.

With machine learning, each cell is modelled individually based on its unique characteristics. Multivariable modeling techniques consider multiple KPIs to predict the performance of each cell individually. This surgical analysis replaces the “one rule fits all” method and can address individual cell capacity issues in greater detail.

The tremendous reduction in turnaround time enables operators to explore multiple scenarios by incorporating different planning inputs into the tool. By making full use of the flexible input, it is possible to segment the network into multiple market areas, to set thresholds on more than 20 KPIs and dimensions, and to flexibly strategize spectrum allocation.

Cognitive planning enables quick exploration of multiple what-if scenarios that balance the TCO of different capacity expansion options against performance gains. It turns a seasonal activity into one that can be triggered at any time, and it is key to deciding on the optimum investment strategy going forward.
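To make the what-if idea concrete, here is a minimal Python sketch, not the tool’s implementation: cells are grouped into hypothetical market areas, each scenario sets a different throughput target per area, and each scenario yields a list of expansion candidates and a cost. The `Cell` fields, the predicted throughput values and the unit `CARRIER_COST` are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    name: str
    market: str                # hypothetical market-area label
    predicted_tput_mbps: float  # forecast busy-hour user throughput

# Hypothetical unit cost of one carrier expansion (arbitrary units).
CARRIER_COST = 1.0

def evaluate_scenario(cells, targets_by_market):
    """Return (cells to expand, total cost) for one what-if scenario.

    A cell becomes an expansion candidate when its predicted throughput
    falls below the experience target set for its market area.
    """
    to_expand = [c.name for c in cells
                 if c.predicted_tput_mbps < targets_by_market[c.market]]
    return to_expand, len(to_expand) * CARRIER_COST

cells = [
    Cell("A1", "urban", 4.2),
    Cell("A2", "urban", 6.8),
    Cell("B1", "rural", 2.9),
    Cell("B2", "rural", 3.6),
]

scenarios = {
    "uniform 5 Mbps": {"urban": 5.0, "rural": 5.0},
    "urban 5 / rural 3 Mbps": {"urban": 5.0, "rural": 3.0},
}

for label, targets in scenarios.items():
    expand, cost = evaluate_scenario(cells, targets)
    print(label, expand, cost)
```

Relaxing the rural target drops B2 from the candidate list, which is the kind of cost-versus-experience trade-off each scenario is meant to expose.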

From network-centric to user-centric

How can you predict future performance? A common approach is to use simple rules that map utilization to user throughput. For example: DLPRB utilization > 80 percent leads to user throughput < 5 Mbps; DLPRB utilization > 92 percent leads to user throughput < 3 Mbps. These simple rules are applied as a blanket and fail to consider cell bandwidth, load and radio quality.

Figure 3: Operator forecast and prediction method
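The blanket rule described above fits in a few lines. This Python sketch encodes only the two thresholds quoted in the text. The "no degradation predicted" label and the example cell attributes are hypothetical. The cells are included only to show that the rule never reads them.

```python
def legacy_throughput_rule(dlprb_util_pct: float) -> str:
    """Blanket rule quoted in the text: utilization alone decides the verdict."""
    if dlprb_util_pct > 92:
        return "< 3 Mbps"
    if dlprb_util_pct > 80:
        return "< 5 Mbps"
    return "no degradation predicted"

# Two hypothetical cells with the same utilization but very different
# bandwidth, load and radio quality.
quiet_cell = {"bandwidth_mhz": 20, "active_users": 3, "cqi": 12, "dlprb_util_pct": 85}
busy_cell = {"bandwidth_mhz": 10, "active_users": 40, "cqi": 7, "dlprb_util_pct": 85}

for cell in (quiet_cell, busy_cell):
    # Only utilization is read; the other attributes never reach the rule.
    print(legacy_throughput_rule(cell["dlprb_util_pct"]))  # prints: < 5 Mbps (both times)
```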

Resource utilization, such as DLPRB utilization, is just one of several factors that impact user throughput. To illustrate the importance of load and quality, we take cells with full LTE carrier bandwidth (20 MHz) and high utilization (DLPRB > 80 percent), plot their active users against CQI, and use color to indicate user throughput.

As the graph shows, not all cells in this group have bad user throughput (< 5 Mbps) despite high utilization (DLPRB > 80 percent), which debunks the legacy rule. This is because user experience also depends on network quality (CQI) and load (active users): a cell can still perform well at high utilization as long as the load is sufficiently low (few active users) and the quality is good (high CQI). This shows why the “one rule fits all” methodology is less effective for targeted investment and proactive user experience assurance.

Figure 4: User throughput variation of high PRB utilization cells

In the following graph, we can see that cells with high utilization (DLPRB > 80 percent) display the full range of user throughput, from low to high. If all cells with high utilization are proposed for expansion, those that still perform well, with user throughput above 5 Mbps, will be over-dimensioned, resulting in unnecessary expansion expenditure. Conversely, there may be other groups of cells where utilization is not as high (DLPRB < 80 percent) yet user throughput is bad (< 5 Mbps). These go unnoticed and are not captured as expansion candidates, so their subscribers end up suffering.

Figure 5: DLPRB utilization versus user throughput
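The two failure modes of a utilization-only trigger can be counted directly. In this sketch, the 80 percent trigger and the 5 Mbps target come from the text. The `CellKpi` records are made-up examples.

```python
from dataclasses import dataclass

@dataclass
class CellKpi:
    name: str
    dl_prb_util_pct: float
    user_tput_mbps: float

UTIL_TRIGGER_PCT = 80.0  # legacy expansion trigger quoted in the text
TPUT_TARGET_MBPS = 5.0   # experience target quoted in the text

def legacy_trigger_errors(cells):
    """Return (over_dimensioned, missed) cell names for a utilization-only trigger."""
    # Flagged by the trigger although users already meet the target.
    over_dimensioned = [
        c.name for c in cells
        if c.dl_prb_util_pct > UTIL_TRIGGER_PCT and c.user_tput_mbps >= TPUT_TARGET_MBPS
    ]
    # Not flagged by the trigger although users are below the target.
    missed = [
        c.name for c in cells
        if c.dl_prb_util_pct <= UTIL_TRIGGER_PCT and c.user_tput_mbps < TPUT_TARGET_MBPS
    ]
    return over_dimensioned, missed

cells = [
    CellKpi("C1", 88, 9.5),   # flagged by the trigger, yet users are fine
    CellKpi("C2", 91, 3.1),   # flagged and genuinely congested
    CellKpi("C3", 62, 4.0),   # not flagged, yet users suffer
    CellKpi("C4", 45, 12.0),  # not flagged and fine
]
print(legacy_trigger_errors(cells))  # (['C1'], ['C3'])
```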

Cognitive planning overcomes this shortcoming through individual cell KPI modelling using machine learning, in two main steps: traffic forecasting and KPI prediction.

Historical data is automatically collected, processed and stored as the basis for these two processes. Traffic growth is forecast using Holt or Holt-Winters models. A per-cell machine learning model then predicts the performance of each individual cell, using the traffic forecast and referencing the KPI model, which is the heart of KPI prediction.
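For reference, Holt’s linear method (double exponential smoothing) tracks a level and a trend and extrapolates both. The sketch below is a plain-Python version with fixed, hypothetical smoothing constants. It omits the seasonal term that Holt-Winters adds. A production tool would typically fit `alpha` and `beta` to the data rather than hard-code them.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear trend method (no seasonality).

    series  : historical observations, e.g. weekly busy-hour traffic of one cell
    alpha   : level smoothing constant, 0 < alpha <= 1
    beta    : trend smoothing constant, 0 < beta <= 1
    horizon : number of future periods to forecast
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Initialize the level with the first value and the trend with the first difference.
    level, trend = float(series[0]), float(series[1] - series[0])
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecast: last level plus h times the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]

# Illustrative weekly traffic (GB) for one cell; the values are made up.
weekly_gb = [120, 124, 129, 131, 136, 140, 145, 149]
print(holt_forecast(weekly_gb, alpha=0.5, beta=0.3, horizon=4))
```

On a perfectly linear history, the method reproduces the line exactly. Noisier histories are smoothed more heavily as `alpha` and `beta` get smaller.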

The KPI model is an important component of cognitive planning. It is unique to every cell and is retrained daily using the latest data. The slope and angle of each per-cell model are influenced by the cell’s own characteristics, which are in turn affected by bandwidth, load, radio quality and other variables. This method improves the granularity and accuracy of KPI and performance prediction for each individual cell.

Figure 6: Machine learning performance prediction
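The text does not say which algorithm the per-cell model uses. As a stand-in, this sketch fits an ordinary least-squares regression per cell with NumPy, predicting user throughput from DLPRB utilization, active users and CQI. The feature set, the linear form and the synthetic training data are all assumptions. Refitting on each day’s latest data window would mirror the daily retraining described above.

```python
import numpy as np

def fit_cell_model(prb_util, active_users, cqi, user_tput):
    """Fit one cell's KPI model by ordinary least squares.

    Returns coefficients [intercept, b_prb_util, b_active_users, b_cqi].
    """
    X = np.column_stack([np.ones(len(prb_util)), prb_util, active_users, cqi])
    coef, *_ = np.linalg.lstsq(X, np.asarray(user_tput, dtype=float), rcond=None)
    return coef

def predict_tput(coef, prb_util, active_users, cqi):
    """Predict user throughput (Mbps) for one cell from its fitted coefficients."""
    return float(coef @ np.array([1.0, prb_util, active_users, cqi]))

# Synthetic busy-hour history for one hypothetical cell (30 samples).
rng = np.random.default_rng(0)
prb = rng.uniform(30, 100, 30)
users = rng.uniform(1, 50, 30)
cqi = rng.uniform(5, 13, 30)
tput = 20 - 0.1 * prb - 0.2 * users + 0.8 * cqi + rng.normal(0, 0.3, 30)

coef = fit_cell_model(prb, users, cqi, tput)
# Predict at future conditions, e.g. higher utilization and more users.
print(round(predict_tput(coef, prb_util=90, active_users=40, cqi=9), 2))
```

Because each cell gets its own coefficients, two cells at the same utilization can receive very different throughput predictions. The blanket rule cannot express that. Translating the traffic forecast into future utilization and user counts is not shown here.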

From unidimensional to multi-dimensional

Identifying poorly performing cells is just the first part of the story; applying the right solution is just as crucial to avoid unnecessary expansions. Capacity and performance are often analyzed in a unidimensional way, with a one-to-one mapping of resources like PRBs to capacity and performance. However, cell capacity and performance are also largely influenced by two other important factors: radio quality and load. Optimal TCO requires a new way to evaluate capacity, one that ensures the right solution is applied to each problem.

Figure 7: A new way of looking into capacity

Identifying candidates for optimization

By applying additional criteria in cognitive planning, such as radio quality (CQI) and spectral efficiency, cells with both poor radio quality and poor performance are listed as candidates for optimization instead of expansion. As shown in Figure 8, radio quality (CQI) can have a profound impact on the maximum traffic volume a cell can carry. All cells in the graph below have high utilization (DLPRB between 90 and 100 percent), yet the maximum traffic volume each cell can carry varies with its radio conditions. Cells with good quality (CQI 10) can carry 53 percent more traffic than those with lower quality (CQI 8). Optimization actions are therefore the first resort for boosting the capacity and performance of poor radio quality cells.

Cognitive planning identifies poor radio quality cells whose capacity can be improved through optimization, rather than spending CAPEX on hardware expansion.

Figure 8: CQI impact on data volume at full DLPRB utilization
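A rule-based sketch of the optimization-first filter might look like this. The text names CQI and spectral efficiency as criteria but gives no thresholds. `CQI_MIN`, `SPECTRAL_EFF_MIN`, and the choice to flag a cell when either radio metric is poor are therefore assumptions. Only the 5 Mbps experience target comes from the text.

```python
from dataclasses import dataclass

@dataclass
class CellStatus:
    name: str
    avg_cqi: float
    spectral_eff_bps_hz: float
    predicted_tput_mbps: float

# Hypothetical radio-quality thresholds (not given in the text).
CQI_MIN = 9.0
SPECTRAL_EFF_MIN = 1.5
# Experience target used elsewhere in this post.
TPUT_TARGET_MBPS = 5.0

def optimization_candidates(cells):
    """Cells that perform poorly AND have poor radio quality are routed
    to optimization first instead of hardware expansion."""
    return [
        c.name for c in cells
        if c.predicted_tput_mbps < TPUT_TARGET_MBPS
        and (c.avg_cqi < CQI_MIN or c.spectral_eff_bps_hz < SPECTRAL_EFF_MIN)
    ]

cells = [
    CellStatus("D1", 7.5, 1.1, 3.2),   # poor radio, poor performance: optimize
    CellStatus("D2", 11.0, 2.4, 3.8),  # good radio, poor performance: load balancing/expansion path
    CellStatus("D3", 8.0, 1.3, 7.0),   # poor radio, but meets the target
]
print(optimization_candidates(cells))  # ['D1']
```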

Identifying candidates for load balancing

Cognitive planning software always simulates load balancing among co-sector carriers before recommending capacity expansion, to exhaust any room for load-balancing improvements ahead of investment. Using the machine-learning-trained KPI model, the load-balancing effect is simulated by assuming a traffic shift from one carrier to another and then predicting the performance of all the carriers within the same sector.

Expansion is only recommended if the predicted performance after the simulation still does not meet the designated experience criteria. Otherwise, the tool outputs a list of cells advised for load balancing, for engineers to perform load-balancing optimization.

Figure 9: Machine learning-powered load-balancing simulation
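The load-balancing check can be sketched as a greedy loop. It repeatedly moves a small slice of busy-hour traffic from the worst-performing carrier in a sector to the best one, then re-predicts every carrier. It stops once all carriers meet the target or no move improves the worst carrier. The linear, clamped throughput models and the carrier names are hypothetical stand-ins for the trained per-carrier KPI models. The greedy step search is our own simplification, not the tool’s documented algorithm.

```python
def make_linear_model(peak_mbps, mbps_per_gb):
    """Hypothetical stand-in for a trained KPI model: throughput falls
    linearly with carried traffic and never drops below zero."""
    return lambda traffic_gb: max(0.0, peak_mbps - mbps_per_gb * traffic_gb)

def recommend_action(traffic_gb, models, target_mbps, step_gb=1.0, max_moves=1000):
    """Return ("no action" | "load balancing" | "expansion", traffic split)."""
    def predict(split):
        return {carrier: models[carrier](gb) for carrier, gb in split.items()}

    traffic = dict(traffic_gb)
    if min(predict(traffic).values()) >= target_mbps:
        return "no action", traffic

    for _ in range(max_moves):
        tput = predict(traffic)
        worst = min(tput, key=tput.get)
        best = max(tput, key=tput.get)
        if worst == best or traffic[worst] < step_gb:
            break
        # Tentatively shift one step of traffic from the worst to the best carrier.
        trial = dict(traffic)
        trial[worst] -= step_gb
        trial[best] += step_gb
        if min(predict(trial).values()) <= min(tput.values()):
            break  # the shift no longer helps the worst carrier
        traffic = trial
        if min(predict(traffic).values()) >= target_mbps:
            return "load balancing", traffic

    # Balancing alone cannot reach the target: recommend expansion.
    return "expansion", traffic

models = {"L800": make_linear_model(30, 0.5), "L1800": make_linear_model(40, 0.4)}
print(recommend_action({"L800": 55, "L1800": 20}, models, target_mbps=5))
print(recommend_action({"L800": 58, "L1800": 85}, models, target_mbps=5))
```

In the first sector, shifting 5 GB to the less loaded carrier is enough to meet the target. In the second sector, both carriers are saturated and no split reaches 5 Mbps, so expansion is recommended.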

The value of prioritization

At a time when service providers are cautious about CAPEX spending, a good prioritization methodology is essential to maximize the return on investment (ROI) while ensuring that the most critical areas are addressed. Legacy methodologies put this objective at risk, as they fail to identify the right action (optimization, load balancing or expansion) and lack the flexibility to capture all relevant metrics.

For example, one of the service providers that we worked with relied solely on utilization (DLPRB utilization). When management told the planning department to prioritize sectors for upgrade within the allocated CAPEX, they would typically prioritize sectors with the highest DLPRB utilization. However, as the analysis above shows, utilization is just one factor that affects throughput performance, and it fails to indicate how many users stand to benefit from a potential capacity expansion.

We can think of a network like a restaurant chain owned by Mr. Smith, who has two restaurants: A and B. As his business is expanding, he is ready to invest in one more establishment, located near either restaurant A or B. His main goal is to offload the crowds, as he often receives complaints from customers about long waiting times. Naturally, he wants to maximize the return on his investment.

If the only metric Mr. Smith considers when deciding where to position his third outlet is the space occupancy rate of his current two restaurants, he is doing exactly what service providers do when they rely solely on DLPRB utilization for prioritization. This is tricky, because both restaurants are crowded at peak hours. However, if Mr. Smith also studies the number of customers being served together with those waiting outside, he gets a much better picture of which restaurant to offload in order to reduce customer waiting time and maximize the return on his expansion. Restaurant B is the obvious choice to be offloaded by his new restaurant. But if Mr. Smith prioritizes based solely on space occupancy rate, he might invest in the wrong place, failing both to resolve the most severe waiting-time problem and to maximize his return.

Figure 10: Multidimensional dimensioning explained with an analogy

In network planning terms, load is analogous to the number of customers, utilization (DLPRB utilization) to the space occupancy rate, and quality (CQI) to the size of the restaurant. When we worked on prioritization with our customer, we first pinpointed cells that could potentially be improved through optimization, such as radio quality and load-balancing optimization. In the next step, for the remaining cells, we incorporated the number of RRC users, a close approximation of load, into the prioritization process as the prioritization metric. Additional thresholds were set depending on cell bandwidth, leading to the more accurate ranking presented below.

Figure 11: Prioritization threshold
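A minimal sketch of bandwidth-aware prioritization follows. It assumes that the cells passed in have already been screened for optimization and load balancing. It scores each remaining cell by its RRC users relative to a per-bandwidth threshold and ranks the cells above threshold. The values in `RRC_THRESHOLD_BY_BW_MHZ` and the ratio-based score are hypothetical; the text does not publish the thresholds or the exact ranking formula.

```python
from dataclasses import dataclass

@dataclass
class CandidateCell:
    name: str
    bandwidth_mhz: int
    rrc_users: float  # busy-hour RRC connected users, a proxy for load

# Hypothetical thresholds: wider carriers must serve more users to qualify.
RRC_THRESHOLD_BY_BW_MHZ = {5: 25, 10: 50, 15: 75, 20: 100}

def prioritize(cells):
    """Rank expansion candidates by RRC users relative to their bandwidth threshold."""
    scored = []
    for c in cells:
        score = c.rrc_users / RRC_THRESHOLD_BY_BW_MHZ[c.bandwidth_mhz]
        if score >= 1.0:  # keep only cells above their bandwidth-specific threshold
            scored.append((c.name, round(score, 2)))
    # Highest load relative to the threshold comes first.
    return sorted(scored, key=lambda item: item[1], reverse=True)

cells = [
    CandidateCell("E1", 20, 180),
    CandidateCell("E2", 10, 95),
    CandidateCell("E3", 20, 90),
    CandidateCell("E4", 15, 120),
]
print(prioritize(cells))  # [('E2', 1.9), ('E1', 1.8), ('E4', 1.6)]
```

Normalizing by bandwidth lets a heavily loaded 10 MHz carrier outrank a 20 MHz carrier with more absolute users. A pure DLPRB ranking would not capture that distinction.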

By combining all these factors, we deliver optimal recommendations and an expansion flow (for example, optimization and load balancing ahead of expansion) that create real value compared to the traditional method. A detailed analysis conducted with three months of field data showed:

~25 percent fewer carrier and site expansions, thanks to recommendations that use optimization actions as a first resort whenever suitable.
Congested cells identified by cognitive planning have higher user and traffic density, with an average of 21 percent more RRC users per cell and 19 percent more data volume per cell than congested cells identified by operators, thus maximizing the ROI of capacity expansion.

Figure 12: User and traffic density differences

>75 percent field-validated accuracy in identifying which cells to expand, and when, three months before the experience target would otherwise be unmet.
Lower churn, by identifying “bad experience” cells not picked up by traditional methods.

Conclusion

The radio access network plays a critical role in CSPs’ overall spend on mobile network infrastructure, with RAN equipment accounting for approximately 20 percent of mobile operators’ capex. Our research shows that operators with leading network quality enjoy a higher average ARPU (+31 percent) and lower average churn (-27 percent). As discussed in this post, the use of AI for capacity planning is instrumental in making smart network investment decisions that optimize TCO while providing the best return in terms of network quality: a fundamental pillar of CSP market success.

The Ericsson Blog
