What causes an increase in dropped calls during the summer?
Human behaviors and physical processes are often seasonal – and that includes the performance of mobile communications networks. Simone Vincenzi outlines his team’s investigation into why Dropped Call Rates (DCR) vary over time and space, and why dropped calls become more frequent over the summer months.
Most human behaviors and physical processes are influenced or driven by changes in weather, climate, time of the year or time of the week. Office buildings are busier during the weekdays than during the weekends, and people tend to spend more time outside and drive their vehicles more in the summer than in the winter.
Similar dynamics affect the performance of mobile communication networks: Radio Access Network’s Key Performance Indicators (KPIs) vary between weekdays and weekends, and over the seasons and years. For instance, both the number and rate of calls that “drop”— that is, calls that are terminated without any of the parties intentionally interrupting the call— tend to increase in the spring, reach a maximum in the summer, and decrease in late summer or early autumn (Fig. 1). The Dropped Call Rate (DCR, the percentage or proportion of calls that drop), which telecom operators use as one of the critical measures of the quality of the service they provide to customers, also tends to be lower during the weekend and during certain holidays, such as the Christmas holidays.
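The DCR itself is simple arithmetic over call counts. A minimal sketch of computing a daily DCR from per-cell counts (all column names and numbers are invented for illustration, not from a real KPI feed):

```python
# Compute a daily Dropped Call Rate (DCR) from hypothetical per-cell call counts.
import pandas as pd

calls = pd.DataFrame({
    "date": pd.to_datetime(["2021-07-01", "2021-07-01", "2021-07-02"]),
    "cell_id": ["A", "B", "A"],
    "calls_total": [1000, 800, 1200],
    "calls_dropped": [8, 10, 6],
})

# Sum counts first, then divide: averaging per-cell rates would weight
# small and large cells equally.
daily = calls.groupby("date")[["calls_total", "calls_dropped"]].sum()
daily["dcr_pct"] = 100 * daily["calls_dropped"] / daily["calls_total"]
# dcr_pct: 1.0 on Jul 1 (18/1800), 0.5 on Jul 2 (6/1200)
```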
Telecom operators keep track of DCR, both overall and in specific regions (say, the North-Eastern United States), markets (New York City, for example), or districts and neighborhoods. Since telecom operators want to maximize both customer satisfaction and revenue, proper design, maintenance, and operation of mobile communication networks involve keeping both DCR and operator expenses as small as possible.
My team in Ericsson’s Global Artificial Intelligence Accelerator (GAIA) worked on identifying the determinants of DCR variation in a mobile communication network across time and locations, and then proposed solutions to “flatten” the time series over seasons and years. This “flattening” (that is, no increase in DCR during the warmer months) may be achieved by changing the configuration parameters of cells or by adopting different network configurations for the most problematic locations (this latter point will be the subject of future posts). My team worked on this problem with one of Ericsson’s key customers and partner operators, and in collaboration with the Ericsson Network Design Optimization team.
Hypotheses and challenges
Some theoretical and empirical studies have investigated how seasons and times of the day or the week affect DCR. At the same time, telecom engineers have developed some intuition on the root causes of DCR variation across days, weeks, and seasons. In particular, researchers and operators have identified variations in foliage (the collection of leaves of plants or trees, at spatial grains ranging from a single tree to parks and forests) and in vehicular traffic as two of the most likely determinants of the seasonal variation of DCR.
One of the hypotheses is that both increased foliage coverage and vehicular traffic make “handoffs”— the process of passing from one base station to another while maintaining “always best connected” to the network—either more problematic, more frequent, or both. To allow call handoff to occur, base station coverage must overlap with that of other cells in the area, but increased foliage may reduce the coverage overlap of base stations by absorbing or blocking part of the radio signal, making dropped calls more likely.
However, researchers and telecommunication engineers have rarely been able to formally test their hypotheses on the root causes of the increase in DCR during the warmer months, either by running experiments on live networks or by developing and fitting modern statistical or machine learning models to observational data. There are a few reasons behind this lack of tests and models.
First, different spatial scales of aggregation of DCR time-series data from cells or eNodeBs (an eNodeB is an LTE base station that supports multiple cells) can lead to vastly different results: noise can overwhelm signal when using data from single cells or eNodeBs, but aggregating thousands of cells or eNodeBs can hide or confound otherwise evident seasonal effects on DCR (hereinafter, I will use “cells” to indicate both cells and eNodeBs).
Moreover, the seasonal degradation of DCR observed at different levels of spatial aggregation emerges from the contributions of thousands of individual cells within a region or market, but not every cell shows an increase in DCR during the summer.
Choosing the correct level of spatial aggregation is therefore critical for testing hypotheses on the root causes of DCR seasonal variation, but deciding at what scale to aggregate data from cells is challenging. This problem is a variant of the ‘modifiable areal unit problem’ (MAUP), a well-known source of statistical bias that occurs when point measures of spatial phenomena are aggregated or clustered together.
MAUP is fundamentally unsolvable: aggregating or grouping data inevitably causes some bias or loss of information, and finding the right scale of aggregation for the statistical question posed may require some trial and error. In addition, the usefulness of KPI time series typically diminishes or expires after no more than two or three years, since changes in technology and network operations make older data less relevant both for understanding the root causes of KPI degradation and for predicting future KPI values.
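The noise-versus-signal side of this trade-off can be illustrated with a small simulation. Here a weak seasonal component (the numbers are entirely synthetic, not from real network data) is invisible in any single noisy cell but emerges clearly once hundreds of cells are averaged:

```python
# Synthetic illustration of the aggregation trade-off: per-cell noise hides a
# weak seasonal DCR signal that averaging across many cells recovers.
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(365)
seasonal = 0.2 * np.sin(2 * np.pi * (days - 80) / 365)  # small summer bump in DCR

n_cells = 500
# Each cell: the shared seasonal signal buried under cell-level noise.
cells = seasonal + rng.normal(0.0, 1.0, size=(n_cells, 365))

single_cell_corr = np.corrcoef(cells[0], seasonal)[0, 1]
district_mean = cells.mean(axis=0)
district_corr = np.corrcoef(district_mean, seasonal)[0, 1]
# The aggregated series tracks the seasonal component far more closely
# than any single cell does.
```

Aggregating too far in the other direction (say, averaging whole climate zones together) would mix cells with different seasonal behaviors and blur the very signal the averaging recovers here.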
Second, experimenting on live networks— such as changing cell configuration parameters, cell orientation, or network configurations— can be too risky given the complexity of mobile communication networks and the essential service they provide. In addition, the number of potential changes in configuration parameters, cell orientation, and network configuration is vast, and too many experiments might be needed to pinpoint the most effective solution.
Lastly, identifying cause-effect relationships using observational data is inherently challenging. There are two main properties that are necessary for inferring causality in time-series data: (1) the cause precedes its effects in time, and (2) altering the causal variable affects the outcome, that is, the value of the caused variable. Both properties are central to the modern theory of causality, but with complex physical processes, multiple correlated features, and no replicates over either time or space, formal causal inference is likely to be out of reach.
Methods and results
We started with the hypothesis that a variation in foliage coverage is a strong determinant of variation in DCR at the market (e.g., New York City is a market) or sub-market level of aggregation. North-Eastern US markets were the major focus of the investigation, and we used Southern California markets as a control. A smaller DCR seasonal variation in Mediterranean climates (like that of Southern California) than in continental climates (like those of New York or New England) would support the hypothesis that foliage coverage is among the root causes of DCR variation over seasons. We decided to aggregate cell KPIs at the congressional district level, which provided a fairly replicable and manageable level of aggregation of individual cells’ KPIs (Fig. 3).
Using public data repositories, we collected foliage coverage data (Leaf Area Index = leaf area / ground area), weather data (e.g., precipitation, humidity, extreme weather events such as snow and rainstorms), and events/calendar data (e.g., weekends, holidays, days with schools off, big events such as marches and sports events).
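Joining these sources amounts to aligning them on date and district. A hypothetical sketch of the assembly step (all frame names, column names, and values are invented for illustration):

```python
# Join KPI, foliage, and weather sources into one daily time series per
# congressional district, then derive a calendar feature.
import pandas as pd

dates = pd.date_range("2021-01-01", periods=4, freq="D")
kpi = pd.DataFrame({"date": dates, "district": "NY-10", "dcr_pct": [0.7, 0.8, 0.9, 0.8]})
lai = pd.DataFrame({"date": dates, "district": "NY-10", "leaf_area_index": [0.5, 0.5, 0.6, 0.6]})
weather = pd.DataFrame({"date": dates, "district": "NY-10", "precip_mm": [0.0, 2.5, 0.0, 1.0]})

features = (kpi
            .merge(lai, on=["date", "district"])
            .merge(weather, on=["date", "district"]))
features["is_weekend"] = features["date"].dt.dayofweek >= 5  # calendar effect
```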
Since most KPIs are correlated with each other and the same is true for many environmental factors (for example, in the summer we may consistently observe higher humidity, fewer clouds, and higher temperatures), we did not attempt to formally infer causality. Instead, we fit Random Forest models to the time-series data, which we constructed as one time series per congressional district. Each congressional district included between 1,000 and 2,000 cells and 150 to 250 eNodeBs.
We used KPIs, foliage, weather, and event and calendar effects as features, and we used prediction accuracy on the test data sets and feature importance to gather “circumstantial evidence” on the association between foliage (and the other candidate features) and the increase in DCR in the warmer months. In other words, we integrated the results from the Random Forests with the insights and intuition of domain experts, and then considered the whole emerging causal picture as tending to confirm or disconfirm the causal hypothesis we formulated.
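A sketch of that modeling step on synthetic data, assuming scikit-learn: a Random Forest regressor predicts daily DCR from foliage, weather, and calendar features, and the trees' impurity-based importances rank the features. The feature names and the data-generating process below are illustrative only, not the team's actual feature set.

```python
# Fit a Random Forest to a synthetic DCR time series and rank feature importances.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 365
X = pd.DataFrame({
    # Foliage rises and falls over the year, like a growing season.
    "leaf_area_index": np.clip(np.sin(np.linspace(0, np.pi, n)) + rng.normal(0, 0.1, n), 0, None),
    "precip_mm": rng.exponential(2.0, n),
    "is_weekend": (np.arange(n) % 7 >= 5).astype(float),
})
# Synthetic target: DCR rises with foliage and dips slightly on weekends.
y = 0.5 + 0.8 * X["leaf_area_index"] - 0.1 * X["is_weekend"] + rng.normal(0, 0.05, n)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)
importance = (pd.Series(model.feature_importances_, index=X.columns)
                .sort_values(ascending=False))
# In this synthetic setup, leaf_area_index dominates the importance ranking.
```

Impurity-based importances like these are convenient but biased toward high-cardinality features, which is one reason we treated them only as circumstantial evidence to be weighed against domain expertise.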
We found that in most congressional districts, foliage coverage, along with the average Reference Signal Received Power (RSRP) and weekend effects, was among the most important features for explaining variation in DCR over time (Fig. 4). The importance of foliage coverage tended to be greater in rural locations, and we found a weak correlation between foliage coverage and DCR in San Diego. Instead, we found that in San Diego, the cells near the main roads leading to the beaches were among those showing the greatest increase in DCR during the warmer months. This result suggests that higher vehicular traffic may lead to higher DCR during the warmer months.
Our investigation therefore supports the hypothesis that increased foliage and increased vehicular traffic are the root causes of the increased DCR that is often observed during the warmer months.
Jieneng Yang, Yuyao Hu, Prasad Jogalekar, Jenny Yang, Aditya Shah, Nizar Faour, Josko Zek, Emad Edaibat, Francois Milly, and Jeanette Irekvist all contributed to the success of the project. Paul Mclachlan helped improve a first version of this post in both concept and form.