Monitoring IoT application performance with Machine QoE

New capabilities in 5G make it ideal to provide cost-effective solutions for mission-critical Internet of Things systems that depend on ultra-reliability and low latency. A necessary component when creating these solutions is an awareness of the QoE that the network offers the IoT system.

Ericsson CTO Erik Ekudden’s view on an M-QoE framework

Mission-critical Internet of Things (IoT) systems are heavily dependent on the ultra-reliability and low latency that comes with 5G. To ensure consistently high QoE in these kinds of systems, though, communications service providers (CSPs) need the support of a framework that is specifically designed to monitor the QoE of IoT devices – Machine QoE (M-QoE).

This Ericsson Technology Review article proposes an M-QoE framework designed to ensure a consistently high level of customer satisfaction. It also presents an interesting use case for M-QoE in the context of next-generation smart grids. Our research shows that the framework will assist CSPs in monitoring and predicting the M-QoE of an enterprise with widespread devices over their infrastructure without the need for expensive large-scale measurements.

August 11, 2020

Authors: Constant Wette Tchouati, Steven Rochefort, George Sarmonikas

Terms and abbreviations

5GC – 5G Core
AI – Artificial Intelligence
CSP – Communications Service Provider
E-KPIs – Expected Value KPIs
eMBB – Enhanced Mobile Broadband
IoT – Internet of Things
KPI – Key Performance Indicator
M-QoE – Machine QoE
ML – Machine Learning
mMTC – Massive Machine-Type Communications
O-KPIs – Observed KPIs
PDC – Phasor Data Concentrator
PMU – Phasor Measurement Unit
URLLC – Ultra-Reliable and Low-Latency Communications
WAMS – Wide Area Measurement System

The Internet of Things (IoT) is a worldwide network of physical objects – buildings, cars, wearables, industrial machines and so on – that are equipped with connectivity devices to build a communication network where connected objects can exchange data with other objects.

Almost anything can be connected to anything in the IoT, creating a complex network of connected machines, from simple wireless tags to sensors or actuators with capabilities to sense, communicate, process data or control their environment.

The potential to apply IoT technology is endless, stretching across all industry verticals, resulting in a wide variety of devices and communications requirements on the underlying network. Large-scale deployment of cellular IoT devices is expected in the years ahead with the widespread introduction of 5G technology.

For communications service providers (CSPs), a typical IoT subscriber is an enterprise or a service provider from any vertical that wants to enable connectivity for many devices spread across one region or worldwide. Given the specificity and variability of IoT requirements per vertical, CSPs need a new approach to assessing IoT subscribers’ QoE. To meet this need, Ericsson has developed a generic Machine QoE (M-QoE) framework for accurate prediction of IoT subscribers’ QoE and tested a vertical-specific version of it in a smart-grid scenario.

Creating a framework for Machine QoE

IoT networks support multiple technologies, standards, device types and application types, with performance requirements that vary between verticals and between applications of the same vertical. A preliminary step in designing an M-QoE model for this complex environment is to create classes of IoT applications that belong to the same vertical or have similar performance requirements. The model features can then be optimized for each CSP according to the specific requirements of their network environment.

Classification of IoT applications

There are several ways to classify IoT applications. One approach is to classify them according to business verticals such as health care, agriculture, automotive, transportation, surveillance, smart home, smart city, smart grid and smart metering. Each vertical comprises applications with varying performance requirements, ranging from very low to high data speed, from best effort to ultra-reliable, and from no latency requirement to ultra-low latency.

From a network perspective, applications are often classified into three categories based on their network performance requirements: massive IoT, mission-critical control and enhanced mobile broadband (eMBB) [1]. Massive IoT requirements include deep coverage, ultra-low device energy consumption, ultra-low complexity and ultra-high density. Mission-critical control requirements include strong security, ultra-high reliability, ultra-low latency and extreme user mobility. The eMBB requirements include extreme capacity, extreme data rates and deep awareness (for discovery and optimization).

The 3GPP specifications define three main categories of applications, two of which are dedicated to IoT, namely ultra-reliable and low-latency communications (URLLC) and massive Machine-Type Communications (mMTC). URLLC provides the ultra-reliability or ultra-low latency required for numerous use cases in smart grid, intelligent transportation systems and smart manufacturing, while mMTC enables connectivity to large numbers of low-power and low-cost devices for best-effort applications and long-range coverage in smart cities or smart logistics.

At Ericsson, we have defined four categories of IoT applications: Massive IoT, Critical IoT, Broadband IoT and Industrial Automation IoT. The first two correspond to the 3GPP mMTC and URLLC categories respectively. Broadband IoT provides the high data rates and large data volumes required for unmanned aerial vehicles/drones and augmented reality/virtual reality applications. Industrial Automation IoT provides the precise indoor positioning and time-sensitive networking for real-time applications that require deterministic communication.

The M-QoE framework that we have designed generalizes on the four IoT categories by inferring from specific cases. For this purpose, we first investigated the important characteristics of IoT applications in three business verticals: utilities, automotive and smart cities [2].

KPIs for machines

Our framework to assess the QoE of IoT subscribers comprises a list of common IoT KPIs as shown in Table 1. These KPIs are calculated on a per device, per subscriber and per vertical basis. The threshold values are based on the application type and the IoT vertical that the device belongs to.

KPI	Definition
latency (millisecond)	the time it takes to transfer a given piece of information from the moment it is transmitted by the source to the moment it is successfully received at the destination
packet loss (ratio)	the percentage of frames that should have been forwarded by a network but were not
bit error ratio	the number of bit errors divided by the total number of transferred bits during a studied time interval
energy efficiency (joule/byte)	the energy consumed for the end-to-end transport of a byte
security	level of importance for attack prevention
data (messages) rate, downlink/uplink (bit/s)	a time-variable function that might be important to define some parameters (such as peak, burst, average, minimum, maximum) in order to better describe the data rate (messages)
jitter (ms)	the short-term variations of a digital signal’s significant instants from their ideal positions in time
packet delay variation (ms)	variation in latency as measured in the variability over time of the packet latency across a network (expressed as an average of the deviation from the network mean latency)
reliability	percentage of sent network layer packets successfully delivered to a given node within the time constraint required by the targeted service, divided by the total number of sent network layer packets
availability	the percentage of available time (with respect to total time) in a generic observation period of the connection across the transport network
mobility (km/h)	fixed (no mobility: office, home) or maximum speed in movement (pedestrian or using a means of transportation such as a train, road vehicle, airplane or drone)
traffic density (Mbit/s/sq km)	traffic in a specific area
connection density (devices/sq km)	number of devices in a specific area
coverage (sq km)	area of application interest
battery lifetime (days, years)	time of battery duration
data size (bytes)	size of the atomic packet or frame (average, maximum)
location accuracy	the maximum positioning error tolerated by the application (can be measured outdoor and indoor in 5G)
cost	overall cost including connection, device and cloud services

Table 1: List of common IoT KPIs
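As an illustration of how a few of the Table 1 definitions translate into per-device calculations, the sketch below derives latency, packet loss, reliability and packet delay variation from a set of packet records. It is a minimal example under stated assumptions, not the framework's implementation: the function name `packet_kpis`, its parameters and the sample values are hypothetical, and ratios are returned as fractions rather than percentages.

```python
from statistics import mean

def packet_kpis(latencies_ms, sent_count, deadline_ms):
    """Compute a few Table 1 KPIs for one device (illustrative only).

    latencies_ms: latency of each packet that was received, in ms.
    sent_count:   total number of packets sent (received + lost).
    deadline_ms:  time constraint required by the targeted service.
    """
    received = len(latencies_ms)
    avg_latency = mean(latencies_ms)
    return {
        # Average source-to-destination transfer time.
        "latency_ms": avg_latency,
        # Share of packets that should have been forwarded but were not.
        "packet_loss": (sent_count - received) / sent_count,
        # Packets delivered within the time constraint, over packets sent.
        "reliability": sum(1 for l in latencies_ms if l <= deadline_ms) / sent_count,
        # Average deviation from the mean latency.
        "pdv_ms": mean(abs(l - avg_latency) for l in latencies_ms),
    }

# Hypothetical sample: 5 packets sent, 4 received, 10 ms deadline.
kpis = packet_kpis([8.0, 9.0, 10.0, 13.0], sent_count=5, deadline_ms=10.0)
print(kpis)
```

Note that reliability is lower than one minus packet loss here, because a packet that arrives after the deadline counts against reliability even though it was not lost.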

Framework architecture for Machine QoE

The system architecture for the M-QoE framework has the same common base as Ericsson’s operational support system for subscriber service assurance. It is the foundation to provide not only consistent IoT service assurance but also an enhanced experience for the users of the connected machines (IoT devices) across several touchpoints along the use case journey.

Figure 1 shows the high-level, end-to-end architecture for our M-QoE framework. Machines of different kinds are wirelessly connected over 5G (or other cellular technologies). The yellow dots indicate the points where the assurance metrics related to M-QoE are usually measured:

the IoT device
the RAN
the edge network
the core network/services network
vertical slices and service layers (not shown).

The metrics are then ingested and processed at different points of the network subject to the processing delay requirements of each use case.

There can be a combination of RAN, edge and centralized processing of the received metrics. Stream processing at the edge is enabled when low latency and real-time processing are required, whereas processing takes place at the central data center in cases where there is no low-latency requirement. A great variety of interfaces allow for raw data to be preprocessed prior to the generation of any insights. Processed data is stored in a distributed database to make it available for further analysis.

A rules engine and machine learning (ML) models are used to provide M-QoE scores along with other insights such as anomalies, patterns of use, mobility patterns and other use-case-specific insights. The cognitive reasoning system, driven by business intents such as the proactive service level assurance of smart-grid infrastructure, can use the insights for autonomic decision-making. It can autonomously and continuously maintain the desired service level specification by recommending actions to the service/operations engineers and/or triggering closed-loop actions with minimal human intervention.

All generated insights, recommendations and actions are exposed through dashboards to various business and operational level management systems, including IoT service operations, and are usually used for planning, management and decision-making purposes [3].

Figure 1: M-QoE framework architecture

Machine QoE models

To improve the performance of our M-QoE framework, we added some rule-based and ML models to detect insights or anomalies in KPI measurements that operators could use to improve IoT subscribers’ network experience, as shown in Figure 2.

Our M-QoE framework includes four models:

machine-type detection
application-type inference
M-QoE quantification
M-QoE prediction.

Figure 2: The architecture of ML models in M-QoE

CSPs can use the machine-type detection model in our M-QoE framework to detect whether a device accessing their network is an IoT device or a mobile phone, and so better manage its QoE. This is helpful because, unlike for mobile phones, there is no global database of IoT devices, and the IoT landscape is highly fragmented with many device models and service providers.

Our M-QoE framework also includes an ML model for IoT segmentation, which was trained with samples of traffic data of known IoT applications. This model can be used in production to detect traffic patterns and predict the type of IoT application and vertical. This is valuable, as the IoT ecosystem is fueled by a multitude of small and medium vertical service providers whose traffic patterns have yet to be discovered.

The M-QoE quantification in our framework is done by computing the gap between the observed KPIs (O-KPIs) and their expected values (E-KPIs) defined by the Service Level Agreement. It should be noted that a typical IoT subscriber owns multiple devices, and their overall satisfaction with a service is a function of multiple features that are monitored by the different O-KPIs. To compute the overall M-QoE of the IoT subscriber, the first step is to aggregate the O-KPI measurements of all their devices and derive a feature-specific score M-QoE_i for each feature i. The second step is to add up the feature-specific M-QoE_i scores, each weighted by its importance factor, to obtain the overall M-QoE.

The computation of an IoT subscriber’s O-KPI is done by aggregating the measurements of their devices plus measurements at the vertical node over different dimensions: temporal, spatial, devices and services. Our aggregation procedure consists of computing the following statistics: average, variance, skewness, kurtosis, percentile (5, 25, 50, 75 and 95), minimum, maximum and range. The M-QoE_i score is derived by applying an ML algorithm to the dataset of differences (O-KPI_i - E-KPI_i) between the observed and expected values, and is expressed on a scale of 1 to 5 labelled as (1) bad, (2) poor, (3) fair, (4) good or (5) excellent.
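The aggregation statistics listed above can be sketched in a few lines of standard-library Python. This is an illustrative example only: the function name `aggregate`, the use of population (rather than sample) moments, the "inclusive" percentile interpolation and the sample latency values are all assumptions, not details taken from the framework. The ML step that maps the aggregated gaps to a 1-to-5 score is not shown.

```python
from statistics import mean, pvariance, quantiles

def aggregate(values):
    """Summarize a list of (O-KPI_i - E-KPI_i) gaps (needs >= 2 values)."""
    n = len(values)
    mu = mean(values)
    var = pvariance(values, mu)  # population variance (assumption)
    # Standardized third and fourth central moments; guard zero variance.
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    stats = {
        "average": mu,
        "variance": var,
        "skewness": m3 / var ** 1.5 if var else 0.0,
        "kurtosis": m4 / var ** 2 if var else 0.0,
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }
    # 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    for p in (5, 25, 50, 75, 95):
        stats[f"p{p}"] = cuts[p - 1]
    return stats

# Hypothetical latency O-KPIs (ms) from five devices, E-KPI of 10 ms.
observed_ms = [8.0, 9.0, 10.0, 11.0, 12.0]
expected_ms = 10.0
gaps = [o - expected_ms for o in observed_ms]
summary = aggregate(gaps)
print(summary)
```

The resulting dictionary is the kind of feature vector that could be passed to a scoring model; the choice of model is left open here.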

The overall M-QoE value is M-QoE = Σ[alpha_i * M-QoE_i], where alpha_i represents the importance level of feature i. Mobile operators can optimize the performance of this M-QoE model by selecting a subset of KPIs that are relevant to their network environment.

The purpose of the M-QoE prediction model is to overcome the challenge of CSPs not being able to probe the IoT devices or the vertical’s network infrastructure to access measurements of subscriber KPIs and vertical KPIs, as these are external environments. These measurements are usually obtained from the mean opinion score (MOS) test, but this is done only once, limiting the dynamic tracking of user satisfaction. To address these limitations, we trained an ML algorithm to classify and predict the unknown E-KPI measurements from records of the known O-KPI factors. The M-QoE prediction model uses an inductive supervised learning approach.

Case study: Machine QoE in next-generation power grids

Electric power grids are one of the verticals that will benefit most from M-QoE in the near term. The business model in this vertical is undergoing a transformation from mega grids operated by one large company to segmented grids and micro grids, giving consumers the possibility to choose between several offerings from different suppliers. In this competitive market, consumer QoE becomes the main differentiator.

At the same time, electric power grids and distribution systems are also under strain due to the growing introduction of non-centralized renewable energy resources, electric storage systems and increased electrical demand from new sources such as electric vehicles. The evolving needs of power grids are driving distribution systems to be more dynamic, with support for bidirectional power flows compared with the traditional centrally managed systems [4].

All of these factors introduce new challenges in the operation, planning, protection and control of future power grids. The growing need for new solutions is being met by the introduction of smart grids enabled by advanced communications networks including smart devices, wireless connectivity, automation and real-time control, edge computing and analytics.

To improve power quality, many electrical grids are being upgraded with synchrophasor technology. With this in mind, we decided to test our proposed M-QoE framework by applying it in a smart-grid scenario to predict the M-QoE of devices called phasor measurement units (PMUs) in a cellular-based synchrophasor.

Synchrophasors

A synchrophasor is a time-synchronized measurement of a quantity described by a phasor [5, 6]. Like a vector, a phasor has magnitude and phase information. PMUs in a traditional PMU network are deployed across transmission grids to measure voltage and current, and with these measurements calculate parameters such as frequency and phase angle.

PMU measurements are time-stamped to an accuracy of a microsecond, synchronized using the timing signal available from GPS satellites or other equivalent time sources. Measurements taken by PMUs in different locations are therefore accurately synchronized with each other and can be time-aligned, allowing the relative phase angles between different points in the system to be determined as directly-measured quantities. Synchrophasor measurements can thus be combined to provide a precise and comprehensive view of an entire interconnection.
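To make the time-alignment step concrete, the sketch below pairs phasor samples from two locations by timestamp and computes the relative phase angle between them. It is a minimal illustration under assumptions: the helper names (`phasor`, `align`, `relative_angle_deg`), the integer microsecond timestamps and the sample values are hypothetical and are not taken from any PMU standard or product.

```python
import cmath
import math

def phasor(magnitude, angle_deg):
    """Build a complex phasor from magnitude and phase angle in degrees."""
    return cmath.rect(magnitude, math.radians(angle_deg))

def align(samples_a, samples_b):
    """Pair samples from two PMUs that carry the same timestamp.

    Each input is a list of (timestamp_us, phasor) tuples.
    """
    by_time = dict(samples_b)
    return [(t, pa, by_time[t]) for t, pa in samples_a if t in by_time]

def relative_angle_deg(a, b):
    """Phase of a relative to b, in degrees, between -180 and 180."""
    return math.degrees(cmath.phase(a * b.conjugate()))

# Hypothetical voltage phasors (per unit) at two buses, 30 records/s.
bus_a = [(1_000_000, phasor(1.00, 30.0)), (1_033_333, phasor(1.00, 31.0))]
bus_b = [(1_000_000, phasor(0.98, 10.0)), (1_033_333, phasor(0.98, 12.0))]

angles = [(t, relative_angle_deg(pa, pb)) for t, pa, pb in align(bus_a, bus_b)]
print(angles)  # relative angles of about 20 and 19 degrees
```

Multiplying one phasor by the conjugate of the other avoids subtracting raw angles, so the result stays in range even when the individual angles wrap around.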

A typical synchrophasor network is composed of PMUs and phasor data concentrators (PDCs) that are the control centers of the wide area measurement system (WAMS). The communication systems play a crucial role in the performance of the WAMS. Requirements for data reporting rates are typically 30 to 60 records per second, and may be higher; in contrast, current SCADA (supervisory control and data acquisition) systems often report data every four to six seconds – over 100 times slower than PMUs.
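The "over 100 times" comparison follows directly from the figures above, as this short calculation shows. The variable names are illustrative; the rates and periods are the ones quoted in the text.

```python
# PMU reporting rates (records per second) and SCADA reporting
# periods (seconds per report), as quoted in the text.
pmu_rates_hz = (30, 60)
scada_periods_s = (4, 6)

# A SCADA system reporting every `period` seconds has a rate of
# 1 / period, so the speed-up factor is rate * period.
ratios = [rate * period for rate in pmu_rates_hz for period in scada_periods_s]
min_ratio, max_ratio = min(ratios), max(ratios)
print(min_ratio, max_ratio)  # 120 360
```

Even the slowest PMU rate compared with the fastest SCADA period gives a factor of 120.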

Multiple communications technologies are used in synchrophasors, from power-line communications to microwave, but in order to meet the latency requirements, optical fiber is widely used. Traditional optical fiber solutions may become prohibitively expensive, though, as smart grids become more complex and the need for additional measurements in distribution grids grows. In light of this, work is underway to develop a new, micro-PMU (µPMU). A µPMU is a smaller, lower-cost and more accurate version of a PMU that can be deployed more extensively in a distribution grid and provide accurate information from more places, allowing for better dynamic distribution grid management.

Cellular-based synchrophasors

The cost of extending the WAMS to connect more devices can become an issue when deploying µPMUs at scale in distributed grids. 5G with network slicing can meet the connectivity requirements of a WAMS and reduce the cost of deploying µPMUs. Using low-band and mid-band radio frequencies, the data bandwidth requirements of PMU applications can be achieved, with latency values in access networks (radio access to network edge) of 10ms.

IoT Connectivity as a Service (CaaS) is a subscription-based solution offered by operators that utilities can purchase to manage the connectivity of their PMUs and µPMUs.

Private LTE is another cost-effective alternative for grid operators to build synchrophasors over a dedicated communication system managed by the mobile operator to meet the desired requirements of latency, reliability and coverage.

Synchrophasors – Machine QoE assessment

According to a recent report from the North American SynchroPhasor Initiative (NASPI) [4], PMU applications in a power system can be classified into four main categories based on their latency ranges:

automation (latency of ≤ 10-100ms)
reliability (latency of ≤ 1,000ms)
planning (latency of ≤ 100-1,000ms)
operation (latency of ≤ 1,000ms).
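One way to use these latency bounds is as a simple check of an observed latency against the category an application belongs to. The sketch below is illustrative only: it assumes that a bound written as a range (for example 10-100ms) means the requirement varies between a strictest and a most relaxed value within the category, and the names `LATENCY_BOUNDS_MS` and `latency_status` are hypothetical.

```python
# (strictest bound, most relaxed bound) in ms per category.
# Reading "10-100ms" as a range of bounds is an assumption.
LATENCY_BOUNDS_MS = {
    "automation": (10, 100),
    "reliability": (1000, 1000),
    "planning": (100, 1000),
    "operation": (1000, 1000),
}

def latency_status(category, observed_ms):
    """Classify an observed latency against a category's bounds."""
    strict, relaxed = LATENCY_BOUNDS_MS[category]
    if observed_ms <= strict:
        return "meets strictest bound"
    if observed_ms <= relaxed:
        return "within range"
    return "violates"

print(latency_status("automation", 8))    # meets strictest bound
print(latency_status("automation", 50))   # within range
print(latency_status("planning", 1500))   # violates
```

A latency of 50 ms would satisfy every planning application but only some automation applications, which is why the category matters when setting E-KPI thresholds.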

The report also lists the relevant KPIs – accuracy, reliability, latency, and message rate – and ranks them per category on an importance-level scale of 1-4 in which 4 means critical, 3 means important, 2 means somewhat important and 1 means less important. Table 2 presents the key details of the NASPI’s classification of PMU applications in power systems.

Application	Accuracy	Availability/ reliability	Low latency	Message rate
Automation	4	4	4	4
Reliability	2	2	3	2
Planning	4	3	1	4
Operation	1	1	2	2

Table 2: Classification of PMU applications in power systems

Automation refers to the automated protection and control applications of PMUs in distribution systems. The aim of these applications is to improve reliability and security, automated remedial action schemes and asset utilization. Reliability is a class of PMU applications that can be divided into topology and disturbance detection and situational awareness. Planning is a class of PMU applications focused on better system understanding and improved system modeling of the distribution network. Operation is associated with the monitoring and visualization of network performance.

A CSP supplying network connectivity services to a smart-grid operator can use the proposed framework for the assessment of customer M-QoE by calculating M-QoE = Σ[alpha_i * M-QoE_i] where:

i represents the relevant features of synchrophasor applications (accuracy, reliability, latency, message rate and security).
alpha_i represents the importance-level coefficients of the KPIs, derived from the importance levels (1 to 4) in Table 2.
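As a worked example of this weighted sum, the sketch below takes the importance levels from Table 2 and combines them with a set of feature-specific scores on the 1-to-5 scale. Two points are assumptions rather than part of the published method: alpha_i is obtained by normalizing the Table 2 levels so that they sum to 1 (which keeps the overall score on the same 1-to-5 scale), and security is left out because Table 2 gives no importance level for it. The feature scores are invented for illustration.

```python
# Importance levels from Table 2 (4 = critical ... 1 = less important).
TABLE_2 = {
    "automation":  {"accuracy": 4, "reliability": 4, "latency": 4, "message_rate": 4},
    "reliability": {"accuracy": 2, "reliability": 2, "latency": 3, "message_rate": 2},
    "planning":    {"accuracy": 4, "reliability": 3, "latency": 1, "message_rate": 4},
    "operation":   {"accuracy": 1, "reliability": 1, "latency": 2, "message_rate": 2},
}

def overall_mqoe(application, feature_scores):
    """M-QoE = sum of alpha_i * M-QoE_i over the Table 2 features.

    alpha_i = level_i / sum(levels) is an assumed normalization.
    """
    levels = TABLE_2[application]
    total = sum(levels.values())
    return sum(levels[f] / total * feature_scores[f] for f in levels)

# Hypothetical feature-specific scores: latency is the weak point.
scores = {"accuracy": 5, "reliability": 4, "latency": 2, "message_rate": 4}

automation_score = overall_mqoe("automation", scores)
planning_score = overall_mqoe("planning", scores)
print(automation_score, round(planning_score, 2))  # 3.75 4.17
```

With identical measurements, the poor latency score lowers the overall M-QoE more for automation, where latency is critical, than for planning, where it is ranked less important.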

We tested this M-QoE framework in an experimental 5G lab environment, where we built a synchrophasor network and connected between one and ten µPMUs to a PDC, depending on the test scenario. Probes were designed and installed in the communication network, in the µPMUs and in the PDC to collect and stream network and device O-KPI_i measurements associated with the transmission of payload data between µPMUs and the PDC.

To obtain enough O-KPI measurements, we identified the locations of potential µPMU devices in the grid network and used the M-QoE prediction model to generate the measurement data. This data was aggregated in the knowledge-based system running the rules-based and ML models for continuous evaluation and monitoring of the overall M-QoE of the grid operator. The experiment confirmed that the framework will assist CSPs in monitoring and predicting the M-QoE of an enterprise with widespread devices over their infrastructure without the need for expensive large-scale measurements.

In another experiment, we used the M-QoE framework for the measurement of packet loss and delay gaps in the transmission of synchrophasor data due to communication link failures. The results showed early prediction of failures and M-QoE as well as accurate detection of failure types [7].
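To give a sense of what measuring lost packets and delay gaps can look like at the PDC side, the sketch below scans a stream of received frames for missing sequence numbers and for late arrivals. It is a simple rule-based illustration, not the method used in the experiment described in [7]: the function name `find_gaps`, the frame format, the tolerance and the sample data are all assumptions.

```python
def find_gaps(frames, expected_interval_ms, tolerance_ms):
    """Detect lost and delayed frames in a received stream.

    frames: list of (sequence_number, arrival_time_ms), in arrival order.
    Returns (lost_sequence_numbers, delayed_sequence_numbers).
    """
    lost, delayed = [], []
    for (prev_seq, prev_t), (seq, t) in zip(frames, frames[1:]):
        if seq - prev_seq > 1:
            # Sequence numbers were skipped: those frames never arrived.
            lost.extend(range(prev_seq + 1, seq))
        elif t - prev_t > expected_interval_ms + tolerance_ms:
            # Consecutive frame arrived later than the reporting interval allows.
            delayed.append(seq)
    return lost, delayed

# Hypothetical stream at 30 records/s (about 33.3 ms between frames).
frames = [(1, 0.0), (2, 33.0), (3, 67.0), (6, 167.0), (7, 260.0)]
lost, delayed = find_gaps(frames, expected_interval_ms=33.3, tolerance_ms=10.0)
print(lost, delayed)  # [4, 5] [7]
```

Counts of lost and delayed frames per time window could then serve as O-KPI inputs to the packet loss and latency features described earlier.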

Conclusion

The deployment of 5G networks will accelerate the growth of the Internet of Things (IoT) and offer a wealth of new business opportunities for communications service providers (CSPs). It is important to recognize, however, that the churn of a single IoT enterprise due to network issues may represent many device connections with significant revenue impact. To ensure consistent QoE in IoT applications, CSPs need a smart solution to consistently monitor the diversity of performance requirements that are requested by IoT subscribers. Ericsson’s Machine QoE framework is designed with this in mind, utilizing the power of artificial intelligence and machine learning to automatically discover and predict events in the CSP network, so that they can be addressed before they have a negative impact on QoE.

References

Constant Wette Tchouati

joined Ericsson in 2000 and has since worked as a developer and project leader in product development, research projects, innovation and new business development involving various technologies, data science, IoT/machine-to-machine, IMS, and telco cloud architectures. He holds an M.Eng. in electrical engineering from the National Advanced School of Engineering in Yaoundé, Cameroon, an M.Sc. in computer engineering from the École Polytechnique de Montréal, Canada, an M.B.A. from HEC Montréal and a graduate certificate in data science from Harvard University in the US.

Steven Rochefort

joined Ericsson in 1994 with a background in software development for command and control systems. Rochefort has been involved in almost every aspect of mobile telephone development at Ericsson, from software development to system design, with a focus on IoT solutions. In 2016, he returned to his academic passion for mathematics, becoming a data scientist for Ericsson’s OSS (Operations Support Systems) product area. He holds a B.Sc. in mathematics and a graduate certificate in marketing research from Concordia University in Montreal, Canada.

George Sarmonikas

joined Ericsson in 2013 after several years of working for mobile operators. He currently heads AI & IoT Solutions within Business Unit Digital Services at Ericsson, developing novel products. Prior to this, he was responsible for the product management of Ericsson’s customer experience management and analytics portfolio, including assets for subjective experience scoring. Sarmonikas holds both an M.Sc. in communication systems and an M.Eng. in electronic engineering and computer science from the University of Bristol in the UK, as well as a graduate degree in artificial intelligence from Stanford University in the US.

Acknowledgements

The authors would like to thank Brunilde Sansò, Younes Seyedi, Orestes Gonzalo Manzanilla-Salazar, Hakim Mellah, Filippo Malandra and Houshang Karimi – all based at École Polytechnique de Montréal – for their contributions to this article.