TinyML as a Service and the challenges of machine learning at the edge
The TinyML as-a-Service project at Ericsson Research sets out to address the challenges that today limit the potential of machine learning (ML) paradigms at the edge of the embedded IoT world. In this post, the second in our TinyML series, we take a closer look at the technical and non-technical challenges on our journey to making that happen. Learn more below.
This is the second post in a series about tiny machine learning (TinyML) at the deep IoT edge. Read our earlier introduction to TinyML as-a-Service to learn how it compares with traditional cloud-based machine learning and with the embedded systems domain.
TinyML is an emerging concept (and community) for running ML inference on ultra-low-power (ULP, ~1mW) microcontrollers. TinyML as-a-Service aims to democratize TinyML, allowing manufacturers to start their AI business with TinyML running on microcontrollers.
In this article, we introduce the challenges behind applying ML concepts within the embedded IoT world. Furthermore, we emphasize that these challenges are not simply due to the constraints of resource-limited embedded devices, but persist even when ML-based IoT deployments are augmented by additional compute resources at the network edge.
To summarize the nature of these challenges, we can say:
- Edge cannot solve everything
- Web and embedded components belong, by their very nature, to different technological domains
- The machine learning ecosystem is big and resource demanding. What ML tasks can be executed in the IoT space?
Below, we take a closer look at each of these challenges.
Why edge computing cannot solve everything
Edge computing promises higher-performing service provisioning, from both a computational and a connectivity point of view.
Edge nodes support the latency requirements of mission-critical communications thanks to their proximity to the end-devices, and enhanced hardware and software capabilities allow execution of increasingly complex and resource-demanding services in the edge nodes. There is growing attention, investment and R&D aimed at making the execution of ML tasks at the network edge easier. In fact, there are already several ML-dedicated "edge" hardware examples (e.g. Edge TPU by Google, Jetson Nano by Nvidia, Movidius by Intel) which confirm this.
Therefore, the question we are asking is: what are the issues that the edge computing paradigm has not been able to completely solve yet? And how can these issues undermine the applicability of ML concepts in IoT and edge computing scenarios?
We intend to focus on and analyze five areas in particular. (Note: some of the areas we describe below may have solutions through other emerging types of edge computing, but these are not yet commonly available.)
- Privacy. Data security and user privacy have received much attention in recent years, emphasized further by recurrent news of public data leaks. Governments have acted to resolve such privacy issues, for example through new regulation and by strengthening existing data security and privacy laws. An obvious example of this is the General Data Protection Regulation (GDPR), enforced by the European Union in 2018. Despite these more stringent regulatory actions, many data owners remain cautious, often showing reluctance to trust third-party cloud and edge service providers to store and manage their data. Put simply, end users increasingly want to define physical "on-premises" boundaries within which their data stays. This is highly relevant to ML, where data is the key asset of the entire ecosystem. In the wider industry, several approaches, such as federated learning, have been defined to overcome such challenges. Here at Ericsson, we’re also well aware of the importance of security and privacy and remain proactive in addressing wider concerns. TinyML as-a-Service moves in a similar direction of keeping data on-premises, by confining the processing of sensitive data to the IoT device itself (and consequently avoiding data flowing towards external services).
- Network bandwidth. In scenarios characterized by a dense deployment of heterogeneous IoT end-devices (devices executing a wide variety of tasks), it is reasonable to assume that the quantity of raw data they generate is significant. It is equally reasonable to assume that a large portion of these devices is equipped with narrowband network interfaces (for example NB-IoT or Cat-M1). These limited transmission capabilities indicate the need to pre-process data "on-premises", in order to reduce the amount of data offloaded to the edge, as well as to avoid network bottlenecks between the end-devices and the edge, with consequent performance degradation.
- Latency. The ability to ensure bounded-latency communications for delivering high-performance services is a major design requirement for emerging mobile networks. This blog has already hosted several articles explaining how 5G is the major enabler for latency-critical IoT applications, ensuring ultra-reliable and low-latency communications. We have also emphasized how the edge network plays a key role in supporting massive and critical machine-type communication use cases by ensuring millisecond latency. TinyML as-a-Service aims to further reduce network latency by moving the execution of certain ML tasks (e.g. inference) to the device itself. Although this sort of “near-zero” latency is desirable and made possible through our approach, it is worth clarifying that we still cannot do without the support of the network edge and the rest of the mobile network.
- Reliability. In scenarios where cellular coverage cannot always be ensured, such as in rural areas or on the open sea with limited or no connectivity, neither the cloud nor the edge can extend the computation and connectivity capabilities of the end-devices. Consequently, the ability to perform non-trivial tasks on-premises becomes a desirable and necessary feature. Being able to execute locally certain ML operations that were previously executed at the edge or in the cloud, as TinyML as-a-Service aims to do, inevitably produces several advantages.
- Energy efficiency. One main design requirement of IoT networks is energy efficiency, especially considering that IoT devices are very often battery powered. There is, however, a further energy-efficiency aspect that is often ignored but can in reality become extremely relevant: in certain cases, network transmission can consume more energy than local computing. Considering this last element and the battery-powered design of most IoT devices, TinyML as-a-Service has been designed to enable local processing in the IoT device itself, alongside data transmission and ML computation in the edge and cloud.
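Two of the points above, network bandwidth and energy efficiency, lend themselves to a back-of-envelope sketch. The figures below (sampling window, radio energy per byte, MCU energy per operation) are illustrative assumptions of ours, not measurements of any particular hardware; the point is only the orders of magnitude involved when a device pre-processes data locally instead of streaming it raw over a narrowband link.

```python
import struct

# Hypothetical scenario: a vibration sensor produces a 1000-sample window
# and either transmits it raw or reduces it to three features on-device.
raw_samples = [0.01 * ((i * 37) % 100 - 50) for i in range(1000)]  # synthetic data

def extract_features(window):
    """On-device pre-processing: reduce a raw window to three summary features."""
    n = len(window)
    mean = sum(window) / n
    rms = (sum(x * x for x in window) / n) ** 0.5
    peak = max(abs(x) for x in window)
    return (mean, rms, peak)

raw_payload = struct.pack(f"<{len(raw_samples)}f", *raw_samples)   # 4 bytes/sample
feat_payload = struct.pack("<3f", *extract_features(raw_samples))  # 3 floats only

# Assumed per-byte radio cost vs. per-operation compute cost on a ULP MCU.
TX_UJ_PER_BYTE = 2.0              # µJ per byte over a narrowband link (assumption)
CPU_NJ_PER_OP = 0.5               # nJ per arithmetic operation locally (assumption)
local_ops = 3 * len(raw_samples)  # rough operation count of extract_features

tx_raw_uj = len(raw_payload) * TX_UJ_PER_BYTE
tx_feat_uj = len(feat_payload) * TX_UJ_PER_BYTE + local_ops * CPU_NJ_PER_OP / 1000

print(len(raw_payload), len(feat_payload))  # 4000 bytes vs. 12 bytes
print(tx_raw_uj, tx_feat_uj)                # 8000.0 µJ vs. 25.5 µJ
```

Under these assumed numbers, spending a little energy on local computation cuts the transmitted payload by more than two orders of magnitude, and the total energy with it; real radios and MCUs will differ, but the direction of the trade-off is the one described above.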
Technological differences between web and embedded
The web and the embedded worlds feature very heterogeneous characteristics. Figure 1 (above) depicts this heterogeneity by comparing, qualitatively and quantitatively, the capacities of the two paradigms from both a hardware and a software perspective. Web services can rely on powerful underlying CPU architectures with large memory and storage capacities. From a software perspective, web technologies can be designed to choose from and benefit from a multitude of sophisticated operating systems (OS) and complex software tools.
On the other hand, embedded systems must rely on the limited capacity of microcontroller units (MCUs) and on CPUs that are far less powerful than general-purpose and consumer CPUs. The same applies to memory and storage, where 500KB of SRAM and a few MB of flash memory can already be considered extensive resources. There have been several attempts to bring the flexibility of Linux-based systems into the embedded scenario (e.g. the Yocto Project), but nevertheless most 32-bit MCU-based devices can run only real-time operating systems, not a more complex distribution.
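To give a feel for what memory figures like those mean for ML, the following sketch checks whether a model's footprint fits such a device. The flash/SRAM sizes, firmware overhead, and the helper function are illustrative assumptions of ours, not a real toolchain check: weights live in flash alongside the firmware, while activations must fit in SRAM at run time.

```python
# Assumed memory budget of an MCU-class device (illustrative figures).
FLASH_BYTES = 2 * 1024 * 1024  # a few MB of flash for firmware + model weights
SRAM_BYTES = 500 * 1024        # ~500 KB of SRAM for activations and buffers

def fits_on_mcu(weight_count, bytes_per_weight, peak_activation_bytes,
                firmware_bytes=256 * 1024):
    """Check a model's static (flash) and dynamic (SRAM) footprint."""
    flash_needed = weight_count * bytes_per_weight + firmware_bytes
    return flash_needed <= FLASH_BYTES and peak_activation_bytes <= SRAM_BYTES

# A 1M-parameter model stored as float32 (4 bytes/weight) does not fit,
# while the same model quantized to int8 (1 byte/weight) does.
print(fits_on_mcu(1_000_000, 4, 100 * 1024))  # False
print(fits_on_mcu(1_000_000, 1, 100 * 1024))  # True
```

The example also hints at why techniques such as quantization matter so much in this space: shrinking each weight from 4 bytes to 1 is often the difference between a model that fits and one that does not.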
In simple terms, wherever Linux can run, system deployment is easier because software portability becomes straightforward. Furthermore, even higher cross-platform software portability is made possible by the wide support and usage of lightweight virtualization technologies such as containers. With almost no effort, developers can ship the same software functionality between entities running Linux distributions, as happens between cloud and edge.
The impossibility of running Linux and container-based virtualization on MCUs represents one of the most limiting issues and biggest challenges for current deployments. In typical "cloud-edge-embedded devices" scenarios, cloud and edge services are developed and deployed with hardware and software technologies that are fundamentally different from, and easier to manage than, embedded technologies.
TinyML as-a-Service tries to tackle this issue by taking advantage of alternative (and lightweight) software solutions.
The Machine Learning ecosystem is big and resource demanding. What ML tasks can be executed in the IoT space?
In the previous section, we considered at a high level how the technological differences between the web and embedded domains can implicitly and significantly affect the execution of ML tasks on IoT devices. Here, we analyze how a big technological gap also exists in the availability of ML-dedicated hardware and software across web, edge, and embedded entities.
From a hardware perspective, during most of computing history there have been only a few types of processors, mostly intended for general use. Recently, the relentless growth of artificial intelligence (AI) has led to the optimization of ML tasks on existing chip designs such as graphics processing units (GPUs), as well as to new dedicated hardware such as application-specific integrated circuits (ASICs), chips designed exclusively for the execution of specific ML operations. The common thread connecting all these new devices is that these credit-card-sized boards are designed to operate at the network edge.
At the beginning of this article we mentioned a few examples of this new family of devices (Edge TPU, Jetson Nano, Movidius). We foresee that in the near future chip and hardware manufacturers, both big and small, will increasingly invest resources in the design and production of ML-dedicated hardware. So far, however, the embedded world has not seen the same level of effort.
This lack of hardware availability undermines homogeneous and seamless "cloud-to-embedded" ML deployments. In many scenarios, software can help compensate for hardware deficiencies. However, the same boundaries that we find in the hardware sphere apply to the development of software tools. Today, the web domain offers hundreds of ML-oriented applications, and their number keeps growing, thanks in part to the open source initiatives that allow passionate developers all over the world to merge their efforts. The result is more effective, refined, and niche applications. However, porting these applications to embedded devices is not straightforward. The use of high-level programming languages (for instance Python), as well as the large size of the software runtime (meaning both the runtime system and the runtime phase of the program lifecycle), are just some of the reasons why software portability is painful, if not impossible.
The main rationale behind the TinyML as-a-Service approach is precisely to break down the existing wall between cloud/edge and embedded entities. However, expecting exactly the same ML experience in the embedded domain as in the web and enterprise world would be unrealistic. It is still an irrefutable fact that size matters. The execution of ML inference is the only operation that we reasonably foresee being executed in an IoT device. We are happy to leave all the other cumbersome ML tasks, such as data processing and training, to the better-equipped and more resourceful side of the scenario depicted in Figure 2.
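A minimal sketch of that division of labor, using toy numbers of our own: the quantization step stands in for the cloud-side work (training and model preparation), while the integer-only dot product stands in for the inference the device runs. This is a plain-Python illustration of the idea, not a real deployment pipeline.

```python
def quantize(values, scale=0.05):
    """Cloud-side step: map float values to int8 range for deployment."""
    return [max(-128, min(127, round(v / scale))) for v in values], scale

def int8_dense(x_q, w_q, w_scale, x_scale):
    """Device-side step: one output neuron using only integer multiply-accumulate."""
    acc = sum(xi * wi for xi, wi in zip(x_q, w_q))  # fits an MCU's integer ALU
    return acc * w_scale * x_scale                  # rescale to a real-valued output

# Weights prepared off-device; input quantized and classified on-device.
w_q, w_scale = quantize([0.5, -0.25, 1.0])
x_q, x_scale = quantize([1.0, 2.0, 0.1])
print(int8_dense(x_q, w_q, w_scale, x_scale))  # close to the float dot product 0.1
```

The device never touches floating-point weights or training data: it stores one signed byte per weight and accumulates integer products, which is what makes inference, and only inference, a reasonable fit for ULP microcontrollers.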
In the next article, we will go through the different features which characterize TinyML as-a-Service and share the technological approach underlying the TinyML as-a-Service concept.
In the meantime, if you have not read it yet, we recommend reading our earlier introduction to TinyML as-a-Service.
The IoT world needs a complete ML experience. TinyML as-a-service can be one possible solution for making this enhanced experience possible, as well as expanding potential technology opportunities. Stay tuned!