Accelerating 5G vRAN: The Case for GPUs
So far, 5G usage has been dominated by 4G-style mobile broadband (MBB) services driving smartphone traffic growth. By the end of 2019, global mobile data traffic was around 40 exabytes (EB) per month, and by 2025 it is expected to reach 160 EB, with video streaming accounting for about three-quarters of the total. Beyond MBB, 5G will allow service providers to address more of the IT value chain as they accommodate and enable an emerging range of enterprise applications and services. What will it take to spur this next wave of 5G across a broad spectrum of use-cases? RAN virtualization is a key element to the story of 5G reaching beyond MBB.
RAN Virtualization and 5G Use-Cases
Ericsson has been studying and prototyping designs for a virtualized NR baseband distributed unit (vDU or DU functionality implemented as software on a generic hardware platform). We have recently taken additional steps towards such a product in collaboration with leading-edge providers of complementary technologies, including NVIDIA for GPUs and Intel for CPUs. This will enable solutions to emerging challenges, e.g. deployment in data center-like environments and industrial and enterprise private networks. Benefits include roll-out and operational synergies due to the use of open-source cloud platforms and commercial off-the-shelf (COTS) hardware.
In general, emerging use-cases for 5G can be classed into two groups: those requiring a low latency round-trip time between device and application service (e.g. cloud gaming, server-side video rendering and deep learning) and those characterized by localized/site-bound application deployment (e.g. Industry 4.0, controller software for machines, sensors and actuators in factories). The former category involves application services moving closer to the access network (RAN) from hyperscale clouds (AWS, Azure, GCP). The latter involves RAN and Core being deployed in stand-alone private networks. Both cases imply a convergence of connectivity (5G) and IT services.
But there is a catch: the network may not evolve to support such applications (low-latency or localized/private) before the application-side demand emerges. At the same time, lack of support in the network means application developers do not come to the table in big numbers. To circumvent this, we – as industry leaders – need to make a move. To paraphrase Kevin Costner from the film Field of Dreams, “Build it and they will come.” The logic is clear: an initial “base” workload is needed for what is loosely called “edge.” NR baseband processing can fill that role—compelling a shift in architecture and technology stack—serving as a foundation for new services over a longer timeframe.
Harmonizing Virtualization with Acceleration
vRAN is about 5G becoming software-defined and programmable. By definition, vRAN involves disaggregation of software and hardware. There is an expectation that an operator will be able to run the same 5G software stack on a variety of servers and evolve capacity by swapping out the hardware, as we do with our PCs. There’s a cost to this: a lower degree of system integration implying lower performance per watt of power. But there is also an upside where Ericsson is taking the lead: greatly improving the pace of new network features and the adaptability of 5G to emerging use-cases.
Many 5G use-cases run well on pure CPU platforms, but as bandwidths increase and advanced antenna systems are deployed, x86 cores struggle to keep up and start driving impractical levels of power consumption. Hardware acceleration will be needed for the compute-heavy physical layer and scheduling workloads in 5G NR.
From desktop gaming to supercomputer benchmarking, hardware acceleration is attained using GPUs. While there are alternative paths to hardware acceleration, they usually involve customization using FPGAs or ASICs, therefore ruling out COTS hardware. Concerning compute density, GPUs devote a larger fraction of their chip area to arithmetic than CPUs do. As technologies evolve, GPUs are free to keep optimizing for a narrow workload of high-performance computing, whereas CPUs target a more diverse workload, such as databases and office applications.
To get high performance in a CPU implementation of NR one has to massively optimize single-core, single-thread performance with SIMD (single instruction multiple data). This is generally fun and exciting work but leads down a perilous path. Once the last optimization is done with no cache misses – what do you do? Wait for faster CPU clock rates? Or attempt to go multi-core in the CPU? The former alternative is a dead end. The latter case turns out to be challenging in the NR physical layer precisely because the single-threaded code can be so highly optimized. Going to multiple cores requires sacrificing the very data locality you worked so hard to engineer, meaning the first few cores beyond the initial one may cost more in lost per-core performance than is gained in parallelization. Bad news.
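To make the trade-off concrete, consider a minimal sketch (with hypothetical names, not Ericsson's actual code) of the kind of tight inner loop this describes: applying per-subcarrier equalization gains to one received OFDM symbol. Contiguous data, no branches, everything hot in a single core's cache — exactly the shape of loop that gets hand-optimized with SIMD on a CPU.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Illustrative only: multiply each received subcarrier by a per-subcarrier
// equalization gain. On a CPU, a loop like this is the target of aggressive
// single-thread SIMD optimization: the compiler (or hand-written intrinsics)
// processes several complex samples per instruction, and performance hinges
// on keeping rx, gains, and out resident in one core's cache.
void equalize_symbol(const std::vector<std::complex<float>>& rx,
                     const std::vector<std::complex<float>>& gains,
                     std::vector<std::complex<float>>& out)
{
    for (std::size_t k = 0; k < rx.size(); ++k)
        out[k] = rx[k] * gains[k];  // one complex multiply per subcarrier
}
```

Splitting this loop across cores means partitioning data that was deliberately laid out for one core — which is exactly where the per-core performance loss described above comes from.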
On the other hand, with NVIDIA GPUs you can take an extreme multi-threaded approach from the start. Since effective core counts per chip area tend to increase faster than clock rates, this positions the code to keep gaining performance with each new hardware generation. And GPUs are in all the major cloud platforms and run countless AI workloads across all industries (making them suitable for COTS hardware platform design).
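The same per-subcarrier work, written massively threaded from the outset, might look like the following sketch (again with hypothetical names). On a GPU, each subcarrier would map to its own hardware thread in a CUDA kernel; here `std::thread` stands in to show the structure: no single thread is hand-tuned, and throughput comes from adding workers.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <thread>
#include <vector>

// Illustrative sketch of the GPU-style decomposition: partition the
// subcarriers across many workers, each doing trivially simple work.
// In CUDA the launch configuration would assign one thread per element;
// scaling comes from more parallel lanes, not faster single-thread code.
void equalize_parallel(const std::vector<std::complex<float>>& rx,
                       const std::vector<std::complex<float>>& gains,
                       std::vector<std::complex<float>>& out,
                       unsigned workers)
{
    std::vector<std::thread> pool;
    const std::size_t chunk = (rx.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t lo = w * chunk;
        const std::size_t hi = std::min(rx.size(), lo + chunk);
        pool.emplace_back([&rx, &gains, &out, lo, hi] {
            for (std::size_t k = lo; k < hi; ++k)
                out[k] = rx[k] * gains[k];
        });
    }
    for (auto& t : pool)
        t.join();
}
```

The design point is that adding workers requires no restructuring: the decomposition was parallel from the start, which is what lets the implementation ride rising core counts rather than stalled clock rates.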
NVIDIA GPUs offer an acceleration platform that delivers performance and programmability without sacrificing the COTS character of the x86 architecture. Here we outline our key requirements, all of which the platform fulfills:
- Cloud-native and well-integrated with x86 virtualization
- Programmable in C and C++ with a flexible programming model
- An ecosystem of tools and libraries
- Availability of design professionals across many industries and academic research
- Mass-market and mass-production volumes
- The ability to scale from very small to very large deployments
- A roadmap of increasing compute capability
The CUDA ecosystem encompasses a large and vibrant developer community with access to outstanding libraries of common computational building blocks. In this respect, CUDA and NVIDIA GPUs have no rivals in high-performance computing. And NVIDIA is scaled for large volume production and logistics since GPUs also target PC gaming.
Our approach to RAN virtualization as a complement to our current portfolio will provide alternative ways of deploying NR baseband. Basing it on COTS hardware enables a wide range of emerging edge applications including augmented reality, AI and deep learning, and server-side video rendering to be co-located. In addition, it enables the solution to scale down to meet the needs of private networks. We see clear advantages in implementing the baseband processing and new emerging 5G use-cases in close proximity, with both benefiting from NVIDIA GPU-based hardware acceleration.
Source: Ericsson Mobility Report, November 2019 edition and 2019 Q4 update.
This is the second of two articles on 5G RAN virtualization. The first is entitled “Virtualized 5G RAN: why, when and how?”