Addressing the memory-safety hardening challenge with CHERI
Memory safety refers to the prevention of software bugs that cause programs to access or modify memory in unintended ways, leading to system crashes or security vulnerabilities. The University of Cambridge, United Kingdom, has developed a hardware-based security architecture called CHERI, which is designed to provide widespread memory safety. Ericsson Research has assessed the suitability of CHERI for telecommunication systems that require high performance and fault tolerance. This blog post summarizes the research team’s key findings.
The memory-safety hardening challenge
Memory-safety vulnerabilities are flaws in software that occur due to improper memory access or management. Programming languages like C and C++ lack built-in memory-safety protections, but they are widely used for systems programming, embedded systems, and performance-critical applications.
Recently, leading cybersecurity organizations such as the US Cybersecurity & Infrastructure Security Agency (CISA), the National Security Agency (NSA), and other national cybersecurity agencies have emphasized the need for renewed focus on memory safety by the software industry. This culminated in a US White House Office of the National Cyber Director (ONCD) report, urging software manufacturers to adopt secure-by-design technologies, such as memory-safe programming languages and hardware.
CHERI – A way forward to pervasive memory safety
CHERI, which stands for Capability Hardware Enhanced RISC Instructions, is a hardware research architecture developed by the University of Cambridge and SRI International. It adds security features to protect low-level software from memory-safety vulnerabilities.
We have explored the CHERI research architecture, focusing on the technical aspects most relevant to telecommunications, to find out whether CHERI is a viable technology for future telecommunication systems.
Our conclusions in summary
As with any security technology, CHERI has tradeoffs between security, cost, and operational impact on software. To make the tradeoff worthwhile, we want the protection to be as comprehensive and systematic as possible and the operational impact to be minimized.
Having realistic demonstrators for new hardware technologies is critical. They help potential adopters understand how these technologies will perform, and what kind of operational impact they will have.
Addressing the memory-safety hardening challenge requires a comprehensive approach, including the adoption of more secure hardware.
The CHERI research architecture
CHERI incorporates capability-based addressing that uses integrity-protected objects called capabilities instead of conventional memory pointers to address memory. Capabilities include information (see Figure 1) that is used by a CHERI-enabled processor to ensure memory accesses are safe and correct.
Each CHERI capability is double the width of the native integer pointer type of the baseline architecture. A capability consists of the baseline address, identifying a location in memory the capability points to, and capability metadata. The metadata includes permission information, object type, and an upper (UB) and lower bound (LB), relative to the baseline address, which limits the portion of the address space the capability can access. The bounds are stored in a compressed format to save space. One additional bit, the validity tag, is stored separately from the capability and is used to protect its integrity.
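To make the capability fields described above more concrete, the following is a minimal software model of the checks a CHERI-enabled processor applies to each memory access. It is a sketch under stated assumptions, not the hardware encoding: the class and field names are hypothetical, the bounds are kept as uncompressed absolute addresses, the object type is omitted, and the validity tag is an ordinary field rather than an out-of-band bit.

```python
from dataclasses import dataclass, replace


class CapabilityFault(Exception):
    """Stands in for the hardware exception a CHERI processor would raise."""


@dataclass(frozen=True)
class Capability:
    address: int                 # location the capability currently points to
    lower: int                   # lower bound (LB), inclusive
    upper: int                   # upper bound (UB), exclusive
    perms: frozenset = frozenset({"load", "store"})
    tag: bool = True             # validity tag (out-of-band in real hardware)

    def offset(self, delta):
        """Pointer arithmetic moves the address but never the bounds."""
        return replace(self, address=self.address + delta)

    def narrow(self, lower, upper):
        """Derive a capability for a sub-range; bounds can only shrink."""
        if not self.tag or not (self.lower <= lower <= upper <= self.upper):
            raise CapabilityFault("cannot widen bounds")
        return replace(self, address=lower, lower=lower, upper=upper)

    def check(self, perm, length):
        """Checks applied before every access in this model."""
        if not self.tag:
            raise CapabilityFault("invalid tag")
        if perm not in self.perms:
            raise CapabilityFault("missing permission: " + perm)
        if self.address < self.lower or self.address + length > self.upper:
            raise CapabilityFault("access outside bounds")


class Memory:
    def __init__(self, size):
        self.data = bytearray(size)

    def load(self, cap, length):
        cap.check("load", length)
        return bytes(self.data[cap.address:cap.address + length])

    def store(self, cap, payload):
        cap.check("store", len(payload))
        self.data[cap.address:cap.address + len(payload)] = payload


mem = Memory(64)
buf = Capability(address=16, lower=16, upper=24)   # an 8-byte object
mem.store(buf, b"payload!")                        # exactly fills the object

try:
    mem.store(buf.offset(4), b"overflow")          # would spill past UB
except CapabilityFault as fault:
    print("blocked:", fault)
```

In this model, the overflowing store is rejected before any byte is written, which is the property that turns a silent buffer overflow into a detectable fault.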
It is important to be able to observe the impact of CHERI with real, physical hardware on real-world performance-critical software. Industrial demonstrators, such as the Arm Morello prototype System-on-Chip and the CHERI software stack for Morello, are important for potential adopters, such as Ericsson, to gauge the benefits and tradeoffs involved with porting telecommunications software to CHERI.
In addition, our Ericsson Research team has developed its own variants of CHERI-RISC-V softcores to evaluate potential further extensions of the CHERI architecture. Thanks to open-source research, we have been able to leverage the existing CHERI-RISC-V softcores and evaluation platforms needed for experimentation in FPGA-based environments.
Is CHERI viable for telecommunications systems?
To understand the tradeoffs of using CHERI in telecommunications systems, we utilized the Arm Morello demonstrator to run CHERI-enabled 5G RAN production code. The Morello platform, developed by Arm, is a prototype system-on-chip (SoC), which implements a variant of CHERI on top of an Armv8-A application processor, enabling industrial evaluation of CHERI hardware and software concepts. Morello is based on Arm's existing Neoverse N1 platform and CPU, which is the server-class counterpart to the Arm Cortex-A76 microarchitecture. Basing Morello on a widely available commercial CPU design is a good choice by Arm, as it enables comparisons of CHERI to the baseline processor.
We ported synthetic Cloud-RAN Radio Link Control (RLC) and Medium Access Control (MAC) benchmarks written in C/C++ to Arm Morello to evaluate the portability and performance impact related to CHERI. Only one percent of the total source code in the benchmarks required changes to make the software compatible with CHERI. Most of the porting effort was due to dependencies that did not support CHERI at the time. As the CHERI ecosystem evolves, this is expected to become less of a problem.
The Morello SoC prototype shows some clear performance overheads. The characteristics we observed are consistent with those reported by the University of Cambridge in its early performance results from the prototype Morello microarchitecture. These hardware implementation bottlenecks are largely due to design decisions made within the constraints of the Morello research project. To make CHERI a viable option for performance-critical software, such as telecommunications systems, it is crucial to decrease its performance impact wherever possible.
The researchers at Cambridge have significantly boosted performance results in CPU benchmarks by using optimized compilation techniques and modified FPGA implementations of the Morello CPU. These enhancements are not available in the Arm Morello design used for the experiments at Ericsson. If such advances deliver the estimated improvements, performance is likely to be acceptable for telecommunication systems as well.
Increased focus needed on fault tolerance and resiliency
Fault-tolerant, highly available software is necessary for critical infrastructure, particularly in telecommunication systems. Availability is a primary focus, along with confidentiality and integrity, but many security approaches are at odds with such availability requirements.
Recent outages affecting major software and security vendors demonstrate that security solutions can become so deeply embedded that a malfunction, whether unintended or maliciously induced, has severe consequences for availability. Conventional memory-safety mitigations share similar traits: they typically focus on preventing attackers from gaining a foothold through a memory-safety vulnerability, to the point where letting the software crash is considered preferable. However, maintaining resilience and availability in systems critical to society is as important as detecting and mitigating memory-safety vulnerabilities.
In ongoing work, we are exploring methodologies to enhance the resilience of software that employs memory-safety vulnerability mitigations. Our early work in this area includes “Rewind & Discard” for the 64-bit x86 architecture, which improves the resilience of software-based defenses by introducing crash-resistant in-process isolation that leverages hardware-assisted isolation primitives.
CHERI can efficiently detect and mitigate common memory-safety vulnerabilities but, like many software-based solutions, it overlooks the importance of software resilience and availability. To make sure a system can continue to function and remain responsive while mitigating an attack or when subjected to malicious inputs, the CHERI protection model must be made part of a larger architecture built for fault tolerance.
Fortunately, CHERI capabilities go beyond memory safety, also supporting highly scalable, hardware-enforced in-process or in-kernel software compartmentalization. This feature is critical for developing robust crash-resistant systems such as the “Rewind & Discard” approach.
We explored the application of the “Rewind & Discard” concept to the CHERI protection model in a prototype that demonstrates reduced performance degradation compared to the earlier research done on off-the-shelf commercial hardware.
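The rewind-and-discard idea can be illustrated with a small sketch. This is only a conceptual analogy under stated assumptions, not the published implementation or its API: all names are hypothetical, a deep copy of the state stands in for a hardware-isolated domain, and a Python exception stands in for a detected memory-safety violation.

```python
import copy


class DomainFault(Exception):
    """Stands in for a memory-safety violation detected inside a domain."""


def run_in_domain(handler, request, state):
    """Run `handler` on a private copy of `state`; commit only on success."""
    scratch = copy.deepcopy(state)      # stand-in for an isolated memory domain
    try:
        result = handler(request, scratch)
    except DomainFault:
        return None, state              # discard the scratch domain and rewind
    return result, scratch              # success: the domain's state is kept


def parse_packet(packet, session):
    """Hypothetical handler that faults on a malformed length field."""
    length = packet[0]
    session["packets_seen"] += 1        # state is mutated before the check
    if length != len(packet) - 1:
        raise DomainFault("length field does not match payload")
    session["bytes_seen"] += length
    return packet[1:]


session = {"packets_seen": 0, "bytes_seen": 0}
for packet in (b"\x02hi", b"\x7fbad", b"\x03yes"):
    payload, session = run_in_domain(parse_packet, packet, session)
    print(payload, session)
```

The malformed second packet leaves no trace in the session state, and the service goes on to handle the third packet instead of terminating. A real system would rely on hardware-enforced compartments rather than copying, which is where CHERI compartmentalization becomes relevant.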
Researchers and practitioners need to consider how to make CHERI and other memory-safety mitigations practical for critical, high-availability software. Such software is part of larger systems with strict demands for resilience and fault tolerance in addition to security.
Further CHERI improvements with conditional capabilities
While CHERI promises to mitigate large classes of memory-safety issues, including buffer overflows and use-after-free conditions, it intentionally leaves some important issues out of its scope. A notable omission is the undefined behavior associated with uninitialized variables. Vulnerabilities related to uninitialized variables account for a sizable share, about 10 percent, of the memory-safety vulnerabilities reported to the Common Vulnerabilities and Exposures (CVE) program between 2015 and 2022. Software-based mitigations for uninitialized memory issues are well known. Examples include zeroing memory mappings before first use, automatic initialization in the form of heap allocators returning zeroed memory, and compiler analysis that automatically zeros local variables. However, these approaches suffer from performance overhead that scales with the amount of memory consumed by an application, and they can interfere with debugging tools and other development-time practices used to detect and fix memory-safety issues.
We have explored further extensions to the CHERI capability model to express memory-access policies that eliminate uninitialized memory accesses from CHERI-enabled software. Specifically, we introduce the notion of conditional permissions to capability-based addressing, which allows expressing memory-access policies that consider previous memory operations using a capability. This enables conditional capabilities that satisfy memory-safety objectives, such as “no reads to memory which has not been the subject of at least one write” (write-before-read, see Figure 2).
A write-before-read capability tracks, in addition to the upper (UB) and lower bound (LB), a third, operation-specific bound (OB) which describes the area of memory between the LB and OB that has been written using a capability. A newly allocated memory area under the write-before-read policy (a) is initially write-only. After a portion of the allocation has been written (b) the operation bound is increased by the hardware and the portion between the LB and OB becomes readable. Once the OB reaches the UB, the semantics of a conditional capability are identical to a conventional CHERI capability.
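The write-before-read policy described above can be modeled in a few lines. This is a minimal sketch rather than the hardware design from the paper: the class and method names are hypothetical, and the rule that only writes touching the already-written region advance the OB is an assumption made here so that the readable range never contains a gap.

```python
class CapabilityFault(Exception):
    """Stands in for the hardware exception raised on a denied access."""


class WriteBeforeReadCapability:
    def __init__(self, lower, upper):
        self.lower = lower        # LB, inclusive
        self.upper = upper        # UB, exclusive
        self.op_bound = lower     # OB: nothing written yet, so write-only

    def _check_bounds(self, addr, length):
        if addr < self.lower or addr + length > self.upper:
            raise CapabilityFault("access outside [LB, UB)")

    def store(self, memory, addr, payload):
        self._check_bounds(addr, len(payload))
        memory[addr:addr + len(payload)] = payload
        # Assumption of this sketch: only writes that touch or overlap the
        # written region advance OB, so [LB, OB) never contains a gap.
        if addr <= self.op_bound:
            self.op_bound = max(self.op_bound, addr + len(payload))

    def load(self, memory, addr, length):
        self._check_bounds(addr, length)
        if addr + length > self.op_bound:
            raise CapabilityFault("read of memory that was never written")
        return bytes(memory[addr:addr + length])


heap = bytearray(b"\xAA" * 32)        # stale data from an earlier allocation
cap = WriteBeforeReadCapability(8, 24)

try:
    cap.load(heap, 8, 4)               # (a) a fresh allocation is write-only
except CapabilityFault as fault:
    print("blocked:", fault)

cap.store(heap, 8, b"abcd")            # (b) OB advances from 8 to 12
print(cap.load(heap, 8, 4))            # the written prefix is now readable
```

Unlike zeroing the allocation up front, the stale bytes are never touched in this model. They are simply unreachable through the capability until they have been overwritten.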
In our research publication Mon CHÉRI ♡ Adapting Capability Hardware Enhanced RISC with Conditional Capabilities, we present a hardware extension that adds conditional capabilities to the CHERI architecture, the accompanying compiler support, and a detailed evaluation of our approach using the QEMU full-system simulator and our modified FPGA-based CHERI-RISC-V softcore.
Conditional capabilities leverage and extend the CHERI capability model to implement memory access-control policies unattainable with traditional per-memory-page methods. Beyond write-before-read capabilities, they show that, despite some gaps, CHERI opens new ways to address existing challenges.
Conclusion
From an industry perspective, adopting technologies like CHERI involves significant initial costs and disruptions. Success with CHERI will require support from numerous stakeholders in both the chip industry and the open-source ecosystem. It’s important that the tradeoff between security, cost, and operational impact on software is worthwhile. The technologies need to offer comprehensive and systematic protection with minimized operational impact.
The availability of realistic demonstrators, such as the Arm Morello, is crucial for potential adopters in getting a realistic picture of characteristics, such as potential performance impact. Such characteristics are best determined using domain-relevant benchmarks that, as closely as possible, model the behavior of real-world systems.
The memory-safety hardening challenge must be addressed comprehensively, including more secure hardware. Solutions for memory safety cannot come at the cost of equally important operational characteristics, such as fault tolerance. That is why at Ericsson, we are striving to push the boundary of all aspects necessary to secure the critical infrastructure of today and tomorrow.
Acknowledgements
MSc Thesis Student Fredrik Hessner from Lund University contributed to the memory-safe 5G software using CHERI project, supervised by Håkan Englund from Ericsson Research and Peter Svensson from Ericsson Networks.
MSc Thesis Student Sacha Ruchlejmer from Université Grenoble Alpes Grenoble INP – Phelma contributed to exploring Rewind and Discard for CHERI, supervised by Merve Gülmez from Ericsson Research.
Learn more
For more information about conditional capabilities, read our research paper Mon CHÉRI ♡ Adapting Capability Hardware Enhanced RISC with Conditional Capabilities
Open-access pre-print: M. Gülmez, H. Englund, J. T. Mühlberg and T. Nyman, "Mon CHÈRI <3 Adapting Capability Hardware Enhanced RISC with Conditional Capabilities," 2024, arXiv preprint 2407.08663.
For more information about secure rewind and discard, read our research paper: Unlimited Lives: Secure In-Process Rollback with Isolated Domains
Published as: M. Gülmez, T. Nyman, C. Baumann and J. T. Mühlberg, "Rewind & Discard: Improving Software Resilience using Isolated Domains," 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Porto, Portugal, 2023, pp. 402-416, doi: 10.1109/DSN58367.2023.00046.
For more information about secure rewind and discard for Rust foreign function interfaces, read our research paper: Friend or Foe Inside? Exploring In-Process Isolation to Maintain Memory Safety for Unsafe Rust
Published as: M. Gülmez, T. Nyman, C. Baumann and J. T. Mühlberg, "Friend or Foe Inside? Exploring In-Process Isolation to Maintain Memory Safety for Unsafe Rust," 2023 IEEE Secure Development Conference (SecDev), Atlanta, GA, USA, 2023, pp. 54-66, doi: 10.1109/SecDev56634.2023.00020.
For more information about porting secure rewind and discard to Arm Morello, see Sacha Ruchlejmer’s MSc thesis: Secure Rewind and Discard on ARM Morello
Open-access pre-print: S. Ruchlejmer, "Secure Rewind and Discard on ARM Morello," 2024, arXiv preprint 2407.04757.
Source code for Secure Rewind & Discard, Secure Rewind & Discard for Rust FFI, and Rewind & Discard for Arm Morello is available on GitHub
For more information about memory-safe 5G software using CHERI, see Fredrik Hessner’s MSc Thesis: Memory-Safe 5G Software Using the CHERI Hardware Architecture
Learn more about our research on Future network security
For more information about the CHERI project at Cambridge University, see their project page and their technical report on early performance results from prototype Morello microarchitecture