High Assurance Virtualization for ARMv8
To save cost and increase usability and capability, there is growing demand to reuse the same ICT hardware for multiple functionalities. A fundamental security requirement is to provide isolation between the components providing these functionalities, including the associated data.
Ericsson has co-developed a high-assurance hypervisor and secure boot, codenamed HASPOC, for the latest ARM architecture, ARMv8. In this post, we provide details about the technologies of our platform.
The requirement to provide isolation has become relevant for most types of ICT devices: not only mainframes in data centers, but also smartphones and computationally very limited devices such as embedded systems and sensors.
The solution was developed by our Security Research team at Ericsson Research, together with SICS and other partners as part of the Vinnova project HASPOC.
The isolation properties of the platform are formally verified, which establishes that multiple guests executing on the HASPOC platform enjoy the same isolation as if they had been executing in an ideal model comprising two separate, air-gap separated systems – but allowing for dedicated inter-guest communication channels across the air-gap. In addition, a Common Criteria Security Target has been specified, allowing for Common Criteria evaluation of products based on the HASPOC platform.
The hypervisor is of Type-1 (bare metal), which means that it does not require an underlying operating system and that it is transparent to guests. Each guest on the platform is exclusively assigned one or more cores on which to execute its code.
Physical memory is exclusively assigned to and owned by each guest. The hypervisor enforces that one guest cannot, outside the scope of supervised inter-guest communication, access or modify information in the memory exclusively owned by another guest. The nested page table capability of the ARMv8 memory management unit is utilized for efficient virtual memory management and unmodified guest code.
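To illustrate how two-stage translation confines a guest, here is a minimal Python sketch. The flat page maps, 4 KiB granularity, and the `translate` helper are illustrative simplifications (real ARMv8 translation tables are multi-level and walked in hardware); the point is that the guest controls stage 1 while the hypervisor controls stage 2, so a guest can only ever reach host pages the hypervisor has mapped for it:

```python
PAGE = 4096  # illustrative page size

def translate(va, stage1, stage2):
    """Translate a guest virtual address to a host physical address.

    stage1: guest-controlled map, guest VA page -> intermediate PA page
    stage2: hypervisor-controlled map, intermediate PA page -> host PA page
    Raises KeyError (modelling a translation fault) if either stage
    has no entry for the page.
    """
    ipa_page = stage1[va // PAGE]   # stage 1: VA -> IPA (guest's view of PA)
    hpa_page = stage2[ipa_page]     # stage 2: IPA -> host PA
    return hpa_page * PAGE + va % PAGE

# The hypervisor grants this guest only host pages 100 and 101.
stage2 = {0: 100, 1: 101}
# The guest maps its own virtual pages freely within its IPA space.
stage1 = {0: 1, 1: 0}

assert translate(0x123, stage1, stage2) == 101 * PAGE + 0x123
```

An access whose intermediate physical address has no stage-2 entry raises a `KeyError` here, standing in for a stage-2 translation fault that traps to the hypervisor.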
Where possible, peripherals are exclusively assigned to a specific guest via memory mapping. In other cases, the hypervisor may maintain per-guest data for each shared peripheral to avoid data interference and information leakage. Physical interrupts are exclusively assigned to the guest which owns the related peripheral. DMA-capable peripherals are access controlled via so-called System MMUs (SMMUs), where available. These SMMUs ensure that the peripherals do not violate the access boundaries defined for the guest owning a peripheral. If no SMMU is available, the hypervisor may control guest access to a peripheral through a hypervisor emulation driver system.
ARMv8 Hardware Model
A formal model of the ARMv8 architecture is needed both to perform formal verification of security properties and for code verification. An existing L3 ARMv8 model, developed in Cambridge and covering user-level instructions, has been extended with system-level instructions as well as with a model of the two-stage memory management unit.
The sequential architecture model is a straightforward extension of the original Cambridge model. The memory is sequential and uses virtual addresses. The main addition is a large number of system-level instructions and system registers. The registers can be set and retrieved using the system-level instructions. The model supports cache and TLB invalidation instructions plus memory barriers, but these have no effect, as sequential memory with transparent caches and TLBs/address translation is assumed.
The decomposed architecture model is developed to reflect low-level behavior of the ARMv8 architecture and peripherals. Specifically, it captures the effects of weakly consistent memory which is shared between all observers in a system.
To handle the complexity of the model, it is decomposed into several components which are communicating through message passing. Components include processor cores, MMUs, and the memory subsystem.
The formal verification of the HASPOC platform establishes mathematically proven and machine-checked guarantees of the isolation of guests in the system. The proven property is non-interference, which means that no information can flow between guests except through explicitly defined channels. This means in particular that guests cannot change or observe private information of other guests or the hypervisor, i.e., both integrity and confidentiality are preserved.
The first step of the formal verification employs a so-called real-ideal bisimulation-based approach to ensure that the platform design resembles a system where non-interference holds by construction. The second step of the proof focuses on the hypervisor event handler code and establishes the correctness of its binary implementation by means of (semi-)automated code verification methods.
In general, a simulation proof between two systems A and B is conducted to show that B is refining (implementing) A, or – equivalently – that A abstracts from (simulates) B. The two systems are usually coupled with a simulation relation sim between states of A and B. It is then proven by induction that for any computation of B starting in state b that is related to a state a of A by sim, there exists a simulating execution of A such that the resulting states b’ and a’ are again contained in the simulation relation sim.
A bisimulation is a special case of simulation between two systems where both systems are simulating each other. That is, also for every computation of A one must show that there exists a simulating computation of B.
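As a toy illustration of this proof obligation, the following Python sketch brute-forces the simulation check for two tiny finite transition systems. The states and relation here are invented purely for illustration (real proofs are inductive and machine-checked in a theorem prover such as HOL4); the abstract system is additionally allowed to stutter while the concrete one performs internal steps:

```python
def simulates(steps_A, steps_B, sim):
    """Check: every step of concrete system B from a related pair (a, b)
    can be matched by a step of abstract system A (or by A stuttering)."""
    for (a, b) in sim:
        for b2 in steps_B.get(b, []):
            # A may take a step a -> a2, or stay put (stutter).
            if not any((a2, b2) in sim for a2 in steps_A.get(a, []) + [a]):
                return False
    return True

# A: ideal guest alternating between idle and run.
steps_A = {"idle": ["run"], "run": ["idle"]}
# B: real system with an extra internal context-save step.
steps_B = {"idle": ["save_ctx"], "save_ctx": ["run"], "run": ["idle"]}
sim = {("idle", "idle"), ("run", "save_ctx"), ("run", "run")}

assert simulates(steps_A, steps_B, sim)
```

A bisimulation additionally requires the converse check, with the roles of the two systems swapped.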
The idea of the real-ideal approach to proving information flow properties is to formulate an idealized abstract model of the system which by construction has the desired properties (e.g., non-interference). The main idea of virtualization (and isolation) is to provide an illusion to the guest processes that they are running on the machine alone. Each guest owns a virtual machine which in principle is completely isolated from the others’.
We pick up this idea and model the ideal system as a number of separate virtual machines, which are reduced versions of the original ARMv8 processor cores. Specifically, these machines can only run in EL1 and EL0 and there is only one stage of address translation. There is only a subset of the platform’s interrupts available to each guest and the memory allocated to each guest has a smaller size than the complete physical memory. Similarly, peripherals are only connected to machines which are allowed to use them. The communication mechanism between the guests provided by the hypervisor can be modelled abstractly by a message passing protocol using device-like uni-directional message channels between the guest virtual machines.
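The uni-directional channels of the ideal model can be pictured as bounded FIFO buffers. The Python sketch below is purely illustrative (the class and its capacity are invented, not the HASPOC interface): since each guest holds only its own channel endpoints, all cross-guest information flow is confined to these buffers.

```python
from collections import deque

class Channel:
    """Uni-directional, bounded message channel between two guests
    (a toy model of the hypervisor-provided communication mechanism)."""

    def __init__(self, capacity=4):
        self.buf = deque()
        self.capacity = capacity

    def send(self, msg):
        if len(self.buf) >= self.capacity:
            return False            # channel full: the sender must retry
        self.buf.append(msg)
        return True

    def recv(self):
        # Returns None when no message is pending.
        return self.buf.popleft() if self.buf else None

# One channel per direction between a pair of guests.
a_to_b = Channel()
assert a_to_b.send("hello")
assert a_to_b.recv() == "hello"
assert a_to_b.recv() is None
```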
The “real” system model represents the HASPOC platform on the design level. The system resembles the actual physical machine with all execution levels and the second stage address translation for the guests. Similarly, the interrupt controller and system MMU become visible and the interrupts to the guests are routed through the hypervisor. Hypercalls and SMCs are not executed atomically but according to a transition system that resembles their C and assembly implementation. The interleaving of hypervisor and guest execution on each core becomes explicit, including the saving and restoring of the virtual machine contexts. Moreover, there is now only one weakly consistent memory shared between all guests. Once the models are fixed, one performs a bisimulation proof between the ideal and the real model in order to show that they behave equivalently.
Boot and Initialization
The bisimulation concerning the hypervisor execution starts in an initial ideal and a real state where the bisimulation already holds. To obtain this initial property, we need to argue about the boot code and the initialization code of the hypervisor. After the execution of this code, execution must arrive in a real state that is bisimilar to the initial state of the ideal model. To verify these requirements on the initialization of the HASPOC platform we mainly employ sequential code verification, which can be done since all boot code is executed on the primary processor core while all secondary cores are suspended. We can therefore reduce the weakly consistent concurrent model to a sequential setting where no other actors in the system interfere.
The verification of the bisimulation for the guest runtime phase of the platform covers two cases: execution of a guest process in its virtualized execution environment and switching from the guest into the hypervisor. The two cases mainly depend on the ARMv8 architecture and the configuration of the platform obtained from the initialization phase.
For every possible step of the guest process we need to show that the step preserves the bisimulation relation between the real and ideal model, i.e., it behaves as if it was running on its own isolated machine. As the model of the complete platform is highly complex, we follow the decomposition approach to prove the non-interference properties on the architecture. The model is decomposed into smaller automata which model the execution of the instruction core, MMU, SMMU, peripherals, the interrupt distributor, and the weakly consistent shared memory.
For the switch from EL0 or EL1 to hypervisor mode (EL2) or the Secure World (EL3), it must be shown that the sources of these context switches, i.e. interrupts and exceptions, bring the system into a trusted state. The proof of the execution-level context switch is relatively easy, since it only considers the processor core component of the detailed ARMv8 model.
Hypervisor Handler Execution
The final part of proving the real-ideal bisimulation concerns the functional correctness of the hypervisor handlers. Standard Hoare logic-style reasoning about pre- and post-conditions can be employed to show functional correctness. The bisimulation relation couples the abstract hypervisor data structures with their counterparts in memory, but since the real model still assumes a sequential memory for the execution of handler code, the weak memory model does not need to be considered here.
In the formal verification described above, the hypervisor is modelled as a transition system closely following its C and assembly implementation. However, strictly speaking, the proofs on this model give no guarantees at all for the actual machine code running on the ARMv8 processor. To close this gap, we apply binary verification techniques on the compiled code.
The formal specification of the hypervisor design is basically a finite state automaton where each transition represents the sequential execution of a portion of the handler code. Case splits and function calls are natural candidates for start or end points of these transitions, i.e., the states of the automaton. The states are then mapped to the portions of the compiled code that implement them.
From the semantics of the real model one can derive pre- and post-conditions for the code. In order to verify that the code C of a handler actually satisfies the post-condition Q for pre-condition P, a technique called weakest pre-condition propagation is used. Intuitively, the weakest precondition for code C and post-condition Q represents the minimal assumptions on the initial state needed, so that C terminates and the resulting state satisfies Q. Then if P implies the weakest precondition, it is a sufficient pre-condition. Weakest precondition propagation now aims to find the weakest precondition by starting at condition Q and going backwards instruction by instruction in C, accumulating the weakest preconditions for each step.
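The backward propagation can be demonstrated on a toy straight-line language of assignments. The Python sketch below is purely illustrative (real tools such as BAP work on a machine-level intermediate language with symbolic predicates); here a predicate is a function from a state dictionary to a boolean, and the weakest precondition of an assignment `x := e` for post-condition `Q` is `Q` evaluated in the updated state:

```python
def wp_assign(var, expr, post):
    """wp(x := e, Q) = Q[x := e]: evaluate Q after the update."""
    return lambda s: post({**s, var: expr(s)})

def wp(program, post):
    """Propagate the post-condition backwards, instruction by
    instruction, through a list of (var, expr) assignments."""
    for var, expr in reversed(program):
        post = wp_assign(var, expr, post)
    return post

# Program: y := x + 1; z := y * 2      Post-condition: z == 8
prog = [("y", lambda s: s["x"] + 1),
        ("z", lambda s: s["y"] * 2)]
pre = wp(prog, lambda s: s["z"] == 8)

assert pre({"x": 3})        # x == 3 suffices: z = (3 + 1) * 2 == 8
assert not pre({"x": 5})
```

Any pre-condition P that implies the computed predicate is then a sufficient pre-condition for the post-condition to hold.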
A tool that implements this method for binary programs is BAP (Binary Analysis Platform). BAP defines an intermediate language (BIL) abstracting from any specific architecture, and provides tools for the formal analysis of such programs. The given code C thus has to be translated from ARMv8 machine code into a BIL program. To this end we have developed a so-called lifter based on the ARMv8 hardware model in HOL4. After translating C into BIL we can run the analysis to generate the weakest precondition for Q. Then a SAT solver is used to automatically show that P implies the weakest precondition and we obtain a proven theorem in HOL4.
Boot and Initialization
In this phase only one core is executing with interrupts disabled, so there is no need to consider concurrency or preemption issues. However, a large portion of the code configures peripherals such as the DRAM controller. No models of the peripherals are provided or created; hence their functional correctness is not verified. Instead, it is only verified that the peripherals do not break confidentiality of information.
The code verification focuses mainly on the setup of page tables and the copying of data to specified memory locations. Some manual code verification is needed because the boot and hypervisor initialization code set up their own page tables and activate the MMU. This breaks some of the verification conditions guaranteeing sequential consistency, as the code changes the execution context that is assumed by the automated verification with BAP. However, once the new execution context is established, automated code verification can continue in the new context.
Hypervisor Handler Execution
The hypervisor handler code is typically executed concurrently on multiple cores. There are shared resources, such as the interrupt controller, which are accessed concurrently. Shared resources are protected by locks which implement exclusive access to them, i.e. a lock guards the entrance to a so called critical section so that only one handler at a time can manipulate the protected resource. If the locking primitives (acquire, release) indeed guarantee exclusive access, then one can integrate them and the critical sections into the sequential code verification methodology.
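A minimal model of such a lock-guarded critical section, sketched in Python (the shared counter is an invented stand-in for, say, interrupt controller state updated by handlers on several cores):

```python
import threading

lock = threading.Lock()
shared = {"count": 0}

def handler(n):
    for _ in range(n):
        with lock:                  # acquire: only one handler may be
            shared["count"] += 1    # inside the critical section at a
                                    # time; release on block exit

threads = [threading.Thread(target=handler, args=(10000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert shared["count"] == 40000     # no lost updates under the lock
```

Without the lock, concurrent read-modify-write sequences could interleave and lose updates; with it, the sequential verification methodology applies to each critical section in isolation.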
In addition to the functional correctness of the handlers one must also consider information flow properties. Execution of a handler for one guest should not be visible to others and the caller should not be able to leak information about other guests through the hypercall. In a weakly consistent memory model, where the shared caches become visible, proving the non-interference for the handlers becomes more complex and we omit a deeper discussion here.
The efficiency of the HASPOC platform has been compared to executing guests without virtualization, using an array of different benchmarking tools. Ubuntu Server 15.04 was used as the guest operating system.
Core CPU performance and memory bandwidth, including cache usage, showed negligible performance decrease while network bandwidth decreased by around 14%. Guest context switching suffered a constant penalty. These issues are probably due to limitations of the interrupt controller used on the test hardware board, requiring the hypervisor to route interrupts.
On a platform where multiple independent guests share resources, it is vital that guests with requirements on, for example, latency and/or throughput can have those requirements met under all circumstances. For this to be the case, there must be mechanisms in place to regulate the utilization of shared platform components. The study of these aspects is currently ongoing as an extension to the original HASPOC project.