Time for memory disaggregation?
It is widely known that memory utilization in datacenters is low: servers are overprovisioned with memory to cope with worst-case scenarios. What if we could reduce the amount of unused memory by sharing it between servers?
Memory disaggregation decouples memory from compute servers and enables applications to access both local and remote memory. It still has challenges to address: how the memory is accessed, how the increased latency impacts performance, and how memory is shared between servers, to name a few.
Current technologies do not allow full memory disaggregation, mainly because processors need extremely fast access to memory. However, we envision that servers will retain a certain percentage of local memory for performance reasons and use disaggregated memory as an additional memory tier. Local DRAM can then act as a cache or fast temporary memory.
This partial disaggregation is in line with previous studies showing that certain cloud applications, such as memcached or Spark, require only a small fraction of local memory to maintain the same performance. The latency that the fabric adds to memory accesses can therefore be compensated for by application design. But redesigning applications would not be enough without recent data-center network improvements: 10/40/100 Gbps networks now allow high-bandwidth, low-latency interconnection of servers. These two aspects are key for a memory disaggregation system.
Creating a fully functional system is one part of the equation; understanding its cost impact is another. For disaggregated memory to have a cost advantage, the additional cost of the interconnect must be low. By multiplexing the unused memory between servers to accommodate peak usage, we can reduce the total amount of memory at the rack or data-center level. This reduction is directly related to overall utilization: if average utilization is low, there is no need to provision all the memory. The CAPEX saving is therefore the amount saved on purchasing memory minus the cost of the additional infrastructure required to support memory disaggregation. The graph below illustrates the cost savings (Y axis) depending on memory utilization (X axis) and on the cost of the infrastructure relative to the total cost of the memory without disaggregation (20%-80%). Given that average memory utilization is low (below 50%, as pointed out, for example, in the paper Analysis of server utilization and resource consumption of the Google data centre trace log), there is a cost incentive for data centers that run servers with less than 80% memory utilization, which is usually the case.
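The CAPEX argument can be expressed as a back-of-the-envelope model. This is a deliberately simplified sketch: it assumes memory can be provisioned for average utilization (perfect multiplexing, no headroom) and that the disaggregation infrastructure costs a fixed fraction of the non-disaggregated memory cost:

```python
def capex_saving_fraction(avg_utilization, infra_cost_ratio):
    """Fraction of the original memory CAPEX saved by disaggregation.

    avg_utilization:  average fraction of memory actually used (0..1),
                      assumed to be what we can provision for.
    infra_cost_ratio: cost of the disaggregation infrastructure as a
                      fraction of the non-disaggregated memory cost.
    """
    memory_saved = 1.0 - avg_utilization  # memory we no longer purchase
    return memory_saved - infra_cost_ratio

for u in (0.3, 0.5, 0.8):
    for infra in (0.2, 0.5):
        s = capex_saving_fraction(u, infra)
        print(f"utilization {u:.0%}, infra {infra:.0%}: saving {s:+.0%}")
```

Under these assumptions, saving is positive whenever utilization plus the relative infrastructure cost stays below 100%, which matches the 80%-utilization break-even point mentioned above for a 20% infrastructure cost.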
We have been studying how application performance is affected by memory disaggregation. Before an application is moved to such a scenario, it should be properly evaluated. Moreover, native applications that have a notion of local and remote memory could leverage memory disaggregation in order to scale.
Another open question is how the infrastructure is exposed. Should the application be aware of the location of the memory? Or should the operating system hide the complexity and make it transparent to the application? Each approach probably suits different use cases. We intend to publish more data from our studies in the near future.
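The application-aware option could look like a two-tier allocation interface, sketched below with hypothetical names (the class and its policy are illustrations, not an API from our system); a transparent design would instead hide the same choice inside the OS, e.g. by paging cold pages out to remote memory:

```python
class TieredAllocator:
    """Toy sketch of an application-aware two-tier memory allocator."""

    def __init__(self, local_bytes, remote_bytes):
        self.free = {"local": local_bytes, "remote": remote_bytes}

    def alloc(self, size, tier="local"):
        """Reserve capacity in a tier; returns the tier actually used.

        Simple illustrative policy: spill to remote memory when local
        DRAM is exhausted."""
        if tier == "local" and self.free[tier] < size:
            tier = "remote"
        if self.free[tier] < size:
            raise MemoryError("no capacity in any tier")
        self.free[tier] -= size
        return tier

alloc = TieredAllocator(local_bytes=4 << 30, remote_bytes=32 << 30)
print(alloc.alloc(3 << 30))  # hot data fits in local DRAM -> "local"
print(alloc.alloc(2 << 30))  # local exhausted, spills -> "remote"
```

An explicit interface like this lets the application keep hot structures local, at the price of changing its code; transparency trades that control for compatibility with unmodified applications.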
Coming back to the initial question: is it time for memory disaggregation? Full memory disaggregation is not likely to happen soon, but partial disaggregation can surely become a reality. With the right mix of local and remote memory and with the latency of current-generation interconnect technologies, memory disaggregation is possible today.