Authors
Mikhail Asiatici
Publication date
2021
Issue
8050
Publisher
EPFL
Description
Even though Dennard scaling came to an end fifteen years ago, Moore's law kept fueling an exponential growth in compute performance through increased parallelization. However, the performance of memory and, in particular, Dynamic Random Access Memory (DRAM), has been increasing at a slower pace for decades, making memory system optimization increasingly crucial. Conventional solutions mitigate the issue by shifting as many memory accesses as possible from off-chip DRAM to on-chip Static Random-Access Memory (SRAM), which has higher performance but lower capacity. This is achieved by relying on spatial and temporal locality or on precise compile-time information about the access pattern. However, when the access pattern is irregular and data-dependent, these solutions are ineffective, and the processor-memory gap grows even wider as DRAMs themselves are optimized for sequential accesses. In this thesis, we present a novel memory system for throughput-oriented compute engines that perform irregular read accesses to DRAM. When accesses are irregular, we acknowledge that obtaining a reasonable benefit from on-chip memory may be unrealistic; therefore, we focus on minimizing stalls and on reusing each memory response to serve as many misses as possible without relying on long-term data storage. This is the same insight behind nonblocking caches, but on a vastly larger scale in terms of outstanding misses, which greatly increases the opportunities for data reuse when accelerators emit a large number of outstanding reads. Because we optimize miss handling rather than increasing hit rate, we call our architecture miss …
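
The central mechanism the abstract alludes to, merging outstanding misses to the same memory line so that a single DRAM response serves many requests, is the one used by nonblocking caches through miss status holding registers (MSHRs). The following is a minimal, hypothetical C++ sketch of that merging idea, not the architecture proposed in the thesis; all names (MissHandler, handle_request, kLineBytes, and so on) are illustrative assumptions.

#include <cstdint>
#include <functional>
#include <iostream>
#include <queue>
#include <unordered_map>
#include <vector>

// Sketch of miss merging: each in-flight miss to a memory line is recorded
// once, and every request waiting on that line is replayed when the single
// DRAM response arrives.
using Addr = std::uint64_t;
constexpr Addr kLineBytes = 64;  // assumed line / burst size

struct Request {
    Addr addr;   // byte address requested by the accelerator
    int  tag;    // identifier used to route the response back
};

class MissHandler {
public:
    // Returns true if the request triggered a new DRAM read,
    // false if it was merged with an already outstanding miss.
    bool handle_request(const Request& req) {
        Addr line = req.addr / kLineBytes;
        auto it = mshr_.find(line);
        if (it != mshr_.end()) {
            it->second.push_back(req);   // merge: no extra DRAM traffic
            return false;
        }
        mshr_[line] = {req};             // allocate a new entry for this line
        dram_queue_.push(line);          // issue exactly one read for the line
        return true;
    }

    // Called when DRAM returns one line: every request merged into the
    // entry is served by this single response, then the entry is freed
    // immediately (no long-term on-chip data storage).
    void handle_response(Addr line,
                         const std::function<void(const Request&)>& deliver) {
        auto it = mshr_.find(line);
        if (it == mshr_.end()) return;
        for (const Request& req : it->second) deliver(req);
        mshr_.erase(it);
    }

    std::queue<Addr> dram_queue_;        // lines with reads in flight

private:
    std::unordered_map<Addr, std::vector<Request>> mshr_;
};

int main() {
    MissHandler mh;
    // Three irregular reads; two fall on the same 64-byte line and are merged.
    mh.handle_request({0x1008, 0});
    mh.handle_request({0x1030, 1});      // same line as tag 0: merged
    mh.handle_request({0x8000, 2});      // different line: second DRAM read

    while (!mh.dram_queue_.empty()) {
        Addr line = mh.dram_queue_.front();
        mh.dram_queue_.pop();
        mh.handle_response(line, [](const Request& r) {
            std::cout << "served tag " << r.tag << "\n";
        });
    }
    return 0;
}

In this toy run, three accelerator reads generate only two DRAM reads, and the response for the shared line serves two requests at once; the thesis applies the same reuse idea at a much larger number of outstanding misses.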