Authors
Mikhail Asiatici, Paolo Ienne
Publication date
2021/6/14
Conference
2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)
Pages
609-622
Publisher
IEEE
Description
Efficient large-scale graph processing is crucial to many disciplines. Yet, while graph algorithms naturally expose massive parallelism opportunities, their performance is limited by the memory system because of irregular memory accesses. State-of-the-art FPGA graph processors, such as ForeGraph and FabGraph, address the memory issues by using scratchpads and regularly streaming edges from DRAM, but then they end up wasting bandwidth on unneeded data. Yet, where classic caches and scratchpads fail to deliver, FPGAs make powerful unorthodox solutions possible. In this paper, we resort to extreme nonblocking caches that handle tens of thousands of outstanding read misses. They significantly increase the ability of memory systems to coalesce multiple accelerator accesses into fewer DRAM memory requests; essentially, when latency is not the primary concern, they bring the advantages expected …
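The core mechanism sketched in the description, coalescing tens of thousands of outstanding read misses so that many accelerator accesses to the same cache line generate a single DRAM request, can be illustrated with a small conceptual model. The snippet below is only a software sketch of MSHR-style (miss status holding register) coalescing under assumed parameters; it is not the authors' hardware design, and the names (NonblockingCache, dram_requests, LINE_BYTES) are hypothetical.

```python
from collections import defaultdict

LINE_BYTES = 64  # assumed cache-line size


class NonblockingCache:
    """Conceptual model of MSHR-style miss coalescing: many outstanding
    read misses to the same line produce a single DRAM request."""

    def __init__(self):
        self.lines = {}                  # resolved lines held on chip
        self.mshrs = defaultdict(list)   # pending line -> waiting request ids
        self.dram_requests = 0           # DRAM requests actually issued

    def read(self, req_id, addr):
        line = addr // LINE_BYTES
        if line in self.lines:
            return                       # hit: data already available
        if line not in self.mshrs:
            self.dram_requests += 1      # primary miss: issue one DRAM request
        self.mshrs[line].append(req_id)  # secondary misses merge into the MSHR

    def dram_response(self, line, data):
        self.lines[line] = data          # fill the line
        waiters = self.mshrs.pop(line, [])
        return waiters                   # all merged requests complete together


# Irregular accesses typical of graph traversal: 10,000 reads touching
# only 100 distinct lines translate into 100 DRAM requests, not 10,000.
cache = NonblockingCache()
for i in range(10_000):
    cache.read(i, (i % 100) * LINE_BYTES)
print(cache.dram_requests)  # -> 100
```

In this toy model, throughput comes from merging misses rather than hiding latency, which mirrors the paper's premise that when latency is not the primary concern, very deep miss handling recovers much of the bandwidth that streaming or scratchpad designs waste on unneeded data.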
Scholar articles
M Asiatici, P Ienne - 2021 ACM/IEEE 48th Annual International Symposium …, 2021