Authors
Paolo Meloni, Gianfranco Deriu, Francesco Conti, Igor Loi, Luigi Raffo, Luca Benini
Publication date
2016/5/16
Book
Proceedings of the ACM International Conference on Computing Frontiers
Pages
376-383
Description
Convolutional Neural Networks (CNNs) have reached outstanding results in several complex visual recognition tasks, such as classification and scene parsing. CNNs are composed of multiple filtering layers that perform 2D convolutions over input images. The intrinsic parallelism in such a computation kernel makes it suitable to be effectively accelerated on parallel hardware. In this paper we propose a highly flexible and scalable architectural template for acceleration of CNNs on FPGA devices, based on the cooperation between a set of software cores and a parallel convolution engine that communicate via a tightly coupled L1 shared scratchpad. Our accelerator structure, tested on a Xilinx Zynq XC-Z7045 device, delivers peak performance up to 80 GMAC/s, corresponding to 100 MMAC/s for each DSP slice in the programmable fabric. Thanks to the flexible architecture, convolution operations can be scheduled …
Total citations
[Chart: citations per year, 2016-2024]
Scholar articles
P Meloni, G Deriu, F Conti, I Loi, L Raffo, L Benini - Proceedings of the ACM International Conference on …, 2016