Authors
Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, Olivier Temam
Publication date
2014/12/13
Conference
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
Pages
609-622
Publisher
IEEE
Description
Many companies are deploying services, for consumers or industry, that rely heavily on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computationally and memory intensive. A number of neural network accelerators have recently been proposed that offer a high computational capacity/area ratio but remain hampered by memory accesses. However, unlike the memory wall faced by processors on general-purpose workloads, the memory footprint of CNNs and DNNs, while large, is not beyond the capability of the on-chip storage of a multi-chip system. This property, combined with the CNN/DNN algorithmic characteristics, can lead to high internal bandwidth and low external communications …
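The abstract's key premise, that a full network's weights can fit in the aggregate on-chip storage of a multi-chip system, can be checked with back-of-envelope arithmetic. The sketch below uses illustrative numbers (layer sizes, node count, and per-node eDRAM capacity are assumptions, not figures from the paper):

```python
def dnn_weight_bytes(layer_dims, bytes_per_weight=2):
    """Weight storage of a fully connected stack: sum of in*out per layer."""
    total = 0
    for n_in, n_out in zip(layer_dims, layer_dims[1:]):
        total += n_in * n_out * bytes_per_weight
    return total

# Hypothetical large DNN: four 4096-wide hidden layers, 1000-way output,
# 16-bit weights (all assumed for illustration).
layers = [4096, 4096, 4096, 4096, 1000]
footprint = dnn_weight_bytes(layers)

# Hypothetical multi-chip system: 64 nodes with 32 MiB of eDRAM each.
on_chip_total = 64 * 32 * 2**20

print(f"weights: {footprint / 2**20:.1f} MiB")
print(f"fits on chip: {footprint <= on_chip_total}")
```

Under these assumed figures the weights come to roughly 100 MiB against 2 GiB of aggregate eDRAM, which is the sense in which the footprint is "large but not beyond" multi-chip on-chip storage.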
Total citations
Year:      2015  2016  2017  2018  2019  2020  2021  2022  2023  2024
Citations:   34   110   162   269   253   310   246   249   184    69