Authors
Karen Ullrich, Edward Meeds, Max Welling
Publication date
2017/2/13
Journal
arXiv preprint arXiv:1702.04008
Description
The success of deep learning in numerous application domains created the desire to run and train deep networks on mobile devices. This, however, conflicts with their computationally, memory-, and energy-intensive nature, leading to a growing interest in compression. Recent work by Han et al. (2015a) proposes a pipeline that involves retraining, pruning and quantization of neural network weights, obtaining state-of-the-art compression rates. In this paper, we show that competitive compression rates can be achieved by using a version of soft weight-sharing (Nowlan & Hinton, 1992). Our method achieves both quantization and pruning in one simple (re-)training procedure. This point of view also exposes the relation between compression and the minimum description length (MDL) principle.
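The soft weight-sharing idea in the abstract can be sketched as follows: a mixture-of-Gaussians prior over the weights is added as a penalty during retraining, and afterwards each weight is quantized to the mean of its most responsible mixture component (a zero-centered component with high mixing weight induces pruning). This is a minimal illustrative sketch, not the paper's implementation; the component count, mixture parameters, and helper names are assumptions.

```python
import numpy as np

def mixture_log_prior(w, pi, mu, sigma):
    """Per-weight log-density under a Gaussian mixture prior.

    Returns (log_prior, log_comp), where log_comp[i, k] is the
    joint log-probability of weight i under component k.
    """
    # shape: (num_weights, num_components)
    log_comp = (np.log(pi)
                - 0.5 * np.log(2.0 * np.pi * sigma ** 2)
                - (w[:, None] - mu) ** 2 / (2.0 * sigma ** 2))
    # log-sum-exp over components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    log_prior = (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()
    return log_prior, log_comp

def quantize(w, pi, mu, sigma):
    """Snap each weight to the mean of its most responsible component."""
    _, log_comp = mixture_log_prior(w, pi, mu, sigma)
    return mu[np.argmax(log_comp, axis=1)]

# Illustrative mixture: a high-mass zero "spike" (encourages pruning)
# plus two nonzero clusters. Values are assumptions for this sketch.
pi    = np.array([0.90, 0.05, 0.05])
mu    = np.array([0.0, -0.4, 0.4])
sigma = np.array([0.05, 0.05, 0.05])

w = np.array([0.01, -0.38, 0.42, -0.02, 0.0])

# During retraining, -mixture_log_prior(...).sum() would be added to the
# task loss; here we only show the final quantization/pruning step.
w_quant = quantize(w, pi, mu, sigma)
print(w_quant)  # small weights collapse to 0.0 (pruned), others to cluster means
```

After retraining, the surviving cluster means plus per-weight component indices form the compressed representation, which is where the connection to minimum description length arises.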
Total citations
2017: 19, 2018: 62, 2019: 68, 2020: 77, 2021: 86, 2022: 72, 2023: 63, 2024: 19
Scholar articles
K Ullrich, E Meeds, M Welling - arXiv preprint arXiv:1702.04008, 2017