

Rich feature hierarchies for accurate object detection and semantic segmentation

Authors

Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

Publication date

2014

Conference

Proceedings of the IEEE conference on computer vision and pattern recognition

Pages

580-587

Description

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also present experiments that provide insight into what the network learns, revealing a rich hierarchy of image features. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
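The first insight above, applying a classifier to each bottom-up region proposal, can be illustrated with a minimal Python sketch. It is not the authors' released code. The names `warp_region`, `iou`, `nms`, `detect`, and `mean_intensity` are hypothetical. The proposals are supplied by hand rather than by a proposal algorithm, and a one-number toy feature stands in for the CNN. The linear scoring followed by greedy non-maximum suppression (NMS) is an assumed way of turning per-region scores into detections. The sketch only shows the shape of the pipeline: warp each proposal to a fixed size, extract features, score, and suppress overlapping duplicates.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for this sketch (not from the R-CNN release).
Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2); x2 and y2 are exclusive
Image = List[List[float]]         # grayscale image stored as rows of pixels
Detection = Tuple[float, Box]     # (score, box)


def warp_region(image: Image, box: Box, size: int) -> Image:
    """Crop `box` and resize it to size x size with nearest-neighbour sampling,
    so every proposal yields a fixed-size input regardless of its shape."""
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    return [[image[y1 + (r * h) // size][x1 + (c * w) // size]
             for c in range(size)]
            for r in range(size)]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], threshold: float) -> List[Detection]:
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if its IoU with every already-kept box is at most `threshold`."""
    kept: List[Detection] = []
    for score, box in sorted(dets, key=lambda d: d[0], reverse=True):
        if all(iou(box, k) <= threshold for _, k in kept):
            kept.append((score, box))
    return kept


def detect(image: Image,
           proposals: Sequence[Box],
           extract_features: Callable[[Image], List[float]],
           weights: List[float],
           bias: float,
           size: int = 8,
           score_threshold: float = 0.0,
           nms_threshold: float = 0.3) -> List[Detection]:
    """Score every proposal with a linear classifier on features of its
    warped crop, then suppress overlapping duplicates."""
    scored: List[Detection] = []
    for box in proposals:
        feats = extract_features(warp_region(image, box, size))
        score = sum(w * f for w, f in zip(weights, feats)) + bias
        if score > score_threshold:
            scored.append((score, box))
    return nms(scored, nms_threshold)


def mean_intensity(patch: Image) -> List[float]:
    """Toy stand-in for the CNN: a single feature, the mean pixel value."""
    return [sum(map(sum, patch)) / (len(patch) * len(patch[0]))]


# Usage: a 10x10 image with a bright 4x4 square at rows/cols 2..5.
image = [[1.0 if 2 <= r < 6 and 2 <= c < 6 else 0.0 for c in range(10)]
         for r in range(10)]
proposals = [(2, 2, 6, 6), (2, 2, 7, 7), (6, 6, 10, 10)]
print(detect(image, proposals, mean_intensity, weights=[1.0], bias=-0.5))
# → [(0.5, (2, 2, 6, 6))]
```

Warping every proposal to the same size is what lets a fixed-input network score boxes of arbitrary shape. In this toy run the background box scores below the threshold, and the looser box overlapping the tight one (IoU 0.64) is suppressed, leaving a single detection.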

Total citations

Cited by 38180

Citations per year

2014: 239
2015: 895
2016: 1477
2017: 2217
2018: 3195
2019: 4269
2020: 4648
2021: 5397
2022: 5560
2023: 5617
2024: 3011
