Authors
Xing Wei, Yue Zhang, Yihong Gong, Nanning Zheng
Publication date
2018
Conference
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Pages
1867-1875
Description
Representing local image patches in an invariant and discriminative manner is an active research topic in computer vision. It has recently been demonstrated that local feature learning based on deep Convolutional Neural Networks (CNNs) can significantly improve matching performance. Previous work on learning such descriptors has focused on developing various loss functions, regularizations, and data mining strategies to learn discriminative CNN representations. Such methods, however, offer little analysis of how to increase the geometric invariance of the generated descriptors. In this paper, we propose a descriptor that is both highly invariant and highly discriminative. These properties come from a novel pooling method, dubbed Subspace Pooling (SP), which is invariant to a range of geometric deformations. To further increase the discriminative power of our descriptor, we propose a simple distance kernel integrated into the marginal triplet loss that helps focus on hard examples in CNN training. Finally, we show that by combining SP with the projection distance metric, the generated feature descriptor is equivalent to that of the Bilinear CNN model, but outperforms the latter with much lower memory and computation cost. The proposed method is simple, easy to understand, and achieves good performance. Experimental results on several patch matching benchmarks show that our method significantly outperforms the state of the art.
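The following is a minimal NumPy sketch of the ideas named in the description, not the authors' implementation. It assumes that a feature map is flattened to a channels-by-locations matrix, that pooling keeps the top-k left singular vectors of that matrix, and that descriptors are compared with the squared projection distance between subspaces. The function names, the matrix layout, and the sizes (8 channels, 16 locations, k = 3) are illustrative assumptions; the description itself does not specify them.

```python
import numpy as np


def subspace_pool(features, k):
    """Orthonormal basis (C x k) of the top-k left singular subspace.

    `features` is assumed to be C x N: one column per spatial location.
    """
    u, _, _ = np.linalg.svd(features, full_matrices=False)
    return u[:, :k]


def projection_distance_sq(u1, u2):
    """Squared projection distance ||U1 U1^T - U2 U2^T||_F^2.

    Uses the identity 2k - 2 ||U1^T U2||_F^2, valid when both bases have
    k orthonormal columns, so only a k x k matrix is ever formed.
    """
    k = u1.shape[1]
    return 2.0 * k - 2.0 * np.linalg.norm(u1.T @ u2, "fro") ** 2


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16))    # hypothetical flattened feature map
other = rng.standard_normal((8, 16))   # an unrelated feature map
perm = rng.permutation(16)             # reorders the spatial locations

u_a = subspace_pool(feat, k=3)
u_b = subspace_pool(feat[:, perm], k=3)
u_c = subspace_pool(other, k=3)

# Reordering locations leaves the pooled subspace unchanged (up to rounding).
d_same = projection_distance_sq(u_a, u_b)
# Unrelated features give a clearly nonzero distance.
d_other = projection_distance_sq(u_a, u_c)
# Same value computed from the full C x C outer-product (bilinear-style) matrices.
d_full = np.linalg.norm(u_a @ u_a.T - u_c @ u_c.T, "fro") ** 2
```

Here `d_full` agrees with `d_other` while requiring C x C matrices instead of a k x k one, which is one way to read the claim of equivalence to a bilinear representation at lower memory and computation cost. The permutation check covers only one simple kind of spatial rearrangement, not the full range of geometric deformations the description refers to.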
Total citations
[Chart: citations per year, 2018 to 2024]