Authors
Zhiheng Ma, Xiaopeng Hong, Xing Wei, Yunfeng Qiu, Yihong Gong
Publication date
2021
Conference
Proceedings of the IEEE/CVF International Conference on Computer Vision
Pages
3205-3214
Description
This paper proposes to handle the practical problem of learning a universal model for crowd counting across scenes and datasets. We dissect that the crux of this problem is the catastrophic sensitivity of crowd counters to scale shift, which is very common in the real world and caused by factors such as different scene layouts and image resolutions. Therefore it is difficult to train a universal model that can be applied to various scenes. To address this problem, we propose scale alignment as a prime module for establishing a novel crowd counting framework. We derive a closed-form solution to get the optimal image rescaling factors for alignment by minimizing the distances between their scale distributions. A novel neural network together with a loss function based on an efficient sliced Wasserstein distance is also proposed for scale distribution estimation. Benefiting from the proposed method, we have learned a universal model that generally works well on several datasets where can even outperform state-of-the-art models that are particularly fine-tuned for each dataset significantly. Experiments also demonstrate the much better generalizability of our model to unseen scenes.
Total citations
20212022202320242172010
Scholar articles
Z Ma, X Hong, X Wei, Y Qiu, Y Gong - Proceedings of the IEEE/CVF International Conference …, 2021