Authors
Xinhua Cheng, Mengxi Jia, Qian Wang, Jian Zhang
Publication date
2022/5/26
Journal
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
Publisher
IEEE
Description
Pedestrian attribute recognition (PAR), which aims to identify attributes of pedestrians captured in video surveillance, is a challenging task due to poor image quality and the diverse spatial distribution of attributes. Existing methods usually model PAR as a multi-label classification problem and manually map attributes to an ordered list corresponding to the outputs of classifiers or sequential models. However, the inherent textual information in attribute annotations is largely neglected by these visual-only methods. In this paper, we alleviate this issue by proposing a novel visual-textual baseline (VTB) for PAR, which introduces an additional textual modality to explore the textual semantic correlations among attribute annotations using pre-trained textual encoders instead of human definitions. VTB encodes pedestrian images and attribute annotations into visual and textual features respectively, interacts …
Total citations
[Chart: citations per year, 2022 to 2024]
Scholar articles
X Cheng, M Jia, Q Wang, J Zhang - IEEE Transactions on Circuits and Systems for Video …, 2022