Wenhai Wang (王文海)
CUHK | Shanghai AI Laboratory | NJU
Verified email at cuhk.edu.hk
Title
Cited by
Year
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Y Liu, Y Cao, Z Gao, W Wang, Z Chen, W Wang, H Tian, L Lu, X Zhu, T Lu, ...
arXiv preprint arXiv:2407.15838, 2024
2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
P Zhang, X Dong, Y Zang, Y Cao, R Qian, L Chen, Q Guo, H Duan, ...
arXiv preprint arXiv:2407.03320, 2024
2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Q Li, Z Chen, W Wang, W Wang, S Ye, Z Jin, G Chen, Y He, Z Gao, E Cui, ...
arXiv preprint arXiv:2406.08418, 2024
2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
J Wu, M Zhong, S Xing, Z Lai, Z Liu, W Wang, Z Chen, X Zhu, L Lu, T Lu, ...
arXiv preprint arXiv:2406.08394, 2024
Cited by 1, 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
C Yang, X Zhu, J Zhu, W Su, J Wang, X Dong, W Wang, L Lu, B Li, J Zhou, ...
arXiv preprint arXiv:2406.07543, 2024
2024
Needle In A Multimodal Haystack
W Wang, S Zhang, Y Ren, Y Duan, T Li, S Liu, M Hu, Z Chen, K Zhang, ...
arXiv preprint arXiv:2406.07230, 2024
2024
LLMs Meet Multimodal Generation and Editing: A Survey
Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu, R Yuan, Y Xing, W Wang, ...
arXiv preprint arXiv:2405.19334, 2024
Cited by 3, 2024
VLG: General Video Recognition with Web Textual Knowledge
J Lin, Z Liu, W Wang, W Wu, L Wang
International Journal of Computer Vision, 1-26, 2024
Cited by 1, 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui, W Tong, K Hu, J Luo, Z Ma, ...
arXiv preprint arXiv:2404.16821, 2024
Cited by 54, 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
X Dong, P Zhang, Y Zang, Y Cao, B Wang, L Ouyang, S Zhang, H Duan, ...
arXiv preprint arXiv:2404.06512, 2024
Cited by 36, 2024
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments
Y Yang, W Wang, Z Chen, J Dai, L Zheng
International Conference on Learning Representations (ICLR), 2024
2024
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Y Duan, W Wang, Z Chen, X Zhu, L Lu, T Lu, Y Qiao, H Li, J Dai, W Wang
arXiv preprint arXiv:2403.02308, 2024
Cited by 16, 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
W Wang, Y Ren, H Luo, T Li, C Yan, Z Chen, W Wang, Q Li, L Lu, X Zhu, ...
arXiv preprint arXiv:2402.19474, 2024
Cited by 12, 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
Y Mu, J Chen, Q Zhang, S Chen, Q Yu, C Ge, R Chen, Z Liang, M Hu, ...
arXiv preprint arXiv:2402.16117, 2024
Cited by 4, 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-Modal Feature Synchronizer
C Tian, X Zhu, Y Xiong, W Wang, Z Chen, W Wang, Y Chen, L Lu, T Lu, ...
arXiv preprint arXiv:2401.10208, 2024
Cited by 21, 2024
Feature Selection Based on Intrusive Outliers Rather Than All Instances
L Yuan, C Mei, W Wang, T Lu
IEEE Transactions on Image Processing (TIP), 2024
2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
Y Xiong, Z Li, Y Chen, F Wang, X Zhu, J Luo, W Wang, T Lu, H Li, Y Qiao, ...
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Cited by 15, 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Z Chen, J Wu, W Wang, W Su, G Chen, S Xing, Z Muyan, Q Zhang, X Zhu, ...
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Cited by 124*, 2024
AVSegFormer: Audio-Visual Segmentation with Transformer
S Gao, Z Chen, G Chen, W Wang, T Lu
AAAI Conference on Artificial Intelligence (AAAI), 2024
Cited by 24, 2024
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
W Wang, M Shi, Q Li, W Wang, Z Huang, L Xing, Z Chen, H Li, X Zhu, ...
International Conference on Learning Representations (ICLR), 2024
Cited by 40, 2024