MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | H Liu, Z Zheng, Y Qiao, H Duan, Z Fei, F Zhou, W Zhang, S Zhang, D Lin, ... | arXiv preprint arXiv:2405.12209, 2024 | 5 | 2024 |
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models | H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu, X Dong, Y Zang, P Zhang, ... | arXiv preprint arXiv:2407.11691, 2024 | 1 | 2024 |
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs | Y Qiao, H Duan, X Fang, J Yang, L Chen, S Zhang, J Wang, D Lin, ... | arXiv preprint arXiv:2406.14544, 2024 | 1 | 2024 |