| Title | Authors | Venue | Citations | Year |
|---|---|---|---|---|
| GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher | Y Yuan, W Jiao, W Wang, J Huang, P He*, S Shi, Z Tu | ICLR 2024 | 89 | 2023 |
| All Languages Matter: On the Multilingual Safety of Large Language Models | W Wang, Z Tu, C Chen, Y Yuan, J Huang, W Jiao, MR Lyu | ACL 2024 Findings | 26 | 2023 |
| On the Humanity of Conversational AI: Evaluating the Psychological Portrayal of LLMs | J Huang, W Wang, EJ Li, MH Lam, S Ren, Y Yuan, W Jiao, Z Tu, MR Lyu | ICLR 2024 (Oral) | 26* | 2023 |
| How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments | J Huang, EJ Li, MH Lam, T Liang, W Wang, Y Yuan, W Jiao, X Wang, Z Tu, ... | arXiv preprint arXiv:2403.11807 | 11 | 2024 |
| A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models | Y Wan, W Wang, Y Yang, Y Yuan, J Huang, P He, W Jiao, MR Lyu | arXiv preprint arXiv:2401.00757 | 10 | 2024 |
| New Job, New Gender? Measuring the Social Bias in Image Generation Models | W Wang, H Bai, J Huang, Y Wan, Y Yuan, H Qiu, N Peng, MR Lyu | MM 2024 (Oral) | 2 | 2024 |
| The Earth Is Flat? Unveiling Factual Errors in Large Language Models | W Wang, J Shi, Z Tu, Y Yuan, J Huang, W Jiao, MR Lyu | arXiv preprint arXiv:2401.00761 | 2 | 2024 |
| Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training | Y Yuan, W Jiao, W Wang, J Huang, J Xu, T Liang, P He, Z Tu | arXiv preprint arXiv:2407.09121 | 1 | 2024 |
| On the Resilience of Multi-Agent Systems with Malicious Agents | J Huang, J Zhou, T Jin, X Zhou, Z Chen, W Wang, Y Yuan, M Sap, MR Lyu | arXiv preprint arXiv:2408.00989 | | 2024 |
| Does ChatGPT Know That It Does Not Know? Evaluating the Black-Box Calibration of ChatGPT | Y Yuan, W Wang, Q Guo, Y Xiong, C Shen, P He | COLING 2024 (Oral), 5191-5201 | | 2024 |