Wei Xiong
Verified email at illinois.edu - Homepage
Title · Cited by · Year
RAFT: Reward rAnked FineTuning for generative foundation model alignment
H Dong, W Xiong, D Goyal, Z Yihan, C Winnie, R Pan, S Diao, J Zhang, ...
TMLR, 2023
214 · 2023
Mitigating the Alignment Tax of RLHF
Y Lin, H Lin, W Xiong, S Diao, J Liu, J Zhang, R Pan, H Wang, W Hu, ...
arXiv preprint arXiv:2309.06256, 2023
58* · 2023
Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint
W Xiong, H Dong, C Ye, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
ICML 2024, 2023
53* · 2023
GEC: A unified framework for interactive decision making in MDP, POMDP, and beyond
H Zhong, W Xiong, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
arXiv preprint arXiv:2211.01962, 2022
53* · 2022
Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
ICLR 2023, 2022
44 · 2022
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
ICML 2022, 2022
43 · 2022
LMFlow: An extensible toolkit for finetuning and inference of large foundation models
S Diao, R Pan, H Dong, KS Shum, J Zhang, W Xiong, T Zhang
NAACL 2024, Best Demo Paper Award, 2023
41 · 2023
Decentralized multi-player multi-armed bandits with no collision information
C Shi, W Xiong, C Shen, J Yang
AISTATS 2020, 2020
41 · 2020
Maximize to explore: One objective function fusing estimation, planning, and exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
NeurIPS 2023, 2024
27* · 2024
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
W Xiong, H Zhong, C Shi, C Shen, T Zhang
ICML 2022, 2022
25 · 2022
Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
C Shi, W Xiong, C Shen, J Yang
NeurIPS 2021, 2021
24 · 2021
RLHF Workflow: From Reward Modeling to Online RLHF
H Dong, W Xiong, B Pang, H Wang, H Zhao, Y Zhou, N Jiang, D Sahoo, ...
TMLR, 2024
21 · 2024
Distributional reinforcement learning for multi-dimensional reward functions
P Zhang, X Chen, L Zhao, W Xiong, T Qin, TY Liu
NeurIPS 2021, 2021
20 · 2021
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
C Ye, W Xiong, Q Gu, T Zhang
ICML 2023, 2022
19 · 2022
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
H Wang, Y Lin, W Xiong, R Yang, S Diao, S Qiu, H Zhao, T Zhang
ACL 2024, 2024
17 · 2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
C Ye, W Xiong, Y Zhang, N Jiang, T Zhang
arXiv preprint arXiv:2402.07314, 2024
16* · 2024
PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction
H Ye, W Xiong, T Zhang
arXiv preprint arXiv:2012.15010, 2020
16 · 2020
DPO Meets PPO: Reinforced Token Optimization for RLHF
H Zhong, G Feng, W Xiong, L Zhao, D He, J Bian, L Wang
arXiv preprint arXiv:2404.18922, 2024
14 · 2024
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
H Wang, W Xiong, T Xie, H Zhao, T Zhang
arXiv preprint arXiv:2406.12845, 2024
8 · 2024
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
R Pi, T Han, W Xiong, J Zhang, R Liu, R Pan, T Zhang
ECCV 2024, 2024
7 · 2024
Articles 1–20