Wei Xiong
Verified email at illinois.edu - Homepage
Title · Cited by · Year
RAFT: Reward rAnked FineTuning for generative foundation model alignment
H Dong, W Xiong, D Goyal, Z Yihan, C Winnie, R Pan, S Diao, J Zhang, ...
TMLR, 2023
214 · 2023
Mitigating the Alignment Tax of RLHF
Y Lin, H Lin, W Xiong, S Diao, J Liu, J Zhang, R Pan, H Wang, W Hu, ...
arXiv preprint arXiv:2309.06256, 2023
58* · 2023
Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint
W Xiong, H Dong, C Ye, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
ICML 2024, 2023
53* · 2023
GEC: A unified framework for interactive decision making in MDP, POMDP, and beyond
H Zhong, W Xiong, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
arXiv preprint arXiv:2211.01962, 2022
53* · 2022
Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
ICLR 2023, 2022
44 · 2022
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
ICML 2022, 2022
43 · 2022
LMFlow: An extensible toolkit for finetuning and inference of large foundation models
S Diao, R Pan, H Dong, KS Shum, J Zhang, W Xiong, T Zhang
NAACL 2024, Best Demo Paper Award, 2023
41 · 2023
Decentralized multi-player multi-armed bandits with no collision information
C Shi, W Xiong, C Shen, J Yang
AISTATS 2020, 2020
41 · 2020
Maximize to explore: One objective function fusing estimation, planning, and exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
NeurIPS 2023, 2024
27* · 2024
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
W Xiong, H Zhong, C Shi, C Shen, T Zhang
ICML 2022, 2022
25 · 2022
Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
C Shi, W Xiong, C Shen, J Yang
NeurIPS 2021, 2021
24 · 2021
RLHF Workflow: From Reward Modeling to Online RLHF
H Dong, W Xiong, B Pang, H Wang, H Zhao, Y Zhou, N Jiang, D Sahoo, ...
TMLR, 2024
21 · 2024
Distributional reinforcement learning for multi-dimensional reward functions
P Zhang, X Chen, L Zhao, W Xiong, T Qin, TY Liu
NeurIPS 2021, 2021
20 · 2021
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
C Ye, W Xiong, Q Gu, T Zhang
ICML 2023, 2022
19 · 2022
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
H Wang, Y Lin, W Xiong, R Yang, S Diao, S Qiu, H Zhao, T Zhang
ACL 2024, 2024
17 · 2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
C Ye, W Xiong, Y Zhang, N Jiang, T Zhang
arXiv preprint arXiv:2402.07314, 2024
16* · 2024
PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction
H Ye, W Xiong, T Zhang
arXiv preprint arXiv:2012.15010, 2020
16 · 2020
DPO Meets PPO: Reinforced Token Optimization for RLHF
H Zhong, G Feng, W Xiong, L Zhao, D He, J Bian, L Wang
arXiv preprint arXiv:2404.18922, 2024
14 · 2024
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
H Wang, W Xiong, T Xie, H Zhao, T Zhang
arXiv preprint arXiv:2406.12845, 2024
8 · 2024
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
R Pi, T Han, W Xiong, J Zhang, R Liu, R Pan, T Zhang
ECCV 2024, 2024
7 · 2024
Articles 1–20