Regret bounds for risk-sensitive reinforcement learning | O Bastani, JY Ma, E Shen, W Xu | Advances in Neural Information Processing Systems 35, 36259-36269, 2022 | 17 | 2022 |
Uniformly conservative exploration in reinforcement learning | W Xu, Y Ma, K Xu, H Bastani, O Bastani | International Conference on Artificial Intelligence and Statistics, 10856-10870, 2023 | 16* | 2023 |
Gaps of summands of the Zeckendorf lattice | N Borade, D Cai, DZ Chang, B Fang, A Liang, SJ Miller, W Xu | arXiv preprint arXiv:1909.01935, 2019 | 8 | 2019 |
Shattering the agent-environment interface for fine-tuning inclusive language models | W Xu, S Dong, D Arumugam, B Van Roy | arXiv preprint arXiv:2305.11455, 2023 | 6 | 2023 |
Distribution of eigenvalues of matrix ensembles arising from Wigner and palindromic Toeplitz blocks | K Blackwell, N Borade, A Bose, CD VI, N Luntzlara, R Ma, SJ Miller, ... | arXiv preprint arXiv:2102.05839, 2021 | 4 | 2021 |
Pearl: A Production-ready Reinforcement Learning Agent | Z Zhu, RS Braz, J Bhandari, D Jiang, Y Wan, Y Efroni, L Wang, R Xu, ... | arXiv preprint arXiv:2312.03814, 2023 | 2 | 2023 |
Distribution of eigenvalues of random real symmetric block matrices | K Blackwell, N Borade, CD VI, N Luntzlara, R Ma, SJ Miller, M Wang, ... | arXiv preprint arXiv:1908.03834, 2019 | 1 | 2019 |
Exploration Unbound | D Arumugam, W Xu, B Van Roy | arXiv preprint arXiv:2407.12178, 2024 | | 2024 |
RLHF and IIA: Perverse Incentives | W Xu, S Dong, X Lu, G Lam, Z Wen, B Van Roy | ICML 2024 Workshop on Models of Human Feedback for AI Alignment, 2023 | | 2023 |
Posterior Sampling for Continuing Environments | W Xu, S Dong, B Van Roy | arXiv preprint arXiv:2211.15931, 2022 | | 2022 |