Categorical reparameterization with Gumbel-Softmax E Jang, S Gu, B Poole arXiv preprint arXiv:1611.01144, 2016 | 2965 | 2016 |
Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates S Gu, E Holly, T Lillicrap, S Levine 2017 IEEE International Conference on Robotics and Automation (ICRA), 3389-3396, 2017 | 1215 | 2017 |
Continuous deep Q-learning with model-based acceleration S Gu, T Lillicrap, I Sutskever, S Levine International Conference on Machine Learning, 2829-2838, 2016 | 934 | 2016 |
Towards deep neural network architectures robust to adversarial examples S Gu, L Rigazio arXiv preprint arXiv:1412.5068, 2014 | 722 | 2014 |
Data-efficient hierarchical reinforcement learning O Nachum, SS Gu, H Lee, S Levine Advances in Neural Information Processing Systems 31, 2018 | 472 | 2018 |
Q-Prop: Sample-efficient policy gradient with an off-policy critic S Gu, T Lillicrap, Z Ghahramani, RE Turner, S Levine arXiv preprint arXiv:1611.02247, 2016 | 305 | 2016 |
Sequence tutor: Conservative fine-tuning of sequence generation models with KL-control N Jaques, S Gu, D Bahdanau, JM Hernández-Lobato, RE Turner, D Eck International Conference on Machine Learning, 1645-1654, 2017 | 221* | 2017 |
Temporal difference models: Model-free deep RL for model-based control V Pong, S Gu, M Dalal, S Levine arXiv preprint arXiv:1802.09081, 2018 | 192 | 2018 |
Dynamics-aware unsupervised discovery of skills A Sharma, S Gu, S Levine, V Kumar, K Hausman arXiv preprint arXiv:1907.01657, 2019 | 168 | 2019 |
Human-centric dialog training via offline reinforcement learning N Jaques, JH Shen, A Ghandeharioun, C Ferguson, A Lapedriza, ... arXiv preprint arXiv:2010.05848, 2020 | 151* | 2020 |
Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning SS Gu, T Lillicrap, RE Turner, Z Ghahramani, B Schölkopf, S Levine Advances in Neural Information Processing Systems 30, 2017 | 147 | 2017 |
MuProp: Unbiased backpropagation for stochastic neural networks S Gu, S Levine, I Sutskever, A Mnih arXiv preprint arXiv:1511.05176, 2015 | 131 | 2015 |
Neural adaptive sequential Monte Carlo SS Gu, Z Ghahramani, RE Turner Advances in Neural Information Processing Systems 28, 2015 | 126 | 2015 |
Near-optimal representation learning for hierarchical reinforcement learning O Nachum, S Gu, H Lee, S Levine arXiv preprint arXiv:1810.01257, 2018 | 122 | 2018 |
A divergence minimization perspective on imitation learning methods SKS Ghasemipour, R Zemel, S Gu Conference on Robot Learning, 1259-1277, 2020 | 113 | 2020 |
Leave no trace: Learning to reset for safe and autonomous reinforcement learning B Eysenbach, S Gu, J Ibarz, S Levine arXiv preprint arXiv:1711.06782, 2017 | 97 | 2017 |
The mirage of action-dependent baselines in reinforcement learning G Tucker, S Bhupatiraju, S Gu, R Turner, Z Ghahramani, S Levine International Conference on Machine Learning, 5015-5024, 2018 | 93 | 2018 |
Language as an abstraction for hierarchical deep reinforcement learning Y Jiang, SS Gu, KP Murphy, C Finn Advances in Neural Information Processing Systems 32, 2019 | 91 | 2019 |