Attention is all you need A Vaswani arXiv preprint arXiv:1706.03762, 2017 | 142687 | 2017 |
Relational inductive biases, deep learning, and graph networks PW Battaglia, JB Hamrick, V Bapst, A Sanchez-Gonzalez, V Zambaldi, ... arXiv preprint arXiv:1806.01261, 2018 | 3817 | 2018 |
Self-attention with relative position representations P Shaw, J Uszkoreit, A Vaswani arXiv preprint arXiv:1803.02155, 2018 | 2628 | 2018 |
Image transformer N Parmar, A Vaswani, J Uszkoreit, L Kaiser, N Shazeer, A Ku, D Tran International conference on machine learning, 4055-4064, 2018 | 2001 | 2018 |
Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... Advances in neural information processing systems, 2017 | 1911 | 2017 |
Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... arXiv preprint arXiv:1706.03762, 2017 | 1459 | 2017 |
Attention augmented convolutional networks I Bello, B Zoph, A Vaswani, J Shlens, QV Le Proceedings of the IEEE/CVF international conference on computer vision …, 2019 | 1348 | 2019 |
Stand-alone self-attention in vision models P Ramachandran, N Parmar, A Vaswani, I Bello, A Levskaya, J Shlens Advances in neural information processing systems 32, 2019 | 1330 | 2019 |
Bottleneck transformers for visual recognition A Srinivas, TY Lin, N Parmar, J Shlens, P Abbeel, A Vaswani Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021 | 1224 | 2021 |
Music transformer CZA Huang, A Vaswani, J Uszkoreit, N Shazeer, I Simon, C Hawthorne, ... arXiv preprint arXiv:1809.04281, 2018 | 926 | 2018 |
Tensor2tensor for neural machine translation A Vaswani, S Bengio, E Brevdo, F Chollet, AN Gomez, S Gouws, L Jones, ... arXiv preprint arXiv:1803.07416, 2018 | 637 | 2018 |
Efficient content-based sparse attention with routing transformers A Roy, M Saffar, A Vaswani, D Grangier Transactions of the Association for Computational Linguistics 9, 53-68, 2021 | 555 | 2021 |
Scaling local self-attention for parameter efficient visual backbones A Vaswani, P Ramachandran, A Srinivas, N Parmar, B Hechtman, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 453 | 2021 |
Mesh-tensorflow: Deep learning for supercomputers N Shazeer, Y Cheng, N Parmar, D Tran, A Vaswani, P Koanantakool, ... Advances in neural information processing systems 31, 2018 | 410 | 2018 |
Learning whom to trust with MACE D Hovy, T Berg-Kirkpatrick, A Vaswani, E Hovy Proceedings of the 2013 Conference of the North American Chapter of the …, 2013 | 408 | 2013 |
One model to learn them all L Kaiser, AN Gomez, N Shazeer, A Vaswani, N Parmar, L Jones, ... arXiv preprint arXiv:1706.05137, 2017 | 396 | 2017 |
Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... CoRR abs/1706.03762, 2017 | 303 | 2017 |
Decoding with large-scale neural language models improves translation A Vaswani, Y Zhao, V Fossum, D Chiang Proceedings of the 2013 conference on empirical methods in natural language …, 2013 | 300 | 2013 |
Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... Proceedings of the 31st international conference on neural information processing systems, Curran Associates Inc., Red Hook, NY, USA, 2017 | 244 | 2017 |
Relational inductive biases, deep learning, and graph networks PW Battaglia, JB Hamrick, V Bapst, A Sanchez-Gonzalez, V Zambaldi, ... arXiv preprint arXiv:1806.01261, 2018 | 223 | 2018 |