ViViT: A video vision transformer | A Arnab, M Dehghani, G Heigold, C Sun, M Lučić, C Schmid | Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 2400 | 2021 |
Attention bottlenecks for multimodal fusion | A Nagrani, S Yang, A Arnab, A Jansen, C Schmid, C Sun | Advances in neural information processing systems 34, 14200-14213, 2021 | 585 | 2021 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Gemini Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... | arXiv preprint arXiv:2403.05530, 2024 | 536 | 2024 |
Scaling vision transformers to 22 billion parameters | M Dehghani, J Djolonga, B Mustafa, P Padlewski, J Heek, J Gilmer, ... | International Conference on Machine Learning, 7480-7512, 2023 | 439 | 2023 |
Simple open-vocabulary object detection | M Minderer, A Gritsenko, A Stone, M Neumann, D Weissenborn, ... | European Conference on Computer Vision, 728-755, 2022 | 431 | 2022 |
On the robustness of semantic segmentation models to adversarial attacks | A Arnab, O Miksik, PHS Torr | IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 888-897, 2018 | 363 | 2018 |
Higher order conditional random fields in deep neural networks | A Arnab, S Jayasumana, S Zheng, PHS Torr | Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The …, 2016 | 314* | 2016 |
Multiview transformers for video recognition | S Yan, X Xiong, A Arnab, Z Lu, M Zhang, C Sun, C Schmid | Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 294 | 2022 |
Pixelwise instance segmentation with a dynamically instantiated network | A Arnab, PHS Torr | Proceedings of the IEEE conference on computer vision and pattern …, 2017 | 294 | 2017 |
Exploiting temporal context for 3D human pose estimation in the wild | A Arnab, C Doersch, A Zisserman | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019 | 277 | 2019 |
Dual graph convolutional network for semantic segmentation | L Zhang, X Li, A Arnab, K Yang, Y Tong, PHS Torr | arXiv preprint arXiv:1909.06121, 2019 | 246 | 2019 |
Weakly- and Semi-Supervised Panoptic Segmentation | Q Li, A Arnab, PHS Torr | Proceedings of the European Conference on Computer Vision (ECCV), 102-118, 2018 | 213 | 2018 |
End-to-end generative pretraining for multimodal video captioning | PH Seo, A Nagrani, A Arnab, C Schmid | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 186 | 2022 |
Conditional random fields meet deep neural networks for semantic segmentation: Combining probabilistic graphical models with deep learning for structured prediction | A Arnab, S Zheng, S Jayasumana, B Romera-Paredes, M Larsson, ... | IEEE Signal Processing Magazine 35 (1), 37-52, 2018 | 172 | 2018 |
Dynamic graph message passing networks | L Zhang, D Xu, A Arnab, PHS Torr | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 149 | 2020 |
TokenLearner: Adaptive space-time tokenization for videos | M Ryoo, AJ Piergiovanni, A Arnab, M Dehghani, A Angelova | Advances in neural information processing systems 34, 12786-12797, 2021 | 148 | 2021 |
PaLI-X: On scaling up a multilingual vision and language model | X Chen, J Djolonga, P Padlewski, B Mustafa, S Changpinyo, J Wu, ... | arXiv preprint arXiv:2305.18565, 2023 | 137 | 2023 |
TokenLearner: What can 8 learned tokens do for images and videos? | MS Ryoo, AJ Piergiovanni, A Arnab, M Dehghani, A Angelova | arXiv preprint arXiv:2106.11297, 2021 | 120 | 2021 |
The efficiency misnomer | M Dehghani, A Arnab, L Beyer, A Vaswani, Y Tay | arXiv preprint arXiv:2110.12894, 2021 | 105 | 2021 |
Learning with neighbor consistency for noisy labels | A Iscen, J Valmadre, A Arnab, C Schmid | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 82 | 2022 |