Kosmos-2: Grounding multimodal large language models to the world Z Peng, W Wang, L Dong, Y Hao, S Huang, S Ma, F Wei arXiv preprint arXiv:2306.14824, 2023 | 537 | 2023 |
Knowledge neurons in pretrained transformers D Dai, L Dong, Y Hao, Z Sui, B Chang, F Wei arXiv preprint arXiv:2104.08696, 2021 | 481 | 2021 |
Language is not all you need: Aligning perception with language models S Huang, L Dong, W Wang, Y Hao, S Singhal, S Ma, T Lv, L Cui, ... Advances in Neural Information Processing Systems 36, 72096-72109, 2023 | 434 | 2023 |
Why can GPT learn in-context? Language models implicitly perform gradient descent as meta-optimizers D Dai, Y Sun, L Dong, Y Hao, S Ma, Z Sui, F Wei arXiv preprint arXiv:2212.10559, 2022 | 327 | 2022 |
Visualizing and understanding the effectiveness of BERT Y Hao, L Dong, F Wei, K Xu arXiv preprint arXiv:1908.05620, 2019 | 238 | 2019 |
Self-attention attribution: Interpreting information interactions inside transformer Y Hao, L Dong, F Wei, K Xu Proceedings of the AAAI Conference on Artificial Intelligence 35 (14), 12963 …, 2021 | 227 | 2021 |
Optimizing prompts for text-to-image generation Y Hao, Z Chi, L Dong, F Wei Advances in Neural Information Processing Systems 36, 2024 | 127 | 2024 |
Language models are general-purpose interfaces Y Hao, H Song, L Dong, S Huang, Z Chi, W Wang, S Ma, F Wei arXiv preprint arXiv:2206.06336, 2022 | 104 | 2022 |
Language is not all you need: Aligning perception with language models S Huang, L Dong, W Wang, Y Hao, S Singhal, S Ma, T Lv, L Cui, ..., S Som, X Song, F Wei …, 2023 | 49 | 2023 |
Language is not all you need: Aligning perception with language models S Huang, L Dong, W Wang, Y Hao, S Singhal, S Ma, T Lv, L Cui, ..., S Som, X Song, F Wei arXiv preprint arXiv:2302.14045, 2023 | 48 | 2023 |
Investigating learning dynamics of BERT fine-tuning Y Hao, L Dong, F Wei, K Xu Proceedings of the 1st Conference of the Asia-Pacific Chapter of the …, 2020 | 47 | 2020 |
Structured prompting: Scaling in-context learning to 1,000 examples Y Hao, Y Sun, L Dong, Z Han, Y Gu, F Wei arXiv preprint arXiv:2212.06713, 2022 | 42 | 2022 |
Prototypical calibration for few-shot learning of language models Z Han, Y Hao, L Dong, Y Sun, F Wei arXiv preprint arXiv:2205.10183, 2022 | 34 | 2022 |
Grounding multimodal large language models to the world Z Peng, W Wang, L Dong, Y Hao, S Huang, S Ma, Q Ye, F Wei The Twelfth International Conference on Learning Representations, 2024 | 32 | 2024 |
Large language model for science: A study on P vs. NP Q Dong, L Dong, K Xu, G Zhou, Y Hao, Z Sui, F Wei arXiv preprint arXiv:2309.05689, 2023 | 14 | 2023 |
Prototypical fine-tuning: Towards robust performance under varying data sizes Y Jin, X Wang, Y Hao, Y Sun, X Xie Proceedings of the AAAI Conference on Artificial Intelligence 37 (11), 12968 …, 2023 | 11 | 2023 |
Learning to sample replacements for ELECTRA pre-training Y Hao, L Dong, H Bao, K Xu, F Wei arXiv preprint arXiv:2106.13715, 2021 | 9 | 2021 |
Data Selection via Optimal Control for Language Models Y Gu, L Dong, H Wang, Y Hao, Q Dong, F Wei, M Huang arXiv preprint arXiv:2410.07064, 2024 | 1 | 2024 |
Towards Optimal Learning of Language Models Y Gu, L Dong, Y Hao, Q Dong, M Huang, F Wei arXiv preprint arXiv:2402.17759, 2024 | | 2024 |