MEDITRON-70B: Scaling Medical Pretraining for Large Language Models Z Chen, AH Cano, A Romanou, A Bonnet, K Matoba, F Salvi, ... arXiv preprint arXiv:2311.16079, 2023 | 57 | 2023 |
Landmark Attention: Random-Access Infinite Context Length for Transformers A Mohtashami, M Jaggi Advances in Neural Information Processing Systems (NeurIPS) 2023, 2023 | 44 | 2023 |
Masked Training of Neural Networks with Partial Gradients A Mohtashami, M Jaggi, SU Stich The 25th International Conference on Artificial Intelligence and Statistics, 2021 | 25* | 2021 |
Critical parameters for scalable distributed learning with large batches and asynchronous updates S Stich, A Mohtashami, M Jaggi International Conference on Artificial Intelligence and Statistics, 4042-4050, 2021 | 16 | 2021 |
Characterizing & finding good data orderings for fast convergence of sequential gradient methods A Mohtashami, S Stich, M Jaggi arXiv preprint arXiv:2202.01838, 2022 | 12 | 2022 |
The splay-list: A distribution-adaptive concurrent skip-list V Aksenov, D Alistarh, A Drozdova, A Mohtashami 34th International Symposium on Distributed Computing 179, 2020 | 10 | 2020 |
Special Properties of Gradient Descent with Large Learning Rates A Mohtashami, M Jaggi, S Stich ICML 2023, 2022 | 8* | 2022 |
Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models A Mohtashami, M Verzetti, PK Rubenstein Practical ML for Developing Countries Workshop @ ICLR 2023, 2023 | 4 | 2023 |
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs S Ashkboos, A Mohtashami, ML Croci, B Li, M Jaggi, D Alistarh, T Hoefler, ... arXiv preprint arXiv:2404.00456, 2024 | 2 | 2024 |
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging M Pagliardini, A Mohtashami, F Fleuret, M Jaggi arXiv preprint arXiv:2402.02622, 2024 | 1 | 2024 |
Social Learning: Towards Collaborative Learning with Large Language Models A Mohtashami, F Hartmann, S Gooding, L Zilka, M Sharifi, ... arXiv preprint arXiv:2312.11441, 2023 | 1 | 2023 |
CoTFormer: More Tokens With Attention Make Up For Less Depth A Mohtashami, M Pagliardini, M Jaggi Workshop on Advancing Neural Network Training @ NeurIPS 2023, 2023 | 1 | 2023 |
MEDITRON: Open Medical Foundation Models Adapted for Clinical Practice A Bosselut, Z Chen, A Romanou, A Bonnet, A Hernández-Cano, ... | | 2024 |
Reproducibility Report for "On Warm-Starting Neural Network Training" A Mohtashami, E Pajouheshgar, K Kireev ML Reproducibility Challenge 2020, 2021 | | 2021 |
A Gradient-Based Approach to Neural Networks Structure Learning AA Moinfar, A Mohtashami, M Soleymani, A Sharifi-Zarchi | | 2019 |
TPS (Task Preparation System): A Tool for Developing Tasks in Programming Contests K Mirjalali, AK Mohtashami, M Roghani, H Zarrabi-Zadeh | | 2019 |
MLO J Bachmann Ona, SA Bahreinian, LF Barba Flores, WA Ben Naceur, ... | | |