Amirkeivan Mohtashami

Cited by

	All	Since 2019
Citations	181	181
h-index	7	7
i10-index	6	6

Citations per year: 2021: 3; 2022: 14; 2023: 59; 2024: 103


Co-authors

Martin Jaggi, EPFL (verified email at epfl.ch)
Sebastian Urban Stich, CISPA Helmholtz Center for Information Security (verified email at cispa.de)
Matteo Pagliardini, EPFL (verified email at epfl.ch)
Dan Alistarh, IST Austria (verified email at ist.ac.at)
Paul K Rubenstein, Google DeepMind (verified email at google.com)
Saleh Ashkboos, ETH Zurich (verified email at inf.ethz.ch)
Florian Hartmann, Google Research (verified email at google.com)
Matt Sharifi, Google (verified email at google.com)
Mohammad Roghani, PhD student, Stanford University (verified email at stanford.edu)
Ehsan Pajouheshgar, PhD student, EPFL (verified email at epfl.ch)

Amirkeivan Mohtashami

EPFL

Verified email at epfl.ch

long context, large language models, efficient transformers, neural network optimization


Title	Authors	Venue	Cited by	Year
Meditron-70b: Scaling medical pretraining for large language models	Z Chen, AH Cano, A Romanou, A Bonnet, K Matoba, F Salvi, ...	arXiv preprint arXiv:2311.16079, 2023	57	2023
Landmark Attention: Random-Access Infinite Context Length for Transformers	A Mohtashami, M Jaggi	Advances in Neural Information Processing Systems (NeurIPS) 2023, 2023	44	2023
Masked Training of Neural Networks with Partial Gradients	A Mohtashami, M Jaggi, SU Stich	The 25th International Conference on Artificial Intelligence and Statistics, 2021	25*	2021
Critical parameters for scalable distributed learning with large batches and asynchronous updates	S Stich, A Mohtashami, M Jaggi	International Conference on Artificial Intelligence and Statistics, 4042-4050, 2021	16	2021
Characterizing & finding good data orderings for fast convergence of sequential gradient methods	A Mohtashami, S Stich, M Jaggi	arXiv preprint arXiv:2202.01838, 2022	12	2022
The splay-list: A distribution-adaptive concurrent skip-list	V Aksenov, D Alistarh, A Drozdova, A Mohtashami	34th International Symposium on Distributed Computing 179, 2020	10	2020
Special Properties of Gradient Descent with Large Learning Rates	A Mohtashami, M Jaggi, S Stich	ICML 2023, 2022	8*	2022
Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models	A Mohtashami, M Verzetti, PK Rubenstein	Practical ML for Developing Countries Workshop @ ICLR 2023, 2023	4	2023
QuaRot: Outlier-free 4-bit inference in rotated LLMs	S Ashkboos, A Mohtashami, ML Croci, B Li, M Jaggi, D Alistarh, T Hoefler, ...	arXiv preprint arXiv:2404.00456, 2024	2	2024
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging	M Pagliardini, A Mohtashami, F Fleuret, M Jaggi	arXiv preprint arXiv:2402.02622, 2024	1	2024
Social Learning: Towards Collaborative Learning with Large Language Models	A Mohtashami, F Hartmann, S Gooding, L Zilka, M Sharifi, ...	arXiv preprint arXiv:2312.11441, 2023	1	2023
CoTFormer: More Tokens With Attention Make Up For Less Depth	A Mohtashami, M Pagliardini, M Jaggi	Workshop on Advancing Neural Network Training @ NeurIPS 2023, 2023	1	2023
MEDITRON: Open Medical Foundation Models Adapted for Clinical Practice	A Bosselut, Z Chen, A Romanou, A Bonnet, A Hernández-Cano, ...			2024
Reproducibility Report for "On Warm-Starting Neural Network Training"	A Mohtashami, E Pajouheshgar, K Kireev	ML Reproducibility Challenge 2020, 2021		2021
A Gradient-Based Approach to Neural Networks Structure Learning	AA Moinfar, A Mohtashami, M Soleymani, A Sharifi-Zarchi			2019
TPS (Task Preparation System): A Tool for Developing Tasks in Programming Contests	K Mirjalali, AK Mohtashami, M Roghani, H Zarrabi-Zadeh			2019
MLO	J Bachmann Ona, SA Bahreinian, LF Barba Flores, WA Ben Naceur, ...
