Torsten Hoefler
Title
Cited by
Year
Demystifying parallel and distributed deep learning: An in-depth concurrency analysis
T Ben-Nun, T Hoefler
ACM Computing Surveys (CSUR) 52 (4), 1-43, 2019
Cited by: 707; Year: 2019
The convergence of sparsified gradient methods
D Alistarh, T Hoefler, M Johansson, N Konstantinov, S Khirirat, C Renggli
Advances in Neural Information Processing Systems 31, 2018
Cited by: 476; Year: 2018
Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks
T Hoefler, D Alistarh, T Ben-Nun, N Dryden, A Peste
The Journal of Machine Learning Research 22 (1), 10882-11005, 2021
Cited by: 454; Year: 2021
MPI: A Message-Passing Interface Standard
MPI Forum
Technical Report, 2012
Cited by: 439*; Year: 2012
Slim fly: A cost effective low-diameter network topology
M Besta, T Hoefler
SC'14: proceedings of the international conference for high performance …, 2014
Cited by: 326; Year: 2014
Characterizing the influence of system noise on large-scale applications by simulation
T Hoefler, T Schneider, A Lumsdaine
SC'10: Proceedings of the 2010 ACM/IEEE International Conference for High …, 2010
Cited by: 310; Year: 2010
The PERCS high-performance interconnect
B Arimilli, R Arimilli, V Chung, S Clark, W Denzel, B Drerup, T Hoefler, ...
2010 18th IEEE Symposium on High Performance Interconnects, 75-82, 2010
Cited by: 294; Year: 2010
Generic topology mapping strategies for large-scale parallel architectures
T Hoefler, M Snir
Proceedings of the international conference on Supercomputing, 75-84, 2011
Cited by: 291; Year: 2011
Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results
T Hoefler, R Belli
Proceedings of the international conference for high performance computing …, 2015
Cited by: 281; Year: 2015
Implementation and performance analysis of non-blocking collective operations for MPI
T Hoefler, A Lumsdaine, W Rehm
Proceedings of the 2007 ACM/IEEE conference on Supercomputing, 1-10, 2007
Cited by: 276; Year: 2007
Neural code comprehension: A learnable representation of code semantics
T Ben-Nun, AS Jakobovits, T Hoefler
Advances in Neural Information Processing Systems 31, 2018
Cited by: 254; Year: 2018
LogGOPSim: simulating large-scale applications in the LogGOPS model
T Hoefler, T Schneider, A Lumsdaine
Proceedings of the 19th ACM International Symposium on High Performance …, 2010
Cited by: 218; Year: 2010
Augment your batch: Improving generalization through instance repetition
E Hoffer, T Ben-Nun, I Hubara, N Giladi, T Hoefler, D Soudry
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020
Cited by: 183; Year: 2020
Using automated performance modeling to find scalability bugs in complex codes
A Calotoiu, T Hoefler, M Poke, F Wolf
Proceedings of the International Conference on High Performance Computing …, 2013
Cited by: 177; Year: 2013
DARE: High-performance state machine replication on RDMA networks
M Poke, T Hoefler
Proceedings of the 24th International Symposium on High-Performance Parallel …, 2015
Cited by: 176; Year: 2015
Using advanced MPI: Modern features of the message-passing interface
W Gropp, T Hoefler, R Thakur, E Lusk
MIT Press, 2014
Cited by: 174; Year: 2014
To push or to pull: On reducing communication and synchronization in graph computations
M Besta, M Podstawski, L Groner, E Solomonik, T Hoefler
Proceedings of the 26th International Symposium on High-Performance Parallel …, 2017
Cited by: 162; Year: 2017
Enabling highly-scalable remote memory access programming with MPI-3 one sided
R Gerstenberger, M Besta, T Hoefler
Proceedings of the International Conference on High Performance Computing …, 2013
Cited by: 162; Year: 2013
Multistage switches are not crossbars: Effects of static routing in high-performance networks
T Hoefler, T Schneider, A Lumsdaine
2008 IEEE International Conference on Cluster Computing, 116-125, 2008
Cited by: 159; Year: 2008
Message progression in parallel computing - to thread or not to thread?
T Hoefler, A Lumsdaine
2008 IEEE International Conference on Cluster Computing, 213-222, 2008
Cited by: 150; Year: 2008