| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Deep learning for post-processing ensemble weather forecasts | P Grönquist, C Yao, T Ben-Nun, N Dryden, P Dueben, S Li, T Hoefler | Philosophical Transactions of the Royal Society A 379 (2194), 20200092, 2021 | 185 | 2021 |
| Data Movement Is All You Need: A Case Study on Optimizing Transformers | A Ivanov, N Dryden, T Ben-Nun, S Li, T Hoefler | Proceedings of Machine Learning and Systems 3, 2021 | 149 | 2021 |
| Chimera: efficiently training large-scale neural networks with bidirectional pipelines | S Li, T Hoefler | Proceedings of the International Conference for High Performance Computing …, 2021 | 123 | 2021 |
| NUMA-aware shared-memory collective communication for MPI | S Li, T Hoefler, M Snir | Proceedings of the 22nd international symposium on High-performance parallel …, 2013 | 120 | 2013 |
| Parallel processing systems for big data: a survey | Y Zhang, T Cao, S Li, X Tian, L Yuan, H Jia, AV Vasilakos | Proceedings of the IEEE 104 (11), 2114-2136, 2016 | 116 | 2016 |
| CAS‐ESM 2: Description and climate simulation performance of the Chinese Academy of Sciences (CAS) Earth System Model (ESM) version 2 | H Zhang, M Zhang, J Jin, K Fei, D Ji, C Wu, J Zhu, J He, Z Chai, J Xie, ... | Journal of Advances in Modeling Earth Systems, e2020MS002210, 2020 | 82 | 2020 |
| Taming unbalanced training workloads in deep learning with partial collective operations | S Li, T Ben-Nun, S Di Girolamo, D Alistarh, T Hoefler | Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of …, 2020 | 65 | 2020 |
| Asynchronous Decentralized SGD with Quantized and Local Updates | G Nadiradze, A Sabour, P Davies, S Li, D Alistarh | Advances in Neural Information Processing Systems 34, 2021 | 57 | 2021 |
| Intra-hour Photovoltaic Generation Forecasting based on Multi-source Data and Deep Learning Methods | T Yao, J Wang, H Wu, P Zhang, S Li, K Xu, X Liu, X Chi | IEEE Transactions on Sustainable Energy, 2021 | 53 | 2021 |
| Near-optimal sparse allreduce for distributed deep learning | S Li, T Hoefler | Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022 | 47 | 2022 |
| Flare: flexible in-network allreduce | D De Sensi, S Di Girolamo, S Ashkboos, S Li, T Hoefler | Proceedings of the International Conference for High Performance Computing …, 2021 | 46 | 2021 |
| Improved MPI collectives for MPI processes in shared address spaces | S Li, T Hoefler, C Hu, M Snir | Cluster Computing 17 (4), 1139-1155, 2014 | 34 | 2014 |
| Efficient quantized sparse matrix operations on tensor cores | S Li, K Osawa, T Hoefler | SC22: International Conference for High Performance Computing, Networking …, 2022 | 32 | 2022 |
| A photovoltaic power output dataset: Multi-source photovoltaic power output dataset with Python toolkit | T Yao, J Wang, H Wu, P Zhang, S Li, Y Wang, X Chi, M Shi | Solar Energy 230, 122-130, 2021 | 32 | 2021 |
| Cache-oblivious MPI all-to-all communications based on Morton order | S Li, Y Zhang, T Hoefler | IEEE Transactions on Parallel and Distributed Systems, 2018 | 31 | 2018 |
| Kernel optimization for short-range molecular dynamics | C Hu, X Wang, J Li, X He, S Li, Y Feng, S Yang, H Bai | Computer Physics Communications, 2016 | 21 | 2016 |
| PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices | K Osawa, S Li, T Hoefler | Proceedings of Machine Learning and Systems 5, 2023 | 20 | 2023 |
| Efficient parallel optimizations of a high-performance SIFT on GPUs | Z Li, H Jia, Y Zhang, S Liu, S Li, X Wang, H Zhang | Journal of Parallel and Distributed Computing, 2018 | 20 | 2018 |
| HammingMesh: A network topology for large-scale deep learning | T Hoefler, T Bonato, D De Sensi, S Di Girolamo, S Li, M Heddes, J Belk, ... | SC22: International Conference for High Performance Computing, Networking …, 2022 | 19 | 2022 |
| Why Dataset Properties Bound the Scalability of Parallel Machine Learning Training Algorithms | D Cheng, S Li, Z Hanping, F Xia, Y Zhang | IEEE Transactions on Parallel and Distributed Systems, 2021 | 19* | 2021 |