Adrián Castelló
Argobots: A lightweight low-level threading and tasking framework
S Seo, A Amer, P Balaji, C Bordage, G Bosilca, A Brooks, P Carns, ...
IEEE Transactions on Parallel and Distributed Systems 29 (3), 512-526, 2017
SLURM support for remote GPU virtualization: Implementation and performance study
S Iserte, A Castelló, R Mayo, ES Quintana-Ortí, F Silla, J Duato, C Reaño, ...
2014 IEEE 26th International Symposium on Computer Architecture and High …, 2014
Improving the User Experience of the rCUDA Remote GPU Virtualization Framework
C Reaño, F Silla, A Castelló, AJ Peña, R Mayo, ES Quintana-Ortí, J Duato
High Performance and Portable Convolution Operators for Multicore Processors
P San Juan, A Castelló, MF Dolz, P Alonso-Jordá, ES Quintana-Ortí
SBAC-PAD 2020, 2020
A Review of Lightweight Thread Approaches for High Performance Computing
A Castelló, AJ Peña, S Seo, R Mayo, P Balaji, ES Quintana-Ortí
2016 IEEE International Conference on Cluster Computing (CLUSTER 2016), 471-480, 2016
PyDTNN: a user-friendly and extensible framework for distributed deep learning
S Barrachina, A Castelló, M Catalán, MF Dolz, JI Mestre
The Journal of Supercomputing 77, 9971-9987, 2021
On the use of remote GPUs and low-power processors for the acceleration of scientific applications
A Castelló, J Duato, R Mayo, AJ Peña, ES Quintana-Ortí, V Roca, F Silla
The Fourth International Conference on Smart Grids, Green Communications and …, 2014
Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks
A Castelló, MF Dolz, ES Quintana-Ortí, J Duato
2nd High Performance Machine Learning Workshop (HPML 2019), 534-541, 2019
GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations
A Castelló, S Seo, R Mayo, P Balaji, ES Quintana-Ortí, AJ Peña
International Conference on Parallel Processing (ICPP-2017), 60-69, 2017
Analysis of model parallelism for distributed neural networks
A Castelló, MF Dolz, ES Quintana-Ortí, J Duato
Proceedings of the 26th European MPI Users' Group Meeting, 1-10, 2019
Enabling GPU Virtualization in Cloud Environments
S Iserte, FJ Clemente-Castelló, A Castelló, R Mayo, ES Quintana-Ortí
CLOSER 2016, 2016
Reformulating the direct convolution for high-performance deep learning inference on ARM processors
S Barrachina, A Castelló, MF Dolz, TM Low, H Martínez, ES Quintana-Ortí, ...
Journal of Systems Architecture 135, 102806, 2023
Anatomy of the BLIS family of algorithms for matrix multiplication
A Castelló, ES Quintana-Ortí, FD Igual
2022 30th Euromicro International Conference on Parallel, Distributed and …, 2022
Accelerating distributed deep neural network training with pipelined MPI allreduce
A Castelló, ES Quintana-Ortí, J Duato
Cluster Computing 24 (4), 3797-3813, 2021
A flexible research-oriented framework for distributed training of deep neural networks
S Barrachina, A Castelló, M Catalán, MF Dolz, JI Mestre
2021 IEEE International Parallel and Distributed Processing Symposium …, 2021
GLT: A unified API for lightweight thread libraries
A Castelló, S Seo, R Mayo, P Balaji, ES Quintana-Ortí, AJ Peña
Euro-Par 2017: Parallel Processing: 23rd International Conference on …, 2017
Programming parallel dense matrix factorizations with look-ahead and OpenMP
S Catalán, A Castelló, FD Igual, R Rodríguez-Sánchez, ES Quintana-Ortí
Cluster Computing 23, 359-375, 2020
On the adequacy of lightweight thread approaches for high-level parallel programming models
A Castelló, R Mayo, K Sala, V Beltran, P Balaji, AJ Peña
Future Generation Computer Systems 84, 22-31, 2018
Exploiting task-parallelism on GPU clusters via OmpSs and rCUDA virtualization
A Castelló, R Mayo, J Planas, ES Quintana-Ortí
The 1st IEEE International Workshop on Reengineering for Parallelism in …, 2015
High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS
A Castelló, S Barrachina, MF Dolz, ES Quintana-Ortí, P San Juan, ...
Journal of Systems Architecture 125, 102459, 2022