Vol. 15, No. 3, March 31, 2021
DOI: 10.3837/tiis.2021.03.006
Abstract
Most cloud services now use Docker container environments to provide their services. However, no prior study has evaluated the performance of communication libraries for multi-GPU based distributed deep learning in a Docker container environment. In this paper, we propose an efficient communication architecture for multi-GPU based deep learning in a Docker container environment by evaluating the performance of various communication libraries. We compare the parameter server architecture and the All-reduce architecture, which are the typical distributed deep learning architectures. We further analyze two multi-GPU resource allocation policies: allocating a single GPU to each Docker container, and allocating multiple GPUs to each Docker container. We also examine the scalability of collective communication by increasing the number of GPUs from one to four. In our experiments, we compare OpenMPI and MPICH, two representative open-source MPI libraries, with NCCL, NVIDIA’s collective communication library for multi-GPU settings. For the parameter server architecture, we show that CUDA-aware OpenMPI with multiple GPUs per Docker container reduces communication latency by up to 75%. For the All-reduce architecture, we show that NCCL reduces communication latency by up to 93% compared with the other libraries.
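As a rough illustration of the two architectures compared in the abstract, the following pure-Python sketch simulates gradient averaging with a central parameter server and with a ring-based All-reduce. It is not the authors' implementation and does not call OpenMPI, MPICH, or NCCL. The function names, the in-memory "workers", and the choice of a ring algorithm are assumptions made only for illustration.

```python
from typing import List


def parameter_server_average(grads: List[List[float]]) -> List[float]:
    """Hypothetical parameter server: one node sums all gradients and returns the mean."""
    n = len(grads)
    return [sum(column) / n for column in zip(*grads)]


def ring_allreduce_average(grads: List[List[float]]) -> List[List[float]]:
    """Hypothetical ring All-reduce: reduce-scatter, then all-gather, over n simulated workers.

    Assumes the gradient length is divisible by the number of workers.
    Returns one averaged gradient per worker.
    """
    n = len(grads)
    size = len(grads[0])
    assert size % n == 0, "gradient length must be divisible by worker count"
    chunk = size // n
    bufs = [list(g) for g in grads]  # each worker's local buffer

    def span(c: int) -> slice:
        return slice(c * chunk, (c + 1) * chunk)

    # Reduce-scatter: at step s, worker r sends chunk (r - s) % n to worker r + 1,
    # which adds it to its own copy. All sends in a step happen "simultaneously",
    # so the outgoing data is snapshotted before any buffer is updated.
    for s in range(n - 1):
        sends = [(r, (r - s) % n, bufs[r][span((r - s) % n)]) for r in range(n)]
        for r, c, data in sends:
            dst = (r + 1) % n
            bufs[dst][span(c)] = [a + b for a, b in zip(bufs[dst][span(c)], data)]

    # Worker r now holds the full sum for chunk (r + 1) % n.
    # All-gather: circulate the completed chunks around the ring.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, bufs[r][span((r + 1 - s) % n)]) for r in range(n)]
        for r, c, data in sends:
            bufs[(r + 1) % n][span(c)] = data

    return [[x / n for x in buf] for buf in bufs]


if __name__ == "__main__":
    # Four simulated workers (matching the one-to-four GPU range), eight gradient values each.
    worker_grads = [[float(r + i) for i in range(8)] for r in range(4)]
    print(parameter_server_average(worker_grads))
    print(ring_allreduce_average(worker_grads)[0])
```

In this sketch the parameter server gathers every gradient at a single node, whereas each ring worker only exchanges one chunk with its neighbor per step. Both produce the same averaged gradient. The simulation says nothing about the latencies measured in the paper.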
Cite this article
[IEEE Style]
H. Choi, Y. Kim, J. Lee, Y. Kim, "Empirical Performance Evaluation of Communication Libraries for Multi-GPU based Distributed Deep Learning in a Container Environment," KSII Transactions on Internet and Information Systems, vol. 15, no. 3, pp. 911-931, 2021. DOI: 10.3837/tiis.2021.03.006.
[ACM Style]
HyeonSeong Choi, Youngrang Kim, Jaehwan Lee, and Yoonhee Kim. 2021. Empirical Performance Evaluation of Communication Libraries for Multi-GPU based Distributed Deep Learning in a Container Environment. KSII Transactions on Internet and Information Systems, 15, 3, (2021), 911-931. DOI: 10.3837/tiis.2021.03.006.
[BibTeX Style]
@article{tiis:24357, title="Empirical Performance Evaluation of Communication Libraries for Multi-GPU based Distributed Deep Learning in a Container Environment", author="HyeonSeong Choi and Youngrang Kim and Jaehwan Lee and Yoonhee Kim", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2021.03.006}, volume={15}, number={3}, year="2021", month={March}, pages={911-931}}