• KSII Transactions on Internet and Information Systems
    Monthly Online Journal (eISSN: 1976-7277)

An Analytic solution for the Hadoop Configuration Combinatorial Puzzle based on General Factorial Design

Vol. 16, No. 11, November 30, 2022
DOI: 10.3837/tiis.2022.11.009

Abstract

Big data analytics offers endless opportunities for operational enhancement by extracting valuable insights from complex, voluminous data. Hadoop is a comprehensive technological suite that offers solutions for the large-scale storage and computing needs of big data. The performance of Hadoop is closely tied to its configuration settings, which depend on the cluster capacity and the application profile. Since Hadoop has over 190 configuration parameters, tuning them to gain optimal application performance is a daunting challenge. Our approach is to extract a subset of impactful parameters, from which a performance-enhancing sub-optimal configuration is then narrowed down. This paper presents a statistical model to analyze the significance of the effect of Hadoop parameters on a variety of performance metrics. Our model decomposes the total observed performance variation and ascribes it to the main parameters, their interaction effects, and noise factors. The method clearly segregates impactful parameters from the rest. The configuration setting determined by our methodology reduced the job completion time by 22%, memory and CPU utilization by 15% and 12% respectively, the number of killed Maps by 50%, and disk spillage by 23%. The proposed technique can be leveraged to ease the configuration tuning task of any Hadoop cluster, regardless of differences in the underlying infrastructure and the application running on it.
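The variance decomposition described in the abstract can be illustrated with a textbook two-factor balanced factorial ANOVA. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `factorial_ss`, the factor labels A and B, and every measurement in `SAMPLE` are hypothetical placeholders and are not taken from the paper. It splits the total sum of squares into the two main effects, their interaction, and residual noise, then reports each share as a percentage of the total.

```python
from itertools import product
from statistics import mean


def factorial_ss(obs):
    """Decompose variation in a balanced two-factor factorial experiment.

    obs maps (a_level, b_level) -> list of replicate measurements.
    Returns sums of squares for A, B, the A x B interaction, error, and total.
    """
    a_levels = sorted({a for a, _ in obs})
    b_levels = sorted({b for _, b in obs})
    n = len(next(iter(obs.values())))  # replicates per cell (balanced design assumed)

    all_values = [y for cell in obs.values() for y in cell]
    grand = mean(all_values)

    # Marginal means per factor level and means per cell.
    a_mean = {a: mean([y for b in b_levels for y in obs[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([y for a in a_levels for y in obs[(a, b)]]) for b in b_levels}
    cell_mean = {key: mean(values) for key, values in obs.items()}

    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a, b in product(a_levels, b_levels)
    )
    ss_error = sum((y - cell_mean[key]) ** 2 for key, values in obs.items() for y in values)
    ss_total = sum((y - grand) ** 2 for y in all_values)

    return {"A": ss_a, "B": ss_b, "AxB": ss_ab, "error": ss_error, "total": ss_total}


# Made-up job completion times (seconds) for two hypothetical parameters,
# each at two levels, with two replicate runs per configuration.
SAMPLE = {
    (0, 0): [100, 104],
    (0, 1): [90, 94],
    (1, 0): [80, 84],
    (1, 1): [60, 64],
}

if __name__ == "__main__":
    ss = factorial_ss(SAMPLE)
    for source in ("A", "B", "AxB", "error"):
        share = 100 * ss[source] / ss["total"]
        print(f"{source:>5}: SS = {ss[source]:8.1f}  ({share:4.1f}% of total variation)")
```

In this kind of analysis, a factor whose share of the total variation is small relative to the error term would be a candidate for exclusion from further tuning. A general factorial design extends the same idea to more factors and more levels per factor.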



Cite this article

[IEEE Style]
R. S. Priya, A. J. Prakash, and V. R. Uthariaraj, "An Analytic solution for the Hadoop Configuration Combinatorial Puzzle based on General Factorial Design," KSII Transactions on Internet and Information Systems, vol. 16, no. 11, pp. 3619-3637, 2022. DOI: 10.3837/tiis.2022.11.009.

[ACM Style]
R. Sathia Priya, A. John Prakash, and V. Rhymend Uthariaraj. 2022. An Analytic solution for the Hadoop Configuration Combinatorial Puzzle based on General Factorial Design. KSII Transactions on Internet and Information Systems, 16, 11, (2022), 3619-3637. DOI: 10.3837/tiis.2022.11.009.

[BibTeX Style]
@article{tiis:38003,
  title="An Analytic solution for the Hadoop Configuration Combinatorial Puzzle based on General Factorial Design",
  author="R. Sathia Priya and A. John Prakash and V. Rhymend Uthariaraj",
  journal="KSII Transactions on Internet and Information Systems",
  DOI={10.3837/tiis.2022.11.009},
  volume={16},
  number={11},
  year="2022",
  month={November},
  pages={3619-3637}
}