KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

RDP: A storage-tier-aware Robust Data Placement strategy for Hadoop in a Cloud-based Heterogeneous Environment

Vol. 10, No. 9, September 29, 2016
DOI: 10.3837/tiis.2016.09.003

Abstract

Cloud computing is a robust technology that helps resolve many parallel and distributed computing issues in the modern Big Data environment. Hadoop is an ecosystem that processes large data sets in a distributed computing environment. HDFS, the Hadoop filesystem, distributes data blocks to the cluster nodes. Data block placement has become a bottleneck to overall performance in a Hadoop cluster. The current placement policy assumes that all Datanodes have equal computing capacity to process data blocks, where computing capacity covers both the storage media available on a node and its processing performance. As a result, Hadoop cluster performance is affected by unbalanced workloads, inefficient storage-tier utilization, network traffic congestion, and HDFS integrity issues. This paper proposes a storage-tier-aware Robust Data Placement (RDP) scheme, which systematically resolves unbalanced workloads, reduces network congestion to an optimal state, utilizes the storage tier effectively, and minimizes HDFS integrity issues. The experimental results show that the proposed approach reduces the unbalanced workload issue to 72%. Moreover, the presented approach resolves the storage-tier compatibility problem to 81% by predicting storage for block jobs, and improves overall data block placement by 78% through pre-calculated computing capacity allocations and execution of map files over the respective Namenode and Datanodes.
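
The idea of placing blocks according to per-node computing capacity, rather than assuming all Datanodes are equal, can be illustrated with a minimal sketch. This is not the RDP algorithm from the paper, whose details the abstract does not give. The `DataNode` class, the `TIER_WEIGHT` values, the `cpu_score` metric, and the greedy rule in `place_blocks` are all hypothetical, chosen only to show how a storage-tier-aware capacity score could steer placement.

```python
from dataclasses import dataclass

# Hypothetical tier weights (assumption): the abstract gives no such values.
TIER_WEIGHT = {"ssd": 3.0, "hdd": 1.0}


@dataclass
class DataNode:
    name: str
    tier: str          # storage medium of the node, e.g. "ssd" or "hdd"
    cpu_score: float   # relative processing performance (assumed metric)
    blocks: int = 0    # number of blocks assigned so far

    @property
    def capacity(self) -> float:
        # Pre-calculated computing capacity: storage tier times processing score.
        return TIER_WEIGHT[self.tier] * self.cpu_score


def place_blocks(nodes, num_blocks):
    """Greedily give each block to the node whose load after the assignment,
    relative to its capacity, would be lowest."""
    for _ in range(num_blocks):
        target = min(nodes, key=lambda n: (n.blocks + 1) / n.capacity)
        target.blocks += 1
    return {n.name: n.blocks for n in nodes}


nodes = [DataNode("dn1", "ssd", 1.0), DataNode("dn2", "hdd", 1.0)]
print(place_blocks(nodes, 8))  # {'dn1': 6, 'dn2': 2}
```

With these assumed weights, the SSD-backed node receives three times as many blocks as the HDD-backed node, whereas a capacity-blind policy would split the eight blocks evenly.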



Cite this article

[IEEE Style]
N. M. F. Qureshi and D. R. Shin, "RDP: A storage-tier-aware Robust Data Placement strategy for Hadoop in a Cloud-based Heterogeneous Environment," KSII Transactions on Internet and Information Systems, vol. 10, no. 9, pp. 4063-4086, 2016. DOI: 10.3837/tiis.2016.09.003.

[ACM Style]
Nawab Muhammad Faseeh Qureshi and Dong Ryeol Shin. 2016. RDP: A storage-tier-aware Robust Data Placement strategy for Hadoop in a Cloud-based Heterogeneous Environment. KSII Transactions on Internet and Information Systems, 10, 9, (2016), 4063-4086. DOI: 10.3837/tiis.2016.09.003.

[BibTeX Style]
@article{tiis:21206,
  title="RDP: A storage-tier-aware Robust Data Placement strategy for Hadoop in a Cloud-based Heterogeneous Environment",
  author="Nawab Muhammad Faseeh Qureshi and Dong Ryeol Shin",
  journal="KSII Transactions on Internet and Information Systems",
  DOI={10.3837/tiis.2016.09.003},
  volume={10},
  number={9},
  year="2016",
  month={September},
  pages={4063-4086}
}