KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

Pattern mining for large distributed dataset: A parallel approach (PMLDD)

Vol. 12, No. 11, November 29, 2018
DOI: 10.3837/tiis.2018.11.007

Abstract

Handling the vast amount of data in large transactional datasets is a clear challenge for conventional data mining algorithms. To address this challenge, this paper proposes a parallel approach that decomposes the mining problem into sub-problems in order to find frequent patterns in such datasets. The proposed approach, Pattern Mining for Large Distributed Dataset (PMLDD), ensures minimal dependencies and minimal communication among the sub-problems. It aggregates the intermediate results linearly, so that it can be adapted to large-scale programming models such as MapReduce. In this context, an algorithmic structure for the MapReduce programming model is presented. PMLDD ensures efficient load balancing among the sub-problems by means of a specific selection criterion. Further, it reduces the number of iterations over the dataset required for mining frequent patterns compared to existing approaches. Finally, we believe that our approach is scalable enough to handle larger datasets, and the performance evaluation and result analysis support these claims.
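
The idea of linear aggregation in a MapReduce setting can be illustrated with a minimal Python sketch. It is not the PMLDD algorithm itself, whose decomposition and selection criterion are not specified in this abstract. It only shows, under simple assumptions, how itemset counts computed independently on each partition can be summed to obtain global support. The function names (`map_partition`, `reduce_counts`, `frequent_itemsets`), the toy data, and the support threshold are all hypothetical.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_partition(transactions, k):
    """Map step (illustrative): count every k-itemset in one partition.

    Each partition is processed on its own, so no communication
    with other partitions is needed at this stage.
    """
    counts = Counter()
    for transaction in transactions:
        # Sort and deduplicate so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def reduce_counts(partials):
    """Reduce step (illustrative): sum the per-partition counts.

    Support counts are additive, so the global count of an itemset
    is simply the sum of its local counts.
    """
    return reduce(lambda a, b: a + b, partials, Counter())


def frequent_itemsets(partitions, k, min_support):
    """Return the k-itemsets whose global count reaches min_support."""
    total = reduce_counts(map_partition(p, k) for p in partitions)
    return {s: c for s, c in total.items() if c >= min_support}


# Hypothetical toy dataset split into two partitions.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["a", "b", "c"]],
]
result = frequent_itemsets(partitions, k=2, min_support=3)
print(result)
```

In this toy run the pair `("b", "c")` appears only twice across both partitions and is therefore filtered out, while `("a", "b")` and `("a", "c")` each reach the threshold of three. A real distributed miner would also need candidate pruning and a partitioning strategy, which this sketch deliberately omits.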




Cite this article

[IEEE Style]
A. Pal and M. Kumar, "Pattern mining for large distributed dataset: A parallel approach (PMLDD)," KSII Transactions on Internet and Information Systems, vol. 12, no. 11, pp. 5287-5303, 2018. DOI: 10.3837/tiis.2018.11.007.

[ACM Style]
Amrit Pal and Manish Kumar. 2018. Pattern mining for large distributed dataset: A parallel approach (PMLDD). KSII Transactions on Internet and Information Systems, 12, 11, (2018), 5287-5303. DOI: 10.3837/tiis.2018.11.007.

[BibTeX Style]
@article{tiis:21919, title={Pattern mining for large distributed dataset: A parallel approach (PMLDD)}, author={Amrit Pal and Manish Kumar}, journal={KSII Transactions on Internet and Information Systems}, doi={10.3837/tiis.2018.11.007}, volume={12}, number={11}, year={2018}, month={November}, pages={5287-5303}}