KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

Separation of Text and Non-text in Document Layout Analysis using a Recursive Filter

Vol. 9, No.10, October 31, 2015
DOI: 10.3837/tiis.2015.10.017

Abstract

The separation of text and non-text elements plays an important role in document layout analysis. A number of approaches have been proposed, but the quality of the separation results is still limited by the complexity of document layouts. In this paper, we present an efficient method for classifying text and non-text components in a document image. The method, called a recursive filter, combines whitespace analysis with multi-layer homogeneous regions. First, the input binary document is analyzed by connected component analysis and whitespace extraction. Second, a heuristic filter is applied to identify non-text components. Then, using a statistical method, we apply the recursive filter to the multi-layer homogeneous regions to identify all text and non-text elements of the binary image. Finally, all regions are reshaped and noise is removed to produce the text document and the non-text document. Experimental results on the ICDAR2009 page segmentation competition dataset and other datasets demonstrate the effectiveness and superiority of the proposed method.
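To make the first two steps of the abstract concrete, the following is a minimal Python sketch of connected component analysis followed by a simple size-based heuristic filter. It is an illustration only, not the authors' implementation: the abstract does not specify the heuristic rules, so the median-based size test, the `ratio` parameter, and the function names `connected_components` and `split_text_non_text` are all assumptions made here. Whitespace extraction and the recursive filter itself are not shown.

```python
from collections import deque
from statistics import median


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of the
    8-connected foreground regions in a binary image given as a
    list of rows, where 1 marks a foreground pixel."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def split_text_non_text(boxes, ratio=3.0):
    """Hypothetical heuristic (an assumption, not the paper's rule):
    a component whose height or width exceeds `ratio` times the
    median height or width is labelled non-text."""
    if not boxes:
        return [], []
    heights = [b[2] - b[0] + 1 for b in boxes]
    widths = [b[3] - b[1] + 1 for b in boxes]
    med_h, med_w = median(heights), median(widths)
    text, non_text = [], []
    for box, h, w in zip(boxes, heights, widths):
        if h > ratio * med_h or w > ratio * med_w:
            non_text.append(box)
        else:
            text.append(box)
    return text, non_text
```

In this sketch, a page would be passed as a list of 0/1 rows to `connected_components`, and the resulting boxes to `split_text_non_text`. A single global threshold like this is exactly the kind of rule that tends to fail on complex layouts, which is presumably why the method described above refines the decision recursively within homogeneous regions.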




Cite this article

[IEEE Style]
Tuan-Anh Tran, In-Seop Na and Soo-Hyung Kim, "Separation of Text and Non-text in Document Layout Analysis using a Recursive Filter," KSII Transactions on Internet and Information Systems, vol. 9, no. 10, pp. 4072-4091, 2015. DOI: 10.3837/tiis.2015.10.017

[ACM Style]
Tran, T., Na, I., and Kim, S. 2015. Separation of Text and Non-text in Document Layout Analysis using a Recursive Filter. KSII Transactions on Internet and Information Systems, 9, 10, (2015), 4072-4091. DOI: 10.3837/tiis.2015.10.017