Vol. 13, No. 1, January 31, 2019
DOI: 10.3837/tiis.2019.01.015
Abstract
Because the amount of information on the internet is growing rapidly, it is not easy for users to find information relevant to their queries. To tackle this issue, researchers are paying much attention to document summarization. The key to any successful document summarizer is a good document representation. Traditional approaches based on word overlap mostly fail to produce such a representation. Word embeddings have shown good performance, allowing words to be matched at a semantic level. However, naively concatenating word embeddings makes common words dominant, which in turn diminishes the representation quality. In this paper, we employ word embeddings to improve the weighting schemes used to calculate the Latent Semantic Analysis input matrix. Two embedding-based weighting schemes are proposed and then combined to calculate the values of this matrix. They are modified versions of the augment weight and the entropy frequency that combine the strengths of traditional weighting schemes and word embeddings. The proposed approach is evaluated on three English datasets: DUC 2002, DUC 2004, and Multilingual 2015 Single-document Summarization. Experimental results on the three datasets show that the proposed model achieves competitive performance compared to the state of the art, leading to the conclusion that it provides a better document representation and, as a result, a better document summary.
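For readers unfamiliar with the weighting schemes named in the abstract, the following is a minimal sketch of how a term-by-sentence LSA input matrix might be built from the traditional augmented local weight and entropy global weight. It does not reproduce the embedding-based modifications proposed in the paper, which the abstract does not specify. The function name `lsa_input_matrix`, the whitespace tokenization, and the exact formulas (local weight 0.5 + 0.5 * tf / max_tf, global weight 1 + sum(p * log p) / log n) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def lsa_input_matrix(sentences):
    """Build a term-by-sentence matrix (hypothetical helper, not the paper's code).

    Each cell is local_weight * global_weight, using the traditional
    augmented weight and entropy weight as assumed in the lead-in.
    """
    # Naive whitespace tokenization; a real system would do more preprocessing.
    counts = [Counter(s.lower().split()) for s in sentences]
    vocab = sorted(set().union(*counts))
    n = len(sentences)

    matrix = []
    for term in vocab:
        tf = [c[term] for c in counts]  # frequency of the term in each sentence
        gf = sum(tf)                    # frequency of the term in the document

        # Entropy global weight: 1 for a term confined to one sentence,
        # 0 for a term spread evenly over all sentences.
        if n > 1:
            entropy = sum((f / gf) * math.log(f / gf) for f in tf if f > 0)
            global_w = 1.0 + entropy / math.log(n)
        else:
            global_w = 1.0

        row = []
        for j, f in enumerate(tf):
            if f == 0:
                row.append(0.0)
                continue
            # Augmented local weight: scaled by the most frequent term
            # in the same sentence.
            max_tf = max(counts[j].values())
            row.append((0.5 + 0.5 * f / max_tf) * global_w)
        matrix.append(row)
    return vocab, matrix


vocab, matrix = lsa_input_matrix(["cat sat", "cat ran", "dog ran fast"])
for term, row in zip(vocab, matrix):
    print(term, [round(x, 3) for x in row])
```

In an LSA summarizer, a matrix of this kind would then be passed to a singular value decomposition to rank sentences; that step is omitted here.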
Cite this article
[IEEE Style]
K. Al-Sabahi, Z. Zuping, Y. Kang, "Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings," KSII Transactions on Internet and Information Systems, vol. 13, no. 1, pp. 254-276, 2019. DOI: 10.3837/tiis.2019.01.015.
[ACM Style]
Kamal Al-Sabahi, Zhang Zuping, and Yang Kang. 2019. Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings. KSII Transactions on Internet and Information Systems, 13, 1, (2019), 254-276. DOI: 10.3837/tiis.2019.01.015.
[BibTeX Style]
@article{tiis:21980, title="Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings", author="Kamal Al-Sabahi and Zhang Zuping and Yang Kang", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2019.01.015}, volume={13}, number={1}, year="2019", month={January}, pages={254-276}}