KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features


Abstract

Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. Previous NER systems identify named entities using statistical methods based on prior information or linguistic features; however, such methods cannot recognize unregistered or unlearned objects. This paper proposes a method that extracts objects, such as technologies, theories, or person names, by analyzing the collocation relationships among words that co-occur around specific words in the abstracts of academic journals. The method proceeds as follows. First, the data are preprocessed using data cleaning and sentence detection to separate the text into single sentences. Then, part-of-speech (POS) tagging is applied to each sentence. Next, the appearance and collocation information of the remaining POS tags is analyzed, excluding the entity candidates themselves, such as nouns. Finally, an entity recognition model is created by analyzing and classifying the information in the sentences.
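
The first three steps described in the abstract (sentence detection, POS tagging, and collecting the POS tags that appear around noun candidates) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the regex sentence splitter, the tiny `TOY_LEXICON` tagger, the window size of 2, and all function names are assumptions introduced here, and a real system would use a trained POS tagger and feed the resulting features to a classifier.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a trained POS tagger (assumption).
# Words not listed here are treated as nouns, i.e. as entity candidates.
TOY_LEXICON = {
    "we": "PRP", "propose": "VBP", "a": "DT", "the": "DT",
    "based": "VBN", "on": "IN", "is": "VBZ", "applied": "VBN", "to": "TO",
}


def split_sentences(text):
    """Naive sentence detection: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag(sentence):
    """Assign a POS tag to each alphabetic token using the toy lexicon."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return [(tok, TOY_LEXICON.get(tok.lower(), "NN")) for tok in tokens]


def context_features(tagged, window=2):
    """For each noun candidate, count the non-noun POS tags within the window.

    Tags to the left are prefixed with 'L:' and tags to the right with 'R:'.
    Noun tags in the window are skipped, since nouns are the candidates.
    """
    feats = {}
    for i, (word, pos) in enumerate(tagged):
        if not pos.startswith("NN"):
            continue
        left = [p for _, p in tagged[max(0, i - window):i] if not p.startswith("NN")]
        right = [p for _, p in tagged[i + 1:i + 1 + window] if not p.startswith("NN")]
        counter = feats.setdefault(word, Counter())
        counter.update("L:" + p for p in left)
        counter.update("R:" + p for p in right)
    return feats


def collect(text, window=2):
    """Run the sketch pipeline on a text and merge the counts over all sentences."""
    merged = {}
    for sentence in split_sentences(text):
        for word, counts in context_features(tag(sentence), window).items():
            merged.setdefault(word, Counter()).update(counts)
    return merged


sample = "We propose a method based on SVM. The method is applied to abstracts."
features = collect(sample)
```

Here `features` maps each noun candidate to the counts of POS tags seen around it. Such context profiles are one plausible input for the final classification step, under the assumptions stated above.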



Cite this article

[IEEE Style]
Sangwon Hwang, Jang-Eui Hong and Young-Kwang Nam, "Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features," KSII Transactions on Internet and Information Systems, vol. 13, no. 3, pp. 1639-1658, 2019. DOI: 10.3837/tiis.2019.03.030

[ACM Style]
Hwang, S., Hong, J., and Nam, Y. 2019. Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features. KSII Transactions on Internet and Information Systems, 13, 3, (2019), 1639-1658. DOI: 10.3837/tiis.2019.03.030