Vol. 14, No. 4, April 30, 2020
DOI: 10.3837/tiis.2020.04.001
Abstract
In commercial promotion, chatbots are one of the most significant applications of natural language processing (NLP). Conventional designs use the bag-of-words (BOW) model alone, based on Google databases and other online corpora. This has two drawbacks. First, in the bag-of-words model the word vectors are unrelated to one another. Although this representation suits discrete features, the encoded vectors lose the connections between words, which makes it hard for a machine to understand continuous statements. Second, existing methods are tested on state-of-the-art online corpora and are hard to apply in real applications such as telemarketing. In this paper, we propose an improved chatbot design method that uses a hybrid of the bag-of-words model and the skip-gram model and is based on real telemarketing data. Specifically, we first collect real data in the telemarketing field and perform data cleaning and classification on the constructed corpus. Second, words are represented with the hybrid bag-of-words and skip-gram model. The skip-gram model maps synonyms to nearby points in the vector space, so the correlation between words is expressed and the word vectors carry more information, making up for the shortcomings of using the bag-of-words model alone. Third, we use term frequency-inverse document frequency (TF-IDF) weighting to increase the weight of keywords, and then output the final word representation. Finally, the answer is produced by a hybrid of a retrieval model and a generative model. The retrieval model accurately answers in-domain questions. The generative model supplements it by answering open-domain questions, with the final reply produced by long short-term memory (LSTM) training and prediction. Experimental results show that the hybrid word vector expression model improves the accuracy of responses and that the whole system can communicate with humans.
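The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy corpus, the hand-set two-dimensional vectors standing in for trained skip-gram embeddings, the smoothed IDF formula, the choice to concatenate the two representations, and the 0.5 similarity threshold are all assumptions made for illustration. The generative (LSTM) model is not implemented; a return value of `None` only marks where it would take over.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for cleaned telemarketing data.
QUESTIONS = [
    "what is the price of the plan",
    "how do i cancel the plan",
    "what is the monthly fee",
]
ANSWERS = [
    "The plan costs 10 dollars per month.",
    "You can cancel the plan by calling customer service.",
    "The monthly fee is 10 dollars.",
]

# Placeholder dense vectors (assumption): a real system would train skip-gram
# embeddings. "price" and "fee" are deliberately placed close together.
EMBEDDINGS = {
    "price": [0.9, 0.1],
    "fee": [0.8, 0.2],
    "monthly": [0.7, 0.3],
    "plan": [0.5, 0.5],
    "cancel": [0.1, 0.9],
}
DIM = 2


def tokenize(text):
    return text.lower().split()


DOCS = [tokenize(q) for q in QUESTIONS]
VOCAB = sorted({w for doc in DOCS for w in doc})


def idf(word):
    # Smoothed inverse document frequency over the toy corpus.
    df = sum(1 for doc in DOCS if word in doc)
    return math.log((1 + len(DOCS)) / (1 + df)) + 1.0


def tfidf_weights(tokens):
    counts = Counter(tokens)
    return {w: (c / len(tokens)) * idf(w) for w, c in counts.items()}


def hybrid_vector(text):
    """Concatenate a TF-IDF bag-of-words vector with a TF-IDF-weighted
    average of the dense word vectors (concatenation is an assumption)."""
    tokens = tokenize(text)
    if not tokens:
        return [0.0] * (len(VOCAB) + DIM)
    weights = tfidf_weights(tokens)
    bow = [weights.get(w, 0.0) for w in VOCAB]
    dense = [0.0] * DIM
    total = 0.0
    for word, weight in weights.items():
        if word in EMBEDDINGS:
            total += weight
            for i, x in enumerate(EMBEDDINGS[word]):
                dense[i] += weight * x
    if total > 0:
        dense = [x / total for x in dense]
    return bow + dense


def cosine(u, v):
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def reply(query, threshold=0.5):
    """Return the answer of the most similar stored question, or None when
    no question is similar enough (where a generative model would step in)."""
    query_vec = hybrid_vector(query)
    scores = [cosine(query_vec, hybrid_vector(q)) for q in QUESTIONS]
    best = max(range(len(QUESTIONS)), key=scores.__getitem__)
    if scores[best] < threshold:
        return None
    return ANSWERS[best]
```

In this sketch, `reply("what is the price of the plan")` returns the first stored answer, while `reply("tell me a joke")` returns `None` because none of its words appear in the vocabulary or in the placeholder embeddings. The dense part is what lets "fee" and "price" score as similar even though their bag-of-words entries do not overlap.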
Cite this article
[IEEE Style]
J. Zhang, J. Zhang, S. Ma, J. Yang, G. Gui, "Chatbot Design Method Using Hybrid Word Vector Expression Model Based on Real Telemarketing Data," KSII Transactions on Internet and Information Systems, vol. 14, no. 4, pp. 1400-1418, 2020. DOI: 10.3837/tiis.2020.04.001.
[ACM Style]
Jie Zhang, Jianing Zhang, Shuhao Ma, Jie Yang, and Guan Gui. 2020. Chatbot Design Method Using Hybrid Word Vector Expression Model Based on Real Telemarketing Data. KSII Transactions on Internet and Information Systems, 14, 4, (2020), 1400-1418. DOI: 10.3837/tiis.2020.04.001.
[BibTeX Style]
@article{tiis:23421, title="Chatbot Design Method Using Hybrid Word Vector Expression Model Based on Real Telemarketing Data", author="Jie Zhang and Jianing Zhang and Shuhao Ma and Jie Yang and Guan Gui", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2020.04.001}, volume={14}, number={4}, year="2020", month={April}, pages={1400-1418}}