KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

Similarity Evaluation and Fine-Tuning of Embedding Models from Various Linguistic Perspectives

Vol. 19, No. 9, September 30, 2025
DOI: 10.3837/tiis.2025.09.008

Abstract

Recent advancements in embedding models have brought significant progress in the field of natural language processing (NLP). In particular, embedding models play a critical role in the Retrieval-Augmented Generation (RAG) pipeline by enabling the effective retrieval of similar documents, thus mitigating the hallucination phenomenon in large language models (LLMs). Against this background, the selection of public embedding models and the optimization of their performance through fine-tuning have emerged as pivotal challenges. This study conducts a linguistic evaluation of various publicly available embedding models and addresses performance deficiencies in specific tasks through the application of fine-tuning techniques. The experiments covered five linguistic tasks, and for tasks with poor performance, fine-tuning of the embedding models resulted in an average improvement of 9.4%. The findings of this research are expected to provide valuable insights into the selection and fine-tuning of embedding models for NLP applications and RAG pipelines.
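The retrieval step mentioned in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the ranking measure and uses hand-written toy vectors in place of real model outputs; the function names, document identifiers, and vector values are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional "embeddings" (assumed values; a real model would produce
# vectors with hundreds of dimensions).
query = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [1.0, 0.0, 0.9],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.5, 0.5, 0.5],
}

ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.4f}")
```

In a RAG pipeline the top-ranked documents would be passed to the LLM as context. Under this view, fine-tuning an embedding model amounts to adjusting the vectors so that relevant query-document pairs receive higher scores.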


Cite this article

[IEEE Style]
H. Kang, "Similarity Evaluation and Fine-Tuning of Embedding Models from Various Linguistic Perspectives," KSII Transactions on Internet and Information Systems, vol. 19, no. 9, pp. 2963-2983, 2025. DOI: 10.3837/tiis.2025.09.008.

[ACM Style]
Hanhoon Kang. 2025. Similarity Evaluation and Fine-Tuning of Embedding Models from Various Linguistic Perspectives. KSII Transactions on Internet and Information Systems, 19, 9, (2025), 2963-2983. DOI: 10.3837/tiis.2025.09.008.

[BibTeX Style]
@article{tiis:103310, title="Similarity Evaluation and Fine-Tuning of Embedding Models from Various Linguistic Perspectives", author="Hanhoon Kang", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2025.09.008}, volume={19}, number={9}, year="2025", month={September}, pages={2963-2983}}