A Systematic Literature Review on Application of Text Embedding Techniques in NLP
| dc.contributor.author | Muchori, Juliet Gathoni | |
| dc.contributor.author | Maina, Mwangi Peter | |
| dc.date.accessioned | 2026-02-10T08:32:40Z | |
| dc.date.issued | 2025-12 | |
| dc.description.abstract | In many natural language processing (NLP) tasks such as sentiment analysis, text classification, information retrieval, and topic modeling, text embedding or representation techniques are foundational. These methods convert raw text into numerical form so that machine learning or deep learning algorithms can work with it. Among the most widely used are TF-IDF, a sparse, frequency-based representation; static embeddings such as GloVe, which capture global word co-occurrence statistics but are non-contextual; and contextual transformer embeddings such as BERT, RoBERTa, and DistilBERT, which encode words in context, enabling more dynamic modeling of polysemy, syntax, and semantics. This review systematically examines and compares models and studies from 2020-2025 that use these embedding/representation techniques, individually or in hybrid form, in classification-type tasks. Specifically, its objectives are: to review classification models using TF-IDF and assess their strengths and weaknesses; to examine classification models that combine TF-IDF with GloVe or other static embeddings, detailing how they improve semantic modeling but may suffer from interpretability or domain-mismatch issues; and to assess work with contextual embeddings (BERT, RoBERTa, DistilBERT), including their performance gains, computational and resource costs, and potential. The review follows Barbara Kitchenham's guidelines, and the papers reviewed are drawn from scholarly sources such as IEEE, Elsevier, Springer, the ACM Digital Library, the CiteSeer library, arXiv, and Wiley. It was noted that TF-IDF fails to capture the semantic meaning of words, a limitation that can be addressed by embeddings such as GloVe, Word2Vec, or BERT. One promising direction for future work is to combine TF-IDF with semantic embedding methods, such as word embeddings or contextual embeddings like BERT, to capture both the lexical and contextual features of text. | |
| dc.identifier.issn | 2320-7639 | |
| dc.identifier.other | https://doi.org/10.26438/ijsrcse.v13i6.791 | |
| dc.identifier.uri | https://www.researchgate.net/profile/Juliet-Muchori/publication/399523546_A_Systematic_Literature_Review_on_Application_of_Text_Embedding_Techniques_in_NLP/links/695df76aa1fd01798911965a/A-Systematic-Literature-Review-on-Application-of-Text-Embedding-Techniques-in-NLP.pdf | |
| dc.identifier.uri | https://repository.mnu.ac.ke/handle/123456789/94 | |
| dc.language.iso | en_US | |
| dc.publisher | International Journal of Scientific Research in Computer Science and Engineering | |
| dc.subject | TF-IDF | |
| dc.subject | GloVe | |
| dc.subject | BERT | |
| dc.subject | RoBERTa | |
| dc.subject | DistilBERT | |
| dc.subject | text embedding | |
| dc.subject | Machine learning | |
| dc.subject | NLP | |
| dc.subject | Word2Vec | |
| dc.title | A Systematic Literature Review on Application of Text Embedding Techniques in NLP | |
| dc.type | Article |
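
The abstract's proposed future direction, combining TF-IDF with semantic embeddings, can be pictured as simple feature concatenation. The following is a minimal sketch, not taken from the paper: it uses scikit-learn's TfidfVectorizer for the lexical features, and a random lookup table as a placeholder for pretrained GloVe/Word2Vec vectors (the toy `docs`, `labels`, and 50-dimensional `embedding` table are illustrative assumptions).

```python
# Hedged sketch of a hybrid lexical + semantic text representation:
# sparse TF-IDF features concatenated with mean word-embedding features.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["the film was wonderful", "the film was terrible",
        "a wonderful experience", "a terrible experience"]
labels = [1, 0, 1, 0]

# Lexical features: TF-IDF over the corpus vocabulary.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs).toarray()

# Semantic features: mean of per-word vectors. The random vectors below
# are placeholders; in practice they would be loaded from GloVe/Word2Vec.
rng = np.random.default_rng(0)
vocab = {w for d in docs for w in d.split()}
embedding = {w: rng.normal(size=50) for w in vocab}  # stand-in for GloVe

def doc_vector(doc):
    vecs = [embedding[w] for w in doc.split() if w in embedding]
    return np.mean(vecs, axis=0) if vecs else np.zeros(50)

X_dense = np.vstack([doc_vector(d) for d in docs])

# Hybrid representation: lexical and semantic features side by side.
X_hybrid = np.hstack([X_tfidf, X_dense])
clf = LogisticRegression().fit(X_hybrid, labels)
print(clf.predict(X_hybrid))
```

Concatenation is only one way to fuse the two views; the reviewed studies may instead feed embeddings and TF-IDF into separate model branches or weight embedding averages by TF-IDF scores.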
