A Systematic Literature Review on Application of Text Embedding Techniques in NLP

dc.contributor.author: Muchori, Juliet Gathoni
dc.contributor.author: Maina, Mwangi Peter
dc.date.accessioned: 2026-02-10T08:32:40Z
dc.date.issued: 2025-12
dc.description.abstract: In many natural language processing (NLP) tasks, such as sentiment analysis, text classification, information retrieval, and topic modeling, text embedding or representation techniques are foundational. These methods convert raw text into numerical form so that machine learning or deep learning algorithms can work with it. Among the most widely used are TF-IDF, a sparse, frequency-based representation; static embeddings such as GloVe, which capture global word co-occurrence statistics but are non-contextual; and contextual transformer embeddings such as BERT, RoBERTa, and DistilBERT, which encode words in context and can therefore model polysemy, syntax, and semantics more dynamically. This review systematically examines and compares models and studies from 2020-2025 that use these embedding/representation techniques, individually or in hybrid form, in classification-type tasks. Its specific objectives are: to review classification models using TF-IDF and assess their strengths and limitations; to examine classification models that combine TF-IDF with GloVe or other static embeddings, detailing how they improve semantic modeling but may suffer from interpretability or domain-mismatch issues; and to assess work with contextual embeddings (BERT, RoBERTa, DistilBERT), including their performance gains, computational and resource costs, and potential limitations. The review follows Barbara Kitchenham's guidelines, and the journal papers reviewed are drawn from scholarly sources such as IEEE, Elsevier, Springer, the ACM Digital Library, the CiteSeer library, arXiv, and Wiley. It was noted that TF-IDF fails to capture the semantic meaning of words, a limitation that can be addressed with embeddings such as Word2Vec, GloVe, or BERT. A promising direction for future work is to combine TF-IDF with semantic embedding methods, such as word embeddings or contextual embeddings like BERT, to capture both the lexical and contextual features of text; a minimal sketch of this hybrid idea appears after the record below.
dc.identifier.issn: 2320-7639
dc.identifier.other: https://doi.org/10.26438/ijsrcse.v13i6.791
dc.identifier.uri: https://www.researchgate.net/profile/Juliet-Muchori/publication/399523546_A_Systematic_Literature_Review_on_Application_of_Text_Embedding_Techniques_in_NLP/links/695df76aa1fd01798911965a/A-Systematic-Literature-Review-on-Application-of-Text-Embedding-Techniques-in-NLP.pdf
dc.identifier.uri: https://repository.mnu.ac.ke/handle/123456789/94
dc.language.iso: en_US
dc.publisher: International Journal of Scientific Research in Computer Science and Engineering
dc.subject: TF-IDF
dc.subject: GloVe
dc.subject: BERT
dc.subject: RoBERTa
dc.subject: DistilBERT
dc.subject: Text embedding
dc.subject: Machine learning
dc.subject: NLP
dc.subject: Word2Vec
dc.title: A Systematic Literature Review on Application of Text Embedding Techniques in NLP
dc.type: Article
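
The hybrid direction named in the abstract (TF-IDF combined with semantic embeddings) can be illustrated with a short sketch. The Python snippet below is a minimal, hypothetical illustration, not code from the reviewed paper: the tiny corpus, the sentiment labels, and the random toy_vectors table standing in for pretrained GloVe/Word2Vec vectors are all assumptions for demonstration; only the scikit-learn, SciPy, and NumPy calls are real.

# Minimal sketch of the hybrid TF-IDF + embedding idea from the abstract.
# Hypothetical throughout: the corpus, labels, and the random `toy_vectors`
# table that stands in for pretrained GloVe/Word2Vec vectors.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

corpus = ["the film was great", "the film was awful",
          "great acting and plot", "awful acting"]
labels = [1, 0, 1, 0]  # toy sentiment labels

# Sparse lexical block: TF-IDF captures term frequency, not meaning.
X_tfidf = TfidfVectorizer().fit_transform(corpus)

# Dense semantic block: mean of per-word vectors. A real system would load
# pretrained GloVe/Word2Vec vectors; random 4-d vectors stand in here.
rng = np.random.default_rng(0)
toy_vectors = {w: rng.normal(size=4)
               for doc in corpus for w in doc.split()}

def embed(doc):
    vecs = [toy_vectors[w] for w in doc.split() if w in toy_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(4)

X_dense = csr_matrix(np.vstack([embed(d) for d in corpus]))

# Concatenate the lexical and semantic blocks, then classify.
X_hybrid = hstack([X_tfidf, X_dense])
clf = LogisticRegression().fit(X_hybrid, labels)
print(clf.predict(X_hybrid))  # fits the toy data; illustrative only

Concatenation keeps the two feature spaces intact, so the classifier can weight lexical (TF-IDF) and semantic (embedding) evidence independently; swapping the toy lookup for real GloVe vectors or mean-pooled BERT outputs follows the same pattern.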

Files

Original bundle

Name: Abstract.pdf
Size: 108.53 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission