Nzekon Nzeko'o, Armel Jacques
Unknown Affiliation

Class-Oriented Text Vectorization for Text Classification: Case Study of Job Offer Classification
Wabo Tatchum, Ghislain; Nzekon Nzeko'o, Armel Jacques; Sosso Makembe, Fritz; Youh Djam, Xaviera
Journal of Computer Science and Engineering (JCSE) Vol 5, No 2: August (2024)
Publisher : ICSE (Institute of Computer Sciences and Engineering)

Abstract

Advances in data science have made it possible to solve many real-life problems with automatic text classification. One example is e-recruitment, where job offers are classified and recommended to jobseekers. In natural language processing, text classification involves a vectorization step in which each document is represented as a vector whose coordinates each correspond to a keyword. These keywords are usually obtained by vectorizing the entire corpus, so they serve to distinguish one document from another. However, it is preferable for each keyword to distinguish one class from another. To obtain such keywords, we take the class of each document into account during vectorization: we first build one class-oriented document per class by merging all documents of that class, and then apply a vectorization algorithm to these merged documents. Experiments are carried out on datasets from Minajobs, Nigham, and Monster with four classification models: Decision Tree (DT), Naive Bayes (NB), Support Vector Machine (SVM), and a deep neural network self-attention transformer (TFM). The vectorization methods applied to the class-oriented documents are Doc2Vec and TF-IDF combined with our class-oriented vectorization strategies, including OC, ZIPF, and OWDC. We evaluate the experiments with the precision, MAP, and F1-score metrics. According to the results, compared to previous work and the traditional way of classifying text documents, the TFM methods can improve accuracy by 29, 40, and 33%; the NB methods by 19, 22, and 20%; the DT methods by 34, 37, and 34%; and the SVM methods by 33, 34, and 34% on the Monster, Nigham, and Minajobs datasets. In addition, we validate our contribution by comparing it with three other works from the literature on four datasets (RE'16, Wap, WebKB, and Kla), obtaining improvements in accuracy and F1-score of up to 55%.
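
The core idea in the abstract (merge the documents of each class into one class-oriented document, then vectorize those merged documents) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a plain TF-IDF written with the Python standard library rather than the OC, ZIPF, or OWDC strategies, and the toy job offers, the labels, the function names, and the `top_k` parameter are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_class_documents(docs, labels):
    """Merge all documents sharing a label into one token list per class."""
    merged = defaultdict(list)
    for text, label in zip(docs, labels):
        merged[label].extend(text.lower().split())
    return dict(merged)


def class_tfidf(class_docs):
    """Plain TF-IDF in which each 'document' is a merged class document."""
    n_classes = len(class_docs)
    # Number of class documents in which each term occurs.
    df = Counter(term for tokens in class_docs.values() for term in set(tokens))
    scores = {}
    for label, tokens in class_docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        scores[label] = {
            term: (count / total) * math.log(n_classes / df[term])
            for term, count in tf.items()
        }
    return scores


def select_keywords(scores, top_k=2):
    """Keep the top_k highest-scoring terms of each class as the vocabulary."""
    vocab = []
    for label in sorted(scores):
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores[label].items(), key=lambda kv: (-kv[1], kv[0]))
        for term, score in ranked[:top_k]:
            if score > 0 and term not in vocab:
                vocab.append(term)
    return vocab


def vectorize(text, vocab):
    """Represent one document as keyword counts over the class-driven vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]


# Hypothetical toy corpus of job offers with two classes.
docs = [
    "python developer needed for backend services",
    "senior java developer for backend team",
    "registered nurse needed for hospital night shift",
    "hospital seeks nurse for pediatric care",
]
labels = ["it", "it", "health", "health"]

class_docs = build_class_documents(docs, labels)
scores = class_tfidf(class_docs)
vocab = select_keywords(scores, top_k=2)
vectors = [vectorize(d, vocab) for d in docs]

print(vocab)
print(vectors)
```

In this sketch, terms that occur in every class document (such as "for" and "needed") receive an inverse document frequency of zero and are never selected, so the remaining keywords separate classes rather than individual documents. The resulting vectors could then be passed to any classifier.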