Bulletin of Applied Mathematics and Mathematics Education
Vol. 4 No. 2 (2024)

Optimization of feature selection on semi-supervised data

Wijayanti, Dian Eka
Afriyani, Sintia
Surono, Sugiyarto
Dewi, Deshinta Arrova



Article Info

Publish Date
15 Oct 2024

Abstract

This research explores feature selection optimization for semi-supervised text data by splitting the data into training and testing sets and applying pseudo-labeling. Three train-test proportions were tested (70:30, 80:20, and 90:10), with TF-IDF weighting and particle swarm optimization (PSO) feature selection. Pseudo-labeling assigned positive, negative, and neutral labels to the training data to enrich the information available to the classification model during the testing phase. The results indicate that the linear SVM model achieved the highest accuracy, 0.9051, at the 90:10 split, followed by Random Forest with an accuracy of 0.9254. RBF SVM and Poly SVM also yielded good results, whereas KNN performed worse. These findings emphasize the importance of feature selection strategies and pseudo-labeling for improving classification models on semi-supervised text data, with potential applications across domains that rely on semi-supervised text analysis.
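To make the TF-IDF and PSO steps named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the TF-IDF formula, the PSO variant, or the fitness function, so everything here is an assumption: a plain tf * log(N / df) weighting, a binary PSO with a sigmoid transfer function, and a nearest-centroid classifier standing in for the SVM, Random Forest, and KNN models. The toy documents, labels, and all function names and parameters are hypothetical.

```python
import math
import random

def tfidf(docs):
    """Return (vocabulary, matrix) using tf * log(N / df); an assumed weighting."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    df = {t: sum(t in doc for doc in tokenized) for t in vocab}
    matrix = [[(doc.count(t) / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
              for doc in tokenized]
    return vocab, matrix

def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the selected columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, lab in zip(X, y):
        dist = {c: sum((x[j] - v) ** 2 for j, v in zip(cols, cen))
                for c, cen in centroids.items()}
        pred = min(sorted(dist), key=dist.get)  # sorted() makes ties deterministic
        correct += pred == lab
    return correct / len(y)

def binary_pso(X, y, n_particles=8, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a feature mask; hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    dim = len(X[0])

    def fitness(mask):
        # Accuracy minus a small penalty that favours fewer selected features.
        return centroid_accuracy(X, y, mask) - 0.01 * sum(mask) / dim

    pos = [[rng.random() < 0.5 for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(dim):
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                # Sigmoid transfer: velocity gives the probability that bit j is 1.
                pos[i][j] = rng.random() < 1 / (1 + math.exp(-vel[i][j]))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Hypothetical toy corpus with the three label types mentioned in the abstract.
docs = [
    "great product love it",
    "love the great service",
    "terrible product hate it",
    "hate the terrible service",
    "the product arrived today",
    "the service opens today",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
vocab, X = tfidf(docs)
mask, score = binary_pso(X, labels)
selected = [term for term, keep in zip(vocab, mask) if keep]
print(selected, round(score, 4))
```

In a fuller experiment the fitness would more plausibly be measured on held-out data for each of the 70:30, 80:20, and 90:10 splits, and the mask returned by the search would be applied before training the final classifier.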

Copyright © 2024






Journal Info

Abbrev

BAMME

Publisher

Subject

Mathematics

Description

BAMME welcomes high-quality manuscripts resulting from research projects in applied mathematics and mathematics education, including, but not limited to, the following topics: analysis and applied analysis, algebra and applied algebra, logic, geometry, differential equations, ...