Garuda - Garba Rujukan Digital

Bulletin of Applied Mathematics and Mathematics Education

Vol. 4 No. 2 (2024)

Wijayanti, Dian Eka
Afriyani, Sintia
Surono, Sugiyarto
Dewi, Deshinta Arrova

Publish Date
15 Oct 2024

This research explores feature selection optimization for semi-supervised text data, combining train/test data splits with pseudo-labeling. Three split proportions (70:30, 80:20, and 90:10) were tested, using TF-IDF weighting and particle swarm optimization (PSO) feature selection. Pseudo-labeling assigned positive, negative, and neutral labels to the training data to enrich the information available to the classification model during the testing phase. The results indicate that the linear SVM model achieved the highest accuracy, 0.9051, with the 90:10 split, followed by Random Forest, which had an accuracy of 0.9254. RBF SVM and Poly SVM also yielded good results, while KNN showed lower performance. These findings emphasize the importance of feature selection strategies and pseudo-labeling for improving classification models on semi-supervised text data, with potential applications across the many domains that rely on semi-supervised text analysis.
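As a rough illustration of two steps named in the abstract, the sketch below computes TF-IDF weights and produces the 70:30, 80:20, and 90:10 splits using only the Python standard library. The toy corpus, the function names `tfidf` and `split`, the fixed seed, and the particular TF-IDF variant (term frequency divided by document length, idf = ln(N/df)) are all assumptions made for illustration; the abstract does not say which variant or tooling the authors used, and PSO feature selection and pseudo-labeling are not shown here.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / doc_length, idf = ln(N / df).
    Other variants (smoothing, normalization) exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def split(items, train_fraction, seed=0):
    """Shuffle a copy of items and cut it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy corpus, not data from the study.
docs = [
    "good service good price".split(),
    "bad service".split(),
    "average price".split(),
    "good product".split(),
    "bad product bad price".split(),
    "average service".split(),
    "good price".split(),
    "bad service slow".split(),
    "average product".split(),
    "good service fast".split(),
]

weights = tfidf(docs)

# Train/test sizes for the three proportions mentioned in the abstract.
split_sizes = {
    fraction: tuple(len(part) for part in split(weights, fraction))
    for fraction in (0.7, 0.8, 0.9)
}
```

With ten documents, `split_sizes` maps each training fraction to a (train, test) size pair, so the effect of each proportion on the amount of held-out data is easy to inspect before any classifier is trained.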


Journal Info

Abbrev

BAMME

Publisher

Universitas Ahmad Dahlan

Subject

Mathematics

Description

BAMME welcomes high-quality manuscripts resulting from research projects in applied mathematics and mathematics education, including, but not limited to, the following topics: analysis and applied analysis, algebra and applied algebra, logic, geometry, differential equations, ...

Optimization of feature selection on semi-supervised data