Optimization of feature selection on semi-supervised data
Wijayanti, Dian Eka; Afriyani, Sintia; Surono, Sugiyarto; Dewi, Deshinta Arrova
Bulletin of Applied Mathematics and Mathematics Education Vol. 4 No. 2 (2024)
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/bamme.v4i1.11104

Abstract

This research explores feature selection optimization in semi-supervised text data by splitting the data into training and testing sets and applying pseudo-labeling. Three train/test splits (70:30, 80:20, and 90:10) were tested, using TF-IDF weighting and PSO feature selection. Pseudo-labeling was applied by assigning positive, negative, and neutral labels to the training data to enrich the information available to the classification model during the testing phase. The results indicate that the linear SVM model achieved the highest accuracy, 0.9051, at the 90:10 split, followed by Random Forest with an accuracy of 0.9254. Although RBF SVM and Poly SVM yielded good results, KNN showed lower performance. These findings emphasize the importance of feature selection strategies and of pseudo-labeling for enhancing the performance of classification models on semi-supervised text data, with potential applications across domains that rely on semi-supervised text analysis.
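The abstract does not say how the pseudo-labels were produced. As a minimal sketch of one common variant (self-training), the example below fits a model on labeled points, labels the unlabeled points it is confident about, and refits on the enlarged set. The nearest-centroid classifier, the `min_margin` threshold, and all function names are illustrative assumptions, not the authors' method.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(X, y):
    """Nearest-centroid 'model': one centroid per label (assumed classifier)."""
    by_label = {}
    for p, label in zip(X, y):
        by_label.setdefault(label, []).append(p)
    return {label: centroid(pts) for label, pts in by_label.items()}


def predict_with_margin(model, p):
    """Return the closest label and the gap to the runner-up centroid."""
    ranked = sorted((dist2(p, c), label) for label, c in model.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def pseudo_label(X_lab, y_lab, X_unlab, min_margin):
    """Add confidently predicted unlabeled points to the training set, then refit."""
    model = fit(X_lab, y_lab)
    X_aug, y_aug = list(X_lab), list(y_lab)
    for p in X_unlab:
        label, margin = predict_with_margin(model, p)
        if margin >= min_margin:  # keep only confident pseudo-labels
            X_aug.append(p)
            y_aug.append(label)
    return fit(X_aug, y_aug), y_aug


# Toy 2-D data standing in for document vectors (hypothetical values).
X_lab = [[0.0, 0.0], [10.0, 10.0]]
y_lab = ["negative", "positive"]
X_unlab = [[1.0, 1.0], [9.0, 9.0], [5.0, 5.0]]
model, y_aug = pseudo_label(X_lab, y_lab, X_unlab, min_margin=50.0)
```

In this toy run the point halfway between the two centroids is left unlabeled, which is the purpose of a confidence threshold: ambiguous examples are kept out of the training set.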
Chi-Square Feature Selection with Pseudo-Labelling in Natural Language Processing
Afriyani, Sintia; Surono, Sugiyarto; Solihin, Iwan Mahmud
JTAM (Jurnal Teori dan Aplikasi Matematika) Vol 8, No 3 (2024): July
Publisher : Universitas Muhammadiyah Mataram

DOI: 10.31764/jtam.v8i3.22751

Abstract

This study evaluates how effectively the Chi-Square feature selection method improves the classification accuracy of linear Support Vector Machine, K-Nearest Neighbors, and Random Forest classifiers in natural language processing (NLP), and introduces Pseudo-Labelling to improve semi-supervised classification performance. This research is important in the context of NLP because accurate feature selection can significantly improve model performance by reducing data noise and focusing on the most relevant information, while Pseudo-Labelling helps make the most of unlabelled data, which is particularly useful when labelled data is sparse. The methodology involves collecting relevant datasets, then applying the Chi-Square method to select significant features, and applying Pseudo-Labelling to train semi-supervised models. The dataset consists of public comments related to the 2024 Presidential General Election, obtained by scraping Twitter. It includes various comments and opinions from the public about the presidential candidates, including political views, support, and criticism. The experimental results show a significant improvement in classification accuracy to 0.9200, with precision of 0.8893, recall of 0.9200, and F1-score of 0.8828. Integrating Pseudo-Labelling markedly improves semi-supervised classification performance, suggesting that the combination of Chi-Square and Pseudo-Labelling can improve classification systems in various NLP applications. This opens up opportunities to develop more efficient methodologies for improving classification accuracy and effectiveness in natural language processing tasks, especially with linear Support Vector Machine, K-Nearest Neighbors, and Random Forest classifiers as well as semi-supervised learning.
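To illustrate what Chi-Square feature selection does, the sketch below scores each term with the textbook 2x2 chi-square statistic (term present/absent against class/not class) and keeps the top-scoring terms. The function names, the toy documents, and the choice to take the maximum score over classes are assumptions for illustration; the abstract does not describe the authors' implementation, and library versions of the statistic may be formulated differently.

```python
def chi_square(docs, labels, term, target):
    """2x2 chi-square statistic for one term against one class."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term or class is constant: no association measurable
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(docs, labels, k):
    """Rank terms by their best chi-square score over all classes; keep k."""
    vocab = sorted({t for doc in docs for t in doc})
    classes = sorted(set(labels))
    scores = {
        t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Hypothetical tokenized comments with sentiment labels.
docs = [
    {"great", "candidate"},
    {"great", "speech"},
    {"bad", "candidate"},
    {"bad", "speech"},
]
labels = ["positive", "positive", "negative", "negative"]
selected = select_top_k(docs, labels, k=2)
```

Terms that appear equally often in every class, such as "candidate" here, score zero and are dropped, while terms concentrated in one class are retained.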
A GENETIC ALGORITHM–PARTICLE SWARM OPTIMIZATION OPTIMIZED DOFCM APPROACH TO ENHANCE CLUSTERING AND OUTLIER DETECTION
Afriyani, Sintia; Fajriyah, Rohmatul
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 20 No 2 (2026): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol20iss2pp1453-1472

Abstract

In the era of Industry 4.0, Big Data from the IoT demands advanced analysis techniques. Outlier detection is vital as anomalies may indicate sensor failures, fraud, or abnormal medical records. Fuzzy clustering methods such as DOFCM are often applied, yet their performance depends on accurate cluster center placement, which remains challenging. While several Fuzzy C-Means extensions address outlier sensitivity, most rely on single optimization strategies. The integration of PSO and GA into DOFCM has been rarely explored, making this study novel in evaluating how different evolutionary algorithms enhance clustering robustness and anomaly detection. This research introduces DOFCM-PSO and DOFCM-GA, tested on five benchmark datasets with outliers: Iris, Wine, Sonar, Diabetes, and Ionosphere. The Silhouette Coefficient (SC) was used as the evaluation metric. Results show that GA consistently outperforms PSO, with SC values improving by approximately 0.02–0.03 (equivalent to an increase of 8–12%) across datasets. For instance, the Iris dataset improved from 0.6029 (PSO) to 0.6291 (GA), while the Wine dataset increased from 0.2759 to 0.2958. In addition, evaluation of computational time and outlier detection further supports these findings. Although GA required slightly longer runtime than PSO, it substantially reduced the number of outliers while still achieving higher SC values. A similar pattern was observed in the Diabetes dataset, where GA decreased outliers from 20 to 7 with a modest SC improvement. These results indicate that PSO is more efficient in runtime, but GA provides more robust clustering by minimizing anomalies and producing better separation quality. Despite promising results, this study is limited by the relatively small dataset sizes and sensitivity to parameter settings, which may influence outcomes. Future work should apply the method to larger datasets and include additional clustering indices. 
Overall, DOFCM-GA can be considered a robust approach for fuzzy clustering in the presence of anomalies.
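For readers unfamiliar with the evaluation metric, here is a minimal sketch of the standard Silhouette Coefficient for a hard clustering, using Euclidean distance. It is not the authors' code: the function name and toy points are assumptions, and a fuzzy method such as DOFCM would first need each point assigned to a single cluster.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points; needs at least two clusters."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            ds = [
                math.dist(p, q)
                for j, q in enumerate(points)
                if labels[j] == c and j != i
            ]
            mean_dist[c] = sum(ds) / len(ds) if ds else None
        a = mean_dist[labels[i]]  # mean distance within own cluster
        if a is None:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # Mean distance to the nearest other cluster.
        b = min(v for c, v in mean_dist.items() if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated pairs of points (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_coefficient(points, [0, 0, 1, 1])
bad = silhouette_coefficient(points, [0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters, while negative values, as with the deliberately wrong grouping above, indicate points that sit closer to another cluster than to their own.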