Enggar Novianto
Universitas Sebelas Maret

Published: 2 Documents

Journal : Indonesian Applied Research Computing and Informatics

Improving Thesis Title Classification Accuracy Using Ensemble Classifier and Modified Chi-Square Feature Selection Method
Ritzkal; Wahyu Tisno Atmojo; Panji Novantara; Sabir Rosidin; Ahmad Dedi Jubaedi; Enggar Novianto
Indonesian Applied Research Computing and Informatics Vol. 1 No. 1: July (2025)
Publisher : PT. Teras Digital Nusantara

Abstract

Text classification of academic documents, particularly thesis titles, poses challenges due to high dimensionality, sparsity, and topic heterogeneity. Conventional feature selection techniques, such as the standard Chi-Square, often fall short in capturing discriminative features effectively. This research aims to enhance classification accuracy by proposing a Modified Chi-Square feature selection method that integrates term frequency and class distribution information. The selected features are then classified using ensemble decision tree algorithms, including Random Forest, Gradient Boosting, and XGBoost. Experiments were conducted on a labeled dataset of thesis titles using TF-IDF for vector representation. Evaluation metrics such as accuracy, precision, recall, F1-score, and AUC were used to assess model performance. The results showed that the combination of Modified Chi-Square and XGBoost outperformed other models, achieving the highest accuracy of 93.8% and an AUC of 0.94. These findings demonstrate that the integration of advanced feature selection and ensemble learning techniques can significantly improve academic text classification performance, providing valuable implications for the development of intelligent digital repositories and recommendation systems.
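
For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of standard Chi-Square term scoring on a toy set of titles. It is not the authors' Modified Chi-Square, whose formula the abstract does not give; the function names `chi_square` and `top_k_terms`, the example titles, and the class labels are all hypothetical and serve only as illustration.

```python
def chi_square(term, cls, docs):
    """Standard chi-square score of `term` with respect to class `cls`.

    `docs` is a list of (set_of_tokens, label) pairs (an assumed toy format).
    """
    n = len(docs)
    a = sum(1 for toks, y in docs if term in toks and y == cls)      # term present, in class
    b = sum(1 for toks, y in docs if term in toks and y != cls)      # term present, other class
    c = sum(1 for toks, y in docs if term not in toks and y == cls)  # term absent, in class
    d = n - a - b - c                                                # term absent, other class
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Term occurs in every document or in none: it carries no class signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_k_terms(docs, k):
    """Keep the k terms with the highest chi-square score over any class."""
    vocab = set().union(*(toks for toks, _ in docs))
    classes = {y for _, y in docs}
    scores = {t: max(chi_square(t, c, docs) for c in classes) for t in vocab}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical thesis titles with made-up labels, for illustration only.
titles = [
    ("image classification using convolutional network", "AI"),
    ("text classification using ensemble classifier", "AI"),
    ("network security audit using firewall", "NET"),
    ("wireless network design using routing", "NET"),
]
docs = [(set(title.split()), label) for title, label in titles]

print(top_k_terms(docs, 3))
```

In this toy data, "classification" occurs in both "AI" titles and in no "NET" title, so it receives the highest score, while "using" occurs in every title and scores zero. In the pipeline the abstract describes, the selected terms would then be represented with TF-IDF and passed to an ensemble classifier such as Random Forest, Gradient Boosting, or XGBoost.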