Artificial Intelligence (AI) has revolutionized educational tools by enabling systems that proactively understand and respond to student needs. ChatGPT is a widely used generative model for education in Indonesia. However, it struggles to classify student questions accurately because of ambiguous phrasing, overlapping sentence structures, and difficulty recognizing intent, which limits its effectiveness as a learning assistant. This study compares two approaches to classifying student questions: Convolutional Neural Networks (CNN), which extract locally important features from word sequences, and Support Vector Machines (SVM), which are known for handling high-dimensional data and efficiently finding the optimal hyperplane for text classification. A dataset of 2,797 Indonesian ChatGPT interactions (71% clear vs. 29% unclear) was preprocessed through case folding, stop-word removal, stemming, and tokenisation; synonym-based data augmentation was then applied to the minority class to balance the dataset. The models were tuned through grid or random search, and the best model from each was then tested and compared using 5-fold cross-validation across three data splits (70:30, 80:20, and 90:10). Results showed that CNN achieved a balanced 0.90 in accuracy, precision, recall, and F1-score on the 90:10 split, outperforming SVM, which plateaued at 0.85 accuracy and dropped to 0.76 in F1-score. The CNN's embedded filters generalized across the lexical variation introduced by augmentation, whereas the sparse TF-IDF vectors used by the SVM did not preserve this level of semantic information. These findings indicate that CNN is more adaptive to diverse data and better suited for integration into ChatGPT-based educational tools, particularly in supporting reliable classification and personalised AI feedback in student learning contexts.
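
The synonym-based balancing step described above could be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the `SYNONYMS` table, the `augment` and `balance` helpers, and the sample questions are all hypothetical, and the abstract does not say which synonym resource or replacement policy was actually used.

```python
import random
from collections import Counter

# Hypothetical synonym table, for illustration only (not taken from the study).
SYNONYMS = {
    "jelaskan": ["terangkan", "uraikan"],
    "cara": ["metode", "langkah"],
    "contoh": ["sampel", "ilustrasi"],
}


def augment(text, rng):
    """Replace every word that has a synonym entry with a randomly chosen synonym."""
    return " ".join(
        rng.choice(SYNONYMS[word]) if word in SYNONYMS else word
        for word in text.split()
    )


def balance(samples, seed=0):
    """Add synonym-substituted copies of smaller classes until every class
    has as many samples as the largest one. Original samples are kept as-is."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    balanced = list(samples)
    for label, n in counts.items():
        pool = [text for text, lab in samples if lab == label]
        # Cycle through the class's own texts to generate the missing samples.
        for i in range(target - n):
            balanced.append((augment(pool[i % len(pool)], rng), label))
    return balanced


# Toy data: 5 "clear" questions vs. 2 "unclear" ones (hypothetical).
data = [("apa itu fotosintesis", "clear")] * 5 + [
    ("jelaskan cara itu", "unclear"),
    ("contoh", "unclear"),
]
balanced = balance(data)
print(Counter(label for _, label in balanced))
```

In this sketch only the minority class receives new samples, so the majority class is left unchanged and the class counts end up equal.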
Copyrights © 2025