This study investigates the application of text mining techniques to Automatic Short Answer Grading (ASAG) by comparing five textual similarity methods: Cosine Similarity, Jaccard Similarity, Dice’s Coefficient, Overlap Coefficient, and Matching Coefficient. The dataset consists of five definition-based questions answered by 25 students in a Human–Computer Interaction course. The data were preprocessed using case folding, tokenization, stop word removal, and stemming. Cosine Similarity achieved the highest similarity score (67.00%), followed by Overlap Coefficient (66.67%) and Dice’s Coefficient (63.16%), while Jaccard Similarity and Matching Coefficient both scored lower, at 46.15%. These findings suggest that vector-based similarity handles variations in sentence structure and keyword usage better than most set-based approaches, particularly for definition-based short answers. This study provides a comparative evaluation of multiple lexical similarity methods within a unified experimental setting, offering practical insights for selecting appropriate techniques in ASAG applications.
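To make the comparison concrete, the five measures can be sketched over token lists as follows. This is a minimal illustration using common textbook definitions, not the study's implementation. The token lists are hypothetical and assumed to be already case-folded, tokenized, stop-word-filtered, and stemmed. The cosine measure here uses raw term counts, which is an assumption. The matching coefficient is given as the raw number of shared tokens; the study reports it as a percentage, so it presumably applies a normalization that the abstract does not specify.

```python
import math
from collections import Counter


def cosine(a, b):
    # Vector-based: compare term-count vectors (raw counts are an assumption).
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    # Set-based: shared tokens divided by all distinct tokens.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    # Set-based: twice the shared tokens divided by the sum of set sizes.
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def overlap(a, b):
    # Set-based: shared tokens divided by the size of the smaller set.
    sa, sb = set(a), set(b)
    smaller = min(len(sa), len(sb))
    return len(sa & sb) / smaller if smaller else 0.0


def matching(a, b):
    # Set-based: raw count of shared tokens (unnormalized by assumption).
    return len(set(a) & set(b))


# Hypothetical preprocessed tokens for a reference answer and a student answer.
reference = ["interact", "human", "comput", "design"]
answer = ["human", "comput", "interact", "studi", "use"]

for name, fn in [("cosine", cosine), ("jaccard", jaccard), ("dice", dice),
                 ("overlap", overlap), ("matching", matching)]:
    print(name, fn(reference, answer))
```

Because the overlap coefficient divides by the smaller set, it tends to reward a short answer that is fully contained in the reference, whereas Jaccard penalizes every token the two answers do not share.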
Copyrights © 2026