Fatrisna Salsabila, Reni
Unknown Affiliation

Published : 2 Documents

Comparison of Text Representation for Clustering Student Concept Maps
Fatrisna Salsabila, Reni; Dwi Prasetya, Didik; Widyaningtyas, Triyanna; Hirashima, Tsukasa
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Vol 24 No 2 (2025)
Publisher : LPPM Universitas Bumigora

DOI: 10.30812/matrik.v24i2.4598

Abstract

This research addresses the challenge of selecting a text representation method that effectively captures students’ conceptual understanding for clustering purposes. Traditional methods such as Term Frequency-Inverse Document Frequency (TF-IDF) often fail to capture semantic relationships, which limits their effectiveness in clustering complex datasets. This study compares TF-IDF with the more advanced Bidirectional Encoder Representations from Transformers (BERT) to determine their suitability for clustering student concept maps on two learning topics: Databases and Cyber Security. Two clustering algorithms are applied: K-Means and its improved variant, K-Means++, which enhances centroid initialization for better stability and clustering quality. The datasets consist of concept maps from 27 students for each topic, comprising 1,206 concepts and 616 propositions for Databases and 2,564 concepts and 1,282 propositions for Cyber Security. Clusters are evaluated with two metrics, the Davies-Bouldin Index (DBI) and the Silhouette Score, which assess cluster compactness and separation. The results show that BERT consistently outperforms TF-IDF, producing lower DBI values and higher Silhouette Scores for every number of clusters tested (k = 2 to k = 10). Combining BERT with K-Means++ yields the most compact and well-separated clusters, while TF-IDF produces overlapping, less-defined clusters. The research concludes that BERT is a superior text representation method for clustering, offering significant advantages in capturing semantic context and enabling educators to identify student misconceptions and improve learning strategies.
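The abstract names the pipeline (text vectors, then K-Means or K-Means++, then DBI and Silhouette for k = 2 to 10) but not its implementation. The following is a minimal, dependency-free Python sketch of that pipeline under assumptions that are ours, not the authors': TF-IDF uses lower-cased whitespace tokens and the smoothed IDF log((1 + n) / (1 + df)) + 1 with L2 normalisation (the formula scikit-learn uses by default); "plain" K-Means seeds with k data points drawn uniformly at random; and the BERT branch is omitted because it requires a pretrained model. Sentence embeddings from a BERT encoder could be passed to the hypothetical `sweep` function in place of the TF-IDF vectors. All function names (`tfidf`, `kmeans`, `davies_bouldin`, `silhouette`, `sweep`) are illustrative.

```python
# Illustrative sketch of the TF-IDF / K-Means(++) / DBI / Silhouette pipeline (not the authors' code).
import math
import random
from collections import Counter


def tfidf(docs):
    """Raw term count x smoothed IDF, L2-normalised; tokens are lower-cased whitespace words."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    df = Counter(t for toks in tokenised for t in set(toks))
    idf = {t: math.log((1 + len(docs)) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        v = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def init_kmeanspp(X, k, rng):
    """K-Means++ seeding: each new centroid is drawn with probability proportional to D(x)^2."""
    centroids = [list(rng.choice(X))]
    while len(centroids) < k:
        d2 = [min(math.dist(x, c) for c in centroids) ** 2 for x in X]
        # Fall back to a uniform pick if every point already coincides with a centroid.
        pick = rng.choices(X, weights=d2)[0] if sum(d2) > 0 else rng.choice(X)
        centroids.append(list(pick))
    return centroids


def kmeans(X, k, init="k-means++", max_iter=100, seed=0):
    """Lloyd's algorithm; init is "k-means++" or "random" (k distinct data points)."""
    rng = random.Random(seed)
    centroids = (init_kmeanspp(X, k, rng) if init == "k-means++"
                 else [list(x) for x in rng.sample(X, k)])
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(x, centroids[j])) for x in X]
        for j in range(k):  # update step: each centroid moves to the mean of its members
            members = [x for x, lab in zip(X, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels


def group(X, labels):
    groups = {}  # label -> member points; returned together with each cluster's mean
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    return {c: (m, [sum(col) / len(m) for col in zip(*m)]) for c, m in groups.items()}


def davies_bouldin(X, labels):
    """Average over clusters of the worst (s_i + s_j) / d(c_i, c_j); lower is better."""
    g = group(X, labels)
    s = {c: sum(math.dist(x, cen) for x in m) / len(m) for c, (m, cen) in g.items()}
    worst = [max((s[i] + s[j]) / math.dist(g[i][1], g[j][1]) for j in g if j != i) for i in g]
    return sum(worst) / len(worst)


def silhouette(X, labels):
    """Mean over points of (b - a) / max(a, b); higher is better, range [-1, 1]."""
    g = group(X, labels)
    scores = []
    for i, x in enumerate(X):
        same = [math.dist(x, y) for j, y in enumerate(X) if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: score defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(sum(math.dist(x, y) for y in m) / len(m)
                for c, (m, _) in g.items() if c != labels[i])
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


def sweep(X, ks=range(2, 11), seed=0):
    """DBI and Silhouette for both initialisations at every k (needs len(X) >= max(ks))."""
    results = {}
    for init in ("random", "k-means++"):
        for k in ks:
            labels = kmeans(X, k, init=init, seed=seed)
            if len(set(labels)) > 1:  # both indices need at least two clusters
                results[(init, k)] = (davies_bouldin(X, labels), silhouette(X, labels))
    return results


if __name__ == "__main__":
    # Hypothetical toy propositions, NOT data from the study.
    props = ["database stores table", "table has primary key",
             "firewall blocks network traffic", "malware attacks network"]
    for (init, k), (dbi, sil) in sweep(tfidf(props), ks=range(2, 4)).items():
        print(f"{init:9s} k={k}  DBI={dbi:.3f}  Silhouette={sil:.3f}")
```

Lower DBI and higher Silhouette indicate more compact, better-separated clusters, which is the direction of comparison the abstract reports for BERT versus TF-IDF. The numbers printed for the invented propositions carry no information about the study's results.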