This study compares the performance of a Support Vector Machine (SVM) and a Long Short-Term Memory (LSTM) network with BERT embeddings for classifying users' digital literacy levels from textual digital footprints. A dataset of 1,500 Indonesian-language texts from platform X was annotated by three experts into low, medium, and high literacy categories. After text preprocessing, TF-IDF features were used as input to the SVM and BERT tokenization was used for the LSTM. Both models were evaluated using 5-fold cross-validation to ensure reliability. The LSTM-BERT model achieved the higher F1-score (73.8%, compared with 70.5% for the SVM), and confusion-matrix analysis indicated better accuracy in detecting high-literacy texts. These findings indicate that contextual linguistic patterns effectively represent digital literacy levels and highlight the potential of deep-learning approaches for scalable, objective, and automated literacy assessment based on text data.
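The evaluation described above (5-fold cross-validation, a confusion matrix over three classes, and an F1-score) can be illustrated with a minimal, dependency-free sketch. The abstract does not state whether folds were stratified or how F1 was averaged across classes, so stratified fold assignment and macro averaging are assumptions here. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from typing import List, Sequence

# Class names follow the three annotation categories named in the abstract.
LABELS = ["low", "medium", "high"]


def stratified_fold_ids(y: Sequence[str], k: int = 5) -> List[int]:
    """Assign each sample a fold id in [0, k), round-robin within each class.

    Assumption: stratification is used so every fold keeps roughly the
    same class balance; the abstract only says "5-fold".
    """
    seen = defaultdict(int)  # how many samples of each class so far
    fold_ids = []
    for label in y:
        fold_ids.append(seen[label] % k)
        seen[label] += 1
    return fold_ids


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str] = LABELS) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_f1(matrix: List[List[int]]) -> List[float]:
    """F1 for each class, computed as 2*TP / (2*TP + FP + FN)."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # column minus diagonal
        fn = sum(matrix[c]) - tp                       # row minus diagonal
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(matrix: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(matrix)
    return sum(scores) / len(scores)


# Toy labels for illustration only; these are not the study's data.
toy_true = ["low", "low", "medium", "medium", "high", "high"]
toy_pred = ["low", "medium", "medium", "medium", "high", "low"]
toy_matrix = confusion_matrix(toy_true, toy_pred)
toy_score = macro_f1(toy_matrix)
```

In a full run, a model would be trained on four folds and scored on the held-out fold, and the five per-fold scores would then be averaged. The per-class values returned by `per_class_f1` are what a confusion-matrix analysis of the high-literacy class would inspect.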
Copyrights © 2026