Rizian, Rizailo Akfa
Unknown Affiliation

Enhancing Classification of Self-Reported Monkeypox Symptoms on Social Media Using Term Frequency-Inverse Document Frequency Features and Graph Attention Networks
Rizian, Rizailo Akfa; Budiman, Irwan; Faisal, Mohammad Reza; Kartini, Dwi; Indriani, Fatma; Ahmad, Umar Ali
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 6 (2025): JUTIF Volume 6, Number 6, Desember 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.6.5482

Abstract

Early detection of infectious diseases plays a crucial role in minimizing their spread and enabling timely intervention. In the digital era, social media has emerged as a valuable source of real-time health information, where individuals often share self-reported symptoms that can serve as early warning signals for disease outbreaks. However, textual data from social media is typically unstructured, noisy, and contextually diverse, posing challenges for conventional text classification methods. This study proposes a hybrid model combining Term Frequency–Inverse Document Frequency (TF-IDF) feature representation with a Graph Attention Network (GAT) to enhance the early detection of Monkeypox-related self-reported symptoms on Indonesian social media. A dataset of 3,200 tweets was collected through Tweet-Harvest and subsequently preprocessed and manually labeled, producing a balanced distribution between positive (51%) and negative (49%) samples. TF-IDF vectors were used to construct a document similarity graph via the k-Nearest Neighbors (k-NN) method with cosine similarity, enabling the GAT to leverage both textual and relational information across posts. The model’s performance was evaluated using accuracy, precision, recall, and macro-F1, with macro-F1 serving as the primary indicator. The proposed TF-IDF + GAT model achieved 93.07% accuracy and a macro-F1 score of 93.06%, outperforming the baseline classifiers: CNN (92.16% macro-F1), SVM (85.73%), and Logistic Regression (84.89%). These findings demonstrate the effectiveness of integrating classical text representations with graph-based neural architectures for improving social media-based disease surveillance and supporting early epidemic response strategies.
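
The graph-construction step described in the abstract (TF-IDF vectors linked by k-NN under cosine similarity) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The whitespace tokenizer, the smoothed IDF variant, the value of k, and the toy English posts are all assumptions made for illustration, since the abstract does not specify them. The GAT itself is not shown; the resulting edge list is simply the kind of input a graph neural network would consume.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Assumption: naive lowercase/whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_edges(vectors, k):
    """Directed edges (i, j) where j is among the k documents most similar to i."""
    edges = []
    for i, u in enumerate(vectors):
        scored = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        # Highest similarity first; ties broken by the lower document index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        edges.extend((i, j) for _, j in scored[:k])
    return edges


# Hypothetical toy posts, not taken from the study's dataset.
posts = [
    "fever and rash after travel",
    "rash and fever since monday",
    "monkeypox news update today",
    "news update about monkeypox vaccine",
]
vectors = tfidf_vectors(posts)
edges = knn_edges(vectors, k=1)
print(edges)
```

With k=1, each symptom post links to the other symptom post and each news post links to the other news post. This is the relational signal that an attention-based graph layer could then weight when classifying a node.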