Early detection of infectious diseases plays a crucial role in limiting their spread and enabling timely intervention. In the digital era, social media has emerged as a valuable source of real-time health information, where individuals often share self-reported symptoms that can serve as early warning signals for disease outbreaks. However, textual data from social media is typically unstructured, noisy, and contextually diverse, which poses challenges for conventional text classification methods. This study proposes a hybrid model that combines Term Frequency–Inverse Document Frequency (TF-IDF) feature representation with a Graph Attention Network (GAT) to improve the early detection of self-reported Monkeypox symptoms on Indonesian social media. A dataset of 3,200 tweets was collected with Tweet-Harvest, preprocessed, and manually labeled, yielding a nearly balanced distribution of positive (51%) and negative (49%) samples. TF-IDF vectors were used to construct a document similarity graph via the k-Nearest Neighbors (k-NN) method with cosine similarity, allowing the GAT to leverage both textual and relational information across posts. Model performance was evaluated using accuracy, precision, recall, and macro-F1, with macro-F1 serving as the primary indicator. The proposed TF-IDF + GAT model achieved 93.07% accuracy and a macro-F1 score of 93.06%, outperforming the baseline classifiers: CNN (92.16% macro-F1), SVM (85.73%), and Logistic Regression (84.89%). These findings demonstrate the effectiveness of integrating classical text representations with graph-based neural architectures for improving social-media-based disease surveillance and supporting early epidemic response strategies.
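The graph construction step described above (TF-IDF vectors linked by k-NN under cosine similarity) can be illustrated with a minimal, standard-library Python sketch. This is not the study's implementation: the whitespace tokenizer, the smoothed IDF formula, the helper names `tfidf_vectors`, `cosine`, and `knn_edges`, the value of `k`, and the four example posts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return L2-normalised TF-IDF vectors as sparse dicts (term -> weight)."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed tokenizer
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1.
        vec = {
            t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, c in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def knn_edges(vectors, k):
    """Directed edges (i, j) where j is one of the k most similar docs to i."""
    edges = []
    for i, u in enumerate(vectors):
        sims = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        # Highest similarity first; ties broken by lower document index.
        sims.sort(key=lambda s: (-s[0], s[1]))
        edges.extend((i, j) for _, j in sims[:k])
    return edges


# Invented example posts, not taken from the study's dataset.
posts = [
    "demam dan ruam di kulit sejak kemarin",
    "ruam di kulit dan demam tinggi",
    "berita cacar monyet hari ini",
    "berita hari ini tentang cacar monyet",
]
vectors = tfidf_vectors(posts)
edges = knn_edges(vectors, k=1)
print(edges)  # [(0, 1), (1, 0), (2, 3), (3, 2)]
```

In a setup like the one the abstract describes, the TF-IDF vectors would serve as node features and the edge list as the graph over which an attention layer aggregates neighbor information, so that each post is classified using its own text together with that of its most similar posts.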
Copyright © 2025