Journal : Khazanah Informatika : Jurnal Ilmu Komputer dan Informatika

Automatic Categorization of Mental Health Frame in Indonesian X (Twitter) Text using Classification and Topic Detection Techniques
Setio Basuki; Rizky Indrabayu; Nico Ardia Effendy
Khazanah Informatika : Jurnal Ilmu Komputer dan Informatika Vol. 10 No. 2 (2024): Oktober 2024
Publisher : Universitas Muhammadiyah Surakarta

DOI: 10.23917/khif.v10i2.3328

Abstract

This paper develops a machine learning model to detect mental health frames in Indonesian-language tweets on the X (Twitter) platform. The work is motivated by the lack of methods for automatically detecting mental health frames, despite the importance of mental health issues in Indonesia. The paper addresses this gap by applying classification and topic detection methods in four stages. First, it examines various mental health frames and derives seven main labels (Awareness; Classification; Feelings and Problematization; Accessibility and Funding; Stigma; Service; Youth) plus an additional label, Others. Second, it constructs a dataset of 29,068 Indonesian tweets by filtering on the keywords "mental health" and "kesehatan mental". Third, it preprocesses the data and manually labels a random selection of 3,828 tweets, since labeling the full dataset was impractical. Fourth, it runs classification experiments using classical text features, non-contextual word embeddings, and contextual word embeddings, and topic detection experiments with three different algorithms. The BERT-based method achieved the highest accuracy: 81% on the 'Others' vs. 'non-Others' classification, 80% on the seven-main-label classification, and 92% on the seven-main-label classification when using GPT-4-powered data augmentation. The topic detection experiments indicate that Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) are more effective than the Hierarchical Dirichlet Process (HDP) at generating relevant keywords that represent the characteristics of each main label.
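
The dataset construction and sampling stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the case-insensitive substring match, the fixed random seed, and the toy tweets are all assumptions made for illustration. Only the two keywords come from the abstract.

```python
import random

# The two filter keywords named in the abstract.
KEYWORDS = ("mental health", "kesehatan mental")


def filter_by_keywords(tweets, keywords=KEYWORDS):
    """Keep tweets whose lowercased text contains at least one keyword.

    Assumption: a simple case-insensitive substring match; the paper
    does not specify how the keyword filter was implemented.
    """
    return [t for t in tweets if any(k in t.lower() for k in keywords)]


def sample_for_labeling(tweets, n, seed=0):
    """Draw a reproducible random subset of at most n tweets for manual labeling.

    Assumption: uniform sampling without replacement with a fixed seed.
    """
    rng = random.Random(seed)
    return rng.sample(tweets, min(n, len(tweets)))


# Toy data (hypothetical tweets, not taken from the paper's dataset).
tweets = [
    "Kesehatan mental itu penting",
    "Macet lagi di Jakarta",
    "Mental health awareness week",
    "Akses layanan kesehatan mental masih mahal",
]

matched = filter_by_keywords(tweets)
to_label = sample_for_labeling(matched, n=2)
```

In the paper's setting, the filtered collection would hold 29,068 tweets and the labeling sample 3,828; the toy sizes here only show the shape of the two steps.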