Mental health has become a crucial global issue, with growing numbers of individuals openly expressing their psychological conditions on social media platforms. This study aims to classify mental-health-related tweets, specifically those concerning depression, using a combination of a Support Vector Machine (SVM), Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction, and Chi-Square feature selection. Although this combination has been widely applied in domains such as product and movie reviews, its application in the mental health context remains limited. The main challenge lies in capturing the implicit psychological nuances and indirect expressions that are common on platforms such as Twitter, in contrast to the more explicit text found in those other domains. Moreover, most prior studies have not integrated a comprehensive preprocessing pipeline, including lemmatization, stopword removal, and duplicate elimination, for mental health data from social media. This research uses a dataset of 26,448 tweets drawn from Kaggle and from self-crawled data. The best result was obtained with an SVM using a Radial Basis Function (RBF) kernel without Chi-Square feature selection, yielding an accuracy of 74.93%. The study shows that a comprehensive preprocessing pipeline can improve classification performance. However, the model still struggles with sarcastic or ironic content. Future research should consider deep learning approaches such as BERT or LSTM to capture more complex textual context.
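The pipeline named above (preprocessing, TF-IDF weighting, Chi-Square selection, RBF-kernel SVM) can be illustrated with a minimal, dependency-free Python sketch. It assumes binary labels (1 = depression-related, 0 = not). The stopword list, the regex tokenizer, the smoothed IDF formula (the variant scikit-learn uses by default), the document-presence form of the chi-square statistic (which differs from scikit-learn's `chi2`, since that function works on feature values), and the `gamma` value are all illustrative assumptions rather than the study's reported configuration. Lemmatization is omitted because it would need an external tool such as NLTK or spaCy, and the toy tweets are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration only.
STOPWORDS = {"i", "a", "the", "and", "to", "is", "am", "so", "of", "my", "it"}


def preprocess(tweets, labels):
    """Lowercase, tokenize, drop stopwords, and remove duplicate tweets.

    Lemmatization is omitted (it would need e.g. NLTK or spaCy).
    The first occurrence of each normalized tweet keeps its label.
    """
    seen, docs, kept_labels = set(), [], []
    for text, label in zip(tweets, labels):
        tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]
        key = " ".join(tokens)
        if key and key not in seen:  # skip empty and duplicate tweets
            seen.add(key)
            docs.append(tokens)
            kept_labels.append(label)
    return docs, kept_labels


def tfidf(docs):
    """Return one L2-normalized {term: weight} dict per document (smoothed IDF)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    rows = []
    for d in docs:
        vec = {t: count * idf[t] for t, count in Counter(d).items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        rows.append({t: v / norm for t, v in vec.items()})
    return rows


def chi_square(docs, labels, positive=1):
    """Chi-square score per term from the 2x2 term-presence/class contingency table."""
    sets = [set(d) for d in docs]
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    scores = {}
    for t in set().union(*sets):
        a = sum(1 for s, y in zip(sets, labels) if t in s and y == positive)  # term present, positive
        b = sum(1 for s, y in zip(sets, labels) if t in s and y != positive)  # term present, negative
        c = n_pos - a            # term absent, positive
        d = (n - n_pos) - b      # term absent, negative
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - z||^2) between two sparse {term: weight} vectors."""
    keys = set(x) | set(z)
    sq_dist = sum((x.get(k, 0.0) - z.get(k, 0.0)) ** 2 for k in keys)
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    tweets = [
        "I feel so empty and hopeless",
        "Great game tonight with friends",
        "I feel so empty and hopeless",  # duplicate, removed in preprocessing
        "hopeless and tired of everything",
        "friends and pizza tonight",
    ]
    labels = [1, 0, 1, 1, 0]
    docs, ys = preprocess(tweets, labels)
    rows = tfidf(docs)
    scores = chi_square(docs, ys)
    print(select_top_k(scores, 3))  # ['friends', 'hopeless', 'tonight']
    print(rbf_kernel(rows[0], rows[0]))  # 1.0
```

In practice the RBF SVM itself would be trained with a library implementation (for example scikit-learn's `SVC(kernel="rbf")`) on the TF-IDF matrix. Because the abstract reports its best accuracy without Chi-Square selection, the selection step is best treated as an optional stage to compare against, not a required one.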
Copyright © 2025