Articles

Comparison of Naive Bayes and SVM Methods for Identifying Anxiety Based on Social Media
Nugraha, Endri Rizki; Maharani, Warih
Building of Informatics, Technology and Science (BITS) Vol 6 No 4 (2025): March 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i4.6506

Abstract

This research aims to detect anxiety patterns in social media posts using the Naive Bayes (NB) and Support Vector Machine (SVM) algorithms. Tweets are collected by data crawling and labeled using the Depression Anxiety Stress Scale (DASS-42) questionnaire, and a Random Oversampler is applied to balance the imbalanced dataset. NB and SVM were chosen for their effectiveness in text sentiment classification. The study uses textual features obtained from the Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BoW) methods and compares the performance of the two algorithms in detecting anxiety on datasets from the X platform, in order to identify the advantages and limitations of each method in handling textual sentiment data. Performance is evaluated using accuracy, recall, and F1-score. The results indicate that SVM with TF-IDF feature extraction achieved the highest accuracy, 72%, with an average F1-score of 61%, while NB with BoW achieved 56% accuracy and an average F1-score of 49%. These findings highlight the effectiveness of combining SVM with TF-IDF features, which produced the best overall result in identifying anxiety from social media data.
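The balancing step mentioned in the abstract can be illustrated with a minimal sketch of random oversampling. This is not the authors' code and does not use any particular library; the function name `random_oversample` and the toy tweets and labels are hypothetical, chosen only to show minority samples being duplicated until every class matches the largest one.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all
    classes have as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies (with replacement) to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: four "normal" tweets and one "anxiety" tweet.
tweets = ["t1", "t2", "t3", "t4", "t5"]
labels = ["normal", "normal", "normal", "normal", "anxiety"]
balanced_tweets, balanced_labels = random_oversample(tweets, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.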
Comparison of Random Forest and Decision Tree for Depression Detection Using Interaction Patterns
Fathin, Felicia Talitha; Maharani, Warih
Building of Informatics, Technology and Science (BITS) Vol 6 No 4 (2025): March 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i4.6660

Abstract

This research evaluates the efficacy of Random Forest and Decision Tree in detecting depression from tweets and interaction patterns on the X social media platform. Depression is a global health problem that is often reflected in individuals' online behavior. The study uses data from X users in Indonesia who completed the DASS-42 questionnaire, with tweets and interactions collected by crawling. Its purpose is to identify signs of depression more accurately and comprehensively by analyzing users' interaction patterns on social media through several feature extraction methods and preprocessing scenarios. The methods include data preprocessing, feature combination using TF-IDF, Bag of Words (BoW), and Word2Vec, and model evaluation using Precision, Recall, Accuracy, and F1-score. The findings show that Random Forest performs better than Decision Tree: with the TF-IDF, BoW, and Word2Vec combination and with the TF-IDF and Word2Vec combination, it obtained an accuracy of 0.60. Although Random Forest is superior, both models have difficulty identifying the positive (depression) class, as seen from the relatively low F1-score and recall values. Other factors affecting model performance include lack of data relevance, low interaction rates, and limited feature extraction.
Comparison of Random Forest and Decision Tree Methods for Emotion Classification based on Social Media Posts
Tsaqif, Muhammad Abiyyu; Maharani, Warih
Building of Informatics, Technology and Science (BITS) Vol 6 No 4 (2025): March 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i4.6677

Abstract

Social media platforms like X (formerly Twitter) have become essential for expressing emotions and opinions, making emotion classification a critical task with applications in mental health, public sentiment monitoring, and customer feedback analysis. This study compares the Random Forest and Decision Tree algorithms for classifying emotions such as joy, sadness, anger, and fear in social media posts. Data collection involved crawling tweets and manual labeling. Preprocessing included tokenization, stemming, and stopword removal, with feature extraction using TF-IDF and Bag of Words. Experimental scenarios tested data split ratios, resampling for class balance, and parameter tuning. The Decision Tree parameters tuned were criterion (gini, entropy), max depth (none, fixed values), min samples split (2, 5), and min samples leaf (1, 2). The Random Forest parameters tuned were n_estimators (100–400), max depth (none, fixed values), min samples split (2, 5, 10), and min samples leaf (1, 2). Random Forest achieved a maximum accuracy of 76.17%, outperforming Decision Tree’s 72.62%. The combination of TF-IDF and Bag of Words delivered the highest accuracy for both models. This study underscores the importance of preprocessing, balanced datasets, and parameter optimization for effective emotion classification. The findings offer insights into advancing sentiment analysis and natural language processing, enabling practical applications in public sentiment tracking, customer experience enhancement, and crisis management.
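To give a sense of the size of the search space described in the abstract, the following sketch enumerates the two parameter grids. The criterion, min samples split, and min samples leaf values come from the abstract; the concrete max depth values (`ASSUMED_MAX_DEPTHS`) and the step of 100 for `n_estimators` are assumptions, since the abstract only says "none, fixed values" and "100–400". The key names follow the scikit-learn naming convention, which is also an assumption about the tooling used.

```python
from itertools import product

# Assumption: the abstract does not list the fixed depth values.
ASSUMED_MAX_DEPTHS = [None, 10, 20]

decision_tree_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": ASSUMED_MAX_DEPTHS,
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

random_forest_grid = {
    # Assumption: the 100-400 range is sampled in steps of 100.
    "n_estimators": [100, 200, 300, 400],
    "max_depth": ASSUMED_MAX_DEPTHS,
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2],
}


def expand_grid(grid):
    """Return every parameter combination in the grid as a dict."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


dt_candidates = expand_grid(decision_tree_grid)
rf_candidates = expand_grid(random_forest_grid)
print(len(dt_candidates), len(rf_candidates))
```

Under these assumed values the Decision Tree grid has 24 candidates and the Random Forest grid has 72; each candidate would then be trained and scored for every data split and feature set being compared.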
Enhancing SMOTE Using Euclidean Weighting for Imbalanced Classification Dataset
Ramadhan, Nur Ghaniaviyanto; Maharani, Warih; Gozali, Alfian Akbar; Adiwijaya, Adiwijaya
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.798

Abstract

Class imbalance is a significant challenge in machine learning classification tasks because it often causes models to be biased toward the majority class, resulting in poor detection of minority classes. This study proposes a novel enhancement to the Synthetic Minority Over-sampling Technique (SMOTE) that incorporates Euclidean distance-based feature weighting, called Weighted SMOTE. The key idea is to improve the quality of synthetic minority samples by calculating feature importance using a Random Forest model and assigning higher weights to the most relevant features. The objective of this research is to generate more representative synthetic data, reduce model bias, and increase predictive accuracy on highly imbalanced datasets. Experiments were conducted on four benchmark datasets from the KEEL Repository with imbalance ratios ranging from 0.013 to 0.081. The proposed Weighted SMOTE combined with an ensemble voting classifier (Random Forest, AdaBoost, and XGBoost) demonstrated significant improvements compared to standard SMOTE and models without resampling. For example, on the Zoo-3 dataset, the Balanced Accuracy Score (BAS) increased from 75% to 90%, while the F1-score improved from 48% to 94%. On the Cleveland-0_vs_4 dataset, precision improved from 83% to 91% and recall remained high at 99%. Statistical testing using the Wilcoxon signed-rank test confirmed these improvements, with p-values below 0.05 for key metrics. The findings show that the proposed method effectively balances sensitivity and precision, generates more meaningful synthetic samples, and reduces the risk of overfitting compared to conventional oversampling. The novelty of this work lies in integrating Euclidean-based feature weighting into the SMOTE process and validating its performance on multiple domains with varying feature types and imbalance ratios. These results indicate that the proposed Weighted SMOTE approach contributes a practical solution for improving classification performance and model stability on severely imbalanced data.
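The idea of feature-weighted oversampling can be sketched as follows. This is one possible reading of the abstract, not the paper's algorithm: it assumes the feature weights scale the squared differences in the Euclidean distance used for the nearest-neighbour search, and that new points are then interpolated as in standard SMOTE. The function names, the toy minority points, and the weights are hypothetical; in the paper the weights come from Random Forest feature importance.

```python
import math
import random


def weighted_distance(a, b, weights):
    """Euclidean distance with per-feature weights on squared differences."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def weighted_smote(minority, weights, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours, where nearness
    is measured with the weighted distance."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: weighted_distance(base, p, weights),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([x + gap * (y - x) for x, y in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class points with two features.
minority = [[1.0, 10.0], [1.2, 50.0], [0.9, 30.0], [5.0, 11.0]]
# Hypothetical importances: feature 0 matters far more than feature 1.
weights = [0.9, 0.1]
new_points = weighted_smote(minority, weights, n_new=3)
print(new_points)
```

With these weights, neighbours are chosen mainly by similarity in the first feature, so synthetic points tend to stay close to the base point along the feature the model considers most informative.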
Peningkatan Kreativitas dan Keterampilan Digital Pemuda Karang Taruna Kampung Karasak [Improving the Creativity and Digital Skills of Karang Taruna Youth in Kampung Karasak]
Wibowo, Agung Toto; Fahlena, Hilda; Maharani, Warih; Ramadhan, Nur Ghaniaviyanto
Madani : Indonesian Journal of Civil Society Vol. 7 No. 2 (2025): Madani : Agustus 2025
Publisher : Politeknik Negeri Cilacap

DOI: 10.35970/madani.v7i2.2832

Abstract

Kampung Karasak, in Ciheulang Village, Ciparay District, Bandung Regency, still faces challenges, including low digital literacy and limited use of information technology to support the development of the village's potential. This condition limits the ability of the younger generation to produce and distribute creative content that can strengthen local identity and increase village competitiveness in the digital realm. To address these challenges, the community service team held a training program for the Karang Taruna youth organization on December 15, 2024. The training materials were comprehensively designed, covering the use of social media, talent modules, storytelling, on-camera communication, video shooting techniques, and music and video editing. The method combined theoretical instruction, direct practice, and interactive mentoring, enabling participants to produce digital content independently. The activity was evaluated through questionnaires, with the results showing that 94% of participants agreed or strongly agreed that the training was useful. These findings confirm that the activity was effective and well received by participants, and that it has the potential to increase the digital literacy and creativity of youth in creating content based on local potential, ready for publication on social media.
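PEMBANGUNAN MODEL PREDIKSI KEPRIBADIAN BERDASARKAN TWEET DAN KATEGORI KEPRIBADIAN BIG FIVE DENGAN METODE AGGLOMERATIVE HIERARCHICAL CLUSTERING [Building a Personality Prediction Model Based on Tweets and Big Five Personality Categories Using Agglomerative Hierarchical Clustering]
Yusup, Axel Haikal; Maharani, Warih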
Telkatika: Jurnal Telekomunikasi Elektro Komputasi & Informatika Vol. 1 No. 1 (2021): Desember 2021
Publisher : Perpustakaan Universitas Telkom

Abstract

Social media is a forum where users can interact with other users and share information through communities and social networks. The large volume of posts from billions of social media users is a source of data from which new information can be extracted and created. The research began by distributing consent forms and questionnaires to obtain approval from respondents who use Indonesian in their tweets to participate in this study. Agglomerative Hierarchical Clustering was chosen to enrich the methods for predicting a person's personality based on social media content. The model in this study has an accuracy of 20.1% with an average silhouette score of -0.23. The high uniqueness of the words in each processed tweet is a challenge for this model in producing optimal performance. The model can handle large amounts of data in a short time but has not yet delivered better performance than similar cases solved with supervised learning. Keywords: social media, personality, prediction, method, tweet
PERSONALITY DETECTION ON TWITTER USER USING XGBOOST ALGORITHM
Adinda Putri Rosyadi; Warih Maharani; Prati Hutari Gani
Jurnal Teknik Informatika (Jutif) Vol. 5 No. 1 (2024): JUTIF Volume 5, Number 1, February 2024
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2024.5.1.1166

Abstract

Personality is the part of a person's identity that is presented to the public. The Big Five is the most commonly used personality model. Detecting a person's personality remains a difficult task because it still often requires people to fill out lengthy questionnaires that evaluate various personality traits. A system that can identify personality easily and specifically is therefore needed. Individuals often express their feelings on social media, and Twitter is one of the most popular social networking platforms today. In this research, we use the XGBoost algorithm, a powerful machine learning method, to create a personality detection system that improves upon existing approaches. Our research aims to determine how well the XGBoost algorithm can recognize Big Five personality traits in Twitter users. We achieved encouraging results through in-depth investigation and experimentation. The XGBoost algorithm produced a model that can recognize all Big Five personality trait labels, but with differing precision, recall, and F1-score values. The highest values were obtained for the Extroversion label, with a precision of 0.92, recall of 1.00, and F1-score of 0.96. The lowest values were obtained for the Agreeableness label, with a precision of 0.29, recall of 0.29, and F1-score of 0.29. This research demonstrates the potential of the XGBoost algorithm for personality detection on social media platforms, providing a fast and accurate method to identify distinctive characteristics. Overall, the results demonstrate the efficiency of XGBoost in the context of personality recognition, opening the door for further development in understanding and evaluating human behavior through social media platforms such as Twitter.
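The reported F1-scores can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does this for the two labels quoted in the abstract; the function name `f1_from_precision_recall` is only illustrative.

```python
def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract.
extroversion_f1 = f1_from_precision_recall(0.92, 1.00)
agreeableness_f1 = f1_from_precision_recall(0.29, 0.29)
print(round(extroversion_f1, 2))   # 0.96
print(round(agreeableness_f1, 2))  # 0.29
```

Both rounded values agree with the figures in the abstract.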
DEPRESSION DETECTION ON TWITTER USING GATED RECURRENT UNIT
Holle, Alfransis Perugia Bennybeng; Warih Maharani
Jurnal Teknik Informatika (Jutif) Vol. 5 No. 1 (2024): JUTIF Volume 5, Number 1, February 2024
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2024.5.1.1187

Abstract

Technological advancements have significantly affected society, particularly through the use of social media. One popular social media platform is Twitter, where people can share moments, thoughts, and statuses. Since the COVID-19 pandemic, Twitter usage has increased, and some users have begun exhibiting symptoms of depression. People with depression need an outlet for their emotions that can help them cope. Using the Gated Recurrent Unit (GRU) method and Word2Vec feature extraction, we developed a depression detection system capable of analyzing users' Twitter posts and identifying potential signs of depression. The dataset was obtained from 165 participants who agreed to the use of their personal Twitter data and completed a questionnaire based on the Depression Anxiety and Stress Scales-42 (DASS-42). The questionnaire results served as labels, and the data were processed with Word2Vec feature extraction and then fed into the GRU model. The evaluation showed an accuracy of 57.58% and an F1-score of 56.25. Using a bidirectional layer in the model improved the precision, recall, and F1-score values.
Analyzing Public Sentiment on the Relocation of Indonesia's Capital to Kalimantan as the Ibu Kota Nusantara Using Logistic Regression
Maharani, Warih; Latifa, Agisni Zahra
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 2 (2025): JUTIF Volume 6, Number 2, April 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.2.4230

Abstract

The Ibu Kota Nusantara (IKN) relocation project aims to equalize economic development and reduce the burden on Jakarta, but it has elicited mixed reactions from the public, including both support and opposition. This study therefore applies machine learning-based sentiment analysis, using Logistic Regression to explore public opinion on the relocation and drawing on social media data from the X platform to gain insight into information, opinions, and public reactions. Each post is labeled with TextBlob, VADER, and SentiWordNet, and a majority vote over the three labels determines the final label. SMOTE is used to balance the data. The study also applies combinations of preprocessing, N-gram, and TF-IDF to show their impact on model performance. The results indicate that preprocessing Scenario 3 combined with unigram, bigram, trigram, and TF-IDF feature extraction yields the best performance, achieving a precision of 0.7641, recall of 0.7767, F1-score of 0.7634, and accuracy of 0.7641. This research demonstrates that proper preprocessing and feature extraction enhance the performance of the Logistic Regression model for sentiment classification, thereby contributing to the analysis of public opinion on IKN policy and on other issues in the future.
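The majority-vote labeling step can be sketched in a few lines. The sketch takes three already-computed labels as plain strings and does not call TextBlob, VADER, or SentiWordNet; the function name `majority_label` and the tie rule (falling back to "neutral" when all three labelers disagree) are assumptions, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_label(labels, tie_label="neutral"):
    """Return the label chosen by most labelers.

    Assumption: when no label has a strict majority (for three labelers,
    when all three disagree), fall back to tie_label.
    """
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


# Hypothetical outputs of the three labeling methods for two posts.
print(majority_label(["positive", "positive", "negative"]))  # positive
print(majority_label(["positive", "negative", "neutral"]))   # neutral
```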
Comparative Analysis of Naive Bayes and SVM for Improved Emotion Classification on Social Media
Pratama, Rio Ferdinand Putra; Maharani, Warih
Jurnal Pendidikan Informatika (EDUMATIC) Vol 9 No 1 (2025): Edumatic: Jurnal Pendidikan Informatika
Publisher : Universitas Hamzanwadi

DOI: 10.29408/edumatic.v9i1.29087

Abstract

identifying emotions such as happy, angry, sad, and fear. However, Indonesian text processing faces challenges due to language complexity and slang. This research aims to compare Naive Bayes and SVM models, focusing on the impact of preprocessing, feature extraction, and parameter optimization on emotion classification. The dataset was collected from the X API using crawling techniques and manually annotated by six annotators. Training used fully and partially preprocessed versions of the dataset with TF-IDF, BoW, and Word2Vec feature extraction. The Naive Bayes and SVM models were evaluated using accuracy, precision, recall, and F1-score. Our results show that full preprocessing improves accuracy, with TF-IDF + BoW achieving 78.01% with SVM, outperforming Naive Bayes at 75.53%. The models classify emotions into four classes: happy, sad, angry, and fear. This study demonstrates the value of preprocessing and feature selection for dealing with slang and complexity in Indonesian texts. These results provide insights for developing optimal emotion classification models and offer applications in sentiment analysis, social media monitoring, and mental health detection.
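A "TF-IDF + BoW" feature set can be illustrated with a small pure-Python sketch. It assumes that the combination means concatenating the two vectors for each document, which the abstract does not state, and it uses an unsmoothed TF-IDF (term frequency times log(N / document frequency)); libraries often use smoothed variants. The helper names and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Sorted list of all distinct whitespace-separated tokens."""
    return sorted({token for doc in docs for token in doc.split()})


def bow_vector(doc, vocab):
    """Raw token counts in vocabulary order."""
    counts = Counter(doc.split())
    return [counts[token] for token in vocab]


def tfidf_vector(doc, vocab, docs):
    """Unsmoothed TF-IDF: (count / doc length) * log(N / document frequency)."""
    tokens = doc.split()
    counts = Counter(tokens)
    vector = []
    for token in vocab:
        # Document frequency is at least 1 because vocab is built from docs.
        df = sum(1 for other in docs if token in other.split())
        idf = math.log(len(docs) / df)
        vector.append(counts[token] / len(tokens) * idf)
    return vector


# Hypothetical preprocessed posts.
docs = ["saya senang sekali", "saya marah", "dia sedih sekali"]
vocab = build_vocab(docs)
# Assumption: "TF-IDF + BoW" is read as concatenation of both vectors.
combined = bow_vector(docs[0], vocab) + tfidf_vector(docs[0], vocab, docs)
print(len(vocab), len(combined))
```

The combined vector is twice the vocabulary size; a classifier such as Naive Bayes or SVM would then be trained on one such vector per post.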
Co-Authors
Adhie Rachmatulloh Sugiono Adinda Putri Rosyadi Adiwijaya Agung Toto Wibowo Aisyiyah, Syarifatul Ajeung Angsaweni Aji Gunadi, Gagah Al Giffari, Muhammad Zacky Aldy Renaldi Alfian Akbar Gozali Algi Erwangga Putra Alif Rahmat Julianda Andre Agasi Simanungkalit Angelina Prima Kurniati Anisa Herdiani annisa Imadi Puti Arianti Primadhani Tirtopangarsa Arie Ardiyanti Suryani Artanto Ageng Kurniawan Asep Aprianto Aziz Alfauzi Aziz Azka Zainur Azifa Bondan Ari Bowo Daud, Hanita Dicky Wahyu Hariyanto Diska Yunita Dita Martha Pratiwi Elroi Yoshua Ersy Ervina Evizal Abdul Kadir Fadhel, Muhammad Fadhil Hadi Fairuz Ahmad Hirzani Fathin, Felicia Talitha Fika Apriliani Fikri Ilham Guntur Prabawa Kusuma Hafshah Haudli Windjatika Hilda Fahlena Holle, Alfransis Perugia Bennybeng I Kadek Bayu Arys Wisnu Kencana I Nyoman Cahyadi Wiratama Ilham Rizki Hidayat Imelda Atastina Intan Nurma Yunita Intan Ramadhani Joshua Tanuraharja Keri Nurhidayat Kurniawan Adina Kusuma Latifa, Agisni Zahra M.Syahrul Mubarok Marcello Rasel Hidayatullah Moch Arif Bijaksana Mohamad Mubarok Mohamad Syahrul Mubarok Muh. Akib A. Yani Muhammad Fadhil Mubaraq Muhammad Husein Adnan Muhammad, Noryanti Niken Dwi Wahyu Cahya Nugraha, Endri Rizki Nugroho, Bayu Seno Nungki Selviandro Nur Ghaniaviyanto Ramadhan Nyoman Rizkha Emillia Pratama, Rio Ferdinand Putra Prati Hutari Gani Prisla Novia Anggreyani Pursita Kania Praisar Purwanto, Zadosaadi Brahmantio Putri Ester Sumolang Putri Samapa Hutapea Rachdian Habi Yahya Raihan Nugraha Setiawan Rasyad, Gerald Shabran Ria Aniansari Rianda Khusuma Rifki Wijaya Ryan Armiditya Pratama Salsabila Anza Salasa Sendika Panji Anom Serventine Andhara Evhen Setiawan, Abiyyu Daffa Haidar Suyanto Suyanto Tiara Nabila Tri Ayu Syifa'ur Rohmah Trysha Cintantya Dewi Tsaqif, Muhammad Abiyyu Veronikha Effendy Wijaya, Yaffazka Afazillah Yantrisnandra Akbar Maulino Yanuar Ega Ariska Yanuar Firdaus AW Yusup, Axel Haikal