Articles

Performance of Machine Learning Algorithms on Imbalanced Sentiment Datasets Without Balancing Techniques
Dina Wulan Yekti Rahayu; Khothibul Umam; Maya Rini Handayani
Journal of Applied Informatics and Computing Vol. 9 No. 3 (2025): June 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i3.9584

Abstract

This study explores the performance of five sentiment classification algorithms (Naïve Bayes, Logistic Regression, Support Vector Machine, Decision Tree, and Random Forest) on an imbalanced sentiment dataset, with the SMOTE technique applied as a comparison. The research follows the Knowledge Discovery in Databases (KDD) framework, which includes data selection, preprocessing, transformation, data mining, and evaluation. Evaluation uses accuracy, precision, recall, F1-score, and macro-averaged F1-score. Initial results show that all five algorithms performed fairly well even without a balancing technique, with Naïve Bayes achieving the highest F1-score (0.84) and a recall of 0.81. After SMOTE was applied, only some models improved, and only slightly; for example, Random Forest's F1-score rose from 0.81 to 0.85. Other models, such as Naïve Bayes, declined, dropping to 0.77. This suggests that the effect of balancing techniques like SMOTE varies by algorithm. The study thus provides empirical evidence of the importance of choosing an appropriate approach and of understanding each algorithm's behavior on imbalanced data. Researchers are encouraged to consider these aspects carefully when designing experiments and interpreting results.
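The study contrasts training on the original imbalanced data with training after SMOTE. As a rough illustration of what SMOTE does, the sketch below generates synthetic minority-class points. Each point is made by interpolating between a minority sample and one of its nearest minority neighbours.

This is a minimal plain-Python sketch, not the authors' code; the abstract does not say which implementation was used. The function name `smote`, the toy 2-D vectors, and the `k` and `seed` defaults are assumptions for illustration.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling: interpolate between minority samples.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # at most n-1 neighbours exist

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors for the minority class
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2)
```

To fully balance a binary dataset, `n_synthetic` would be set to the majority count minus the minority count. In a typical setup the synthetic points are added to the training split only, so the test set keeps its original distribution.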
Critical discourse analysis of netizens’ comments on the 2024–2029 presidential and vice-presidential debate based on a corpus
Handayani, Maya Rini; Rahmi, Amelia; Hilmi, Mustofa; Chairullah, Dimas
Islamic Communication Journal Vol. 10 No. 1 (2025)
Publisher : Fakultas Dakwah dan Komunikasi Universitas Islam Negeri Walisongo Semarang

DOI: 10.21580/icj.2025.10.1.26055

Abstract

Presidential elections constitute a cornerstone of democratic governance, symbolizing the legitimacy and trust vested in government by the populace. The 2024 Indonesian presidential election provided a platform for public participation in shaping the nation's policy trajectory. Recognizing the pivotal role of social media, particularly YouTube, in contemporary campaigns, candidates leveraged these platforms to facilitate public discourse and gauge public sentiment towards presidential and vice-presidential candidates. This study employed a corpus-based approach to analyze netizens' comments during the debates. Through a qualitative methodology, 79,378 comments were collected from eight prominent YouTube channels: KPU RI, CNN Indonesia, INews, RCTI, TVRI, TVOne, Kompas TV, and Metro TV. The analysis draws upon Teun A. van Dijk's Critical Discourse Analysis (CDA) and Stewart L. Tubbs and Sylvia Moss' communication styles. Findings indicate that Anies Baswedan and Muhaimin Iskandar projected an image of academic prowess, intelligence, law-abidingness, and religiosity, while adopting a controlling communication style. In contrast, Prabowo and Gibran were associated with discourses of sincerity, millennial appeal, continuity of Jokowi's programs, and impressiveness. However, this pair exhibited a relinquishing communication style, characterized by passivity and a deferential approach. Finally, Ganjar Pranowo and Mahfud MD presented themselves as visionary, credible, and possessing integrity, employing an equalitarian communication style marked by respect and dialogue.
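Deteksi Dark patterns Biaya Layanan E-commerce Berdasarkan Perspektif Konsumen Menggunakan Algoritma Support Vector Machine [Detecting Dark Patterns in E-commerce Service Fees from the Consumer Perspective Using the Support Vector Machine Algorithm]
Salmalina, Divana Taricha; Umam, Khothibul; Handayani, Maya Rini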
Jurnal Sistem Komputer dan Informatika (JSON) Vol 6, No 4 (2025): Juni 2025
Publisher : Universitas Budi Darma

DOI: 10.30865/json.v6i4.8563

Abstract

The recent growth of Indonesia's e-commerce industry has been overshadowed by rising consumer complaints about service-fee policies perceived as insufficiently transparent, including indications of dark-pattern practices. This study examines consumer perceptions of the issue through machine-learning-based sentiment analysis and detection of manipulative patterns. The data were user reviews collected from the social media platform X. They were processed through a series of text-mining steps: data cleaning, tokenization, stopword removal, and stemming. Sentiment analysis with the Support Vector Machine (SVM) algorithm showed that 55–78% of the reviews for the three e-commerce platforms (Shopee, Tokopedia, Lazada) were negative. TF-IDF analysis identified keywords such as "biaya" (fee), "layan" (the stem of "layanan", service), and "mahal" (expensive) as the most dominant terms in negative reviews. The SVM model performed reasonably well, reaching 87% accuracy in classifying negative sentiment. Thematic analysis of the negative reviews identified indications of dark patterns, particularly hidden costs and sneak into basket (products added to the cart without the user noticing). Both appeared consistently across all platforms. These findings not only confirm recurring manipulative patterns in the Indonesian e-commerce industry but also underline the urgency for industry players to make their fee policies more transparent. In practical terms, the results can inform regulators in formulating more comprehensive consumer-protection policies for the digital era.
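The TF-IDF keyword step can be illustrated with a short sketch. Each term is weighted by its in-document frequency times a smoothed inverse document frequency, ln((1 + N) / (1 + df)) + 1. This is the IDF form scikit-learn's `TfidfVectorizer` uses by default; scikit-learn additionally uses raw counts and L2-normalises each row, which this sketch omits. Terms are then ranked by their summed weight across reviews.

The summing-based ranking, the function names, and the invented tokenised reviews are assumptions. The abstract does not describe how dominance was measured.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    weights = []
    for doc in docs:
        tf = Counter(doc)
        # relative term frequency times smoothed IDF
        weights.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return weights


def top_terms(docs, k=3):
    """Rank terms by their TF-IDF weight summed over all documents."""
    totals = Counter()
    for w in tfidf(docs):
        totals.update(w)  # adds the float weights term by term
    return [t for t, _ in totals.most_common(k)]


# Invented, already-stemmed negative reviews (not data from the study)
reviews = [
    ["biaya", "layan", "mahal"],
    ["biaya", "layan", "naik", "terus"],
    ["biaya", "mahal", "banget"],
    ["ongkir", "mahal", "biaya", "layan"],
]
ranking = top_terms(reviews)
```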
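Identifikasi Polaritas Sikap Pengguna Aplikasi X terhadap Coretax di Indonesia Menggunakan Algoritma Naïve Bayes [Identifying the Attitude Polarity of X App Users toward Coretax in Indonesia Using the Naïve Bayes Algorithm]
Prasilda, Dina Rahma; Yuniarti, Wenty Dwi; Handayani, Maya Rini; Umam, Khothibul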
JURIKOM (Jurnal Riset Komputer) Vol 12, No 3 (2025): Juni 2025
Publisher : Universitas Budi Darma

DOI: 10.30865/jurikom.v12i3.8548

Abstract

The Core Tax Administration System (Coretax) was launched by the Directorate General of Taxes (DGT) in January 2025 as a technology-based integrated tax system. Although it was intended to improve tax efficiency and compliance, Coretax faced technical problems, including system errors and slow processing, and drew public criticism. X (formerly Twitter) became the main platform on which users discussed these problems. This research aims to understand public views of and responses to Coretax's services by analyzing user sentiment patterns on social media. It identifies the polarity of user attitudes using natural language processing (NLP) and the Naïve Bayes algorithm. These were applied to a dataset of 1,628 tweets collected between January and March 2025. The data reflect a wide range of public reactions, both positive and negative, toward the Coretax implementation, covering both functionality and ease of use. The model achieved an accuracy of 93.07%, a precision of 95%, a recall of 96%, and an F1-score of 96%. The results are expected to map shifts in public opinion toward Coretax precisely. This would provide a valuable source of information for application developers, tax policy makers, and technology-sector analysts responding to society's needs and expectations in the digital era.
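For readers unfamiliar with the classifier, a multinomial Naïve Bayes text model can be sketched in plain Python. It combines a log class prior with Laplace-smoothed log word likelihoods and predicts the class with the highest log posterior. This is an illustrative sketch, not the study's implementation; the class name `NaiveBayesText` and the toy tweets are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over token lists, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        token_counts = defaultdict(Counter)
        for doc, c in zip(docs, labels):
            token_counts[c].update(doc)
        v = len(self.vocab)
        self.log_lik = {}
        for c in self.classes:
            total = sum(token_counts[c].values())
            # smoothed P(word | class) for every vocabulary word
            self.log_lik[c] = {
                t: math.log((token_counts[c][t] + self.alpha) / (total + self.alpha * v))
                for t in self.vocab
            }
        return self

    def predict(self, doc):
        def log_posterior(c):
            # Words never seen in training are skipped.
            return self.log_prior[c] + sum(
                self.log_lik[c][t] for t in doc if t in self.vocab
            )
        return max(self.classes, key=log_posterior)


# Hypothetical tokenised tweets (Indonesian): lambat = slow, lagi = again,
# mudah = easy, bagus = good, lapor pajak = file taxes
train_docs = [
    ["coretax", "error", "lambat"],
    ["sistem", "error", "lagi"],
    ["coretax", "mudah", "bagus"],
    ["lapor", "pajak", "mudah"],
]
train_labels = ["negative", "negative", "positive", "positive"]
model = NaiveBayesText().fit(train_docs, train_labels)
```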
Digital Forensic Chatbot Using DeepSeek LLM and NER for Automated Electronic Evidence Investigation
Qonita, Nuurun Najmi; Handayani, Maya Rini; Umam, Khothibul
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 3 (2025): JUTIF Volume 6, Number 3, Juni 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.3.4593

Abstract

The growing complexity of cybercrime necessitates efficient and accurate digital forensic tools for analyzing electronic evidence. This research presents an intelligent digital forensic chatbot powered by the DeepSeek Large Language Model (LLM) and Named Entity Recognition (NER). It is designed to automate the analysis of various digital evidence, including system logs, emails, and image metadata. The chatbot is deployed on Telegram, providing real-time interaction with investigators. In evaluation, the chatbot achieved a precision of 83.52%, a recall of 88.03%, and an F1-score of 85.71%. These results demonstrate the chatbot's effectiveness in accurately detecting forensic entities and suggest it can substantially improve investigation efficiency. This study contributes to digital forensics by integrating LLM and NER for enhanced evidence analysis, offering a scalable and adaptive solution for automated cybercrime investigations. Future research may explore integrating anomaly detection and blockchain-based evidence integrity.
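Entity-level precision, recall, and F1 for NER are commonly computed by exact matching of (span, label) pairs. The abstract does not state the matching rule, so exact matching here is an assumption, as are the entity labels in the example. As an arithmetic check, the harmonic mean of the reported precision (83.52%) and recall (88.03%) is about 85.7%. This is consistent with the reported F1-score of 85.71%.

```python
def entity_prf(gold, predicted):
    """Exact-match entity-level precision, recall and F1.

    Entities are (start, end, label) tuples; a prediction is a true
    positive only if both its span and its label match a gold entity.
    """
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical annotations over one log line (character offsets + label)
gold = {(0, 12, "IP_ADDRESS"), (20, 35, "EMAIL"), (40, 50, "TIMESTAMP"), (60, 70, "FILENAME")}
predicted = {(0, 12, "IP_ADDRESS"), (20, 35, "EMAIL"), (40, 50, "USERNAME")}
p, r, f = entity_prf(gold, predicted)

# F1 implied by the precision and recall reported in the abstract
reported_f1 = 2 * 0.8352 * 0.8803 / (0.8352 + 0.8803)
```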
Implementation of Enhanced Confix Stripping Stemming and Chi-Squared Feature Selection on Classification UIN Walisongo Website with Naïve Bayes Classifier
Muhadzib Al-Faruq, Muhammad Naufal; Yuniarti, Wenty Dwi; Handayani, Maya Rini; Umam, Khotibul
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 3 (2025): JUTIF Volume 6, Number 3, Juni 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.3.4670

Abstract

Academic news classification on university websites remains a challenge due to the growing volume of content and lack of efficient categorization systems. At UIN Walisongo Semarang, this problem hinders students, faculty, and the public from easily accessing relevant information. This study aims to develop an automated academic news classification system to address this issue. We applied a Naïve Bayes Classifier model, enhanced with Term Frequency weighting, the Enhanced Confix Stripping Stemmer for Indonesian language preprocessing, and Chi-Squared feature selection to identify the most informative terms. The dataset consisted of 880 academic news articles from UIN Walisongo’s website, split into 704 training and 176 testing documents. The system achieved 95% accuracy on the test set. To evaluate generalizability, we used a separate evaluation set of 12 new articles, obtaining 83.3% accuracy. The preprocessing stage played a vital role in reducing morphological complexity, while Chi-Squared scoring improved the relevance of selected features. This research highlights the importance of robust text classification techniques in academic information systems, particularly in Indonesian language contexts where language morphology poses unique challenges. The proposed model demonstrates strong performance, scalability, and potential for integration into academic portals to improve information retrieval. This study contributes significantly to the field of Natural Language Processing and applied machine learning in academic settings, especially for Indonesian-language content. It provides an effective solution for automated academic content management in institutional information systems.
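Chi-squared feature selection scores each term by how strongly its presence depends on a class. It uses the 2x2 table of document counts a, b, c, d (see the docstring) and χ² = N(ad − cb)² / ((a + c)(b + d)(a + b)(c + d)). The sketch keeps the k terms whose maximum score over classes is highest.

Taking the maximum over classes, the function names, the category labels, and the toy documents are assumptions, not details from the abstract.

```python
def chi_squared(docs, labels, term, cls):
    """Chi-squared statistic for term/class independence from a 2x2 table.

    a: docs in `cls` containing `term`    b: docs outside `cls` containing `term`
    c: docs in `cls` without `term`       d: docs outside `cls` without `term`
    """
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term, in_cls = term in doc, label == cls
        if has_term and in_cls:
            a += 1
        elif has_term:
            b += 1
        elif in_cls:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-squared score over any class."""
    vocab = sorted({t for doc in docs for t in doc})
    classes = set(labels)
    score = {t: max(chi_squared(docs, labels, t, c) for c in classes) for t in vocab}
    return sorted(vocab, key=lambda t: score[t], reverse=True)[:k]


# Hypothetical stemmed articles as term sets (wisuda = graduation,
# mahasiswa = student, rektor = rector, dosen = lecturer)
docs = [{"wisuda", "mahasiswa"}, {"wisuda", "rektor"}, {"seminar", "dosen"}, {"seminar", "mahasiswa"}]
labels = ["graduation", "graduation", "seminar", "seminar"]
selected = select_features(docs, labels, k=2)
```

A term such as "mahasiswa" that appears equally in both categories scores 0 and is discarded first.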
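Dinamika Opini Publik Terkait Quarter Life Crisis Pada Media Sosisal X Menggunakan Support Vector Machine [Public Opinion Dynamics on the Quarter Life Crisis on Social Media X Using Support Vector Machine]
Septyorini, Talitha Dwi; Umam, Khothibul; Handayani, Maya Rini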
Jurnal Informatika: Jurnal Pengembangan IT Vol 10, No 3 (2025)
Publisher : Politeknik Harapan Bersama

DOI: 10.30591/jpit.v10i3.8648

Abstract

This study analyzes the dynamics of public opinion about the quarter life crisis on platform X through machine-learning-based sentiment analysis. The Support Vector Machine (SVM) algorithm is used to classify text data into positive and negative sentiment. A total of 6,312 tweets containing the keyword “quarter life crisis” were collected from January 2024 to January 2025. The data were then processed through text cleaning, tokenization, stopword removal, stemming, and lexicon-based sentiment labeling. Classification was performed with SVM using an 80/20 train/test split. The results showed an accuracy of 81.57%, with a sentiment distribution of 59.3% negative and 40.7% positive. The implementation ran on Google Colab and was evaluated with a confusion matrix and a classification report. The findings demonstrate the effectiveness of SVM in analyzing psychosocial phenomena on social media. They also provide an empirical basis for developing mental health interventions based on digital data. The machine learning pipeline optimized in this study can serve as a reference for other studies analyzing psychological phenomena on social media.
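Lexicon-based labeling and the 80/20 split can be sketched together. A tweet's score is its count of positive lexicon words minus its count of negative ones. Tweets scoring zero are dropped; this is an assumption, since the abstract reports only positive and negative classes.

The toy lexicon, the tweets, and the function names are hypothetical; the study's actual lexicon is not named in the abstract. Applied to 6,312 tweet indices, the rounding used here yields 5,050 training and 1,262 test items. This is an illustration of the arithmetic, not counts reported by the study.

```python
import random

# Hypothetical toy lexicon (Indonesian): semangat = spirited, bersyukur = grateful,
# tenang = calm, cemas = anxious, bingung = confused, lelah = tired
POSITIVE = {"semangat", "bersyukur", "tenang"}
NEGATIVE = {"cemas", "bingung", "lelah"}


def lexicon_label(tokens):
    """Label by (#positive words - #negative words); None when the score is 0."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # neutral tweets are dropped in this two-class sketch


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle a copy of `items` and return (train, test)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_ratio)
    return items[n_test:], items[:n_test]


tweets = [["cemas", "masa", "depan"], ["tetap", "semangat", "bersyukur"], ["hari", "biasa"]]
labelled = [(t, lexicon_label(t)) for t in tweets if lexicon_label(t) is not None]
train, test = train_test_split(range(6312))  # tweet indices
```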
Sentiment Analysis of User Reviews on the Game GTA V Using Support Vector Machine
Saputra, Adika Kaka; Handayani, Maya Rini; Wibowo, Nur Cahyo Hendro; Umam, Khothibul
Jurnal Sisfokom (Sistem Informasi dan Komputer) Vol. 14 No. 3 (2025): JULY
Publisher : ISB Atma Luhur

DOI: 10.32736/sisfokom.v14i3.2368

Abstract

This study explores user sentiment toward the game Grand Theft Auto V (GTA V) by analyzing 101,540 user reviews collected from Steam and Kaggle. The reviews were processed using standard text preprocessing techniques including case folding, tokenization, stopword removal, and stemming. The TF-IDF method was used to convert text into numerical vectors, and sentiment classification was conducted using the Support Vector Machine (SVM) algorithm. The model was evaluated with accuracy, precision, recall, and F1-score as performance metrics. Results show that 88.8% of reviews are positive, while 11.2% are negative. The SVM model achieved an accuracy of 94.2% and an F1-score of 94.2%, indicating high reliability. Wordcloud analysis highlights key aspects valued by users such as graphics, story, and gameplay, while negative sentiment is often associated with technical issues like lag and bugs. This study demonstrates the effectiveness of combining TF-IDF and SVM for sentiment classification in the gaming domain, and it offers a scalable approach for understanding public opinion in digital platforms.
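A linear SVM like the one described can be trained on TF-IDF vectors with the Pegasos stochastic sub-gradient method, sketched below in plain Python. Pegasos is swapped in for illustration only; the abstract does not say how the SVM was trained, and a library such as scikit-learn's `LinearSVC` would be a common choice.

The two-feature toy vectors stand in for TF-IDF weights of words like "graphics" and "lag". They, the function names, and the `lam` and `epochs` values are assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=500, seed=0):
    """Pegasos-style training of a linear SVM (hinge loss + L2 penalty).

    Labels must be +1 / -1. A constant 1.0 feature is appended so the
    last weight acts as a (regularised) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink (regularisation)
            if margin < 1:  # hinge loss active: step toward the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    """Return +1 or -1 from the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical [weight("graphics"), weight("lag")] vectors; +1 = positive review
X = [[0.9, 0.0], [0.8, 0.1], [0.7, 0.0], [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```

Pegasos shrinks the weights by (1 − ηλ) at every step and adds ηyx only when the margin is below 1. With η = 1/(λt), the first step simply sets w to a scaled copy of the first example.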
User Opinion Mining on the Maxim Application Reviews Using BERT-Base Multilingual Uncased
Safitri, Sindy Eka; Yuniarti, Wenty Dwi; Handayani, Maya Rini; Umam, Khothibul
Jurnal Sisfokom (Sistem Informasi dan Komputer) Vol. 14 No. 3 (2025): JULY
Publisher : ISB Atma Luhur

DOI: 10.32736/sisfokom.v14i3.2391

Abstract

Online transportation applications such as Maxim are increasingly used due to the convenience they offer in ordering services. As usage increases, the number of user reviews also grows, serving as a valuable source of information for evaluating customer satisfaction and service quality. Sentiment analysis of these reviews can help companies understand user perceptions and improve service quality. This study aims to analyze the sentiment of user reviews on the Maxim application using the BERT-Base Multilingual Uncased model. BERT was chosen because it models sentence context bidirectionally. Previous studies reported that it outperformed traditional models such as MultinomialNB and SVM, with an accuracy of 75.6%. The dataset consists of 10,000 user reviews with an imbalanced distribution: 4,000 negative, 2,000 neutral, and 4,000 positive reviews. The data was split into 90% training data (9,000 reviews) and 10% test data (1,000 reviews). Of the 9,000 training reviews, 15% (1,350 reviews) were allocated as validation data, leaving a final training set of 7,650 reviews. Evaluation shows that BERT classifies sentiment into three categories (positive, neutral, and negative) with an accuracy of 94.7%. The highest F1-score was achieved in the positive class (0.9621), followed by the neutral class (0.9412) and the negative class (0.9246). The confusion matrix shows that most predictions match the actual labels. These findings indicate that BERT is an effective and reliable model for sentiment analysis of user reviews of online transportation applications such as Maxim.
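The split sizes reported in the abstract follow from two steps. First, the 10% test set is carved off. Then 15% of the remaining 9,000 reviews is taken for validation. The helper below (a hypothetical name) reproduces that arithmetic.

```python
def split_sizes(total, test_frac=0.10, val_frac=0.15):
    """Return (train, validation, test) sizes.

    The test set is taken from the full dataset first; the validation set
    is then a fraction of the remaining training pool.
    """
    n_test = round(total * test_frac)
    pool = total - n_test
    n_val = round(pool * val_frac)
    return pool - n_val, n_val, n_test


sizes = split_sizes(10_000)  # (7650, 1350, 1000)
```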
Unveiling Public Sentiment on Quarter Life Crisis: A Comparative Performance Evaluation of Support Vector Machine and Naïve Bayes Algorithms on Social Media X Data
Septyorini, Talitha Dwi; Umam, Khothibul; Handayani, Maya Rini
Jurnal Sisfokom (Sistem Informasi dan Komputer) Vol. 14 No. 3 (2025): JULY
Publisher : ISB Atma Luhur

DOI: 10.32736/sisfokom.v14i3.2405

Abstract

Quarter Life Crisis (QLC) is a psychological issue experienced by many young adults, characterized by uncertainty, anxiety, and emotional distress. In the digital era, public opinion about QLC is increasingly expressed on social media, particularly platform X. Despite this growing discourse, no previous study has specifically compared classification algorithms for analyzing public sentiment on QLC. This study classifies public opinion on QLC into positive and negative sentiment using two classification models, Support Vector Machine (SVM) and Naïve Bayes (NB). Data were collected by crawling platform X from November 2024 to January 2025, yielding 1,120 tweets. The data underwent preprocessing, lexicon-based sentiment labeling, and TF-IDF term weighting. The SVM and NB classifiers were then evaluated by accuracy, precision, recall, and F1-score. SVM achieved an accuracy of 83%, outperforming NB at 74%, which indicates that SVM performs better of the two for analyzing public sentiment on QLC. This research contributes empirical evidence on algorithm performance for sentiment analysis of mental health topics. It also offers recommendations for early detection strategies that use social media data.