Articles

Application of SVM and Naive Bayes with PSO for the Classification of Saloka Amusement Park Reviews
Putri, Indira Alifia; Umam, Khothibul; Handayani, Maya Rini; Mustofa, Hery
Journal La Multiapp Vol. 6 No. 6 (2025): Journal La Multiapp
Publisher : Newinera Publisher

DOI: 10.37899/journallamultiapp.v6i6.2505

Abstract

Visitor opinions on tourist destinations can be evaluated through sentiment analysis of textual reviews. This study compared the performance of the Support Vector Machine (SVM) and Naive Bayes (NB) algorithms in classifying the sentiment of visitor reviews of Saloka Theme Park, and assessed the impact of parameter optimization using Particle Swarm Optimization (PSO). A total of 740 reviews were collected from the Traveloka platform and preprocessed. The optimization targeted key parameters of each algorithm to improve the F1-score. Without optimization, SVM achieved an accuracy of 89 percent and NB 86 percent. After applying PSO, SVM's accuracy dropped to 84 percent, while NB reached 85 percent with more balanced classification across sentiment classes. Based on these results, the study recommends combining Naive Bayes with Particle Swarm Optimization as a potential approach for sentiment classification of tourism reviews, particularly in the case of Saloka Theme Park.
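
As a rough illustration of how PSO can search a small parameter space, the sketch below minimizes a toy loss over two parameters. It is not the study's implementation: the abstract does not name the tuned parameters, so `toy_loss`, the two parameter names, the bounds, and the swarm settings (`inertia`, `c_personal`, `c_global`) are all assumptions chosen for illustration. In practice the objective would presumably be something like one minus the F1-score on a validation fold.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` inside the box `bounds` = [(low, high), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random starting positions inside the box, zero starting velocities.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=personal_score.__getitem__)
    global_best = personal_best[best_idx][:]
    global_score = personal_score[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward own best + pull toward swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


def toy_loss(params):
    # Hypothetical stand-in for a validation loss; its minimum is at (3.0, 0.5).
    c, gamma = params
    return (c - 3.0) ** 2 + (gamma - 0.5) ** 2


best_params, best_loss = pso_minimize(toy_loss, bounds=[(0.1, 10.0), (0.01, 1.0)])
print(best_params, best_loss)
```

Swapping `toy_loss` for a function that trains a classifier and returns a validation loss would turn this into a tuning loop, at the cost of one model fit per particle per iteration.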

Comparison of SpaCy and BERT Models for Fan Distribution on Platform X (Twitter) [original title: Perbandingan Model SpaCy dan BERT untuk Persebaran Penggemar di Platform X (Twitter)]
Rahmadani, Nurul; Umam, Khothibul; Dwi Yuniarti, Wenty; Rini Handayani, Maya
Jurnal Algoritma Vol 22 No 2 (2025): Jurnal Algoritma
Publisher : Institut Teknologi Garut

DOI: 10.33364/algoritma/v.22-2.2310

Abstract

This study compares the performance of the SpaCy Named Entity Recognition (NER) model and the Bidirectional Encoder Representations from Transformers (BERT) model in identifying the distribution of Bernadya fans based on mentions of Geo-Political Entity (GPE) locations. The dataset was collected from X users' tweets by scraping with Python and analyzed with both NER models. The SpaCy NER model was built from scratch with manual annotation, while the BERT model was built using the Transformers approach. In the evaluation, the SpaCy model achieved a precision of 1.00, a recall of 0.92, and an F1-score of 0.96 on the training data, as well as a recall of 0.98 and an F1-score of 0.99 on the test data. The BERT model recorded a precision of 1.00, a recall of 0.95 (training) and 1.00 (testing), and F1-scores of 0.98 and 1.00, respectively. The SpaCy model can recognize more than two entities well in a single test sentence; however, when applied to the entire dataset, it does not recognize GPE entities consistently. The BERT model, by contrast, recognizes GPE entities better and identified four of them, Karanganyar, Indonesia, Mongolia, and Bandung, as the regions with the most mentions. For the dataset used in this study, the BERT model is therefore better at recognizing GPE entities.

Comparative Study of SVM and Decision Tree Algorithms on the Effect of SMOTE Technique on LinkAja Application
Faruq, Muhammad Kholfan; Umam, Khothibul; Mustofa, Mokhamad Iklil; Mahfudh, Adzhal Arwani
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.9806

Abstract

The widespread adoption of digital wallets like LinkAja in Indonesia has led to a surge in user-generated reviews, which are valuable for assessing service quality. This study compares the classification performance of Support Vector Machine (SVM) and Decision Tree algorithms on user reviews of the LinkAja application. A total of 7,000 reviews were gathered through web scraping and processed with standard text cleaning, tokenization, stopword removal, and stemming, resulting in 6,261 usable entries. These were divided into training and testing sets in a 70:30 ratio. The performance of each algorithm was evaluated both before and after applying the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance. Prior to SMOTE, SVM recorded an accuracy of 77.97%, precision of 0.74, recall of 0.33, and F1 score of 0.45, while Decision Tree reached 72.01% accuracy, 0.50 precision, 0.62 recall, and 0.55 F1 score. After SMOTE, SVM accuracy improved slightly to 78.29%, with notable increases in recall (0.74) and F1 score (0.60); Decision Tree accuracy also rose, to 74.56%, but its F1 score declined slightly to 0.52. These findings indicate that SVM, particularly when used with SMOTE, offers better overall performance and class balance in classifying reviews with an imbalanced sentiment distribution, making it more suitable than Decision Tree for this application.
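
The core idea behind SMOTE, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration, not the study's pipeline: the function name `smote_like_oversample`, the toy 2-D points, and the choice of `k=2` are assumptions, and a real workflow would more likely rely on an established library implementation.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create `n_new` synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors (for example, reduced text features).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like_oversample(minority, n_new=6)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling should be applied to the training split only, so that no interpolated copies of test information leak into evaluation.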

Opinion Classification on IMDb Reviews Using Naïve Bayes Algorithm
Putri, Amiliya; Umam, Khothibul; Mustofa, Hery
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.9831

Abstract

This study aims to classify user opinions on IMDb movie reviews using the Multinomial Naïve Bayes algorithm. The dataset consists of 50,000 reviews, evenly distributed between 25,000 positive and 25,000 negative reviews. The preprocessing stage includes cleaning, case folding, stopword removal, tokenization, and lemmatization using the NLTK library. Text features are represented through the TF-IDF method to capture the significance of each word in the documents. The Multinomial Naïve Bayes model was trained using the hold-out validation technique with an 80:20 split for training and testing data. Hyperparameter tuning of α (Laplace smoothing) was conducted to enhance model stability and accuracy. The model’s performance was evaluated using accuracy, precision, recall, and F1-score metrics, supported by a confusion matrix visualization. The results show that the model achieved an accuracy of 87%, with precision of 87.9%, recall of 85.4%, and an F1-score of 86.6%. In comparison, Logistic Regression as a baseline algorithm achieved an accuracy of 91%. Nevertheless, the Naïve Bayes algorithm remains competitive and computationally efficient for large-scale text data, making it highly relevant for sentiment analysis of movie reviews.
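
The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The snippet below is only a consistency check on the figures quoted in the abstract; the helper name `f1_score` is an illustrative choice here, not a reference to any particular library function.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 87.9% and recall 85.4%, as reported in the abstract.
reported_f1 = f1_score(0.879, 0.854)
print(f"{reported_f1:.3f}")  # 0.866
```

The result agrees with the 86.6% F1-score stated above.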

Comparative Analysis of Penetration Testing Frameworks: OWASP, PTES, and NIST SP 800-115 for Detecting Web Application Vulnerabilities
Imtias, Muhamad Bunan; Umam, Khothibul; Mustofa, Hery; Subowo, Moh Hadi
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.9846

Abstract

Web application security faces increasingly complex challenges as digital architectures evolve, necessitating the selection of appropriate and effective penetration testing methods. This study presents a comparative analysis of the OWASP Testing Guide, PTES, and NIST SP 800-115 frameworks in detecting web application vulnerabilities. Through experiments on DVWA and OWASP Juice Shop, the frameworks were evaluated based on detection speed, vulnerability count, and severity. The results highlight a clear trade-off: OWASP proved the most efficient (85 minutes average, 59 total vulnerabilities), making it ideal for rapid assessments. PTES demonstrated the most comprehensive technical depth (63 vulnerabilities, highest severity) but required the most time, while NIST SP 800-115 (49 vulnerabilities) excelled in compliance and risk management integration. The study recommends selecting OWASP for efficiency, PTES for deep technical audits, and NIST for regulatory alignment.
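
Performance Analysis of Machine Learning Methods in Identifying the Causes of One-Star Rating Reviews of the MyBluebird Application [original title: Analisis Performa Metode Machine Learning dalam Mengidentifikasi Penyebab Ulasan Rating Satu Aplikasi MyBluebird]
Azziizah, Almira Farradinda; Mustofa, Hery; Umam, Khothibul; Handayani, Maya Rini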
Jurnal Ilmiah Global Education Vol. 6 No. 4 (2025): JURNAL ILMIAH GLOBAL EDUCATION
Publisher : LPPM Institut Pendidikan Nusantara Global

DOI: 10.55681/jige.v6i4.4704

Abstract

This study addresses the increasing prevalence of negative user reviews for the MyBluebird ride-hailing application, focusing on the identification and classification of the main causes of one-star ratings. The research compares the effectiveness of the Support Vector Machine, Random Forest, and Naïve Bayes algorithms in classifying user complaints. Employing a quantitative experimental approach, the study uses a dataset of 1,399 one-star reviews collected purposively from the Google Play Store. Data preprocessing includes cleaning, tokenization, and feature extraction using TF-IDF. The classification models are evaluated using accuracy, precision, recall, and F1-score metrics. Results indicate that Random Forest achieves the highest accuracy (90%), outperforming the other algorithms, with bugs/errors as the most frequent complaint, followed by driver performance, other issues, and price. The study concludes that machine learning-based classification can effectively map user dissatisfaction, though data imbalance remains a limitation. Future research should apply data balancing techniques and expand the dataset for broader generalization. In practical terms, developers can use automated classification to improve service quality and address user needs more efficiently.

Public Opinion on The MBG Program: Comparative Evaluation of InSet and VADER Lexicon Labeling Using SVM on Platform X
Zakiyah, Na'ilah Puti; Umam, Khothibul; Mahfudh, Adzhal Arwani
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.9978

Abstract

This study examines public opinion regarding the MBG program on platform X using the Support Vector Machine (SVM) algorithm with two sentiment labeling methods, the InSet Lexicon and the VADER Lexicon. The data was divided into 70% for training and 30% for testing, and features were extracted using Term Frequency-Inverse Document Frequency (TF-IDF) to convert the text into numerical representations. The SVM model was trained on both labeled datasets to compare their performance on evaluation metrics such as accuracy, precision, recall, and F1 score. The results show that labeling with VADER yields predominantly neutral sentiments, while the InSet Lexicon produces a more balanced distribution of positive, negative, and neutral sentiments. At the modeling stage, SVM with InSet labels achieved an accuracy of 80.10%, with precision of 0.81, recall of 0.80, and an F1 score of 0.79. Meanwhile, SVM with VADER labels achieved an accuracy of 93.83%, precision of 0.94, recall of 0.94, and an F1 score of 0.93. Although VADER showed higher accuracy, the InSet Lexicon is considered more efficient and relevant for sentiment analysis in Indonesia because it produces more balanced and contextual classifications.
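
To make the TF-IDF step concrete, the sketch below computes one common variant of the weighting (term frequency times the natural log of inverse document frequency) for a few short texts. The abstract does not say which TF-IDF variant or tooling was used, so the formula, the whitespace tokenizer, and the sample sentences are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical posts; "program" appears in every one of them.
docs = ["program helps students", "program costs too much", "program delayed"]
vectors = tf_idf(docs)
print(vectors[2])
```

A term that occurs in every document, such as "program" here, gets a weight of zero under this variant, which is why TF-IDF tends to emphasize the words that distinguish one post from another.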
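
Performance Evaluation of Random Forest, SVM, and Transformer for Classifying Online Gambling Comments on YouTube [original title: Evaluasi Kinerja Random Forest, SVM, dan Transformer untuk Klasifikasi Komentar Judi Online di Youtube]
Arroyan, Devina; Handayani, Maya Rini; Umam, Khothibul; Ulinuha, Masy Ari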
JUSTIN (Jurnal Sistem dan Teknologi Informasi) Vol 14, No 1 (2026)
Publisher : Jurusan Informatika Universitas Tanjungpura

DOI: 10.26418/justin.v14i1.94059

Abstract

The prevalence of comments promoting online gambling on YouTube raises concerns about digital comfort and safety, especially for young users. This study evaluates the performance of three text classification methods in detecting Indonesian-language online gambling comments: Transformer (IndoBERT), Support Vector Machine (SVM), and Random Forest. The dataset consists of 5,000 comments extracted from several YouTube videos, which were then manually labeled and preprocessed. Evaluation used an 80:20 train-test split, with accuracy, precision, recall, and F1-score as performance metrics. The results show that IndoBERT performed best, with an accuracy of 98.70% and an F1-score of 0.98, higher than SVM (88.85%) and Random Forest (79.62%). The study is limited by the size and diversity of its dataset, so model performance may change when applied to larger-scale data or to other comment domains. Future research could add data from a wider range of YouTube content categories and apply data augmentation techniques to improve model generalization.