Articles

Sentiment Classification Analysis of Tokopedia Reviews Using TF-IDF, SMOTE, and Traditional Machine Learning Models Barus, Herianta; Fajri, Ika Nur; Pristyanto, Yoga
Journal of Applied Informatics and Computing Vol. 9 No. 5 (2025): October 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i5.10524

Abstract

This study explores sentiment classification on Tokopedia user reviews using TF-IDF for feature extraction and SMOTE to handle class imbalance. From nearly one million raw reviews sourced from Kaggle ("E-Commerce Ratings and Reviews in Bahasa Indonesia"), a final set of 6,477 relevant entries was obtained after rigorous preprocessing, including case folding, noise removal (emojis, URLs, numbers), normalization to KBBI standards, tokenization, stopword removal, and stemming with Sastrawi. The dataset consisted of 5,213 positive and 1,264 negative reviews (80.4% positive). SMOTE balanced the classes to 10,426 reviews with a 1:1 ratio for training. Five traditional machine learning models were evaluated: Naive Bayes, Logistic Regression, Support Vector Machine (SVM), Decision Tree, and Random Forest. Assessments were based on accuracy, precision, recall, F1-score, ROC-AUC, and computational time, using an 80:20 stratified split and 5-fold cross-validation. Random Forest achieved the best overall performance (accuracy: 0.9163, F1-score: 0.9133, ROC-AUC: 0.9784), while tuned SVM (C=10, RBF kernel) attained the highest accuracy of 0.9473 and F1-score of 0.9321. Cross-validation on Naive Bayes showed consistent results with an average accuracy of 88.09%. Further analysis using Logistic Regression coefficients identified influential features: positive sentiment associated with words like "mantap", "mudah", and "sukses", while negative sentiment correlated with "kecewa", "parah", and "lemot". These insights provide practical value for Tokopedia's teams to enhance user experience, such as improving app speed and addressing complaints. The findings demonstrate the effectiveness and efficiency of traditional machine learning techniques for sentiment analysis in Bahasa Indonesia contexts.
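The class-balancing step described above can be illustrated with a minimal, standard-library sketch of SMOTE-style interpolation. This is not the authors' code: the function name `smote_like_oversample`, the toy 2-D vectors (standing in for TF-IDF rows), and the choice of `k` are all hypothetical, and a real pipeline would more likely rely on an existing SMOTE implementation.

```python
import math
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical toy feature vectors: 6 majority rows, 3 minority rows.
majority = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
            (0.7, 0.1), (0.95, 0.05), (0.75, 0.2)]
minority = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.7)]

# Generate just enough synthetic minority rows to reach a 1:1 ratio.
new_points = smote_like_oversample(
    minority, n_new=len(majority) - len(minority), k=2
)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Each synthetic row lies on the line segment between two existing minority rows, which is why this kind of oversampling adds variety instead of duplicating samples.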
Public Sentiment Analysis on Corruption Issues in Indonesia Using IndoBERT Fine-Tuning, Logistic Regression, and Linear SVM Kono, Maria Fatima; Fajri, Ika Nur; Pristyanto, Yoga
Journal of Applied Informatics and Computing Vol. 9 No. 5 (2025): October 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i5.10537

Abstract

Sentiment analysis is a method in Natural Language Processing (NLP) that aims to understand public perceptions based on textual data from social media. Opinions expressed on digital platforms play an important role because they reflect public trust and attitudes toward strategic issues in Indonesia. This study compares the performance of three IndoBERT-based approaches for sentiment classification: IndoBERT with full fine-tuning, IndoBERT as a feature extractor combined with Logistic Regression, and IndoBERT as a feature extractor combined with Linear SVM. The dataset was collected through the Twitter API and consisted of 2,012 tweets; preprocessing and balancing resulted in 2,252 labeled samples with positive and negative sentiments. The preprocessing stage included cleansing, normalization, tokenization, stopword removal, and stemming. The dataset was then split into 80% training data, 10% validation data, and 10% testing data. Experimental results show that IndoBERT with full fine-tuning achieved the best performance, with an accuracy of 82.67%, an F1-score of 82.35%, and an AUC value of 0.87. Logistic Regression and Linear SVM produced lower accuracies of 80.20% and 78.22%, respectively. These findings indicate that fine-tuned IndoBERT is more effective in capturing the semantic nuances of the Indonesian language, while the approaches without fine-tuning offer better computational efficiency at the cost of reduced accuracy. This study contributes to the development of NLP methods for the Indonesian language, particularly in sentiment analysis, and highlights the potential of transformer-based models for analyzing strategic issues on social media.
Comparison of Light Gradient Boosting Machine, eXtreme Gradient Boosting, and CatBoost with Balancing and Hyperparameter Tuning for Hypertension Risk Prediction on Clinical Dataset Murtiningsih, Dewi Ayu; Sari, Bety Wulan; Fajri, Ika Nur
Journal of Applied Informatics and Computing Vol. 9 No. 5 (2025): October 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i5.10400

Abstract

Hypertension is a chronic, highly prevalent condition that contributes significantly to cardiovascular disease, making early identification a crucial preventive measure. This research evaluates the efficacy of three boosting algorithms, namely eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), and CatBoost, in predicting hypertension risk. A publicly accessible dataset of 4,363 samples was used, followed by data preprocessing, feature selection through a voting method that integrates Boruta, Recursive Feature Elimination (RFE), and SelectKBest, and handling of class imbalance using the Synthetic Minority Over-sampling Technique (SMOTE) and the Adaptive Synthetic Sampling Approach (ADASYN). The models were additionally tuned through hyperparameter optimization using GridSearchCV and Repeated Stratified K-Fold Cross Validation. The evaluation results show that all three algorithms exhibited strong predictive capabilities, with CatBoost performing best, achieving an accuracy of 0.992, precision of 0.992, recall of 0.992, F1-score of 0.992, and ROC-AUC of 0.9987. The confusion matrix further confirmed that CatBoost had the fewest misclassifications compared with XGBoost and LGBM. Additionally, SHapley Additive exPlanations (SHAP) for model interpretability showed that the key factors influencing the prediction of hypertension risk are blood pressure, body mass index (BMI), overall physical activity, waist circumference, triglyceride levels, age, and LDL cholesterol levels, which aligns with established medical knowledge. To facilitate real-world use, the top-performing model was deployed in a user-friendly website interface that lets users predict their hypertension risk interactively. These findings illustrate that boosting algorithms, especially CatBoost, offer an accurate, dependable, and interpretable machine learning method for building hypertension risk prediction systems.
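The voting-based feature selection mentioned above can be sketched as a simple majority vote over the feature sets returned by each selector. The abstract does not state the voting rule, so the "at least two of three" threshold, the function name `vote_features`, and the example feature sets below are all assumptions made for illustration.

```python
from collections import Counter

def vote_features(selections, min_votes=2):
    """Keep features chosen by at least `min_votes` of the selectors."""
    votes = Counter(f for sel in selections for f in set(sel))
    return sorted(f for f, v in votes.items() if v >= min_votes)

# Hypothetical outputs of three selectors (not taken from the paper).
boruta = {"blood_pressure", "bmi", "age", "triglycerides"}
rfe = {"blood_pressure", "bmi", "waist_circumference", "age"}
kbest = {"blood_pressure", "ldl", "bmi", "physical_activity"}

selected = vote_features([boruta, rfe, kbest], min_votes=2)
print(selected)
```

Raising `min_votes` to 3 would keep only the features every selector agrees on, at the cost of a smaller feature set.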
Sentiment Analysis of the Film "JUMBO" on Twitter Using the Naive Bayes Method and Support Vector Machine (SVM) with a Text Mining Approach Widodo, Tegar Robi; Fajri, Ika Nur; Sari, Bety Wulan
Journal of Applied Informatics and Computing Vol. 9 No. 5 (2025): October 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i5.10557

Abstract

This study aims to perform sentiment analysis on reviews of the film “JUMBO” collected from the Twitter platform, using the Naive Bayes and Support Vector Machine (SVM) methods. The data were gathered through a crawling process on Twitter, yielding 2,011 tweets, which were then processed through several pre-processing steps, including case folding, cleaning, normalization, tokenization, stopword removal, and stemming. Subsequently, the data were transformed into numerical representations using TF-IDF, followed by sentiment labeling into positive, negative, and neutral categories. For the Naive Bayes method, training and evaluation were conducted using 5-fold Cross Validation. The results showed that the Naive Bayes model achieved an accuracy of 80.60%, precision of 73.83%, recall of 73.50%, and an F1-score of 69.98%. Meanwhile, the SVM method obtained an accuracy of 75.87%, precision of 76.36%, recall of 62.45%, and an F1-score of 65.64%. Compared to the baseline random classifier, which only achieved an accuracy of 32.47%, both primary methods significantly outperformed it in classifying film review sentiments. The analysis also indicates that the F1-score is lower than the accuracy due to the imbalanced data distribution, with a considerably higher number of positive reviews. This study also presents visualizations of sentiment distribution and word clouds to provide a clearer understanding of audience opinions. The results demonstrate that the Naive Bayes method performs well and has potential for use in sentiment analysis of films on social media platforms. These findings are expected to provide valuable insights for the creative industry, particularly in evaluating audience responses and improving the quality of future film productions.
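The remark that F1-score falls below accuracy on imbalanced data can be reproduced with a small toy example. The sketch below assumes macro-averaged F1 (the abstract does not say which averaging was used) and uses made-up labels, not the study's data.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for a single class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced test set: 8 positive, 1 negative, 1 neutral.
y_true = ["pos"] * 8 + ["neg"] + ["neu"]
# The classifier gets every positive right but misses the only negative.
y_pred = ["pos"] * 8 + ["pos"] + ["neu"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
labels = ["pos", "neg", "neu"]
macro_f1 = sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)
print(round(accuracy, 3), round(macro_f1, 3))
```

A single missed minority-class review barely moves accuracy, but it sets that class's F1 to zero and pulls the macro average down sharply.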
PREDICTION OF STROKE USING LOGISTIC REGRESSION WITH A MACHINE LEARNING APPROACH Rana Aphrodita, Ishiqa; Nur Fajri, Ika; Nugroho, Agung
JURTEKSI (jurnal Teknologi dan Sistem Informasi) Vol. 11 No. 4 (2025): September 2025
Publisher : Lembaga Penelitian dan Pengabdian Kepada Masyarakat (LPPM) STMIK Royal Kisaran

DOI: 10.33330/jurteksi.v11i4.4161

Abstract

Stroke is one of the leading causes of death and disability worldwide, including in Indonesia. As digital technology develops, the use of machine learning in the health sector is growing, including in efforts to predict the occurrence of stroke. This study implements the Logistic Regression algorithm to predict the likelihood of a person having a stroke based on data from the Brain Stroke dataset. The research process includes data preprocessing (missing value handling, normalization, and label encoding), splitting the data into 80% training data and 20% test data, and model training. The model was then evaluated using accuracy, precision, recall, F1-score, and ROC-AUC, as well as a confusion matrix. The results show that Logistic Regression classified stroke with an accuracy of 82.4%, precision of 80.1%, recall of 78.6%, F1-score of 79.3%, and a ROC-AUC value of 0.87. The model was then integrated into a Streamlit application so that it can be used interactively to predict stroke risk for new data. These results show that combining machine learning with web-based applications has the potential to support early detection of stroke risk. Keywords: logistic regression; machine learning; prediction; streamlit; stroke.
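As a quick consistency check on the stroke-prediction metrics reported above, the F1-score can be recomputed as the harmonic mean of the stated precision (80.1%) and recall (78.6%). This uses only the figures in the abstract and assumes all three were measured on the same class and test set.

```latex
F_1 = \frac{2 \cdot P \cdot R}{P + R}
    = \frac{2 \times 0.801 \times 0.786}{0.801 + 0.786}
    = \frac{1.259172}{1.587}
    \approx 0.793
```

The result matches the reported F1-score of 79.3%.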
FROZEN FOOD SALES SYSTEM AT DAKON STORE USING FRAMEWORK FOR THE APPLICATION SYSTEM THINKING METHOD Mangli, Luh Ajeng Roro; Fajri, Ika Nur
ZONAsi: Jurnal Sistem Informasi Vol. 6 No. 3 (2024): Publikasi artikel ZONAsi: Jurnal Sistem Informasi Periode September 2024
Publisher : Universitas Lancang Kuning

DOI: 10.31849/zn.v6i3.21796

Abstract

Increasingly advanced information and communication technology has had wide-ranging effects, including a significant need for the Internet. This technological development is disrupting the business sector, especially trade, which requires a shift from conventional stores to online stores to accelerate and increase sales through e-commerce, which expands market share without limits. The Dakon frozen food shop, established in 2018, struggles with conventional sales, which force buyers to come to the shop, and with time-consuming manual stock and sales data collection. To overcome these problems, the author proposes developing a website-based information system using the FAST (Framework for the Application of System Thinking) method, which makes it easier to design systems, analyze needs, and build appropriate systems. Implementing this system is expected to expand the reach of buyers, increase sales, and improve governance. With the FAST method, various operational challenges can be overcome more effectively. Payments have become more efficient through automation of the sales process, although improvements to the website's appearance are still needed to improve the user experience.
Magelang Tourism Recommendation System Using the Collaborative Filtering Method [Sistem Rekomendasi Wisata Magelang Menggunakan Metode Collaborative Filtering] Siska, Siska; Fajri, Ika Nur; Rayhan, Radhita; Pratama, Akbar; Rohman, Arif Nur
Eksplora Informatika Vol 14 No 1 (2024): Jurnal Eksplora Informatika
Publisher : Institut Teknologi dan Bisnis STIKOM Bali

DOI: 10.30864/eksplora.v14i1.1084

Abstract

Tourism has become a popular activity enjoyed by many people, including in Indonesia, which has many well-known destinations. Magelang, one of Indonesia's regions, has great tourism potential, with attractions ranging from historical to natural sites. This study discusses the development of a recommendation system for tourist destinations in Magelang using the collaborative filtering method. The data come from kaggle.com and include ratings and user profile information. An age analysis shows high participation from the 21-30 age group, an active segment in tourism. Most users come from the island of Java, which adds a cultural dimension to the study. The method uses collaborative filtering to generate destination recommendations based on user preferences. Testing was carried out on User_Id 1 and produced varied recommendations, with a predicted score of about 3.81 for the top three places. These results show that the recommendation system can help users find destinations that match their preferences. The study's conclusion underlines the potential of recommendation systems to improve the travel experience and support the development of the tourism sector in Magelang.
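How collaborative filtering arrives at a predicted score for a place the user has not yet rated can be sketched as follows. The abstract does not say which variant was used, so this assumes user-based filtering with cosine similarity over co-rated places; the user IDs, place names, and ratings are hypothetical toy values.

```python
import math

# Hypothetical ratings: user -> {place: rating on a 1-5 scale}.
ratings = {
    "user_1": {"place_A": 5, "place_B": 4},
    "user_2": {"place_A": 5, "place_B": 4, "place_C": 4},
    "user_3": {"place_A": 2, "place_B": 5, "place_C": 3},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the places both users rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][p] * ratings[v][p] for p in common)
    norm_u = math.sqrt(sum(ratings[u][p] ** 2 for p in common))
    norm_v = math.sqrt(sum(ratings[v][p] ** 2 for p in common))
    return dot / (norm_u * norm_v)

def predict(user, place):
    """Similarity-weighted average of other users' ratings for `place`."""
    weighted, total = 0.0, 0.0
    for other, their_ratings in ratings.items():
        if other == user or place not in their_ratings:
            continue
        sim = cosine_similarity(user, other)
        weighted += sim * their_ratings[place]
        total += sim
    return weighted / total if total else None

score = predict("user_1", "place_C")
print(round(score, 2))
```

Because `user_2` rates the shared places exactly as `user_1` does, their rating of `place_C` carries more weight than that of `user_3`, so the prediction lands closer to 4 than to 3.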
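Use of a Website-Based Information System to Support Employee Data Administration Management at Yayasan Taruna Alquran Sleman Yogyakarta [Pemanfaatan Sistem Informasi Berbasis Website untuk Mendukung Pengelolaan Administrasi Data Karyawan Yayasan Taruna Alquran Sleman Yogyakarta] Nurmasani, Atik; Dyah Anggita, Sharazita; Dwi Hartanto, Anggit; Pujastuti, Eli; Asti Astuti, Ika; Pristyanto, Yoga; Nur Fajri, Ika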
Jurnal Pengabdian Masyarakat Inovasi Indonesia Vol 3 No 4 (2025): JPMII - Agustus 2025
Publisher : CV Firmos

DOI: 10.54082/jpmii.829

Abstract

Applying an information system in an institution is important for supporting business processes. Yayasan Taruna Al-Quran wants to make the most of technology in managing the administrative data of its work units. The problems in administrative data management are limited archive management and a suboptimal data search process. A website-based information system was built to address these administrative management problems and to give all work units easy access. The method applied in the activity consisted of planning, implementation, and evaluation. The planning stage produced a plan that matched the partner's needs as the basis for implementation. The implementation stage produced an information system ready to be handed over to the partner. The evaluation produced user feedback from the partner on the information system, in which users found the system easy to use, with a score of 5.9 or 86%. The implemented information system helps the partner manage employee administrative data easily. All users can access data online as needed.
IMPLEMENTATION OF RANDOM FOREST CLASSIFIER FOR STUDENT GRADUATION CLASSIFICATION Zaidan Putra, Bazil; Nur Fajri, Ika; Nugroho, Agung
JURTEKSI (jurnal Teknologi dan Sistem Informasi) Vol. 12 No. 1 (2025): Desember 2025
Publisher : Lembaga Penelitian dan Pengabdian Kepada Masyarakat (LPPM) STMIK Royal Kisaran

DOI: 10.33330/jurteksi.v12i1.4160

Abstract

Higher education plays an essential role in improving human resource quality, one aspect of which is the institution's ability to monitor and predict student graduation outcomes. This study does not focus on a specific university but uses the publicly available Students Performance in Exams dataset from Kaggle, consisting of 1,000 student records containing mathematics, reading, and writing scores, along with demographic attributes such as gender, parental education level, lunch type, and test preparation participation. The data were processed through a feature engineering stage that added an average score variable as an early indicator of graduation status. A predictive model was developed using the Random Forest Classifier, achieving an accuracy of 94.5%. The final model was integrated into a Streamlit-based web application to provide an accessible tool for academic stakeholders. The results indicate that the proposed model can serve as an effective decision-support tool for early evaluation of students' likelihood of graduation. Keywords: prediction; random forest classifier; streamlit; student graduation.
Design and Implementation of a Website-Based Information System at Toko Sembako Sayur Amanah [Perancangan dan Implementasi Sistem Informasi Berbasis Website pada Toko Sembako Sayur Amanah] Radhita Rayhan; Ika Nur Fajri
Jurnal Teknologi Informasi dan Multimedia Vol. 7 No. 1 (2025): February
Publisher : Sekawan Institut

DOI: 10.35746/jtim.v7i1.656

Abstract

In the midst of the rapid development of digital technology, various business sectors, including the trade sector, have begun to adopt digital-based information systems to improve operational efficiency and effectiveness. Toko Sembako Sayur Amanah currently still relies on a manual system for recording transactions, managing stock items, and financial reporting using a cash book. This manual system causes the sales process to be inefficient, time-consuming, and prone to errors such as misrecording or data loss. In addition, the manual system is unable to meet the needs of customers who have limited time and makes it difficult to manage transactions and stock items effectively. To overcome these problems, this research aims to design and implement a website-based information system using the Waterfall method, which includes requirements analysis, system design, implementation, and system testing. Testing is carried out with a Black-box Testing approach to ensure the suitability of system functionality with predetermined needs. The test results show that the developed system has succeeded in increasing the efficiency of managing categories and goods by the admin and making it easier for customers to place orders and make payments online. This research is expected to be a reference for the development of similar systems in other grocery stores with the potential to increase competitiveness in an increasingly competitive market. As a follow-up, this research opens up opportunities for further development, such as integration with mobile applications or more sophisticated inventory management systems.