Journal : JOURNAL OF APPLIED INFORMATICS AND COMPUTING

Evaluation of the Decision Tree Model for Air Condition Classification on the Global Air Pollution Dataset
Sabella, Cindy Dinda; Pristyanto, Yoga
Journal of Applied Informatics and Computing Vol. 8 No. 2 (2024): December 2024
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v8i2.8611

Abstract

Air pollution is an urgent global environmental problem with significant impacts on public health and ecosystem stability. This research aims to develop an air quality classification model using the Global Air Pollution dataset from Kaggle, which consists of 23,463 rows and 12 features, including important variables such as Air Quality Index (AQI), PM2.5, NO2, and O3. Decision Tree, Random Forest, and Support Vector Machine (SVM) algorithms are applied to perform classification, with a focus on hyperparameter tuning to increase model accuracy. The results show that the Decision Tree performs best, with an accuracy of 99.89% after hyperparameter tuning using the Grid Search method. The SVM model improved from 94.89% to 99.32%, while Random Forest recorded an accuracy of 96.87% with no significant improvement after tuning. Feature importance analysis identified PM2.5 and AQI as the dominant factors influencing air quality, with PM2.5 having the highest importance value of 0.93. This research confirms that machine learning can be an effective tool for integrating and classifying air pollution data. It is hoped that integrating this model into a real-time air quality monitoring system can support more responsive and precise decisions in dealing with air pollution problems.
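Grid Search, the tuning method named in this abstract, scores every combination of candidate hyperparameter values and keeps the best one. The sketch below illustrates only that idea on a toy threshold classifier; it is not the authors' Decision Tree, and the data, the `threshold` and `inclusive` parameters, and all function names are hypothetical. For brevity it scores each combination on the same toy data, whereas tools such as scikit-learn's GridSearchCV score with cross-validation.

```python
from itertools import product

# Toy data: (pm25, label) pairs with hypothetical values, 1 = "unhealthy".
SAMPLES = [(5.0, 0), (12.0, 0), (20.0, 0), (34.0, 0),
           (40.0, 1), (55.0, 1), (80.0, 1), (150.0, 1)]

def predict(pm25, threshold, inclusive):
    """Stand-in classifier: flag a reading at/above (or strictly above) the threshold."""
    return int(pm25 >= threshold) if inclusive else int(pm25 > threshold)

def accuracy(params, data):
    """Fraction of samples the stand-in classifier labels correctly."""
    hits = sum(predict(x, **params) == y for x, y in data)
    return hits / len(data)

def grid_search(grid, data):
    """Score every combination in the grid and return the first best one."""
    names = list(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = accuracy(params, data)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical candidate values for each hyperparameter.
GRID = {"threshold": [10.0, 25.0, 35.0, 60.0], "inclusive": [True, False]}
best_params, best_score = grid_search(GRID, SAMPLES)
print(best_params, best_score)
```

The cost grows with the product of the candidate-list lengths (here 4 × 2 = 8 evaluations), which is why grids are usually kept small.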

Generative AI Image Sentiment Analysis on Social Media X using TF-IDF and FastText
Saputra, Rahman; Pristyanto, Yoga; Fajri, Ika Nur
Journal of Applied Informatics and Computing Vol. 9 No. 5 (2025): October 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i5.10627

Abstract

This research investigates public opinion on AI-generated images on Social Media X using machine learning-driven text classification. Three classification models were evaluated: Complement Naïve Bayes (CNB) utilizing TF-IDF features, Support Vector Machine (SVM) merging TF-IDF with FastText embeddings, and IndoBERT as a modern transformer-based baseline. A total of 1,958 Indonesian tweets were collected via web scraping with relevant keywords, followed by a pipeline involving text cleaning, manual labeling into positive, negative, and neutral categories, and data balancing using the Synthetic Minority Over-sampling Technique (SMOTE) for the classical models (with class weighting applied for IndoBERT). Results show that the SVM model outperformed the others, achieving 68.7% accuracy with average precision, recall, and F1-score of 0.69, 0.69, and 0.68, respectively; CNB attained 64.1% accuracy with average metrics of 0.64; while IndoBERT recorded 58.2% accuracy with average precision, recall, and F1-score of 0.58, 0.58, and 0.57. Confusion matrix analysis revealed SVM's superior ability to distinguish positive and neutral sentiments in casual language, though IndoBERT demonstrated potential for capturing deeper semantic nuances despite underperforming due to dataset size and informal text. The findings highlight the efficacy of integrating statistical and semantic representations for improved sentiment analysis on unstructured, noisy social media data related to AI-generated imagery, while suggesting that transformer models like IndoBERT may benefit from larger datasets for optimal performance.
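TF-IDF, the statistical representation used by the CNB and SVM models above, weights a term by how often it occurs in a document and how rare it is across the collection. The sketch below uses the textbook form tf × log(N / df) on hypothetical tokenized tweets; it is not the study's pipeline, and libraries such as scikit-learn apply smoothing and normalization by default, so their values differ. It assumes every document is non-empty.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized tweets, not taken from the study's dataset.
DOCS = [
    ["gambar", "ai", "keren"],
    ["gambar", "ai", "jelek"],
    ["gambar", "ai", "biasa", "saja"],
]
weights = tfidf(DOCS)
print(weights[0])
```

Terms that occur in every document ("gambar", "ai") receive a weight of zero, so the sentiment-bearing words dominate each vector.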

Sentiment Classification Analysis of Tokopedia Reviews Using TF-IDF, SMOTE, and Traditional Machine Learning Models
Barus, Herianta; Fajri, Ika Nur; Pristyanto, Yoga
Journal of Applied Informatics and Computing Vol. 9 No. 5 (2025): October 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i5.10524

Abstract

This study explores sentiment classification on Tokopedia user reviews using TF-IDF for feature extraction and SMOTE to handle class imbalance. From nearly one million raw reviews sourced from Kaggle ("E-Commerce Ratings and Reviews in Bahasa Indonesia"), a final set of 6,477 relevant entries was obtained after rigorous preprocessing, including case folding, noise removal (emojis, URLs, numbers), normalization to KBBI standards, tokenization, stopword removal, and stemming with Sastrawi. The dataset consisted of 5,213 positive and 1,264 negative reviews (80.4% positive). SMOTE balanced the classes to 10,426 reviews with a 1:1 ratio for training. Five traditional machine learning models were evaluated: Naive Bayes, Logistic Regression, Support Vector Machine (SVM), Decision Tree, and Random Forest. Assessments were based on accuracy, precision, recall, F1-score, ROC-AUC, and computational time, using an 80:20 stratified split and 5-fold cross-validation. Random Forest achieved the best overall performance (accuracy: 0.9163, F1-score: 0.9133, ROC-AUC: 0.9784), while tuned SVM (C=10, RBF kernel) attained the highest accuracy of 0.9473 and F1-score of 0.9321. Cross-validation on Naive Bayes showed consistent results with an average accuracy of 88.09%. Further analysis using Logistic Regression coefficients identified influential features: positive sentiment associated with words like "mantap", "mudah", and "sukses", while negative sentiment correlated with "kecewa", "parah", and "lemot". These insights provide practical value for Tokopedia's teams to enhance user experience, such as improving app speed and addressing complaints. The findings demonstrate the effectiveness and efficiency of traditional machine learning techniques for sentiment analysis in Bahasa Indonesia contexts.
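The reported total of 10,426 reviews is consistent with oversampling the negative class up to the positive count (2 × 5,213). The sketch below checks that arithmetic and illustrates the core SMOTE step, interpolating between a minority sample and one of its nearest minority neighbors. It is a simplified illustration, not the study's implementation: the `smote` function, its parameters, and the 2-D points are hypothetical, and it assumes at least two minority samples.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating toward nearby minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbors of the base point, excluding itself.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic

# Class counts reported in the abstract.
n_positive, n_negative = 5213, 1264
n_needed = n_positive - n_negative
print(n_needed, n_positive + n_negative + n_needed)  # 3949 10426

# Hypothetical 2-D minority vectors standing in for TF-IDF rows.
MINORITY = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(MINORITY, n_new=6)
print(len(new_points))
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the region the minority class already occupies.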

Public Sentiment Analysis on Corruption Issues in Indonesia Using IndoBERT Fine-Tuning, Logistic Regression, and Linear SVM
Kono, Maria Fatima; Fajri, Ika Nur; Pristyanto, Yoga
Journal of Applied Informatics and Computing Vol. 9 No. 5 (2025): October 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i5.10537

Abstract

Sentiment analysis is a method in Natural Language Processing (NLP) that aims to understand public perceptions based on textual data from social media. Opinions expressed on digital platforms play an important role, as they reflect public trust and attitudes toward strategic issues in Indonesia. This study compares the performance of three IndoBERT-based approaches for sentiment classification: IndoBERT with full fine-tuning, IndoBERT as a feature extractor combined with Logistic Regression, and IndoBERT as a feature extractor combined with Linear SVM. The dataset was collected through the Twitter API and consists of 2,012 tweets, which after preprocessing and balancing resulted in 2,252 labeled samples with positive and negative sentiments. The preprocessing stage included cleansing, normalization, tokenization, stopword removal, and stemming. The dataset was then split into 80% training data, 10% validation data, and 10% testing data. Experimental results show that IndoBERT with full fine-tuning achieved the best performance, with an accuracy of 82.67%, an F1-score of 82.35%, and an AUC value of 0.87. Logistic Regression and Linear SVM produced lower accuracies of 80.20% and 78.22%, respectively. These findings indicate that fine-tuned IndoBERT is more effective in capturing the semantic nuances of the Indonesian language, while the approaches without fine-tuning offer better computational efficiency at the cost of reduced accuracy. This study contributes to the development of NLP methods for the Indonesian language, particularly in sentiment analysis, and highlights the potential of transformer-based models for analyzing strategic issues in social media.
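The 80/10/10 split described above can be sketched as follows. The abstract does not say whether the split was stratified or how the 2,252 samples divide between the two classes; the stratification, the even class split, and the `stratified_split` function are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return train/validation/test index lists that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_val = int(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Assumed: the 2,252 balanced samples divide evenly between the two classes.
LABELS = ["positive"] * 1126 + ["negative"] * 1126
train, val, test = stratified_split(LABELS)
print(len(train), len(val), len(test))
```

Splitting each class separately keeps the positive-to-negative ratio the same in all three subsets, which matters when accuracy is the headline metric.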