Articles

Found 39 Documents
The Effect of Feature Selection on Machine Learning Classification Pardede, Jasman; Dwianto, Rio
JOIV : International Journal on Informatics Visualization Vol 9, No 4 (2025)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.9.4.2926

Abstract

High-dimensional datasets can lead to overfitting and computationally expensive model building in machine learning. This study uses a dimensionality reduction technique, namely feature selection, to overcome these problems. Five feature selection methods were used: Chi-Square (CS), Information Gain (IG), Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Least Absolute Shrinkage and Selection Operator (LASSO), together with three classifiers: Naïve Bayes, Extreme Gradient Boosting (XGB), and Random Forest (RF). The dataset used is the Heart Attack Analysis & Prediction Dataset. Three feature selection scenarios were evaluated: (1) selecting the best features with a single feature selection method, (2) taking the intersection of the best features selected by methods from the same category, and (3) taking the intersection of the best features selected by all five proposed methods. Model performance is measured using accuracy, precision, recall, F1-score, AUC, and training time. This study reveals that feature selection is very effective in improving the performance of prediction models. Based on the experimental results, the best feature selection is CS and IG in the filter category combined with the XGB model. The selected features improved accuracy, precision, recall, F1-score, and AUC by 1.7%, 1%, 2.3%, 1.6%, and 0.2%, respectively, while training time decreased by 23.5%. Feature selection with a single technique performs better than selecting the best features by intersecting methods from the same category or across all five proposed methods.
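As a rough illustration of the first scenario (a single filter-based method followed by a classifier), the sketch below applies Chi-Square selection with scikit-learn and trains XGBoost on the retained features. The file name heart.csv, the target column output, and k = 8 are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: Chi-Square feature selection followed by XGBoost on a
# dataset shaped like the Heart Attack Analysis & Prediction Dataset.
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

df = pd.read_csv("heart.csv")                      # assumed file name
X, y = df.drop(columns=["output"]), df["output"]   # assumed target column

# Keep the k highest-scoring features according to the Chi-Square test
# (chi2 requires non-negative feature values).
selector = SelectKBest(score_func=chi2, k=8)
X_selected = selector.fit_transform(X, y)

X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=0.2, stratify=y, random_state=42
)

model = XGBClassifier(eval_metric="logloss", random_state=42)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

The intersection scenarios would repeat the selection step with a second method (e.g., Information Gain via mutual information) and keep only the features chosen by both before training the classifier.
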
IMAGE CAPTIONING MENGGUNAKAN METODE RESNET50 DAN LONG SHORT TERM MEMORY Raka Satria, Marius; Pardede, Jasman
Jurnal Tera Vol 2 No 2 (2022): Jurnal Tera (September 2022)
Publisher : Fakultas Teknik dan Informatika, Universitas Dian Nusantara


Abstract

Human misunderstanding when interpreting the meaning of an image causes confusion, often simply because a sentence structure or a word carries more than one meaning, which is commonly called ambiguity. Ambiguity occurs when the meaning of a word, phrase, or sentence is uncertain and admits more than one interpretation. Because artificial intelligence can assist image classification and help avoid such ambiguity, this study makes use of image captioning. Image captioning produces natural-language descriptions of images. Extracting meaning from an image requires a higher level of understanding than image classification and detection. The problem is addressed by combining artificial intelligence with artificial neural networks. The two methods used in this study are ResNet50 and Long Short-Term Memory (LSTM). ResNet50 is used for image classification (feature extraction), and the LSTM network generates the caption. This study uses unigram (one-gram) BLEU scoring to evaluate the generated captions. The highest BLEU score obtained is 79.7455%, and the highest accuracy obtained is 85.74% at 100 epochs.
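A minimal sketch of the encoder-decoder pipeline the abstract describes, assuming Keras/TensorFlow and NLTK; the vocabulary size, caption length, layer widths, and the tokenized example captions are placeholders rather than the paper's configuration.

```python
# Sketch: ResNet50 image encoder + LSTM caption decoder, with BLEU-1 scoring.
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from tensorflow.keras.models import Model
from nltk.translate.bleu_score import sentence_bleu

VOCAB_SIZE, MAX_LEN = 5000, 34   # assumed values

# 1) ResNet50 (without its classification head) encodes an image into a
#    2048-dimensional feature vector.
encoder = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return encoder.predict(x)          # shape: (1, 2048)

# 2) Decoder: image features + partial caption -> next-word distribution.
img_in = Input(shape=(2048,))
img_feat = Dense(256, activation="relu")(Dropout(0.5)(img_in))

seq_in = Input(shape=(MAX_LEN,))
seq_feat = LSTM(256)(Embedding(VOCAB_SIZE, 256, mask_zero=True)(seq_in))

merged = Dense(256, activation="relu")(add([img_feat, seq_feat]))
out = Dense(VOCAB_SIZE, activation="softmax")(merged)

caption_model = Model(inputs=[img_in, seq_in], outputs=out)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")

# 3) Unigram BLEU (BLEU-1) for a generated caption against a reference.
score = sentence_bleu([["a", "dog", "runs"]], ["a", "dog", "is", "running"],
                      weights=(1.0, 0, 0, 0))
```
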
Implementation of Generative Adversarial Network to Generate Fake Face Image Pardede, Jasman; Setyaningrum, Anisa Putri
JOIN (Jurnal Online Informatika) Vol 8 No 1 (2023)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

DOI: 10.15575/join.v8i1.790

Abstract

In recent years, many crimes have used technology to generate someone's face, which can have a damaging effect on that person. A generative adversarial network (GAN) is a method for generating fake images using a discriminator and a generator. Conventional GANs use binary cross-entropy loss to train the discriminator to distinguish original images from the dataset from fake images produced by the generator. However, binary cross-entropy loss provides little gradient information to the generator for creating convincing fake images: when the generator creates a fake image, the discriminator gives only weak feedback (gradient information) for updating the generator's model, so the generator takes a long time to improve. To solve this problem, LSGAN replaces this objective with a least-squares loss, so the discriminator can provide a strong gradient signal to the generator even when an image lies far from the decision boundary. In generating fake face images, the researchers used Least Squares GAN (LSGAN) and obtained a discriminator-1 loss of 0.0061, a discriminator-2 loss of 0.0036, and a generator loss of 0.575. With these small loss values for the three key components, the discriminator's classification accuracy reaches 95% for original images and 99% for fake images. Original and fake images in this study are classified using a supervised contrastive loss classification model with an accuracy of 99.93%.
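The key difference from a conventional GAN is the least-squares objective. Below is a minimal training-step sketch in TensorFlow, assuming generator and discriminator are ordinary Keras models; the 0/1 target coding and the 0.5 scaling of the discriminator loss are one common LSGAN formulation, not necessarily the exact setup used in the paper.

```python
# Sketch: one LSGAN training step with least-squares (MSE) losses instead of
# binary cross-entropy, so gradients stay informative even for samples far
# from the decision boundary.
import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()

@tf.function
def train_step(real_images, generator, discriminator, g_opt, d_opt, latent_dim=100):
    noise = tf.random.normal([tf.shape(real_images)[0], latent_dim])

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    with tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        d_real = discriminator(real_images, training=True)
        d_fake = discriminator(fake_images, training=True)
        d_loss = 0.5 * (mse(tf.ones_like(d_real), d_real) +
                        mse(tf.zeros_like(d_fake), d_fake))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))

    # Generator step: push D(fake) toward 1 under the least-squares loss.
    with tf.GradientTape() as g_tape:
        fake_images = generator(noise, training=True)
        d_fake_for_g = discriminator(fake_images, training=True)
        g_loss = mse(tf.ones_like(d_fake_for_g), d_fake_for_g)
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss
```
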
Egg Weight Estimation Based on Image Processing using Mask R-CNN and XGBoost Pardede, Jasman; Rawosi, Muhammad Fadlansyah Zikri Akhiruddin; Setyaningrum, Anisa Putri; Milenio, Rizka Milandga; Chazar, Chalifa
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.1004

Abstract

Manually measuring egg weight in the livestock and food industries can pose various problems, including time and labor requirements, the risk of egg damage, consistency and accuracy issues, and limitations on production scale. To address these issues, an automated egg weight estimation system is essential. This study proposes integrating computer vision and machine learning into a unified workflow that combines segmentation, classification, and regression for practical weight estimation. The proposed pipeline employs Mask R-CNN for egg segmentation, a Random Forest (RF) classifier for egg type classification based on color features, and XGBoost for regression using morphological, geometric, and color features together with egg type as predictors. The dataset consists of 720 images of 20 eggs (10 chicken and 10 duck), each photographed from 36 rotational angles, with Ground Truth (GT) weights obtained from a digital scale. Experimental findings show that the RF classifier achieved perfect accuracy (precision, recall, and F1-score = 1.00) in distinguishing chicken and duck eggs. The XGBoost regressor obtained a training performance of MAE = 1.07 g and R² = 0.68, and a validation performance of MAE = 0.23 g and R² = 0.80 under 10-fold grouped cross-validation. Although a Support Vector Regressor baseline reached higher training accuracy (MAE = 0.22 g, R² = 0.96), it failed to generalize on validation (R² < 0), highlighting XGBoost's robustness. The feature importance analysis revealed four important features for building an estimation model, namely Hu moments, eccentricity, elongation, and diagonal length, while color statistics played a complementary role. The novelty of this work lies in combining deep segmentation, color-based classification, and feature-driven regression into a unified framework specifically for egg weight estimation, showing its feasibility as a proof of concept and laying the foundation for future large-scale, calibrated, and externally validated deployment.
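To make the evaluation protocol concrete, here is a minimal sketch of XGBoost weight regression under 10-fold grouped cross-validation, so that all 36 views of the same physical egg fall in the same fold; the feature file egg_features.csv and the column names weight_g and egg_id are assumptions for illustration.

```python
# Sketch: XGBoost egg-weight regression evaluated with grouped 10-fold CV.
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold
from sklearn.metrics import mean_absolute_error, r2_score
from xgboost import XGBRegressor

df = pd.read_csv("egg_features.csv")              # assumed feature table
X = df.drop(columns=["weight_g", "egg_id"])       # morphological, geometric,
y = df["weight_g"]                                # color features + egg type
groups = df["egg_id"]                             # one group per physical egg

maes, r2s = [], []
for train_idx, val_idx in GroupKFold(n_splits=10).split(X, y, groups):
    model = XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=42)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    pred = model.predict(X.iloc[val_idx])
    maes.append(mean_absolute_error(y.iloc[val_idx], pred))
    r2s.append(r2_score(y.iloc[val_idx], pred))

print(f"MAE = {np.mean(maes):.2f} g, R² = {np.mean(r2s):.2f}")
```

Grouping by egg is what keeps rotational views of the same egg out of both train and validation folds, which is the main safeguard against the kind of optimistic leakage the SVR baseline appears to suffer from.
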
The Impact of Balanced Data Techniques on Classification Model Performance Pardede, Jasman; Dika Prasetia Pamungkas
Scientific Journal of Informatics Vol. 11 No. 2: May 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i2.3649

Abstract

Purpose: The aim of this study is to examine the impact of balanced data techniques on the performance of classification models. Methods: To balance the imbalanced dataset, several resampling techniques are employed: the Synthetic Minority Oversampling Technique (SMOTE), Borderline-SMOTE (B-SMOTE), and SMOTE with Edited Nearest Neighbors (SMOTE-ENN). Classification is then performed on both the balanced and unbalanced datasets to evaluate the impact of resampling on classification model performance. Result: This study proposes the SMOTE, B-SMOTE, and SMOTE-ENN techniques for generating synthetic data. Experimental results show that resampling can improve model performance for KNN, Naive Bayes, and Decision Tree (DT). The best balanced-data technique is SMOTE-ENN, followed by B-SMOTE and then SMOTE. Compared to the unbalanced dataset, the SMOTE technique on the DT method increases Accuracy, Precision, Recall, F1-Score, G-mean, and ROC-AUC by 4.79%, 35.89%, 35.32%, 35.63%, 46.94%, and 34.89%, respectively. The B-SMOTE technique on the DT method improves the same metrics by 5.62%, 36.45%, 35.88%, 36.19%, 47.40%, and 35.46%. The SMOTE-ENN technique improves them by 8.11%, 34.53%, 43.25%, 41.63%, 62.85%, and 42.91%. Novelty: Based on the experimental results, the best balanced-data technique is SMOTE-ENN, which yields the largest improvements in Accuracy, Precision, Recall, F1-Score, G-mean, and ROC-AUC.
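A minimal sketch of the best-performing setup (SMOTE-ENN before a Decision Tree) using imbalanced-learn; synthetic make_classification data stands in for the study's dataset, and resampling is applied to the training split only.

```python
# Sketch: SMOTE-ENN resampling followed by a Decision Tree classifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
from imblearn.combine import SMOTEENN

# Imbalanced stand-in data: 90% majority class, 10% minority class.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)

# Resample only the training split: SMOTE oversamples the minority class,
# then Edited Nearest Neighbours removes noisy/ambiguous samples.
X_res, y_res = SMOTEENN(random_state=42).fit_resample(X_train, y_train)

clf = DecisionTreeClassifier(random_state=42).fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test)))
```
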
Folk Games Image Captioning using Object Attention Akbar, Saiful; Sitohang, Benhard; Pardede, Jasman; Amal, Irfan; Yunastrian, Kurniandha; Ahmada, Marsa; Prameswari, Anindya
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 7 No 4 (2023): August 2023
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v7i4.4708

Abstract

The result of a deep learning-based image captioning system with an encoder-decoder framework relies heavily on the image feature extraction technique and the caption model. The accuracy of the model is heavily influenced by the proposed attention mechanism. The inability to distinguish between the output of the attention model and the input expectation of the decoder can cause the decoder to give incorrect results. In this paper, we propose an object-attention mechanism using object detection. Object detection outputs a bounding box and an object category label, which are then used as image input to VGG16 for feature extraction and to a caption-based LSTM model. The experimental results showed that the system with object attention performed better than the system without it. BLEU-1, BLEU-2, BLEU-3, BLEU-4, and CIDEr scores for the image captioning system with object attention improved by 12.48%, 17.39%, 24.06%, 36.37%, and 43.50%, respectively, compared to the system without object attention.
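A hedged sketch of the object-attention input step: detected regions are cropped and encoded with VGG16, and the region features plus category labels would then condition the LSTM caption model. The detector is abstracted away here as a hypothetical list of (x1, y1, x2, y2, label) tuples, since the paper's detector and fusion details are not reproduced.

```python
# Sketch: encode detected object regions with VGG16 for the caption model.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from PIL import Image

vgg = VGG16(weights="imagenet", include_top=False, pooling="avg")  # 512-d features

def region_features(image_path, detections):
    """Crop each detected bounding box and encode it with VGG16.

    `detections` is an assumed list of (x1, y1, x2, y2, label) tuples produced
    by some object detector.
    """
    img = Image.open(image_path).convert("RGB")
    feats, labels = [], []
    for (x1, y1, x2, y2, label) in detections:
        crop = img.crop((x1, y1, x2, y2)).resize((224, 224))
        x = preprocess_input(np.expand_dims(np.asarray(crop, dtype="float32"), 0))
        feats.append(vgg.predict(x)[0])
        labels.append(label)
    return np.stack(feats), labels   # (num_objects, 512), object category labels
```
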
Hepatitis Identification using Backward Elimination and Extreme Gradient Boosting Methods Pardede, Jasman; Nurrohmah, Desita
Journal of Information Systems Engineering and Business Intelligence Vol. 10 No. 2 (2024): June
Publisher : Universitas Airlangga

DOI: 10.20473/jisebi.10.2.302-313

Abstract

Background: Hepatitis is a contagious inflammatory disease of the liver and a public health problem because it is easily transmitted. The main factors causing hepatitis are viral infections, disease complications, alcohol, autoimmune diseases, and drug effects. Some hepatitis variants, such as B, C, and D, can also cause liver cancer if left untreated. Objective: This research aims to determine the effect of Backward Elimination feature selection on the performance of hepatitis identification, compared with cases where Backward Elimination is not applied. Methods: XGBoost classification, capable of handling machine learning problems, was utilized. Additionally, Backward Elimination was used as feature selection to increase accuracy by reducing the number of less important features in the data classification process. Results: Training the XGBoost model with Backward Elimination and Random Search for hyperparameter optimization achieved an accuracy of 98.958% in 0.64 seconds. This performance was better than using Bayesian search, which produced the same accuracy of 98.958% but required a longer training time of 0.70 seconds. Conclusion: Using the features obtained from the Backward Elimination process, together with feature mean values for missing value treatment, produced an accuracy of 98.958%. Training the XGBoost model with Bayesian hyperparameter search achieved precision, recall, and F1-score of 98.934%, 98.934%, and 98.934%, respectively. Consequently, the use of Backward Elimination in the XGBoost model led to faster training, improved accuracy, and decreased overfitting. Keywords: Hepatitis, Backward Elimination, XGBoost, Bayesian Search, Random Search
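A rough sketch of this pipeline under stated assumptions: mean imputation of missing values, backward feature elimination (approximated here with scikit-learn's SequentialFeatureSelector in backward mode, as a stand-in for the paper's procedure), and XGBoost tuned by random search. The file hepatitis.csv, the target column class, the number of retained features, and the search space are illustrative only.

```python
# Sketch: mean imputation + backward feature elimination + XGBoost with
# randomized hyperparameter search.
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

df = pd.read_csv("hepatitis.csv")                       # assumed file name
X, y = df.drop(columns=["class"]), df["class"]          # assumed target column
y = pd.factorize(y)[0]                                  # 0-based labels for XGBoost

# Missing values replaced with the feature mean, as in the abstract.
X = pd.DataFrame(SimpleImputer(strategy="mean").fit_transform(X), columns=X.columns)

# Backward elimination: start from all features and drop the least useful ones.
selector = SequentialFeatureSelector(
    XGBClassifier(eval_metric="logloss"), direction="backward", n_features_to_select=10
)
X_sel = selector.fit_transform(X, y)

# Random search over a small, illustrative hyperparameter space.
search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_distributions={"max_depth": [3, 5, 7],
                         "learning_rate": [0.01, 0.05, 0.1],
                         "n_estimators": [100, 200, 300]},
    n_iter=10, cv=5, random_state=42,
)
search.fit(X_sel, y)
print("Best CV accuracy:", search.best_score_)
```
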
Impact of Feature Engineering on XGBoost Model for Forecasting Cayenne Pepper Prices Pardede, Jasman; Putri Setyaningrum, Anisa; Ilyas Al-Fadhlih, Muhammad
Scientific Journal of Informatics Vol. 12 No. 4: November 2025
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v12i4.32157

Abstract

Purpose: Cayenne pepper represents one of Indonesia’s key horticultural commodities, widely utilized in both household culinary practices and the food processing industry. Nevertheless, its market price is subject to considerable volatility, driven by factors such as weather variability, limited supply, production costs, and inefficiencies in distribution systems. This price instability generates uncertainty that adversely impacts farmers, traders, and consumers. Consequently, the development of a reliable price forecasting model is crucial to facilitate price stabilization and enable data-driven decision-making across the supply chain. This study aims to investigate the extent to which feature engineering techniques can enhance the predictive performance of the Extreme Gradient Boosting (XGBoost) algorithm in forecasting cayenne pepper prices. Through the integration of lag features, moving averages, and seasonal indicators, the proposed model is expected to more effectively capture market dynamics and provide a robust analytical tool for relevant stakeholders. Methods: The forecasting model was constructed using the XGBoost algorithm in combination with various feature engineering methods. The dataset consists of daily price records obtained from Bank Indonesia’s PIHPS system and meteorological variables sourced from BMKG, encompassing the period between 2021 and 2024. The engineered features include lag variables identified through Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) analyses, Simple Moving Averages (SMA), seasonal indicators, and holiday-related variables designed to capture recurring patterns and event-driven price fluctuations. To enhance predictive performance, hyperparameter tuning was conducted using a grid search optimization approach. Result: The optimal model demonstrated substantial performance improvements under the following hyperparameter configuration: alpha = 0, gamma = 0.3, lambda = 1, learning_rate = 0.05, max_depth = 3, min_child_weight = 3, n_estimators = 200, and subsample = 0.6. The application of feature engineering markedly enhanced the model’s predictive capability, increasing the R² value by 99.10% while reducing the MAE, RMSE, and MAPE by 72.63%, 71.31%, and 72.04%, respectively. These outcomes signify a notable reduction in forecasting errors and demonstrate the model’s improved accuracy. Novelty: This study integrates multi-level price data with weather and holiday-related features, employing the ACF and the PACF analyses to determine optimal lag values (techniques commonly utilized in statistical modeling). This integration enhances both the accuracy and interpretability of the XGBoost algorithm, thereby providing a practical and effective tool for agricultural price forecasting and market planning.
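The feature-engineering step can be sketched as follows, assuming a daily price series in a cayenne_prices.csv file with date and price columns; the specific lags, window lengths, and holiday handling are placeholders, while the XGBoost hyperparameters mirror the reported configuration (alpha and lambda are left at their defaults of 0 and 1).

```python
# Sketch: lag features, simple moving averages, and calendar indicators
# feeding an XGBoost regressor for daily price forecasting.
import pandas as pd
from xgboost import XGBRegressor

df = pd.read_csv("cayenne_prices.csv", parse_dates=["date"], index_col="date")

# Lag features (in the paper, lags are chosen from ACF/PACF analysis).
for lag in (1, 2, 7):
    df[f"lag_{lag}"] = df["price"].shift(lag)

# Simple moving averages over the previous week and month.
df["sma_7"] = df["price"].shift(1).rolling(7).mean()
df["sma_30"] = df["price"].shift(1).rolling(30).mean()

# Seasonal / calendar indicators.
df["month"] = df.index.month
df["dayofweek"] = df.index.dayofweek
df["is_holiday"] = 0      # placeholder; would be filled from a holiday calendar

df = df.dropna()
X, y = df.drop(columns=["price"]), df["price"]

# Chronological split: train on the past, validate on the most recent data.
split = int(len(df) * 0.8)
model = XGBRegressor(n_estimators=200, learning_rate=0.05, max_depth=3,
                     min_child_weight=3, subsample=0.6, gamma=0.3)
model.fit(X.iloc[:split], y.iloc[:split])
preds = model.predict(X.iloc[split:])
```

Shifting the rolling windows by one day before averaging keeps the features strictly backward-looking, so the model never sees the price it is asked to predict.
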
Pendekatan Augmentasi Citra Fundus pada Model EfficientNet untuk Klasifikasi Tingkat Keparahan Retinopati Diabetik dengan Dataset Tidak Seimbang CHAZAR, CHALIFA; ADLI, MUHAMMAD ARKAN; PARDEDE, JASMAN; ICHWAN, MUHAMMAD
MIND (Multimedia Artificial Intelligent Networking Database) Journal Vol 10, No 2 (2025): MIND Journal
Publisher : Institut Teknologi Nasional Bandung

DOI: 10.26760/mindjournal.v10i2.180-194

Abstract

Diabetic retinopathy (DR) is a complication of diabetes mellitus that affects the retinal blood vessels and may lead to blindness if not detected early. Fundus images play a crucial role in detecting and classifying the severity of DR as they clearly reveal pathological abnormalities. The main challenge in DR classification lies in the imbalance across severity classes. This study proposes the use of EfficientNet-B0 combined with targeted image augmentation on the APTOS 2019 dataset. The evaluation results show an improvement in accuracy from 73.84% to 82.56% and an F1-score of 0.8241. Significant gains are observed in minority classes, such as Mild (from 0.1429 to 0.65) and Severe (from 0.087 to 0.4211). These findings demonstrate that targeted augmentation is effective in reducing majority-class bias and improving model reliability. Keywords: class imbalance, data augmentation, diabetic retinopathy, EfficientNet
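A speculative sketch of the targeted-augmentation idea: extra augmented copies are generated for minority-class batches before fine-tuning EfficientNet-B0 on the five APTOS severity grades. The transform list, copy counts, and input size are assumptions, since the paper's exact augmentation policy is not detailed in the abstract.

```python
# Sketch: generate extra augmented copies for minority-class images and
# fine-tune EfficientNet-B0 for 5-class DR severity grading.
import tensorflow as tf
from tensorflow.keras import layers

augmenter = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.1),
])

def augment_minority(images, copies_per_image):
    """Return the original minority-class batch plus augmented copies."""
    extra = [augmenter(images, training=True) for _ in range(copies_per_image)]
    return tf.concat([images] + extra, axis=0)

# EfficientNet-B0 backbone fine-tuned for the 5 APTOS severity grades.
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", pooling="avg", input_shape=(224, 224, 3))
inputs = layers.Input(shape=(224, 224, 3))
outputs = layers.Dense(5, activation="softmax")(base(inputs))
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```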