Multi-Disease Retinal Classification Using EfficientNet-B3 and Targeted Albumentations: A Benchmark on Kaggle Retinal Fundus Images Dataset
Saputra, Kurniawan Aji; Alzami, Farrikh; Kurniawan, Defri; Naufal, Muhammad; Muslih, Muslih; Megantara, Rama Aria; Pramunendar, Ricardus Anggi
Sinkron : jurnal dan penelitian teknik informatika Vol. 10 No. 1 (2026): Article Research January 2026
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v10i1.15530

Abstract

Retinal diseases remain a leading cause of blindness worldwide. This study develops a deep learning pipeline for multiclass retinal disease classification using EfficientNet-B3 combined with Albumentations augmentation to improve generalization. We target four classes: cataract, diabetic retinopathy, glaucoma, and normal. We use the Kaggle Retinal Disease dataset (4,217 fundus images), split into 70% training, 10% validation, and 20% testing. Images are resized to 224×224 and augmented with horizontal flip, random brightness and contrast, CLAHE, shift-scale-rotate, cropping, gamma correction, and elastic transformation. After the classification head is trained, the EfficientNet-B3 backbone is fine-tuned with learning-rate warm-up and regularization (batch normalization, dropout). After 50 epochs, the best validation performance reaches 0.9526, and on the hold-out test set the model achieves 95.38% overall accuracy. The per-class F1 scores are 1.0000 (diabetic retinopathy), 0.9685 (cataract), 0.9255 (normal), and 0.9184 (glaucoma). Confusion analysis shows that most errors involve glaucoma being misclassified as normal, likely because of similar optic disc appearance. These results indicate that EfficientNet-B3 with targeted augmentation provides accurate and reliable multi-disease screening of fundus images, with the potential to support faster and more consistent triage in clinical workflows. Future research should expand clinical validation and explore attention mechanisms or multimodal input to reduce glaucoma-normal ambiguity.
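
As a rough worked example of the figures in this abstract, the sketch below computes split sizes for 4,217 images at 70/10/20 and the unweighted (macro) mean of the four reported per-class F1 scores. The rounding rule in `split_counts` is an assumption, so the authors' exact subset sizes may differ, and the macro average is derived here for illustration rather than reported by the paper.

```python
import math

def split_counts(n_total, val_frac=0.10, test_frac=0.20):
    """Return (train, val, test) sizes; leftover images go to training.

    Flooring the validation and test sizes is an assumed rounding rule.
    """
    n_val = math.floor(n_total * val_frac)
    n_test = math.floor(n_total * test_frac)
    return n_total - n_val - n_test, n_val, n_test

def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1 scores."""
    return sum(per_class_f1.values()) / len(per_class_f1)

# Dataset size and per-class F1 values as stated in the abstract.
train, val, test = split_counts(4217)
f1 = {
    "diabetic retinopathy": 1.0000,
    "cataract": 0.9685,
    "normal": 0.9255,
    "glaucoma": 0.9184,
}
macro = macro_f1(f1)

print(train, val, test)
print(round(macro, 4))
```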
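Implementasi K-Means sebagai Mekanisme Self-Labeling dalam Arsitektur Ensemble Voting Classifier untuk Prediksi Penjualan Usaha Mikro Kecil dan Menengah (UMKM) pada Data Tanpa Label (English: Implementation of K-Means as a Self-Labeling Mechanism in an Ensemble Voting Classifier Architecture for Sales Prediction of Micro, Small, and Medium Enterprises (MSMEs) on Unlabeled Data)
Fahmi, Muhammad Aqil; Kurniawan, Defri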
Building of Informatics, Technology and Science (BITS) Vol 7 No 3 (2025): December 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i3.8779

Abstract

Sales forecasting in the Micro, Small, and Medium Enterprises (MSME) sector is challenging because the data are noisy and lack the class labels required to train supervised learning models. This study proposes a sequential hybrid architecture in which the K-Means algorithm is employed as a Self-Labeling mechanism to automatically transform raw transaction data into class labels (“Low” and “High”). The resulting synthetic labels are then used to train an Ensemble Voting Classifier that aggregates predictions from XGBoost, LightGBM, and CatBoost. The experimental results show that although the single XGBoost model achieves slightly higher accuracy (96.24%) than the Ensemble model (96.07%), the hybrid Ensemble Voting model is superior in probability calibration, achieving the lowest Loss value of 0.1532. This value outperforms XGBoost (0.1646) and LightGBM (0.1772), indicating more reliable and stable prediction confidence. The model also shows good balance, with an F1-Score of 0.95 and a Recall of 0.96 for the majority class. This study confirms that the hybrid approach is effective in reducing uncertainty in MSME stock management.
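
The two stages described in this abstract can be sketched in plain Python as follows. This is a minimal illustration, not the authors' pipeline: the one-dimensional two-cluster K-Means, the sample sales figures, and the three probability lists (stand-ins for XGBoost, LightGBM, and CatBoost outputs) are all assumptions, and soft voting is assumed because the abstract emphasizes probability calibration.

```python
def kmeans_two_clusters(values, max_iter=100):
    """1-D K-Means with k=2; centroids start at the minimum and maximum."""
    lo, hi = min(values), max(values)
    for _ in range(max_iter):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments are stable
        lo, hi = new_lo, new_hi
    return lo, hi

def self_label(values):
    """Turn raw values into 'Low'/'High' labels using the two centroids."""
    lo, hi = kmeans_two_clusters(values)
    return ["Low" if abs(v - lo) <= abs(v - hi) else "High" for v in values]

def soft_vote(prob_high_per_model, threshold=0.5):
    """Average each model's P(High) per sample, then threshold the mean."""
    n_models = len(prob_high_per_model)
    means = [sum(col) / n_models for col in zip(*prob_high_per_model)]
    return ["High" if p >= threshold else "Low" for p in means]

# Hypothetical daily sales totals (not from the paper).
sales = [10, 12, 11, 95, 100, 98]
labels = self_label(sales)

# Hypothetical P(High) from three trained models for two new samples.
votes = soft_vote([[0.9, 0.2], [0.8, 0.4], [0.7, 0.3]])

print(labels)
print(votes)
```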
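Implemetasi TF-IDF N-Gram dan Algoritma Nearest Centroid untuk Klasifikasi Topik Tugas Akhir (English: Implementation of TF-IDF N-Grams and the Nearest Centroid Algorithm for Undergraduate Thesis Topic Classification)
Hana, Rohima Choirul; Kurniawan, Defri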
Building of Informatics, Technology and Science (BITS) Vol 7 No 3 (2025): December 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i3.8859

Abstract

This study presents a lightweight and explainable workflow for curating undergraduate thesis titles in the Informatics Engineering Study Program by combining TF-IDF n-gram (1–2) features with a cosine-based Nearest Centroid classifier. Titles are grouped into three internal research area classes, RPLD, SC, and SKKKD, to support topic grouping and supervisor assignment. The approach is implemented as a Streamlit web application that supports Excel upload with preview and persistent saving, column standardization, text normalization, duplicate rejection using normalized titles, rapid training on labeled data, topic prediction for new titles, and retrieval of the most similar titles to assist curation. A key operational contribution is the direct linkage from predicted classes to the program-maintained lecturer list for each area, enabling students to identify suitable supervisors and helping coordinators run a consistent and auditable workflow. On a multi-semester corpus of 1,057 titles, stratified 5-fold cross-validation achieved 92.43% average accuracy, Macro F1 of 0.875, Micro F1 of 0.924, and Weighted F1 of 0.925, indicating a balance between accuracy, efficiency, and interpretability for short text. Decision inspection is supported by class-specific top terms and nearest-neighbor title lists. Limitations mainly stem from the minority class; future work will therefore expand labeled corpora, add character-level n-grams, and explore lightweight hybrid representations.
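
The classification step in this abstract (TF-IDF over unigrams and bigrams, then assignment to the most cosine-similar class centroid) might look roughly like the sketch below. It is a from-scratch illustration under assumptions: the smoothed IDF formula, whitespace tokenization, the sample titles, and the labels `area_a` and `area_b` are all hypothetical and do not come from the paper or its Streamlit application.

```python
import math
from collections import Counter, defaultdict

def terms(title):
    """Lowercased unigrams plus bigrams, i.e. n-grams with n in (1, 2)."""
    tokens = title.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def fit_idf(titles):
    """Smoothed inverse document frequency per term (assumed formula)."""
    doc_freq = Counter(t for title in titles for t in set(terms(title)))
    n = len(titles)
    return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def tfidf(title, idf):
    """L2-normalized sparse TF-IDF vector; unseen terms are ignored."""
    counts = Counter(t for t in terms(title) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fit_centroids(titles, labels, idf):
    """Sum the vectors per class; cosine ignores scale, so no division."""
    sums = defaultdict(lambda: defaultdict(float))
    for title, label in zip(titles, labels):
        for t, w in tfidf(title, idf).items():
            sums[label][t] += w
    return {label: dict(vec) for label, vec in sums.items()}

def predict(title, idf, centroids):
    """Return the class whose centroid is most cosine-similar to the title."""
    vec = tfidf(title, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Hypothetical labeled titles (invented for illustration).
train_titles = [
    "web application for inventory management",
    "mobile application for attendance system",
    "image classification using convolutional neural network",
    "sentiment classification using naive bayes",
]
train_labels = ["area_a", "area_a", "area_b", "area_b"]

idf = fit_idf(train_titles)
centroids = fit_centroids(train_titles, train_labels, idf)
predicted = predict("text classification using neural network", idf, centroids)
print(predicted)
```

Because each centroid is just a weighted bag of terms, its largest weights can be listed directly, which is one way to obtain the class-specific top terms the abstract mentions.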
Application of ADASYN and Optuna in the XGBoost Algorithm for Stunting Detection
Putra Sadewa, Fastabyq; Kurniawan, Defri
Journal of Applied Informatics and Computing Vol. 10 No. 1 (2026): February 2026
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v10i1.12035

Abstract

This study aims to develop an early detection model for childhood stunting risk using a machine learning approach based on Extreme Gradient Boosting (XGBoost), integrated with the Adaptive Synthetic Sampling (ADASYN) technique for data balancing and Optuna-based hyperparameter optimization. One of the main challenges in stunting prediction is class imbalance, where the number of stunting cases is significantly higher than non-stunting cases, thereby reducing the model’s ability to accurately identify the minority class. To address this issue, the study implements data deduplication, structured data splitting, and applies ADASYN exclusively to the training data to prevent data leakage and preserve the validity of the evaluation process. The proposed model (XGBoost with ADASYN and Optuna) is then compared with a baseline model that combines XGBoost and SMOTE. Experimental results show that the proposed model achieves an accuracy of 81.98%, a recall of 91.50%, and an F1-score of 89.14%, indicating improved sensitivity and a more balanced classification performance compared to the baseline. These findings demonstrate that the integration of ADASYN and Optuna-based hyperparameter optimization enhances model stability and generalization capability, making it a viable data-driven approach for stunting risk detection in environments with imbalanced class distributions.
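
The leakage-avoidance idea in this abstract (split first, then oversample only the training portion) can be illustrated with the simplified ADASYN sketch below. It is written from scratch for binary labels and is not the authors' code or a library implementation: the neighbor count `k`, the toy points, and the choice to interpolate toward the nearest minority neighbors are assumptions, and rounding means the classes may end up only approximately balanced.

```python
import math
import random

def adasyn(X, y, minority, k=3, seed=0):
    """Simplified binary ADASYN: more synthetic points for harder samples."""
    rng = random.Random(seed)
    min_idx = [i for i, lab in enumerate(y) if lab == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)  # gap to balance
    if n_needed <= 0 or len(min_idx) < 2:
        return list(X), list(y)

    def knn(i, candidates):
        others = [j for j in candidates if j != i]
        return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]

    # Difficulty of each minority sample: share of majority points
    # among its k nearest neighbors in the whole training set.
    ratios = [
        sum(y[j] != minority for j in knn(i, range(len(X)))) / k
        for i in min_idx
    ]
    total = sum(ratios)
    if total == 0:
        return list(X), list(y)

    X_new, y_new = list(X), list(y)
    for i, r in zip(min_idx, ratios):
        n_synthetic = round(r / total * n_needed)
        neighbours = knn(i, min_idx)  # minority neighbors only
        for _ in range(n_synthetic):
            j = rng.choice(neighbours)
            lam = rng.random()
            # Random point on the segment between sample i and neighbor j.
            X_new.append(tuple(a + lam * (b - a) for a, b in zip(X[i], X[j])))
            y_new.append(minority)
    return X_new, y_new

# Toy data, already split: the test set is never passed to adasyn().
X_train = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (0, 2), (2, 2), (2, 1),
           (1.5, 1.5), (1.6, 1.4), (3, 3)]
y_train = [0] * 8 + [1] * 3
X_test, y_test = [(0.5, 0.5), (2.5, 2.5)], [0, 1]

X_bal, y_bal = adasyn(X_train, y_train, minority=1)
print(y_train.count(1), "->", y_bal.count(1))
```

Keeping `X_test` out of the oversampling call is the point of the ordering: synthetic samples derived from test points would otherwise leak into training and inflate the evaluation.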
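Optimasi Hyperparameter Random Forest untuk Klasifikasi Depresi Mahasiswa Menggunakan GridSearchCV dan RandomizedSearchCV (English: Random Forest Hyperparameter Optimization for Student Depression Classification Using GridSearchCV and RandomizedSearchCV)
Utami, Eka Wahyu; Kurniawan, Defri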
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.9366

Abstract

Student mental health is an important issue, and a data-driven approach can support the classification of student depression. This study aims to analyze the factors that cause depression and to optimize classification performance using the Random Forest algorithm. The data are secondary data from the Student Depression Dataset on Kaggle, comprising 27,901 records. The research begins with data collection, followed by Exploratory Data Analysis (EDA), which includes descriptive statistics and a correlation heatmap of the variables. Data preprocessing involves removing irrelevant features, handling missing values, encoding categorical data, and splitting the data into training and testing sets. Model development covers three scenarios: a baseline model, hyperparameter optimization using GridSearchCV, and hyperparameter optimization using RandomizedSearchCV. Model performance is evaluated with a confusion matrix, from which accuracy, precision, recall, and F1-score are derived. The results show that all models produce relatively stable accuracy in the range of 0.84–0.85. The model optimized with GridSearchCV performs best, with a recall of 0.8869 and an F1-score of 0.8719. This increase in recall is important for minimizing the risk of false negatives when identifying students experiencing depression. These findings are expected to support decision-making at educational institutions in detecting and managing students' mental health more accurately.
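
The difference between the two search strategies compared in this abstract can be sketched without any machine learning library. In the illustration below, `PARAM_GRID` and `toy_score` are hypothetical: the scoring function is a stand-in for a cross-validated metric, not a real Random Forest evaluation, and the grid values are not taken from the paper.

```python
import itertools
import random

# Hypothetical Random Forest search space (not from the paper).
PARAM_GRID = {
    "n_estimators": [100, 200, 300],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5],
}

def toy_score(params):
    """Stand-in for a cross-validated score; higher is better."""
    depth = params["max_depth"] or 20
    return (-abs(params["n_estimators"] - 200) / 1000
            - abs(depth - 10) / 100
            - params["min_samples_split"] / 1000)

def all_candidates(grid):
    keys = list(grid)
    combos = itertools.product(*(grid[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]

def grid_search(grid, score):
    """Evaluate every combination in the grid."""
    candidates = all_candidates(grid)
    return max(candidates, key=score), len(candidates)

def randomized_search(grid, score, n_iter, seed=0):
    """Evaluate only n_iter combinations drawn without replacement."""
    rng = random.Random(seed)
    candidates = rng.sample(all_candidates(grid), n_iter)
    return max(candidates, key=score), len(candidates)

best_grid, n_grid = grid_search(PARAM_GRID, toy_score)
best_rand, n_rand = randomized_search(PARAM_GRID, toy_score, n_iter=6)

print(n_grid, best_grid)
print(n_rand, best_rand)
```

The exhaustive search can never score worse than the sampled one on the same grid, but it evaluates every combination, which is the trade-off between the two approaches.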
Bidirectional Long Short-Term Memory for Early Detection of Running Injuries in Imbalanced Data
David, David; Kurniawan, Defri
Sinkron : jurnal dan penelitian teknik informatika Vol. 10 No. 2 (2026): Article Research April, 2026
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v10i2.15928

Abstract

Running-related injuries are a common sports health issue that can impair athletic performance and potentially terminate an athlete’s career. Early injury detection is therefore critical, as injuries are cumulative in nature and influenced by training load patterns over time. Consequently, data-driven predictive approaches based on time-series analysis are required to support athlete monitoring systems with a safety-oriented focus. This study aims to develop an efficient, accurate, and safety-first injury prediction model for running athletes. The study utilizes daily running activity time-series data obtained from Kaggle. The proposed model is based on a Bi-Directional Long Short-Term Memory (Bi-LSTM) architecture to capture bidirectional temporal dependencies, combined with Focal Loss to address extreme class imbalance. In addition, domain-specific feature engineering is applied through the Acute:Chronic Workload Ratio (ACWR). Model performance is evaluated against tabular-data-based models, namely XGBoost and Balanced Bagging, across multiple experimental configurations. Experimental results indicate that the lightweight Bi-LSTM configuration achieves a Recall of 90.7%, outperforming the benchmark models while maintaining a competitive AUC. These findings demonstrate that sequential modeling is more effective in detecting rare injury events. Overall, this study confirms that Bi-LSTM-based sequential modeling is well suited for early detection of running injuries and suggests its potential applicability in athlete monitoring systems that prioritize safety.
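
Two ingredients named in this abstract, the Acute:Chronic Workload Ratio and Focal Loss, have compact definitions that can be sketched as follows. The 7-day and 28-day windows and the defaults `gamma=2.0` and `alpha=0.25` are common conventions assumed here for illustration; the abstract does not state which values the authors used, and the training loads are invented.

```python
import math

def acwr(daily_loads, acute_days=7, chronic_days=28):
    """Acute:Chronic Workload Ratio from rolling-average daily loads."""
    if len(daily_loads) < chronic_days:
        raise ValueError("need at least chronic_days of history")
    acute = sum(daily_loads[-acute_days:]) / acute_days
    chronic = sum(daily_loads[-chronic_days:]) / chronic_days
    return acute / chronic if chronic else 0.0

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one prediction p = P(y = 1).

    Well-classified examples (p_t near 1) are down-weighted by
    (1 - p_t) ** gamma, so rare, hard positives dominate the loss.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0 - eps)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical training loads: three steady weeks, then a doubled week.
loads = [10] * 21 + [20] * 7
ratio = acwr(loads)

easy = focal_loss(0.9, 1)  # confident, correct injury prediction
hard = focal_loss(0.1, 1)  # missed injury

print(ratio)
print(easy < hard)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which makes the role of `gamma` in handling class imbalance easy to see.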