Impact of SMOTE and ADASYN on Class Imbalance in Metabolic Syndrome Classification Using Random Forest Algorithm
Nurhayati, Lutfiana Deka; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 9 No. 5 (2025): October 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i5.10657

Abstract

Metabolic Syndrome is a cluster of medical conditions that increases the risk of stroke, cardiovascular disease, and type 2 diabetes. Early detection requires a machine learning model capable of accurate classification to support timely treatment. However, class imbalance often hampers classification algorithms, particularly in recognizing the minority class, namely individuals diagnosed with Metabolic Syndrome. This study analyzes the effect of applying the SMOTE and ADASYN data balancing techniques when classifying Metabolic Syndrome with the Random Forest algorithm. Random Forest was chosen for its ability to produce accurate predictions, although its performance can decline on imbalanced class distributions. The model without data balancing achieved 86% accuracy with a minority class recall of 75%. Applying SMOTE increased accuracy to 91% and minority class recall to 93%, while ADASYN achieved 92% accuracy and a minority class recall of 95%. These findings indicate that, of the configurations tested, ADASYN combined with Random Forest gave the largest performance improvement for classifying Metabolic Syndrome on imbalanced data.
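The oversampling idea behind SMOTE can be illustrated with a minimal pure-Python sketch: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. The function name `smote_sample`, its parameters, and the toy points are assumptions made for illustration; this is not the study's pipeline. ADASYN follows the same interpolation step but generates more samples around minority points that have many majority-class neighbours.

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Hypothetical 2-D minority-class points, not taken from the study.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sample(minority)
```

Because every synthetic point is a convex combination of two real minority points, it always lies inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only.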
Comparative Analysis of Random Forest and XGBoost Models for Cervical Cancer Risk Prediction using SHAP-based Explainable AI
Yudha, Muhammad Agung Reza; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.10357

Abstract

Cervical cancer remains one of the leading causes of cancer-related deaths among women, particularly in developing countries such as Indonesia. This study aims to develop an accurate and interpretable predictive model for cervical cancer risk using the Random Forest (RF) and Extreme Gradient Boosting (XGBoost) algorithms. The dataset is the Cervical Cancer Risk Factors dataset from the UCI Repository, consisting of 858 patient records and 36 clinical and demographic features. The pipeline includes missing value imputation, class balancing using the Synthetic Minority Oversampling Technique (SMOTE), and hyperparameter optimization through Randomized Search CV. Experimental results show that both models achieved high performance, with accuracy exceeding 96% and AUC above 0.95, while the XGBoost (Tuned + SMOTE) model slightly outperformed RF in detecting positive cases. The interpretability analysis using SHapley Additive exPlanations (SHAP) identified clinical features such as Schiller Test, Hinselmann Test, and Cytology Result as the most influential factors in the classification process, consistent with established clinical evidence. The integration of XGBoost, SMOTE, and SHAP therefore provides a predictive framework that is both highly accurate and clinically explainable, supporting the development of decision-support systems for early cervical cancer detection.
Machine Learning Based Prediction of Osteoporosis Risk Using the Gradient Boosting Algorithm and Lifestyle Data
Salim, Edwin Ibrahim; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.10483

Abstract

Osteoporosis is a degenerative disease characterized by decreased bone mass and an increased risk of fractures, particularly among the elderly population. Early detection is essential; however, standard diagnostic methods such as Dual-Energy X-ray Absorptiometry (DEXA) remain limited in terms of availability and cost. This study aims to develop a machine learning-based risk prediction model for osteoporosis by utilizing lifestyle data with the Gradient Boosting algorithm. The secondary dataset was obtained from the Kaggle platform, consisting of 1,958 samples covering lifestyle and clinical attributes such as age, gender, physical activity, smoking habits, calcium intake, vitamin D consumption, and family history. Preprocessing involved normalization and categorical feature encoding, along with a balance check of class distribution, which indicated that the dataset was relatively balanced. The data were then divided using stratified sampling with an 80% training set and 20% testing set. Model performance was evaluated using accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC). The results showed that the Gradient Boosting algorithm achieved an accuracy of 91%, precision of 90.8%, recall of 90.2%, F1-score of 90.5%, and an AUC of 0.92, outperforming baseline methods such as Logistic Regression and Random Forest. These findings demonstrate that Gradient Boosting is effective as a decision-support tool for early osteoporosis screening based on lifestyle data and has the potential to be integrated into clinical decision-making systems to enhance early detection in healthcare services. Nevertheless, since this study relied on a secondary dataset from Kaggle, the results require further validation using real clinical data from Indonesia to ensure representativeness for the local population.
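As a quick consistency check on the figures above, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision (90.8%) and recall (90.2%); the helper name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(0.908, 0.902)
print(round(f1, 3))  # 0.905
```

The result matches the reported F1-score of 90.5%.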
A Hybrid Approach to Music Recommendations Based on Audio Similarity Using Autoencoder and LightGBM
Aristawidya, Winda Ardelia; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.10516

Abstract

Music recommendation systems help users navigate large music collections by suggesting songs aligned with their preferences. However, conventional methods often overlook the depth of audio content, limiting personalization and accuracy. This study proposes a hybrid approach that uses PCA and Autoencoder to extract audio embeddings. These embeddings are processed using K-Nearest Neighbors to find similar tracks, followed by a reranking step with LightGBM based on predicted relevance. The system achieved strong results: 98% accuracy, 0.96 precision, 0.96 recall, and 0.96 F1-score for the Similar class, with 0.99 precision and recall for Not Similar. Cross-validation confirmed model robustness, with an average accuracy of 97.99%, precision of 0.9577, recall of 0.9624, and F1-score of 0.9600, all with low standard deviations. These outcomes show that combining deep audio features with machine learning ranking enhances recommendation quality. Future improvements may involve incorporating metadata and genre-based visualizations for more diverse and interpretable results.
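The retrieval step described above, finding the nearest tracks in embedding space, can be sketched in a few lines of plain Python. The abstract does not state which distance metric was used, so cosine similarity is an assumption here, as are the function names, the track identifiers, and the three-dimensional toy embeddings. The LightGBM reranking stage is not shown.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_tracks(query_id, embeddings, k=2):
    """Return the ids of the k tracks most similar to the query track."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), track_id)
        for track_id, vec in embeddings.items()
        if track_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [track_id for _, track_id in scored[:k]]

# Hypothetical embeddings; real ones would come from the PCA/Autoencoder stage.
embeddings = {
    "track_a": (0.9, 0.1, 0.0),
    "track_b": (0.8, 0.2, 0.1),
    "track_c": (0.0, 0.1, 0.9),
    "track_d": (0.7, 0.3, 0.0),
}
neighbours = nearest_tracks("track_a", embeddings)
```

In a two-stage design like this, the candidate list returned by the neighbour search would then be passed to the ranking model, which reorders it by predicted relevance.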
Hyperparameter Optimization and Feature Selection Analysis on the XGBoost Model for Hepatitis C Infection Prediction
Lefi, Nadia Martha; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.10876

Abstract

Hepatitis C is a liver disease that can progress to chronic conditions such as cirrhosis and liver cancer. Early detection is essential and can be supported through machine learning approaches. This study analyzes the effect of feature selection and hyperparameter tuning on the performance of the XGBoost model in classifying hepatitis C infection. The dataset, obtained from Kaggle, contains laboratory test attributes. The preprocessing stage involved handling missing values, encoding categorical variables, removing outlier classes, and normalizing data using StandardScaler. After stratified splitting, the training set was balanced using the SMOTE technique. Feature selection was carried out using the ANOVA F-score method, and hyperparameter tuning was performed using GridSearchCV. Three model scenarios were compared: baseline, with feature selection, and with combined feature selection and hyperparameter tuning. The evaluation results showed that the third model achieved the best performance with 96% accuracy, 79% precision, 81% recall, and a 78% F1-score, despite a slight decrease in the ROC AUC value. This approach has proven effective in improving model performance and is relevant for supporting more accurate hepatitis C diagnosis systems.
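The ANOVA F-score used for feature selection compares how much a feature's mean differs between classes with how much it varies within each class. The sketch below computes the one-way ANOVA F statistic for a single feature in plain Python; the function name `anova_f` and the toy values are illustrative assumptions, not data from the study.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one feature, with values grouped by class."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of samples around their own class mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical feature values split by class label.
informative = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # class means differ
uninformative = [[1.0, 5.0, 3.0], [2.0, 4.0, 3.0]]  # class means are equal

print(anova_f(informative))    # 13.5
print(anova_f(uninformative))  # 0.0
```

Features are ranked by this score and the highest-scoring ones are kept; a score near zero suggests the feature does little to separate the classes.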
Multiclass Classification of Tomato Leaf Diseases Using GLCM, Color, and Shape Feature Extraction with Optimized XGBoost
Laiskodat, Fransisko Andrade; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11273

Abstract

Automatic classification of tomato leaf diseases is an essential component in advancing precision agriculture based on artificial intelligence. This study develops a multiclass classification model for tomato leaf diseases that uses texture, color, and shape features with an optimized XGBoost algorithm. The public PlantVillage dataset was used, with preprocessing stages including feature extraction, normalization, dimensionality reduction using PCA, and class balancing using SMOTE. The experimental results showed that the model classified ten disease classes with an accuracy of 97.63% and macro and weighted F1-scores of 0.98. These findings indicate that the combination of handcrafted features and XGBoost offers an effective, efficient, and applicable solution for plant disease diagnostic systems.
Efficient Feature Extraction Using MobileNetV2 and EfficientNetB0 for Multi-Class Brain Tumor Classification
Amelia, Hemas Anggita; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11354

Abstract

Brain tumor classification in MRI is complicated by the similarity of imaging features across multiple tumor classes. This study evaluates the use of lightweight convolutional neural network (CNN) architectures as feature extractors combined with machine learning classifiers for multi-class classification. MobileNetV2 and EfficientNetB0 were used to extract fixed-length feature representations, which were then classified using Support Vector Machine (SVM), Logistic Regression, Random Forest, and K-Nearest Neighbors. The evaluation used stratified five-fold cross-validation, and performance was measured with accuracy, F1-score, and Matthews Correlation Coefficient (MCC). Results show that EfficientNetB0 features paired with SVM achieved the highest test accuracy (98.5%), while Logistic Regression also yielded competitive performance (97.1%). Class-wise analysis indicated strong results for pituitary and non-tumor cases. This work shows that lightweight CNN-based feature extraction may serve as a practical direction for improving multi-class brain tumor MRI classification, with potential benefits for applications in resource-limited environments.
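The Matthews Correlation Coefficient mentioned above is easiest to see in its binary form, computed from the four confusion-matrix counts. The study is multi-class, so the sketch below is a simplification for illustration only, and the counts are hypothetical rather than taken from the paper.

```python
import math

def binary_mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a 2x2 confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0

# Hypothetical counts: 90 of 100 samples classified correctly.
mcc = binary_mcc(tp=45, tn=45, fp=5, fn=5)
print(mcc)  # 0.8
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it more informative than accuracy when classes are unevenly represented.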
Analysis of Deep Learning Algorithms Using ConvNeXt and Vision Transformer for Brain Tumor Disease
Ekayanda, Gilang; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11438

Abstract

This study aims to conduct a comparative analysis and identify the most effective deep learning architecture between ConvNeXt and Vision Transformer (ViT) for the automated classification of brain tumors from MRI imagery. Rapid and accurate brain tumor diagnosis is crucial; however, the manual interpretation of MRI scans is time-consuming and reliant on specialist expertise, creating an urgent need for reliable automation in brain tumor diagnosis. This research utilizes a dataset of 4,600 images, balanced between 2,513 'Brain Tumor' and 2,087 'Healthy' instances. A robust 5-Fold Cross-Validation methodology was employed to evaluate model performance, wherein the data was divided into five folds, each consisting of 920 images, ensuring every image served as both training and testing data. The quantitative results demonstrated high efficacy from both models, although ConvNeXt achieved a slight, consistent advantage. ConvNeXt obtained an accuracy of 99.13%, precision of 99.13%, recall of 99.13%, and an F1-Score of 99.13%. In comparison, the ViT model scored an accuracy of 98.13%, precision of 98.14%, recall of 98.13%, and an F1-Score of 98.13%. This quantitative superiority was validated through qualitative analysis using saliency maps, which confirmed that the models' computational attention was accurately focused on the anatomical locations of the actual tumor lesions.
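The fold arithmetic described above (4,600 images split into five folds of 920) can be sketched as follows. This is a minimal illustration that assumes contiguous, unshuffled folds and a sample count divisible by the number of folds; the function name `k_fold_indices` is hypothetical, and a real pipeline would typically shuffle the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; assumes n_samples is divisible by k."""
    fold_size = n_samples // k
    indices = list(range(n_samples))
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test_idx = indices[start:stop]                 # held-out fold
        train_idx = indices[:start] + indices[stop:]   # remaining four folds
        yield train_idx, test_idx

folds = list(k_fold_indices(4600, 5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))  # 3680 920 on each of the five rounds
```

Each image appears in exactly one test fold and in the training set of the other four rounds, which is what lets every image serve as both training and testing data.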
Comparative Analysis of Random Forest, SVM, and Naive Bayes for Cardiovascular Disease Prediction
Rayadhani, Windy Aldora; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11451

Abstract

Cardiovascular disease is one of the leading causes of death worldwide; therefore, accurate early detection is essential to reduce fatal risks. This study aims to compare the performance of three machine learning algorithms — Random Forest, Support Vector Machine (SVM), and Naïve Bayes — in predicting cardiovascular disease risk using the Mendeley Cardiovascular Disease Dataset, which contains 1,000 patient records and 14 clinical attributes. The models were evaluated using accuracy, precision, recall, and F1-score metrics, and their performance differences were statistically tested using the paired t-test. The experimental results indicate that the Random Forest algorithm achieved the best performance with 99% accuracy, 100% recall, 98% precision, and an F1-score of 99%. The SVM model followed with 98% accuracy and 100% recall, while the Naïve Bayes algorithm obtained 94.5% accuracy and an F1-score of 95%. The p-value < 0.05 confirmed that the performance differences among the three models were statistically significant. From a clinical perspective, a model with high recall, such as Random Forest, is more desirable because it reduces the likelihood of false negatives, which are critical in heart disease diagnosis. The feature importance analysis also revealed that age, resting blood pressure, and cholesterol level were the most influential factors in predicting cardiovascular risk. These findings suggest that machine learning algorithms, particularly Random Forest, have strong potential to be implemented in Clinical Decision Support Systems (CDSS) for accurate and efficient early detection of cardiovascular disease.
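A paired t-test of the kind mentioned above compares two models on matched measurements, for example their accuracy on the same cross-validation folds. The abstract does not say what was paired, so the per-fold setup, the function name `paired_t_statistic`, and the scores below are assumptions for illustration and are not the study's data.

```python
import math

def paired_t_statistic(scores_a, scores_b):
    """t statistic for the mean of paired differences (n - 1 degrees of freedom)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(variance / n)

# Hypothetical per-fold accuracies for two models on the same five folds.
model_a = [0.99, 0.98, 0.99, 0.97, 0.99]
model_b = [0.95, 0.94, 0.96, 0.93, 0.94]

t_stat = paired_t_statistic(model_a, model_b)
# Two-sided critical value at alpha = 0.05 with 4 degrees of freedom is 2.776.
significant = abs(t_stat) > 2.776
print(round(t_stat, 2), significant)  # 12.65 True
```

Pairing matters because both models see the same folds: the test looks at the per-fold differences, so fold-to-fold difficulty cancels out.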
Analysis of Naive Bayes Algorithm for Lung Cancer Risk Prediction Based on Lifestyle Factors
Vabilla, Sheila Anggun; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11463

Abstract

Lung cancer has one of the highest mortality rates of any cancer worldwide and is often difficult to detect in its early stages because symptoms are minimal. This study builds a lung cancer risk prediction model based on lifestyle factors using the Gaussian Naive Bayes algorithm. Class imbalance is addressed using the Synthetic Minority Over-sampling Technique (SMOTE), and feature selection is carried out using Mutual Information. The dataset consists of 1,000 patient records with 24 features related to lifestyle and environmental factors. The model is validated using stratified 5-fold cross-validation and evaluated on accuracy, precision, recall, and confusion matrices. The results show that applying SMOTE increases model accuracy to 91.00%, with high precision and recall in all risk classes (Low, Medium, High). The features "Passive Smoker" and "Coughing up Blood" are identified as the most influential factors in the prediction. These results indicate that combining Gaussian Naive Bayes with SMOTE and Mutual Information can produce an accurate prediction model.