Articles

Comparative Analysis of XGBoost, KNN, and SVM Algorithms for Heart Disease Prediction Using SMOTE-Tomek Balancing
Yuliana, Yuliana; Robet, Robet; Hoki, Leony
Sinkron : jurnal dan penelitian teknik informatika Vol. 10 No. 1 (2026): Article Research January 2026
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v10i1.15469

Abstract

Heart disease remains one of the leading causes of death worldwide, making early detection crucial for improving patient outcomes. This study aims to evaluate and compare the performance of several machine learning algorithms in detecting heart disease using the 2015 BRFSS dataset, which includes responses from 253,680 individuals. The three algorithms examined are Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The data preprocessing steps involved feature encoding, class imbalance handling using the Synthetic Minority Over-sampling Technique combined with Tomek Links (SMOTE-Tomek), and hyperparameter tuning through RandomizedSearchCV. The models were assessed on a hold-out validation set using several metrics, including accuracy, Receiver Operating Characteristic-Area Under the Curve (ROC-AUC), F1-score, precision, and recall. The results demonstrated that XGBoost achieved the highest performance, with an accuracy of 94%, a ROC-AUC score of 0.98, and an F1-score of 0.94. In comparison, KNN achieved an accuracy of 87% (ROC-AUC 0.95), while SVM attained an accuracy of 79% (ROC-AUC 0.86). These findings suggest that XGBoost is a robust model for large-scale heart disease classification and holds potential for implementation in clinical decision support systems.
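
The two ideas behind SMOTE-Tomek balancing can be sketched in a few lines. This is a minimal illustration on hypothetical toy points, not the authors' pipeline: `smote_point` interpolates a synthetic minority sample between two minority samples, and `tomek_links` finds pairs of opposite-class points that are each other's nearest neighbour. All function names and data here are assumptions made for the example.

```python
import math
import random


def nearest(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    links = []
    for i in range(len(points)):
        j = nearest(i, points)
        if i < j and labels[i] != labels[j] and nearest(j, points) == i:
            links.append((i, j))
    return links


def smote_point(x, neighbour, rng):
    """Synthetic sample at a random position on the segment from x to neighbour."""
    u = rng.random()
    return tuple(a + u * (b - a) for a, b in zip(x, neighbour))


# Hypothetical toy data: label 1 is treated as the minority class.
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0), (3.2, 3.1)]
labels = [0, 0, 0, 1, 1, 1]

links = tomek_links(points, labels)
synthetic = smote_point(points[4], points[5], random.Random(0))
print("Tomek links:", links)
print("Synthetic minority sample:", synthetic)
```

Points 2 and 3 sit on the class boundary and form the only Tomek link here. Whether one or both members of a link are removed is a design choice the abstract does not specify.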

IoT Sensor Data Analysis for Early Fire Detection Using Dynamic Threshold
Br Tarigan, Widia; Robet, Robet; Tarigan, Feriani Astuti
Sinkron : jurnal dan penelitian teknik informatika Vol. 10 No. 1 (2026): Article Research January 2026
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v10i1.15478

Abstract

Early fire detection using Internet of Things (IoT) technology plays a vital role in minimizing potential material losses and casualties. Conventional systems generally still rely on static thresholds that are less adaptive to environmental dynamics, leading to high false alarm rates and delayed detection. This study proposes a dynamic threshold approach based on a hybrid method of Fuzzy Logic–Random Forest–Adaptive Z-Score and compares it with the static threshold method. Testing was conducted using publicly available secondary datasets, and the algorithms were implemented and tested in Jupyter Notebook. Evaluation was performed using accuracy, false alarm rate (FAR), detection time, F1-score, precision, and recall metrics. The test results show that the dynamic threshold method provides better performance with an increase in accuracy from 59.5% to 74.8%, a decrease in FAR from 31.1% to 14.3%, and a reduction in detection time from 21 seconds to 0 seconds. In addition, the F1-score increased from 0.459 to 0.638, precision from 0.473 to 0.716, and recall from 0.446 to 0.575. These results show that the dynamic threshold approach is more adaptive and reliable in IoT-based fire detection systems than conventional static threshold methods.
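
To make the contrast between static and dynamic thresholds concrete, here is a minimal sketch of a sliding-window z-score detector. It illustrates only the adaptive z-score idea; the fuzzy logic and Random Forest parts of the hybrid method are not modelled. The window size, z-score limit, static limit, and readings are all assumed values.

```python
from collections import deque
from statistics import mean, pstdev


def static_alarms(readings, limit):
    """Alarm whenever a reading exceeds one fixed limit."""
    return [value > limit for value in readings]


def dynamic_alarms(readings, window=5, z_limit=3.0, min_sigma=0.1):
    """Alarm when a reading's z-score against the recent window exceeds z_limit."""
    history = deque(maxlen=window)
    alarms = []
    for value in readings:
        if len(history) == window:
            mu = mean(history)
            # Floor the spread so a perfectly flat window cannot divide by zero.
            sigma = max(pstdev(history), min_sigma)
            alarms.append((value - mu) / sigma > z_limit)
        else:
            alarms.append(False)  # not enough history yet
        history.append(value)
    return alarms


# Hypothetical temperature readings with a rise starting at index 6.
temps = [25.0, 25.5, 24.8, 25.2, 25.1, 25.3, 31.0, 45.0]
print("static :", static_alarms(temps, limit=40.0))
print("dynamic:", dynamic_alarms(temps))
```

On this toy series the dynamic rule fires at the first abnormal reading, while the fixed limit of 40 only fires one sample later. This shows the mechanism only and does not reproduce the reported detection times.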

Comparison of XGBoost and Naive Bayes Models in Type 2 Diabetes Prediction with RFE Feature Selection
Barus, Hanisa putri; Robet; Feriani Astuti Tarigan
Sinkron : jurnal dan penelitian teknik informatika Vol. 10 No. 1 (2026): Article Research January 2026
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v10i1.15509

Abstract

Type 2 diabetes mellitus is a chronic disease with increasing prevalence that can cause serious complications if not detected early. Machine learning algorithms can aid prediction, but the choice of model and features strongly affects the accuracy of the results. This study compares the performance of the Extreme Gradient Boosting (XGBoost) and Naive Bayes algorithms in predicting type 2 diabetes with and without Recursive Feature Elimination (RFE) feature selection. The data came from the UCI Machine Learning Repository and comprise 768 samples with eight clinical features. The research process included data preprocessing, splitting the data into 614 training samples and 154 testing samples, applying RFE to select the most influential features, model training, and evaluation using accuracy, precision, recall, F1-score, and AUC. The results show that Naive Bayes without RFE achieves 70.77% accuracy, a precision of 0.57377, a recall of 0.648148, an F1-score of 0.608696, and an AUC of 0.772778, while Naive Bayes with RFE increases the accuracy to 74.02% and the AUC to 0.793333. XGBoost with RFE provided the best results, with an accuracy of 74.67%, a precision of 0.653061, a recall of 0.592593, an F1-score of 0.621359, and the highest AUC of 0.804259. Applying RFE also improves computational efficiency. These findings indicate that applying RFE significantly improves both classification performance and computation time. The practical implication is that this model could aid early detection of diabetes in clinical settings. Further research could optimize model parameters and use more diverse datasets.
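
The reported metrics are linked through the confusion matrix, which the following sketch makes explicit. The four counts are not given in the abstract. They are hypothetical integers, chosen because they sum to the 154 test samples and reproduce the precision, recall, and F1-score reported for XGBoost with RFE.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Assumed counts (not from the paper): 32 + 17 + 22 + 83 = 154 test samples.
xgb_rfe = binary_metrics(tp=32, fp=17, fn=22, tn=83)
for name, value in xgb_rfe.items():
    print(f"{name}: {value:.6f}")
```

With these counts the accuracy comes out at about 0.7468, matching the reported 74.67% up to rounding or truncation.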

Comparative Analysis of Four Machine Learning Algorithms for Smoke Detection Using SMOTE-Rebalanced Sensor Data
Liecero, Marcus; Robet, Robet; Hendrik, Jackri
Sinkron : jurnal dan penelitian teknik informatika Vol. 10 No. 1 (2026): Article Research January 2026
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v10i1.15546

Abstract

Smoke detection plays a critical role in preventing fire-related hazards, particularly in intelligent monitoring and early warning systems. Conventional smoke sensors often exhibit limited responsiveness in dynamic environmental conditions, prompting the adoption of IoT-based sensor data combined with machine learning techniques. This study presents a comparative evaluation of four supervised classification algorithms, K-Nearest Neighbors (KNN), Decision Tree, Random Forest, and Gradient Boosting, using the Smoke Detection Dataset from Kaggle. The methodology integrates SMOTE to address class imbalance and Z-score normalization for feature standardization. Hyperparameter tuning was performed using GridSearchCV with 5-fold cross-validation, and model performance was assessed based on accuracy and execution time. Experimental results show that KNN achieved the highest accuracy (98.33%) with the lowest execution time (0.0327 s), whereas Decision Tree recorded the lowest accuracy (84.17%) but remained computationally fast (0.0406 s). Random Forest and Gradient Boosting demonstrated strong predictive capability (97.22% and 96.94%, respectively), but at higher computational costs (1.4338 s and 8.3819 s, respectively). Almost all models achieved perfect scores (1.00) for precision, recall, and F1-score following SMOTE-based balancing, except KNN which obtained slightly lower values (0.99). The findings indicate a trade-off between predictive performance and computational efficiency, suggesting that lightweight models such as KNN are better suited for real-time IoT-based smoke detection. In contrast, ensemble models may be more appropriate for backend analysis. This research contributes an integrated evaluation framework that combines data rebalancing, multi-model benchmarking, and time-based performance analysis, providing practical insights for the development of responsive and scalable early smoke detection systems.
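
Z-score standardization, KNN prediction, and execution-time measurement fit together as in the sketch below. This is a minimal standard-library illustration, not the study's implementation. The feature pairs, labels, and `k=3` are assumptions made for the example.

```python
import math
import time
from collections import Counter
from statistics import mean, pstdev


def fit_scaler(rows):
    """Per-feature (mean, standard deviation), learned from the training rows."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]


def standardize(rows, scaler):
    """Z-score each feature: (value - mean) / standard deviation."""
    return [tuple((v - mu) / sd for v, (mu, sd) in zip(row, scaler)) for row in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training rows closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


# Hypothetical (temperature, gas concentration) readings; 1 = smoke, 0 = no smoke.
train_raw = [(20.0, 100.0), (21.0, 120.0), (22.0, 110.0),
             (40.0, 900.0), (42.0, 950.0), (41.0, 1000.0)]
train_y = [0, 0, 0, 1, 1, 1]
queries_raw = [(39.0, 880.0), (20.5, 105.0)]

scaler = fit_scaler(train_raw)
train_x = standardize(train_raw, scaler)
queries = standardize(queries_raw, scaler)

start = time.perf_counter()
predictions = [knn_predict(train_x, train_y, q) for q in queries]
elapsed = time.perf_counter() - start
print(predictions, f"{elapsed:.6f} s")
```

The scaler is fitted on the training rows and then reused for the queries, so both are expressed in the same units before distances are computed.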

Comparative Study of Baseline and CBAM-Enhanced ResNet50 and MobileNetV2 for Indonesian Rupiah Banknote Classification
Alvin, Alvin; Robet, Robet; Feriani, Feriani Astuti Tarigan
Sinkron : jurnal dan penelitian teknik informatika Vol. 10 No. 1 (2026): Article Research January 2026
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v10i1.15558

Abstract

This study investigates the performance of Convolutional Neural Network (CNN) architectures enhanced with the Convolutional Block Attention Module (CBAM) for Indonesian banknote classification. Although attention mechanisms have shown strong potential in improving fine-grained visual recognition, their effectiveness for classifying banknotes with fine textures and similar color patterns remains underexplored; this is the research gap addressed in this work. Four architectures, ResNet50, ResNet50+CBAM, MobileNetV2, and MobileNetV2+CBAM, were evaluated using K-Fold cross-validation on a dataset of 1,281 images representing seven banknote denominations. Experimental results show that ResNet50 achieves strong baseline performance, with a weighted Train accuracy of 99.14% and a Val accuracy of 96.72%. Integrating CBAM further improves feature discrimination: ResNet50+CBAM obtained the highest average accuracy across all folds, with a weighted Train accuracy of 100% and a Val accuracy of 99.45%. MobileNetV2 showed lower performance due to its lightweight capacity, with a Train accuracy of 91.88% and a lower Val accuracy of 85.71%. However, adding CBAM provided measurable improvements and greater stability, with a Train accuracy of 99.61% and a Val accuracy of 92.82%. Overall, CBAM improved CNN’s ability to focus on spatial information and salient channels, resulting in more reliable classification. ResNet50+CBAM emerged as the best-performing model, offering the best balance between accuracy and consistency. These findings support the development of reliable computer vision systems for financial technology applications, including automatic banknote recognition, counterfeit detection, and secure transaction verification.
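
K-Fold cross-validation partitions the 1,281 images so that every image is used for validation exactly once, as the following sketch shows. The abstract does not state the number of folds or whether the splits were stratified by denomination, so `k=5`, the plain shuffle, and the seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, val_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


splits = list(k_fold_indices(n_samples=1281, k=5))
for fold_number, (train, val) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} train, {len(val)} val")
```

Per-fold Train and Val accuracies would then be averaged across the folds. The weighting used for the reported averages is not described in the abstract.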
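
Klasifikasi Multikelas Tingkat Diabetes Berdasarkan Indikator Kesehatan Pasien Menggunakan Strategi One-vs-Rest [Multiclass Classification of Diabetes Levels Based on Patient Health Indicators Using a One-vs-Rest Strategy]
Panjaitan, Tabitha Martha Agustine; Robet; Octara Pribadi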
Jurnal Sistem Komputer dan Informatika (JSON) Vol. 7 No. 2 (2025): Desember 2025
Publisher : Universitas Budi Darma

DOI: 10.30865/json.v7i2.8985

Abstract

Diabetes is a non-communicable disease with a steadily increasing global prevalence. It often remains undiagnosed in its early stages, particularly during the prediabetic phase, which typically lacks noticeable symptoms. This study aims to develop a multi-class classification model to predict diabetes severity levels (non-diabetic, prediabetic, and diabetic) based on patient health indicators. A One-vs-Rest (OvR) strategy was employed, training a classifier for each class against the combination of the others. The dataset was derived from the 2015 National Health Survey, comprising over 250,000 patient records with features such as blood pressure, body mass index, cholesterol levels, history of heart disease, and physical activity. Two machine learning algorithms, Logistic Regression and Random Forest, were applied to train the models. Class imbalance was addressed using the Synthetic Minority Over-sampling Technique (SMOTE). Evaluation metrics included accuracy, precision, recall, F1-score, and the confusion matrix. The results show that the Random Forest model achieved an average accuracy of 93% and consistently high F1-scores, particularly for the prediabetic class (98%). The most influential predictors were high blood pressure, obesity, and insufficient physical activity. This study contributes to the development of a reliable and efficient data-driven system for early diabetes risk detection.
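
The One-vs-Rest strategy can be sketched as two small steps: relabel the data once per class to get binary targets, then predict the class whose binary model gives the highest score. The helper names and the scores below are hypothetical. In the study the scores would come from the trained Logistic Regression or Random Forest models.

```python
CLASSES = ["non-diabetic", "prediabetic", "diabetic"]


def one_vs_rest_targets(labels, positive_class):
    """Binary targets for one class: 1 for that class, 0 for all the others."""
    return [1 if label == positive_class else 0 for label in labels]


def one_vs_rest_predict(class_scores):
    """Pick the class whose binary model reports the highest score."""
    return max(class_scores, key=class_scores.get)


labels = ["non-diabetic", "diabetic", "prediabetic", "non-diabetic"]
targets = {c: one_vs_rest_targets(labels, c) for c in CLASSES}

# Hypothetical scores from the three binary models for a single patient.
scores = {"non-diabetic": 0.20, "prediabetic": 0.65, "diabetic": 0.40}
prediction = one_vs_rest_predict(scores)
print(targets["prediabetic"], prediction)
```

Each binary problem is more imbalanced than the original three-class one, which is one reason a rebalancing step such as SMOTE is commonly paired with this strategy.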

Predicting AI Job Salary Classes Through a Comparative Study of Machine Learning Algorithms
Vincent, Vincent; Robet, Robet; Edi Wijaya
JURNAL RISET KOMPUTER (JURIKOM) Vol. 12 No. 6 (2025): Desember 2025
Publisher : Universitas Budi Darma

DOI: 10.30865/jurikom.v12i6.8979

Abstract

The rapid growth of Artificial Intelligence (AI) has brought significant transformation to the global job market, particularly in salary structures across various AI-related professions. This study aims to classify AI job salaries into three categories—Low, Medium, and High—using supervised machine learning algorithms. The dataset, sourced from Kaggle, combines two real-world datasets featuring key attributes such as experience level, job type, education level, technical skills, remote work ratio, and salary in USD. Preprocessing techniques include One-Hot Encoding for categorical data, StandardScaler for normalization, and MultiLabelBinarizer to handle multi-skill entries. Four machine learning models—Logistic Regression, Random Forest, Gradient Boosting, and XGBoost—were trained and evaluated using consistent pipelines, with evaluation metrics including accuracy, precision, recall, and F1-score, applying macro-averaging to address class imbalance. Logistic Regression achieved the highest performance with 85.4% accuracy and 77.6% F1-score, followed by Gradient Boosting with 84.8% accuracy and 76.3% F1-score. High-salary classes were predicted with higher precision and recall than low-salary classes, indicating skewness in class distribution. Feature importance analysis shows that experience, remote work ratio, and key skills such as Python and SQL significantly affect prediction accuracy. This study demonstrates that traditional machine learning methods, when applied with appropriate preprocessing, can effectively support salary classification and labor market analysis in the AI domain.
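
Macro-averaging computes the F1-score separately for each salary class and then takes the unweighted mean, so a small class counts as much as a large one. The sketch below illustrates this on hypothetical labels that are not taken from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical labels: the High class is deliberately over-represented.
y_true = ["Low", "Low", "Medium", "Medium", "High", "High", "High", "High"]
y_pred = ["Low", "Medium", "Medium", "Medium", "High", "High", "High", "Low"]
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Here the per-class scores are 6/7 for High, 0.5 for Low, and 0.8 for Medium, so the weaker Low class pulls the macro average down more than it would under sample-weighted averaging.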
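
Analisis Komparatif Model Regresi Machine Learning untuk Prediksi Prestasi Akademik Siswa dengan Optimasi Hyperparameter [Comparative Analysis of Machine Learning Regression Models for Predicting Student Academic Achievement with Hyperparameter Optimization]
Hose, Fernando; Robet, Robet; Hendri, Hendri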
JURNAL RISET KOMPUTER (JURIKOM) Vol. 12 No. 6 (2025): Desember 2025
Publisher : Universitas Budi Darma

DOI: 10.30865/jurikom.v12i6.9240

Abstract

Low accuracy in the early identification of at-risk students often hinders timely academic intervention. This study analyzes and compares seven machine learning algorithms for predicting student academic achievement, aiming to provide a foundation for a reliable early warning model. The dataset includes 2,392 students with 15 features covering demographics, learning behavior, and environmental support. Model training was performed using GridSearchCV optimization combined with stratified cross-validation to mitigate overfitting. Performance was evaluated using MAE, RMSE, and R². The results show that CatBoost performed best (R² = 0.774; RMSE = 0.581; MAE = 0.306), followed by LightGBM (R² = 0.771) and Gradient Boosting (R² = 0.767), while MLP showed the lowest performance. Feature importance analysis placed GPA as the dominant predictor, followed by absenteeism and weekly study time. These findings affirm the superiority of boosting-based models in capturing complex nonlinear relationships and provide a practical framework for educational institutions to build data-driven early warning systems.
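
For reference, the three evaluation metrics follow their standard definitions, sketched below. The target and predicted values are hypothetical and serve only to show the arithmetic.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical achievement scores and model predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}, R2 = {r2:.3f}")
```

MAE and RMSE are in the units of the target, while R² is the share of variance explained, so a value of 0.774 means the model accounts for roughly three quarters of the variation in the target.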

Comparative Performance of Machine Learning Algorithms for Detecting Online Gambling Promotional Comments on Youtube
Michael Angelo; Robet; Hendrik, Jackri
Jurnal Teknologi dan Manajemen Informatika Vol. 11 No. 2 (2025): Desember 2025
Publisher : Universitas Merdeka Malang

DOI: 10.26905/jtmi.v11i2.16286

Abstract

Online-gambling promoters increasingly exploit YouTube comment sections, using text obfuscation, Unicode characters, emojis, irregular spacing, and symbols to evade automated moderation. This study aims to identify the most effective machine-learning algorithm for detecting such promotional comments by comparing models on standard metrics (precision, recall, F1-score, accuracy). We employ semi-supervised pseudo-labelling to expand the labelled set from 1,648 to 9,111 comments without additional manual annotation, admitting only high-confidence predictions. The pipeline includes customised character normalization, selective cleaning, tokenization, stopword removal, and Nazief–Adriani stemming, followed by TF–IDF feature extraction. Four algorithms are evaluated: Multinomial Naive Bayes, Logistic Regression, Random Forest, and Support Vector Machine, with hyperparameter optimization and class balancing via SMOTE. On a 1,823-sample test set, all models achieve over 98% accuracy; SVM yields the most balanced performance, resulting in the highest F1-score for the promotion class (0.9908). Confusion matrices and learning curves indicate stable behavior without overfitting or underfitting. We therefore recommend SVM for operational deployment in automated moderation of gambling-promotion comments on YouTube. These findings provide practical guidance for platform safety teams and suggest methodological baselines for similar NLP moderation tasks. Future work should explore ensemble and deep learning approaches, incorporate character and subword-level features, and further evaluate robustness under adversarial obfuscation and domain shift.
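
The pseudo-labelling step, which admits only high-confidence predictions into the labelled set, can be sketched as a simple filter. The 0.95 threshold, the `toy_proba` scorer, and the example comments are assumptions. In the study the probabilities would come from a model trained on the 1,648 manually labelled comments.

```python
def pseudo_label(comments, predict_proba, threshold=0.95):
    """Split comments into confidently pseudo-labelled pairs and deferred ones."""
    accepted, deferred = [], []
    for text in comments:
        p = predict_proba(text)  # estimated probability of the promotion class
        if p >= threshold:
            accepted.append((text, 1))   # confident: promotion
        elif p <= 1.0 - threshold:
            accepted.append((text, 0))   # confident: not promotion
        else:
            deferred.append(text)        # too uncertain to use as a label
    return accepted, deferred


# Hypothetical stand-in for a trained classifier's probability output.
TOY_SCORES = {
    "win big at slot88 now": 0.99,
    "great video, thanks": 0.02,
    "check my profile": 0.60,
}


def toy_proba(text):
    return TOY_SCORES[text]


accepted, deferred = pseudo_label(list(TOY_SCORES), toy_proba)
print(accepted)
print(deferred)
```

A stricter threshold admits fewer pseudo-labels but lowers the chance of training on wrong ones. The abstract does not report the cut-off actually used.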

Application of Bagging and Boosting Methods for Heart Disease Classification
Parapak, Yehezkiel E.A; Robet, Robet; Hendrik, Jackri
Journal of Applied Computer Science and Technology Vol. 6 No. 2 (2025): Desember 2025
Publisher : Indonesian Society of Applied Science

DOI: 10.52158/we9asn06

Abstract

Cardiovascular disease remains a primary contributor to global mortality, underscoring the urgent need for accurate and early diagnostic tools. This study aims to develop a robust classification model for heart disease by conducting a comparative analysis of six ensemble machine learning algorithms, comprising three from the Bagging family (Random Forest, Bagged Decision Tree, Extra Trees) and three from the Boosting family (AdaBoost, Gradient Boosting, XGBoost). The research utilizes the publicly available UCI Cleveland Heart Disease dataset, which exhibits a mild class imbalance. To address this, the Synthetic Minority Over-sampling Technique (SMOTE) was strategically applied to the training data. The performance of each model was rigorously evaluated using accuracy, precision, recall, and F1-score. Experimental results revealed that the Extra Trees algorithm, when combined with SMOTE, achieved the highest overall performance with 90% accuracy, 96% precision, 82% recall, and an 88% F1-score. The primary contribution of this work lies in its comprehensive analysis demonstrating that the randomization strategy of Extra Trees provides a superior and more reliable framework for this classification task compared to other common ensemble techniques, particularly after data balancing. These findings confirm that an integrated approach of ensemble learning and proper data balancing can significantly enhance the development of fair and effective diagnostic tools to support medical professionals.
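
The Bagging family compared here rests on one mechanism: train many base learners on bootstrap resamples and combine them by majority vote. The sketch below illustrates that mechanism with one-feature decision stumps on hypothetical data. It is not Random Forest or Extra Trees, which add feature and split randomization on top, and it does not cover boosting.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Threshold t for the rule 'predict 1 if x > t' with the most correct answers."""
    best_t, best_correct = None, -1
    for t in sorted(set(xs)):
        correct = sum((x > t) == bool(y) for x, y in zip(xs, ys))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def predict_bagging(thresholds, x):
    """Majority vote over the individual stump predictions."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature data: larger values indicate disease (label 1).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_bagging(xs, ys)
print(predict_bagging(model, 1), predict_bagging(model, 8))
```

Because each stump sees a different resample, the thresholds vary slightly, and the vote smooths out individual errors. That is the variance-reduction effect bagging is designed for.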