Sinkron : Jurnal dan Penelitian Teknik Informatika
Vol. 10 No. 1 (2026): Article Research January 2026

Comparative Study on Machine Learning Algorithms for Code Smell Detection

U, Hayya
Saputri, Theresia Ratih Dewi



Article Info

Publish Date
03 Jan 2026

Abstract

Detecting code smells is crucial for maintaining software quality, but rule-based methods often lack adaptability. On the other hand, existing machine learning studies often lack large-scale comparisons on modern datasets. This research comprehensively compares the effectiveness and efficiency of various machine learning algorithms for multi-label code smell classification. The dataset used is SmellyCode++, which contains more than 100,000 samples. Seven models (Logistic Regression, Linear SVM, Naive Bayes, Random Forest, Extra Trees, XGBoost, and LightGBM), each combined with Binary Relevance, were trained on data balanced using random undersampling and multi-label synthetic minority over-sampling. Each model was evaluated using the F1-Macro, Hamming Loss, and Jaccard Score metrics, and a non-parametric statistical analysis was conducted to validate the findings. The experiment found that ensemble-based models statistically significantly outperformed the linear and probabilistic models, while performance among the top ensemble models was statistically equivalent. Given this statistical equivalence in accuracy, computational efficiency, measured as training time, became the critical tiebreaker. BR_RandomForest, BR_XGBoost, and BR_ExtraTrees proved highly efficient, while BR_LightGBM was significantly slower. The study concludes that BR_RandomForest offers the best overall trade-off, combining top-tier accuracy with excellent computational efficiency, which makes it a robust choice for practical applications.
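To make the setup in the abstract more concrete, the following is a minimal sketch of Binary Relevance and the three reported metrics, written in plain Python. It is not the authors' implementation. The class `OneNearestNeighbour` is a hypothetical toy base learner standing in for the real models (Random Forest, XGBoost, and so on), the tiny data set is invented for illustration, and the sample-averaged form of the Jaccard score is an assumption, since the abstract does not state which averaging was used.

```python
from typing import Callable, List, Sequence

Rows = Sequence[Sequence[float]]
Labels = Sequence[Sequence[int]]


class OneNearestNeighbour:
    """Toy base learner (hypothetical stand-in for Random Forest etc.)."""

    def fit(self, X: Rows, y: Sequence[int]) -> "OneNearestNeighbour":
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X: Rows) -> List[int]:
        def nearest(row: Sequence[float]) -> int:
            dists = [sum((a - b) ** 2 for a, b in zip(row, t)) for t in self.X]
            return self.y[dists.index(min(dists))]
        return [nearest(row) for row in X]


class BinaryRelevance:
    """Trains one independent binary classifier per label column."""

    def __init__(self, make_base: Callable[[], OneNearestNeighbour]):
        self.make_base = make_base

    def fit(self, X: Rows, Y: Labels) -> "BinaryRelevance":
        n_labels = len(Y[0])
        self.models = [
            self.make_base().fit(X, [row[j] for row in Y]) for j in range(n_labels)
        ]
        return self

    def predict(self, X: Rows) -> List[List[int]]:
        columns = [model.predict(X) for model in self.models]
        return [list(row) for row in zip(*columns)]


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / (len(y_true) * len(y_true[0]))


def f1_macro(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of per-label F1; a label with no positives scores 0."""
    scores = []
    for j in range(len(y_true[0])):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def jaccard_samples(y_true: Labels, y_pred: Labels) -> float:
    """Mean per-sample |intersection| / |union| (assumed averaging mode)."""
    scores = []
    for tr, pr in zip(y_true, y_pred):
        inter = sum(t == 1 and p == 1 for t, p in zip(tr, pr))
        union = sum(t == 1 or p == 1 for t, p in zip(tr, pr))
        # Convention chosen here: two empty label sets count as a perfect match.
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


# Invented toy data: 4 training samples, 2 features, 3 smell labels.
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y_train = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
br = BinaryRelevance(OneNearestNeighbour).fit(X_train, Y_train)
predictions = br.predict([[0.1, 0.1], [0.9, 0.9]])

# Invented ground truth and predictions for the metric functions.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
print(predictions)
print(hamming_loss(y_true, y_pred), f1_macro(y_true, y_pred),
      jaccard_samples(y_true, y_pred))
```

Because Binary Relevance treats each label independently, any binary classifier with `fit` and `predict` methods could replace the toy learner. That independence is what allows the same wrapper to be paired with all seven models compared in the study.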

Copyright © 2026





