Journal of Computing Theories and Applications
Vol. 3 No. 2 (2025): JCTA 3(2) 2025

Evaluating Open-Source Machine Learning Project Quality Using SMOTE-Enhanced and Explainable ML/DL Models

Hamza, Ali (Unknown)
Hussain, Wahid (Unknown)
Iftikhar, Hassan (Unknown)
Ahmad, Aziz (Unknown)
Shamim, Alamgir Md (Unknown)



Article Info

Publish Date
16 Nov 2025

Abstract

The rapid growth of open-source software (OSS) in machine learning (ML) has intensified the need for reliable, automated methods to assess project quality, particularly as OSS increasingly underpins critical applications in science, industry, and public infrastructure. This study evaluates the effectiveness of a diverse set of machine learning and deep learning (ML/DL) algorithms for classifying GitHub OSS ML projects as engineered or non-engineered using a SMOTE-enhanced and explainable modeling pipeline. The dataset used in this research includes both numerical and categorical attributes representing documentation, testing, architecture, community engagement, popularity, and repository activity. After handling missing values, standardizing numerical features, encoding categorical variables, and addressing the inherent class imbalance using the Synthetic Minority Oversampling Technique (SMOTE), seven different classifiers—K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), XGBoost (XGB), Logistic Regression (LR), Support Vector Machine (SVM), and a Deep Neural Network (DNN)—were trained and evaluated. Results show that LR (84%) and DNN (85%) outperform all other models, indicating that both linear and moderately deep non-linear architectures can effectively capture key quality indicators in OSS ML projects. Additional explainability analysis using SHAP reveals consistent feature importance across models, with documentation quality, unit testing practices, architectural clarity, and repository dynamics emerging as the strongest predictors. These findings demonstrate that automated, explainable ML/DL-based quality assessment is both feasible and effective, offering a practical pathway for improving OSS sustainability, guiding contributor decisions, and enhancing trust in ML-based systems that depend on open-source components.

Copyrights © 2025






Journal Info

Abbrev

jcta

Publisher

Subject

Computer Science & IT Decision Sciences, Operations Research & Management

Description

Journal of Computing Theories and Applications (JCTA) is a refereed, international journal that covers all aspects of foundations, theories and the practical applications of computer science. FREE OF CHARGE for submission and publication. All accepted articles will be published online and accessed ...