Purpose – This study evaluates a hybrid machine learning approach for classifying robusta coffee (Coffea canephora) leaves as healthy or diseased. It addresses two gaps: the difficulty of manual field inspection and the scarcity of comparative analyses across classifiers.

Design/methods/approach – MobileNetV2 was used as a feature extractor, and its features were passed to four machine learning classifiers: Random Forest, K-Nearest Neighbors, Linear Support Vector Machine, and Gaussian Naive Bayes. The dataset comprised 1,560 images (791 healthy and 769 diseased), split into 70% training, 10% validation, and 20% testing. A hash-based grouped splitting strategy kept duplicate images in the same subset to prevent data leakage. Model performance was evaluated using accuracy, F1-score, ROC-AUC, and McNemar’s test.

Findings – Gaussian Naive Bayes achieved the highest accuracy (93.89%) and F1-score (93.85%), while Random Forest obtained the highest ROC-AUC (96.94%). However, McNemar’s test showed no statistically significant differences among the models (p > 0.05), so the four classifiers performed comparably on the test set. The results indicate that lightweight hybrid approaches can achieve strong performance even with relatively small datasets.

Research implications/limitations – The study is limited to binary classification and a relatively small dataset, which may restrict generalizability to more complex, multi-class disease scenarios. Further research with larger and more diverse datasets is recommended.

Originality/value – This study provides a systematic comparison of multiple machine learning classifiers on a single, shared MobileNetV2 feature representation, offering practical guidance on efficient and reliable approaches for early-stage coffee leaf disease screening in resource-constrained environments.
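As a rough illustration of two steps named in the design, the sketch below shows one way a hash-based grouped split and McNemar’s test might be implemented. It is not the study's code. The function names `assign_split` and `mcnemar_exact` are hypothetical. Grouping is assumed to use the SHA-256 digest of the raw file bytes, which catches only byte-identical duplicates. The exact binomial form of McNemar’s test is also an assumption, since the abstract does not state which variant was used.

```python
import hashlib
from math import comb


def assign_split(image_bytes: bytes, train: float = 0.70, val: float = 0.10) -> str:
    """Assign an image to a split based on the SHA-256 hash of its bytes.

    Assumption: duplicates are byte-identical, so they share a hash and
    therefore always land in the same split. The 70/10/20 proportions are
    met only approximately, since each assignment is made independently.
    """
    digest = hashlib.sha256(image_bytes).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < train:
        return "train"
    if u < train + val:
        return "val"
    return "test"


def mcnemar_exact(y_true, pred_a, pred_b) -> float:
    """Two-sided exact McNemar p-value for two classifiers on one test set."""
    rows = list(zip(y_true, pred_a, pred_b))
    # Discordant pairs: exactly one of the two classifiers is correct.
    b = sum(1 for t, a, p in rows if a == t and p != t)  # A right, B wrong
    c = sum(1 for t, a, p in rows if a != t and p == t)  # A wrong, B right
    n = b + c
    if n == 0:
        return 1.0  # the classifiers never disagree on correctness
    # Under the null hypothesis, b follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)
```

Under these assumptions, calling `assign_split` on two copies of the same file always returns the same label, so a duplicate cannot appear in both the training and test subsets. For `mcnemar_exact`, only the cases where exactly one classifier is correct affect the p-value, which is why classifiers with different accuracy can still show no significant difference on a small test set.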
Copyright © 2026