Multimethodology Analysis of Determinants of Breast Cancer Diagnosis Machine Learning
Dita Anggriani Lubis; Yuli Irnawati; Ayu Trisni Pamilih; Ria Fazelita Br Gultom
Jurnal Penelitian Pendidikan IPA Vol 12 No 1 (2026)
Publisher: Postgraduate, University of Mataram

DOI: 10.29303/jppipa.v12i1.12497

Abstract

Breast cancer remains one of the most prevalent and life-threatening diseases worldwide, highlighting the urgent need for accurate and interpretable diagnostic models. While machine learning has shown promise in classification tasks, many existing models lack transparency and overlook the individual contribution of cellular features essential for clinical decision-making. This study proposes an integrative and explainable framework to identify the most influential cellular-level features in distinguishing between benign and malignant breast tumors. Using a publicly available dataset comprising 569 observations and 32 numerical features, we conducted a multi-step analysis. Feature relevance was first evaluated using Pearson correlation. Random Forest and Recursive Feature Elimination (RFE) were employed to rank and refine the feature subset, followed by Principal Component Analysis (PCA) for dimensionality reduction and pattern visualization. SHapley Additive exPlanations (SHAP) were used to interpret individual predictions. Complementary statistical tests, including t-tests and chi-square analyses, assessed associations between tumor characteristics and diagnosis. A logistic regression model was developed to evaluate predictive performance. Key cellular features, such as mean radius, texture, and concavity, were consistently identified as highly predictive of diagnosis. RFE demonstrated that fewer than 10 features were sufficient for optimal classification. The logistic regression model achieved high accuracy, offering a simpler yet effective alternative for prediction. By combining statistical methods with interpretable machine learning, this study presents a transparent and clinically relevant approach to breast cancer diagnosis. The integration of SHAP values bridges the gap between model performance and interpretability, supporting more informed and personalized clinical decisions.
Future work should consider external validation, image-based features, and patient demographic variables to enhance generalizability.
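
The multi-step analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the Wisconsin Diagnostic Breast Cancer data bundled with scikit-learn (569 observations, 30 feature columns) is a reasonable stand-in for the study's dataset, which the abstract does not name. The train/test split, the choice of 8 retained features, and all variable names are hypothetical. The SHAP and chi-square steps are left out because SHAP needs the separate shap package and the abstract does not say which variables were tested with chi-square.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled dataset stands in for the study's data.
data = load_breast_cancer()
X, y = data.data, data.target          # y: 0 = malignant, 1 = benign
names = np.array(data.feature_names)

# Hypothetical split; the abstract does not describe the validation scheme.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 1. Pearson correlation between each feature and the diagnosis label.
corr = np.array(
    [np.corrcoef(X_train[:, j], y_train)[0, 1] for j in range(X_train.shape[1])]
)

# 2. Welch t-test per feature, comparing malignant and benign tumors.
t_stat, p_val = stats.ttest_ind(
    X_train[y_train == 0], X_train[y_train == 1], equal_var=False
)

# 3. Recursive Feature Elimination with a random forest as the ranker.
#    Keeping 8 features is an illustrative value below the "fewer than 10"
#    reported in the abstract.
rfe = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=8,
)
rfe.fit(X_train, y_train)
selected = names[rfe.support_]

# 4. PCA on standardized features for a two-dimensional view of the data.
pca_pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = pca_pipe.fit_transform(X_train)
explained = pca_pipe.named_steps["pca"].explained_variance_ratio_

# 5. Logistic regression trained on the RFE-selected subset only.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train[:, rfe.support_], y_train)
accuracy = clf.score(X_test[:, rfe.support_], y_test)

print("Strongest |r| feature:", names[np.argmax(np.abs(corr))])
print("Features kept by RFE:", list(selected))
print("Variance explained by 2 PCs:", explained.sum())
print("Held-out accuracy:", accuracy)
```

Fitting the scaler inside each pipeline keeps test-set statistics out of training, and running RFE on the training split alone avoids selecting features with knowledge of the held-out data.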