Jurnal Penelitian Pendidikan IPA (JPPIPA)
Vol 12 No 1 (2026)

Multimethodology Analysis of Determinants of Breast Cancer Diagnosis Machine Learning

Dita Anggriani Lubis (Universitas Satya Terra Bhinneka)
Yuli Irnawati (STIkes Bakti Utama Pati)
Ayu Trisni Pamilih (STIkes Bakti Utama Pati)
Ria Fazelita Br Gultom (Universitas Satya Terra Bhinneka)



Article Info

Publish Date
31 Jan 2026

Abstract

Breast cancer remains one of the most prevalent and life-threatening diseases worldwide, highlighting the urgent need for accurate and interpretable diagnostic models. While machine learning has shown promise in classification tasks, many existing models lack transparency and overlook the individual contribution of cellular features essential for clinical decision-making.

This study proposes an integrative and explainable framework to identify the most influential cellular-level features in distinguishing between benign and malignant breast tumors. Using a publicly available dataset comprising 569 observations and 32 numerical features, we conducted a multi-step analysis. Feature relevance was first evaluated using Pearson correlation. Random Forest and Recursive Feature Elimination (RFE) were employed to rank and refine the feature subset, followed by Principal Component Analysis (PCA) for dimensionality reduction and pattern visualization. SHapley Additive exPlanations (SHAP) were utilized to interpret individual predictions. Complementary statistical tests, including t-tests and chi-square analyses, assessed associations between tumor characteristics and diagnosis. A logistic regression model was developed to evaluate predictive performance.

Key cellular features, such as mean radius, texture, and concavity, were consistently identified as highly predictive of diagnosis. RFE demonstrated that fewer than 10 features were sufficient for optimal classification. The logistic regression model achieved high accuracy, offering a simpler yet effective alternative for prediction.

By combining statistical methods with interpretable machine learning, this study presents a transparent and clinically relevant approach to breast cancer diagnosis. The integration of SHAP values bridges the gap between model performance and interpretability, supporting more informed and personalized clinical decisions. Future work should consider external validation, image-based features, and patient demographic variables to enhance generalizability.
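
As a rough illustration of the first screening steps named in the abstract (Pearson correlation and a two-sample t-test between a cellular feature and the diagnosis), a minimal sketch in plain Python might look like the following. The radius values, the 0/1 encoding of the diagnosis, and the helper names `pearson_r` and `welch_t` are assumptions made for this example only; they are not taken from the study's dataset or code, and the abstract does not state which t-test variant was used (Welch's form is chosen here).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))


# Made-up toy values for illustration only, NOT data from the study.
radius = [11.2, 12.5, 13.0, 12.1, 17.9, 20.3, 19.1, 16.4]
diagnosis = [0, 0, 0, 0, 1, 1, 1, 1]  # assumed coding: 0 = benign, 1 = malignant

# Correlation of the feature with the binary label (point-biserial correlation).
r = pearson_r(radius, diagnosis)

# Split the feature by diagnosis and compare the group means.
benign = [x for x, d in zip(radius, diagnosis) if d == 0]
malignant = [x for x, d in zip(radius, diagnosis) if d == 1]
t = welch_t(malignant, benign)

print(f"Pearson r = {r:.3f}, Welch t = {t:.3f}")
```

With a binary label, the Pearson coefficient reduces to the point-biserial correlation, so a large positive value and a large positive t statistic both indicate that the feature tends to be higher in the malignant group. The later steps in the abstract (Random Forest, RFE, PCA, SHAP, logistic regression) would normally rely on dedicated libraries and are not sketched here.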

Copyrights © 2026






Journal Info

Abbrev

jppipa

Subject

Agriculture, Biological Sciences & Forestry; Biochemistry, Genetics & Molecular Biology; Chemical Engineering, Chemistry & Bioengineering; Chemistry; Education; Materials Science & Nanotechnology; Physics

Description

Science Educational Research Journal is an international open access journal published by the Science Master Program of Science Education Graduate Program University of Mataram. It contains scientific articles, both research results and literature reviews, covering science, technology and teaching ...