Tia Risky Yasmin Saketang
Ilmu Komputer, FMIPA, Universitas Negeri Medan

Published : 1 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 1 Documents
Search

Penerapan Random Forest untuk Klasifikasi Diagnosis Kanker Payudara Berbasis Dataset WBCD Naufal Aqiilah Asra; Maulana Al Nouri; Tia Risky Yasmin Saketang; Repi Meilani Putri
Jurnal Sistem Informasi Dan Informatika Vol 4 No 1 (2026): Januari 2026
Publisher : Prodi Sistem Informasi Universitas Dharma Andalas

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.47233/jiska.v4i1.2633

Abstract

Breast cancer is one of the most critical global health challenges, with Indonesia recording 66,271 new cases in 2022 according to GLOBOCAN data published by the International Agency for Research on Cancer (IARC/WHO). Early and accurate detection is essential to improving patient survival rates, yet conventional diagnosis remains time-consuming and dependent on expert availability. This study implements the Random Forest algorithm to classify breast cancer diagnosis using the Wisconsin Breast Cancer Diagnostic (WBCD) dataset from the UCI Machine Learning Repository. The dataset consists of 569 samples with 30 numerical features extracted from fine-needle aspirate (FNA) cell images, labeled as benign or malignant. Data preprocessing involved removing non-predictive columns, converting categorical labels to binary format, handling outliers using IQR Clipping, and applying StandardScaler normalization. The dataset was split into 80% training and 20% testing using stratified splitting, with the Random Forest Classifier configured using 100 decision trees and class_weight=balanced to handle class imbalance. Model performance was evaluated using accuracy, precision, recall, and F1-score metrics alongside confusion matrix analysis and 5-Fold Stratified Cross Validation. The model achieved 97.37% accuracy on the test set, with zero False Positive predictions, meaning no benign patient was misdiagnosed as malignant. Cross-validation confirmed generalization ability with a mean accuracy of 96.31%, indicating no overfitting. Feature importance analysis identified area_worst, concave points_worst, and perimeter_worst as the most dominant features, consistent with the clinical morphological characteristics of malignant cancer cells. These findings demonstrate the strong potential of Random Forest as a reliable and interpretable tool for supporting breast cancer diagnosis.