Jurnal Sistem Informasi dan Informatika
Vol 4 No 1 (2026): January 2026

Application of Random Forest for Breast Cancer Diagnosis Classification Based on the WBCD Dataset

Naufal Aqiilah Asra (Ilmu Komputer, FMIPA, Universitas Negeri Medan)
Maulana Al Nouri (Ilmu Komputer, FMIPA, Universitas Negeri Medan)
Tia Risky Yasmin Saketang (Ilmu Komputer, FMIPA, Universitas Negeri Medan)
Repi Meilani Putri (Ilmu Komputer, FMIPA, Universitas Negeri Medan)



Article Info

Publish Date
12 Apr 2026

Abstract

Breast cancer is one of the most critical global health challenges, with Indonesia recording 66,271 new cases in 2022 according to GLOBOCAN data published by the International Agency for Research on Cancer (IARC/WHO). Early and accurate detection is essential to improving patient survival rates, yet conventional diagnosis remains time-consuming and dependent on expert availability. This study implements the Random Forest algorithm to classify breast cancer diagnoses using the Wisconsin Breast Cancer Diagnostic (WBCD) dataset from the UCI Machine Learning Repository. The dataset consists of 569 samples with 30 numerical features extracted from fine-needle aspirate (FNA) cell images, each labeled as benign or malignant. Data preprocessing involved removing non-predictive columns, converting the categorical labels to binary format, handling outliers with IQR clipping, and standardizing the features with StandardScaler. The dataset was split into 80% training and 20% testing data using stratified splitting, and the Random Forest classifier was configured with 100 decision trees and class_weight='balanced' to handle class imbalance. Model performance was evaluated using accuracy, precision, recall, and F1-score, together with confusion matrix analysis and 5-fold stratified cross-validation. The model achieved 97.37% accuracy on the test set with zero false positives, meaning that no benign case was classified as malignant. Cross-validation gave a mean accuracy of 96.31%, close to the test accuracy, which suggests that the model generalizes well and does not substantially overfit. Feature importance analysis identified area_worst, concave points_worst, and perimeter_worst as the most dominant features, consistent with the clinical morphological characteristics of malignant cells. These findings demonstrate the strong potential of Random Forest as a reliable and interpretable tool for supporting breast cancer diagnosis.
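Two of the steps named in the abstract, IQR clipping and the confusion-matrix metrics, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `iqr_clip` and `binary_metrics` are hypothetical, the fence multiplier of 1.5 is the conventional choice and is not stated in the abstract, and the counts in the usage lines are hypothetical values chosen only to agree with the reported 97.37% accuracy and zero false positives on a 114-sample test set (20% of 569, rounded). The abstract does not give the actual confusion matrix.

```python
from statistics import quantiles


def iqr_clip(values, k=1.5):
    """Clip each value to the range [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is an assumption (the conventional fence); the abstract
    does not say which multiplier was used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 with malignant as the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy feature column with one extreme value; only the outlier is pulled in.
print(iqr_clip([1, 2, 3, 4, 5, 100]))

# Hypothetical counts (not reported in the abstract): 114 test samples,
# no false positives, 3 false negatives, so 111 of 114 are correct.
print(binary_metrics(tp=39, fp=0, fn=3, tn=72))
```

With zero false positives, precision for the malignant class is 1.0 by construction, so every error on the test set is a false negative (a malignant case classified as benign). That is why recall is the metric to watch alongside accuracy in a screening setting.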

Copyrights © 2026






Journal Info

Abbrev

jiska

Publisher

Subject

Computer Science & IT; Electrical & Electronics Engineering; Materials Science & Nanotechnology

Description

Jurnal Sistem Informasi dan Informatika (JISKA) is a journal published by the Information Systems Study Program of Universitas Dharma Andalas, with E-ISSN 2985-9735. JISKA Volume 1 Number 1 was published in January 2023, on schedule. JISKA is planned ...