Akbar, Ananda Ikhwana Khairur
Unknown Affiliation

Lung Cancer Classification using the Naïve Bayes Method with SMOTE
Akbar, Ananda Ikhwana Khairur; Astuti, Yani Parti
Sistemasi: Jurnal Sistem Informasi, Vol 14, No 6 (2025)
Publisher: Program Studi Sistem Informasi Fakultas Teknik dan Ilmu Komputer

DOI: 10.32520/stmsi.v14i6.5607

Abstract

This study addresses three challenges: delays in the early detection of lung cancer caused by non-specific initial symptoms; the limitations of the Naïve Bayes algorithm in processing categorical data such as symptoms, gender, and smoking habits; and class imbalance in the dataset, which can affect model accuracy. To overcome these challenges, SMOTE (Synthetic Minority Over-sampling Technique) was applied to improve classification performance. The study implements the Naïve Bayes algorithm for lung cancer classification and compares its performance on imbalanced data with its performance on data balanced using SMOTE. The methodology consists of data preprocessing, encoding, balancing with SMOTE, and classification using Naïve Bayes. Evaluation used three train-test split ratios: 80:20, 70:30, and 60:40. Applying SMOTE improved performance, with the largest gains at the 60:40 split ratio, where model accuracy rose from 88.29% to 93.19%. For the “Yes” (positive) class, precision remained at 0.96, recall at 0.91, and F1-score at 0.93. For the “No” (negative) class, however, precision improved from 0.40 to 0.90, recall from 0.60 to 0.96, and F1-score from 0.48 to 0.93. Conversely, accuracy decreased slightly at the 80:20 and 70:30 ratios after SMOTE was applied. These findings show that SMOTE substantially enhances model performance at the 60:40 ratio, not only in accuracy but also in recall and F1-score, which are crucial for reducing false negatives in the positive (“Yes”) class. This is especially critical in early detection, where correctly identifying actual cancer cases matters more than merely maintaining overall accuracy. Although SMOTE did not always improve accuracy at the other ratios, it still contributed to better detection of cancer cases. Its application should therefore be considered carefully, balancing overall accuracy against clinically meaningful metrics.
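
The abstract names SMOTE as the balancing step but does not describe how it works. As a rough illustration only, the sketch below shows the core idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy data are assumptions made for this example and are not taken from the study. The toy data is numeric, whereas the study works with encoded categorical features. In practice an existing implementation such as `SMOTE` from the imbalanced-learn library would normally be used instead of hand-written code.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; not the study's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy training set with made-up values (not the study's data).
majority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.3, 1.0)]
minority = [(3.0, 3.0), (3.2, 2.8), (2.9, 3.3)]

# Generate just enough synthetic minority points to match the majority count.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "6 6"
```

A common design choice, assumed here and not confirmed by the abstract, is to oversample only the training portion of each split so that the test set contains real samples only.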