This study addresses three challenges: delayed early detection of lung cancer caused by non-specific initial symptoms; the limitations of the Naïve Bayes algorithm in processing categorical data such as symptoms, gender, and smoking habits; and class imbalance in the dataset, which can affect model accuracy. To address the imbalance, SMOTE (Synthetic Minority Over-sampling Technique) was applied to improve classification performance. The study implements the Naïve Bayes algorithm for lung cancer classification and compares its performance on the imbalanced data with its performance on data balanced using SMOTE. The methodology consists of data preprocessing, encoding, balancing with SMOTE, and classification with Naïve Bayes. Evaluation was performed using three data split ratios: 80:20, 70:30, and 60:40. Applying SMOTE produced the largest gains at the 60:40 split ratio, where model accuracy improved from 88.29% to 93.19%. For the “Yes” (positive) class, precision remained at 0.96, recall at 0.91, and F1-score at 0.93. For the “No” (negative) class, precision improved from 0.40 to 0.90, recall from 0.60 to 0.96, and F1-score from 0.48 to 0.93. At the 80:20 and 70:30 ratios, by contrast, accuracy decreased slightly after SMOTE was applied. These findings show that, at the 60:40 ratio, SMOTE improves not only accuracy but also recall and F1-score, metrics that are crucial for reducing false negatives in the positive (“Yes”) class. This is especially critical in early detection, where correctly identifying actual cancer cases matters more than merely maintaining overall accuracy. Although SMOTE did not always improve accuracy at the other ratios, it still contributed to better cancer case detection.
Therefore, its application should be considered carefully, balancing overall accuracy with clinically meaningful metrics.
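
The distinction between overall accuracy and per-class metrics can be illustrated with a minimal Python sketch. The confusion counts below are hypothetical and are not taken from this study's dataset; the function names `per_class_metrics` and `report` are likewise illustrative assumptions. The sketch only shows how a model can reach fairly high accuracy while precision and recall for the smaller class stay low.

```python
def per_class_metrics(tp, fp, fn):
    """Return (precision, recall, F1) for one class from its counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def report(confusion, labels=("Yes", "No")):
    """Compute accuracy and per-class metrics.

    `confusion` maps (actual_label, predicted_label) to a count.
    """
    total = sum(confusion.values())
    accuracy = sum(confusion[(lab, lab)] for lab in labels) / total
    result = {"accuracy": accuracy}
    for lab in labels:
        others = [o for o in labels if o != lab]
        tp = confusion[(lab, lab)]
        fp = sum(confusion[(o, lab)] for o in others)  # predicted lab, actually other
        fn = sum(confusion[(lab, o)] for o in others)  # actually lab, predicted other
        result[lab] = per_class_metrics(tp, fp, fn)
    return result


# Hypothetical, imbalanced example (NOT the study's data):
# 100 actual "Yes" cases and only 10 actual "No" cases.
confusion = {
    ("Yes", "Yes"): 90, ("Yes", "No"): 10,
    ("No", "Yes"): 6,   ("No", "No"): 4,
}

result = report(confusion)
print(f"accuracy: {result['accuracy']:.4f}")
for label in ("Yes", "No"):
    p, r, f = result[label]
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

With these assumed counts, accuracy is dominated by the larger class, which is why per-class precision, recall, and F1-score are worth reporting alongside it when comparing a model before and after balancing.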