Machine learning is increasingly used in healthcare to classify diseases from categorical data. The Chi-square Automatic Interaction Detection (CHAID) method is widely used for this purpose, but it often produces biased results, especially on small or imbalanced datasets. To address this limitation, Improved CHAID (I-CHAID) integrates a bias correction on Cramér’s V. Performance on imbalanced data can be improved further by combining I-CHAID with the Random Over-Sampling Examples (ROSE) technique. This study aims to identify the significant factors influencing heart disease and to evaluate the classification accuracy of I-CHAID with bias correction on Cramér’s V. The research was conducted in two stages: (1) balancing the dataset with ROSE and (2) constructing a classification tree for heart disease occurrence using I-CHAID with bias correction. Of 253 test cases, the proposed I-CHAID model correctly classified 98 individuals with heart disease and 110 without heart disease; 30 cases with heart disease went undetected (false negatives), and 15 cases without heart disease were misclassified as having it (false positives). Overall, the model achieved an accuracy of 84.60%, outperforming standard CHAID without bias correction, which reached only 71.15%. I-CHAID with Cramér’s V bias correction proved effective in identifying key factors associated with heart disease in Yogyakarta, including generational differences, smoking habits, and diets rich in fatty and savory foods. These findings highlight the potential of the proposed framework to support more reliable early risk identification and data-driven public health decision-making, particularly for imbalanced categorical health data.
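To illustrate what a bias correction on Cramér’s V can look like, the following is a minimal sketch in Python. It assumes the commonly used Bergsma-style correction, which shrinks the estimate for small samples; the abstract does not state which correction I-CHAID applies, so that choice is an assumption. The function names and the example counts are hypothetical and are not taken from the study's data.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and sample size for a contingency table.

    `table` is a list of rows of observed counts. Assumes every row and
    column total is positive (no empty categories).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table, bias_corrected=True):
    """Cramér's V, optionally with a Bergsma-style small-sample correction
    (assumed here; the source does not specify the exact correction)."""
    chi2, n = chi_square(table)
    r, k = len(table), len(table[0])
    phi2 = chi2 / n
    if not bias_corrected:
        return math.sqrt(phi2 / min(r - 1, k - 1))
    # Subtract the expected value of phi^2 under independence, floored at 0.
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    # Shrink the effective table dimensions accordingly.
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(r_corr - 1, k_corr - 1))


# Hypothetical 2x2 table: rows = smoker / non-smoker,
# columns = heart disease / no heart disease.
example = [[12, 8], [5, 15]]
print(cramers_v(example, bias_corrected=False))
print(cramers_v(example, bias_corrected=True))
```

On a small table like this one, the corrected value is lower than the uncorrected one, which reflects the upward bias of the plain estimator in small samples. In a CHAID-style tree, such a statistic could be used to rank candidate predictors at each split, although the exact role it plays in I-CHAID is not described in the abstract.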