Heart disease is one of the leading causes of death worldwide, so early detection is crucial for reducing the risk of complications and mortality. Advances in machine learning enable fast and accurate analysis of medical data to support diagnosis. This study develops a heart disease risk classification model using the Random Forest algorithm. The dataset is the Heart Disease Dataset from Kaggle, consisting of 1,025 patient records with 14 medical attributes, such as age, sex, blood pressure, cholesterol level, and maximum heart rate. The study follows the CRISP-DM methodology, covering Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. The model is evaluated using a confusion matrix, cross-validation, and ROC-AUC. The results show that Random Forest achieves an accuracy of 99.96% and a cross-validation score of 0.996. The variables chest pain, ca, and thalach are identified as the most influential factors in the prediction.
Keywords: Heart Disease; Random Forest; Machine Learning; Classification; CRISP-DM
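The evaluation measures named in the abstract (confusion matrix, accuracy, ROC-AUC, and k-fold cross-validation) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code and does not train a Random Forest. The labels and scores are hypothetical toy values, not results from the Heart Disease Dataset, and the function names (`confusion_matrix`, `accuracy`, `roc_auc`, `k_fold_indices`) are illustrative. ROC-AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count one half), which equals the area under the ROC curve.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TN, FP, FN, TP) for binary labels, assuming 1 = heart disease."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correct predictions, derived from the confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tn + fp + fn + tp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U statistic)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous (train, test) folds, without shuffling."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


if __name__ == "__main__":
    # Hypothetical toy labels and predicted probabilities, not study data.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold
    print(confusion_matrix(y_true, y_pred))  # (3, 1, 1, 3)
    print(accuracy(y_true, y_pred))  # 0.75
    print(roc_auc(y_true, scores))  # 0.9375
    print([len(test) for _, test in k_fold_indices(10, 3)])  # [4, 3, 3]
```

In a cross-validation run, the model would be trained on each `train` index list and scored on the matching `test` list; the reported cross-validation score is then typically the mean of the per-fold scores.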
Copyright © 2026