Lawrence Supriyono
Universitas Jakarta Internasional

Published: 2 documents

Evaluating Logistic Regression, SVM, KNN, and Ensemble Models for Accurate Heart Disease Risk Prediction
Amalia Shifa Aldila; Lawrence Supriyono
JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer) Vol. 11 No. 3 (2026): JITK Issue February 2026
Publisher : LPPM Nusa Mandiri

DOI: 10.33480/jitk.v11i3.6738

Abstract

Cardiovascular disease remains the leading cause of death worldwide, which makes early and accurate risk assessment an important part of preventive healthcare. As clinical data have become more widely available, machine learning approaches have increasingly been adopted to support medical decision-making, particularly for interpreting complex, high-dimensional health information. This research investigates the ability of six supervised machine learning models to predict the likelihood of cardiovascular disease: Logistic Regression, Support Vector Machine, k-Nearest Neighbors, Decision Tree, Random Forest, and Gradient Boosting. The study is based on the Cleveland Heart Disease dataset from the UCI Machine Learning Repository, which contains 303 patient samples with 76 recorded attributes. From these, 14 clinically significant variables frequently used in previous studies were selected for analysis. Given the relatively small dataset and the likelihood of redundant or low-impact features, feature selection was applied to improve model robustness, reduce overfitting, and enhance interpretability. Data preparation consisted of cleaning, normalization, feature selection, and splitting the data into training and testing sets. The models were evaluated using accuracy, precision, recall, and F1-score. The experimental results show that Random Forest and Logistic Regression achieved the highest predictive performance, followed by k-Nearest Neighbors and Support Vector Machine. These results indicate that supervised machine learning techniques, when combined with appropriate feature selection, are effective decision-support tools for the early detection of cardiovascular disease.
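
The normalization and evaluation steps described in this abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code. The abstract does not say which normalization was used, so min-max scaling fitted on the training split only is an assumption. All function names and toy values are hypothetical. Label 1 is assumed to mean that heart disease is present.

```python
# Minimal sketch (hypothetical helpers, not the study's implementation).
# Assumptions: binary labels where 1 = disease present, 0 = absent;
# min-max scaling stands in for the unspecified normalization step.

def fit_min_max(rows):
    """Learn per-column (min, max) bounds from training rows only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, bounds):
    """Scale each column to [0, 1]; constant columns map to 0.0."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)]
            for row in rows]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary labelling."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # predicted disease, disease present
        elif p == positive:
            fp += 1          # predicted disease, disease absent
        elif t == positive:
            fn += 1          # missed a disease case
        else:
            tn += 1          # correctly predicted no disease
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data only; these are not results from the study.
    bounds = fit_min_max([[0.0, 10.0], [10.0, 20.0]])
    print(apply_min_max([[5.0, 15.0]], bounds))

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
```

For the toy vectors there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so accuracy, precision, recall, and F1-score all come out to 0.75. Fitting the scaling bounds on the training rows alone keeps information from the test set out of preprocessing.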

Regression-Based Prediction of Benzene Concentration Using PT08.S1 and PT08.S2 Gas Sensors
Setyo Hartono; Ida Ernawati; Lawrence Supriyono
JUKI: Jurnal Komputer dan Informatika Vol. 8 No. 1 (2026): JUKI: Jurnal Komputer dan Informatika, May 2026 Edition
Publisher : Yayasan Kita Menulis

DOI: 10.53842/juki.v8i1.2414

Abstract

Air pollution, particularly from benzene (C₆H₆), is a serious urban environmental issue with significant public health impacts. Benzene is a carcinogenic compound emitted by motor vehicles and industrial processes. This study develops a model to predict benzene concentration from PT08.S1 (CO) and PT08.S2 (NMHC) gas sensor data together with meteorological factors (temperature, relative humidity, and absolute humidity). The data were obtained from the UCI Machine Learning Repository and comprise 9,357 samples collected from five metal oxide sensors in an urban area. During preprocessing, samples containing the value -200, which encodes missing data, were removed, leaving 8,779 valid samples. Two methods were compared: Multiple Linear Regression and Random Forest Regressor. Random Forest performed best, with an MAE of 0.0155, an RMSE of 0.1311, and an R² of 0.9997, whereas Linear Regression yielded an MAE of 0.9966, an RMSE of 1.3864, and an R² of 0.9666. Feature importance analysis shows that absolute humidity (AH) is the dominant predictor, with an importance of 0.9049, followed by PT08.S2 (NMHC) with 0.0276. The study demonstrates that gas sensor data can be used reliably to estimate benzene concentration, and that Random Forest is more accurate than linear regression because it can capture non-linear relationships among the variables.
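
The preprocessing and evaluation steps in this abstract can be sketched in standard-library Python. This is a minimal illustration, not the authors' implementation. It assumes that a sample is dropped whenever any of its fields equals the -200 sentinel, and the function names and toy values are hypothetical. The two regression models themselves are not reproduced; the sketch covers only missing-value filtering and the MAE, RMSE, and R² metrics used to compare them.

```python
import math

# Sentinel for missing readings, as described in the abstract.
MISSING = -200


def drop_missing(rows, sentinel=MISSING):
    """Keep only rows in which no field equals the missing-value sentinel.

    Assumption: a row is discarded if ANY of its fields is missing.
    """
    return [row for row in rows if sentinel not in row]


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired numeric sequences."""
    if not y_true:
        raise ValueError("need at least one observation")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)                     # root mean squared error
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


if __name__ == "__main__":
    # Hypothetical sensor rows; -200 marks a missing reading.
    rows = [(1.0, 2.0, 3.0), (-200, 1.0, 2.0),
            (4.0, -200.0, 5.0), (2.0, 3.0, 4.0)]
    print(len(drop_missing(rows)))

    # Hypothetical benzene targets and predictions.
    print(regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0]))
```

With the toy data, two of the four rows survive filtering, and the predictions give MAE = 0.25, RMSE ≈ 0.354, and R² = 0.975. Because RMSE squares each error before averaging, it is always at least as large as MAE and penalizes occasional large misses more heavily.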