
Journal : JURNAL MATEMATIKA STATISTIKA DAN KOMPUTASI

The Effect of Random Undersampling, Oversampling, and SMOTE on the Performance of Cardiovascular Disease (CVD) Prediction Models
Uswatun Hasanah; Agus Mohamad Soleh; Kusman Sadik
Jurnal Matematika, Statistika dan Komputasi Vol. 21 No. 1 (2024): SEPTEMBER 2024
Publisher : Department of Mathematics, Hasanuddin University

DOI: 10.20956/j.v21i1.35552

Abstract

Cardiovascular disease (CVD), commonly known as heart disease, is a leading cause of mortality globally, prompting extensive research into predictive models that assess individual risk and support preventive planning. Machine learning approaches such as Random Forest, Support Vector Machine (SVM), and LASSO Logistic Regression have shown promise. Recent studies have indicated that traditional resampling methods such as Random Oversampling, Random Undersampling, and SMOTE may not significantly improve model discrimination. This study evaluates the impact of these techniques on the performance of CVD prediction models, using data from the UCI Machine Learning Heart Disease database. It combines LASSO Logistic Regression, Random Forest, and SVM with three resampling techniques (Random Oversampling, Random Undersampling, and SMOTE) to better understand how the models perform under class imbalance in the dataset. The results show that SMOTE significantly enhances the performance of CVD prediction models. Specifically, when combined with the Random Forest algorithm, SMOTE achieves the best performance in terms of accuracy, sensitivity, and specificity. This highlights the importance of selecting an appropriate resampling technique to handle class imbalance. Consequently, this research contributes to refining CVD prediction strategies and provides new insights into improving prediction accuracy on imbalanced medical data.
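The three resampling techniques named in this abstract can be illustrated with a minimal standard-library sketch. This is not the authors' code: the toy data, the function names (`random_oversample`, `random_undersample`, `smote_like`), and the parameter `k` are assumptions made for illustration, and `smote_like` is a simplified version of SMOTE's interpolation idea rather than a full implementation.

```python
import random
from collections import Counter


def _rows_of(X, y, label):
    """Return the feature rows that carry the given class label."""
    return [x for x, lab in zip(X, y) if lab == label]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = _rows_of(X, y, label)
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        for x in rng.sample(_rows_of(X, y, label), target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k=2, seed=0):
    """Add synthetic minority rows by interpolating between a minority row
    and one of its k nearest minority neighbours (simplified SMOTE idea)."""
    rng = random.Random(seed)
    counts = Counter(y)
    rows = _rows_of(X, y, minority)
    X_out, y_out = list(X), list(y)
    for _ in range(max(counts.values()) - counts[minority]):
        i = rng.randrange(len(rows))
        base = rows[i]
        others = [r for j, r in enumerate(rows) if j != i]
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        X_out.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: six majority rows (class 0), two minority rows (class 1).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
     (0.2, 0.4), (0.4, 0.1), (2.0, 2.0), (2.2, 2.1)]
y = [0] * 6 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
X_smote, y_smote = smote_like(X, y, minority=1)
print(Counter(y_over), Counter(y_under), Counter(y_smote))
```

Oversampling and the SMOTE-style function both grow the minority class to the majority size, while undersampling shrinks the majority class. Resampling of this kind is usually applied to the training split only, so that the evaluation data keep their original class balance.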

Machine Learning-Based Univariate Time Series Imputation Method for Estimating Missing Values in Non-Stationary Data
Dini Ramadhani; Agus Mohamad Soleh; Erfiani Erfiani
Jurnal Matematika, Statistika dan Komputasi Vol. 21 No. 1 (2024): SEPTEMBER 2024
Publisher : Department of Mathematics, Hasanuddin University

DOI: 10.20956/j.v21i1.36468

Abstract

Handling missing values in time series data is crucial because they can disrupt data analysis and interpretation. Sequentially missing values in time series often pose a more complex challenge compared to randomly missing values. One of the promising recent methods is Machine Learning-Based Univariate Time Series Imputation (MLBUI), although it is still not widely used and its accessibility is limited. MLBUI employs Random Forest Regression (RFR) and Support Vector Regression (SVR) algorithms. This study evaluates the performance of MLBUI in addressing missing data scenarios in non-stationary univariate time series data. The data used in this research is the average temperature data from Bogor Regency. The missing data scenarios considered include rates of 6%, 10%, and 14%. Besides MLBUI, five other comparison methods are used: Kalman StructTS, Kalman Auto-ARIMA, Spline Interpolation, Stine Interpolation, and Moving Average. The results show that MLBUI performs poorly for non-stationary data, although the obtained Mean Absolute Percentage Error (MAPE) is below 10%.
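The evaluation design described in this abstract (remove a consecutive block of values at a given rate, impute it, then score with MAPE) can be sketched as follows. This is an illustrative sketch only: the series is synthetic rather than the Bogor Regency temperature data, linear interpolation is a simple stand-in and not MLBUI or any of the five comparison methods, and the helper names (`mape`, `mask_block`, `linear_impute`) are assumptions.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mask_block(series, rate, start):
    """Replace a consecutive block covering `rate` of the series with None."""
    n_missing = round(len(series) * rate)
    masked = list(series)
    idx = list(range(start, start + n_missing))
    for i in idx:
        masked[i] = None
    return masked, idx


def linear_impute(masked):
    """Fill each run of None by linear interpolation between its observed
    neighbours; runs at the edges are filled with the nearest observed value."""
    filled = list(masked)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        j = i
        while j < n and filled[j] is None:
            j += 1
        left = filled[i - 1] if i > 0 else None
        right = filled[j] if j < n else None
        for k in range(i, j):
            if left is None:
                filled[k] = right
            elif right is None:
                filled[k] = left
            else:
                w = (k - i + 1) / (j - i + 1)
                filled[k] = left + w * (right - left)
        i = j
    return filled


# Synthetic non-stationary series (assumption): upward trend plus a 12-step cycle.
series = [25.0 + 0.02 * t + 1.5 * math.sin(2 * math.pi * t / 12) for t in range(100)]

# One of the abstract's scenarios: 10% of the values missing in a single block.
masked, missing_idx = mask_block(series, rate=0.10, start=40)
filled = linear_impute(masked)

score = mape([series[i] for i in missing_idx], [filled[i] for i in missing_idx])
print(f"MAPE over the imputed block: {score:.2f}%")
```

Scoring only the positions that were masked keeps the observed values from diluting the error, which matters when comparing methods on long gaps.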