UNP Journal of Statistics and Data Science
Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science

Comparison of Error Rate Prediction Methods in Binary Logistic Regression Modeling for Imbalanced Data

Bahri Annur Sinaga
Dodi Vionanda
Dony Permana
Admi Salma



Article Info

Publish Date
28 Aug 2023

Abstract

Binary logistic regression is a regression method used for classification modeling. Its performance is judged by the accuracy of the fitted model, which can be assessed by predicting the error rate. A commonly used approach to error rate prediction is cross-validation, which has three algorithms: leave-one-out, hold-out, and k-fold. Leave-one-out splits the data according to the number of observations, so that each observation serves once as testing data, but the analysis takes a long time when the number of observations is large. Hold-out is the simplest algorithm: it randomly divides the data into two parts only, so important observations may not end up in the training data. K-fold divides the data into several groups, but it is not suitable for data with a small number of observations. In practice, real data are often imbalanced. In logistic regression, as the data become more imbalanced, the prediction results tend toward the proportion of the minority class. This research compares error rate prediction methods in binary logistic regression modeling with imbalanced data. The study uses three types of data, namely univariate, bivariate, and multivariate, generated with varying differences in population means and varying correlations between the independent variables. The results show that k-fold is the most suitable error rate prediction algorithm for binary logistic regression.
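To make the three splitting schemes named in the abstract concrete, the following is a minimal sketch in Python. It is not the authors' code: the function names, the gradient-descent fit, and the simulation settings (a minority share of 0.2, a mean difference of 1.0, 100 observations) are all assumptions chosen for illustration, and only the univariate case is shown.

```python
import math
import random


def simulate(n=100, minority_share=0.2, mean_diff=1.0, seed=1):
    """Univariate imbalanced data; class 1 is the minority (hypothetical settings)."""
    rng = random.Random(seed)
    y = [1 if rng.random() < minority_share else 0 for _ in range(n)]
    x = [rng.gauss(mean_diff if yi == 1 else 0.0, 1.0) for yi in y]
    return x, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, lr=0.1, epochs=100):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by batch gradient descent (illustrative only)."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            g0 += p - yi
            g1 += (p - yi) * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def holdout_split(n, test_fraction=0.3, seed=0):
    """Hold-out: one random split into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return [(idx[n_test:], idx[:n_test])]


def kfold_splits(n, k=5, seed=0):
    """K-fold: each of k groups serves once as testing data."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]


def leave_one_out_splits(n):
    """Leave-one-out: every observation is the testing set exactly once."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]


def estimate_error_rate(x, y, splits):
    """Share of misclassified testing observations, pooled over all splits."""
    errors = total = 0
    for train, test in splits:
        b0, b1 = fit_logistic([x[i] for i in train], [y[i] for i in train])
        for i in test:
            pred = 1 if b0 + b1 * x[i] > 0 else 0  # threshold at P = 0.5
            errors += int(pred != y[i])
            total += 1
    return errors / total


x, y = simulate()
n = len(y)
schemes = {
    "hold-out": holdout_split(n),
    "5-fold": kfold_splits(n, k=5),
    "leave-one-out": leave_one_out_splits(n),
}
for name, splits in schemes.items():
    rate = estimate_error_rate(x, y, splits)
    print(f"{name:>14}: estimated error rate = {rate:.3f}")
print(f"minority share in sample = {sum(y) / n:.3f}")
```

In this sketch the cost difference between the schemes is visible in the number of refits: hold-out fits the model once, 5-fold fits it five times, and leave-one-out fits it once per observation. The hold-out estimate also depends on a single random split, so changing `seed` in `holdout_split` can change its result.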

Copyrights © 2023






Journal Info

Abbrev

ujsds

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management; Mathematics; Social Sciences

Description

UNP Journal of Statistics and Data Science is an open-access journal (e-journal) launched in 2022 by the Department of Statistics, Faculty of Science and Mathematics, Universitas Negeri Padang. UJSDS publishes scientific articles on various aspects related to Statistics, Data Science, and its ...