Jurnal Rekayasa Sistem Industri
Vol. 12 No. 1 (2023): Jurnal Rekayasa Sistem Industri

Copula-Based Synthetic Oversampling for Classification Models with Imbalanced Data

Fransiscus Rian Pratikto (Universitas Katolik Parahyangan, Indonesia)



Article Info

Publish Date
23 Apr 2023

Abstract

Machine learning classification models for detecting abnormality are usually developed using imbalanced data, in which abnormal instances are far fewer than normal ones. Because the data are imbalanced, the learning process is dominated by normal instances, and the resulting model may be biased. The most common method for coping with this problem is synthetic oversampling. Most synthetic oversampling techniques are distance-based, usually built on the k-Nearest Neighbor method. However, patterns in data can be based on either distance or correlation. This research proposes a synthetic oversampling technique based on correlation, expressed as the joint probability distribution of the data. The joint probability distribution is represented using a Gaussian copula, while the marginal distributions are modeled with one of three alternatives: the Pearson distribution system, the empirical distribution, or the Metalog distribution system. The proposed technique is compared with several commonly used synthetic oversampling techniques in a case study of credit card default prediction. The classification model uses the k-Nearest Neighbor method and is validated using k-fold cross-validation. We found that the classification model using the proposed oversampling method with the Metalog marginal distribution has the greatest total accuracy.
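To make the idea of copula-based oversampling concrete, the following is a minimal sketch of one way it could be done with a Gaussian copula and empirical marginals. It is an illustration under stated assumptions, not the paper's implementation: the function name `copula_oversample`, the rank-based estimate of the copula correlation, the interpolated sample quantiles, and the toy data are all assumptions made here. The Pearson and Metalog marginals mentioned in the abstract are not shown.

```python
from statistics import NormalDist

import numpy as np

# Standard normal CDF and inverse CDF, applied elementwise to arrays.
_STD_NORMAL = NormalDist()
_to_normal = np.vectorize(_STD_NORMAL.inv_cdf, otypes=[float])  # u in (0, 1) -> z
_to_uniform = np.vectorize(_STD_NORMAL.cdf, otypes=[float])     # z -> u in [0, 1]


def copula_oversample(X_minority, n_new, seed=None):
    """Draw n_new synthetic rows from a Gaussian copula fitted to X_minority.

    Hypothetical helper: marginals are empirical, so new values are
    interpolated sample quantiles of each column. Assumes at least two
    columns and more rows than columns.
    """
    X = np.asarray(X_minority, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)

    # 1. Ranks per column (1..n), scaled into (0, 1) as pseudo-observations.
    #    Ties are broken by sort order, which is a simplification.
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    u = ranks / (n + 1)

    # 2. Normal scores and their correlation matrix (the copula parameter).
    z = _to_normal(u)
    corr = np.corrcoef(z, rowvar=False)

    # 3. Sample correlated standard normals and map them back to uniforms.
    z_new = rng.multivariate_normal(np.zeros(d), corr, size=n_new)
    u_new = _to_uniform(z_new)

    # 4. Invert each empirical marginal with sample quantiles.
    cols = [np.quantile(X[:, j], u_new[:, j]) for j in range(d)]
    return np.column_stack(cols)


# Toy minority class with two strongly correlated features (made-up data).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
minority = np.column_stack([a, 2 * a + rng.normal(scale=0.5, size=200)])

synthetic = copula_oversample(minority, n_new=500, seed=1)
```

In this sketch the synthetic rows would be appended to the minority class before training the classifier. Because the marginals are empirical, generated values never fall outside the observed range of each feature, whereas a parametric marginal could extrapolate beyond it.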

Copyrights © 2023






Journal Info

Abbrev

jrsi

Publisher

Subject

Industrial & Manufacturing Engineering

Description

Data and Analytics; Decision Analysis; E-Business and E-Commerce; Engineering Economy and Cost Analysis; Human Factors; Information Systems; Intelligent Systems; Manufacturing Systems; Operations Research; Production Planning and Control; Project Management; Quality Control and Management; Reliability and ...