International Journal Of Computer, Network Security and Information System (IJCONSIST)
Vol 2 No 1 (2020): September

Multiclass Classification with Imbalanced Class and Missing Data

Irfan Pratama
Putri Taqwa Prasetyaningrum



Article Info

Publish Date
30 Nov 2020

Abstract

In any data mining task, well-structured data is needed. In reality, however, data are often far from this ideal: they may contain missing values, redundant records, and inconsistencies. These problems in the dataset must be dealt with before the results of the data mining process can be interpreted. At the raw-data level, problems such as missing values, redundancy, and inconsistency can be addressed in a step called preprocessing, in which the raw dataset is adjusted to the needs of the overall process; one such adjustment is the handling of missing values. Missing values occur when expected values in the data were not recorded. Another problem common in real-world datasets, especially in labeled categorical data, is an imbalanced distribution of instances across classes: an imbalanced class condition is one in which the class distribution is skewed toward some classes. This study focuses on solving the problems of missing values and imbalanced classes in a dataset. Missing values are handled with K-NN imputation. For the imbalanced class problem, this study compares SMOTE and ADASYN. The resulting datasets are then tested with several classification methods: Decision Tree, Random Forest, and Stacking. The original dataset produced poor classification scores because of the class imbalance. The data were then oversampled with SMOTE and ADASYN in the hope of substantially improving accuracy. In practice, however, accuracy did not reach the expected level, averaging only 32% to 37% across all processing schemes.
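The abstract names the preprocessing techniques without showing how they work. The following is a minimal, dependency-free Python sketch of the underlying ideas, not the authors' implementation (which the abstract does not describe). Assumptions: feature rows are lists of numbers with missing entries encoded as `None`; the function names `knn_impute`, `smote`, `balance_with_smote`, and `adasyn_counts` and the toy data are hypothetical; the imputation uses a simplified donor rule rather than the NaN-aware Euclidean distance that library imputers such as scikit-learn's `KNNImputer` use; and `adasyn_counts` only computes ADASYN's per-sample allocation of synthetic points. The Decision Tree, Random Forest, and Stacking classifiers are outside the scope of this sketch.

```python
import math
import random
from collections import Counter


def _dist(a, b, cols):
    """Euclidean distance between rows a and b over the given column indices."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in cols))


def knn_impute(rows, k=3):
    """Replace each None with the mean of that feature over the k nearest donor
    rows; distance uses only the features the incomplete row has observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        observed = [c for c, v in enumerate(row) if v is not None]
        missing = [c for c, v in enumerate(row) if v is None]
        for c in missing:
            # Simplified (assumed) donor rule: a donor must have feature c and
            # every feature this row has observed.
            donors = [r for j, r in enumerate(rows)
                      if j != i and r[c] is not None
                      and all(r[o] is not None for o in observed)]
            if not donors:
                raise ValueError(f"no donor rows for row {i}, feature {c}")
            donors.sort(key=lambda r: _dist(row, r, observed))
            nearest = donors[:k]
            filled[i][c] = sum(r[c] for r in nearest) / len(nearest)
    return filled


def smote(minority, n_new, k=2, seed=0):
    """SMOTE idea: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest same-class neighbours.
    Needs at least two minority samples."""
    rng = random.Random(seed)
    cols = range(len(minority[0]))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not base),
                            key=lambda m: _dist(base, m, cols))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_with_smote(X, y, k=2, seed=0):
    """Multiclass use: oversample every class up to the largest class's size."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(r) for r in X], list(y)
    for label, cnt in counts.items():
        if cnt < target:
            members = [x for x, lab in zip(X, y) if lab == label]
            new = smote(members, target - cnt, k=k, seed=seed)
            X_out += new
            y_out += [label] * len(new)
    return X_out, y_out


def adasyn_counts(X, y, minority_label, n_new, k=3):
    """ADASYN idea: split n_new synthetic points among the minority samples in
    proportion to the share of other-class points among each sample's k nearest
    neighbours, so harder samples near other classes receive more."""
    cols = range(len(X[0]))
    idx = [i for i, lab in enumerate(y) if lab == minority_label]
    ratios = []
    for i in idx:
        nn = sorted((j for j in range(len(X)) if j != i),
                    key=lambda j: _dist(X[i], X[j], cols))[:k]
        ratios.append(sum(y[j] != minority_label for j in nn) / k)
    if sum(ratios) == 0:  # no sample is near another class: spread evenly
        ratios = [1.0] * len(idx)
    total = sum(ratios)
    # Per-sample counts are rounded, so their sum can differ slightly from n_new.
    return {i: round(r / total * n_new) for i, r in zip(idx, ratios)}


if __name__ == "__main__":
    # Hypothetical toy data: three classes, two missing entries (None).
    X = [[1.0, 2.0], [1.2, None], [0.9, 2.1], [1.1, 1.9],
         [5.0, 6.0], [5.2, 6.1],
         [9.0, 1.0], [9.1, None], [8.9, 1.1]]
    y = ["a"] * 4 + ["b"] * 2 + ["c"] * 3
    X_filled = knn_impute(X, k=3)
    X_bal, y_bal = balance_with_smote(X_filled, y, k=2)
    print(Counter(y_bal))  # every class now has as many rows as the largest one
    print(adasyn_counts(X_filled, y, "b", n_new=2))
```

The design difference the study compares shows up in the sketch: `smote` picks base samples uniformly, while ADASYN gives each minority sample `counts[i]` SMOTE-style interpolations, concentrating new points near class borders. For real experiments, the `SMOTE` and `ADASYN` classes in imbalanced-learn (via `fit_resample`) and scikit-learn's `KNNImputer` are the usual tools; in either case, oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.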

Copyright © 2020






Journal Info

Abbrev

ijconsist

Subject

Computer Science & IT

Description

Focus and Scope: The journal covers the whole spectrum of intelligent informatics, which includes, but is not limited to:
• Artificial Immune Systems, Ant Colonies, and Swarm Intelligence
• Autonomous Agents and Multi-Agent Systems
• Bayesian Networks and Probabilistic Reasoning
• ...