Any data mining task requires well-formed data. In practice, real-world data often falls short of this expectation: it may contain missing values, redundant records, and inconsistencies. These dataset-level problems have to be addressed before the results of a data mining process can be interpreted. At the raw data level, missing values, redundancy, and inconsistency can be handled by a step called preprocessing, in which the raw dataset is adjusted to the needs of the overall process. One of these adjustments is the handling of missing values, a condition in which expected values in the data were not recorded. Another problem common in real-world datasets, especially categorical data with labels or classes, is an imbalanced distribution of instances across classes. Class imbalance is a condition in which the class distribution is skewed or biased. This study focuses on handling missing values and class imbalance in the dataset. K-NN imputation is used to handle missing values. For the class imbalance problem, the study applies SMOTE and ADASYN and compares them. The resulting datasets are then evaluated with several classification methods: Decision Tree, Random Forest, and Stacking. The original dataset produced poor classification scores because of the imbalanced data. The data was then oversampled with SMOTE and ADASYN in the hope that accuracy would improve substantially. However, accuracy did not reach the expected level, averaging only 32%-37% under every processing scheme.
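The two preprocessing ideas named above can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation: the function names `knn_impute` and `smote`, their parameters, and the toy data are assumptions made for illustration, and the distance used for imputation (root mean squared difference over features observed in both rows) is one simple choice among several. A real pipeline would more likely rely on an established library for imputation and oversampling.

```python
import math
import random


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance is the root mean squared difference over the features that
    both rows have observed (an assumption of this sketch).
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no donor exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.

    Requires at least two complete (no None) minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): the second row is missing its second feature.
rows = [[1.0, 2.0], [1.2, None], [5.0, 6.0], [1.1, 2.2]]
completed = knn_impute(rows, k=2)

# Oversample a small minority class with four synthetic points.
minority = [[1.0, 1.0], [2.0, 2.0], [1.5, 1.0]]
extra = smote(minority, n_new=4, k=2, seed=0)
```

ADASYN follows the same interpolation idea as SMOTE but generates more synthetic points around minority samples that have many majority-class neighbours; it is omitted here to keep the sketch short.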
                        
                        
                        
                        
                            
Copyright © 2020