Journal : International Journal Of Computer, Network Security and Information System (IJCONSIST)

Multiclass Classification with Imbalanced Class and Missing Data
Pratama, Irfan; Putri Taqwa Prasetyaningrum
IJCONSIST JOURNALS Vol 2 No 1 (2020): September
Publisher : International Journal of Computer, Network Security and Information System

DOI: 10.33005/ijconsist.v2i1.25

Abstract

Every data mining task needs well-formed data. In practice, however, data rarely meets that expectation: it may contain missing values, redundant records, and inconsistencies. These problems in the dataset must be addressed before the results of the data mining process can be interpreted. At the raw data level, missing values, redundancy, and inconsistency can be handled by preprocessing. In the preprocessing step, the raw dataset is adjusted to the needs of the overall process; one such adjustment is the handling of missing values. A missing value is a condition in which an expected data value was not recorded. Another problem in real-world datasets, especially categorical data with a label or class, is an imbalanced distribution of instances across classes. Class imbalance is a condition in which the class distribution is skewed or biased. This study focuses on handling missing values and class imbalance in a dataset. K-NN imputation is used to handle missing values. For class imbalance, the study uses SMOTE and ADASYN and compares them. The resulting dataset is then tested with several classification methods: Decision Tree, Random Forest, and Stacking. The original dataset produced poor classification scores because of the imbalanced data. The data was then oversampled with SMOTE and ADASYN in the hope that accuracy would improve substantially. In practice, accuracy did not reach the expected level, averaging only 32%-37% under every processing scheme.
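
The two preprocessing ideas named in this abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names `knn_impute` and `smote`, the parameter `k`, the use of Euclidean distance over shared features, and the toy data are all assumptions made here. ADASYN and the classifiers are not shown.

```python
import math
import random


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance is a root-mean-square difference over the features that are
    observed in both rows (an assumption made for this sketch).
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which feature j was recorded.
            donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return filled


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: one missing value, then three minority-class points.
rows = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [0.9, 2.2]]
imputed = knn_impute(rows, k=2)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=5, k=2)
```

Each synthetic point lies on a line segment between two existing minority samples, which is why oversampling of this kind cannot add information that the minority class does not already contain.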

Implementation Of Machine Learning To Determine The Best Employees Using Random Forest Method
Taqwa Prasetyaningrun, Putri; Pratama, Irfan; Yakobus Chandra, Albert
IJCONSIST JOURNALS Vol 2 No 02 (2021): March
Publisher : International Journal of Computer, Network Security and Information System

DOI: 10.33005/ijconsist.v2i02.43

Abstract

In the workplace, the presence of top-performing employees is a benchmark of a company's progress. The best employees are usually determined by looking at their performance, e.g. craft, discipline, and other achievements. The goal is to optimize decision making in selecting the best employees. The models obtained for employee prediction were tested on a real dataset provided by IBM analytics, which includes 29 features and about 22005 samples. In this paper we try to build a system that predicts employee attribution based on a collection of employee data from the Kaggle website. We used four machine learning algorithms, KNN (K-Nearest Neighbors), Naïve Bayes, Decision Tree, and Random Forest, plus two ensemble techniques, stacking and bagging. Results are expressed in terms of classic metrics, and the algorithm that produces the best result on the available dataset is the Random Forest classifier. It achieves the best score (0.88), which the stacking and bagging methods match.
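
The abstract does not say which "classic metrics" were reported. As a hedged illustration only, the sketch below computes accuracy, precision, and recall for one class of interest from true and predicted labels; the function name `classification_metrics` and the example labels are hypothetical and are not taken from the paper.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return accuracy, precision, and recall, treating `positive` as the
    class of interest and every other label as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when the class is never predicted
        # or never present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 1 = "best employee", 0 = otherwise.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred, positive=1)
```

On imbalanced data, accuracy alone can look acceptable while recall for the rare class stays low, so reporting the three values together is the safer choice.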