Gaussian Based-SMOTE Method for Handling Imbalanced Small Datasets
Muhammad Misdram; Edi Noersasongko; Purwanto Purwanto; Muljono Muljono; Fandi Yulian Pamuji
Jurnal Ilmiah Teknik Elektro Komputer dan Informatika Vol 9, No 4 (2023): December
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jiteki.v9i4.26881

Abstract

Dataset imbalance needs special handling because it often hinders classification and lowers classification performance. Many studies on overcoming dataset imbalance have been published, but the results remain unsatisfactory: the average increase in accuracy is still not significant. Common methods for dealing with imbalance include oversampling, undersampling, Synthetic Minority Oversampling Technique (SMOTE), Borderline-SMOTE, ADASYN, and Cluster-SMOTE. In testing, the average classification accuracy obtained with these methods is still relatively low. In this research the selected dataset is a medical dataset classified as small, with fewer than 200 records. The proposed method, Gaussian Based-SMOTE, is expected to work with a normal distribution and can determine the additional samples needed for minority classes. It is the contribution of this research and can produce better accuracy than previous research. Gaussian Based-SMOTE starts by determining the random location of the synthesis candidates and then determining the Gaussian distribution. The results of these two steps are substituted to produce the final synthetic values. The generated synthetic values are combined with SMOTE sampling of the majority data from the training data to produce balanced data. In the classification trial on the balanced data, Gaussian Based-SMOTE gave a significant increase in accuracy of 3% on average.
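The abstract describes the method only in outline, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "the random location of synthesis candidates" means a SMOTE-style random point on the segment between a minority sample and one of its nearest minority neighbours, and that the Gaussian step draws each feature from a normal distribution centred on that point. The function names, `k`, and `sigma` are hypothetical.

```python
import math
import random


def nearest_neighbors(point, pool, k):
    """Return the k points in pool closest to point (Euclidean distance)."""
    others = [p for p in pool if p is not point]
    return sorted(others, key=lambda p: math.dist(point, p))[:k]


def gaussian_smote(minority, n_new, k=3, sigma=0.05, seed=0):
    """Generate n_new synthetic minority samples (assumed reading of the method).

    minority must contain at least two samples so that a neighbour exists.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbor = rng.choice(nearest_neighbors(base, minority, k))
        # Step 1 (assumption): random location on the segment base -> neighbor.
        gap = rng.random()
        candidate = [b + gap * (n - b) for b, n in zip(base, neighbor)]
        # Step 2 (assumption): Gaussian draw centred on the candidate location.
        synthetic.append([rng.gauss(c, sigma) for c in candidate])
    return synthetic


# Toy minority class with two features (made-up values).
minority = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
synthetic = gaussian_smote(minority, n_new=4)
print(len(synthetic), "synthetic samples generated")
```

Setting `sigma=0` reduces the sketch to plain SMOTE interpolation, which makes it easy to compare the two behaviours on the same seed.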

Customer Segmentation with RFM Model using Fuzzy C-Means and Genetic Programming
Anas Syaifudin; Purwanto Purwanto; Heribertus Himawan; M. Arief Soeleman
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Vol. 22 No. 2 (2023)
Publisher : Universitas Bumigora

DOI: 10.30812/matrik.v22i2.2408

Abstract

One of the strategies a company uses to retain its customers is Customer Relationship Management (CRM). CRM manages interactions and supports business strategies to build mutually beneficial relationships between companies and customers. Information technology, such as data mining applied to transaction data, is critical for discovering the patterns customers follow when they make transactions. Clustering techniques in data mining can uncover these patterns from customer transaction data. Fuzzy C-Means (FCM) is one of the best-known and most widely used fuzzy clustering methods. It iterates to assign each data point to the appropriate cluster based on an objective function. A local minimum is a condition in which the resulting value is not the lowest value in the solution set. This research aims to solve the local minimum problem in the FCM algorithm using Genetic Programming (GP), an evolution-based algorithm, to produce better data clusters. The research compares Fuzzy C-Means (FCM) and Genetic Programming Fuzzy C-Means (GP-FCM) for customer segmentation on the Cahaya Estetika clinic dataset. In testing, GP-FCM yielded an objective function value of 20.3091, while FCM yielded 32.44741. Furthermore, cluster validity evaluation using the Partition Coefficient (PC), Classification Entropy (CE), and Silhouette Index shows that the cluster quality of GP-FCM is better than that of FCM. These results indicate that applying genetic programming to the Fuzzy C-Means algorithm produces better cluster quality than Fuzzy C-Means alone.
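To make the objective function and the Partition Coefficient concrete, here is a minimal sketch of standard Fuzzy C-Means in plain Python. It is not the GP-FCM of the paper: the genetic programming search is not implemented, the toy points and starting centres are made up, and the fuzzifier `m=2.0` is an assumption. Because the result depends on the starting centres, a poor start can leave FCM in a local minimum, which is the issue the abstract says GP is meant to address.

```python
import math


def memberships(points, centers, m=2.0):
    """Membership u[i][j] of point i in cluster j; each row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [max(math.dist(p, c), 1e-12) for c in centers]  # avoid division by zero
        u.append([1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d])
    return u


def update_centers(points, u, m=2.0):
    """New centres: mean of the points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append([sum(wi * p[a] for wi, p in zip(w, points)) / total
                        for a in range(dim)])
    return centers


def objective(points, centers, u, m=2.0):
    """FCM objective: sum of u_ij^m times squared distance to centre j."""
    return sum(u[i][j] ** m * math.dist(p, c) ** 2
               for i, p in enumerate(points) for j, c in enumerate(centers))


def partition_coefficient(u):
    """PC lies in [1/c, 1]; values near 1 indicate crisper clusters."""
    return sum(x * x for row in u for x in row) / len(u)


# Toy data: two well-separated groups (made-up values).
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
centers = [[0.0, 0.0], [6.0, 6.0]]  # hypothetical starting centres
u = memberships(points, centers)
j_start = objective(points, centers, u)
for _ in range(20):
    centers = update_centers(points, u)
    u = memberships(points, centers)
j_end = objective(points, centers, u)
print(f"objective: {j_start:.4f} -> {j_end:.4f}, PC = {partition_coefficient(u):.4f}")
```

A lower objective value and a higher PC both point to tighter clusters, which is how the abstract compares the two algorithms.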

Optimizing Chronic Kidney Disease Diagnosis Using the C4.5 Algorithm and Missing Value Imputation Strategies
Ahmad Riyanto; Purwanto Purwanto; Farrikh Al Zami; Ridodio Andreuw Meda
Jurnal Penelitian Pendidikan IPA Vol 11 No 9 (2025): September
Publisher : Postgraduate, University of Mataram

DOI: 10.29303/jppipa.v11i9.12456

Abstract

Missing values are a significant challenge in data mining that can hinder the knowledge extraction process. Incomplete data not only reduces efficiency in data management and analysis, but can also bias decision-making. This study aims to improve the performance of the C4.5 algorithm on data with missing values by applying imputation techniques and GridSearchCV optimization. We propose an approach that handles missing values with several imputation methods: minimum, maximum, mean-mode, median, and k-Nearest Neighbors (k-NN). These methods are applied to the Chronic Kidney Disease dataset obtained from the UCI Repository. After imputation, we perform hyperparameter optimization using GridSearchCV to find the best parameter combination for the C4.5 algorithm. Experimental results show that imputation combined with GridSearchCV optimization significantly improves the classification accuracy of the C4.5 algorithm: accuracy improves by 2.25% compared with the model without missing value handling. This shows that handling missing values together with proper GridSearchCV optimization can improve the prediction quality of the model.
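As an illustration of the simpler imputation strategies named above, here is a minimal standard-library sketch. It assumes missing entries are represented as `None` and that "mean-mode" means the mean for numeric columns and the mode for categorical ones; the column values and the helper name `impute_column` are made up. The k-NN imputation and the GridSearchCV step are not shown.

```python
import statistics

# Map each strategy name to the statistic computed on the observed values.
STRATEGIES = {
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}


def impute_column(values, strategy):
    """Replace None entries with a statistic of the non-missing values."""
    observed = [v for v in values if v is not None]
    fill = STRATEGIES[strategy](observed)
    return [fill if v is None else v for v in values]


# Hypothetical numeric and categorical columns with missing entries.
age = [48, None, 62, 51, None, 62]
cells = ["normal", None, "abnormal", "normal"]

print(impute_column(age, "mean"))    # fills with 55.75
print(impute_column(age, "median"))  # fills with 56.5
print(impute_column(cells, "mode"))  # fills with "normal"
```

For anyone reproducing the full pipeline with scikit-learn, note that its `DecisionTreeClassifier` is documented as an optimized CART implementation rather than C4.5, so it would only approximate the classifier named in the abstract.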

Optimizing Stacked KNN, Naive Bayes, and LDA Models Using Random Forest as a Meta-Learner for Diabetes Classification
Ridodio Andreuw Meda; Purwanto Purwanto; Farrikh Al Zami; Ahmad Riyanto
Jurnal Penelitian Pendidikan IPA Vol 11 No 10 (2025): October
Publisher : Postgraduate, University of Mataram

DOI: 10.29303/jppipa.v11i10.12546

Abstract

Diabetes is a chronic disease with a high mortality rate that requires proper treatment and early detection. This study proposes a stacking model that combines K-Nearest Neighbor (KNN), Naive Bayes, and Linear Discriminant Analysis (LDA) as base learners, with Random Forest as the meta-learner. The main objective is to improve classification accuracy on diabetes datasets that have an unbalanced class distribution. The experiment was conducted on the Pima Indians Diabetes dataset from the UCI Machine Learning Repository. The test results show that the proposed stacking model achieves an accuracy of 96.30%, a True Positive Rate (TPR) of 88.89%, a True Negative Rate (TNR) of 100%, and a G-Mean of 94.28%. This performance is significantly better than that of previous single-classifier models and stacking approaches. Thus, the proposed stacking model can serve as an effective solution for diabetes classification under unbalanced class distributions.
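The reported metrics are linked by simple formulas: TPR = TP / (TP + FN), TNR = TN / (TN + FP), and G-Mean = sqrt(TPR × TNR). The sketch below checks that relationship with one set of confusion-matrix counts that happens to reproduce the reported percentages. The counts are hypothetical and chosen only for illustration; the abstract does not give the actual test-set size or confusion matrix.

```python
import math


def classification_rates(tp, fn, tn, fp):
    """Return accuracy, TPR, TNR and G-Mean from confusion-matrix counts."""
    tpr = tp / (tp + fn)                      # sensitivity / recall
    tnr = tn / (tn + fp)                      # specificity
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    g_mean = math.sqrt(tpr * tnr)             # geometric mean of TPR and TNR
    return accuracy, tpr, tnr, g_mean


# Hypothetical counts (not from the paper): 9 positives, 18 negatives.
accuracy, tpr, tnr, g_mean = classification_rates(tp=8, fn=1, tn=18, fp=0)
print(f"accuracy={accuracy:.2%} TPR={tpr:.2%} TNR={tnr:.2%} G-Mean={g_mean:.2%}")
# prints: accuracy=96.30% TPR=88.89% TNR=100.00% G-Mean=94.28%
```

G-Mean is useful on unbalanced data because it drops sharply when either class is poorly recognised, whereas accuracy can stay high if only the majority class is predicted well.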