In data mining, clustering is an unsupervised learning technique often used to group data by similarity. Clustering, especially the K-means clustering algorithm, is a feasible tool for expanding a dataset label by increasing the cluster's number according to the label's categories. This research extends the credit loan label data set from two categories (non-performing and performing loans) to four risk levels (high risk, medium risk, low risk, and no risk). The combination of three K-nearest neighbor’s distance metrics, Euclidean, Manhattan, and Chebyshev distance, with four different K values (K = 3, K = 5, K = 7, and K = 9) produced the best model with accuracy, precision, and recall values of 90%, 90.53571%, and 90%, from the model using the Euclidean distance with K = 9
Copyrights © 2023