Jurnal Nasional Teknik Elektro dan Teknologi Informasi
Vol 13 No 2: May 2024

Optimization of the K-Nearest Neighbors Algorithm Based on a Comparison of Outlier Analysis Methods (Distance-Based, Density-Based, LOF)

Fitri Ayuning Tyas (Information Systems Study Program, STMIK Muhammadiyah Paguyangan Brebes, Brebes, Central Java 52276, Indonesia)
Mahda Nurayuni (Information Systems Study Program, STMIK Muhammadiyah Paguyangan Brebes, Brebes, Central Java 52276, Indonesia)
Hidayatur Rakhmawati (Information Systems Study Program, STMIK Muhammadiyah Paguyangan Brebes, Brebes, Central Java 52276, Indonesia)



Article Info

Publication Date
31 May 2024

Abstract

Current data growth affects data analysis in many fields, such as astronomy, business, medicine, education, and finance. Collected and stored data contain extreme values, that is, observations that differ from most other observations. These extreme values are called outliers. Outliers in some data often hold valuable information, so they must be examined carefully to decide whether to retain or discard them before data mining is applied. Outlier detection can be performed as part of data preprocessing using outlier analysis techniques; commonly used techniques include distance-based methods, density-based methods, and the local outlier factor (LOF) method. The k-nearest neighbors (KNN) algorithm is a data mining algorithm that is susceptible to outliers because its results depend on the value of k. Hence, an appropriate handling mechanism is essential when KNN is applied to datasets that contain outliers. An experimental method was used to apply the proposed approach, which aims to optimize the KNN algorithm by comparing outlier analysis methods (KNN-distance, KNN-density, and KNN-LOF). The results show that KNN-density significantly outperformed the other two, achieving an average accuracy of 99.34% at k=3 and k=5 on Wisconsin Breast Cancer, 85.25% at k=7 on Glass, and 85.45% at k=5 on Lymphography. Moreover, both the Friedman and Nemenyi tests confirm a significant difference between KNN-density and KNN-LOF.
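The abstract does not spell out the scoring formulas or how flagged points are handled, so the following is only a minimal sketch under stated assumptions: each method scores the training points, the highest-scoring points are dropped, and plain majority-vote KNN is then run on what remains. The distance-based score (distance to the k-th nearest neighbor), the density-based score (mean distance to the k nearest neighbors, i.e. the reciprocal of a simple local density) and a simplified LOF (exactly k neighbors, ties at the k-distance not expanded) are common textbook definitions, not necessarily the variants the authors used. The helper names and the `n_outliers` parameter are hypothetical.

```python
import math
from collections import Counter


def k_neighbors(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    # Stable sort: ties keep their original index order.
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def distance_score(points, k):
    """Distance-based score (assumed): distance to the k-th nearest neighbor."""
    return [math.dist(points[i], points[k_neighbors(points, i, k)[-1]])
            for i in range(len(points))]


def density_score(points, k):
    """Density-based score (assumed): mean distance to the k nearest
    neighbors, the reciprocal of a simple local density (higher = sparser)."""
    scores = []
    for i in range(len(points)):
        nbrs = k_neighbors(points, i, k)
        scores.append(sum(math.dist(points[i], points[j]) for j in nbrs) / k)
    return scores


def lof_score(points, k):
    """Simplified LOF: exactly k neighbors per point. Scores near 1 are inliers."""
    n = len(points)
    nbrs = [k_neighbors(points, i, k) for i in range(n)]
    kdist = [math.dist(points[i], points[nbrs[i][-1]]) for i in range(n)]

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbor j.
        return max(kdist[j], math.dist(points[i], points[j]))

    # Local reachability density; the 1e-12 floor guards against duplicates.
    lrd = [1.0 / max(sum(reach_dist(i, j) for j in nbrs[i]) / k, 1e-12)
           for i in range(n)]
    return [sum(lrd[j] for j in nbrs[i]) / (k * lrd[i]) for i in range(n)]


def remove_outliers(X, y, scores, n_outliers):
    """Hypothetical rule: drop the n_outliers highest-scoring training points."""
    keep = sorted(range(len(X)), key=lambda i: scores[i])[:len(X) - n_outliers]
    keep.sort()  # restore the original order of the kept points
    return [X[i] for i in keep], [y[i] for i in keep]


def knn_predict(X_train, y_train, x, k):
    """Plain majority-vote KNN classification of a single query point x."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(x, X_train[i]))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: two tight clusters plus one far-away point labelled "a".
    X = [(0, 0), (0, 1), (1, 0), (1, 1),
         (5, 5), (5, 6), (6, 5), (6, 6), (20, 20)]
    y = ["a", "a", "a", "a", "b", "b", "b", "b", "a"]
    for name, scorer in [("distance", distance_score),
                         ("density", density_score),
                         ("LOF", lof_score)]:
        Xf, yf = remove_outliers(X, y, scorer(X, 3), n_outliers=1)
        # 1-NN label of the query before and after filtering.
        print(name, knn_predict(X, y, (19, 19), 1), "->",
              knn_predict(Xf, yf, (19, 19), 1))
```

On this toy data all three scorers flag the point (20, 20), so each printed line reads `a -> b`: the isolated point decides the 1-NN label of (19, 19) until it is removed. The brute-force neighbor search is quadratic in the number of points and is meant only for illustration; library implementations such as scikit-learn's `LocalOutlierFactor` and `KNeighborsClassifier` would be the practical choice on real datasets.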

Copyright © 2024

Journal Info

Abbrev

JNTETI

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering; Energy Engineering

Description

Topics cover the fields of (but are not limited to):
1. Information Technology: Software Engineering, Knowledge and Data Mining, Multimedia Technologies, Mobile Computing, Parallel/Distributed Computing, Artificial Intelligence, Computer Graphics, Virtual Reality
2. Power Systems: Power Generation, ...