Indonesian Journal of Electrical Engineering and Computer Science
Vol 16, No 2: November 2019

A comparative study on dimensionality reduction between principal component analysis and k-means clustering

Norsyela Muhammad Noor Mathivanan (Universiti Teknologi MARA)
Nor Azura Md.Ghani (Universiti Teknologi MARA)
Roziah Mohd Janor (Universiti Teknologi MARA)



Publish Date: 01 Nov 2019

Abstract

The curse of dimensionality and the empty space phenomenon have emerged as critical problems in text classification. One way of dealing with these problems is to apply a feature selection technique before building a classification model. Such a technique helps to reduce time complexity and can sometimes increase classification accuracy. This study introduces a feature selection technique based on k-means clustering to overcome the weaknesses of traditional techniques such as principal component analysis (PCA), which require considerable time to transform all of the input data. The proposed technique decides which features to retain based on the significance value of each feature within a cluster. The study found that k-means clustering increases the efficiency of a k-nearest neighbours (KNN) model on a large data set, whereas a KNN model without feature selection is suitable for a small data set. A comparison of k-means clustering and PCA as feature selection techniques shows that the proposed technique outperforms PCA, especially in terms of computation time. Hence, k-means clustering is found to reduce data dimensionality with lower time complexity than PCA, without affecting the accuracy of the KNN model on high-frequency data.
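The abstract does not define how the "significance value" of a feature within a cluster is computed, so the sketch below substitutes a common stand-in that is an assumption, not the authors' method: the feature columns are z-scored and clustered with k-means, and from each cluster the single feature nearest its centroid is kept. For contrast it includes a PCA projection, whose new axes mix every input feature, and a plain KNN classifier that consumes the retained columns. All function names (`kmeans_select_features`, `pca_transform`, `knn_predict`) and the synthetic data are hypothetical, and only NumPy is used.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of `points`; returns (labels, centroids)."""
    if k > len(points):
        raise ValueError("k cannot exceed the number of points")
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def kmeans_select_features(X, k, seed=0):
    """Hypothetical k-means feature selection: cluster the z-scored feature
    columns and keep, from each non-empty cluster, the feature closest to its centroid."""
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # avoid dividing constant columns by zero
    features = ((X - X.mean(axis=0)) / std).T  # one row per feature
    labels, centroids = kmeans(features, k, seed=seed)
    selected = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        d = ((features[members] - centroids[j]) ** 2).sum(axis=1)
        selected.append(int(members[d.argmin()]))
    return sorted(selected)  # indices of ORIGINAL columns to keep


def pca_transform(X, n_components):
    """PCA via SVD of the centred data: every output column mixes all inputs."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    dists = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.random((60, 20))                 # synthetic: 60 documents x 20 features
    y = (X[:, 0] + X[:, 1] > 1).astype(int)  # synthetic binary labels
    cols = kmeans_select_features(X, k=5)
    pred = knn_predict(X[:40][:, cols], y[:40], X[40:][:, cols], k=3)
    print("kept features:", cols, "accuracy:", (pred == y[40:]).mean())
    print("PCA output shape:", pca_transform(X, 5).shape)
```

The difference the abstract emphasizes shows up in what each function returns: `kmeans_select_features` yields indices of original columns, so new data can be reduced by simple slicing, whereas `pca_transform` must centre and project every input row onto the component matrix. The sketch makes no timing claims; the relative cost of the two approaches depends on data size and implementation.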

Copyright © 2019