International Journal of Electrical and Computer Engineering
Vol 12, No 5: October 2022

Performance analysis in text clustering using k-means and k-medoids algorithms for Malay crime documents

Rosmayati Mohemad (Universiti Malaysia Terengganu)
Nazratul Naziah Mohd Muhait (Universiti Malaysia Terengganu)
Noor Maizura Mohamad Noor (Universiti Malaysia Terengganu)
Zulaiha Ali Othman (Universiti Kebangsaan Malaysia)



Article Info

Publish Date
01 Oct 2022

Abstract

Few studies on text clustering for the Malay language have been conducted, owing to limitations that still need to be addressed. This article compares two clustering algorithms, k-means and k-medoids, using Euclidean distance as the similarity measure, to determine which is better suited to clustering documents. Both algorithms are applied to 1000 documents on housebreaking crimes involving a variety of modus operandi. The comparative results indicate that k-means performed best at clustering the relevant documents, with an accuracy of 78%. K-means also outperforms k-medoids in cluster evaluation when measured by average within-cluster distance. However, k-medoids performs exceptionally well on the Davies-Bouldin index (DBI). Furthermore, the accuracy of k-means depends on the initial number of clusters, and an appropriate number can be determined using the elbow method.
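The comparison described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The toy 2-D points stand in for document feature vectors, and every function name here (`cluster`, `farthest_first`, `avg_within_distance`, `davies_bouldin`, `sse`) is hypothetical. The k-medoids shown is the simple alternating variant rather than PAM, the deterministic farthest-first seeding is an assumption made so the run is reproducible, and the two evaluation measures follow common textbook definitions that may differ from those used in the article.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))
            for p in points]

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch, not from the article)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(euclidean(p, c) for c in centers)))
    return centers

def cluster(points, k, update, iters=100):
    """Alternate assignment and center update until the centers stop changing."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # An empty cluster keeps its previous center.
            new_centers.append(update(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def mean(members):
    """k-means update: coordinate-wise mean of the cluster."""
    return tuple(sum(col) / len(members) for col in zip(*members))

def medoid(members):
    """k-medoids update: the member with the smallest total distance to the others."""
    return min(members, key=lambda c: sum(euclidean(c, p) for p in members))

def avg_within_distance(points, centers, labels):
    """Mean distance from each point to its own cluster center."""
    return sum(euclidean(p, centers[l]) for p, l in zip(points, labels)) / len(points)

def davies_bouldin(points, centers, labels):
    """Davies-Bouldin index (lower is better), using the given centers."""
    k = len(centers)
    scatter = []
    for j in range(k):
        dists = [euclidean(p, centers[j]) for p, l in zip(points, labels) if l == j]
        scatter.append(sum(dists) / len(dists) if dists else 0.0)
    worst = [max((scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k

def sse(points, centers, labels):
    """Within-cluster sum of squared distances, the quantity plotted for the elbow method."""
    return sum(euclidean(p, centers[l]) ** 2 for p, l in zip(points, labels))

# Toy data: three 3x3 grids of points around well-separated centers.
offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 10), (20, 0)] for dx, dy in offsets]

results = {}
for name, update in [("k-means", mean), ("k-medoids", medoid)]:
    centers, labels = cluster(points, 3, update)
    results[name] = (avg_within_distance(points, centers, labels),
                     davies_bouldin(points, centers, labels))
    print(name, "avg within-cluster distance: %.3f, DBI: %.3f" % results[name])

# Elbow method: look for the k after which the error stops dropping sharply.
elbow = {}
for k in range(1, 6):
    centers, labels = cluster(points, k, mean)
    elbow[k] = sse(points, centers, labels)
    print("k =", k, "SSE =", round(elbow[k], 2))
```

On this symmetric toy data both algorithms recover the same three groups, so their scores coincide. On real document vectors the mean and the medoid of a cluster generally differ, which is where the two methods' scores can diverge.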

Copyrights © 2022






Journal Info

Abbrev

IJECE

Publisher

Subject

Computer Science & IT; Electrical & Electronics Engineering

Description

International Journal of Electrical and Computer Engineering (IJECE, ISSN: 2088-8708, a SCOPUS-indexed journal; SNIP: 1.001; SJR: 0.296; CiteScore: 0.99; SJR and CiteScore Q2 in both Electrical & Electronics Engineering and Computer Science) is the official publication of the Institute of ...