Nurshazwani Muhamad Mahfuz
Universiti Teknologi MARA

Published : 2 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 2 Documents
Search

Review of single clustering methods Nurshazwani Muhamad Mahfuz; Marina Yusoff; Zakiah Ahmad
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 8, No 3: September 2019
Publisher : Institute of Advanced Engineering and Science

Show Abstract | Download Original | Original Source | Check in Google Scholar | Full PDF (435.105 KB) | DOI: 10.11591/ijai.v8.i3.pp221-227

Abstract

Clustering provides a prime important role as an unsupervised learning method in data analytics to assist many real-world problems such as image segmentation, object recognition or information retrieval. It is often an issue of difficulty for traditional clustering technique due to non-optimal result exist because of the presence of outliers and noise data. This review paper provides a review of single clustering methods that were applied in various domains. The aim is to see the potential suitable applications and aspect of improvement of the methods. Three categories of single clustering methods were suggested, and it would be beneficial to the researcher to see the clustering aspects as well as to determine the requirement for clustering method for an employment based on the state of the art of the previous research findings.
Clustering heterogeneous categorical data using enhanced mini batch K-means with entropy distance measure Nurshazwani Muhamad Mahfuz; Marina Yusoff; Zainura Idrus
International Journal of Electrical and Computer Engineering (IJECE) Vol 13, No 1: February 2023
Publisher : Institute of Advanced Engineering and Science

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.11591/ijece.v13i1.pp1048-1059

Abstract

Clustering methods in data mining aim to group a set of patterns based on their similarity. In a data survey, heterogeneous information is established with various types of data scales like nominal, ordinal, binary, and Likert scales. A lack of treatment of heterogeneous data and information leads to loss of information and scanty decision-making. Although many similarity measures have been established, solutions for heterogeneous data in clustering are still lacking. The recent entropy distance measure seems to provide good results for the heterogeneous categorical data. However, it requires many experiments and evaluations. This article presents a proposed framework for heterogeneous categorical data solution using a mini batch k-means with entropy measure (MBKEM) which is to investigate the effectiveness of similarity measure in clustering method using heterogeneous categorical data. Secondary data from a public survey was used. The findings demonstrate the proposed framework has improved the clustering’s quality. MBKEM outperformed other clustering algorithms with the accuracy at 0.88, v-measure (VM) at 0.82, adjusted rand index (ARI) at 0.87, and Fowlkes-Mallow’s index (FMI) at 0.94. It is observed that the average minimum elapsed time-varying for cluster generation, k at 0.26 s. In the future, the proposed solution would be beneficial for improving the quality of clustering for heterogeneous categorical data problems in many domains.