Bulletin of Electrical Engineering and Informatics
Vol 7, No 1: March 2018

A Modified Overlapping Partitioning Clustering Algorithm for Categorical Data Clustering

Mohammad Alaqtash (The British University in Dubai)
Moayad A. Fadhil (Philadelphia University)
Ali F. Al-Azzawi (Philadelphia University)

Article Info

Publish Date
01 Mar 2018

Abstract

Clustering is an important approach for grouping unlabeled data: it partitions the data into clusters with similar patterns. Over the past decades, many clustering algorithms have been developed for various clustering problems. The overlapping partitioning clustering (OPC) algorithm, however, can only handle numerical data, and novel clustering algorithms have been studied extensively to overcome this limitation. This study aimed to cluster textual data while preserving the main functions of the algorithm, namely increasing the number of objects belonging to one cluster and the distance between cluster centers. The study used the 20 Newsgroups dataset, which consists of approximately 20,000 textual documents. With some modifications to the traditional algorithm, clusters with an acceptable level of homogeneity and completeness were generated. The modifications were made to the pre-processing phase and the data representation, along with a number of methods that influence the primary function of the algorithm. The results were then evaluated and compared with those of the k-means algorithm on the training and test datasets. They indicated that the modified algorithm could successfully handle categorical data and produce satisfactory clusters.
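The abstract reports cluster quality in terms of homogeneity and completeness but does not say how they were computed. The sketch below assumes the common entropy-based definitions, homogeneity = 1 - H(C|K)/H(C) and completeness = 1 - H(K|C)/H(K), where C are the true classes and K the assigned clusters. It also assumes each document carries a single cluster label, so an overlapping assignment would first have to be reduced to one label per document. The function names are illustrative and not taken from the paper.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy H(labels), in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given), in nats."""
    n = len(target)
    joint = Counter(zip(given, target))   # counts of (given, target) pairs
    given_counts = Counter(given)         # marginal counts of `given`
    return -sum((c / n) * log(c / given_counts[g])
                for (g, _), c in joint.items())


def homogeneity_completeness(classes, clusters):
    """Return (homogeneity, completeness) for one hard clustering.

    classes  -- true class label of each document
    clusters -- assigned cluster label of each document (one per document)
    """
    if len(classes) != len(clusters) or not classes:
        raise ValueError("label sequences must be non-empty and equal in length")
    h_c = _entropy(classes)
    h_k = _entropy(clusters)
    # By convention a score is 1.0 when the corresponding entropy is zero.
    homogeneity = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(clusters, classes) / h_k
    return homogeneity, completeness


if __name__ == "__main__":
    true_classes = ["sport", "sport", "space", "space"]
    found_clusters = [0, 1, 2, 3]  # every document in its own cluster
    print(homogeneity_completeness(true_classes, found_clusters))
```

In this toy case every cluster contains a single class, so homogeneity is perfect, while each class is split across two clusters, which lowers completeness. Computing the same two scores for both algorithms on the same labels would allow the kind of comparison against k-means that the abstract describes.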

Copyrights © 2018

Journal Info

Abbrev

EEI

Publisher

Subject

Computer Science & IT; Electrical & Electronics Engineering; Engineering

Description

Bulletin of Electrical Engineering and Informatics ISSN: 2302-9285 is open to submission from scholars and experts in the wide areas of electrical, electronics, instrumentation, control, telecommunication, computer engineering, computer science, information technology and informatics from the global ...