In the era of big data, Knowledge Discovery in Databases (KDD) is vital for extracting insights from extensive datasets. This study investigates feature selection for clustering categorical data in an unsupervised learning context. Because too few features can impede the extraction of meaningful patterns, we evaluate two techniques, Chi-Square and Mutual Information, to refine a dataset derived from questionnaires on the characteristics of college library visitors. The original dataset of 24 items was preprocessed and partitioned into five subsets: one via Chi-Square and four via Mutual Information using different dependency thresholds (one with a low-mid-high scheme and three with dynamic quartile thresholds: Q1toMax, Q2toMax, and Q3toMax). K-Means clustering was applied for nine values of K (2 to 10), and clustering performance was assessed with the silhouette score and the Davies-Bouldin Index (DBI). Results show that while the Mutual Information approach with the Q3toMax threshold achieves the best silhouette score at K=7, it retains only 4 features, which is insufficient for comprehensive analysis given the domain requirements. Conversely, the Chi-Square method retains 18 features and yields the best DBI at K=9, better capturing the intrinsic characteristics of the data. These findings underscore the importance of aligning feature selection techniques with both clustering quality and domain knowledge, and highlight the need for further research on determining optimal dependency thresholds for Mutual Information.
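The pipeline summarized above (pairwise dependency scoring of categorical features, quartile-based thresholding, K-Means over K = 2 to 10, and evaluation with silhouette and DBI) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the per-feature dependency score (mean pairwise mutual information), the quantile cut-offs, and the one-hot encoding of categorical columns before K-Means are assumptions made for demonstration.

# Minimal sketch of the evaluation pipeline described in the abstract.
# Assumptions (not from the paper): dependency is scored as the mean pairwise
# mutual information of each feature with all others, thresholds are quantiles
# of those scores, and categorical columns are one-hot encoded for K-Means.
from itertools import combinations

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import (davies_bouldin_score, mutual_info_score,
                             silhouette_score)


def pairwise_mi_scores(df: pd.DataFrame) -> pd.Series:
    """Mean mutual information of each categorical feature with every other feature."""
    cols = df.columns
    mi = pd.DataFrame(0.0, index=cols, columns=cols)
    for a, b in combinations(cols, 2):
        value = mutual_info_score(df[a], df[b])
        mi.loc[a, b] = mi.loc[b, a] = value
    return mi.sum(axis=1) / (len(cols) - 1)


def select_by_quantile(scores: pd.Series, q: float) -> list:
    """Keep features whose dependency score lies at or above the given quantile."""
    return scores[scores >= scores.quantile(q)].index.tolist()


def evaluate_subset(df: pd.DataFrame, features: list, k_range=range(2, 11)) -> dict:
    """Run K-Means for each K and report (silhouette, DBI) per K."""
    X = pd.get_dummies(df[features]).to_numpy(dtype=float)
    results = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        results[k] = (silhouette_score(X, labels), davies_bouldin_score(X, labels))
    return results


# Example usage (hypothetical DataFrame of questionnaire responses):
# scores = pairwise_mi_scores(df)
# q3_features = select_by_quantile(scores, 0.75)   # the "Q3toMax" subset
# print(evaluate_subset(df, q3_features))

Under these assumptions, the three quartile subsets correspond to quantile cut-offs of 0.25, 0.50, and 0.75; the Chi-Square variant would replace the mutual information score with a chi-square test of independence on each pair's contingency table.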