Journal of Applied Data Sciences

Vol 7, No 2: May 2026

Efori Bu'ulolo (Universitas Sumatera Utara)
Poltak Sihombing (Universitas Sumatera Utara)
Sutarman Sutarman (Universitas Sumatera Utara)
Mohammad Andri Budiman (Universitas Sumatera Utara)

Publish Date
26 Apr 2026

High-dimensional, multidimensional cube data structures (K-Cube) pose a significant challenge for conventional clustering algorithms because of the effects of high dimensionality, the assumption of uniform feature weights, and the loss of hierarchical information. This study proposes the K-Cube Consensus Clustering framework, which integrates Variance-Based Centroid Refinement, Weighted Distance Metrics, and a consensus voting mechanism to address these challenges. The method systematically clusters all dimensions and sub-dimensions of the cube data, refines centroids by emphasizing the more stable low-variance attributes, and applies adaptive distance weighting, in which variance-derived feature weights are integrated into the distance metric, to improve cluster assignment. The final clusters are obtained by majority voting over the clustering results of each dimension. Unlike existing consensus clustering methods, which operate on flat data representations or combine independent clustering results, the proposed framework explicitly exploits the hierarchical structure of multidimensional cube data by clustering dimensions and sub-dimensions before consensus integration. Moreover, variance-based centroid refinement and weighted distance metrics are jointly embedded within each cube dimension rather than applied as isolated enhancements. This hierarchy-aware design preserves cube semantics while improving centroid stability and distance adaptivity. Because cube dimensions are processed independently with iterative convergence control, the framework scales to large cube data. On synthetic and real-world high-dimensional datasets, including cube data with approximately 2.2 million instances, the proposed method consistently outperformed K-Means, K-Medoids, and Hamiltonian formulations. It produced lower SSE (for example, 3,179,328 on Arcene and 1,422.21 on Lung Cancer), higher Silhouette Scores (approximately 0.5718 and 0.4905 for the consensus results), better cluster stability (0.9947), and faster convergence. These results confirm the effectiveness of K-Cube Consensus Clustering in producing stable and meaningful clusters in large-scale, high-dimensional data applications.
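
The pipeline described in the abstract (per-dimension clustering with a variance-weighted distance, followed by majority voting) can be illustrated with a minimal sketch. This is not the authors' implementation, and several details are assumptions made only for illustration: a cube "dimension" is modelled as a group of feature columns, feature weights are taken as normalised inverse variances, labels are aligned across dimensions by brute-force permutation before voting, and the centroid update is a plain mean because the abstract does not specify the refinement rule. All function names are hypothetical.

```python
import itertools
import numpy as np


def variance_weights(X, eps=1e-12):
    # Assumption: inverse-variance weights, normalised to sum to 1,
    # so that low-variance (more stable) features count more.
    inv = 1.0 / (X.var(axis=0) + eps)
    return inv / inv.sum()


def weighted_distances(X, centroids, w):
    # (n, k) matrix of variance-weighted squared Euclidean distances.
    diff = X[:, None, :] - centroids[None, :, :]
    return ((diff ** 2) * w).sum(axis=2)


def weighted_kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    # Lloyd-style iterations; only the assignment step uses the weights.
    rng = np.random.default_rng(seed)
    w = variance_weights(X)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = weighted_distances(X, centroids, w).argmin(axis=1)
        new = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new[j] = members.mean(axis=0)
        shift = np.abs(new - centroids).max()
        centroids = new
        if shift < tol:  # iterative convergence control
            break
    labels = weighted_distances(X, centroids, w).argmin(axis=1)
    return labels, centroids


def align_labels(reference, labels, k):
    # Assumption: cluster ids are arbitrary, so relabel with the permutation
    # that agrees most with the reference labelling (practical for small k only).
    best = max(
        itertools.permutations(range(k)),
        key=lambda p: int(np.sum(np.asarray(p)[labels] == reference)),
    )
    return np.asarray(best)[labels]


def consensus(label_sets, k):
    # Majority vote per instance after aligning every labelling to the first.
    ref = label_sets[0]
    aligned = np.stack([ref] + [align_labels(ref, l, k) for l in label_sets[1:]])
    votes = np.stack([(aligned == j).sum(axis=0) for j in range(k)])  # (k, n)
    return votes.argmax(axis=0)


# Toy data: two well-separated groups, six features split into three "dimensions".
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 6)),
    rng.normal(5.0, 0.3, size=(50, 6)),
])
dimensions = [X[:, 0:2], X[:, 2:4], X[:, 4:6]]
label_sets = [weighted_kmeans(D, k=2, seed=i)[0] for i, D in enumerate(dimensions)]
final = consensus(label_sets, k=2)
```

On this toy input each dimension is clustered on its own and `final` holds one consensus label per instance. The permutation-based alignment grows factorially with the number of clusters, so a larger setting would need a different matching step.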

Journal Info

Abbrev

JADS

Publisher

Bright Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

K-Cube Consensus Clustering with Centroid Improvement and Variance-Based Metrics on High-Dimensional Data
