This research introduces an approach that combines hybrid grid partitioning with rough set theory for dataset clustering and interpretable rule generation in big data analysis. The proposed method addresses three challenges common to large and complex datasets: scalability, high dimensionality, and interpretability. Grid partitioning divides a large dataset into manageable subsets, which enables parallel processing and reduces computational complexity. Rough set theory is then used to identify the essential attributes that contribute to cluster formation, which reduces the dimensionality of the data and improves clustering accuracy. A key contribution of this research is the generation of interpretable rules from the clustering results. Rough set-based attribute selection identifies the attributes that determine cluster assignments, and the resulting rules describe the relationships between attributes and clusters, which helps in understanding the underlying patterns in the data. A numerical example demonstrates the proposed method; the results show improved clustering accuracy and clear, interpretable rules based on the dataset attributes. The method has limitations, including potential challenges in generalizability, sensitivity to parameter settings, and computational complexity. Future research should validate and evaluate the method on diverse datasets and compare it with other state-of-the-art clustering algorithms. In conclusion, the hybrid grid partitioning and rough set method offers a promising solution for dataset clustering and interpretable rule generation in big data analysis.
The research contributes to data analytics methodology and provides practical approaches for extracting knowledge from complex datasets, supporting decision-making, and improving the understanding of underlying data patterns.
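As a rough illustration only, the three stages named above (grid partitioning, rough set attribute selection, rule generation) might be sketched in Python as follows. This is not the authors' implementation: every function name, the `cell_size` parameter, the sample points, and the hand-assigned cluster labels are assumptions made for the example, and the exhaustive subset search is only practical for a handful of attributes.

```python
from collections import defaultdict
from itertools import combinations


def grid_cell(point, cell_size):
    """Map a numeric point to the integer index of the grid cell containing it."""
    return tuple(int(x // cell_size) for x in point)


def grid_partition(points, cell_size):
    """Group point indices by grid cell; each cell could be processed in parallel."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[grid_cell(p, cell_size)].append(i)
    return dict(cells)


def dependency(rows, attrs, labels):
    """Rough set dependency degree: the fraction of rows whose equivalence
    class (rows with equal values on `attrs`) carries a single label."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    consistent = sum(
        len(block)
        for block in blocks.values()
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def smallest_reduct(rows, labels):
    """Brute-force search for a smallest attribute subset that preserves
    the dependency degree of the full attribute set."""
    n_attrs = len(rows[0])
    full = dependency(rows, tuple(range(n_attrs)), labels)
    for k in range(1, n_attrs + 1):
        for subset in combinations(range(n_attrs), k):
            if dependency(rows, subset, labels) >= full:
                return subset
    return tuple(range(n_attrs))


def extract_rules(rows, attrs, labels):
    """Emit one rule per consistent equivalence class: attribute values -> label."""
    seen = defaultdict(set)
    for row, label in zip(rows, labels):
        seen[tuple(row[a] for a in attrs)].add(label)
    return {key: next(iter(labs)) for key, labs in seen.items() if len(labs) == 1}


# Hypothetical toy data: four 2-D points with assumed cluster labels.
points = [(0.2, 0.1), (0.4, 1.8), (2.1, 0.3), (2.6, 1.9)]
labels = ["A", "A", "B", "B"]  # assumed labels, not output of the paper's method

cells = grid_partition(points, cell_size=1.0)
rows = [grid_cell(p, 1.0) for p in points]  # grid indices act as discretised attributes
reduct = smallest_reduct(rows, labels)      # (0,): the first coordinate alone suffices
rules = extract_rules(rows, reduct, labels) # {(0,): "A", (2,): "B"}
```

In this toy case the second attribute is dropped because the first grid coordinate alone separates the two assumed clusters, so each rule reads as "if the first coordinate falls in cell 0 then cluster A; if it falls in cell 2 then cluster B".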
Copyrights © 2020