Riyanto, Rifqi
Unknown Affiliation

Unsupervised YouTube Video Segmentation of “Bendera One Piece” Content Using Medoid-Based Clustering with Statistical Significance Testing
Budiaji, Weksi; Kumenap, Patricia; Delano, M Fabian; Wijaya, Ferdian; Riyanto, Rifqi
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2025 No. 1 (2025): Proceedings of 2025 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS

DOI: 10.34123/icdsos.v2025i1.639

Abstract

The curse of dimensionality and sparsity are well-documented phenomena in applied statistics that arise when the number of features far exceeds the number of observations. This work presents an integrated applied statistics framework for distilling semantic structure from high-dimensional data in the context of viral media. The framework combines pre-processing, dimensionality reduction via principal component analysis (PCA), medoid-based clustering (partitioning around medoids and simple k-medoids), and a modified Statistical Significance of Clustering (SigClust) test for validation and inference. As a case study, we segment and interpret YouTube videos about the Indonesian viral media “Bendera One Piece” through user commentary. The PCA-based dimensionality reduction helped address the curse of dimensionality: the first principal component alone explained 80% of the variance in the text-based features and captured a dominant socio-political pattern. Internal validation and the SigClust test agreed on a statistically significant three-cluster solution, whose clusters could be labelled as the audiences “Pop-Culture Enthusiasts”, “Cautious Observers”, and “Political Protesters”. The study demonstrates the utility of integrating established statistical methods with a modified validation step for high-dimensional text data analysis and pattern recognition.
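
The PCA and medoid-clustering stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic data (60 observations, 200 features, three groups separated along one latent direction) in place of the comment-derived features, a simple alternating k-medoids update with a farthest-point initialisation in place of the PAM swap search or simple k-medoids, and the mean silhouette width as one possible internal validation index. The modified SigClust test is left out because the abstract does not describe the modification. All function and variable names are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project centred data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)          # variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

def pairwise_distances(Z):
    """Euclidean distance matrix between all rows of Z."""
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)

def k_medoids(D, k, n_iter=100):
    """Alternating k-medoids on a distance matrix (assumed variant, not PAM)."""
    # Farthest-point initialisation: most central point first, then the
    # point farthest from the medoids chosen so far.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: member with the smallest total within-cluster distance.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)

def mean_silhouette(D, labels):
    """Mean silhouette width; points in singleton clusters score 0."""
    s = np.zeros(len(labels))
    for i, li in enumerate(labels):
        same = labels == li
        if same.sum() < 2:
            continue
        a = D[i, same].sum() / (same.sum() - 1)     # excludes D[i, i] = 0
        b = min(D[i, labels == lj].mean() for lj in np.unique(labels) if lj != li)
        s[i] = (b - a) / max(a, b)
    return s.mean()

# Synthetic high-dimensional data (assumption): more features than observations.
rng = np.random.default_rng(42)
n_per_group, n_features = 20, 200
direction = rng.normal(size=n_features)
X = np.vstack([c * direction + rng.normal(size=(n_per_group, n_features))
               for c in (-8.0, 0.0, 8.0)])
true_groups = np.repeat([0, 1, 2], n_per_group)

scores, explained = pca_reduce(X, n_components=2)
D = pairwise_distances(scores)
medoids, labels = k_medoids(D, k=3)
sil = mean_silhouette(D, labels)

print("variance explained by PC1:", round(float(explained[0]), 3))
print("medoid indices:", medoids.tolist())
print("mean silhouette width:", round(float(sil), 3))
```

Working on a precomputed distance matrix keeps the clustering step independent of how distances are defined, so a different dissimilarity could be substituted without changing `k_medoids` or `mean_silhouette`.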