Journal : International Journal of Informatics and Data Science

Clustering of YouTube Viewer Data Based on Preferences using Leiden Algorithm
Erlin Windia Ambarsari; Aulia Paramita; Desyanti
International Journal of Informatics and Data Science Vol. 1 No. 2 (2024): June 2024
Publisher : ADA Research Center

DOI: 10.64366/ijids.v1i2.45

Abstract

This study analyzes YouTube viewer engagement patterns by applying the Leiden algorithm to cluster user interactions, such as likes, dislikes, and subscription behavior, in relation to video duration. The method begins with data cleaning to ensure completeness, followed by selection of relevant features and z-score normalization to equalize their contributions. A similarity graph is then constructed using cosine similarity, with instances as nodes and their relationships as edges. The Leiden algorithm is applied to optimize modularity and extract clusters, and the resulting cluster labels are integrated into the original dataset for analysis. Dimensionality reduction with PCA supports cluster visualization, while statistical summaries and distribution plots give deeper insight into cluster characteristics. The dataset was sourced from the YouTube content creator @ArmanVesona and includes 237 instances with ten features: Shares, Comments Added, Dislikes, Likes, Subscribers Lost, Subscribers Gained, Views, Watch Time (hours), Impressions, and Click-Through Rate (%). The analysis reveals two distinct clusters: Cluster 0, characterized by lower engagement and a stable audience, and Cluster 1, which shows higher engagement but also higher subscriber churn. The findings highlight the effectiveness of the Leiden algorithm in detecting well-connected communities and provide insights into viewer behavior that can inform content strategies and targeted marketing approaches.
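The preprocessing steps named in the abstract (z-score normalization followed by a cosine-similarity graph) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy rows, and the 0.5 edge threshold are all assumptions made for illustration, and the abstract does not say how edges were selected.

```python
import math

def zscore_columns(rows):
    """Standardize each column to mean 0 and population standard deviation 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def similarity_edges(rows, threshold=0.5):
    """Return (i, j, weight) edges between instances whose similarity meets the
    threshold. The threshold value is an assumption, not taken from the paper."""
    z = zscore_columns(rows)
    edges = []
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            w = cosine(z[i], z[j])
            if w >= threshold:
                edges.append((i, j, w))
    return edges

# Hypothetical toy rows (likes, dislikes, subscribers gained); not the study's data.
toy = [
    [10.0, 1.0, 2.0],
    [12.0, 1.0, 3.0],
    [95.0, 9.0, 20.0],
    [100.0, 10.0, 22.0],
]
edges = similarity_edges(toy)
print([(i, j) for i, j, _ in edges])
```

The weighted edge list produced here is the kind of input a community-detection library would take; the Leiden step itself is left out because it depends on an external package and on settings the abstract does not specify.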
Film Popularity Analysis through Combined K-Means Clustering and Gradient Boosted Trees
Agi Candra Bramantia; Desyanti; Jeperson Hutahaean; Erlin Windia Ambarsari
International Journal of Informatics and Data Science Vol. 2 No. 2 (2025): June 2025
Publisher : ADA Research Center

DOI: 10.64366/ijids.v2i2.81

Abstract

The dynamic and competitive nature of the global film industry presents complex challenges in predicting film popularity, as success is shaped by the interplay of production investment, casting decisions, and audience preferences. This research addresses the limitations of previous studies that have focused primarily on direct relationships, such as budget versus box office returns, by introducing an integrated analytical framework that combines K-Means clustering and Gradient Boosted Trees (GBT) with explainable AI techniques. Utilizing the TMDB movie dataset and constructing features such as actor influence and studio power, the study segments films and predicts audience ratings while providing interpretable visualizations. The results reveal four distinct film clusters and demonstrate that actor influence and budget allocation are the most significant predictors of popularity. The proposed model achieves an R² score of 0.75 and a mean squared error of 0.35 in predicting audience ratings, while cluster analysis shows that Blockbuster films reach the highest average ratings (6.76), and Underperforming films the lowest (2.42). By integrating interpretable predictive modeling and interactive scenario tools, this research offers both theoretical advancement and practical value for industry stakeholders. However, the findings are limited by the available metadata and do not account for factors such as marketing or real-time audience trends, suggesting opportunities for future research to expand the analytical framework.
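For readers unfamiliar with the two reported metrics, the sketch below shows how R² and mean squared error are conventionally computed from true and predicted ratings. The rating values are hypothetical and chosen only to make the arithmetic easy to follow; they are not drawn from the TMDB dataset or from the paper's results.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical audience ratings and model predictions (illustration only).
y_true = [6.0, 7.0, 5.0, 8.0]
y_pred = [6.5, 6.5, 5.5, 7.5]

print(mse(y_true, y_pred))       # each residual is 0.5, so MSE = 0.25
print(r2_score(y_true, y_pred))  # SS_res = 1.0, SS_tot = 5.0, so R^2 = 0.8
```

An R² of 0.75, as reported, would mean the model accounts for about three quarters of the variance in ratings relative to always predicting the mean.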