International Journal of Informatics and Data Science
Vol. 2 No. 2 (2025): June 2025

Film Popularity Analysis through Combined K-Means Clustering and Gradient Boosted Trees

Agi Candra Bramantia
Desyanti
Jeperson Hutahaean
Erlin Windia Ambarsari



Article Info

Publish Date
30 Jun 2025

Abstract

The dynamic and competitive nature of the global film industry presents complex challenges in predicting film popularity, as success is shaped by the interplay of production investment, casting decisions, and audience preferences. This research addresses the limitations of previous studies that have focused primarily on direct relationships, such as budget versus box office returns, by introducing an integrated analytical framework that combines K-Means clustering and Gradient Boosted Trees (GBT) with explainable AI techniques. Utilizing the TMDB movie dataset and constructing features such as actor influence and studio power, the study segments films and predicts audience ratings while providing interpretable visualizations. The results reveal four distinct film clusters and demonstrate that actor influence and budget allocation are the most significant predictors of popularity. The proposed model achieves an R² score of 0.75 and a mean squared error of 0.35 in predicting audience ratings, while cluster analysis shows that Blockbuster films reach the highest average ratings (6.76), and Underperforming films the lowest (2.42). By integrating interpretable predictive modeling and interactive scenario tools, this research offers both theoretical advancement and practical value for industry stakeholders. However, the findings are limited by the available metadata and do not account for factors such as marketing or real-time audience trends, suggesting opportunities for future research to expand the analytical framework.
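
The two-stage idea in the abstract (segment films with K-Means, then predict ratings with gradient boosted trees and report MSE and R²) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, and the feature names (budget, actor influence, studio power), the depth-1 trees (stumps), the learning rate, and the way the two stages run side by side on the same features are all assumptions made for illustration. It uses only the Python standard library.

```python
import random
from statistics import mean

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def fit_stump(X, residuals):
    """Find the single split (feature, threshold) minimising squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def fit_gbt(X, y, rounds=30, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        pred = [p + lr * (lm if row[j] <= t else rm) for row, p in zip(X, pred)]
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)

def mse(y_true, y_pred):
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))

def r2(y_true, y_pred):
    y_bar = mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

# Synthetic stand-in data (NOT the TMDB dataset): columns are hypothetical
# scaled features (budget, actor_influence, studio_power).
rng = random.Random(42)
X = [(rng.random(), rng.random(), rng.random()) for _ in range(160)]
y = [2.0 + 3.0 * actor + 2.0 * budget + rng.gauss(0, 0.2) for budget, actor, _ in X]

# Stage 1: segment films into four clusters and compare mean ratings.
centroids, labels = kmeans(X, k=4)
cluster_means = {c: mean(r for r, l in zip(y, labels) if l == c)
                 for c in sorted(set(labels))}

# Stage 2: fit the boosted model on a training split and score held-out films.
X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]
model = fit_gbt(X_train, y_train)
test_pred = [predict(model, row) for row in X_test]
test_mse, test_r2 = mse(y_test, test_pred), r2(y_test, test_pred)
print(cluster_means, round(test_mse, 3), round(test_r2, 3))
```

In practice a library such as scikit-learn would replace the hand-written routines; they are spelled out here only to make the cluster-then-predict flow and the two reported metrics explicit.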

Copyrights © 2025






Journal Info

Abbrev

ijids

Publisher

Subject

Computer Science & IT

Description

International Journal of Informatics and Data Science publishes manuscripts in Computer Science, including but not limited to the following fields: 1. Natural Language Processing Pattern Classification, 2. Speech recognition and synthesis, 3. Robotic Intelligence, 4. Big Data, 5. Informatics Techniques, 6. Image ...