The rapid growth of the Python ecosystem has increased the number of packages on the Python Package Index (PyPI) and produced a massive volume of download data. This data can be used to analyze the popularity and growth trends of libraries used by the developer community. This study identifies popularity patterns and growth trends of Python packages using the K-Means clustering algorithm. The dataset was obtained from PyPI via the Google BigQuery platform, covering a one-year observation period with 1% sampling. Pre-processing consisted of filtering to the 100 most-downloaded packages and constructing six main features that characterize library usage patterns. The data were then normalized using standard scaling; the optimal number of clusters was determined using the Elbow Method and evaluated using the Davies-Bouldin Index (DBI) and the Silhouette Score. The results show that the optimal number of clusters is four, with a DBI of 0.5534 and a Silhouette Score of 0.5748 (the highest among k = 2 to 10). The four clusters represent ecosystem foundation libraries, medium-popularity libraries, libraries with concentrated download spikes, and libraries with very rapid usage growth. These results indicate that K-Means clustering is effective for identifying popularity patterns and growth trends in large-scale PyPI datasets.
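The scaling, clustering, and evaluation steps described above can be sketched roughly as follows. This is a minimal illustration and not the study's implementation: it assumes scikit-learn, and it uses a synthetic 100 x 6 feature matrix as a stand-in, because the six features and the BigQuery extraction are not specified here. The names `centers`, `X_scaled`, `results`, and `best_k` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the real data (100 packages x 6 features).
# The four well-separated centers are chosen only to make the demo clear-cut.
centers = np.array(
    [
        [0, 0, 0, 0, 0, 0],
        [10, 10, 10, 10, 10, 10],
        [0, 10, 0, 10, 0, 10],
        [10, 0, 10, 0, 10, 0],
    ],
    dtype=float,
)
X, _ = make_blobs(n_samples=100, centers=centers, cluster_std=1.0, random_state=0)

# Standard scaling: each feature is shifted to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means for k = 2..10 and record the three quantities used for selection.
results = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    results[k] = {
        "inertia": model.inertia_,  # within-cluster sum of squares (elbow plot)
        "silhouette": silhouette_score(X_scaled, model.labels_),  # higher is better
        "dbi": davies_bouldin_score(X_scaled, model.labels_),  # lower is better
    }

# Pick the k with the highest Silhouette Score among the candidates.
best_k = max(results, key=lambda k: results[k]["silhouette"])

for k, r in results.items():
    print(f"k={k:2d}  inertia={r['inertia']:8.2f}  "
          f"silhouette={r['silhouette']:.4f}  dbi={r['dbi']:.4f}")
print("best k by silhouette:", best_k)
```

Plotting `inertia` against k gives the elbow curve. The Silhouette Score and DBI then act as a cross-check on the chosen k, since a good partition should score high on the former and low on the latter.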
Copyright © 2026