Abstract The massive collection of digital artifacts in the Internet Archive and the Wayback Machine constitutes a historical encyclopedia of modern civilization. However, the sheer volume of unstructured data makes it difficult to extract meaningful information and calls for advanced computational analytics. This study demonstrates an architectural evaluation of digital heritage stacks using a comprehensive Big Data 5V framework (Volume, Velocity, Variety, Veracity, Value), designed to map the evolution of web topics over three decades (1996–2026). The methodology relies on a corpus of 3,000 metadata records, grouped by K-Means clustering (K=10) over a Term Frequency-Inverse Document Frequency (TF-IDF) weighted matrix, followed by Apriori association rules
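As a rough illustration of the text-grouping step named in the abstract, the following minimal sketch computes TF-IDF weights and runs K-Means on a handful of toy strings. It is not the study's pipeline: the sample documents, the function names (`tfidf_vectors`, `kmeans`, `squared_distance`), the smoothed IDF formula, the farthest-first centroid initialisation, and the choice of k=2 instead of K=10 are all assumptions made to keep the example small and deterministic. It uses only the Python standard library so each step is visible; in practice a library such as scikit-learn would normally be used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for whitespace-tokenised docs."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed IDF (an assumed variant; the abstract does not specify one).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm with a deterministic farthest-first initialisation (assumed)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Pick the vector farthest from its nearest existing centroid.
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy metadata strings, two obvious topics.
docs = [
    "news politics election vote",
    "election vote politics government",
    "politics government news election",
    "music video stream player",
    "video stream music download",
    "player music download video",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
```

Because the two toy topics share no vocabulary, their normalised vectors are orthogonal and the clusters separate cleanly. Real web metadata overlaps far more, so the choice of K and of the initialisation matters considerably more at that scale.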
Copyrights © 2026