The rapid growth of publications in fields such as computer science requires well-structured datasets to support data-driven research. This paper presents an open, large-scale dataset of computer science research papers published between 2020 and 2025, collected from Crossref metadata through the Crossref REST API. A structured keyword-based retrieval framework was developed to collect the papers and their associated metadata, and preprocessing steps, including cleaning, normalization, and validation, were applied to the collected data. The resulting dataset contains 4,313,328 research paper records, making it one of the largest structured collections of computer science publications for this period. Its comprehensive metadata fields support large-scale analysis, research trend identification, collaboration network exploration, and the development of recommendation systems.
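For readers who want to reproduce a similar collection, the sketch below shows one way to form a keyword request against the Crossref REST API `works` route for the 2020 to 2025 window. It is a minimal sketch under stated assumptions, not the authors' retrieval framework. The function name `build_works_query`, the choice of the generic `query` parameter, and the example keyword are hypothetical. The `filter` values `from-pub-date`/`until-pub-date`, `rows` (capped at 1000 by Crossref), `cursor`, and `mailto` are documented Crossref parameters.

```python
import urllib.parse

# Public Crossref REST API endpoint for work (publication) records.
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_works_query(keyword, from_date="2020-01-01", until_date="2025-12-31",
                      rows=1000, cursor="*", mailto=None):
    """Return one request URL for a keyword search limited to a date range.

    Hypothetical helper: it only builds the URL and performs no network I/O.
    cursor="*" starts Crossref's cursor-based deep paging.
    """
    params = {
        "query": keyword,  # assumption: free-text keyword matching
        "filter": f"from-pub-date:{from_date},until-pub-date:{until_date}",
        "rows": rows,      # Crossref allows at most 1000 rows per page
        "cursor": cursor,
    }
    if mailto:
        # Identifies the caller, as Crossref requests for its "polite" pool.
        params["mailto"] = mailto
    return f"{CROSSREF_WORKS}?{urllib.parse.urlencode(params)}"


if __name__ == "__main__":
    print(build_works_query("machine learning", mailto="you@example.org"))
```

In a full harvest, the `message["next-cursor"]` value of each JSON response would be passed back as `cursor` to fetch the next page. This would repeat until a page returns no items, and the same loop would run once per keyword.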
Copyright © 2026