Garuda - Garba Rujukan Digital

Sistemasi: Jurnal Sistem Informasi

Vol 15, No 4 (2026): Sistemasi: Jurnal Sistem Informasi

Zaid Mundher (University of Mosul)
Manar Talat Ahmad (University of Mosul)

Publish Date
28 Apr 2026

The rapid growth of publications in fields such as computer science requires well-structured datasets to support data-driven research. This paper presents a large-scale open dataset of computer science research papers published between 2020 and 2025, collected from Crossref metadata using the Crossref REST API. A structured keyword-based retrieval framework was developed to collect papers and their associated metadata. Preprocessing steps, including cleaning, normalization, and validation, were then applied to the collected data. The resulting dataset contains 4,313,328 research paper records, making it one of the largest structured collections of computer science publications for the specified period. The dataset provides comprehensive metadata fields that enable large-scale analysis, identification of research trends, exploration of collaboration networks, and the development of recommendation systems.
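The retrieval and preprocessing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`build_params`, `normalize_record`, `fetch_page`), the choice of retained fields, and the validation rule (drop records lacking a DOI, title, or year) are assumptions made for illustration. The sketch relies only on the public Crossref `/works` endpoint with its `query`, `filter`, `rows`, and `cursor` parameters.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.crossref.org/works"


def build_params(keyword, start_year=2020, end_year=2025, rows=100, cursor="*"):
    """Build query parameters for one keyword search (hypothetical helper)."""
    return {
        "query": keyword,
        # Restrict results to the publication window named in the abstract.
        "filter": f"from-pub-date:{start_year}-01-01,until-pub-date:{end_year}-12-31",
        "rows": rows,
        # "*" starts deep paging; later pages reuse the returned next-cursor.
        "cursor": cursor,
    }


def normalize_record(item):
    """Clean one Crossref work item; return None if it fails validation.

    The retained fields and the validation rule are assumptions.
    """
    doi = (item.get("DOI") or "").strip().lower()
    titles = item.get("title") or []
    # Collapse runs of whitespace and line breaks inside the title.
    title = " ".join(titles[0].split()) if titles else ""
    date_parts = (item.get("issued") or {}).get("date-parts") or [[None]]
    year = date_parts[0][0] if date_parts[0] else None
    if not doi or not title or year is None:
        return None
    authors = [
        " ".join(filter(None, [a.get("given"), a.get("family")]))
        for a in item.get("author", [])
    ]
    return {"doi": doi, "title": title, "year": year, "authors": authors}


def fetch_page(params):
    """Fetch one page of results; returns (items, next_cursor)."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        message = json.load(resp)["message"]
    return message.get("items", []), message.get("next-cursor")
```

A collection loop under these assumptions would call `fetch_page(build_params(keyword, cursor=next_cursor))` repeatedly until a page comes back with no items, pass each item through `normalize_record`, and deduplicate the surviving records by DOI across keywords.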

Journal Info

Sistemasi: Jurnal Sistem Informasi

Abbrev

stmsi

Publisher

Universitas Islam Indragiri

Subject

Computer Science & IT; Electrical & Electronics Engineering

Description

Sistemasi is a scientific journal in the field of computer science, published by the Information Systems study program of Universitas Islam Indragiri, Tembilahan, Riau. Sistemasi is published three times a year, in January, May, and September. The general focus and scope of Sistemasi is the field of Information Systems, ...

A Large-Scale Open Dataset of Computer Science Research Papers (2020–2025)
