G-Tech : Jurnal Teknologi Terapan
Vol 10 No 1 (2026): G-Tech, Vol. 10 No. 1 January 2026

Analysis of the Accuracy and Completeness of SINTA Author Data Extraction

Muhammad Arfah Asis (Universitas Muslim Indonesia, Indonesia)
Nia Kurniati (Universitas Muslim Indonesia, Indonesia)
Muhammad Alfarid Jufda (Universitas Muslim Indonesia, Indonesia)



Article Info

Publish Date
16 Jan 2026

Abstract

The advancement of information technology has increased the use of web scraping for scientific data collection, including from the SINTA (Science and Technology Index) platform, which provides researcher profiles, affiliations, publications, and citation data. However, scraping SINTA poses challenges, particularly when multiple authors share identical scores that trigger changes in display order. This instability can lead to duplicated or missing entries when using a single-pass scraping approach. This study evaluates the accuracy and completeness of SINTA author data collection by implementing repeated scraping as a strategy to handle dynamic data ordering. Experiments were conducted on the Universitas Muslim Indonesia (UMI) affiliation, targeting 915 active authors. The methodology involved page-structure analysis, spider development using Python and Scrapy, sequential scraping through pagination, and validation of data completeness and uniqueness. A three-second delay between requests was applied to maintain responsible scraping practices. The results show that a single scraping attempt failed to retrieve all authors, capturing an average of only 877.2 authors (95.86%). Due to unstable ordering, repeated iterations were required. Through 4–8 scraping cycles per trial, all 915 authors were successfully collected without duplication. These findings indicate that for platforms with dynamic data structures such as SINTA, repeated scraping provides a more reliable method for ensuring data completeness and accuracy, supporting the development of stable and responsible publication-data automation systems.
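The repeated-scraping strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' Scrapy spider: the function names `collect_until_complete` and `make_fake_passes`, the `(author_id, name)` record shape, and the `max_cycles` cap are assumptions made for illustration. The fake passes stand in for full pagination runs, so no network requests are made. The idea shown is only the merge step: each pass is folded into a dictionary keyed by author ID, so duplicates are dropped and authors missed in one pass can be picked up in a later one.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical record shape: (author_id, name). The real SINTA fields may differ.
Author = Tuple[str, str]


def collect_until_complete(
    run_pass: Callable[[], Iterable[Author]],
    expected_total: int,
    max_cycles: int = 10,
) -> Tuple[Dict[str, str], int]:
    """Repeat full scraping passes, merging results by author ID, until the
    expected number of unique authors is reached or max_cycles is hit."""
    seen: Dict[str, str] = {}
    cycles = 0
    while len(seen) < expected_total and cycles < max_cycles:
        cycles += 1
        for author_id, name in run_pass():
            # Keep the first record seen for each ID; repeats are ignored.
            seen.setdefault(author_id, name)
    return seen, cycles


def make_fake_passes() -> Callable[[], Iterable[Author]]:
    """Simulate unstable ordering with two canned passes (illustrative data)."""
    passes = iter([
        # Pass 1: "a2" appears twice and "a3" is missed entirely.
        [("a1", "Author One"), ("a2", "Author Two"), ("a2", "Author Two")],
        # Pass 2: the order differs and "a3" is now visible.
        [("a3", "Author Three"), ("a1", "Author One"), ("a2", "Author Two")],
    ])
    return lambda: next(passes)


authors, cycles = collect_until_complete(make_fake_passes(), expected_total=3)
print(len(authors), cycles)  # prints "3 2"
```

In a real run, `run_pass` would wrap one sequential crawl over all result pages, with the three-second delay between requests mentioned in the abstract, and `expected_total` would be the author count reported for the affiliation (915 for UMI in the study).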

Copyrights © 2026






Journal Info

Abbrev

g-tech

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management; Energy; Engineering

Description

Jurnal G-Tech aims to publish original research results and reviews of research on technology and its applications within the scope of engineering, including mechanical engineering, electrical engineering, informatics engineering, information systems, agrotechnology, ...