Analysis of the Accuracy and Completeness of SINTA Author Data Extraction
Muhammad Arfah Asis; Nia Kurniati; Muhammad Alfarid Jufda
G-Tech: Jurnal Teknologi Terapan, Vol. 10, No. 1 (January 2026)
Publisher : Universitas Islam Raden Rahmat, Malang

DOI: 10.70609/g-tech.v10i1.8832

Abstract

The advancement of information technology has increased the use of web scraping for scientific data collection, including from the SINTA (Science and Technology Index) platform, which provides researcher profiles, affiliations, publications, and citation data. However, scraping SINTA is challenging, particularly when multiple authors share identical scores, because tied authors can change position in the display order. This instability can lead to duplicated or missing entries when a single-pass scraping approach is used. This study evaluates the accuracy and completeness of SINTA author data collection by implementing repeated scraping as a strategy for handling dynamic data ordering. Experiments were conducted on the Universitas Muslim Indonesia (UMI) affiliation, targeting 915 active authors. The methodology involved page-structure analysis, spider development using Python and Scrapy, sequential scraping through pagination, and validation of data completeness and uniqueness. A three-second delay between requests was applied to maintain responsible scraping practices. The results show that a single scraping attempt failed to retrieve all authors, capturing an average of only 877.2 authors (95.86%). Because of the unstable ordering, repeated iterations were required: with 4 to 8 scraping cycles per trial, all 915 authors were collected without duplication. These findings indicate that for platforms with dynamic data structures such as SINTA, repeated scraping is a more reliable way to ensure data completeness and accuracy, supporting the development of stable and responsible publication-data automation systems.
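The repeated-scraping strategy described above can be illustrated with a small simulation. This is not the authors' Scrapy spider: it is a minimal, network-free sketch in which a hypothetical `fetch_page` function stands in for requesting one page of the SINTA author listing. The page size of 10, the five distinct score values, the random tie-breaking on every request, and the assumption that the expected total (915 in the study) is known in advance are all illustrative choices. Each cycle walks every page and merges author IDs into a set, which enforces uniqueness, and cycles repeat until the set reaches the expected count.

```python
import math
import random
import time
from typing import Callable, Dict, List, Set, Tuple

PAGE_SIZE = 10  # hypothetical number of authors per listing page

Fetcher = Callable[[int], List[str]]


def make_unstable_listing(scores: Dict[str, int], seed: int = 0) -> Fetcher:
    """Hypothetical stand-in for the SINTA author listing: authors are sorted
    by score (descending) and ties are re-broken at random on every request."""
    rng = random.Random(seed)

    def fetch_page(page: int) -> List[str]:
        # sorted() evaluates the key once per author, so each tied author
        # receives one fresh random tie-breaker per page request.
        order = sorted(scores, key=lambda a: (-scores[a], rng.random()))
        start = page * PAGE_SIZE
        return order[start:start + PAGE_SIZE]

    return fetch_page


def single_pass(fetch_page: Fetcher, n_pages: int) -> List[str]:
    """Walk the pagination once, keeping every row (duplicates included)."""
    rows: List[str] = []
    for page in range(n_pages):
        rows.extend(fetch_page(page))
    return rows


def repeated_scrape(fetch_page: Fetcher, n_pages: int, expected_total: int,
                    max_cycles: int = 50,
                    delay_s: float = 0.0) -> Tuple[Set[str], int]:
    """Repeat full pagination cycles, merging author IDs into a set until the
    expected total is reached or max_cycles is exhausted."""
    collected: Set[str] = set()
    cycles = 0
    while len(collected) < expected_total and cycles < max_cycles:
        cycles += 1
        for page in range(n_pages):
            collected.update(fetch_page(page))  # the set discards duplicates
            if delay_s:
                time.sleep(delay_s)  # politeness pause (3 s in the study)
    return collected, cycles


if __name__ == "__main__":
    total = 915
    # Hypothetical scores with only five distinct values, so ties are common.
    scores = {f"author-{i:04d}": i % 5 for i in range(total)}
    n_pages = math.ceil(total / PAGE_SIZE)
    fetch = make_unstable_listing(scores, seed=42)

    rows = single_pass(fetch, n_pages)
    unique = set(rows)
    print(f"single pass: {len(unique)}/{total} unique, "
          f"{len(rows) - len(unique)} duplicate rows")

    authors, cycles = repeated_scrape(fetch, n_pages, total)
    print(f"repeated: {len(authors)}/{total} unique after {cycles} cycles")
```

The single pass returns exactly as many rows as there are authors, yet some rows are duplicates and others are missing, which is the failure mode the study reports; the set-based loop keeps cycling until coverage is complete. In a real Scrapy spider, the three-second pause would more naturally be set with the `DOWNLOAD_DELAY = 3` setting rather than `time.sleep`, which would block Scrapy's event loop, and re-requesting the same pagination URLs in later cycles would need `dont_filter=True` on `scrapy.Request`, because Scrapy's default duplicate filter otherwise drops repeated URLs.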