Jurnal Nasional Pendidikan Teknik Informatika (JANAPATI)
Vol. 14 No. 2 (2025)

Multi-Source Data Fusion For Data Extraction and Integration of Scientific Publications in Academic Institution STIS

Maulidya, Luthfi
Suadaa, Lya Hulliyyatus
Wijayanto, Arie Wahyu
Ridho, Farid



Article Info

Publish Date
16 Jul 2025

Abstract

Scientific research publication data are among the most important data needed by academic and research institutions: they serve as a reference for measuring lecturers' research performance, assessing study programs and university accreditation, identifying research trends, and planning research development policies and strategies. To meet these needs, however, research data must be collected and integrated from multiple sources, because publications are spread across different databases. One portal that provides scientific research publication data for universities in Indonesia is Sinta (Science and Technology Index), which integrates the research databases Scopus, Web of Science (WoS), Garba Rujukan Digital (Garuda), and Google Scholar. Sinta has limitations, however: some publication metadata are still not covered, such as the Digital Object Identifier (DOI), abstract, author's full name, publication/journal name, publication type, and number of citations. In addition, each data source uses a different data format, so the data must be processed before they can be integrated. Processing and integrating research data from different sources manually, without automation, is very inefficient. This study therefore proposes a data engineering pipeline framework for extracting and integrating scientific research publication data from multiple sources, using the multi-source data fusion method with the Unified Cube methodology approach, and implements it through a web interface. Politeknik Statistika STIS, Jakarta serves as the case study. The framework follows the data engineering lifecycle and a multi-source data fusion method based on abstraction levels. The transformed data are then classified using rule-based classification. The results show that the accuracy of the framework was more than 90% and the accuracy of the classification results was 87.5%.
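To make the integration step concrete, the following is a minimal Python sketch of how records for the same publication from several sources might be fused into a single record. It is not the authors' pipeline and does not implement the Unified Cube methodology. The field names, the source priority order, the matching rule (DOI first, then normalized title), and the sample records are all assumptions made for illustration.

```python
import re

# Hypothetical priority order: when two sources disagree on a field,
# the value from the source listed first is kept.
SOURCE_PRIORITY = ["scopus", "wos", "garuda", "scholar"]


def normalize_title(title):
    """Lowercase a title and collapse punctuation so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def fuse(records):
    """Merge per-source records describing the same publication.

    Records match when they share a DOI or, failing that, a normalized
    title. Empty fields are filled from lower-priority sources.
    """
    merged = []      # fused records, in order of first appearance
    by_doi = {}      # lowercase DOI -> fused record
    by_title = {}    # normalized title -> fused record

    # Process high-priority sources first so their values win.
    ordered = sorted(records, key=lambda r: SOURCE_PRIORITY.index(r["source"]))
    for rec in ordered:
        doi = (rec.get("doi") or "").strip().lower()
        title_key = normalize_title(rec["title"])

        target = by_doi.get(doi) if doi else None
        if target is None:
            target = by_title.get(title_key)
        if target is None:
            target = {"sources": []}
            merged.append(target)

        # Copy only fields that are still missing in the fused record.
        for field, value in rec.items():
            if field == "source":
                continue
            if value not in (None, "") and target.get(field) in (None, ""):
                target[field] = value

        target["sources"].append(rec["source"])
        if doi:
            by_doi[doi] = target
        by_title[title_key] = target
    return merged


# Illustrative records only; the DOI and values are made up.
sample = [
    {"source": "scholar", "title": "Multi-source data fusion.", "doi": None,
     "abstract": "Example abstract text.", "citations": 5},
    {"source": "scopus", "title": "Multi-Source Data Fusion", "doi": "10.1000/example",
     "abstract": None, "citations": 3},
    {"source": "garuda", "title": "Another Paper", "doi": None,
     "abstract": None, "citations": None},
]
fused = fuse(sample)
```

In this sketch the Scopus record supplies the DOI and citation count, while the Google Scholar record fills in the missing abstract. A real pipeline would need more robust matching (for example, fuzzy title comparison and author checks) than the exact normalized-title rule assumed here.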

Copyrights © 2025






Journal Info

Abbrev

janapati

Publisher

Subject

Computer Science & IT Education Engineering

Description

Jurnal Nasional Pendidikan Teknik Informatika (JANAPATI) is a collection of scientific articles covering the broad field of Informatics / ICT Education and the field of Information Technology, published and managed by Jurusan Pendidikan Teknik Informatika, Fakultas Teknik dan Kejuruan, Universitas ...