Sistemasi: Jurnal Sistem Informasi
Vol 13, No 3 (2024): Sistemasi: Jurnal Sistem Informasi

A Web Scraper for Data Mining Purposes

Mahmood, Yasir Ali
Mahmood, Bassim



Article Info

Publish Date
22 May 2024

Abstract

The current technological revolution has made data a crucial part of real-life applications because of its importance in decision-making. In the era of big data, with data streams expanding massively across Internet networks and platforms, collecting, mining, and analyzing data is no longer a simple matter. Auxiliary applications for data mining and data gathering have therefore become a necessity. Companies usually offer dedicated APIs for collecting data from particular sources, but these often come at a high cost. In general, the literature offers few approaches that provide flexible, low-cost, or free tools for web scraping. This article therefore presents a free tool for mining and collecting data from the web. Specifically, it introduces an efficient Google Scholar web scraper. The extracted data can be used for analysis and for making decisions about an issue of interest. The proposed scraper can also be modified to crawl web links and retrieve specific data from a particular website, and it can format the collected data into a ready-to-use dataset for the analysis phase. The scraper's efficiency is evaluated in terms of time consumption and the accuracy and quality of the collected data. The findings show that the proposed approach is highly feasible for data collection and can be adopted by data analysts.
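The abstract describes two generic steps: retrieving specific data (such as link titles and URLs) from a web page, and formatting the results as a ready-to-use dataset. The sketch below illustrates only those two steps using Python's standard library; it is not the authors' Google Scholar scraper, and the function names (`extract_links`, `to_csv`) and the `title`/`url` column layout are hypothetical choices made for illustration. It parses HTML that has already been downloaded and does not assume any particular Google Scholar markup.

```python
import csv
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects a (title, url) record for every <a href=...>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # collected records: {"title": ..., "url": ...}
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")  # attrs is a list of (name, value) pairs
            if href:                        # skip anchors without a usable href
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)         # includes text of nested tags like <b>

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())  # collapse whitespace
            self.rows.append({"title": title, "url": self._href})
            self._href = None


def extract_links(html):
    """Return a list of {"title", "url"} dicts for the links in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.rows


def to_csv(rows):
    """Serialize the records as CSV text, i.e. a dataset ready for analysis."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sample = (
        '<div><a href="https://example.org/paper1">A Web   Scraper</a></div>'
        '<div><a href="https://example.org/paper2">Data <b>Mining</b></a></div>'
    )
    print(to_csv(extract_links(sample)))
```

Fetching is deliberately left out so the sketch runs offline; a real scraper would first download each page (for example with `urllib.request.urlopen`) and then pass the HTML to a parser like this one. Keeping parsing separate from downloading also makes it easier to adapt the extraction rules to a different website.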

Copyright © 2024






Journal Info

Abbrev

stmsi

Subject

Computer Science & IT; Electrical & Electronics Engineering

Description

Sistemasi is a scientific journal in the field of computer science, published by the Information Systems study program of Universitas Islam Indragiri, Tembilahan, Riau. Sistemasi is published three times a year, in January, May, and September. The general focus and scope of Sistemasi is the field of Information Systems, ...