Garuda - Garba Rujukan Digital

p-Index (2020 - 2025): 0.23

This author has published in the following journal:

International Journal of Engineering, Science and Information Technology

Ciptaningrum, Wahyu

Unknown Affiliation

Author-ID : 1946757

Astronomy; Biochemistry, Genetics & Molecular Biology; Chemical Engineering, Chemistry & Bioengineering; Chemistry; Civil Engineering, Building, Construction & Architecture; Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Earth & Planetary Sciences; Education; Electrical & Electronics Engineering; Energy; Engineering; Industrial & Manufacturing Engineering; Library & Information Science; Materials Science & Nanotechnology; Mathematics; Mechanical Engineering; Physics; Social Sciences; Transportation

Published: 1 document

Articles

Building a Web Crawler for Text Data Indexing on Online Newspaper Web
Hakim, Jamaludin; Sah, Andrian; Nurhayati, Siti; Ciptaningrum, Wahyu; Suryo Sasono, Damar
International Journal of Engineering, Science and Information Technology, Vol 4, No 4 (2024)
Publisher: Department of Information Technology, Universitas Malikussaleh, Aceh Utara, Indonesia

DOI: 10.52088/ijesty.v4i4.677

The Internet has become a vast repository of information, but much of it is surrounded by distractions that hinder the user experience. News content, for example, is usually interspersed with advertisements that interrupt the flow of reading. The fast pace of news publication is a further challenge: potentially more than 50 new articles can appear within 20 minutes. This high-speed data flow is valuable for various applications, including Social Media Analytics Services. In this context, the speed and efficiency of data acquisition (crawling) and processing (scraping) are critical. These processes must be optimized so that data collection is comprehensive, has no gaps, and focuses on the latest information. To meet this need, we propose an application that captures news data in its entirety, minimizing the risk of missing important information. At the core of this solution is a web crawler, a program that automatically follows the hyperlink structure of the web and systematically downloads linked pages to local storage. This crawling methodology is often the basis for web mining initiatives and search engine development. Web information is distributed across billions of pages hosted on millions of servers worldwide; our application is written in the PHP programming language to capture and process this data. The main goal is to present pure news content to users without irrelevant elements. We use a Data Flow Diagram (DFD) to model the system architecture and data flow, which gives a clear view of how users navigate through hyperlinks to reach the desired news information. By implementing this system, we aim to improve the user experience of consuming news content, facilitate more effective data analysis, and contribute to the broader field of web information search and processing.
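
The crawling idea described in the abstract (follow hyperlinks, download each linked page once, keep it in local storage) can be illustrated with a minimal breadth-first sketch. This is not the authors' implementation: the abstract says their application is written in PHP, while this sketch uses Python with only the standard library. The names `crawl`, `LinkExtractor`, `FAKE_SITE`, and the `news.example` URLs are all hypothetical, and a dictionary stands in for local storage so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl from seed_url; returns {url: html} for fetched pages.

    fetch(url) is assumed to return the page HTML as a string, or None on failure.
    """
    queue = deque([seed_url])
    seen = {seed_url}
    store = {}  # stands in for local storage
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be downloaded
        store[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return store


# Hypothetical in-memory "site" so the sketch runs offline.
FAKE_SITE = {
    "https://news.example/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://news.example/a": '<a href="/b">B</a> <a href="/">home</a>',
    "https://news.example/b": '<a href="/missing">gone</a> <a href="/a#top">A</a>',
}

pages = crawl("https://news.example/", FAKE_SITE.get)
print(sorted(pages))
```

A real crawler would replace `FAKE_SITE.get` with an HTTP fetch, and would likely add politeness delays, `robots.txt` handling, and a restriction to the target news domain; none of these are shown here.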

Co-Authors: Hakim, Jamaludin; Sah, Andrian; Nurhayati, Siti; Suryo Sasono, Damar
