Garuda - Garba Rujukan Digital

p-Index (2021 - 2026): 0.23

This author has published in the following journal: Journal of ICT Research and Applications

Amandeep Verma

Department of Computer Science, Punjabi University Patiala, Punjab, 147002 India

Author-ID : 3285937

Computer Science & IT

Published: 1 document

Articles

Development of Focused Crawlers for Building Large Punjabi News Corpus
Gurjot Singh Mahi; Amandeep Verma
Journal of ICT Research and Applications Vol. 15 No. 3 (2021)
Publisher : LPPM ITB

DOI: 10.5614/itbj.ict.res.appl.2021.15.3.1

Web crawlers are as old as the Internet and are most commonly used by search engines to visit websites and index them into repositories. They are not limited to search engines but are also widely utilized to build corpora in different domains and languages. This study developed a focused set of web crawlers for three Punjabi news websites. The web crawlers were developed to extract quality text articles and add them to a local repository to be used in further research. The crawlers were implemented using the Python programming language and were utilized to construct a corpus of more than 134,000 news articles in nine different news genres. The crawler code and extracted corpora were made publicly available to the scientific community for research purposes.
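
The abstract describes site-specific crawlers that collect article text into a local repository. As a rough illustration of that general idea, and not the authors' published code, the following minimal sketch shows a breadth-first crawler restricted to a single host. Every name here (`PageParser`, `crawl`, `fetch`, `min_chars`) is hypothetical, and the assumption that article text lives in `<p>` elements is a simplification; a real crawler would also need per-site extraction rules, politeness delays, and robots.txt handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collects hyperlinks and the text of <p> elements from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Collapse runs of whitespace before storing the paragraph.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def crawl(seed_url, fetch, max_pages=50, min_chars=20):
    """Breadth-first crawl restricted to the seed URL's host.

    `fetch` is a caller-supplied function mapping a URL to HTML text
    (or None on failure). Returns {url: text} for every page whose
    paragraph text is at least `min_chars` characters long.
    """
    host = urlparse(seed_url).netloc
    queue, seen, corpus = deque([seed_url]), {seed_url}, {}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        text = "\n".join(parser.paragraphs)
        if len(text) >= min_chars:
            corpus[url] = text
        for href in parser.links:
            # Resolve relative links and drop fragments before deduplicating.
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return corpus
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library and lets it be exercised against in-memory pages; in practice it could wrap `urllib.request.urlopen` with appropriate decoding and error handling.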

Co-Authors: Gurjot Singh Mahi
