Innovative Technologica: Methodical Research Journal

Vol. 3 No. 3 (2024): September

Dalieva, Madina

Publish Date
24 Oct 2024

Corpus compilation is a critical process in linguistics that involves gathering and organizing large datasets for language analysis and model training. This article examines key aspects of corpus compilation, with a particular focus on data collection. It explores the sources of data, strategies for ensuring representativeness, and challenges such as copyright constraints and data quality issues. Ethical considerations, such as anonymization and consent, are also discussed. By understanding these factors, researchers can build effective and ethically sound corpora for linguistic research and computational applications.

Journal Info

Innovative Technologica: Methodical Research Journal

Abbrev

Innovative

Publisher

Indonesian Journal Publisher

Subject

Humanities; Chemical Engineering, Chemistry & Bioengineering; Electrical & Electronics Engineering; Mechanical Engineering

Description

Innovative Technologica: Methodical Research Journal is a monthly, double-blind peer-reviewed international journal of science and technological advancements. The journal ensures article quality through strict double-blind peer review, with plagiarism checks at all stages from ...

Article Info

Methods, Challenges, and Ethical Considerations in Data Collection of Corpus Compilation
