Journal : Journal of Advanced Computer Knowledge and Algorithms

Effective Data Preprocessing in Data Science: From Method Selection to Domain-Specific Optimization
Shahidi, Shahwali; Wahid Samadzai, Abdul; Shahbazi, Hafizullah
Journal of Advanced Computer Knowledge and Algorithms Vol. 2 No. 4 (2025): Journal of Advanced Computer Knowledge and Algorithms - October 2025
Publisher : Department of Informatics, Universitas Malikussaleh

DOI: 10.29103/jacka.v2i4.22886

Abstract

In the era of big data and artificial intelligence, data preprocessing has emerged as a critical step in the data science pipeline, influencing the quality, performance, and reliability of machine learning models. Despite its importance, the diversity of techniques, challenges, and evolving practices makes a structured understanding of this domain necessary. This study conducts a systematic literature review (SLR) to explore current data preprocessing techniques, their domain-specific applications, associated challenges, and emerging trends. A total of 21 peer-reviewed articles published from 2016 to 2024 were analyzed using well-defined inclusion and exclusion criteria, with a focus on machine learning and big data contexts. The results reveal that normalization, data cleaning, feature selection, and dimensionality reduction are the most commonly applied techniques. The key challenges identified are handling missing values, high dimensionality, and imbalanced data. Moreover, recent trends such as automated preprocessing (AutoML), privacy-preserving methods, and scalable preprocessing for distributed systems are gaining momentum. The review concludes that while traditional methods remain foundational, there is a shift toward adaptive and intelligent preprocessing strategies to meet the growing complexity of data environments. This study offers insights for researchers and practitioners aiming to optimize data preparation processes in modern data science workflows.
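As a rough illustration of two of the techniques the abstract names (handling missing values and normalization), the following minimal sketch uses only the Python standard library. It is not taken from the reviewed paper: the function names `impute_mean` and `min_max_normalize`, the choice of mean imputation and min-max scaling, and the sample `ages` column are all assumptions made for this example.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values.

    Mean imputation is only one of several possible strategies and is
    assumed here for illustration.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample column with one missing entry.
ages = [20, None, 40, 30]
cleaned = impute_mean(ages)          # missing entry filled with the mean
scaled = min_max_normalize(cleaned)  # values rescaled to the [0, 1] range
print(cleaned, scaled)
```

Imputing before scaling matters in this sketch: `min` and `max` cannot compare `None` with numbers, so the missing entry has to be filled (or dropped) first.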