In the era of big data and artificial intelligence, data preprocessing has emerged as a critical step in the data science pipeline, influencing the quality, performance, and reliability of machine learning models. Despite its importance, the diversity of techniques, challenges, and evolving practices necessitates a structured understanding of this domain. This study conducts a systematic literature review (SLR) to explore current data preprocessing techniques, their domain-specific applications, associated challenges, and emerging trends. A total of 21 peer-reviewed articles published from 2016 to 2024 were analyzed using well-defined inclusion and exclusion criteria, with a focus on machine learning and big data contexts. The results reveal that normalization, data cleaning, feature selection, and dimensionality reduction are the most commonly applied techniques. Key challenges identified include handling missing values, high dimensionality, and imbalanced data. Moreover, recent trends such as automated preprocessing (AutoML), privacy-preserving methods, and scalable preprocessing for distributed systems are gaining momentum. The review concludes that while traditional methods remain foundational, there is a shift toward adaptive and intelligent preprocessing strategies to meet the growing complexity of data environments. This study offers valuable insights for researchers and practitioners aiming to optimize data preparation processes in modern data science workflows.