The rapid growth of big data has increased the demand for efficient and scalable data processing methods, particularly in cloud computing environments. This study evaluates the effectiveness of two distributed computing frameworks, Apache Hadoop and Apache Spark, in optimizing big data processing. A qualitative approach based on a Systematic Literature Review (SLR) is used to analyze existing studies on distributed systems, cloud computing architectures, and performance optimization techniques. The analysis focuses on three key performance indicators (processing speed, resource utilization, and scalability) and on the suitability of each framework for different data processing scenarios. The findings indicate that Apache Hadoop is well suited to batch processing and storage-intensive tasks because of its disk-based architecture, while Apache Spark performs better in near-real-time and iterative processing because of its in-memory computing capabilities. System configuration factors such as cluster size, memory allocation, and network bandwidth are also identified as critical influences on overall performance. The study further highlights emerging trends, including the adoption of hybrid cloud environments, the integration of artificial intelligence and machine learning, and the use of edge computing to improve real-time data processing. In conclusion, distributed computing frameworks play a vital role in improving the efficiency and scalability of big data processing in cloud environments. Selecting a framework that matches the workload, combined with an optimized system configuration, can improve operational performance and support data-driven decision-making.
Copyright © 2024