JOURNAL OF APPLIED INFORMATICS AND COMPUTING
Vol. 8 No. 2 (2024): December 2024

Comparison of Hadoop MapReduce and Apache Spark in Big Data Processing with HGrid247-DE

Utami, Firmania Dwi
Astuti, Femi Dwi



Article Info

Publish Date
12 Nov 2024

Abstract

Managing and analyzing big data has become one of the most significant challenges in today's rapidly evolving information technology landscape. This paper compares two major frameworks for big data processing: Hadoop MapReduce and Apache Spark. Both frameworks were tested in three scenarios (sorting, summarizing, and grouping) using HGrid247-DE as the primary data processing tool. Datasets sourced from Kaggle, ranging in size from 3 MB to 260 MB, were used to evaluate the performance of each framework. The findings show that Apache Spark generally outperforms Hadoop MapReduce in processing speed because it handles data in memory. However, Hadoop MapReduce proved more efficient in specific scenarios, particularly for smaller tasks or when memory resources are limited, largely because Apache Spark incurs overhead when initializing tasks for small jobs. Furthermore, Hadoop MapReduce's reliance on disk I/O makes it more suitable for tasks whose data exceeds available memory. In contrast, Spark excels where fast iterative processing and real-time data analysis are essential. The study offers guidance for practitioners and researchers selecting a framework for specific big data processing requirements, particularly with respect to speed, memory usage, and task complexity.
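
To make the three test scenarios concrete, the following is a minimal pure-Python sketch of how sorting, summarizing, and grouping can each be expressed as a single map, shuffle, and reduce pass. It is an illustration only, not the authors' HGrid247-DE workflow and not Hadoop or Spark code: the helper `map_reduce`, the sample rows in `ROWS`, and the three scenario functions are all hypothetical names introduced here, and everything runs in memory on one machine.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, mapper, reducer):
    """Run one map -> shuffle -> reduce pass in memory (illustrative helper)."""
    # Map: each record yields zero or more (key, value) pairs.
    pairs = chain.from_iterable(mapper(record) for record in records)
    # Shuffle: collect all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: one result per key, visited in sorted key order.
    return [(key, reducer(key, groups[key])) for key in sorted(groups)]


# Hypothetical sample rows of the form (category, amount).
ROWS = [("fruit", 3), ("dairy", 5), ("fruit", 4), ("grain", 1), ("dairy", 2)]


def grouping(rows):
    # Group amounts by category; the reducer keeps the value list as-is.
    return map_reduce(rows, lambda r: [(r[0], r[1])], lambda key, values: values)


def summarizing(rows):
    # Summarize by category; the reducer sums the amounts.
    return map_reduce(rows, lambda r: [(r[0], r[1])], lambda key, values: sum(values))


def sorting(rows):
    # Sort by amount: use the amount as the key so the sorted key order
    # of the reduce phase produces the ordering, then flatten the result.
    by_amount = map_reduce(rows, lambda r: [(r[1], r[0])], lambda key, values: values)
    return [(category, amount) for amount, categories in by_amount for category in categories]


if __name__ == "__main__":
    print(grouping(ROWS))
    print(summarizing(ROWS))
    print(sorting(ROWS))
```

In a disk-based engine the shuffle step would typically write intermediate pairs to storage between phases, whereas an in-memory engine can keep them resident, which is the trade-off the abstract describes. This sketch does not model that difference or measure performance.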

Copyrights © 2024






Journal Info

Abbrev

JAIC

Publisher

Subject

Computer Science & IT

Description

Journal of Applied Informatics and Computing (JAIC), Volume 2, Number 1, July 2018. Contains articles based on research in the field of Applied Informatics and Computer Technology, with e-ISSN: 2548-9828. There are 3 articles that have been substantively reviewed by the editorial team and ...