Journal of Computer Science and Engineering (JCSE)
Vol 4, No 1: February (2023)

A fault-tolerance model for Hadoop rack-aware resource management system

Moses, Timothy (Unknown)
Abiodun, Oladunjoye John (Unknown)



Article Info

Publish Date
05 Apr 2023

Abstract

The central resource manager of Hadoop Yet Another Resource Negotiator (YARN) is a major concern for big data analysis and exploration. This central arbiter is overwhelmed by resource requests from application masters and heartbeat communication from the many node managers in a Hadoop cluster, which degrades the performance of the framework. An earlier attempt to decentralize the resource manager's responsibilities, by introducing a new cluster layer named the Rack Unit Resource Manager (RU_RM) layer, increased cluster performance but introduced a fault-tolerance concern. This work therefore developed a fault-tolerant model to allow efficient and effective data analysis in the Hadoop cluster. A pseudo-distributed computation was set up with the YARN Scheduler Load Simulator (SLS), and a WordCount operation was performed with varying input sizes. Two fault scenarios were presented. The results showed that, as input size (workload) increases, the running time of the developed fault-tolerant model is slightly higher than that of the existing model, but the difference is negligible compared with the computation bottleneck incurred whenever an RU_RM fails. The developed model therefore performs well when a unit (RU_RM) in the cluster fails.
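For readers unfamiliar with the benchmark workload, the following is a minimal single-process sketch of what a WordCount job computes, split into a map step and a reduce step. It is not the authors' experimental setup, which ran under the YARN Scheduler Load Simulator. The function names `map_phase`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import chain


def map_phase(line):
    # Emit a (word, 1) pair for every whitespace-separated token in one line,
    # mirroring what a WordCount mapper produces. Lowercasing is an assumption.
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(pairs):
    # Sum the counts for each distinct word, mirroring a WordCount reducer.
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    # Run the map step over every line, then reduce all emitted pairs together.
    return reduce_phase(chain.from_iterable(map_phase(line) for line in lines))


if __name__ == "__main__":
    sample = ["rack unit resource manager", "resource manager fails"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps run in separate containers on different nodes, so larger inputs mean more containers to request and track. That is why input size serves as the workload variable in the experiment.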

Copyrights © 2023






Journal Info

Abbrev

JCSE

Subject

Computer Science & IT

Description

Computer architecture, processor design, operating systems, high-performance computing, parallel processing, computer networks, embedded systems, theory of computation, design and analysis of algorithms, data structures and database systems, ...