Venkataseshaiah C
Multimedia University

Published: 2 documents
Articles

A Survey of Machine Learning Techniques for Self-tuning Hadoop Performance
Md. Armanur Rahman; J. Hossen; Venkataseshaiah C; CK Ho; Tan Kim Geok; Aziza Sultana; Jesmeen M. Z. H.; Ferdous Hossain
International Journal of Electrical and Computer Engineering (IJECE) Vol 8, No 3: June 2018
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v8i3.pp1854-1862

Abstract

The Apache Hadoop framework is an open-source implementation of MapReduce for processing and storing big data. However, getting the best performance from it is a big challenge because of its large number of configuration parameters. In this paper, the critical issues of the Hadoop system, big data, and machine learning are highlighted, and an analysis of some machine learning techniques applied so far to improve Hadoop performance is presented. Then, a promising machine learning technique based on a deep learning algorithm is proposed for improving Hadoop system performance.

Towards machine learning-based self-tuning of Hadoop-Spark system
Md. Armanur Rahman; Abid Hossen; J. Hossen; Venkataseshaiah C; Thangavel Bhuvaneswari; Aziza Sultana
Indonesian Journal of Electrical Engineering and Computer Science Vol 15, No 2: August 2019
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v15.i2.pp1076-1085

Abstract

Apache Spark is an open-source distributed platform that uses distributed memory to process big data. Spark has more than 180 predominant configuration parameters. Configuration settings directly control the efficiency of Apache Spark when processing big data, yet getting the best outcome is challenging because there are so many parameters. Currently, these predominant parameters are tuned manually by trial and error. To overcome this manual tuning problem, this paper proposes and develops a self-tuning approach using machine learning. The approach can tune parameter values when required. It was implemented on a Dell server, and experiments were conducted with five different dataset sizes and parameter settings. The experimental results of the proposed approach are compared with those of the default Spark configuration. The results demonstrate that execution is sped up by about 33% on average compared with the default configuration.
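The abstract does not specify the model or the search strategy. As a rough illustration of the general idea of ML-driven configuration tuning (not the authors' method), the Python sketch below fits a k-nearest-neighbour regression model to past (configuration, runtime) measurements and proposes the untried configuration with the lowest predicted runtime. The two tuned properties, `spark.executor.memory` and `spark.sql.shuffle.partitions`, are real Spark settings, but their candidate values, the log-scale distance, and the helper functions `knn_predict` and `suggest_config` are hypothetical choices made for this example.

```python
import itertools
import math

# Candidate values for two real Spark properties: spark.executor.memory
# (expressed here in GiB) and spark.sql.shuffle.partitions.
# The ranges are illustrative assumptions, not settings from the paper.
MEMORY_GB = [2, 4, 8, 16]
SHUFFLE_PARTITIONS = [50, 100, 200, 400]


def _log_distance(a, b):
    """Euclidean distance between two configurations on a log2 scale, so that
    doubling memory and doubling partitions count as equally large steps."""
    return math.dist([math.log2(x) for x in a], [math.log2(x) for x in b])


def knn_predict(history, config, k=3):
    """k-nearest-neighbour regression: predict the runtime of `config` as the
    mean measured runtime of the k closest configurations in `history`.

    `history` is a non-empty list of ((memory_gb, shuffle_partitions), runtime_seconds).
    """
    nearest = sorted(history, key=lambda item: _log_distance(item[0], config))[:k]
    return sum(runtime for _, runtime in nearest) / len(nearest)


def suggest_config(history, k=3):
    """Hypothetical helper: return the untried candidate with the lowest
    predicted runtime, or None once every candidate has been measured."""
    tried = {cfg for cfg, _ in history}
    candidates = [
        c for c in itertools.product(MEMORY_GB, SHUFFLE_PARTITIONS) if c not in tried
    ]
    if not candidates:
        return None
    if not history:
        # No measurements yet, so there is nothing to learn from: start anywhere.
        return candidates[0]
    return min(candidates, key=lambda c: knn_predict(history, c, k))
```

In use, `history` would hold pairs of a configuration and its measured runtime from real Spark runs. Each suggested configuration is run, its runtime is appended to `history`, and the loop repeats until a time budget is spent or `suggest_config` returns `None`. A k-nearest-neighbour model is used here only because it needs no external libraries; any regression model could play the same surrogate role.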