Journal of Computers for Society
Vol 5, No 1 (2024): JCS: June 2024

Apache Spark Implementation on Algorithms Boyer-Moore Horspool for Case Studies Internal Transcribed Spacer and Restriction Enzyme

Fidela Zhafirah (Universitas Pendidikan Indonesia)
Topik Hidayat (Universitas Pendidikan Indonesia)
Lala Septem Riza (Universitas Pendidikan Indonesia)



Article Info

Publish Date
07 Jun 2024

Abstract

The rapid growth in data volume is a pressing problem today. Larger datasets require more storage and take longer to process, while fast processing is essential for saving time. This research addresses both the storage and the processing problem by building a computational model for string matching that runs the Boyer-Moore Horspool algorithm on the big data platform Apache Spark, with the Hadoop Distributed File System (HDFS) as data storage on the cluster. The study compares string matching execution time across a stand-alone program, Apache Spark on a single node, and Apache Spark on 3, 5, 11, and 16 nodes, using HDFS storage on clusters on Google Cloud Platform. The case study is in bioinformatics and covers two problems from biology: searching for motifs that distinguish flowering plants from other plant groups, and searching for motifs that indicate begomovirus, the cause of leaf curl disease. The results show no significant difference in execution time, because the data used could still be processed by a conventional program. The accuracy of the program run on Apache Spark is 83.5%.
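
For readers unfamiliar with the algorithm named in the abstract, the following is a minimal Python sketch of Boyer-Moore Horspool matching. It is not the authors' implementation: the function names `build_shift_table` and `horspool_search` are hypothetical, and the sample sequence and the pattern `GAATTC` (the EcoRI recognition site) are illustrative choices, not data from the study.

```python
from typing import Dict, List


def build_shift_table(pattern: str) -> Dict[str, int]:
    """Horspool bad-character table.

    For every character except the last one in the pattern, store the
    distance from its rightmost occurrence to the end of the pattern.
    """
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_search(text: str, pattern: str) -> List[int]:
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shift = build_shift_table(pattern)
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift according to the text character aligned with the
        # pattern's last position; unseen characters skip a full length.
        pos += shift.get(text[pos + m - 1], m)
    return matches


# Illustrative input only (assumption, not from the study).
sequence = "TTGAATTCAAGAATTCG"
print(horspool_search(sequence, "GAATTC"))  # [2, 10]
```

In a Spark setting, a function of this kind could be applied to each sequence record in parallel. The abstract does not say how the authors partitioned or distributed the data, so that step is left out of the sketch.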

Copyrights © 2024






Journal Info

Abbrev

JCS

Publisher

Subject

Computer Science & IT; Engineering; Library & Information Science; Mathematics

Description

The journal invites original articles that have not been simultaneously submitted to another journal or conference. The whole spectrum of computer science is welcome, which includes, but is not limited to - Artificial Intelligence, IoT and Robotics - Data Analysis and Big Data - Multimedia and Design, - ...