IAES International Journal of Artificial Intelligence (IJ-AI)
Vol 10, No 1: March 2021

Massively scalable density based clustering (DBSCAN) on the HPCC systems big data platform

Yatish H. R. (RV College of Engineering)
Shubham Milind Phal (RV College of Engineering)
Tanmay Sanjay Hukkeri (RV College of Engineering)
Lili Xu (HPCC Systems)
Shobha G (RV College of Engineering)
Jyoti Shetty (RV College of Engineering)
Arjuna Chala (HPCC Systems)



Article Info

Publish Date
01 Mar 2021

Abstract

Dealing with large samples of unlabeled data is a key challenge in today’s world, especially in applications such as traffic pattern analysis and disaster management. DBSCAN (density-based spatial clustering of applications with noise) is a well-known density-based clustering algorithm. Its key strengths lie in its ability to detect outliers and handle arbitrarily shaped clusters. However, the algorithm is fundamentally sequential in nature and proves expensive and time-consuming when run on very large datasets. This paper therefore presents a novel implementation of a parallel and distributed DBSCAN algorithm on the HPCC Systems platform. The approach seeks to fully parallelize the implementation by making use of HPCC Systems’ optimal distributed architecture and performing a tree-based union to merge local clusters. The proposed approach was tested on both synthetic and standard datasets (MFCCs Data Set) and found to be completely accurate. Additionally, when compared against a single-node setup, a significant decrease in computation time was observed with no impact on accuracy. The parallelized algorithm performed eight times better for larger numbers of data points and took exponentially less time as the number of data points increased.
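
The idea of merging local clusters through a tree-based union can be illustrated with a small Python sketch. This is not the authors' implementation on HPCC Systems; it is a single-process illustration under stated assumptions: the data is split round-robin into hypothetical partitions, each partition builds its own disjoint-set forest over core points, and the forests are then merged pairwise in a tree. All function names (`local_forest`, `merge_forests`, `tree_merge`, `partitioned_dbscan`) are invented for this example, and a point is assumed to count as its own neighbour when checking `min_pts`.

```python
from math import dist


def find(parent, x):
    """Return the root of x, compressing the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Join the sets containing a and b; the smaller root index wins."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def neighbours(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def local_forest(points, owned, core, eps):
    """Work of one hypothetical node: link its own core points to core neighbours."""
    parent = list(range(len(points)))
    for i in owned:
        if core[i]:
            for j in neighbours(points, i, eps):
                if core[j]:
                    union(parent, i, j)
    return parent


def merge_forests(a, b):
    """Fold forest b into forest a (one internal node of the merge tree)."""
    for i in range(len(b)):
        union(a, i, find(b, i))
    return a


def tree_merge(forests):
    """Merge forests pairwise, roughly halving their number each round."""
    while len(forests) > 1:
        merged = [merge_forests(forests[k], forests[k + 1])
                  for k in range(0, len(forests) - 1, 2)]
        if len(forests) % 2:
            merged.append(forests[-1])  # odd one out waits for the next round
        forests = merged
    return forests[0]


def partitioned_dbscan(points, eps, min_pts, n_parts):
    """Label points by cluster root index; noise points get -1. Assumes n_parts >= 1."""
    n = len(points)
    parts = [range(p, n, n_parts) for p in range(n_parts)]  # assumed round-robin split
    # Phase 1: core flags (each node could compute these for its own points).
    core = [len(neighbours(points, i, eps)) >= min_pts for i in range(n)]
    # Phase 2: one local forest per partition, then the tree-based union.
    parent = tree_merge([local_forest(points, part, core, eps) for part in parts])
    # Phase 3: core points take their root as label; a border point borrows the
    # label of one core neighbour; everything else stays noise.
    labels = [-1] * n
    for i in range(n):
        if core[i]:
            labels[i] = find(parent, i)
        else:
            for j in neighbours(points, i, eps):
                if core[j]:
                    labels[i] = find(parent, j)
                    break
    return labels
```

For example, `partitioned_dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], eps=1.5, min_pts=3, n_parts=3)` returns `[0, 0, 0, 3, 3, 3, -1]`: two clusters labelled by their smallest member index and one noise point. Because only core-to-core links are unioned, the result does not depend on the number of partitions, which is in keeping with the abstract's observation that parallelization had no impact on accuracy.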

Copyrights © 2021






Journal Info

Abbrev

IJAI

Subject

Computer Science & IT Engineering

Description

IAES International Journal of Artificial Intelligence (IJ-AI) publishes articles in the field of artificial intelligence (AI). The scope covers all areas of artificial intelligence and their applications in the following topics: neural networks; fuzzy logic; simulated biological evolution algorithms (like ...