International Journal of Electrical and Computer Engineering
Vol 14, No 4: August 2024

Big data anonymization using Spark for enhanced privacy protection

Graba, Abdelmadjid Guessoum (Unknown)
Toumouh, Adil (Unknown)



Article Info

Publish Date
01 Aug 2024

Abstract

This article introduces an advanced solution for anonymizing large-scale sensitive data, addressing the limitations of traditional approaches when applied to vast datasets. By leveraging the Spark distributed computing framework, we propose a method that parallelizes the data anonymization process, enhancing efficiency and scalability. Utilizing Spark's resilient distributed datasets (RDD), the approach integrates two primary operations, Map_RDD and ReduceByKey_RDD, to execute the anonymization tasks. Our comprehensive experimental evaluation demonstrates our solution's effectiveness and improved performance in preserving data privacy while balancing data utility and confidentiality. A significant contribution of our study is the development of a wide array of solutions for data owners, particularly notable for a 500 MB dataset at an anonymity level of K=100, where our methodology produces 832 unique solutions. This study also opens avenues for future research in applying different privacy models within the Spark ecosystem, such as l-diversity and t-closeness.

Copyrights © 2024






Journal Info

Abbrev

IJECE

Publisher

Subject

Computer Science & IT Electrical & Electronics Engineering

Description

International Journal of Electrical and Computer Engineering (IJECE, ISSN: 2088-8708, a SCOPUS indexed Journal, SNIP: 1.001; SJR: 0.296; CiteScore: 0.99; SJR & CiteScore Q2 on both of the Electrical & Electronics Engineering, and Computer Science) is the official publication of the Institute of ...