Big data anonymization using Spark for enhanced privacy protection
Graba, Abdelmadjid Guessoum; Toumouh, Adil
International Journal of Electrical and Computer Engineering (IJECE) Vol 14, No 4: August 2024
Publisher: Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v14i4.pp4686-4696

Abstract

This article presents a solution for anonymizing large-scale sensitive data that addresses the limitations of traditional approaches on very large datasets. Using the Spark distributed computing framework, we propose a method that parallelizes the anonymization process to improve efficiency and scalability. The approach builds on Spark's resilient distributed datasets (RDDs) and combines two primary operations, Map_RDD and ReduceByKey_RDD, to carry out the anonymization tasks. Our experimental evaluation demonstrates the solution's effectiveness and improved performance in preserving privacy while balancing data utility and confidentiality. A significant contribution of the study is that it offers data owners a wide range of candidate solutions: for a 500 MB dataset at an anonymity level of K=100, the method produces 832 unique solutions. The study also opens avenues for future research on applying other privacy models, such as l-diversity and t-closeness, within the Spark ecosystem.
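
The map and reduce-by-key pattern named in the abstract can be illustrated with a minimal sketch. This is not the authors' Map_RDD or ReduceByKey_RDD implementation, which the abstract does not detail. It is a plain-Python emulation (no Spark required) of one plausible use of the pattern: generalize each record's quasi-identifiers in a map step, count records per generalized group in a reduce-by-key step, then check that every group has at least k members. The record layout, the helper names (`generalize`, `map_step`, `reduce_by_key`, `is_k_anonymous`) and the generalization rules are all assumptions made for illustration.

```python
# Hypothetical records: (age, zip code, sensitive attribute).
records = [
    (23, "13001", "flu"),
    (27, "13002", "asthma"),
    (31, "13010", "flu"),
    (36, "13011", "diabetes"),
]


def generalize(record, age_width, zip_digits):
    """Coarsen the quasi-identifiers (age, zip code) of one record."""
    age, zipcode, _sensitive = record
    low = (age // age_width) * age_width
    age_range = f"{low}-{low + age_width - 1}"
    zip_prefix = zipcode[:zip_digits] + "*" * (len(zipcode) - zip_digits)
    return (age_range, zip_prefix)


def map_step(records, age_width, zip_digits):
    """Map step: emit (generalized key, 1) for every record."""
    return [(generalize(r, age_width, zip_digits), 1) for r in records]


def reduce_by_key(pairs, func):
    """Reduce step: merge the values of each key with a binary function."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def is_k_anonymous(records, k, age_width, zip_digits):
    """True if every generalized group contains at least k records."""
    pairs = map_step(records, age_width, zip_digits)
    counts = reduce_by_key(pairs, lambda a, b: a + b)
    return all(count >= k for count in counts.values())


# Decade-wide age ranges and 3-digit zip prefixes give two groups of two.
print(is_k_anonymous(records, k=2, age_width=10, zip_digits=3))
```

In an actual Spark job the two helper steps would presumably correspond to `rdd.map(...)` and `rdd.reduceByKey(...)`, which distribute the same logic across partitions. Trying several generalization settings and keeping each one that passes the check is one way a method could yield many alternative solutions for a data owner to choose from.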