MUALIFA, RAHMA
Unknown Affiliation



Development of a Data Cleaning System for Consumer Master Data using Sorted Neighborhood and N-Gram Methods
LESTANTO, YUSUF; MUALIFA, RAHMA
ELKOMIKA: Jurnal Teknik Energi Elektrik, Teknik Telekomunikasi, & Teknik Elektronika Vol 13, No 1: Published January 2025
Publisher : Institut Teknologi Nasional, Bandung

DOI: 10.26760/elkomika.v13i1.57

Abstract

This study developed a data cleaning system for master data that uses the Sorted Neighborhood Method (SNM) and N-gram methods to detect and eliminate duplicates and to standardize name and address formats. The proposed SNM algorithm handles precleaning tasks, removes specific characters and titles, and forms tokens for comparison. The N-gram algorithm calculates record similarity using user-defined N-gram values and thresholds. The system's effectiveness was evaluated using recall, precision, and F-measure metrics on small and large datasets. The highest F-measure scores were obtained with a threshold of 0.7, a token length of 5, and an N-gram value of 2. The results confirm that the system was implemented successfully and improved data quality. The identified optimal parameters provide a benchmark for future data-cleaning efforts, potentially streamlining processes and reducing resource use.
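
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: the title list `TITLES` is hypothetical, similarity is computed as a Dice coefficient over sets of character N-grams, "token length" is interpreted as the length of the sort-key prefix, and the sliding-window size `window` is an invented parameter. Only the defaults 0.7, 5, and 2 come from the abstract.

```python
import re

# Hypothetical list of titles to strip; the paper's actual list is not given.
TITLES = {"mr", "mrs", "ms", "dr", "ir", "hj", "h"}


def preclean(text):
    """Lowercase, replace non-alphanumeric characters with spaces, drop titles."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    words = [w for w in text.split() if w not in TITLES]
    return " ".join(words)


def ngrams(text, n=2):
    """Return the set of character n-grams of the text with spaces removed."""
    compact = text.replace(" ", "")
    if len(compact) < n:
        return {compact} if compact else set()
    return {compact[i:i + n] for i in range(len(compact) - n + 1)}


def similarity(a, b, n=2):
    """Dice coefficient over n-gram sets (an assumed similarity measure)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def find_duplicates(records, n=2, threshold=0.7, token_len=5, window=3):
    """Sorted-neighborhood pass: sort by a key prefix, then compare each
    record only with the next (window - 1) records in sorted order."""
    cleaned = [preclean(r) for r in records]
    order = sorted(
        range(len(records)),
        key=lambda i: cleaned[i].replace(" ", "")[:token_len],
    )
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1:pos + window]:
            if similarity(cleaned[i], cleaned[j], n) >= threshold:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)


records = [
    "Dr. Budi Santoso",
    "Budi Santosa",
    "Siti Aminah",
    "Mrs. Siti Aminah",
    "Agus Wijaya",
]
print(find_duplicates(records))  # [(0, 1), (2, 3)]
```

Sorting first and comparing only within a small window is what keeps the number of comparisons far below the all-pairs count, at the cost of missing duplicates whose sort keys differ in their first few characters.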