Garuda - Garba Rujukan Digital

ELKOMIKA: Jurnal Teknik Energi Elektrik, Teknik Telekomunikasi, & Teknik Elektronika

Vol 13, No 1: Published January 2025

LESTANTO, YUSUF
MUALIFA, RAHMA

Publish Date
17 Feb 2025

This study developed a data cleaning system for consumer master data. The system uses the Sorted Neighborhood Method (SNM) and the N-gram method to detect and eliminate duplicate records and to standardize name and address formats. The proposed SNM algorithm handles pre-cleaning: it removes specific characters and titles and forms tokens for comparison. The N-gram algorithm calculates record similarity using user-defined N-gram values and thresholds. Effectiveness was evaluated with recall, precision, and F-measure on small and large datasets. A threshold of 0.7, a token length of 5, and an N-gram value of 2 yielded the highest F-measure scores. The results indicate that the system was implemented successfully and improved data quality. The identified optimal parameters provide a benchmark for future data cleaning efforts, potentially streamlining the process and reducing resource usage.
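The abstract names the pipeline's stages but not their exact rules, so the Python sketch below shows only one plausible way they could fit together. It assumes several things the abstract does not specify. The title list (`TITLES`) is hypothetical. The sort key is taken to be the first `token_len` characters of the cleaned name followed by the first `token_len` characters of the cleaned address. The SNM window size (`window=3`) is an arbitrary choice. Similarity is measured with the Dice coefficient over character n-gram sets. The defaults follow the reported optima (threshold 0.7, token length 5, N-gram value 2). All function names are illustrative, and none of this is the authors' code.

```python
import re

# Hypothetical title/honorific list; the paper's actual list is not given.
TITLES = {"mr", "mrs", "ms", "dr", "ir", "drs", "h", "hj"}

def preclean(text):
    """Lowercase, replace non-alphanumeric characters with spaces, drop title words."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(w for w in text.split() if w not in TITLES)

def sort_key(record, token_len=5):
    """Assumed SNM key: first token_len chars of cleaned name, then of cleaned address."""
    name = preclean(record["name"]).replace(" ", "")
    addr = preclean(record["address"]).replace(" ", "")
    return name[:token_len] + addr[:token_len]

def ngrams(text, n=2):
    """Set of character n-grams, ignoring spaces."""
    text = text.replace(" ", "")
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def ngram_similarity(a, b, n=2):
    """Dice coefficient over n-gram sets (an assumed similarity measure)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

def find_duplicates(records, window=3, token_len=5, n=2, threshold=0.7):
    """Sorted Neighborhood: sort by key, then compare each record
    only with the next window-1 records in sorted order."""
    cleaned = [preclean(r["name"] + " " + r["address"]) for r in records]
    order = sorted(range(len(records)), key=lambda i: sort_key(records[i], token_len))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1:pos + window]:
            if ngram_similarity(cleaned[i], cleaned[j], n) >= threshold:
                pairs.add((min(i, j), max(i, j)))
    return pairs

def precision_recall_f(found, truth):
    """Precision, recall, and F-measure of detected pairs against labeled pairs."""
    tp = len(found & truth)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(truth) if truth else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Illustrative records (made up, not from the paper's datasets).
records = [
    {"name": "Dr. Budi Santoso", "address": "Jl. Merdeka No. 10, Bandung"},
    {"name": "Budi Santoso", "address": "Jalan Merdeka 10 Bandung"},
    {"name": "Siti Rahma", "address": "Jl. Asia Afrika 5, Bandung"},
]
found = find_duplicates(records)
print(found)
print(precision_recall_f(found, {(0, 1)}))
```

With these inputs, the first two records sort next to each other, and their bigram similarity (about 0.84) clears the 0.7 threshold. The third record falls well below it. Sorting keeps the number of comparisons linear in the dataset size for a fixed window, which is the usual reason for pairing SNM with a costlier similarity measure such as N-grams.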


Journal Info

Abbrev

elkomika

Publisher

Institut Teknologi Nasional Bandung

Subject

Electrical & Electronics Engineering; Engineering

Description

Jurnal ELKOMIKA is published three times a year, in January, May, and September. The journal publishes papers based on research results and analytical studies in science and technology, particularly in Electrical Energy Engineering, Telecommunication Engineering, and ... Engineering ...

Development of a Data Cleaning System for Consumer Master Data using Sorted Neighborhood and N-Gram Methods