
Jurnal Pilar Nusa Mandiri

Vol. 21 No. 2 (2025): Pilar Nusa Mandiri : Journal of Computing and Information System Publishing Pe

Marcelly, Frizca Fellicita
Saputra, Irwansyah
Andra, Muhammad Bagus

Publish Date
23 Sep 2025

Information Retrieval (IR) systems are pivotal for efficient data management, particularly in tasks involving name search and entity identification. This study evaluates how text preprocessing techniques, including case folding, phonetic normalization, and gender tagging, affect the performance of classical (TF-IDF, LSI) and CNN-based retrieval models for multilingual name matching. Using a dataset of 365,468 globally diverse names, the study implements a preprocessing pipeline that features Double Metaphone phonetic preprocessing (92% validation accuracy), gender disambiguation for unisex names (92% accuracy), and optimized n-gram tokenization for short names. Evaluation metrics include precision, recall, F1-score, and our novel Name Similarity Score (NSS), which combines orthographic and phonetic preprocessing. Results show that our full pipeline improves recall to 1.00 and F1-score by 37% while reducing false negatives by 63%. Key findings: TF-IDF achieves superior recall (0.98 vs. CNN’s 0.85), LSI handles cultural variants effectively, and CNNs deliver the highest precision (0.91 vs. TF-IDF’s 0.70), particularly for unisex names. This work contributes both a scalable multilingual preprocessing framework and the NSS evaluation metric for robust name retrieval systems.
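To make the classical side of the pipeline concrete, the following is a minimal sketch of case folding, character n-gram tokenization, TF-IDF cosine matching, and the precision/recall/F1 computation. It is an illustration under stated assumptions, not the authors' implementation: the names `char_ngrams`, `NameIndex`, and `precision_recall_f1`, the `#` padding character, the bigram size, the smoothed IDF formula, and the 0.5 threshold are all hypothetical choices. Double Metaphone, gender tagging, LSI, the CNN model, and the NSS metric are not reproduced here.

```python
import math
from collections import Counter


def char_ngrams(name, n=2):
    """Case-fold a name and split it into padded character n-grams."""
    padded = f"#{name.casefold().strip()}#"  # '#' marks name boundaries (assumed convention)
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def _normalize(vec):
    """Scale a sparse vector to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {g: w / norm for g, w in vec.items()} if norm else vec


class NameIndex:
    """Hypothetical TF-IDF index over character n-grams of names."""

    def __init__(self, names, n=2):
        self.names = list(names)
        self.n = n
        docs = [Counter(char_ngrams(name, n)) for name in self.names]
        # Document frequency: iterating a Counter yields each n-gram once per name.
        df = Counter(g for doc in docs for g in doc)
        total = len(docs)
        # Smoothed IDF (an assumed variant); unseen n-grams get the maximum weight.
        self.idf = {g: math.log((1 + total) / (1 + c)) + 1 for g, c in df.items()}
        self.default_idf = math.log(1 + total) + 1
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, counts):
        return _normalize(
            {g: tf * self.idf.get(g, self.default_idf) for g, tf in counts.items()}
        )

    def search(self, query, threshold=0.5):
        """Return names whose cosine similarity to the query meets the threshold."""
        q = self._vectorize(Counter(char_ngrams(query, self.n)))
        scored = [
            (sum(w * vec.get(g, 0.0) for g, w in q.items()), name)
            for name, vec in zip(self.names, self.vectors)
        ]
        return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    index = NameIndex(["Muhammad", "Mohammed", "Irwansyah"])
    hits = index.search("MUHAMMAD")  # case folding makes the query case-insensitive
    print(hits, precision_recall_f1(hits, ["Muhammad", "Mohammed"]))
```

Short names produce few n-grams, which is one plausible reason the abstract singles out n-gram tokenization for them; the boundary padding used in this sketch is a common way to give such names a little more signal.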

Journal Info

Jurnal Pilar Nusa Mandiri

Abbrev

pilar

Publisher

STMIK Nusa Mandiri

Subject

Computer Science & IT

Description

Jurnal Pilar is a scientific journal published by the Information Systems study program of STMIK Nusa Mandiri. The journal publishes scholarly work on the following themes: Software Engineering, Expert Systems, Decision Support Systems, Information System Design, Data Mining, Processing ...


EVALUATING PREPROCESSING EFFECTS IN NAME RETRIEVAL USING CLASSICAL IR AND CNN-BASED MODELS
