Jurnal Pilar Nusa Mandiri
Vol. 21 No. 2 (2025): Pilar Nusa Mandiri : Journal of Computing and Information System Publishing Pe

EVALUATING PREPROCESSING EFFECTS IN NAME RETRIEVAL USING CLASSICAL IR AND CNN-BASED MODELS

Marcelly, Frizca Fellicita
Saputra, Irwansyah
Andra, Muhammad Bagus



Article Info

Publish Date
23 Sep 2025

Abstract

Information Retrieval (IR) systems are pivotal for efficient data management, particularly in tasks involving name searches and entity identification. This study evaluates how text preprocessing techniques, including case folding, phonetic normalization, and gender tagging, affect the performance of classical (TF-IDF, LSI) and CNN-based retrieval models for multilingual name matching. Using a dataset of 365,468 globally diverse names, the study implements a preprocessing pipeline with three components: Double Metaphone phonetic preprocessing (92% validation accuracy), gender disambiguation for unisex names (92% accuracy), and optimized n-gram tokenization for short names. Evaluation metrics include precision, recall, F1-score, and a novel Name Similarity Score (NSS) that combines orthographic and phonetic information. Results show that the full pipeline raises recall to 1.00, improves F1-score by 37%, and reduces false negatives by 63%. Key findings: TF-IDF achieves superior recall (0.98 vs CNN’s 0.85), LSI handles cultural variants effectively, and CNNs deliver the highest precision (0.91 vs TF-IDF’s 0.70), particularly for unisex names. This work contributes both a scalable multilingual preprocessing framework and the NSS evaluation metric for robust name retrieval systems.
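To make the classical baseline concrete, the following is a minimal sketch of case folding, character n-gram tokenization, TF-IDF weighting, and the precision/recall/F1 computation named in the abstract. It is not the authors' implementation: the function names, the `#` boundary padding, the smoothed IDF formula, the bigram size, and the 0.5 similarity threshold are all illustrative assumptions, and the phonetic (Double Metaphone), gender-tagging, and NSS components are omitted because the abstract does not specify them.

```python
import math
from collections import Counter


def char_ngrams(name, n=2):
    """Case-fold a name and split it into boundary-padded character n-grams."""
    folded = f"#{name.casefold().strip()}#"  # '#' padding is an assumption
    return [folded[i:i + n] for i in range(len(folded) - n + 1)]


def normalise(vec):
    """Scale a sparse vector to unit L2 length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {g: w / norm for g, w in vec.items()} if norm else vec


def build_index(names, n=2):
    """Return (idf, vectors): IDF weights and one unit TF-IDF vector per name."""
    docs = [Counter(char_ngrams(name, n)) for name in names]
    df = Counter(gram for doc in docs for gram in doc)  # document frequency
    # Smoothed IDF; one common variant, assumed here for illustration.
    idf = {g: math.log((1 + len(names)) / (1 + c)) + 1.0 for g, c in df.items()}
    vectors = [normalise({g: tf * idf[g] for g, tf in doc.items()}) for doc in docs]
    return idf, vectors


def search(query, names, idf, vectors, n=2, threshold=0.5):
    """Return names whose cosine similarity to the query meets the threshold."""
    counts = Counter(char_ngrams(query, n))
    q = normalise({g: tf * idf.get(g, 0.0) for g, tf in counts.items()})
    scored = [
        (sum(w * vec.get(g, 0.0) for g, w in q.items()), name)
        for name, vec in zip(names, vectors)
    ]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


def precision_recall_f1(retrieved, relevant):
    """Compute set-based precision, recall, and F1 for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


names = ["Maria", "MARIA", "Mariah", "John"]  # toy data, not the study's dataset
idf, vectors = build_index(names)
hits = search("maria", names, idf, vectors)
print(hits, precision_recall_f1(hits, ["Maria", "MARIA", "Mariah"]))
```

Because case folding happens inside `char_ngrams`, "Maria" and "MARIA" map to identical vectors, which illustrates why this step alone can remove a class of false negatives. Lowering `threshold` trades precision for recall, the same trade-off the abstract reports between the TF-IDF and CNN models.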

Copyrights © 2025






Journal Info

Abbrev

pilar

Publisher

Subject

Computer Science & IT

Description

Jurnal Pilar is a scientific journal published by the Information Systems study program of STMIK Nusa Mandiri. The journal publishes scholarly work on the following themes: Software Engineering, Expert Systems, Decision Support Systems, Information System Design, Data Mining, Processing of ...