A Comparative Analysis of Python Text Matching Libraries: A Multilingual Evaluation of Capabilities, Performance and Resource Utilization
Elmobark, Nagwa
International Journal of Environment, Engineering and Education Vol. 7 No. 1 (2025)
Publisher: Three E Science Institute

DOI: 10.55151/ijeedu.v7i1.188

Abstract

Python text-matching libraries have become essential tools in data cleaning and natural language processing; however, researchers have not thoroughly examined their performance, accuracy, and resource efficiency across multilingual scenarios. This study evaluates five major libraries (FuzzyWuzzy, RapidFuzz, Difflib, Levenshtein, and Jellyfish) using a dataset of 50,000 test cases in English, Spanish, French, German, and Italian. We introduce controlled variations in text complexity, error types, and string lengths to measure processing speed, matching accuracy, and resource consumption. The experimental results reveal significant performance differences among the libraries. RapidFuzz processes text 40% faster than the other libraries while maintaining efficient memory usage. However, its performance varies depending on language and error type. Levenshtein achieves higher accuracy when handling non-Latin characters, while FuzzyWuzzy consistently performs well across different text lengths. Difflib, despite its built-in availability, runs slower and consumes more resources. Jellyfish specializes in phonetic matching but struggles with long text inputs. Memory usage ranges from 20 to 200 megabytes for identical workloads, revealing substantial efficiency differences. These findings enable developers to select the most suitable library based on their specific needs and computational constraints. Our study introduces a standardized evaluation framework and a multilingual benchmarking dataset, enabling researchers to compare text-matching methods more effectively. By identifying key performance trade-offs, we provide a practical guide for optimizing text-matching efficiency in real-world applications. This research contributes to the broader field of natural language processing by offering data-driven insights and a structured methodology for evaluating text similarity techniques.
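
To make the measured quantities concrete, the following is a minimal sketch of how processing time and peak memory might be recorded for a single matcher. It is not the study's evaluation framework: it uses only the standard-library difflib, and the helper names (similarity, benchmark) and the sample pairs are hypothetical, chosen purely for illustration.

```python
import difflib
import time
import tracemalloc


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio using the standard-library difflib."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def benchmark(pairs, matcher=similarity):
    """Run matcher over (a, b) pairs; report elapsed time and peak traced memory."""
    tracemalloc.start()
    start = time.perf_counter()
    scores = [matcher(a, b) for a, b in pairs]
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "scores": scores,
        "seconds": elapsed,
        "peak_mib": peak_bytes / 2**20,
    }


# Hypothetical sample pairs, NOT taken from the study's dataset.
SAMPLE_PAIRS = [
    ("information", "informacion"),
    ("straße", "strasse"),
    ("château", "chateau"),
    ("identical", "identical"),
]

if __name__ == "__main__":
    result = benchmark(SAMPLE_PAIRS)
    for (a, b), score in zip(SAMPLE_PAIRS, result["scores"]):
        print(f"{a!r} vs {b!r}: {score:.3f}")
    print(f"time: {result['seconds']:.6f} s")
    print(f"peak memory: {result['peak_mib']:.3f} MiB")
```

Another library's scorer can be passed as the matcher argument to compare it under the same workload. Note that tracemalloc only tracks allocations made through Python's memory allocator, so memory used inside C extensions may be under-reported; a process-level measurement would be needed for figures comparable to those in the abstract.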