Diabetes mellitus is a rapidly progressing non-communicable disease that significantly affects quality of life. Clinical information in electronic medical records, such as prescriptions and laboratory results, is often stored as unstructured text and therefore requires text-mining techniques before it can be classified accurately. This research compares the performance of a Support Vector Machine (SVM) classifier on diabetes mellitus data processed with and without feature extraction using Regular Expressions (Regex). The workflow consists of data preprocessing, feature extraction, TF-IDF weighting, model training, and evaluation using accuracy, precision, recall, and F1-score. Both approaches achieve high accuracy: 98.93% for the non-Regex model and 98.83% for the Regex-based model, a difference of 0.10 percentage points in favor of the non-Regex model. These findings indicate that SVM is effective for classifying text-based clinical data, while Regex-based feature extraction, though potentially beneficial, did not improve accuracy in this setting and requires further optimization to ensure its suitability for various medical text contexts.
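As an illustration of the feature-extraction and evaluation steps named in the workflow, the following is a minimal sketch, not the authors' implementation. The patterns `LAB_PATTERN` and `DOSE_PATTERN`, the token prefixes `lab_` and `drug_`, and both function names are hypothetical; the expressions actually used in the study are not given here. The sketch assumes that Regex extraction turns laboratory and prescription mentions into normalized tokens that could then be passed to TF-IDF weighting, and that the four metrics are computed for a binary label.

```python
import re

# Hypothetical patterns (assumption): the study's real expressions are not stated.
LAB_PATTERN = re.compile(
    r"\b(hba1c|glucose)\s*[:=]?\s*(\d+(?:\.\d+)?)\s*(%|mg/dl)?",
    re.IGNORECASE,
)
# The trailing (?!/) keeps lab units such as "mg/dl" from being read as a dose.
DOSE_PATTERN = re.compile(
    r"\b([a-z]+)\s+(\d+(?:\.\d+)?)\s*(mg|iu)\b(?!/)",
    re.IGNORECASE,
)


def extract_regex_features(text):
    """Return normalized tokens for lab results and drug doses found in text."""
    tokens = []
    for name, _value, _unit in LAB_PATTERN.findall(text):
        tokens.append("lab_" + name.lower())
    for drug, _amount, _unit in DOSE_PATTERN.findall(text):
        tokens.append("drug_" + drug.lower())
    return tokens


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    note = "HbA1c: 8.2 % ; Metformin 500 mg twice daily"
    print(extract_regex_features(note))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a pipeline of this kind, the extracted tokens would typically be appended to the preprocessed text before vectorization, so that the Regex and non-Regex variants differ only in that one step.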
Copyrights © 2026