Akmal Fauzan Ananta
Politeknik Negeri Cilacap

Published: 1 document

Improving Diagnostic Accuracy on Prescription Text Data Using SMOTE-Optimized SVM
Linda Perdana Wanti; Nur Wachid Adi Prasetya; Riyadi Purwanto; Rahmat Mulyadi; Akmal Fauzan Ananta
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 10 No 2 (2026): April - In progress
Publisher: Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v10i2.7441

Abstract

Disease classification based on drug prescription data plays a crucial role in helping healthcare professionals understand patients' health conditions and in supporting clinical decision-making. Prescription data contain a wealth of information about disease indications, but they are generally recorded as unstructured free text. Furthermore, the data are often unevenly distributed across disease classes, with some diseases represented by far fewer records than others. This imbalance can produce classification models that favor the classes with more data and perform poorly on the minority classes. This study aims to improve the performance of disease classification from drug prescription data by combining text mining, the Synthetic Minority Oversampling Technique (SMOTE), and the Support Vector Machine (SVM) algorithm. The research process began with text preprocessing, comprising case folding, tokenization, stopword removal, and stemming, to clean and normalize the prescription text. The text was then converted into numeric features with Term Frequency–Inverse Document Frequency (TF-IDF) weighting so that it could be processed by machine learning algorithms. To address the class imbalance, SMOTE was applied to the training data, generating synthetic samples for the disease classes with few records. A classification model was then built with SVM, an algorithm known to be effective on high-dimensional text data. Model performance was evaluated using accuracy, precision, recall, and F1-score. The results showed that applying SMOTE together with SVM parameter optimization significantly improved classification performance, yielding an accuracy of 92.6%, a precision of 91.8%, a recall of 93.4%, and an F1-score of 92.6%. The increased recall for the diabetes class indicates that the model correctly identifies most diabetes cases from prescription data.
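The preprocessing and TF-IDF steps described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the stopword list (`STOPWORDS`), the suffix-stripping `stem` function, and the example prescription text are hypothetical placeholders, because the abstract does not specify the language of the prescriptions, the stopword list, or the stemmer used. TF-IDF is computed here as length-normalized term frequency times log(N / df), which is one common variant among several.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list (placeholder; the study's list is not given).
STOPWORDS = {"a", "and", "the", "of", "for", "after", "before", "take", "daily"}

# Hypothetical suffixes for a toy stemmer (placeholder for a real,
# language-appropriate stemmer).
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Case folding, tokenization, stopword removal and stemming."""
    folded = text.lower()                               # case folding
    tokens = re.findall(r"[a-z0-9]+", folded)           # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [stem(t) for t in tokens]                    # stemming


def tfidf(docs):
    """Return (vocabulary, vectors) for a list of token lists.

    tf = term count / document length, idf = log(N / df).
    """
    n = len(docs)
    vocab = sorted({t for doc in docs for t in doc})
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([counts[t] / length * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


if __name__ == "__main__":
    docs = [preprocess("Metformin tablets, take 2 daily after meals"),
            preprocess("Amlodipine 5 mg tablet once daily")]
    vocab, X = tfidf(docs)
    print(docs[0])  # ['metformin', 'tablet', '2', 'meal']
```

In this toy example the term `tablet` appears in both documents, so its IDF, and therefore its TF-IDF weight, is zero; terms specific to one prescription, such as drug names, receive the higher weights.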
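SMOTE creates a synthetic sample by taking a minority-class sample, choosing one of its k nearest neighbours from the same class, and placing a new point at a random position on the segment between them. The sketch below is a simplified, standard-library version that picks base samples at random rather than cycling through each one, and it is not the study's implementation. The defaults `k=5` and `seed=0`, and the choice to oversample every class up to the size of the largest class, are assumptions. The helper `oversample_training_set` is meant to be called on the training split only, so that no synthetic points leak into the evaluation data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points for one minority class.

    Each point lies on the segment between a randomly chosen sample of the
    class and one of its k nearest neighbours (Euclidean) in the same class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))  # nearest first
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


def oversample_training_set(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class.

    Call this on the training split only; the test split stays untouched.
    """
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    target = max(len(samples) for samples in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, samples in by_class.items():
        missing = target - len(samples)
        if missing > 0:
            X_out.extend(smote(samples, missing, k=k, seed=seed))
            y_out.extend([label] * missing)
    return X_out, y_out
```

Because every synthetic point is an interpolation between two real samples of the same class, the minority class gains density inside its existing region instead of receiving exact duplicates, which is the usual motivation for SMOTE over plain random oversampling.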
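The abstract does not state which SVM implementation, kernel, or parameter values were used. Purely as an illustration, the sketch below trains a linear SVM with Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss and extends it to several disease classes with a one-vs-rest scheme. The regularization strength `lam`, the number of `epochs`, and the toy two-dimensional data in the usage lines are hypothetical; in the setting described above the inputs would be TF-IDF vectors.

```python
import random


def train_binary_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos-style stochastic sub-gradient descent on the
    L2-regularized hinge loss. y holds labels in {-1, +1}. A constant 1.0
    feature is appended so the bias is learned as an ordinary weight."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer step
            if margin < 1.0:  # hinge loss active: move toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def score(w, x):
    """Decision value w . [x, 1]."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_one_vs_rest(X, labels, **kwargs):
    """Train one binary SVM per class (that class vs. all others)."""
    return {c: train_binary_svm(X, [1 if l == c else -1 for l in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models, x):
    """Pick the class whose binary SVM gives the highest decision value."""
    return max(models, key=lambda c: score(models[c], x))


if __name__ == "__main__":
    X = [[2, 2], [3, 2], [2, 3], [-2, -2], [-3, -2], [-2, -3]]
    labels = ["diabetes"] * 3 + ["other"] * 3
    models = train_one_vs_rest(X, labels, lam=0.01, epochs=200)
    print(predict(models, [4, 4]))  # diabetes
```

Appending a constant feature keeps the update rule identical for every coordinate, at the cost of also regularizing the bias; a production pipeline would more likely use an existing library implementation and tune its parameters by cross-validation.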
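Per-class precision, recall, and F1-score follow from the counts of true positives, false positives, and false negatives for a chosen class, such as diabetes. The function below is a minimal sketch with hypothetical labels; the abstract does not say whether its reported scores are per-class, micro-averaged, or macro-averaged. As a quick consistency check, the harmonic mean of the reported precision (91.8%) and recall (93.4%) is about 92.6%, which matches the reported F1-score.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def class_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F1 for one class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1_score(precision, recall)}


if __name__ == "__main__":
    # Consistency check on the precision and recall reported in the abstract.
    print(round(f1_score(0.918, 0.934), 3))  # 0.926
```

Recall for a class is tp / (tp + fn), so a high recall for diabetes means few actual diabetes cases were predicted as another class.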