Dalhatu, Sirajo Muhammad
Unknown Affiliation


A Model for Enhancing Pattern Recognition in Clinical Narrative Datasets through Text-Based Feature Selection and SHAP Technique
Authors : Dalhatu, Sirajo Muhammad; Azmi Murad, Masrah Azrifah
JOIV : International Journal on Informatics Visualization Vol 8, No 4 (2024)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.8.4.3664

Abstract

Clinical narratives contain patient information that is crucial for predicting cardiac failure. Accurate and timely cardiac failure recognition (CFR) significantly affects patient outcomes, but it is hampered by limited dataset sizes, sparse feature spaces, and underuse of vital-sign data. This study addresses these issues by developing a methodology to improve the accuracy and interpretability of CFR within clinical narratives. Four datasets (the Framingham Heart Study, Heart Disease from Kaggle, Cleveland Heart Disease, and Heart Failure Clinical Records) are preprocessed by handling missing values, removing duplicates, scaling, encoding categorical variables, and transforming unstructured data with natural language processing (NLP). Feature selection methods (Chi-Squared, Forward Selection, and L1 Regularization) are used to identify influential features for CFR, and the SHapley Additive exPlanations (SHAP) technique is integrated to improve interpretability. Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF) models are trained and evaluated using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). Results indicate that L1 Regularization with LR and Chi-Squared with RF perform best for specific datasets. The final model, which combines all datasets with Forward Selection and RF, achieves an accuracy of 91%, precision of 87%, recall of 97%, F1-score of 91%, and AUC-ROC of 94%. The study concludes that advanced text-based feature selection and SHAP interpretability significantly enhance CFR model accuracy and transparency, aiding clinical decision-making. Future research should incorporate more diverse datasets, explore advanced NLP techniques, and validate the models in varied clinical settings to improve robustness and applicability.
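
The abstract names SHAP as the interpretability step but does not say which explainer or settings were used. As an illustration only, the sketch below computes exact Shapley values, the quantity that SHAP estimates, for a single prediction by enumerating every feature coalition. It is not the authors' implementation and does not use the shap library; the function names, the toy scoring function, and the choice to replace "missing" features with a fixed baseline vector are all assumptions made for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Assumption for this sketch: a feature that is "absent" from a coalition
    is replaced by its value in `baseline` (e.g. a dataset mean).
    The cost grows as 2^n, so this only suits a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                without_i = [x[j] if j in members else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(v):
    """Hypothetical risk score with one interaction term (not from the paper)."""
    return 0.1 + 0.5 * v[0] + 0.3 * v[1] + 0.2 * v[0] * v[2]


instance = [1.0, 1.0, 1.0]
reference = [0.0, 0.0, 0.0]
contributions = shapley_values(toy_risk, instance, reference)

# Efficiency property: contributions sum to prediction minus baseline prediction
gap = toy_risk(instance) - toy_risk(reference)
print(contributions, sum(contributions), gap)
```

The per-feature contributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is what allows a single CFR prediction to be decomposed feature by feature. For models with many features, exact enumeration is impractical, which is why approximate explainers are used in practice.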