Journal of Electronics, Electromedical Engineering, and Medical Informatics

Vol 7 No 3 (2025): July

Dhairya Vyas
Milind Shah
Harsh Kantawala
Brijesh Patel
Tejas Patel
Jalaja Enamala

Publish Date
14 Jul 2025

This research presents an AI-driven framework for multi-disease classification from natural language symptom descriptions, optimized through preprocessing techniques oriented toward large language models (LLMs). The proposed system integrates three essential NLP steps (text normalization, lemmatization, and n-gram vectorization) to convert unstructured clinical symptom data into machine-readable form. A publicly available dataset comprising 8,498 samples across ten common diseases, including pneumonia, heart attack, diabetes, stroke, asthma, and depression, was used for training and evaluation. Data balancing and cleaning ensured uniform class representation, with 1,200 samples per disease category. The processed dataset was then used to train supervised machine learning models (SVM, KNN, Decision Tree, Random Forest, and Extra Trees) to identify the most effective classifier. Experiments conducted in Google Colab showed that the ensemble models (Random Forest and Extra Trees) outperformed the others, achieving 99% accuracy, precision, recall, and F1-score, while SVM and Decision Tree followed closely at 98% across all metrics. Notably, the models consistently predicted pneumonia with high confidence for relevant input queries, which supports the framework's robustness. This work demonstrates the efficacy of combining LLM-compatible preprocessing with traditional ML classifiers for accurate disease detection from symptom narratives. The proposed approach serves as a foundational step toward scalable, intelligent healthcare support systems capable of real-time disease prediction and decision-making assistance.
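The preprocessing and classification flow described in the abstract can be illustrated with a small standard-library sketch. It is not the authors' implementation: the lemma table, the three training sentences, and the function names (`normalize`, `ngram_counts`, `cosine`, `predict`) are all hypothetical, a toy lookup stands in for a real lemmatizer, and a 1-nearest-neighbour rule stands in for the KNN model named in the abstract. The ensemble models would normally come from a machine learning library.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical lemma lookup standing in for a real lemmatizer (assumption).
LEMMAS = {"coughing": "cough", "pains": "pain", "breathing": "breathe", "feeling": "feel"}

# Made-up training sentences for illustration only; not taken from the paper's dataset.
TRAINING = [
    ("Coughing with chest pains and fever", "pneumonia"),
    ("Wheezing and trouble breathing at night", "asthma"),
    ("Feeling hopeless and tired for weeks", "depression"),
]


def normalize(text):
    """Lowercase, keep alphabetic tokens only, and map each through the toy lemma table."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens]


def ngram_counts(tokens, n_max=2):
    """Count every word n-gram for n = 1..n_max (unigrams and bigrams by default)."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict(query, training):
    """Return the label of the training sentence most similar to the query (1-NN)."""
    q = ngram_counts(normalize(query))
    scored = [(cosine(q, ngram_counts(normalize(text))), label) for text, label in training]
    return max(scored, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    print(predict("fever and chest pain when coughing", TRAINING))
```

With a real dataset, the same three stages (normalize, vectorize, classify) would typically be handled by an NLP library and a trained classifier rather than a hand-rolled similarity search.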

Journal Info

Journal of Electronics, Electromedical Engineering, and Medical Informatics

Abbrev

jeeemi

Publisher

Politeknik Kesehatan Kemenkes Surabaya

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering; Engineering

Description

The Journal of Electronics, Electromedical Engineering, and Medical Informatics (JEEEMI) is a peer-reviewed, open-access journal. The journal invites scientists and engineers throughout the world to exchange and disseminate theoretical and practice-oriented topics covering three (3) major areas ...

SympTextML: Leveraging Natural Language Symptom Descriptions for Accurate Multi-Disease Prediction
