International Journal of Artificial Intelligence in Medical Issues
Vol. 4 No. 1 (2026): International Journal of Artificial Intelligence in Medical Issues

Leakage-Aware and Explainable Machine Learning for Healthcare Claim Fraud Detection Using Imbalanced Medical Insurance Data

Dian Hafidh Zulfikar (Universitas Islam Negeri Raden Intan Lampung)
Ery Setiyawan Jullev Atmadji (Politeknik Negeri Jember)
Bagus Satrio Wahyu Poetro (Universitas Islam Sultan Agung Semarang)



Article Info

Publish Date
23 May 2026

Abstract

Healthcare insurance fraud is a critical challenge for health systems because fraudulent claims can cause financial losses, increase administrative burden, and reduce trust in healthcare services. This study proposes an explainable machine learning approach for detecting fraudulent healthcare insurance claims in imbalanced medical claim data. The dataset consisted of 10,000 healthcare insurance claim records with 20 attributes, including patient information, provider characteristics, claim-related financial variables, medical codes, temporal features, and fraud labels. Fraudulent claims made up only 8.29% of the dataset, a clear class imbalance. Several machine learning models, including Logistic Regression, Decision Tree, Random Forest, Extra Trees, and AdaBoost, were evaluated under three imbalance-handling strategies: baseline learning, class weighting, and SMOTE. In addition, two feature scenarios were compared: a full-feature scenario and a leakage-aware scenario that excluded variables that may only become available after the claim decision, such as claim status and approved amount. The best full-feature model was Logistic Regression without additional imbalance handling, which achieved an accuracy of 0.9900, precision of 0.9740, recall of 0.9036, F1-score of 0.9375, ROC-AUC of 0.9989, and PR-AUC of 0.9896. This model correctly detected 150 of the 166 fraudulent claims in the test set. However, the best leakage-aware model achieved a lower F1-score of 0.6983, indicating that the potentially leaked variables may substantially inflate model performance. Feature importance analysis showed that claim amount, approved amount, claim submission delay, claim status, and provider-related variables were among the most influential predictors. These findings demonstrate that explainable machine learning can support healthcare claim fraud detection, but careful attention must be given to class imbalance, data leakage, and the operational deployment context.
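The reported full-feature metrics can be checked against one another with a small confusion-matrix calculation. The sketch below is an illustration only, not the authors' code. The abstract states the 150 detected cases and the 166 fraudulent test claims. The false-positive count (4) and the test-set size (2,000, i.e. a 20% hold-out of the 10,000 records) are assumptions inferred here so that the counts agree with the reported precision and accuracy.

```python
# Consistency check of the reported full-feature metrics (illustrative sketch).
TP = 150          # stated: fraudulent claims correctly detected
FN = 166 - TP     # stated: 166 fraudulent claims in the test set
FP = 4            # ASSUMPTION: inferred so that precision matches 0.9740
N_TEST = 2000     # ASSUMPTION: 20% hold-out of the 10,000 records
TN = N_TEST - TP - FN - FP


def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


metrics = classification_metrics(TP, FP, FN, TN)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the four values round to the precision, recall, F1-score, and accuracy reported in the abstract. ROC-AUC and PR-AUC cannot be recovered this way because they depend on the full ranking of predicted scores rather than on a single decision threshold.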

Copyrights © 2026






Journal Info

Abbrev

ijaimi

Subject

Computer Science & IT; Dentistry; Health Professions; Medicine & Pharmacology; Public Health

Description

The International Journal of Artificial Intelligence in Medical Issues (IJAIMI) is a premier, peer-reviewed academic journal dedicated to the integration and advancement of artificial intelligence (AI) in the medical field. The journal aims to serve as a global platform for researchers, clinicians, ...