
An Explainable Credit Card Fraud Detection Model using Machine Learning and Deep Learning Approaches
Alkhozae, Mona; Almasre, Miada; Almakky, Abeer; Alhebshi, Reemah M.; Alamri, Amani; Hakami, Widad; Alshahrani, Lamia
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.962

Abstract

This study proposes an adaptive, interpretable real-time fraud detection and prevention system designed for high-risk financial environments, capable of processing over 1.6 million imbalanced credit card transactions with low latency. The objective is to build a unified framework that integrates predictive accuracy, explainability, and adaptability. The methodology follows four phases: exploratory data analysis to reveal structural and behavioral fraud patterns, feature engineering with domain-informed attributes and ADASYN oversampling to mitigate the 1:174 imbalance, training of multiple models (XGBoost, LightGBM, Random Forest, Gradient Boosting, and MLP), and an ensemble architecture evaluated with SHAP-based explainability. The system introduces three key contributions: stability-aware SHAP caching that reduces explanation latency to 41.2 ms, reinforcement learning–based threshold tuning that dynamically adapts to evolving fraud patterns, and out-of-distribution detection to enhance resilience against data drift. Results demonstrate strong performance, with XGBoost achieving 99.86% accuracy, 96.36% precision, 80.59% recall, an F1-score of 0.878, and a ROC-AUC of 0.9988, outperforming the other models. The full system attained 93.2% accuracy, a 90.2% F1-score, and 96.1% AUC, successfully blocking 91% of fraudulent transactions while maintaining a false positive rate of 7.8%. The novelty lies in combining explainability and adaptivity in a production-ready architecture, where reinforcement learning enables continuous threshold self-regulation and SHAP stability analysis validates interpretability across models. These findings show that high fraud detection accuracy and transparency are not mutually exclusive, offering a scalable blueprint for financial institutions and other critical domains requiring real-time, explainable, and adaptive decision-making.
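
The reported XGBoost F1-score can be checked from the stated precision and recall, and the imbalance ratio shows why accuracy alone says little here. The sketch below is only an arithmetic illustration, not the authors' code; it assumes that "1:174" means one fraudulent transaction per 174 legitimate ones, and the function names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def majority_baseline_accuracy(minority: int, majority: int) -> float:
    """Accuracy of a classifier that always predicts the majority class."""
    return majority / (minority + majority)


# Precision and recall as stated in the abstract for XGBoost.
xgb_f1 = f1_score(0.9636, 0.8059)

# Assumption: 1 fraud per 174 legitimate transactions.
baseline = majority_baseline_accuracy(1, 174)

print(f"F1 from stated precision/recall: {xgb_f1:.3f}")
print(f"Always-legitimate baseline accuracy: {baseline:.4f}")
```

Under that reading of the ratio, a model that never flags fraud would already be right more than 99% of the time, which is why the abstract's recall and F1 figures are the more informative ones.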

Self-consistency and Graph-based Filtering to Enhance Synthetic Arabic SMS Generation for Smishing Detection
Alotaibi, Amal; Almasre, Miada; Surougi, Hadeel; Alkhozae, Mona; Alghanmi, Nouf
Journal of Applied Data Sciences Vol 7, No 1: January 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i1.1033

Abstract

Smishing, or SMS phishing, is a growing cybersecurity threat in mobile security, with Arabic-speaking regions particularly vulnerable due to the absence of large, labeled datasets. The main objective of this study is to develop a scalable pipeline that can generate and classify Arabic SMS messages to overcome the lack of data and enhance detection performance. The contributions are threefold: (i) constructing a balanced dataset of 6,903 messages by combining 903 synthetic samples with 6,000 real Arabic SMS messages; (ii) introducing a hybrid generation framework that integrates a fine-tuned GPT-3.5-turbo language model with Conditional WGAN embeddings, refined using self-consistency sampling and graph-based redundancy filtering; and (iii) evaluating the dataset using multiple machine learning (Logistic Regression, Random Forest, SVM) and deep learning (CNN, BERT) models. The pipeline unifies adversarial embedding generation, large language model fine-tuning, and cosine similarity filtering. Experimental results show consistently strong performance: Logistic Regression and Random Forest both achieved an accuracy of 0.9949 and an F1-score of 0.9950, while SVM outperformed all other models with an accuracy of 0.9957 and an F1-score of 0.9957. Among deep learning models, CNN reached an accuracy of 0.9942 and an F1-score of 0.9942, and BERT achieved 0.9900 across all metrics. These findings confirm that while SVM is most effective for this dataset, CNN and BERT add robustness by capturing semantic subtleties. Visual analyses, including confusion matrices and t-SNE projections, validated the overlap between real and synthetic embeddings, while comparative tables positioned this study within the context of recent Arabic smishing research. The novelty of this work lies in combining self-consistency and graph-based filtering within a hybrid generation-classification pipeline tailored for Arabic SMS, providing a reproducible framework extendable to low-resource, multilingual, and cross-platform environments such as WhatsApp and Telegram.
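
The abstract does not say how the graph-based redundancy filtering is built, so the following is only one plausible reading, sketched under stated assumptions: messages are nodes, an edge joins two messages whose cosine similarity meets a threshold, and one message is kept per connected component. The bag-of-words vectors, the 0.8 threshold, and the names `cosine` and `filter_redundant` are all illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_redundant(messages: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first message of each connected component of near-duplicates."""
    vectors = [Counter(m.lower().split()) for m in messages]
    parent = list(range(len(messages)))  # union-find forest over message indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Always root at the smaller index so the earliest message survives.
                    parent[max(ri, rj)] = min(ri, rj)

    return [m for i, m in enumerate(messages) if find(i) == i]


samples = [
    "your account is locked click here",
    "your account is locked click here now",
    "meeting at noon tomorrow",
]
kept = filter_redundant(samples)
print(kept)
```

A real pipeline would more likely compare sentence embeddings than word counts, and the pairwise loop is quadratic in the number of messages, which is acceptable for a few hundred synthetic samples but not for large corpora.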