Journal : Variance : Journal of Statistics and Its Applications

EVALUATING NEARMISS AND SMOTE FOR VEHICLE INSURANCE FRAUD CLAIM CLASSIFICATION WITH A RANDOM FOREST CLASSIFIER
Authors : Yusuf, Feby Indriana; Handamari, Endang Wahyu
VARIANCE: Journal of Statistics and Its Applications, Vol 7 No 2 (2025)
Publisher : Statistics Study Programme, Department of Mathematics, Faculty of Mathematics and Natural Sciences, University of Pattimura

DOI: 10.30598/variancevol7iss2page219-230

Abstract

This study evaluates the detection of fraudulent car insurance claims in imbalanced data by comparing two resampling techniques, NearMiss (undersampling) and SMOTE (oversampling), each combined with a Random Forest classifier. The public dataset, consisting of 1,000 observations and 40 features, was preprocessed with missing-value handling, label encoding, and min–max normalization, then split into 70% training data and 30% test data. Three scenarios were evaluated: the original (imbalanced) data, NearMiss, and SMOTE, using accuracy, precision, sensitivity (recall), specificity, and F1-score as evaluation metrics. The results show that NearMiss provides the most balanced performance for antifraud purposes, with a sensitivity of 0.865, an F1-score of 0.667, and an accuracy of 0.787. On the original imbalanced data, the model achieved a sensitivity of 0.297 and an accuracy of 0.767. SMOTE achieved the highest precision (0.567) and an accuracy of 0.783, but its sensitivity was lower than that of NearMiss. These findings confirm that the selection of a resampling technique must be aligned with operational objectives: NearMiss is more appropriate when the priority is to capture as many fraud cases as possible, while SMOTE is more suitable when false-positive control is prioritized.
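
The five evaluation metrics named in the abstract all follow from the four cells of a binary confusion matrix. The Python sketch below is a minimal illustration of those standard definitions, not the authors' code. The function names `classification_metrics` and `implied_precision` are hypothetical, and the counts passed in are illustrative values chosen here so that the rounded results match the NearMiss figures quoted above; the abstract does not report the paper's actual confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, sensitivity, specificity and F1
    from confusion-matrix counts (fraud = positive class)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def implied_precision(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, given F1 and recall R."""
    return f1 * recall / (2 * recall - f1)


# ASSUMPTION: hypothetical counts for a 300-row test set (30% of 1,000);
# they are not taken from the paper.
example = classification_metrics(tp=64, fp=54, fn=10, tn=172)
for name, value in example.items():
    print(f"{name}: {value:.3f}")

# Precision implied by the reported NearMiss sensitivity and F1-score.
print(f"implied precision: {implied_precision(0.667, 0.865):.3f}")
```

With these illustrative counts the rounded sensitivity, F1-score and accuracy come out to 0.865, 0.667 and 0.787. The second helper shows that a sensitivity of 0.865 and an F1-score of 0.667 together imply a precision of roughly 0.54, which is the trade-off the abstract describes: NearMiss captures more fraud cases at the cost of more false positives.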