Integration of Stacking Ensemble and Explainable AI for Taxpayer Compliance Risk Profiling
Agung, Heru Pratama; Suharjito, Suharjito
Equivalent: Jurnal Ilmiah Sosial Teknik, Vol. 8 No. 2 (2026)
Publisher: Politeknik Siber Cerdika Internasional

DOI: 10.59261/jequi.v8i2.301

Abstract

Background: Non-compliance among corporate taxpayers is one of the biggest challenges facing tax authorities, particularly because tangible data on corporate tax avoidance is scarce, so tax evasion microdata sets exhibit the extreme class imbalance that is well known in real-world tax data.
Objective: To develop an accurate and transparent tax non-compliance prediction model based on Ensemble Learning, incorporating hybrid resampling methods and Explainable Artificial Intelligence (XAI).
Methods: The dataset, consisting of 49,159 observations, was extracted from the administrative records of the Directorate General of Taxes and has an imbalance ratio of about 18.81:1. First, three hybrid resampling techniques (SMOTE-Tomek, SMOTEENN, Borderline-SMOTE Tomek) were paired with tree-based classifiers (Random Forest, XGBoost, LightGBM) to serve as base learners. These were then combined using two ensemble architectures, a Stacking Classifier and a Voting Classifier, to exploit their respective predictive capabilities. SHAP and LIME were used to open up the black-box nature of the models and interpret their predictive decisions.
Results: The Stacking Classifier achieved the best classification performance, with an accuracy of 97.03% and a minority-class F1-score of 0.7309. The Voting Classifier showed the strongest probabilistic discrimination, with an ROC-AUC of 0.9859. The XAI analysis showed that pure financial ratios play only a secondary role in predicting non-compliance risk; the predictions are instead dominated by absolute financial scale indicators (e.g. Tax Payment Amount, Total Assets) and administrative profile characteristics (e.g. MSME Taxpayer Status, Non-Effective Status).
Conclusion: Ensemble Learning, combined with hybrid resampling and XAI-based interpretability, provides an analytically sound and interpretable early warning system that can support risk-based tax audits.
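
The abstract reports accuracy alongside a minority-class F1-score, and the arithmetic below is a minimal sketch of why both are needed at an imbalance ratio of about 18.81:1. Only the ratio comes from the abstract. The function names and the confusion-matrix counts are hypothetical and chosen purely for illustration; they are not results from the paper.

```python
# Illustrative arithmetic only. The 18.81:1 ratio is taken from the abstract;
# the confusion-matrix counts further down are hypothetical, not the paper's data.

def majority_baseline_accuracy(imbalance_ratio: float) -> float:
    """Accuracy of a trivial classifier that always predicts the majority class."""
    return imbalance_ratio / (imbalance_ratio + 1.0)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all observations that are classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def minority_f1(tp: int, fp: int, fn: int) -> float:
    """F1-score of the minority (positive) class: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# A classifier that never flags anyone already looks accurate at 18.81:1 ...
baseline = majority_baseline_accuracy(18.81)

# ... while its minority-class F1 is zero, because it finds no non-compliant cases.
baseline_f1 = minority_f1(tp=0, fp=0, fn=100)

# Hypothetical confusion matrix at a 19:1 ratio (100 positives, 1900 negatives).
example_accuracy = accuracy(tp=80, fp=30, fn=20, tn=1870)
example_f1 = minority_f1(tp=80, fp=30, fn=20)

print(f"majority-only baseline accuracy: {baseline:.4f}")
print(f"majority-only baseline minority F1: {baseline_f1:.4f}")
print(f"hypothetical model accuracy: {example_accuracy:.4f}")
print(f"hypothetical model minority F1: {example_f1:.4f}")
```

Under these assumptions, a model that always predicts "compliant" would already be about 95% accurate, which is why the minority-class F1-score and ROC-AUC are the more informative figures for this kind of data.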