Garuda - Garba Rujukan Digital

International Journal of Engineering, Science and Information Technology

Vol 5, No 4 (2025)

Barot, Deep
Najeeb Shaik, Kamal Mohammed
Haque Mukit, Mohammad Mushfiqul
Melath, Vinesh
Nair, Rithesh

Publish Date
11 Nov 2025

The growing demand for data-driven decision-making in enterprises creates a tension between the use of machine learning (ML) and data privacy. This paper examines the feasibility of using synthetic data in place of real enterprise data in business intelligence (BI) applications. Synthetic datasets were generated with CTGAN, Variational Autoencoders (VAE), and diffusion models and were evaluated on fraud detection and customer segmentation tasks. Empirical findings indicate that XGBoost trained on synthetic data achieved an accuracy of 97 percent and an ROC AUC of 0.94, which is relatively close to the accuracy achievable with real data. CTGAN showed high fidelity, with Wasserstein distances below 0.15 and Jensen-Shannon divergence below 0.08. Dimensionality-reduction visualisations indicated substantial structural similarity between the real and synthetic data. Privacy analyses yielded Nearest Neighbour Adversarial Distance (NNAD) scores of 0.38 for CTGAN and 0.36 for diffusion models. The corresponding Membership Inference Attack (MIA) success rates were 51-52%, substantially lower than the 68% success rate for the anonymised real data. These findings support the view that synthetic data can preserve analytical value while reducing privacy risk, offering an effective approach to the safe and scalable deployment of ML in businesses.
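The fidelity figures quoted in the abstract (Wasserstein distance and Jensen-Shannon divergence) compare the distribution of a real feature with that of its synthetic counterpart. The following is a minimal sketch of how such scores could be computed for one numeric column, using only the Python standard library. It is not the authors' evaluation code: the function names, the toy Gaussian data, the 20-bin histogram, and the base-2 logarithm are all assumptions made for illustration, and the abstract does not state how features were scaled or binned.

```python
import math
import random


def wasserstein_1d(real, synth):
    """1-D Wasserstein-1 distance for two equal-sized samples.

    For equal sample sizes this equals the mean absolute difference
    between the sorted values (the empirical quantiles).
    """
    if len(real) != len(synth):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(real), sorted(synth))
    return sum(abs(a - b) for a, b in pairs) / len(real)


def histogram(values, lo, hi, bins):
    """Normalised histogram of `values` over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put v == hi in the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the result is in [0, 1]."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy data standing in for one real and one synthetic column.
rng = random.Random(0)
real_col = [rng.gauss(0.0, 1.0) for _ in range(2000)]
synth_col = [rng.gauss(0.05, 1.05) for _ in range(2000)]

lo = min(real_col + synth_col)
hi = max(real_col + synth_col)
w_dist = wasserstein_1d(real_col, synth_col)
js_div = js_divergence(
    histogram(real_col, lo, hi, 20),
    histogram(synth_col, lo, hi, 20),
)
print(f"Wasserstein distance: {w_dist:.3f}")
print(f"Jensen-Shannon divergence: {js_div:.3f}")
```

Lower values on both measures mean the synthetic column tracks the real one more closely. Note that the Wasserstein distance is expressed in the units of the feature, so a threshold such as 0.15 is only meaningful relative to how the features were scaled.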

Journal Info

International Journal of Engineering, Science and Information Technology

Abbrev

ijesty

Publisher

Universitas Malikussaleh

Subject

Astronomy; Biochemistry, Genetics & Molecular Biology; Chemical Engineering, Chemistry & Bioengineering; Chemistry; Civil Engineering, Building, Construction & Architecture; Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Earth & Planetary Sciences; Education; Electrical & Electronics Engineering; Energy; Engineering; Industrial & Manufacturing Engineering; Library & Information Science; Materials Science & Nanotechnology; Mathematics; Mechanical Engineering; Physics; Social Sciences; Transportation

Description

The journal covers all aspects of applied engineering, applied science, and information technology, that is: Engineering: Energy; Mechanical Engineering; Computing and Artificial Intelligence; Applied Biosciences and Bioengineering; Environmental and Sustainable Science and Technology; Quantum Science and ...

Synthetic Data for Business Intelligence: A New Paradigm for Privacy-Preserving Machine Learning in Enterprise Environments