The growing demand for data-driven decision-making in enterprises creates a tension between the use of machine learning (ML) and data privacy. This paper examines the feasibility of replacing real enterprise data with synthetic data in business intelligence (BI) applications. Synthetic datasets were generated with CTGAN, variational autoencoders (VAEs), and diffusion models and evaluated on fraud detection and customer segmentation tasks. Empirically, an XGBoost classifier trained on synthetic data achieved 97% accuracy and an ROC AUC of 0.94, close to the performance obtained when training on real data. CTGAN showed high statistical fidelity, with Wasserstein distances below 0.15 and Jensen-Shannon divergences below 0.08. Dimensionality-reduction visualisations further indicated substantial structural similarity between the real and synthetic data. In the privacy analysis, CTGAN and the diffusion model obtained Nearest Neighbour Adversarial Distance (NNAD) scores of 0.38 and 0.36, respectively, and the corresponding membership inference attack (MIA) success rates were 51–52%, substantially lower than the 68% observed against anonymised real data. These findings indicate that synthetic data can preserve analytical value while reducing privacy risk, offering a practical route to the safe and scalable deployment of ML in business settings.
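The abstract does not state how the fidelity metrics were computed. As an illustration only, the following Python sketch computes a per-column Wasserstein-1 distance for numeric features and a base-2 Jensen-Shannon divergence for categorical features. Scaling numeric columns with the real column's min and max before comparison, and using log base 2, are assumptions made here; the function names are hypothetical and this is not the authors' evaluation code.

```python
from bisect import bisect_right
from collections import Counter
from math import log


def minmax_scale_pair(real, synth):
    """Scale both samples using the real column's min and max (assumed normalisation)."""
    lo, hi = min(real), max(real)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(x - lo) / span for x in real], [(x - lo) / span for x in synth]


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1-D empirical samples.

    Integrates |F_u(x) - F_v(x)| over x, where F_u and F_v are the empirical CDFs.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    points = sorted(u_sorted + v_sorted)
    total = 0.0
    for left, right in zip(points[:-1], points[1:]):
        # Both empirical CDFs are constant on [left, right)
        f_u = bisect_right(u_sorted, left) / len(u_sorted)
        f_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(f_u - f_v) * (right - left)
    return total


def js_divergence(real_col, synth_col, base=2.0):
    """Jensen-Shannon divergence between the category frequencies of two columns.

    With base 2 the result lies in [0, 1]; 0 means identical distributions.
    """
    p_counts, q_counts = Counter(real_col), Counter(synth_col)
    n_p, n_q = len(real_col), len(synth_col)
    divergence = 0.0
    for cat in set(p_counts) | set(q_counts):
        p = p_counts[cat] / n_p
        q = q_counts[cat] / n_q
        m = 0.5 * (p + q)  # mixture distribution M = (P + Q) / 2
        # Terms with zero probability contribute 0 (0 * log 0 is taken as 0)
        if p > 0:
            divergence += 0.5 * p * log(p / m, base)
        if q > 0:
            divergence += 0.5 * q * log(q / m, base)
    return divergence


if __name__ == "__main__":
    # Hypothetical toy columns, not data from the study
    real_amounts, synth_amounts = minmax_scale_pair([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
    print(wasserstein_1d(real_amounts, synth_amounts))
    print(js_divergence(["retail", "online", "online"], ["retail", "online", "retail"]))
```

Per-column values computed this way would then be compared against thresholds such as the 0.15 and 0.08 reported above; whether the study averages over columns or reports a maximum is not stated in the abstract.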
Copyright © 2025