ILKOMNIKA: Journal of Computer Science and Applied Informatics
Vol 6 No 2 (2024): Volume 6, Number 2, August 2024

Integrating Machine Learning Utility in Tabular Data Synthesizer Training using Loss Function Learning

Nur, Muhammad Rizqi
Indraswari, Rarasmaya



Article Info

Publish Date
30 Aug 2024

Abstract

Machine learning (ML) utility has been the main evaluation metric for data synthesizers. However, because ML utility cannot be computed directly, no previous synthesizer has been trained with ML utility as an explicit training objective. This study aims to integrate ML utility into data synthesizer training using a transformer-based model as a learned loss function. The transformer was trained to estimate the ML utility of synthetic datasets and was then integrated into synthesizer training by backpropagating the difference between the estimated and expected values. The integration significantly improved the average ML utility of LCT-GAN and Realtabformer. The ML utility of LCT-GAN improved by 0.0158 for the Contraceptive dataset, 0.031 for the Insurance dataset, and 0.0561 for the Treatment dataset. The ML utility of Realtabformer improved by 0.02 for the Contraceptive dataset and 0.0024 for the Insurance dataset. The increase also affects the dataset distribution, the correlation between features, and privacy, but the direction of the effect varies. Correlation coefficients indicate that the synthetic data distribution gets closer to the real data as ML utility improves. In addition to ML utility integration, this study has also shown that patterns between rows in a dataset can be learned, so better synthesizers can be developed based on them.
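
The idea of a learned loss function described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: a frozen logistic model stands in for the transformer-based estimator, a per-column offset stands in for a synthesizer such as LCT-GAN or Realtabformer, and the gradient is derived by hand rather than by an autodiff library. All names (`synthesize`, `estimate_utility`, `utility_loss_grad`, `train`) and all parameter values are hypothetical.

```python
import math


def synthesize(theta, noise):
    """Toy synthesizer (assumption): shift each noise column by a trainable offset."""
    return [[z + t for z, t in zip(row, theta)] for row in noise]


def estimate_utility(rows, weights, bias):
    """Frozen stand-in for the learned estimator: sigmoid of a linear map of column means."""
    n = len(rows)
    col_means = [sum(r[j] for r in rows) / n for j in range(len(weights))]
    z = sum(w * m for w, m in zip(weights, col_means)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def utility_loss_grad(theta, noise, weights, bias, target):
    """Squared gap between estimated and expected utility, with its gradient w.r.t. theta."""
    u = estimate_utility(synthesize(theta, noise), weights, bias)
    loss = (u - target) ** 2
    # Chain rule: dL/du = 2(u - target), du/dz = u(1 - u), dz/dtheta_j = w_j
    # (shifting column j by theta_j shifts its mean by exactly theta_j).
    grad = [2.0 * (u - target) * u * (1.0 - u) * w for w in weights]
    return loss, grad


def train(theta, noise, weights, bias, target=0.9, lr=1.0, steps=300):
    """Gradient descent on the synthesizer parameters; the estimator is never updated."""
    theta = list(theta)
    for _ in range(steps):
        _, grad = utility_loss_grad(theta, noise, weights, bias, target)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


noise = [[0.1, -0.2], [-0.1, 0.2]]   # fixed latent noise, two rows, two columns
weights, bias = [1.0, -0.5], 0.0     # frozen estimator parameters (made up)
theta0 = [0.0, 0.0]
theta = train(theta0, noise, weights, bias)
before = estimate_utility(synthesize(theta0, noise), weights, bias)
after = estimate_utility(synthesize(theta, noise), weights, bias)
print(f"estimated utility before: {before:.3f}, after: {after:.3f}")
```

In a realistic setting this utility term would presumably be added to the synthesizer's own loss rather than replace it, and the gradient would flow through a neural estimator via automatic differentiation; the sketch only shows the direction of the gradient flow from the estimator back to the synthesizer parameters.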

Copyrights © 2024






Journal Info

Abbrev

ilkomnika

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

ILKOMNIKA: Journal of Computer and Applied Informatics is a peer-reviewed, open-access journal. The journal invites scientists and engineers throughout the world to exchange and disseminate theoretical and practice-oriented topics of computer science and applied informatics which covers five (5) ...