Machine learning models are widely used to predict the mechanical properties of aluminum alloys. However, their accuracy is often limited by the scarcity of high-quality tensile test data, because experimental data collection is costly and time-consuming. To address this limitation, this study uses Generative Adversarial Networks (GANs) to generate synthetic tensile test data for aluminum alloys and thereby improve the accuracy of predictive models. The dataset consists of 200 real samples, each containing the composition of nine chemical elements and two mechanical properties: Yield Strength (YS) and Ultimate Tensile Strength (UTS). A trained GAN model was used to generate 1,000 synthetic samples, whose statistical similarity to the original dataset was validated using the Kolmogorov-Smirnov (KS) test and Pearson correlation analysis. The results confirmed that all synthetic variables retained distributions and correlation patterns similar to those of the original dataset. To evaluate the impact of synthetic data on predictive accuracy, three machine learning algorithms, Random Forest Regressor (RF), Gradient Boosting Regressor (GBR), and AdaBoost Regressor (ABR), were tested under two training schemes: (1) synthetic data for training and real data for testing, and (2) real data for both training and testing. The RF model showed the largest improvement in UTS prediction, with reductions of 38.3% and 46.3% in Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), respectively. The GBR model showed notable improvement in YS prediction, with MAE and RMSE reductions of 22.5% and 28.3%, respectively. These results demonstrate that GAN-generated synthetic data is highly effective in improving machine learning predictions of aluminum alloy properties, particularly when experimental data is limited.
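The similarity check described above can be illustrated with a minimal sketch. It is not the study's code: the helper names `ks_statistic` and `pearson` are hypothetical, the data are randomly generated placeholders rather than the 200 real and 1,000 GAN-generated samples, and only the KS statistic is computed (a library routine such as `scipy.stats.ks_2samp` would also return a p-value).

```python
import bisect
import math
import random
import statistics


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


def toy_samples(n, rng):
    """Placeholder (YS, UTS) pairs in MPa; illustrative values, not study data."""
    ys = [rng.gauss(250.0, 40.0) for _ in range(n)]
    uts = [v + rng.gauss(50.0, 15.0) for v in ys]
    return ys, uts


rng = random.Random(0)
real_ys, real_uts = toy_samples(200, rng)      # stands in for the real samples
synth_ys, synth_uts = toy_samples(1000, rng)   # stands in for the GAN output

# Per-variable distribution check: a small statistic means similar distributions.
ks_ys = ks_statistic(real_ys, synth_ys)
ks_uts = ks_statistic(real_uts, synth_uts)

# Correlation-pattern check: compare the YS-UTS correlation in both datasets.
corr_gap = abs(pearson(real_ys, real_uts) - pearson(synth_ys, synth_uts))

print(f"KS(YS)={ks_ys:.3f}  KS(UTS)={ks_uts:.3f}  |delta r|={corr_gap:.3f}")
```

In a full check, the same comparison would be repeated for each of the nine composition variables and for every pair of variables in the correlation matrix.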
Copyrights © 2025