This test compares the performance of a machine learning (ML) model trained on real data with that of an ML model trained on synthetic data. The test data (drawn from the real dataset), the model type, and the model parameters are kept the same for both models so that the comparison is fair.

Utility Metrics
We can analyze the models trained on real and synthetic data separately by comparing their performance metrics. If the metrics are similar, or differ only slightly, between the two models, this suggests that the synthetic data preserves utility and can be a viable substitute for the real data. Conversely, a significant drop in performance for the model trained on synthetic data indicates a loss of utility or a discrepancy between the datasets.

| Metric | ML performance on real data (AdaBoost Classifier) | ML performance on synthetic data (AdaBoost Classifier) | % performance loss in synthetic data |
|---|---|---|---|
| Accuracy | 0.817 | 0.817 | 0 % |
| Precision Score | 0.760 | 0.744 | 3 % |
| Recall Score | 0.817 | 0.817 | 0 % |
| ROC AUC Score | 0.743 | 0.733 | 1.3 % |
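The comparison described here can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the numbers in the table: the data is generated with `make_classification`, the "synthetic" set is a noisy copy of the real training rows standing in for the output of a real generator, and the helper names (`evaluate`, `percent_loss`, `compare_utility`, `make_model`) are hypothetical. Precision and recall use weighted averaging, which is an assumption; under that choice recall coincides with accuracy. The percentage loss is assumed to be the relative drop `(real - synthetic) / real`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Score a fitted binary classifier on the held-out real test set."""
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1
    return {
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision Score": precision_score(
            y_test, y_pred, average="weighted", zero_division=0),
        "Recall Score": recall_score(y_test, y_pred, average="weighted"),
        "ROC AUC Score": roc_auc_score(y_test, y_prob),
    }


def percent_loss(real_score, synth_score):
    """Relative drop of the synthetic-trained score, in percent.

    Assumed definition; it requires real_score to be non-zero.
    """
    return 100.0 * (real_score - synth_score) / real_score


def compare_utility(X_real, y_real, X_synth, y_synth, X_test, y_test,
                    make_model):
    """Train identical models on real and synthetic data, test both on real data."""
    real_model = make_model().fit(X_real, y_real)
    synth_model = make_model().fit(X_synth, y_synth)
    real_scores = evaluate(real_model, X_test, y_test)
    synth_scores = evaluate(synth_model, X_test, y_test)
    return {
        name: (real_scores[name], synth_scores[name],
               percent_loss(real_scores[name], synth_scores[name]))
        for name in real_scores
    }


# Stand-in "real" data; the test split is taken from the real data only.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Placeholder for generator output: real training rows plus Gaussian noise.
rng = np.random.default_rng(0)
X_synth = X_train + rng.normal(scale=0.3, size=X_train.shape)
y_synth = y_train.copy()


def make_model():
    # Same model type and parameters for both training runs.
    return AdaBoostClassifier(n_estimators=50, random_state=0)


results = compare_utility(X_train, y_train, X_synth, y_synth,
                          X_test, y_test, make_model)
for name, (real, synth, loss) in results.items():
    print(f"{name:16s} real={real:.3f} synthetic={synth:.3f} loss={loss:.1f} %")
```

Because the model is supplied as a factory, the same call can be repeated with other estimators (for example `lambda: RandomForestClassifier(random_state=0)` from `sklearn.ensemble`) to compare several model types on the same split.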
The table above compares models trained on real and synthetic data using an AdaBoost classifier. For a more comprehensive evaluation, the same comparison can be repeated with multiple machine learning models.

Modified at 2023-08-29 05:43:17