This test compares the performance of a machine learning (ML) model trained on real data with that of the same model trained on synthetic data. The test data (drawn from the real dataset), the model type, and the model parameters are held constant for both models so that the comparison is fair.

Utility Metrics

How to interpret machine learning model performance

We train one model on the real data and one on the synthetic data, then compare their performance metrics on the same real test set. If the metrics are similar, or differ only slightly, the synthetic data preserves utility and can be a viable substitute for real data. Conversely, a significant drop in performance for the model trained on synthetic data indicates a loss of utility or a discrepancy between the two datasets.
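The comparison described here can be sketched with scikit-learn. This is a minimal illustration, not the exact procedure behind the reported numbers: the dataset comes from `make_classification`, the "synthetic" training set is just the real training rows with Gaussian noise added (a placeholder for the output of an actual generator), and the weighted averaging for precision and recall is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Stand-in "real" data; in practice this is the original dataset.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train_real, X_test, y_train_real, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Placeholder "synthetic" training data (assumption): real rows plus noise.
# Replace with the output of the synthetic data generator being evaluated.
rng = np.random.default_rng(0)
X_train_synth = X_train_real + rng.normal(scale=0.5, size=X_train_real.shape)
y_train_synth = y_train_real


def evaluate(X_train, y_train):
    """Fit a model with fixed settings and score it on the shared real test set."""
    model = AdaBoostClassifier(n_estimators=50, random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="weighted"),
        "recall": recall_score(y_test, y_pred, average="weighted"),
        "roc_auc": roc_auc_score(y_test, y_score),
    }


real_scores = evaluate(X_train_real, y_train_real)
synth_scores = evaluate(X_train_synth, y_train_synth)
for name in real_scores:
    print(f"{name}: real={real_scores[name]:.3f} synthetic={synth_scores[name]:.3f}")
```

Only the training data differs between the two calls to `evaluate`; the test split, model class, and hyperparameters are shared, which is what keeps the comparison fair.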

ML performance comparison of real and synthetic data.

	ML performance on real data (AdaBoost Classifier)	ML performance on synthetic data (AdaBoost Classifier)	% performance loss in synthetic data
Accuracy	0.817	0.817	0 %
Precision Score	0.760	0.744	2.1 %
Recall Score	0.817	0.817	0 %
ROC AUC Score	0.743	0.733	1.3 %
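The last column of the table can be derived from the two score columns. The sketch below assumes the loss is the relative drop of the synthetic-data score with respect to the real-data score, expressed in percent; the source does not state the formula, and the function name `performance_loss_pct` is hypothetical.

```python
def performance_loss_pct(real_score, synthetic_score):
    """Relative drop of the synthetic-data score versus the real-data score, in percent."""
    if real_score == 0:
        raise ValueError("real_score must be non-zero")
    return (real_score - synthetic_score) / real_score * 100.0


# Scores copied from the AdaBoost table above: (real, synthetic).
table = {
    "Accuracy": (0.817, 0.817),
    "Precision Score": (0.760, 0.744),
    "Recall Score": (0.817, 0.817),
    "ROC AUC Score": (0.743, 0.733),
}

losses = {metric: performance_loss_pct(real, synth)
          for metric, (real, synth) in table.items()}
for metric, loss in losses.items():
    print(f"{metric}: {loss:.1f} % loss")
```

A negative value would mean the model trained on synthetic data scored higher than the one trained on real data for that metric.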

The table above compares real and synthetic data using an AdaBoost classifier. For a more comprehensive evaluation, the same comparison can be repeated with several machine learning models.
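Extending the comparison to several models might look like the following sketch. The choice of models, the ROC AUC metric, and the noise-perturbed "synthetic" training set are all illustrative assumptions; the synthetic set stands in for the output of a real generator.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in "real" data and a shared real test set.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train_real, X_test, y_train_real, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Placeholder "synthetic" training data (assumption): real rows plus noise.
rng = np.random.default_rng(0)
X_train_synth = X_train_real + rng.normal(scale=0.5, size=X_train_real.shape)
y_train_synth = y_train_real

training_sets = {
    "real": (X_train_real, y_train_real),
    "synthetic": (X_train_synth, y_train_synth),
}

# Factories so each run starts from a fresh, identically configured model.
model_factories = {
    "AdaBoost": lambda: AdaBoostClassifier(random_state=0),
    "RandomForest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "LogisticRegression": lambda: LogisticRegression(max_iter=1000),
}

results = {}
for model_name, make_model in model_factories.items():
    auc = {}
    for data_name, (X_train, y_train) in training_sets.items():
        model = make_model()
        model.fit(X_train, y_train)
        auc[data_name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    results[model_name] = auc
    print(f"{model_name}: real={auc['real']:.3f} synthetic={auc['synthetic']:.3f}")
```

If the gap between real and synthetic scores is small across model families, the conclusion about utility does not hinge on one particular learner.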
