Betterdata Docs

Utility Metrics

This test compares the performance of a machine learning (ML) model trained on real data with that of an ML model trained on synthetic data. To keep the comparison fair, the test data (drawn from the real dataset), the model type, and the model parameters are held constant for both models.
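The setup described above can be sketched with scikit-learn as follows. This is a minimal illustration, not Betterdata's implementation: the dataset comes from `make_classification`, the "synthetic" training set is simply the real training rows with Gaussian noise added (a stand-in for the output of an actual synthetic data generator), and the helper name `fit_and_score` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in "real" dataset (assumption: in practice this is your real table).
X_real, y_real = make_classification(n_samples=600, n_features=8, random_state=0)

# Hold out a test set from the REAL data; both models are scored on it.
X_train, X_test, y_train, y_test = train_test_split(
    X_real, y_real, test_size=0.25, random_state=0, stratify=y_real
)

# Placeholder "synthetic" training set: real training rows plus Gaussian noise.
# Assumption for illustration only; replace with data from a real generator.
rng = np.random.default_rng(0)
X_synth = X_train + rng.normal(scale=0.1, size=X_train.shape)
y_synth = y_train.copy()


def fit_and_score(X_fit, y_fit):
    """Train a fresh model with fixed parameters and score it on the real test set."""
    model = AdaBoostClassifier(n_estimators=50, random_state=0)
    model.fit(X_fit, y_fit)
    return accuracy_score(y_test, model.predict(X_test))


acc_real = fit_and_score(X_train, y_train)
acc_synth = fit_and_score(X_synth, y_synth)
print(f"accuracy (trained on real):      {acc_real:.3f}")
print(f"accuracy (trained on synthetic): {acc_synth:.3f}")
```

Because the model type, its parameters, and the test set are identical in both calls, any difference between the two scores comes from the training data alone.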

How to interpret machine learning model performance

We can evaluate the models trained on real and on synthetic data separately and then compare their performance metrics. If the metrics are similar, or differ only slightly, the synthetic data preserves utility and can be a viable substitute for the real data. Conversely, a significant drop in performance for the model trained on synthetic data indicates a loss of utility or a discrepancy between the two datasets.
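One way to quantify the difference is a relative performance loss: the drop in the synthetic-data score expressed as a percentage of the real-data score. The page does not state the exact formula behind its "% performance loss" figures, so the definition below is an assumption, and the function name `performance_loss_pct` is hypothetical. The example uses the ROC AUC scores reported in the comparison table on this page.

```python
def performance_loss_pct(real_score: float, synthetic_score: float) -> float:
    """Relative drop of the synthetic-data score, as a percentage of the real-data score.

    Assumed definition: (real - synthetic) / real * 100.
    """
    if real_score == 0:
        raise ValueError("real_score must be non-zero")
    return (real_score - synthetic_score) / real_score * 100.0


# ROC AUC scores from the comparison table: 0.743 (real) vs 0.733 (synthetic).
loss = performance_loss_pct(0.743, 0.733)
print(f"{loss:.1f} %")  # prints "1.3 %"
```

A value near zero means the two models perform about equally. A negative value would mean the model trained on synthetic data scored higher on this metric.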

ML performance comparison of real and synthetic data.

| Metric | ML performance on real data (AdaBoost Classifier) | ML performance on synthetic data (AdaBoost Classifier) | % performance loss in synthetic data |
|---|---|---|---|
| Accuracy | 0.817 | 0.817 | 0 % |
| Precision Score | 0.760 | 0.744 | 3 % |
| Recall Score | 0.817 | 0.817 | 0 % |
| ROC AUC Score | 0.743 | 0.733 | 1.3 % |
The table above compares real and synthetic data using an AdaBoost Classifier. For a more comprehensive evaluation, the comparison can be repeated with multiple machine learning models.
Modified at 2023-08-29 05:43:17