Betterdata Docs

Quickstart

Sign Up#

Go to https://app.betterdata.ai/betterdata/signin and sign up for an account with your email and password.
You may email contact@betterdata.ai to have your trial account approved.

Create a new project#

Go to the "projects" tab to create a new project.

Add data#

In the new project, add your CSV dataset.
Select the CSV file from your computer.
The dataset should now appear in the project.

Create a new synthetic data model#

Choose model#

Next, create a synthetic data generation model from your uploaded dataset.
Click the "Create new model" button.
Select a model type. Choose GAN for a simple dataset or LLM2 for a complex dataset. Do not use LLM1 or LLM3 for now, as they are still experimental.
A simple dataset has fewer than 10k rows and fewer than 30 columns, with numerical and categorical features only.
A complex dataset has fewer than 500k rows and fewer than 200 columns, with numerical and categorical features and special characters (e.g. !(#$%&*()).
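The size guidance above can be checked before uploading. The following is a minimal sketch, not part of the Betterdata product: the function names `profile_csv` and `suggest_model_type` are hypothetical, and the set of "special characters" is assumed from the example in the text, since the platform's exact definition is not given here.

```python
import csv
import io

# Assumption: characters copied from the example in the text. The platform's
# real definition of "special characters" may differ.
SPECIAL_CHARS = set("!#$%&*()")


def profile_csv(csv_text):
    """Return (row_count, column_count, has_special_chars) for CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first line is treated as the column header
    n_rows = 0
    has_special = False
    for row in reader:
        if not row:  # skip blank lines
            continue
        n_rows += 1
        if not has_special and any(SPECIAL_CHARS & set(cell) for cell in row):
            has_special = True
    return n_rows, len(header), has_special


def suggest_model_type(n_rows, n_cols, has_special):
    """Map the thresholds in the text to a model type, or None if out of range."""
    if n_rows < 10_000 and n_cols < 30 and not has_special:
        return "GAN"   # simple dataset
    if n_rows < 500_000 and n_cols < 200:
        return "LLM2"  # complex dataset
    return None        # larger than the documented guidance covers


sample = "age,city\n31,Paris\n45,Oslo\n"
print(suggest_model_type(*profile_csv(sample)))
```

To profile a file on disk, read it first, for example with `open("my_dataset.csv", newline="").read()`, where the file name is a placeholder.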

Select dataset#

Select your dataset and preview it.

Set data understander params#

Step 4A sets the parameters for the automatic data understander, which learns the format and structure of your dataset, such as data types, missing values and the categorical cardinality threshold.
For this run, you may click "Learn Data Parameters" to use the default parameters. To see how a parameter affects what the data understander learns, read its information icon "i". This process may take 1-10 minutes, depending on the size of your dataset.
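To give a feel for what "learning the format and structure" of a dataset can involve, here is an illustrative sketch. It is not Betterdata's implementation: the function `infer_column_metadata`, its `cardinality_threshold` parameter, the default of 10 and the type labels are all assumptions made for this example.

```python
import csv
import io


def _is_number(value):
    """True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_column_metadata(csv_text, cardinality_threshold=10):
    """Infer a rough type, missing count and unique count per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            columns[name].append(row[name] or "")  # short rows yield None

    metadata = {}
    for name, values in columns.items():
        present = [v for v in values if v.strip() != ""]
        unique = len(set(present))
        if present and all(_is_number(v) for v in present):
            kind = "numerical"
        elif unique <= cardinality_threshold:
            kind = "categorical"  # few distinct values
        else:
            kind = "text"         # too many distinct values to treat as categories
        metadata[name] = {
            "type": kind,
            "missing": len(values) - len(present),
            "unique": unique,
        }
    return metadata


sample = "age,city,note\n31,Paris,hello\n,Oslo,world\n45,Paris,again\n"
for column, info in infer_column_metadata(sample, cardinality_threshold=2).items():
    print(column, info)
```

In this sketch, raising the cardinality threshold makes more columns count as categorical, which is the kind of trade-off the parameter tooltips describe.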

Verify dataset metadata#

Step 4B shows the format and structure learned from your dataset. Review each column and edit it where needed to match the business logic of your dataset.
Validate the 4B parameters to ensure that your changes meet the required format.

Set model params and train#

Step 5 is where you may edit the model parameters used to train the synthetic data model.
For a faster run, set a lower number of epochs.
For the GAN model, we recommend 200-500 epochs.
For the LLM2 model, we recommend 10 epochs for a simple dataset and 3 epochs for a complex dataset.
A simple dataset has fewer than 10k rows and fewer than 30 columns, with numerical and categorical features only.
A complex dataset has fewer than 500k rows and fewer than 200 columns, with numerical and categorical features and special characters (e.g. !(#$%&*()).
Validate the training parameters and start training. We recommend using a GPU if you have one, as training will be 10-50 times faster, depending on the speed of the GPU.
LLM models require a GPU, while GAN can be run on either CPU or GPU (recommended).
Once training has started successfully, you can follow its progress in the logs on the same page.
You may also stop the training to edit parameters if needed. Doing so restarts the training process.
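The epoch and hardware recommendations in this section can be summarised as a small lookup. This is only a sketch of the guidance as written: `training_plan` and its return format are hypothetical and do not correspond to any Betterdata API.

```python
def training_plan(model_type, complexity, gpu_available):
    """Return the suggested epoch range and device for a training run.

    model_type: "GAN" or "LLM2"; complexity: "simple" or "complex".
    The epoch range is (minimum, maximum) as recommended in the text.
    """
    if complexity not in ("simple", "complex"):
        raise ValueError("complexity must be 'simple' or 'complex'")

    if model_type == "GAN":
        epochs = (200, 500)
        # GAN runs on either device; GPU is recommended when available.
        device = "gpu" if gpu_available else "cpu"
    elif model_type == "LLM2":
        if not gpu_available:
            raise ValueError("LLM models require a GPU")
        epochs = (10, 10) if complexity == "simple" else (3, 3)
        device = "gpu"
    else:
        raise ValueError("unsupported model type: " + repr(model_type))

    return {"epochs": epochs, "device": device}


print(training_plan("GAN", "simple", gpu_available=False))
print(training_plan("LLM2", "complex", gpu_available=True))
```

For a faster trial run, a value below the returned range can be entered in Step 5, at the cost of a less thoroughly trained model.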

Download synthetic data & report#

Once training is complete, you will be directed to a generation page.
There you can download the synthetic data and the quality assurance report.
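After downloading, a quick local check can confirm that the synthetic file has the same columns as the original. The sketch below assumes the synthetic data is downloaded as a CSV with a header row, which the text does not state explicitly; `compare_to_original` is a hypothetical helper, and it does not replace the quality assurance report.

```python
import csv
import io


def read_header_and_count(csv_text):
    """Return (header, number_of_data_rows) for CSV text, ignoring blank lines."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_rows = sum(1 for row in reader if row)
    return header, n_rows


def compare_to_original(original_text, synthetic_text):
    """Compare column names and row counts of the original and synthetic CSVs."""
    orig_header, orig_rows = read_header_and_count(original_text)
    synth_header, synth_rows = read_header_and_count(synthetic_text)
    return {
        "missing_columns": [c for c in orig_header if c not in synth_header],
        "extra_columns": [c for c in synth_header if c not in orig_header],
        "same_column_order": orig_header == synth_header,
        "original_rows": orig_rows,
        "synthetic_rows": synth_rows,
    }


original = "age,city\n31,Paris\n45,Oslo\n"
synthetic = "age,city\n29,Oslo\n50,Paris\n38,Paris\n"
print(compare_to_original(original, synthetic))
```

A differing row count is expected, since the number of generated rows is chosen by the user; missing or extra columns are worth investigating.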

Generate more data#

You can also generate more synthetic data by selecting the number of new rows to generate.
Modified at 2024-06-19 05:31:51