Sign Up#
Create a new project#
Go to the "projects" tab to create a new project.
Add data#
In the new project, add your CSV dataset.
Select your CSV dataset from your computer.
You should see your dataset in that project.
Create a new synthetic data model#
Choose model#
Now, we will create a synthetic data generation model using your uploaded dataset.
Click the "Create new model" button.
Select a model type. Choose GAN for a simple dataset or LLM2 for a complex dataset. Do not use LLM1 or LLM3 for now, as they are in an experimental stage.
A simple dataset has fewer than 10k rows and fewer than 30 columns, with numerical and categorical features only.
A complex dataset has fewer than 500k rows and fewer than 200 columns, with numerical, categorical and special-character features (e.g. !(#$%&*()).
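
The selection rule above can be sketched as a small pre-check to run on your CSV before you pick a model type. This is only an illustration of the row and column thresholds stated in this guide. The function names `csv_shape` and `recommend_model_type` are hypothetical and not part of the product, and the presence of special characters is passed in as a flag rather than detected automatically.

```python
import csv
from typing import Optional, TextIO, Tuple


def csv_shape(stream: TextIO) -> Tuple[int, int]:
    """Return (data_rows, columns) for CSV text whose first row is a header."""
    reader = csv.reader(stream)
    header = next(reader, [])          # empty input -> zero columns
    n_rows = sum(1 for _ in reader)    # count the remaining (data) rows
    return n_rows, len(header)


def recommend_model_type(
    n_rows: int, n_cols: int, has_special_chars: bool = False
) -> Optional[str]:
    """Map dataset size to the model type suggested in this guide.

    Returns "GAN" for a simple dataset, "LLM2" for a complex one, and
    None when the dataset exceeds the limits the guide describes.
    """
    # Simple: fewer than 10k rows, fewer than 30 columns, no special characters.
    if n_rows < 10_000 and n_cols < 30 and not has_special_chars:
        return "GAN"
    # Complex: fewer than 500k rows and fewer than 200 columns.
    if n_rows < 500_000 and n_cols < 200:
        return "LLM2"
    # Larger datasets are not covered by the guide's recommendations.
    return None
```

With a local file (the name `my_dataset.csv` is a placeholder), you could open it with `open("my_dataset.csv", newline="")`, pass the file object to `csv_shape`, and feed the resulting counts to `recommend_model_type`.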
Select dataset#
Select your dataset and preview it.
Set data understander params#
Step 4A is to set parameters for the automatic data understander, which learns the format and structure of your dataset, such as data types, missing values, the categorical cardinality threshold and so on.
For this run, you may click "Learn Data Parameters" to use the default parameters. To learn how a parameter affects what the data understander learns about your dataset, read its information icon ("i"). This process may take 1-10 minutes depending on the size of your dataset.
Step 4B shows the format and structure of your dataset. You may now review and edit each column based on the business logic of your dataset.
Validate the 4B parameters to ensure that your changes fulfil the required format.
Set model params and train#
Step 5 is where you may edit the model parameters used for training the synthetic data model.
For a faster run, you may set a lower number of epochs for the model.
For the GAN model, we recommend 200-500 epochs.
For the LLM2 model, we recommend 10 epochs for a simple dataset and 3 epochs for a complex dataset.
A simple dataset has fewer than 10k rows and fewer than 30 columns, with numerical and categorical features only.
A complex dataset has fewer than 500k rows and fewer than 200 columns, with numerical, categorical and special-character features (e.g. !(#$%&*()).
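
The epoch recommendations above can be written down as a small lookup, which may be handy if you keep your training settings in a script or notebook. This is a sketch only: `recommended_epochs` is a hypothetical helper, not a product API, and it treats the 10-epoch LLM2 recommendation as applying to a simple dataset as defined above.

```python
from typing import Tuple

# Recommendations copied from this guide, stored as inclusive (min, max) ranges.
# The GAN range is the same for both dataset kinds because the guide gives
# a single range for GAN.
_RECOMMENDED_EPOCHS = {
    ("GAN", "simple"): (200, 500),
    ("GAN", "complex"): (200, 500),
    ("LLM2", "simple"): (10, 10),
    ("LLM2", "complex"): (3, 3),
}


def recommended_epochs(model_type: str, dataset_kind: str = "simple") -> Tuple[int, int]:
    """Return the inclusive (min, max) epoch range suggested in this guide."""
    try:
        return _RECOMMENDED_EPOCHS[(model_type, dataset_kind)]
    except KeyError:
        # Covers experimental model types (LLM1, LLM3) and unknown dataset kinds.
        raise ValueError(
            f"No recommendation for model_type={model_type!r}, "
            f"dataset_kind={dataset_kind!r}"
        ) from None
```

For a faster trial run you can still choose a value below the returned minimum, as described above; the range is a starting point, not a limit.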
Validate the training parameters and start training. We recommend using a GPU if you have one, as training will be 10-50 times faster, depending on the speed of the GPU available.
LLM models require a GPU, while GAN can be run on either a CPU or a GPU (recommended).
Once the training has successfully started, you may view the progress in the logs on the same page.
You may also stop the training to edit any parameters if needed. This will restart the training process.
Download synthetic data & report#
Once the training is completed, you will be directed to a generation page.
Here, you are able to download the synthetic data and quality assurance report.
Generate more data#
You can also generate more synthetic data by selecting the number of new rows to generate.
Modified at 2024-06-19 05:31:51