What is synthetic data?

Synthetic data is artificially generated data that is not collected from real-world events. It replicates a real dataset, preserving its statistical fidelity and analytical utility while guaranteeing varying degrees of privacy. It can be used for:

Synthetic data is not, however, entirely new. Generation techniques have existed for a long time, but they have struggled with high-dimensional datasets, high-cardinality categorical data, reliance on manually set parameters, and bias handling. Recent advances in the state of the art have brought improvements in these areas.

<aside> 👉 To learn more about synthetic data, check out “Everything You Always Wanted to Know About Synthetic Data”.

</aside>

And what are Synthesizers?

YData’s Synthesizers offer a simplified interface to train, assess the quality of, and interact with state-of-the-art Machine Learning models capable of generating data that mimics a specific dataset from the Data Catalog. Synthesizers are fully data-driven, unsupervised, and automated: they learn the underlying data distributions automatically and hide the complexity of creating and training these models (including the infrastructure) behind an easy-to-use interface.
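To make the idea of "learning the underlying data distributions" concrete, here is a minimal sketch of the train-then-generate workflow using only the Python standard library. It is not YData's API or method: `ToyGaussianSynthesizer`, `moments`, and `correlation` are hypothetical names invented for this illustration, and the model is a plain bivariate Gaussian, far simpler than the models a real Synthesizer relies on.

```python
import math
import random


def moments(rows):
    """Return (mean_x, mean_y, var_x, var_y, cov_xy) for a list of (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    return mean_x, mean_y, var_x, var_y, cov_xy


def correlation(rows):
    """Pearson correlation between the two columns."""
    _, _, var_x, var_y, cov_xy = moments(rows)
    return cov_xy / math.sqrt(var_x * var_y)


class ToyGaussianSynthesizer:
    """Hypothetical toy model: fits a bivariate Gaussian to two numeric columns.

    Assumes both columns have non-zero variance.
    """

    def fit(self, rows):
        # "Training" here is just estimating means, variances and covariance.
        self.params = moments(rows)
        return self

    def sample(self, n, seed=None):
        mean_x, mean_y, var_x, var_y, cov_xy = self.params
        rng = random.Random(seed)
        # Cholesky factor of the 2x2 covariance matrix, computed by hand.
        l11 = math.sqrt(var_x)
        l21 = cov_xy / l11
        l22 = math.sqrt(max(var_y - l21 ** 2, 0.0))
        rows = []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            rows.append((mean_x + l11 * z1, mean_y + l21 * z1 + l22 * z2))
        return rows


# Stand-in for a "real" dataset: two correlated numeric columns.
rng = random.Random(0)
real = []
for _ in range(2000):
    x = rng.gauss(50, 10)
    real.append((x, 2 * x + rng.gauss(0, 5)))

# Fit on the real rows, then generate brand-new rows from the fitted model.
synth = ToyGaussianSynthesizer().fit(real).sample(2000, seed=1)

print(f"real correlation:      {correlation(real):.3f}")
print(f"synthetic correlation: {correlation(synth):.3f}")
```

None of the synthetic rows is copied from the real ones, yet the two printed correlations should be close, which is a crude form of the statistical fidelity described earlier. A real Synthesizer has to handle many columns, categorical values, and non-Gaussian shapes, which is where this toy approach breaks down.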

The quality of synthetic data is highly use-case dependent and should be evaluated in that context. For this reason, YData automatically generates a quality report that describes each Synthesizer along its three fundamental axes of quality: