Synthetic data is artificially generated data that is not collected from real-world events. It replicates a real dataset, preserving its statistical fidelity and analytical utility while providing varying degrees of privacy protection. It can be used for:
Synthetic data is not, however, entirely new. Techniques for generating it have existed for a long time, but they have struggled with high-dimensional datasets, high-cardinality categorical data, reliance on manually set parameters, and bias handling. Recent advances in the state of the art have brought improvements in these areas.
<aside> 👉 To learn more about Synthetic Data, check out “Everything You Always Wanted to Know About Synthetic Data”.
</aside>
YData’s Synthesizers offer a simplified interface to train, assess the quality of, and interact with state-of-the-art Machine Learning models capable of generating data that mimics a specific Data Catalog dataset. Synthesizers are fully data-driven, unsupervised and automated: they learn the underlying data distributions automatically and hide the complexity of creating and training these models (including infrastructure) behind an easy-to-use interface.
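To make the "train, then generate" workflow concrete, here is a toy sketch in plain Python. It is not YData's API: the class name `ToySynthesizer`, its `fit` and `sample` methods, and the sample records are all hypothetical and chosen only for illustration. The sketch learns each column's distribution independently (mean and standard deviation for numeric columns, category frequencies for categorical ones) and then draws new rows from those distributions.

```python
import random
import statistics
from collections import Counter


class ToySynthesizer:
    """Hypothetical illustration only; not YData's Synthesizer API."""

    def fit(self, rows):
        # rows: non-empty list of dicts that all share the same keys.
        self.columns = {}
        for name in rows[0]:
            values = [row[name] for row in rows]
            is_numeric = all(
                isinstance(v, (int, float)) and not isinstance(v, bool)
                for v in values
            )
            if is_numeric:
                # Numeric column: keep the mean and population std deviation.
                self.columns[name] = (
                    "numeric",
                    statistics.mean(values),
                    statistics.pstdev(values),
                )
            else:
                # Categorical column: keep the observed category counts.
                counts = Counter(values)
                self.columns[name] = (
                    "categorical",
                    list(counts.keys()),
                    list(counts.values()),
                )
        return self

    def sample(self, n_rows, seed=None):
        rng = random.Random(seed)
        synthetic_rows = []
        for _ in range(n_rows):
            row = {}
            for name, spec in self.columns.items():
                if spec[0] == "numeric":
                    # Draw from a normal distribution fitted to the column.
                    row[name] = rng.gauss(spec[1], spec[2])
                else:
                    # Draw a category in proportion to its observed frequency.
                    row[name] = rng.choices(spec[1], weights=spec[2], k=1)[0]
            synthetic_rows.append(row)
        return synthetic_rows


# Made-up example records (assumption, for illustration only).
real = [
    {"age": 34, "plan": "basic"},
    {"age": 45, "plan": "premium"},
    {"age": 29, "plan": "basic"},
    {"age": 52, "plan": "basic"},
]

synth = ToySynthesizer().fit(real)
synthetic = synth.sample(1000, seed=0)
```

Because this toy treats every column independently, it discards correlations between columns, and it assumes numeric columns are roughly normal. Capturing the joint structure of the data automatically is precisely the part that learned generative models are meant to handle.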
The quality of synthetic data is highly use-case dependent and should be evaluated in that context. For this reason, YData automatically generates a quality report that describes each Synthesizer along its three fundamental axes of quality: