Beyond billion-parameter burdens: Unlocking data synthesis with a conditional generator
We present a novel privacy-preserving synthetic data generation algorithm that enables automatic topic-wise distribution matching, making synthetic data generation accessible even for resource-constrained AI applications. Generating large-scale differentially private (DP) synthetic data is challenging because of the fundamental privacy–computation–utility trade-off: strong privacy guarantees can either hurt the quality of the synthetic data or require large amounts of …