Differentially-Private Synthetic Data Generation
Research into generating realistic synthetic tabular data with formal differential-privacy guarantees, and into evaluating the privacy-utility trade-off against real-data baselines.
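For reference, the "formal guarantee" here is the standard (ε, δ)-differential privacy definition. A randomized mechanism M (here, the training procedure that produces the generator) satisfies it if, for every pair of datasets D and D′ that differ in one record and every set S of possible outputs:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```

Smaller ε and δ mean the output reveals less about any single record. Because differential privacy is closed under post-processing, a generator trained under (ε, δ)-DP can emit any number of synthetic rows without spending additional privacy budget.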
How to build it — step by step
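1. Generators: Implement DP generative models for tabular data (e.g. DP-GAN or PATE-GAN).
2. Privacy accounting: Track the cumulative privacy budget (ε, δ) spent during training with a privacy accountant (e.g. a Rényi-DP accountant).
3. Utility metrics: Measure statistical fidelity (how closely the synthetic marginals and correlations match the real data) and downstream ML performance, e.g. by training a model on synthetic data and testing it on real data.
4. Attacks: Run membership-inference attacks to empirically audit the privacy guarantees. A successful attack yields a lower bound on the effective ε, but no attack can prove that the guarantee holds.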
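The generator step can be illustrated with the update rule that DP-GAN-style training typically applies to the discriminator, the only network that reads real records: DP-SGD, which clips each example's gradient and adds Gaussian noise. The generator then inherits the guarantee by post-processing. This is a minimal NumPy sketch under stated assumptions: `dp_sgd_step` and `logistic_per_example_grads` are hypothetical names, the "discriminator" is a toy logistic regression, and the fixed full batch stands in for the Poisson subsampling that standard DP-SGD accounting assumes.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip every per-example gradient to L2 norm <= clip_norm,
    sum, add Gaussian noise with std noise_multiplier * clip_norm, then average."""
    batch_size = per_example_grads.shape[0]
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Per-row scale factor min(1, C / ||g||); the floor avoids division by zero.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size
    return params - lr * noisy_mean

def logistic_per_example_grads(w, X, y):
    """Hypothetical toy discriminator: binary cross-entropy gradients, one row per example."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (p - y)[:, None] * X

# Usage: train the toy discriminator privately on 64 labelled records.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(3)
for _ in range(200):
    grads = logistic_per_example_grads(w, X, y)
    w = dp_sgd_step(w, grads, lr=0.5, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single record can move the summed gradient, which is what makes the Gaussian noise sufficient. In a real GAN the per-example gradients would come from an autodiff framework rather than a closed-form helper.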
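For privacy accounting, a simple and correct accountant exists for repeated application of the Gaussian mechanism with L2 sensitivity 1 (add/remove adjacency) and noise multiplier σ. Each step is (α, α/(2σ²))-Rényi DP, RDP composes additively over steps, and an RDP bound converts to (ε, δ) via ε = RDP(α) + log(1/δ)/(α − 1), minimized over orders α. The sketch below (the function name and order grid are assumptions) deliberately ignores subsampling amplification, so for minibatch DP-SGD it is a valid but loose upper bound. Libraries such as Opacus and TensorFlow Privacy ship accountants that include amplification.

```python
import math

def gaussian_rdp_epsilon(noise_multiplier, steps, delta, orders=None):
    """Return (epsilon, best_order) for `steps` compositions of the Gaussian
    mechanism (sensitivity 1, std = noise_multiplier) using Renyi DP.
    No subsampling amplification: an upper bound for minibatch DP-SGD."""
    if orders is None:
        orders = [1.25, 1.5, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 32, 64, 128, 256]
    best_eps, best_order = math.inf, None
    for alpha in orders:
        rdp = steps * alpha / (2 * noise_multiplier ** 2)  # RDP adds up over steps
        eps = rdp + math.log(1 / delta) / (alpha - 1)       # RDP -> (eps, delta)
        if eps < best_eps:
            best_eps, best_order = eps, alpha
    return best_eps, best_order
```

Calling `gaussian_rdp_epsilon(1.1, 1000, 1e-5)` shows how quickly ε grows with the number of steps when amplification by sampling is ignored, which is why production accountants model the sampling rate explicitly.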
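One simple statistical-fidelity metric compares per-column marginals with total variation distance (TVD). The sketch below assumes categorical columns and rows given as equal-length tuples, and `marginal_tvd` and `mean_marginal_fidelity` are hypothetical names. Marginal metrics are blind to correlations between columns, so they are best paired with pairwise statistics and a train-on-synthetic, test-on-real classifier score.

```python
from collections import Counter

def marginal_tvd(real_col, synth_col):
    """Total variation distance between the empirical distributions of one
    categorical column: 0 = identical marginals, 1 = disjoint supports."""
    p, q = Counter(real_col), Counter(synth_col)
    n_p, n_q = len(real_col), len(synth_col)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p[v] / n_p - q[v] / n_q) for v in support)

def mean_marginal_fidelity(real_rows, synth_rows):
    """1 minus the mean per-column TVD; 1.0 means every marginal matches."""
    n_cols = len(real_rows[0])
    tvds = [
        marginal_tvd([r[j] for r in real_rows], [s[j] for s in synth_rows])
        for j in range(n_cols)
    ]
    return 1.0 - sum(tvds) / n_cols
```

For example, a real column `["a", "a", "b", "b"]` against a synthetic column `["a", "b", "b", "b"]` gives a TVD of 0.25.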
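A basic membership-inference attack on released synthetic data scores each candidate record by its distance to the closest synthetic record (DCR). The intuition is that a generator which memorized its training set places synthetic rows unusually close to members. The sketch below assumes numeric feature vectors, and all function names are hypothetical. It also converts an attack's operating point into an empirical ε lower bound using the fact that (ε, δ)-DP forces TPR ≤ e^ε · FPR + δ for any attack. This is a point estimate only: a rigorous audit would use confidence intervals on TPR and FPR.

```python
import math
import numpy as np

def dcr_scores(candidates, synthetic):
    """Attack score per candidate: negative Euclidean distance to the closest
    synthetic record (higher score = more likely a training member)."""
    d = np.linalg.norm(candidates[:, None, :] - synthetic[None, :, :], axis=2)
    return -d.min(axis=1)

def attack_auc(member_scores, nonmember_scores):
    """AUC = P(random member outscores random non-member); ties count 1/2."""
    m = np.asarray(member_scores)[:, None]
    n = np.asarray(nonmember_scores)[None, :]
    return float((m > n).mean() + 0.5 * (m == n).mean())

def empirical_eps_lower_bound(tpr, fpr, delta):
    """From TPR <= e^eps * FPR + delta: eps >= log((TPR - delta) / FPR)."""
    if tpr <= delta:
        return 0.0
    if fpr == 0.0:
        return math.inf
    return max(0.0, math.log((tpr - delta) / fpr))

# Usage: a deliberately leaky "generator" that emits near-copies of members.
rng = np.random.default_rng(0)
members = rng.normal(size=(200, 4))
nonmembers = rng.normal(size=(200, 4))
leaky_synth = members + rng.normal(scale=0.01, size=members.shape)
auc = attack_auc(dcr_scores(members, leaky_synth), dcr_scores(nonmembers, leaky_synth))
```

An AUC near 0.5 means the attack does no better than chance. The leaky generator above drives it close to 1.0.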
Key features to implement
- ✓ DP synthetic data generation
- ✓ Privacy-budget accounting
- ✓ Utility-vs-privacy analysis
- ✓ Membership-inference evaluation
- ✓ Benchmark against real data
💡 Unique twist to stand out
Propose a metric that jointly scores fidelity and privacy, and chart the Pareto frontier across generators.
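The sketch below shows one hypothetical candidate for such a metric, not an established one. It maps membership-inference AUC to a privacy score (1 at chance level, 0 for a perfect attack) and combines that score with a fidelity score in [0, 1] through a weighted geometric mean, so a generator that fails on either axis scores 0. The Pareto frontier is the set of generators that no other generator beats on both axes. The `Result` type, both scoring formulas, and the weight are all assumptions. Using the accountant's ε instead of an attack AUC would be an equally valid choice of privacy axis.

```python
from typing import NamedTuple

class Result(NamedTuple):
    name: str
    fidelity: float    # in [0, 1], higher is better (e.g. 1 - mean marginal TVD)
    attack_auc: float  # membership-inference AUC; 0.5 = chance level

def privacy_score(auc):
    """Hypothetical mapping: 1.0 at chance-level AUC, 0.0 for a perfect attack."""
    return 1.0 - 2.0 * max(auc - 0.5, 0.0)

def joint_score(r, weight=0.5):
    """Hypothetical weighted geometric mean of fidelity and privacy."""
    return r.fidelity ** weight * privacy_score(r.attack_auc) ** (1 - weight)

def pareto_front(results):
    """Results not dominated on (fidelity, privacy), sorted by fidelity."""
    front = []
    for r in results:
        rp = privacy_score(r.attack_auc)
        dominated = any(
            o.fidelity >= r.fidelity
            and privacy_score(o.attack_auc) >= rp
            and (o.fidelity > r.fidelity or privacy_score(o.attack_auc) > rp)
            for o in results
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.fidelity)
```

Sweeping ε for each generator and plotting the resulting front per generator shows where one model family dominates another, rather than comparing them at a single arbitrary budget.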
🎓 What you'll learn
Differential privacy, generative models, privacy attacks, and rigorous evaluation methodology.