A Novel Evaluation Metric for Synthetic Data Generation
Galloni, Andrea; Lendák, Imre; Horváth, Tomáš
Authors
Andrea Galloni
Imre Lendák
Tomáš Horváth
Abstract
Differentially private algorithmic synthetic data generation (SDG) solutions take input datasets Dp consisting of sensitive, private data and generate synthetic data Ds with similar qualities. The importance of such solutions is increasing, both because more and more people realize how much data is collected about them and used in machine learning contexts, and as a consequence of newly introduced data privacy regulations, e.g. the EU’s General Data Protection Regulation (GDPR). We aim to develop a novel composite SDG evaluation metric that weighs macro-statistical dataset similarity and data utility in machine learning tasks against the privacy boundaries of the synthetic data. We formalize the mathematical foundations for quantitatively measuring both the statistical similarity and the data utility of synthetic data. We use two well-known datasets containing (potentially) personally identifiable information as inputs (Dp) and the existing SDG algorithms PrivBayes and DPGroupFields to generate synthetic data (Ds) from them. We then test our evaluation metric for different values of the privacy budget ε. Based on our experiments, we conclude that the proposed composite evaluation metric is appropriate for quantitatively measuring the quality of synthetic data generated by different SDG solutions and that it shows the expected sensitivity to different privacy budget values.
Citation
Galloni, A., Lendák, I., & Horváth, T. (2020, November). A Novel Evaluation Metric for Synthetic Data Generation. Presented at IDEAL 2020: 21st International Conference on Intelligent Data Engineering and Automated Learning, Guimarães, Portugal
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | IDEAL 2020: 21st International Conference on Intelligent Data Engineering and Automated Learning |
Start Date | Nov 4, 2020 |
End Date | Nov 6, 2020 |
Online Publication Date | Oct 27, 2020 |
Publication Date | 2020 |
Deposit Date | Apr 8, 2024 |
Publisher | Springer |
Pages | 25-34 |
Series Title | Lecture Notes in Computer Science |
Series Number | 12490 |
Series ISSN | 0302-9743 |
Book Title | Intelligent Data Engineering and Automated Learning – IDEAL 2020: 21st International Conference, Guimaraes, Portugal, November 4–6, 2020, Proceedings, Part II |
ISBN | 9783030623647 |
DOI | https://doi.org/10.1007/978-3-030-62365-4_3 |
Keywords | Synthetic data generation, Differential privacy, Evaluation metrics |
Public URL | http://researchrepository.napier.ac.uk/Output/3587400 |