Synthetic Tabular Data

Synthetic tabular data generation aims to create artificial datasets that mimic the statistical properties of real data while addressing data scarcity, privacy concerns, and bias. Current research focuses on improving the fidelity and utility of synthetic data using generative models, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), diffusion models, and, increasingly, large language models (LLMs). These models often incorporate techniques such as transfer learning and conditional generation to enhance realism and preserve complex relationships between features. The field is significant because high-quality synthetic data can enable broader data sharing, augment limited datasets for training machine learning models, and facilitate research in sensitive domains where access to real data is restricted.

Papers

April 12, 2024

An improved tabular data generator with VAE-GMM integration
Patricia A. Apellániz, Juan Parras, Santiago Zazo
Generative Adversarial Network Variational Autoencoder Tabular Data Synthetic Tabular Data Gaussian Latent

March 27, 2024

DSF-GAN: DownStream Feedback Generative Adversarial Network
Oriel Perets, Nadav Rappoport
C Gan GAN Architecture Synthetic Tabular Data Synthetic Sample Adversarial Feedback

March 15, 2024

Structured Evaluation of Synthetic Tabular Data
Scott Cheng-Hsin Yang, Baxter Eaves, Michael Schmidt, Ken Swanson, Patrick Shafto
Synthetic Data Tabular Data Synthetic Data Generation Synthetic Tabular Data Synthetic Data Generator

January 24, 2024

Can I trust my fake data -- A comprehensive quality assessment framework for synthetic tabular data in healthcare
Vibeke Binz Vallevik, Aleksandar Babic, Serena Elizabeth Marshall, Severin Elvatun, Helga Brøgger, Sharmini Alagaratnam, Bjørn Edwin, Narasimha Raghavan Veeraragavan, Anne Kjersti Befring, Jan Franz Nygård
Synthetic Data Tabular Data Healthcare System Trustworthy Artificial Intelligence Real Data Synthetic Tabular Data Quality Metric

January 1, 2024

Improve Fidelity and Utility of Synthetic Credit Card Transaction Time Series from Data-centric Perspective
Din-Yin Hsieh, Chi-Hua Wang, Guang Cheng
Synthetic Data Data Centric Task Utility Synthetic Tabular Data Fidelity Reward Generative Training Synthetic Financial

December 19, 2023

Sharing is CAIRing: Characterizing Principles and Assessing Properties of Universal Privacy Evaluation for Synthetic Tabular Data
Tobias Hyrup, Anton Danholt Lautrup, Arthur Zimek, Peter Schneider-Kamp
Synthetic Data Privacy Preserving General Principle Empirical Privacy Sharing Matter Synthetic Tabular Data Real Estate Appraisal

December 13, 2023

The Real Deal Behind the Artificial Appeal: Inferential Utility of Tabular Synthetic Data
Alexander Decruyenaere, Heidelinde Dehaene, Paloma Rabaey, Christiaan Polet, Johan Decruyenaere, Stijn Vansteelandt, Thomas Demeester
Generative Model Synthetic Data Explanatory Inference Synthetic Tabular Data Naive Ensemble

November 29, 2023

Privacy Measurement in Tabular Synthetic Data: State of the Art and Future Research Directions
Alexander Boudewijn, Andrea Filippo Ferraris, Daniele Panfilo, Vanessa Cocca, Sabrina Zinutti, Karel De Schepper, Carlo Rossi Chauvenet
Synthetic Data Internal State Art Specific Information Research Direction Empirical Privacy Synthetic Tabular Data Informed Learning

October 30, 2023

MMM and MMMSynth: Clustering of heterogeneous tabular data, and synthetic data generation
Chandrani Kumari, Rahul Siddharthan
Synthetic Data Synthetic Data Generation Synthetic Tabular Data Synthetic Tabular Data Generation Heterogeneous Tabular Data

October 24, 2023

AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing
Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh, Merhdad Honarkhah, Guang Cheng
Diffusion Model Tabular Data Synthetic Data Generation Auto Encoder Synthetic Tabular Data Sound Synthesizer Tabular Data Synthesis New Autodiff

July 28, 2023

Deep Generative Models, Synthetic Tabular Data, and Differential Privacy: An Overview and Synthesis
Conor Hassan, Robert Salomone, Kerrie Mengersen
Generative Model Differential Privacy Synthetic Data Generation Critical Synthesis Tabular Datasets Synthetic Tabular Data Privacy Sensitive

July 19, 2023

DP-TBART: A Transformer-based Autoregressive Model for Differentially Private Tabular Data Generation
Rodrigo Castellon, Achintya Gopal, Brian Bloniarz, David Rosenberg
Differential Privacy Synthetic Tabular Data Private Tabular Data

July 1, 2023

CasTGAN: Cascaded Generative Adversarial Network for Realistic Tabular Data Synthesis
Abdallah Alshantti, Damiano Varagnolo, Adil Rasheed, Aria Rahmati, Frank Westad
Generative Adversarial Network Synthetic Data Adversarial Learning Synthetic Tabular Data Tabular Data Synthesis GAN Baseline

June 27, 2023

On the Usefulness of Synthetic Tabular Data Generation
Dionysis Manousakas, Sergül Aydöre
Data Augmentation Synthetic Data Synthetic Data Generation Synthetic Tabular Data Automatic Usefulness Prediction Synthetic Tabular Data Generation

June 24, 2023

Evaluating the Utility of GAN Generated Synthetic Tabular Data for Class Balancing and Low Resource Settings
Nagarjuna Chereddy, Bharath Kumar Bolla
Training Data GAN Model Class Imbalance Task Utility GAN Based Synthetic Tabular Data Class Balancing Class Balancing Technique

April 9, 2023

Distributed Conditional GAN (discGAN) For Synthetic Healthcare Data Generation
David Fuentes, Diana McSpadden, Sodiq Adewole
Generative Adversarial Network GAN Model Multi Modal Synthetic Tabular Data Conditional GAN Synthetic Medical Data

February 4, 2023

REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers
Aivin V. Solatorio, Olivier Dupriez
Transformer Megatron Decepticons Tabular Data Synthetic Tabular Data Tabular Transformer Synthetic Relational Relational Datasets

November 17, 2022

Permutation-Invariant Tabular Data Synthesis
Yujin Zhu, Zilong Zhao, Robert Birke, Lydia Y. Chen
Synthetic Data Synthetic Tabular Data Tabular Data Synthesis GAN Baseline

October 5, 2022

DreamShard: Generalizable Embedding Table Placement for Recommender Systems
Daochen Zha, Louis Feng, Qiaoyu Tan, Zirui Liu, Kwei-Herng Lai, Bhargav Bhushanam, Yuandong Tian, Arun Kejariwal, Xia Hu
Reinforcement Learning Recommender System Unseen Task Synthetic Tabular Data

July 12, 2022

TabSynDex: A Universal Metric for Robust Evaluation of Synthetic Tabular Data
Vikram S Chundawat, Ayush K Tarun, Murari Mandal, Mukund Lahoti, Pratik Narang
Synthetic Data Tabular Data Synthetic Tabular Data Robust Evaluation Synthetic Tabular Data Generation
