Data Provenance
Data provenance is the ability to trace where data originated and how it has been transformed, and it underpins data quality, accountability, and trustworthiness across applications. Current research develops methods for tracking provenance in text, images, and code, often using watermarking, blockchain technologies, and graph neural networks to improve traceability and security. The field addresses challenges in AI model transparency, copyright protection, and scientific reproducibility, with the aim of making data-driven systems more trustworthy and reliable.
Papers
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker
Semiring Provenance for Lightweight Description Logics
Camille Bourgaux, Ana Ozaki, Rafael Peñaloza