Paper ID: 2207.13744

Lighting (In)consistency of Paint by Text

Hany Farid

Whereas generative adversarial networks are capable of synthesizing highly realistic images of faces, cats, landscapes, or almost any other single category, paint-by-text synthesis engines can -- from a single text prompt -- synthesize realistic images of seemingly endless categories with arbitrary configurations and combinations. This powerful technology poses new challenges to the photo-forensic community. Motivated by the fact that paint by text is not based on explicit geometric or physical models, and by the human visual system's general insensitivity to lighting inconsistencies, we provide an initial exploration of the lighting consistency of DALL-E-2 synthesized images to determine whether physics-based forensic analyses will prove fruitful in detecting this new breed of synthetic media.

Submitted: Jul 27, 2022

Topics

Generative Adversarial Network
Text Modality
Strong Consistency
Text to Image Synthesis
Synthesized Image
Image Forensics
Lighting Element
Synthetic Medium
