Paper ID: 2304.03536

Leveraging GANs for data scarcity of COVID-19: Beyond the hype

Hazrat Ali, Christer Gronlund, Zubair Shah

Artificial Intelligence (AI)-based models can help in diagnosing COVID-19 from lung CT scans and X-ray images; however, these models require large amounts of data for training and validation. Many researchers studied Generative Adversarial Networks (GANs) for producing synthetic lung CT scans and X-Ray images to improve the performance of AI-based models. It is not well explored how good GAN-based methods performed to generate reliable synthetic data. This work analyzes 43 published studies that reported GANs for synthetic data generation. Many of these studies suffered data bias, lack of reproducibility, and lack of feedback from the radiologists or other domain experts. A common issue in these studies is the unavailability of the source code, hindering reproducibility. The included studies reported rescaling of the input images to train the existing GANs architecture without providing clinical insights on how the rescaling was motivated. Finally, even though GAN-based methods have the potential for data augmentation and improving the training of AI-based models, these methods fall short in terms of their use in clinical practice. This paper highlights research hotspots in countering the data scarcity problem, identifies various issues as well as potentials, and provides recommendations to guide future research. These recommendations might be useful to improve acceptability for the GAN-based approaches for data augmentation as GANs for data augmentation are increasingly becoming popular in the AI and medical imaging research community.

Submitted: Apr 7, 2023