Paper ID: 2405.16097

Apply Distributed CNN on Genomics to accelerate Transcription-Factor TAL1 Motif Prediction

Tasnim Assali, Zayneb Trabelsi Ayoub, Sofiane Ouni

Big Data and deep learning work well together to extract knowledge from very large amounts of data. However, this processing can require long training times. Genomics is a Big Data science with high dimensionality. It relies on deep learning to solve complicated problems in diseases such as cancer by using different kinds of DNA information, such as transcription factors. TAL1 is a transcription factor that is essential for the development of hematopoiesis and of the vascular system. In this paper, we highlight the potential of deep learning in genomics and its challenges, such as training times that take hours, weeks, and in some cases months. We therefore propose a distributed deep learning implementation based on Convolutional Neural Networks (CNNs), using multiple GPUs and TPUs as accelerators, which showed good results in decreasing training time and improving accuracy, reaching 95%. We demonstrate that a distributed strategy based on data parallelism predicts the transcription-factor TAL1 motif faster.
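To make the data-parallelism idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy one-parameter linear model in place of the CNN, and the names `grad_mse`, `shard`, and `data_parallel_grad` are hypothetical. It illustrates the core step of synchronous data parallelism: each worker computes a gradient on its own equal-sized shard of the batch, and the averaged result matches the gradient a single device would compute on the full batch.

```python
# Toy illustration of synchronous data parallelism (assumption: a 1-D linear
# model y = w * x with mean squared error stands in for the paper's CNN).

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def shard(items, num_workers):
    """Split a batch into num_workers equal contiguous shards.

    Assumes len(items) is divisible by num_workers.
    """
    size = len(items) // num_workers
    return [items[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_grad(w, xs, ys, num_workers):
    """Simulate workers: per-shard gradients, then an average (all-reduce)."""
    x_shards = shard(xs, num_workers)
    y_shards = shard(ys, num_workers)
    local = [grad_mse(w, xk, yk) for xk, yk in zip(x_shards, y_shards)]
    return sum(local) / num_workers

# Hypothetical data: targets follow y = 2x, current weight is 0.5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w = 0.5

full = grad_mse(w, xs, ys)                                # single-device gradient
parallel = data_parallel_grad(w, xs, ys, num_workers=4)   # 4 simulated workers
```

In a real multi-GPU or TPU setting, the per-shard gradients would be computed concurrently on separate accelerators, which is where the training-time reduction comes from; the averaging step keeps the update equivalent to single-device training on the full batch.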

Submitted: May 25, 2024