Paper ID: 2112.03710

CapsProm: A Capsule Network For Promoter Prediction

Lauro Moraes, Pedro Silva, Eduardo Luz, Gladston Moreira

Locating the promoter region in DNA sequences is of paramount importance in the field of bioinformatics. This is a problem widely studied in the literature, however, not yet fully resolved. Some researchers have presented remarkable results using convolution networks, that allowed the automatic extraction of features from a DNA chain. However, a universal architecture that could generalize to several organisms has not yet been achieved, and thus, requiring researchers to seek new architectures and hyperparameters for each new organism evaluated. In this work, we propose a versatile architecture, based on capsule network, that can accurately identify promoter sequences in raw DNA data from seven different organisms, eukaryotic, and prokaryotic. Our model, the CapsProm, could assist in the transfer of learning between organisms and expand its applicability. Furthermore the CapsProm showed competitive results, overcoming the baseline method in five out of seven of the tested datasets (F1-score). The models and source code are made available at https://github.com/lauromoraes/CapsNet-promoter.

Submitted: Dec 7, 2021