Paper ID: 2411.03522

Exploring the Potentials and Challenges of Using Large Language Models for the Analysis of Transcriptional Regulation of Long Non-coding RNAs

Wei Wang, Zhichao Hou, Xiaorui Liu, Xinxia Peng

Research on long non-coding RNAs (lncRNAs) has garnered significant attention due to their critical roles in gene regulation and disease mechanisms. However, the complexity and diversity of lncRNA sequences, along with the limited knowledge of their functional mechanisms and the regulation of their expressions, pose significant challenges to lncRNA studies. Given the tremendous success of large language models (LLMs) in capturing complex dependencies in sequential data, this study aims to systematically explore the potential and limitations of LLMs in the sequence analysis related to the transcriptional regulation of lncRNA genes. Our extensive experiments demonstrated promising performance of fine-tuned genome foundation models on progressively complex tasks. Furthermore, we conducted an insightful analysis of the critical impact of task complexity, model selection, data quality, and biological interpretability for the studies of the regulation of lncRNA gene expression.

Submitted: Nov 5, 2024