Paper ID: 2409.02864
Language Model Powered Digital Biology
Joshua Pickard, Marc Andrew Choi, Natalie Oliven, Cooper Stansbury, Jillian Cwycyshyn, Nicholas Galioto, Alex Gorodetsky, Alvaro Velasquez, Indika Rajapakse
Recent advancements in Large Language Models (LLMs) are transforming biology, computer science, and many other research fields, as well as impacting everyday life. While transformer-based technologies are currently being deployed in biology, no agentic system has yet been developed to tackle bioinformatics workflows. We present a prototype Bioinformatics Retrieval Augmented Data (BRAD) digital assistant. BRAD is a chatbot and agentic system that integrates a suite of tools to handle bioinformatics tasks, from code execution to online search. We demonstrate its capabilities through (1) improved question answering with retrieval-augmented generation (RAG), (2) the ability to run complex software pipelines, and (3) the ability to organize and distribute tasks in agentic workflows. We use BRAD for automation, performing tasks ranging from gene enrichment and searching the archive to automatic code generation for running biomarker identification pipelines. BRAD is a step toward autonomous, self-driving labs for digital biology.
Submitted: Sep 4, 2024