Data Science Pipeline

Data science pipelines automate the process of extracting knowledge from data, chaining together data ingestion, analysis, visualization, and reporting. Current research emphasizes making these pipelines more accessible and usable through natural language interfaces and automated machine learning (AutoML), with particular attention to integrating large language models (LLMs) and autonomous agents to streamline workflows. These efforts aim to improve the reproducibility, transparency, and efficiency of data science work, broadening its application in fields such as medicine and fact-checking, while also addressing open challenges around the safety and interpretability of results.
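To make the four stages concrete, here is a minimal sketch of such a pipeline in Python using pandas and matplotlib. It is illustrative only: the input file sales.csv and its region and revenue columns are hypothetical, and a real pipeline would add validation, logging, and error handling at each stage.

```python
# Minimal pipeline sketch: ingestion -> analysis -> visualization -> reporting.
# The file "sales.csv" and its "region"/"revenue" columns are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt


def ingest(path: str) -> pd.DataFrame:
    """Ingestion: load raw data into a tabular structure."""
    return pd.read_csv(path)


def analyze(df: pd.DataFrame) -> pd.DataFrame:
    """Analysis: aggregate total revenue per region."""
    return df.groupby("region", as_index=False)["revenue"].sum()


def visualize(summary: pd.DataFrame, out_path: str) -> None:
    """Visualization: save a bar chart of the aggregated values."""
    summary.plot.bar(x="region", y="revenue", legend=False)
    plt.tight_layout()
    plt.savefig(out_path)


def report(summary: pd.DataFrame) -> str:
    """Reporting: produce a short plain-text summary of the result."""
    top = summary.loc[summary["revenue"].idxmax()]
    return f"Top region: {top['region']} ({top['revenue']:.2f})"


if __name__ == "__main__":
    data = ingest("sales.csv")
    summary = analyze(data)
    visualize(summary, "revenue_by_region.png")
    print(report(summary))
```

Keeping each stage behind its own small function is what makes such workflows amenable to the automation the research above targets: an AutoML system or LLM agent can generate or replace an individual stage without disturbing the rest of the pipeline.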

Papers