Paper ID: 2406.15465
RadEx: A Framework for Structured Information Extraction from Radiology Reports based on Large Language Models
Daniel Reichenpfader, Jonas Knupp, André Sander, Kerstin Denecke
Annually and globally, over three billion radiography examinations and computer tomography scans result in mostly unstructured radiology reports containing free text. Despite the potential benefits of structured reporting, its adoption is limited by factors such as established processes, resource constraints and potential loss of information. However, structured information would be necessary for various use cases, including automatic analysis, clinical trial matching, and prediction of health outcomes. This study introduces RadEx, an end-to-end framework comprising 15 software components and ten artifacts to develop systems that perform automated information extraction from radiology reports. It covers the complete process from annotating training data to extracting information by offering a consistent generic information model and setting boundaries for model development. Specifically, RadEx allows clinicians to define relevant information for clinical domains (e.g., mammography) and to create report templates. The framework supports both generative and encoder-only models and the decoupling of information extraction from template filling enables independent model improvements. Developing information extraction systems according to the RadEx framework facilitates implementation and maintenance as components are easily exchangeable, while standardized artifacts ensure interoperability between components.
Submitted: Jun 14, 2024