Long Input

Processing long input sequences is a major challenge for large language models (LLMs), limiting their use in domains that require extensive context, such as long-document summarization and question answering. Current research focuses on efficient compression techniques, novel model architectures (e.g., graph-based agents, state-space models), and training strategies that mitigate the computational and memory costs of long inputs. These advances are crucial for improving the performance and applicability of LLMs on tasks involving extensive textual or multimodal data, with impact on fields ranging from information retrieval to medical diagnosis.
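A common baseline for handling inputs that exceed a model's context window, before any of the compression or architectural approaches above, is to split the document into overlapping chunks and process each one separately. The sketch below is a generic illustration (the function name, chunk size, and overlap are illustrative choices, not taken from any specific paper):

```python
def chunk_text(text: str, chunk_size: int = 2048, overlap: int = 256) -> list[str]:
    """Split text into overlapping character windows.

    The overlap preserves context across chunk boundaries so that a
    sentence cut at one boundary is still fully visible in the next chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks


doc = "".join(str(i % 10) for i in range(5000))
chunks = chunk_text(doc)
# Each chunk's tail repeats at the head of the next chunk.
print(len(chunks), chunks[0][-256:] == chunks[1][:256])
```

Each chunk would then be summarized or queried independently and the partial results merged, a map-reduce pattern often used in long-document pipelines. Real systems typically chunk by tokens rather than characters.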

Papers