Long Text
Research on long text focuses on enabling large language models (LLMs) to process and generate extended textual content effectively, overcoming the fixed context-window limits of standard transformer architectures. Current efforts concentrate on improving efficiency through optimized tokenization, novel attention mechanisms (such as sparse attention and multi-kernel transformers), and semantic-compression techniques that let models handle longer sequences. This work is crucial for numerous NLP applications, including machine translation, relation extraction from lengthy documents, and more accurate and efficient factual text generation.
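To make the sparse-attention idea mentioned above concrete, the sketch below restricts each token to a sliding window of recent positions, which reduces the quadratic cost of full attention on long sequences. It is a minimal NumPy illustration under assumed names (`sliding_window_attention_mask`, `sparse_attention`, `window`), not the method of any paper listed on this page.

```python
import numpy as np

def sliding_window_attention_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where position i may attend only to the `window`
    most recent positions (including itself). True = may attend."""
    idx = np.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]          # keys not in the future
    local = idx[:, None] - idx[None, :] < window   # keys within the window
    return causal & local

def sparse_attention(q, k, v, window: int):
    """Scaled dot-product attention restricted to a sliding window.
    q, k, v: arrays of shape (seq_len, d)."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = sliding_window_attention_mask(seq_len, window)
    scores = np.where(mask, scores, -np.inf)       # block disallowed positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: 8 tokens, each attending to at most 3 past positions
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 16)) for _ in range(3))
out = sparse_attention(q, k, v, window=3)
print(out.shape)  # (8, 16)
```

In practice, production long-context systems combine such locality patterns with global tokens, dilated windows, or learned sparsity; this example shows only the basic masking mechanism.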
Papers
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, Mao Yang
LongWanjuan: Towards Systematic Measurement for Long Text Quality
Kai Lv, Xiaoran Liu, Qipeng Guo, Hang Yan, Conghui He, Xipeng Qiu, Dahua Lin