Human Annotation

Human annotation, the process of labeling data for machine learning, is crucial but expensive and time-consuming. Current research aims to ease this bottleneck through two main approaches: active learning, which prioritizes the most informative data points for human labeling, and large language models (LLMs), which automate or assist annotation by generating synthetic data or pre-annotating samples. These techniques aim to make data annotation more efficient and scalable, accelerating the development and deployment of AI models in domains ranging from natural language processing to medical image analysis. Lower annotation costs and better data quality benefit both AI research and practical applications.
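
As a rough illustration of how active learning might prioritize samples, the sketch below ranks unlabeled items by the entropy of a model's predicted class probabilities and sends the most uncertain ones to human annotators. Entropy-based uncertainty sampling is only one common selection criterion, chosen here as an assumption. The function names and the probability values are hypothetical and are not taken from any of the papers listed on this page.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(predictions, budget):
    """Return the ids of the `budget` samples with the most uncertain predictions.

    `predictions` maps a sample id to the model's class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]

# Hypothetical model outputs for four unlabeled samples (illustrative values only).
predictions = {
    "a": [0.98, 0.02],
    "b": [0.55, 0.45],
    "c": [0.80, 0.20],
    "d": [0.50, 0.50],
}

to_label = select_for_annotation(predictions, budget=2)
print(to_label)  # ['d', 'b']
```

In a full loop, the selected samples would be labeled by annotators, added to the training set, and the model retrained before the next selection round.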

132 papers

Papers

May 22, 2025

The Language of Interoception: Examining Embodiment and Emotion Through a Corpus of Body Part Mentions
Body Part Underlying Emotion Emotion Lexicon Natural Language Self Awareness Large Corpus Different Physical Embodiment Human Annotation

May 15, 2025

Comparing LLM Text Annotation Skills: A Study on Human Rights Violations in Social Media Data
Study Feature Group Annotation Social Medium Data Large Language Model Language Model LLM Annotation Human Annotation Cross Lingual Human Right

May 6, 2025

DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral
Human Annotation Diverse Platform

May 3, 2025

MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization
Large Scale Human Large Scale Dataset 3D Human Digitization Capturing Maneuver Human Centric 3D Human Human Annotation

April 27, 2025

AndroidGen: Building an Android Language Agent under Data Scarcity
Human Annotation Language Agent Data Scarcity Large Language Model Open Source LLM

April 26, 2025

A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification
Classification Code Ensemble Strategy LLM Inference Sentiment Analysis Human Annotation Large Language Model Robust Performance

April 14, 2025

Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol
Literature Review Scientific Paper Text Similarity Evaluation Protocol Information Seeking Human Annotation Review Generation

April 7, 2025

Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG
Human Annotation Efficient Annotation Retrieval Model Large Language Model Retrieval Enhanced App to App Retrieval Manual Effort

April 1, 2025

GLiNER-BioMed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition
Human Annotation Software Suite Efficient Model Biomedical Named Entity Recognition Entity Recognition Named Entity Recognition Domain Annotation

March 10, 2025

Fully Unsupervised Annotation of C. Elegans
Microscopy Image Fluorescence Microscopy Human Annotation Cell Clustering Unsupervised Graph

March 3, 2025

Automated Annotation of Evolving Corpora for Augmenting Longitudinal Network Data: A Framework Integrating Large Language Models and Expert Knowledge
Human Annotation Longitudinal Analysis LLM Annotation Network Data Annotated Dataset Automatic Annotation

February 27, 2025

February 26, 2025

February 21, 2025

PairBench: A Systematic Framework for Selecting Reliable Judge VLMs
Similarity Score Efficient Vision Language Model Human Annotation Large Vision Language Model

February 19, 2025

RLTHF: Targeted Human Feedback for LLM Alignment
Human Annotation Human Feedback Human Annotated RLHF V LLM Alignment Annotation Strategy Large Language Model

VideoA11y: Method and Dataset for Accessible Video Description

Conformal Tail Risk Control for Large Language Model Alignment

Program Synthesis Dialog Agents for Interactive Decision-Making

Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents

GLEAN: Generalized Category Discovery with Diverse and Quality-Enhanced LLM Feedback

From underwater to aerial: a novel multi-scale knowledge distillation approach for coral reef monitoring

Enhancing Human Evaluation in Machine Translation with Comparative Judgment