Conversational Corpus

Conversational corpora are collections of transcribed human conversations used to train and evaluate AI models for natural language understanding and generation. Current research focuses on building corpora that are diverse in language, geographic region, and demographic representation, and on grounding corpora in structured knowledge bases such as Wikidata to improve knowledge-based conversational AI. These corpora support research in areas such as detecting cognitive impairment, building robust customer service applications, and developing more inclusive and accurate language models. Standardized tools and methodologies for corpus creation and analysis are another key area of focus, since they make research findings easier to reproduce and compare.

Papers

December 24, 2024

Lla-VAP: LSTM Ensemble of Llama and VAP for Turn-Taking Prediction
Hyunbae Jeon, Frederic Guintu, Rayvant Sahni
LLaMa LlamaCare Preference Elicitation Conversational Corpus Turn Taking Prediction

October 29, 2023

EtiCor: Corpus for Analyzing LLMs for Etiquettes
Ashutosh Dwivedi, Pradhyumna Lavania, Ashutosh Modi
Medical LLM Large Corpus Network Sensitivity Social Norm Conversational Corpus

August 29, 2023

KGConv, a Conversational Corpus grounded in Wikidata
Quentin Brabant, Gwenole Lecorve, Lina M. Rojas-Barahona, Claire Gardent
Conversational Data Wikidata Statement Conversational Question Conversational Corpus

February 14, 2023

TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments
Changye Li, Weizhe Xu, Trevor Cohen, Martin Michalowski, Serguei Pakhomov
Deep Learning Speech Analysis Easy to Use Toolkit Cognitive Impairment Dementia Detection Emergent Language Dementia Related Linguistic Anomaly Conversational Corpus

August 26, 2022

Building the Intent Landscape of Real-World Conversational Corpora with Extractive Question-Answering Transformers
Jean-Philippe Corbeil, Mia Taige Li, Hadi Abdi Ghavidel
Transformer Megatron Decepticons Conversational Data Extractive Question Answering User Intent Multi Intent Multi Intent Attribute Aware Conversational Corpus

April 20, 2022

Who Is Missing? Characterizing the Participation of Different Demographic Groups in a Korean Nationwide Daily Conversation Corpus
Haewoon Kwak, Jisun An, Kunwoo Park
Large Corpus Specific Audience Korean Nationwide Daily Conversation Corpus Conversational Corpus

March 7, 2022

Building and curating conversational corpora for diversity-aware language science and technology
Andreas Liesenfeld, Mark Dingemanse
Conversational Context Technology Information Data Conversational Data Building PCC Interaction Data Conversational Corpus
