Korean Nationwide Daily Conversation Corpus

Korean Nationwide Daily Conversation Corpora are large datasets of everyday Korean conversations used to train and evaluate natural language processing (NLP) models. Current research uses these corpora to improve conversational AI, particularly personality-based dialogue generation, emotion and causality recognition, and named entity recognition, often with deep learning architectures such as Bi-LSTMs and graph neural networks. The aim is more accurate and natural Korean language models for applications such as chatbots and knowledge base construction. Related work also addresses biases in data representation by analyzing the participation of different demographic groups.

Papers

November 21, 2024

FunctionChat-Bench: Comprehensive Evaluation of Language Models' Generative Capabilities in Korean Tool-use Dialogs
Shinbok Lee, Gaeun Seo, Daniel Lee, Byeongil Ko, Sunghee Jung, Myeongcheol Shin
Language Model Comprehensive Evaluation Synthetic Dialogue Generative Capability Korean Nationwide Daily Conversation Corpus

April 1, 2024

PSYDIAL: Personality-based Synthetic Dialogue Generation using Large Language Models
Ji-Eun Han, Jun-Seok Koh, Hyeon-Tae Seo, Du-Seong Chang, Kyung-Ah Sohn
Persona Based Dialogue Conversation Generation Korean Nationwide Daily Conversation Corpus

March 16, 2024

ECRC: Emotion-Causality Recognition in Korean Conversation for GCN
J. K. Lee, T. M. Chung
Multi Task Learning Sentence Representation Causal Emotion Entailment Quater GCN Korean Nationwide Daily Conversation Corpus

February 27, 2024

KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark
Seongbo Jang, Seonghyeon Lee, Hwanjo Yu
Language Model Conversational Context Dialogue Benchmark Dialogue Comprehension Korean Nationwide Daily Conversation Corpus

May 10, 2023

Korean Named Entity Recognition Based on Language-Specific Features
Yige Chen, KyungTae Lim, Jungyeul Park
Entity Recognition Language Specific Korean Language Morpheme Based Korean Nationwide Daily Conversation Corpus

September 1, 2022

KoCHET: a Korean Cultural Heritage corpus for Entity-related Tasks
Gyeongmin Kim, Jinsung Kim, Junyoung Son, Heuiseok Lim
Entity Recognition Entity Typing Korean Nationwide Daily Conversation Corpus

June 29, 2022

OASYS: Domain-Agnostic Automated System for Constructing Knowledge Base from Unstructured Text
Minsang Kim, Sang-hyun Je, Eunjoo Park
Knowledge Base Unstructured Text Human Annotated Domain Agnostic Knowledge Base Construction Korean Nationwide Daily Conversation Corpus

April 20, 2022

Who Is Missing? Characterizing the Participation of Different Demographic Groups in a Korean Nationwide Daily Conversation Corpus
Haewoon Kwak, Jisun An, Kunwoo Park
Large Corpus Specific Audience Korean Nationwide Daily Conversation Corpus Conversational Corpus
