Text Classification
Text classification aims to automatically categorize text into predefined categories, driven by the need for efficient and accurate information processing across diverse domains. Current research focuses on leveraging large language models (LLMs) like BERT and Llama 2, often enhanced with techniques such as fine-tuning, data augmentation, and active learning, alongside traditional machine learning methods like SVMs and XGBoost. These advancements are improving the accuracy and efficiency of text classification, with significant implications for applications ranging from medical diagnosis and financial analysis to social media monitoring and legal research. A key challenge remains ensuring model robustness, interpretability, and fairness, particularly when dealing with imbalanced datasets or noisy labels.
Papers
Linear Classifier: An Often-Forgotten Baseline for Text Classification
Yu-Chen Lin, Si-An Chen, Jie-Jyun Liu, Chih-Jen Lin
Document Layout Annotation: Database and Benchmark in the Domain of Public Affairs
Alejandro Peña, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Marcos Grande, Iñigo Puente, Jorge Cordova, Gonzalo Cordova
Efficient Shapley Values Estimation by Amortization for Text Classification
Chenghao Yang, Fan Yin, He He, Kai-Wei Chang, Xiaofei Ma, Bing Xiang
Analyzing Text Representations by Measuring Task Alignment
Cesar Gonzalez-Gutierrez, Audi Primadhanty, Francesco Cazzaro, Ariadna Quattoni
Adversarial Clean Label Backdoor Attacks and Defenses on Text Classification Systems
Ashim Gupta, Amrith Krishna
Cross Encoding as Augmentation: Towards Effective Educational Text Classification
Hyun Seung Lee, Seungtaek Choi, Yunsung Lee, Hyeongdon Moon, Shinhyeok Oh, Myeongho Jeong, Hyojun Go, Christian Wallraven
Machine Learning Approach for Cancer Entities Association and Classification
G. Jeyakodi, Arkadeep Pal, Debapratim Gupta, K. Sarukeswari, V. Amouda
Active Learning for Natural Language Generation
Yotam Perlitz, Ariel Gera, Michal Shmueli-Scheuer, Dafna Sheinwald, Noam Slonim, Liat Ein-Dor
Estimating class separability of text embeddings with persistent homology
Kostis Gourgoulias, Najah Ghalyan, Maxime Labonne, Yash Satsangi, Sean Moran, Joseph Sabelja
PESCO: Prompt-enhanced Self Contrastive Learning for Zero-shot Text Classification
Yau-Shian Wang, Ta-Chung Chi, Ruohong Zhang, Yiming Yang
Debiasing Made State-of-the-art: Revisiting the Simple Seed-based Weak Supervision for Text Classification
Chengyu Dong, Zihan Wang, Jingbo Shang