Text Classification
Text classification automatically assigns text to predefined categories, motivated by the need for efficient and accurate information processing across diverse domains. Current research focuses on pretrained language models such as BERT and large language models (LLMs) such as Llama 2, often combined with fine-tuning, data augmentation, and active learning, alongside traditional machine learning methods such as SVMs and XGBoost. These advances are improving the accuracy and efficiency of text classification, with implications for applications ranging from medical diagnosis and financial analysis to social media monitoring and legal research. Ensuring model robustness, interpretability, and fairness remains a key challenge, particularly with imbalanced datasets or noisy labels.
Papers
Effects of term weighting approach with and without stop words removing on Arabic text classification
Esra'a Alhenawi, Ruba Abu Khurma, Pedro A. Castillo, Maribel G. Arenas
KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection
Michal Spiegel, Dominik Macko
STENCIL: Submodular Mutual Information Based Weak Supervision for Cold-Start Active Learning
Nathan Beck, Adithya Iyer, Rishabh Iyer
Text Categorization Can Enhance Domain-Agnostic Stopword Extraction
Houcemeddine Turki, Naome A. Etori, Mohamed Ali Hadj Taieb, Abdul-Hakeem Omotayo, Chris Chinenye Emezue, Mohamed Ben Aouicha, Ayodele Awokoya, Falalu Ibrahim Lawan, Doreen Nixdorf
APT-Pipe: A Prompt-Tuning Tool for Social Data Annotation using ChatGPT
Yiming Zhu, Zhizhuo Yin, Gareth Tyson, Ehsan-Ul Haq, Lik-Hang Lee, Pan Hui