Paper ID: 2410.00182

Zero-Shot Classification of Crisis Tweets Using Instruction-Finetuned Large Language Models

Emma McDaniel, Samuel Scheele, Jeff Liu

Social media posts are frequently identified as a valuable source of open-source intelligence for disaster response, and pre-LLM NLP techniques have been evaluated on datasets of crisis tweets. We assess three commercial large language models (OpenAI GPT-4o, Gemini 1.5-flash-001 and Anthropic Claude-3-5 Sonnet) capabilities in zero-shot classification of short social media posts. In one prompt, the models are asked to perform two classification tasks: 1) identify if the post is informative in a humanitarian context; and 2) rank and provide probabilities for the post in relation to 16 possible humanitarian classes. The posts being classified are from the consolidated crisis tweet dataset, CrisisBench. Results are evaluated using macro, weighted, and binary F1-scores. The informative classification task, generally performed better without extra information, while for the humanitarian label classification providing the event that occurred during which the tweet was mined, resulted in better performance. Further, we found that the models have significantly varying performance by dataset, which raises questions about dataset quality.

Submitted: Sep 30, 2024