State of the Art: Large Language Models
Research on large language models (LLMs) currently focuses on rigorously evaluating their capabilities across diverse domains and tasks, including mathematical reasoning, foreign language comprehension, and specialized professional exams. This involves developing new benchmarks and evaluation methodologies, often incorporating techniques like chain-of-thought prompting and knowledge retrieval to improve performance and assess reasoning processes. These efforts aim to understand LLMs' strengths and weaknesses, ultimately leading to more reliable and trustworthy models with broader applicability in various fields, from healthcare and education to aerospace engineering and cybersecurity.
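The summary above mentions chain-of-thought prompting as one technique used to improve and probe LLM reasoning. As a rough illustration only, the sketch below contrasts a direct prompt with a chain-of-thought prompt for a simple reasoning question; `query_llm` is a hypothetical placeholder for whatever LLM client is actually in use, not an API from any of the papers listed here.

```python
# Minimal sketch: direct prompting vs. chain-of-thought (CoT) prompting.
# `query_llm` is a hypothetical stand-in; swap in a real LLM client call.

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real client invocation."""
    raise NotImplementedError("Plug in an actual LLM client here.")


QUESTION = (
    "A train travels 60 km in 45 minutes. "
    "At the same speed, how far does it travel in 2 hours?"
)

# Direct prompting: ask for the answer only.
direct_prompt = f"{QUESTION}\nAnswer with a single number in km."

# Chain-of-thought prompting: ask the model to reason step by step
# before giving a final answer, which tends to help on multi-step
# reasoning tasks and exposes the intermediate reasoning for evaluation.
cot_prompt = (
    f"{QUESTION}\n"
    "Let's think step by step. Show your reasoning, then give the final "
    "answer on a new line as 'Answer: <number> km'."
)

if __name__ == "__main__":
    for name, prompt in [("direct", direct_prompt), ("chain-of-thought", cot_prompt)]:
        print(f"--- {name} prompt ---")
        print(prompt)
        # print(query_llm(prompt))  # uncomment once a real client is wired in
```

In evaluation settings like those surveyed here, the chain-of-thought variant also lets graders (human or automatic) inspect the intermediate steps rather than only the final answer.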
Papers
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading
Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Ternava, Jianfeng Gao, Tobias Röddiger, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues
Efficient Prompting for LLM-based Generative Internet of Things
Bin Xiao, Burak Kantarci, Jiawen Kang, Dusit Niyato, Mohsen Guizani