Full Model
"Full Model" research encompasses the development and improvement of large-scale machine learning models across diverse applications, aiming to enhance performance, efficiency, and robustness. Current research focuses on addressing model vulnerabilities (e.g., adversarial attacks, hallucinations), improving efficiency for resource-constrained devices, and developing specialized models for specific domains (e.g., finance, astronomy, medical imaging). This work is significant for advancing AI capabilities in various fields and for mitigating potential risks associated with deploying complex models in real-world settings.
Papers
The Only Way is Ethics: A Guide to Ethical Research with Large Language Models
Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Björn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch
BabyHGRN: Exploring RNNs for Sample-Efficient Training of Language Models
Patrick Haller, Jonas Golde, Alan Akbik
Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference
Jorge García-Carrasco, Alejandro Maté, Juan Trujillo
From Model Based to Learned Regularization in Medical Image Registration: A Comprehensive Review
Anna Reithmeir, Veronika Spieker, Vasiliki Sideri-Lampretsa, Daniel Rueckert, Julia A. Schnabel, Veronika A. Zimmer
Humanlike Cognitive Patterns as Emergent Phenomena in Large Language Models
Zhisheng Tang, Mayank Kejriwal
TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use
Junjie Ye, Yilong Wu, Sixian Li, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan, Zhengyin Du
Quantifying detection rates for dangerous capabilities: a theoretical model of dangerous capability evaluations
Paolo Bova, Alessandro Di Stefano, The Anh Han
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
Reza Shirkavand, Peiran Yu, Shangqian Gao, Gowthami Somepalli, Tom Goldstein, Heng Huang
Fundamental Risks in the Current Deployment of General-Purpose AI Models: What Have We (Not) Learnt From Cybersecurity?
Mario Fritz
ResoFilter: Fine-grained Synthetic Data Filtering for Large Language Models through Data-Parameter Resonance Analysis
Zeao Tu, Xiangdi Meng, Yu He, Zihan Yao, Tianyu Qi, Jun Liu, Ming Li
ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model
Shunlin Lu, Jingbo Wang, Zeyu Lu, Ling-Hao Chen, Wenxun Dai, Junting Dong, Zhiyang Dou, Bo Dai, Ruimao Zhang
Benign Overfitting in Out-of-Distribution Generalization of Linear Models
Shange Tang, Jiayun Wu, Jianqing Fan, Chi Jin
Are Longer Prompts Always Better? Prompt Selection in Large Language Models for Recommendation Systems
Genki Kusano, Kosuke Akimoto, Kunihiro Takeoka
A Unifying Information-theoretic Perspective on Evaluating Generative Models
Alexis Fox, Samarth Swarup, Abhijin Adiga
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models
Yinlam Chow, Guy Tennenholtz, Izzeddin Gur, Vincent Zhuang, Bo Dai, Sridhar Thiagarajan, Craig Boutilier, Rishabh Agarwal, Aviral Kumar, Aleksandra Faust
On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models
Aditya Bhaskara, Agastya Vibhuti Jha, Michael Kapralov, Naren Sarayu Manoj, Davide Mazzali, Weronika Wrzos-Kaminska
The Multiplex Classification Framework: optimizing multi-label classifiers through problem transformation, ontology engineering, and model ensembling
Mauro Nievas Offidani, Facundo Roffet, Claudio Augusto Delrieux, Maria Carolina Gonzalez Galtier, Marcos Zarate
jinns: a JAX Library for Physics-Informed Neural Networks
Hugo Gangloff, Nicolas Jouvin
Data-Efficient Inference of Neural Fluid Fields via SciML Foundation Model
Yuqiu Liu, Jingxuan Xu, Mauricio Soroco, Yunchao Wei, Wuyang Chen