Paper ID: 2503.14492 • Published Mar 18, 2025
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
NVIDIA:
Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco...
NVIDIA
TL;DR
Get AI-generated summaries with premium
Get AI-generated summaries with premium
We introduce Cosmos-Transfer, a conditional world generation model that can
generate world simulations based on multiple spatial control inputs of various
modalities such as segmentation, depth, and edge. In the design, the spatial
conditional scheme is adaptive and customizable. It allows weighting different
conditional inputs differently at different spatial locations. This enables
highly controllable world generation and finds use in various world-to-world
transfer use cases, including Sim2Real. We conduct extensive evaluations to
analyze the proposed model and demonstrate its applications for Physical AI,
including robotics Sim2Real and autonomous vehicle data enrichment. We further
demonstrate an inference scaling strategy to achieve real-time world generation
with an NVIDIA GB200 NVL72 rack. To help accelerate research development in the
field, we open-source our models and code at
this https URL
Figures & Tables
Unlock access to paper figures and tables to enhance your research experience.