What in the world are “World Foundation Models”?
If you’re unfamiliar with the phrase “World Foundation Models” (WFMs), that makes sense: the term is pretty new and was most likely coined by Nvidia. It joins two existing (but also recent) concepts: “world models” (AI systems that create internal representations of their environment to simulate and predict complex scenarios) and “foundation models” (AI systems trained on vast datasets that can be adapted for a wide range of tasks).
According to Nvidia, WFMs are an easy way to generate massive amounts of photoreal, physics-based synthetic data for training existing models or building custom ones. Robot developers can add their own data, such as videos captured in their own factory, then let Cosmos expand that basic scenario into thousands of variations, so that the robot’s software can learn to choose the correct or best movements for the task at hand.
The Cosmos platform includes generative WFMs, advanced tokenizers, guardrails, and an accelerated video processing pipeline. Developers can use Nvidia’s Omniverse to create geospatially accurate scenarios that account for the laws of physics. Then, they can output these scenarios into Cosmos, creating photorealistic videos that provide the data for robotic reinforcement learning feedback.