Meta’s V-JEPA 2 World Model Brings AI Closer to Thinking Before Acting
Monday, June 16, 2025, 11:04 PM, from eWeek
Meta’s newest AI model, V-JEPA 2 (Video Joint Embedding Predictive Architecture), is described as a “world model” for its ability to simulate and reason about physical interactions. V-JEPA 2 doesn’t just identify objects — it can infer physical properties like gravity, anticipate how moving objects behave, avoid obstacles, and learn new tasks by analyzing what it sees.
The June 11 news release from Meta reads, in part: “Today, we’re excited to share V-JEPA 2, our state-of-the-art world model, trained on video, that enables robots and other AI agents to understand the physical world and predict how it will respond to their actions. These capabilities are essential to building AI agents that can think before they act, and V-JEPA 2 represents meaningful progress toward our ultimate goal of developing advanced machine intelligence (AMI).”

What is a world model?

Unlike most traditional AI models, which rely on statistical patterns to complete tasks, world models build internal representations of their surroundings, allowing them to simulate how the environment works. This lets a model form multi-step plans and predict how its own actions, and the actions of others, will affect the world around it. To be classified as a world model, the AI must be capable of:

Understanding
Predicting
Planning

V-JEPA 2 was built with these three goals at its core, improving on its predecessor with more precise forecasting, better generalization, and stronger pattern recognition.

How was V-JEPA 2 trained?

In addition to being trained on more than a million hours of video, V-JEPA 2 was fine-tuned on 62 hours of real-world robotic interaction data. It was later tested on robotic arms across different settings, where it successfully executed object manipulation tasks like grasping and placing, without prior exposure to those objects. The model demonstrated strong generalization, adapting to unfamiliar scenarios without needing demonstration-based learning. Performance benchmarks showed that V-JEPA 2 significantly outperformed its earlier version in areas like motor control, prediction, and physical reasoning.

Introducing new AI benchmarks

The team at Meta also released three new AI benchmarks alongside V-JEPA 2.
These benchmarks were designed to evaluate how effectively a model understands the real world from video.

IntPhys 2: The first benchmark examines a model’s ability to distinguish physically plausible events from implausible ones.
Minimal Video Pairs (MVPBench): This benchmark tests how well a model understands video through multiple-choice questions, pairing each video with a minimally different one to rule out shortcut answers.
CausalVQA: Finally, this benchmark scores a model’s ability to answer basic cause-and-effect questions about what it sees.

Paving the way for smarter robotics

While the original V-JEPA marked an important step in bringing embodied intelligence to machines, V-JEPA 2 significantly advances the field. Built on the same foundation, the upgraded model delivers major improvements in predictive accuracy and task generalization, making it well-suited for use in next-generation robotics, wearables, and autonomous systems.

Read eWeek’s coverage of Meta’s new superintelligence lab, where Mark Zuckerberg outlined his vision for advancing AI beyond today’s frontier models.
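The understand/predict/plan loop described above can be sketched in a few lines of code. This is a toy illustration only: the encoder and dynamics model here are fixed random linear maps standing in for the learned networks in a real world model, and none of the names below come from Meta’s code or V-JEPA 2’s actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for learned networks (illustration only):
ENCODER = rng.normal(size=(8, 4))       # raw observation -> latent state
DYNAMICS = rng.normal(size=(4 + 2, 4))  # (latent, action) -> next latent

def encode(obs):
    """'Understanding': embed a raw observation into latent space."""
    return obs @ ENCODER

def predict(latent, action):
    """'Predicting': roll the latent state forward under an action."""
    return np.concatenate([latent, action]) @ DYNAMICS

def plan(obs, goal_obs, candidate_actions):
    """'Planning': choose the action whose predicted outcome lands
    closest to the goal's latent embedding."""
    z, z_goal = encode(obs), encode(goal_obs)
    scores = [np.linalg.norm(predict(z, a) - z_goal) for a in candidate_actions]
    return int(np.argmin(scores))

actions = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, 0.0])]
obs = rng.normal(size=8)
goal = rng.normal(size=8)
best = plan(obs, goal, actions)
print("chosen action index:", best)
```

The key design point this sketch captures is that planning happens entirely in latent space: the model never renders pixel-level futures, it only compares predicted embeddings against a goal embedding, which is what makes simulating many candidate actions cheap.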
https://www.eweek.com/news/meta-vjepa-2-world-model-ai/