Primer: AI World Models
The Next Essential AI Revolution in Construction
You can’t build skyscrapers with predictive text.
Large Language Models have exploded into construction discourse. Every feed, every demo, every vendor pitch now includes an AI chatbot. Some of that progress is real. Much of it is noise.
LLMs have changed how we handle information. They have not changed how we handle reality.
Construction does not live in documents, language, or abstractions. It lives in gravity, geometry, friction, weather, and failure. Cause and effect rule everything we do. Any AI that cannot reason about the physical world will always hit a hard ceiling in this industry.
That ceiling is already visible.
This post looks beyond the LLM era. It explains why the next meaningful shift in construction AI is not better text generation, but AI World Models - and what leaders should do now to prepare.
The Limit of Large Language Models
LLMs operate as statistical prediction engines.
Tokens enter (words or images). Probabilities resolve. The next token emerges (words or images).
This approach works extremely well for language, images, and media. It works far less well for systems governed by physics.
LLMs do not understand the world. They approximate it through patterns in text. That distinction matters.
Construction operates inside feedback loops:
Load causes stress → Stress causes failure → Failure causes harm.
An AI that cannot simulate those loops cannot be trusted with decisions that matter.
The result is a mismatch. We apply text-based intelligence to physical problems and then wonder why hallucinations, brittleness, and safety concerns persist.
To move further, construction needs AI that reasons about what will happen, not just what sounds right.
That requires World Models.
What Is An AI World Model
A World Model builds an internal representation of reality and uses it to predict outcomes.
The difference is simple:
A generative model draws a bridge
A world model understands why it stands - and what happens when it doesn’t
World Models encode physics, object permanence, spatial relationships, and causality. They simulate future states rather than describing them.
This distinction is not philosophical. It is operational.
Why LLMs Fail at Coherent Reality
The 2024 paper “Assessing Language Models’ Worldview for Fiction Generation”, often referred to as the ‘Waterloo Study’, tested whether LLMs maintain a stable internal world model. They do not.
Researchers found three consistent failures:
Inconsistency – Rephrased questions produced contradictory answers
Lack of robustness – Models agreed that claims could be both true and false
Narrative collapse – Generated worlds followed familiar story patterns rather than coherent internal logic
The conclusion was clear: LLMs behave as pattern-matching engines, not as systems with grounded understanding.
This is not a flaw. It is a design choice.
World Models exist to close that gap.
LLMs vs World Models (What Actually Changes)
Correlation vs causation
LLMs predict sequences. World Models predict consequences.
Textual training vs physical grounding
LLMs learn from language. World Models learn from video, sensors, and interaction.
Hallucination vs simulation
LLMs generate plausible outputs. World Models enforce environmental consistency over time.
This is why World Models matter for construction. They are built for environments where mistakes propagate, not disappear.
Reality Check: World Models Are Hard
World Models are not magic. They face serious constraints:
Compute intensity – Real-time physics is expensive
Error accumulation – Small inaccuracies compound over time
Uncertainty – The real world is stochastic, not deterministic
Logic gaps – Language reasoning still favours LLMs
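The error-accumulation constraint can be made concrete with a little arithmetic. The sketch below assumes a hypothetical model whose every prediction step is off by the same small relative error, and shows how that error compounds over a rollout. The 1% figure and the step counts are illustrative assumptions, not measurements of any real system.

```python
def compounded_error(per_step_error: float, steps: int) -> float:
    """Relative drift after `steps` predictions, assuming each step
    multiplies the true value by (1 + per_step_error).
    A deliberately simple, hypothetical error model."""
    return (1.0 + per_step_error) ** steps - 1.0


if __name__ == "__main__":
    # Illustrative rollout lengths: short, medium, long.
    for steps in (10, 100, 500):
        drift = compounded_error(0.01, steps)
        print(f"{steps:>3} steps at 1% per step: {drift:.0%} cumulative drift")
```

Under these assumptions, a 1% per-step error grows to roughly 10% after 10 steps and to well over 100% after 100 steps, which is one reason long-horizon simulation is much harder than short-horizon prediction.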
The likely future is hybrid: LLMs handle intent and communication; World Models handle execution and simulation.
That architecture is already emerging.
Inside a World Model Engine
A functional World Model contains three components:
Perception module
Ingests raw inputs (video, LiDAR, telemetry) and compresses them into a world state.
Prediction module
Simulates future states based on proposed actions. This is not video generation. It is causal modelling.
Cost / policy module
Evaluates outcomes against constraints such as safety, collision, or efficiency. High-risk paths are discarded. Better ones are explored.
This loop runs continuously. That loop is intelligence.
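The three components above can be sketched as a toy perceive, predict, evaluate loop. This is a minimal illustration under stated assumptions, not how any production World Model is built: the one-dimensional "load moving along a track towards an obstacle" scenario, the names WorldState, perceive, predict, cost and plan, and all numbers are hypothetical. Real systems learn their prediction step from data rather than hard-coding it.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class WorldState:
    position_m: float    # distance travelled along a straight track
    velocity_mps: float  # current speed along that track


def perceive(raw_position_readings: list[float], velocity_mps: float) -> WorldState:
    """Perception: compress noisy raw readings into a compact world state."""
    return WorldState(mean(raw_position_readings), velocity_mps)


def predict(state: WorldState, accel_mps2: float,
            horizon_s: float, dt: float = 0.1) -> list[WorldState]:
    """Prediction: roll the state forward under a proposed action.
    Hard-coded kinematics stand in for a learned dynamics model."""
    trajectory = []
    pos, vel = state.position_m, state.velocity_mps
    for _ in range(round(horizon_s / dt)):
        vel = max(0.0, vel + accel_mps2 * dt)  # the load never reverses
        pos += vel * dt
        trajectory.append(WorldState(pos, vel))
    return trajectory


def cost(trajectory: list[WorldState], obstacle_m: float,
         safety_margin_m: float = 1.0) -> float:
    """Cost: infinite if any predicted state breaches the safety margin,
    otherwise reward progress (more distance means lower cost)."""
    if any(s.position_m > obstacle_m - safety_margin_m for s in trajectory):
        return float("inf")
    return -trajectory[-1].position_m


def plan(state: WorldState, candidate_accels: list[float],
         obstacle_m: float, horizon_s: float = 3.0):
    """Discard high-risk actions, return the best safe one (or None)."""
    scored = [(cost(predict(state, a, horizon_s), obstacle_m), a)
              for a in candidate_accels]
    best_cost, best_action = min(scored)
    return None if best_cost == float("inf") else best_action


if __name__ == "__main__":
    state = perceive([0.1, -0.1, 0.0], velocity_mps=2.0)
    print(plan(state, [-1.0, 0.0, 1.0], obstacle_m=8.0))
```

In a running system this loop would repeat as new sensor data arrives, so each plan is based on the latest perceived state rather than on a stale prediction.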
The State of the Art (December 2025)
World Models have moved from theory to practice.
Google DeepMind – Genie 3
Generates interactive worlds from text. Runs at real-time frame rates. Supports live counterfactual testing.
Meta – V-JEPA 2
Predicts outcomes in latent space rather than pixels. Enables zero-shot robotic planning with relatively little robot-specific training data.
World Labs – Marble
Creates persistent, editable 3D worlds. Prioritises spatial consistency over visual spectacle.
Think of the progression like this:
Yesterday: AI as a dream
Today: AI as a self-building game engine
Tomorrow: AI as a simulator you can trust
What This Unlocks for Construction
Autonomous robotics
Robots adapt to new sites without retraining. Physical intuition replaces brittle scripting.
Digital rehearsals
Teams simulate dangerous scenarios safely. Weather, failure, and interference become testable inputs.
Spatial intelligence in design
Static BIM models give way to interactive, physics-aware environments.
Planning and logistics
AI reasons about fragility, sequence, and constraint - not just schedules and text.
This is not automation or innovation theatre. This is operational leverage.
The Future Construction AI Stack
The stack evolves in layers:
Sensory layer
Sites prioritise video and spatial data. Edge compute becomes mandatory.
Cognitive layer
LLMs interpret intent. World Models validate feasibility.
Simulation layer
Digital twins become living simulators, not passive viewers.
Execution layer
Robotics shift from programmed behaviour to adaptive planning.
In the next 5-10 years, this stack will become a core capability in construction.
How Leaders Should Position Now
The advantage will not come from algorithms. Those will be commoditised.
The advantage will come from data, infrastructure, and foresight.
Shift from documents to spatial data
Stop discarding raw video. Treat it as a strategic asset and potential future value stream.
The Shift: Most construction firms currently discard the "raw" data from drones and site cameras, keeping only processed outputs or low-res streams. To prepare for World Models like V-JEPA 2 or Genie 3, you must treat high-fidelity video as your most valuable asset.
The Strategy: Build a "Visual Data Moat." The company with the most hours of diverse, real-world construction footage will be best placed to train the smartest World Models, or to license that footage to technology companies that need the training data.
Invest in edge compute
Latency kills safety. Physics demands proximity, enabled by edge computing.
The Shift: You cannot wait for data to travel to a server and back when a robotic arm is about to drop a steel beam.
The Strategy: Begin piloting "Edge AI" infrastructure. Look for hardware and GPUs that can reside on-site to process video data locally. This infrastructure is the prerequisite for the "Execution Layer" of the future tech stack.
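The latency argument comes down to simple arithmetic: a moving load keeps travelling while the system waits for a decision. The sketch below works that out for two cases. The load speed and both latency figures are assumed values chosen for illustration, not measurements from any site or vendor.

```python
def blind_travel_m(speed_mps: float, latency_ms: float) -> float:
    """Distance a moving object covers before a decision can arrive."""
    return speed_mps * latency_ms / 1000.0


if __name__ == "__main__":
    load_speed_mps = 1.5  # assumed speed of a slewing load
    # Assumed round-trip latencies, for illustration only.
    for label, latency_ms in (("cloud round trip", 200.0),
                              ("on-site edge", 20.0)):
        travel = blind_travel_m(load_speed_mps, latency_ms)
        print(f"{label}: {travel:.2f} m travelled before a response")
```

With these assumed numbers, the load moves 0.30 m during the cloud round trip and 0.03 m with on-site processing. The absolute values will differ by site, but the relationship is linear: cutting latency by a factor of ten cuts the uncontrolled travel by the same factor.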
Adopt open 3D standards (OpenUSD)
Closed formats block future ingestion.
The Shift: Moving away from proprietary, locked file formats towards universal scene descriptions.
The Strategy: Adopt OpenUSD (Universal Scene Description) as your future standard for 3D data. It is becoming an industry standard for interoperable 3D worlds, backed by NVIDIA, Pixar, Autodesk, and others. If your digital twins are built on OpenUSD, they will be far easier for future World Models to ingest.
Redefine Digital Twins
A viewer shows what is. A simulator shows what could happen.
The Shift: Moving from digital twins that passively display current conditions to digital twins that can simulate what happens next.
The Strategy: When procuring software or talking to BIM teams, ask: "Can this model simulate physics?" If the answer is no, it will soon be a legacy Digital Twin. Direct your R&D budget towards simulation engines (often borrowed from gaming tech like Unreal Engine or Unity) rather than just CAD viewers.
Conclusion
LLMs digitised information. World Models will digitise reality.
Construction does not need smarter chatbots. It needs systems that understand gravity, causality, and risk.
The winners in this transition will not be the firms with the best AI demos. They will be the firms with the deepest spatial data, the right infrastructure, and the courage to prepare early.
The question is not whether World Models will arrive.
The question is whether your organisation will be ready when they do.
Monday Morning Checklist: Actionable Strategy
While World Models are on the horizon, you cannot wait until 2026 to prepare. Here are four concrete actions you and your team can take starting Monday morning to build the foundation for Spatial Intelligence.
Stop Deleting the “Raw” Footage
Action: Instruct your drone operators and site capture teams to archive the raw, uncompressed video files from site flights.
Why: This video data is the “training food” for future World Models. If you delete it to save storage space, you are deleting your future competitive advantage, or a potentially sellable asset.
The “OpenUSD” Experiment
Action: Task your BIM/digital team to export one current project model into OpenUSD format and open it in NVIDIA Omniverse or another USD-compatible viewer.
Why: This tests your data interoperability and starts building familiarity with this growing open format.
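For teams that have never opened a USD file, it can help to see how small a human-readable .usda layer is. The sketch below writes one from plain Python, with no USD libraries involved. It is only an orientation aid: a real project export would come from your authoring tool's own USD exporter, and the prim names "Site" and "Slab" and the file name are hypothetical placeholders.

```python
from pathlib import Path

# A minimal text-format USD layer: one transform with one cube under it.
USDA_TEXT = '''#usda 1.0
(
    defaultPrim = "Site"
    metersPerUnit = 1
    upAxis = "Z"
)

def Xform "Site"
{
    def Cube "Slab"
    {
        double size = 2
    }
}
'''


def write_sample_layer(path: str) -> Path:
    """Write the sample layer to disk and return its path."""
    out = Path(path)
    out.write_text(USDA_TEXT, encoding="utf-8")
    return out


if __name__ == "__main__":
    print(write_sample_layer("sample_site.usda"))
```

The resulting file should open in a USD-compatible viewer, which makes it a quick way to check that the viewing side of your toolchain works before attempting a full model export.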
Identify One “Counterfactual” Use Case
Action: Identify one scenario where a “What If” simulation would reduce the risk of harm (e.g., “What if this crane lift happens during high winds?”).
Why: This frames the problem for your team. You aren’t looking for AI to write emails; you are looking for AI to simulate risk. This focuses your technology scouting on the right kind of tools (Simulators vs. Generators).
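To show what a "What If" question looks like once it is written down as a simulation, here is a toy counterfactual for the crane example. It uses the standard drag equation for wind force and a simple random gust model to compare a calm day with a windy one. Every number (load area, drag coefficient, gust spread, force limit) is an assumption made for illustration, and the sketch is not an engineering method or a substitute for a lift plan.

```python
import random

AIR_DENSITY = 1.225  # kg/m^3, standard sea-level air


def wind_force_n(wind_mps: float, area_m2: float,
                 drag_coeff: float = 1.2) -> float:
    """Drag equation: F = 0.5 * rho * v^2 * Cd * A."""
    return 0.5 * AIR_DENSITY * wind_mps ** 2 * drag_coeff * area_m2


def exceedance_probability(mean_wind_mps: float, gust_sd_mps: float,
                           area_m2: float, limit_n: float,
                           trials: int = 10_000, seed: int = 0) -> float:
    """Share of simulated gusts whose force on the load exceeds the limit.
    Gusts are drawn from a normal distribution (an assumed gust model)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    exceed = 0
    for _ in range(trials):
        gust = max(0.0, rng.gauss(mean_wind_mps, gust_sd_mps))
        if wind_force_n(gust, area_m2) > limit_n:
            exceed += 1
    return exceed / trials


if __name__ == "__main__":
    area_m2, limit_n = 10.0, 1000.0  # hypothetical load face and side-force limit
    calm = exceedance_probability(4.0, 2.0, area_m2, limit_n)
    windy = exceedance_probability(14.0, 2.0, area_m2, limit_n)
    print(f"calm day:  {calm:.1%} of gusts exceed the limit")
    print(f"windy day: {windy:.1%} of gusts exceed the limit")
```

The value of framing the question this way is that the counterfactual becomes a changed input rather than a debate: swap the wind assumption, rerun, and compare the risk.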
Educate the Board on “Spatial Intelligence”
Action: Share clips of Fei-Fei Li’s “Spatial Intelligence” TED Talk and of Google DeepMind’s Genie with your leadership team.
Why: Managing expectations is key. Leaders need to understand that the next wave of AI isn’t about “smarter chatbots,” but about “robots that understand physics.”

Great post Neil. I recall you mentioning world models before and this has given me a better insight into what they are. Running simulations and scenario planning is where we need to go as an industry. I've referred to this as similar to an F1 strategist who is trying to get the best outcome for the team, based on real time developments in the race. We're focused on too much 'what happened and why', rather than focusing on what could happen and how to exploit opportunities that deliver better projects.
Thank you Neil, you have taught me something today!