Last November, Philipp Schmid, a technical lead at Hugging Face, posted a question that got 50,000 views in a few days.
Agents executing their own code is inevitable; it's too powerful not to happen. But I’m stuck on the architecture. Do we run this locally on the user's device, or safely in the cloud?
— Philipp Schmid (@_philschmid) November 9, 2025
Local sandboxes are the dream. Offline-first, zero latency, native access to your data. But can…
The replies lit up because every engineer building with agents was asking the same thing and nobody had a clean answer.
Once agents got good enough to actually do things (clone a repo, install dependencies, run commands, modify files, execute the code they just wrote), the natural question became: where does this actually run?
When Docker shipped their sandbox announcement earlier this year, they were thinking about the right question. Sandboxes are a genuinely reasonable first answer.
For devs asking “how do I run coding agents without breaking my machine?”
— Docker (@Docker) February 2, 2026
Docker Sandboxes are now available.
They use isolated microVMs so agents can install packages, run Docker, and modify configs - without touching your host system.
Read more → https://t.co/VjlWMG5wqF
Browsers have sandboxed JavaScript for decades. Security teams are comfortable with the model. You put something risky in a temporary environment, observe what happens, throw the environment away. Clean and familiar.
But sandboxes were designed for a specific thing: short-lived, contained execution. You run a snippet, observe the result, discard the environment. The whole model assumes you’re dealing with something temporary and untrusted that you want to poke at safely.
Agents aren’t snippets.
To be fair, not every agent needs a full computer either. A narrow agent that reads a file, answers a question, and exits doesn’t need a dedicated VM. Lightweight sandboxes (Docker, Firecracker, WASM) are still the right tool for short-lived, read-only, or tightly scoped work. The question isn’t sandbox vs. computer as a binary; it’s about matching the execution environment to what the agent actually does. And the moment an agent starts writing, installing, looping, and making decisions over time, the sandbox model starts to strain.
An agent doesn’t execute a program. It runs a loop.
Write code. Execute it. Observe the result. Plan the next step. Repeat.
That loop might run for thirty seconds. It might run for six hours. During that time the agent might install a package, rewrite a file, call an external API, spawn a subprocess, and modify the code it just wrote based on what it observed.
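The loop is easy to sketch. Here is a minimal version in Python; the planner function stands in for the LLM call, and the names are illustrative, not any real agent framework's API:

```python
import subprocess
import sys

def run_step(code: str) -> str:
    """Execute a snippet and capture what the agent will observe next."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout + result.stderr

def agent_loop(plan_next_step, goal: str, max_steps: int = 50):
    """Write code, execute it, observe the result, plan the next step, repeat."""
    history = []
    for _ in range(max_steps):
        code = plan_next_step(goal, history)   # hypothetical LLM call
        if code is None:                       # the model decides it's done
            break
        observation = run_step(code)
        history.append((code, observation))    # state accumulates across steps
    return history
```

Notice what the sketch leaves out: everything `run_step` does to the filesystem, network, and installed packages persists into the next iteration. That accumulated state is exactly what a disposable sandbox isn't designed to hold.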
At some point in that process you’re not watching a script run. You’re operating a software system that evolves during execution.
This is where the sandbox model starts to show its limits.
Typical Agent Stack Today
LLM
│
▼
Tools
│
▼
Code Execution
│
▼
??? Where does this run ???

Most agent stacks define the intelligence layer and the tool layer. The execution environment is usually an afterthought.
A sandbox answers one question: how do we stop this code from damaging the host?
But running agents in production requires a different answer entirely: how do we run this safely, predictably, and with enough visibility to understand what actually happened?
Those aren’t the same problem.
Containment prevents damage. Operation provides control. And control means things like lifecycle management, filesystem policies, network boundaries, reproducible environments, and the ability to audit what the agent actually touched, not just what it reported back.
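To make that concrete, here is one way those controls could be expressed as a declarative policy. This is a hedged sketch; the field names are illustrative assumptions, not a real stereOS API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExecutionPolicy:
    max_runtime_seconds: int       # lifecycle: a hard stop, not a prompt-level suggestion
    writable_paths: tuple          # filesystem policy: everything else is read-only
    allowed_hosts: tuple           # network boundary: deny by default
    snapshot_image: str            # reproducibility: pin the environment the agent boots into
    audit_log: str                 # visibility: record what the agent actually touched

    def allows_write(self, path: str) -> bool:
        return any(path.startswith(prefix) for prefix in self.writable_paths)

policy = ExecutionPolicy(
    max_runtime_seconds=6 * 3600,
    writable_paths=("/workspace",),
    allowed_hosts=("api.github.com",),
    snapshot_image="agent-base:2026-03",
    audit_log="/var/log/agent/run.jsonl",
)
```

The point of the shape, not the specifics: these constraints live in the execution layer, enforced regardless of what the model decides to do.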
Rich Harang made this point clearly:
“Soft controls at the model level — guardrails, behavioral constraints, prompt-level restrictions — are inconsistent and model-specific. What actually holds is hard controls at the execution layer.”
That’s not a sandbox problem. That’s an infrastructure problem.
If you look at most agent stacks today, there’s a layer that’s either undefined or improvised.
You have the LLM. You have the tools. You have execution happening somewhere. But that somewhere is usually a developer’s laptop, a CI runner, a container spun up on demand, or a temporary sandbox that wasn’t really designed for long-running autonomous software.
That works fine while agents are experiments.
It breaks down the moment agents are doing real work inside real systems, because now you have autonomous programs interacting with your filesystem, your network, your infrastructure, in a loop, for hours, with nobody watching.
And Philipp’s question is still sitting there unanswered.
Nick Vasilescu framed it better than anyone:
Agents don't need "compute".
— nick vasilescu (@nickvasiles) March 5, 2026
They need a computer.
A proper desktop, browser, file, and audio. Not just some code sandbox.
With Orgo, the environment for your agent is fast, isolated, and reproducible.
The constraint is no longer the models, it's the computers that they… https://t.co/eCpEau4yXR
That reframe is small and it changes everything.
We talk about giving agents “compute” the way we talk about provisioning CPU for a service. Abstract, fungible, something you allocate from a pool. But that framing misses what agents actually need, which is much closer to what a developer needs. A place to work. A filesystem to navigate. A network to reach out from. An environment that persists across the steps of a long-running task and is fully isolated from everything else.
Humans have laptops. Services have servers. Agents need their own computers.

Not a sandbox. Not a microVM stripped of hardware access. A full virtual machine where the agent gets its own kernel, its own memory, its own disk, its own network. Nothing shared with the host. Isolated enough that you can give the agent real access to real tools without worrying about blast radius.
This is the problem stereOS is built to solve.
Agent Stack With a Real Execution Layer
LLM
│
▼
Tools
│
▼
Agent Computer
(stereOS VM)
│
▼
Infrastructure

Once an agent can write and execute code, it needs a computer of its own.
stereOS isn’t a sandbox. It’s a self-hosted, on-metal runtime environment designed specifically for agents, and it’s built on full virtual machines deliberately. Each agent gets its own kernel, RAM, disk, and network environment. That design enables strong isolation, hardware access for GPU passthrough, compliance environments like FIPS, and the ability to run on your own hardware without handing control to a cloud provider.
The tradeoff is real: full VMs carry more boot latency and overhead than containers or microVMs. At scale, that matters. stereOS addresses this through a lightweight NixOS base and gVisor, but it’s worth being honest that hybrid models (lightweight sandboxes for short tasks, full VMs for long-running autonomous work) may still make sense depending on your workload. The goal isn’t to run everything in a full VM. It’s to have the right environment available when the agent actually needs it.
Instead of throwing an agent into a disposable environment and hoping for the best, stereOS gives it a computer to actually work on.
And once agents have real computers to run on, the rest of the operational picture comes into focus. You can manage lifecycle, controlling when an agent starts, what it has access to, and when it stops. You can observe what actually happened. You can replay a run. Tools like Masterblaster and TAPES handle those layers, but none of it works without the execution foundation underneath.
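What a lifecycle with a built-in audit trail might look like, in rough outline. None of these class or method names come from stereOS, Masterblaster, or TAPES; this is an illustrative sketch of the shape, not their APIs:

```python
import json
import time

class AgentComputer:
    """A hypothetical handle to an agent's VM: lifecycle plus an audit trail."""

    def __init__(self, image: str):
        self.image = image
        self.events = []  # the audit trail: what actually happened, in order

    def _record(self, kind: str, detail: str):
        self.events.append({"t": time.time(), "kind": kind, "detail": detail})

    def start(self):
        self._record("lifecycle", f"booted VM from {self.image}")

    def run(self, command: str):
        # A real system would also capture filesystem and network effects here,
        # not just the command the agent claims to have run.
        self._record("exec", command)

    def stop(self):
        self._record("lifecycle", "VM destroyed")

    def replay(self) -> str:
        """Serialize the run so it can be audited or replayed later."""
        return json.dumps(self.events)

vm = AgentComputer("agent-base:2026-03")
vm.start()
vm.run("pip install requests")
vm.stop()
```

The key property is that the trail is produced by the execution layer itself, so "what did the agent touch at 2am" is answered by the infrastructure, not by the agent's own report.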
“stereOS gives every agent its own computer — a full execution environment with isolation, resources, and policies.”
The industry keeps asking: how do we sandbox agents?
But that question assumes agents are experiments. They aren’t. They’re software systems. And software systems don’t run in sandboxes. They run on computers with real infrastructure underneath them, with real controls, real observability, and real operational ownership.
The ecosystem is moving fast. Native computer-use in models, macOS-native agent sandboxes, enterprise cloud PCs, and projects like AgentStation and Orgo are all converging on the same realization from different directions: the execution environment is the problem that needs solving.
Daytona is one worth watching. They started as developer sandbox infrastructure: secure, elastic environments for running code in isolation. They’ve already made the leap from sandboxes to agent computers. When the people who built sandboxes start building computers, the thesis has landed.
You can't beat the platform you're built on top of.
— Ivan Burazin (@ivanburazin) March 12, 2026
Our competitors' architecture:
- Rent AWS compute
- Spin up virtual machines
- Put sandboxes inside VMs
- When full, spin up more VMs
Our architecture:
- Own bare metal
- Machines always idle, always ready
- Fire up…
stereOS is an open, hardened option in that wave, not the only answer. But the wave itself is a signal. The industry is figuring out that agents need somewhere real to live.
As Vasilescu put it: the constraint is no longer the models. It’s the computers they operate on.
We’ve spent years making agents smarter. It’s time to give them somewhere real to work.
What does your agent infrastructure look like right now? And if something went sideways at 2am, would you know exactly what it touched?
Humans have laptops. Services have servers. Agents need computers too.
We are launching soon. Subscribe for early access.