Introducing Paper Compute Co: Infrastructure for Running AI Agents in Production

We’ve spent years looking at systems from two different vantage points.

One from deep inside how developers actually build. Not watching from the outside, but embedded in the conversations, the open source communities, the real workflows most infrastructure never gets designed for.

The other from inside the systems themselves. Building infrastructure that has to hold up in production. Where every action matters. Where systems need to be understood, audited, and trusted and not just “working.”

Both perspectives kept running into the same problem. You can see that something happened, but you can’t answer the question that actually matters:

“What exactly happened and can you prove it?”

The AI Gap

The second an agent touches production, the questions might not come right away. But if something breaks, they’ll come rapid fire: What exactly did it do? Can we audit it? Can we prove it? If that agent just accessed customer data, and your answer is ‘We can check the logs later,’ you don’t have an answer. You have the Agent Gap.

We built Paper Compute to close it.

What We Built to Close the Agent Gap

tapes

We built tapes as a zero-instrumentation observability layer for AI agents. It works as a reverse proxy at the network layer, sitting between your agent and the inference provider, transparently capturing every action and output. No SDK. No code changes. One environment variable. If an agent ran, you have a cryptographically verified, tamper-proof record of what happened.

But the record is just the beginning.

Engineering leads can see token usage, cost, and how AI is actually being used
New engineers can replay sessions instead of relying on docs or walkthroughs
Teams can turn successful runs into reusable skills
Security and compliance teams get audit trails that are provable — not reconstructed

The operational knowledge that usually lives in someone’s head or disappears when they leave starts living somewhere real.

With anomaly detection, you can determine where agents get stuck, systems are fragile, and the complexity is quietly building up. And instead of just alerting, you get a path to resolve it. We saw this firsthand — our Pokémon agent was spamming inputs during battle animations. tapes flagged the anomaly, and the fix came directly from that detection.

Learn more in our introduction to tapes and the tapes deep dive.

stereOS

Agents don’t just need a runtime. They need an actual operating system because agents don’t just fail quietly. They fail creatively. They loop. They escalate. They pull in dependencies you didn’t approve. They touch systems you didn’t intend. And most setups give them far more access than they should have. Your team shouldn’t have to make tradeoffs with capability vs isolation vs cost.

stereOS is built for that failure mode.

stereOS is a hardened Linux operating system (built on NixOS), purpose-built for agents that provides a secure base layer (a hardened VM) and then runs agents inside sandboxed environments using gVisor.

Each agent gets:

Its own virtualized kernel boundary (via gVisor)
A read-only, content-addressed /nix/store root filesystem
Fast startup without full VM overhead

So without breaking isolation, agents can:

Spin up sub-agents
Install, build, and run complex systems
Operate like real systems

You boot stereOS once. Each agent runs in its own sandbox. If an agent escapes, it doesn’t reach your machine. It’s still contained inside stereOS.

Isolation is enforced below the application layer, so agents can use GPUs and system-level features without compromising those boundaries. Read the full introduction to stereOS.

“tapes shows you what happened. stereOS makes sure it can’t go further than it should.”

See It Working

The Gmail agent demo shows an agent running a real workflow with a full execution history. Not just what it did. Exactly how it happened. Step by step. Replayable.

Try the Gmail agent demo →

The parallel agents demo shows agents spawning agents — each isolated, each tracked, each recoverable. No orchestration framework. A CLI that works whether a human or an agent is calling it.

Try the parallel agents demo →

What We’re Building Toward

Agents deserve the same rigor we gave production software.

Observable. Auditable. Durable.

Systems teams can actually trust.

That work is just starting. We’d love to know what you’re building and what’s blocking you from running agents in production today.

Try the demos. Reach out. Tell us what’s missing.