Interactive Demo

Gmail triage inside an ephemeral VM.

An OpenClaw agent triages your inbox inside a stereOS VM. Gmail credentials exist only in the VM’s file-based keyring, which is destroyed when the VM stops. Every LLM call is recorded to a tapes black box in .mb/tapes/tapes.sqlite.

View openclaw-in-a-box on GitHub →

The Setup

The VM sandbox

The jcard.toml declares the entire sandbox: an opencode-mixtape VM with egress locked to Gmail and Anthropic APIs, a 2-hour auto-teardown timeout, and secrets injected via tmpfs. Gmail OAuth is handled by the gog CLI—tokens are imported into the VM’s file-based keyring and destroyed with the VM.

Why this exists. In February 2026, an OpenClaw agent with Gmail access deleted 200+ emails while ignoring stop commands. The root cause: context window compaction dropped the safety constraint. No network boundary, no kill switch, no flight recorder. This project makes that scenario impossible—stereOS sandboxes the network, tapes records every decision, and the VM self-destructs after 2 hours.
The skill drives everything. The triage logic lives in skills/gmail-triage/SKILL.md—a Markdown file that defines classification rules, safety constraints, and output format. No code. The agent reads the skill, fetches messages via gog, classifies with Claude, and applies actions. Edit the Markdown to change behavior.
Act I

The agent triages your inbox

Inside the VM, OpenClaw loads the gmail-triage skill and uses gog to fetch unread messages. Claude classifies each message into one of four categories, each with a specific action. Safety constraint: never delete, never reply.

inside VM — openclaw agent triaging
┌──────────────────────────────────────────────────────────┐
  inbox                  20 unread threads
  ├── newsletter         12 ──▶ archived
  ├── receipt             3 ──▶ labeled + archived
  ├── action-needed       2 ──▶ starred + labeled
  └── fyi                 3 ──▶ marked as read

  every LLM call recorded to .mb/tapes/tapes.sqlite
└──────────────────────────────────────────────────────────┘
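The category-to-action mapping and the safety constraint can be pictured as a small lookup table with a guard. The project itself expresses these rules in Markdown (SKILL.md), not code, so the following is only an illustrative sketch: the action names (`archive`, `label`, `star`, `mark_read`) and the `plan_actions` helper are hypothetical and are not taken from gog or OpenClaw.

```python
from enum import Enum


class Category(Enum):
    NEWSLETTER = "newsletter"
    RECEIPT = "receipt"
    ACTION_NEEDED = "action-needed"
    FYI = "fyi"


# Hypothetical action names mirroring the diagram above; the real
# rules live in skills/gmail-triage/SKILL.md.
ACTIONS = {
    Category.NEWSLETTER: ["archive"],
    Category.RECEIPT: ["label", "archive"],
    Category.ACTION_NEEDED: ["star", "label"],
    Category.FYI: ["mark_read"],
}

# Safety constraint from the skill: never delete, never reply.
FORBIDDEN = frozenset({"delete", "reply"})


def plan_actions(category: Category) -> list[str]:
    """Return the actions for a category, refusing any forbidden one."""
    actions = ACTIONS[category]
    blocked = FORBIDDEN.intersection(actions)
    if blocked:
        raise ValueError(f"forbidden actions in plan: {sorted(blocked)}")
    return list(actions)
```

Keeping the forbidden set separate from the mapping means a later edit to the table cannot silently introduce a delete or reply action.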
The black box. tapes sits between OpenClaw and the Anthropic API as a transparent proxy, capturing every request and response to .mb/tapes/tapes.sqlite. Content-addressed hash chains make the sequence tamper-evident. If the agent miscategorizes an email, replay the conversation to see exactly what input it received and what reasoning it produced.
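To illustrate why a hash chain makes a log tamper-evident, here is a minimal sketch of the general technique. It is not the tapes storage format: the record layout, the SHA-256 choice, and the `append` and `verify` helpers are assumptions made for this example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's content."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a payload, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload,
                  "hash": entry_hash(prev, payload)})


def verify(chain: list) -> bool:
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous one, changing or removing any recorded request or response invalidates every entry after it.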
The Boundary

What the wall looks like from inside

The agent can talk to Gmail and the Anthropic API. It cannot talk to anything else. If it tries to reach a domain that isn’t on the list—to exfiltrate data, phone home to an unvetted server, or download something unexpected—the request fails immediately. Not a timeout. A hard no.
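The allowlist idea can be sketched in a few lines. In the real setup the boundary is enforced by the stereOS sandbox at the network level, not by application code, so this is only a conceptual model. The two hostnames are assumptions about what the list contains, and `check_egress` and `EgressDenied` are hypothetical names.

```python
from urllib.parse import urlsplit

# Assumed allowlist entries: the public Gmail and Anthropic API hosts.
# The actual list is declared in jcard.toml.
ALLOWED_HOSTS = frozenset({"gmail.googleapis.com", "api.anthropic.com"})


class EgressDenied(Exception):
    """Raised when a request targets a host outside the allowlist."""


def check_egress(url: str) -> str:
    """Return the hostname if allowed; otherwise fail immediately."""
    host = (urlsplit(url).hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        raise EgressDenied(f"egress to {host!r} is not allowed")
    return host
```

The check is an exact match on the hostname, so a lookalike such as `api.anthropic.com.example.net` is rejected rather than matched by prefix.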

Act II

You review the report

The agent writes output/INBOX_REPORT.md with a structured summary. SSH in, read the report, check the black box.
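One low-risk way to start checking the black box is to open the SQLite file read-only and list its tables. The sketch below assumes nothing about the tapes schema; it only queries SQLite's built-in `sqlite_master` catalog. The `list_tables` helper is a hypothetical name.

```python
import sqlite3


def list_tables(db_path: str) -> list[str]:
    """List table names in a SQLite file, opened read-only."""
    # mode=ro prevents accidental writes to the recording.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_tables(".mb/tapes/tapes.sqlite")` shows which tables exist before you write any schema-specific queries.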

Teardown

Clean shutdown — credentials destroyed

Tear down the VM. The gog token in the file-based keyring is gone. The Anthropic API key in tmpfs is gone. What persists on the shared mount: .mb/tapes/tapes.sqlite (the black box), output/ (reports), and .openclaw/ (agent config for next boot).

Ephemeral access. Permanent audit trail. The Gmail OAuth token existed only in the VM’s file-based keyring for the duration of the session. The Anthropic API key lived in tmpfs RAM. Both are gone. But the complete decision log—every prompt, every classification, every action—lives on in .mb/tapes/tapes.sqlite. Replay it, audit it, learn from it. Next time: mb up and the agent is ready again.

Agents need black box recorders.

stereOS sandboxes the agent. tapes records every decision. OpenClaw drives the skill.
The VM self-destructs. The recording is permanent.
