Definition
Skill extraction is the process of reading a recorded agent session and producing a draft skill — the trigger, tool sequence, decisions, and troubleshooting that helped the session succeed — so the pattern can be reviewed, versioned, and reused on later runs instead of rediscovered.
Skill extraction is the step that turns a recorded agent session into a reusable skill: the procedure, the tool sequence, the decisions, and the known fixes that helped that session succeed, lifted into a draft you can review and reuse. In the continuous agent improvement loop, extraction is the move between capturing a session and applying what it taught — the point where a one-time result becomes something a later run can reuse instead of rediscovering.
Quick breakdown
Extraction is a transformation. Each part of a recorded session maps to a part of the resulting skill.
| Recurring symptom → Trigger | The error or task type that started the session becomes a signal for when the skill applies. |
|---|---|
| Successful tool calls → Procedure | The ordered calls that resolved the task become the skill's recommended steps. |
| Branch after an error → Decision point | A choice the agent made based on observed state becomes a documented decision. |
| Error and the fix that cleared it → Troubleshooting | Symptom, likely cause, and fix become a reusable lookup. |
| Source sessions → Provenance | The draft records which sessions it came from and that it was AI-generated, so its origin is auditable. |
What a recorded session contains that extraction reads
Extraction is only as good as the session record under it. A useful record is the full trajectory of a run rather than a summary written afterward. It holds the prompts, the model responses, the tool calls and their arguments, the errors returned, the fixes attempted, and the outcome, captured as the run happened.
Extraction reads what already worked in a session, instead of guessing what should work next time.
This is the practical difference between a skill and a prompt. A prompt is written from intent. A skill is drafted from a decision trail that already produced a result. The session record is the evidence, and extraction is the step that turns that evidence into something reusable.
What skill extraction keeps and what it discards
A raw session is mostly noise. The job of extraction is to keep the part that generalizes and drop the part that was specific to one run.
| Kept (tends to generalize) | Discarded (run-specific) |
|---|---|
| The tool sequence that resolved the task | Timestamps and wall-clock timing |
| Error signatures and the fixes that cleared them | Credentials and secrets, redacted at capture |
| Decision branches based on observed state | One-off literal values unique to that run |
| Conditions that say when the skill applies | Conversational filler that did not change the path |
| Inputs that materially changed the outcome | Dead ends the session already superseded |
The hard part is the boundary. Keep too much and the skill only fires on the exact session it came from. Keep too little and the skill is a vague suggestion. Good extraction leans toward the procedure and the decisions, and away from the literals.
How to generate a skill from a recorded session
In paper console, skill generation runs from a recorded session. Open the session, choose Generate Skill, and the server runs extraction over the session and returns a draft. The draft opens in an editor so you can read it, adjust it, and decide whether it is worth keeping.
Recorded session (captured by paper cli)
prompts · tool calls · errors · fixes · outcome
│
▼
paper console: Generate Skill
(server runs extraction over the session)
│
▼
Draft skill (Markdown + structured fields)
├── type: workflow | prompt-template | domain-knowledge
├── procedure, decisions, troubleshooting
├── originating sessions + AI-generated flag
└── version + visibility (private or team)
│
▼
Review and edit → save A generated skill carries structure beyond its text. It has a type — a workflow, a prompt template, or domain knowledge — so the library can tell procedures apart from reference material. It records the sessions it was generated from and that it was AI-generated, which keeps its origin traceable. It is versioned, and its visibility can be scoped to you or shared with the team.
Why a generated skill is a draft, not a finished skill
Extraction produces a draft. Adoption is a separate, human decision.
A generated skill is a starting point a person reviews, not a verdict the system commits on its own.
Two failure modes make the review step matter. First, a session can look successful without being so: the agent declared victory, but the fix was incidental, and extracting from it would bake in a pattern that does not hold. Second, an over-fit draft captures literals that should have been general, so it quietly breaks the first time the inputs differ. Both are caught by a person reading the draft before it is saved and shared. That is why the console opens the generated skill in an editor rather than publishing it directly.
How skill extraction differs from prompt engineering
Both produce instructions for an agent. They differ in where the instructions come from.
| Prompt engineering | Skill extraction |
|---|---|
| Starts from what you think should work | Starts from a session that already worked |
| Iterated by trial against live runs | Drafted from a recorded trajectory |
| Often omits the error paths you did not anticipate | Can include the errors the session hit, with fixes |
| Tends to live as text maintained by hand | Lives as a versioned artifact, re-derivable from new sessions |
The two are complementary. You might prompt-engineer your way to a working session, then extract a skill so the next run does not have to repeat the search.
How Paper Compute extracts skills from sessions
paper cli records each session as a durable, queryable archive. In paper console, Generate Skill runs extraction over a recorded session and returns a draft skill you review and edit before saving. The result is a structured artifact with a type, a version, a record of the sessions it came from, and a visibility scope. From there it belongs to your skill library, where review and versioning govern it like any other shared asset. stereOS is where a skill runs safely when applying it involves executing code.
Extraction is the step that keeps a solved problem from being solved twice. Capture makes it possible; review makes it trustworthy.