Case 03 Systems & frameworks

Scaling agent behavior without losing quality at the edges.

As an agent serves more user contexts and response types, a single instruction set starts breaking down. Rules conflict. The agent loads everything for every request. Formatting drifts on the responses that matter most. The skills architecture decomposes the response guidelines into a persona layer that applies universally and four loadable response patterns the agent pulls only when they apply.

Role: Architect and author
Partners: Engineering, AI research
Surface: Agent skills system
Status: In production

The persona stays loaded. The response patterns load on demand. Each pattern owns its own rules, rubric, prompts, and examples.

The problem

When everything loads at once, nothing loads well.

The response guidelines started as two documents that worked. Voice and formatting on one side, reasoning and context on the other. As the agent took on more user contexts and more response types, the documents kept growing. The team kept adding rules, and the agent kept reading all of them on every turn, regardless of whether the rule applied to the request in front of it.

This showed up as formatting drift. The agent followed the broad principles but missed the specific structural requirements for the type of response it was generating. A triage summary got formatted like a diagnostic walkthrough. A data comparison arrived as prose when it should have been a table. The rules for each case existed in the guidelines, but they were spread across sections that the agent did not always pull together accurately.

The rules themselves held up. The issue was that they all loaded at once, every time, with no signal about which ones applied to this specific request.

Two failure modes followed. The first was the obvious one: drift on the responses that matter most. The second was harder to see but costlier over time. Every change to the guidelines had to be tested against every response type, because nothing was scoped. The team could not refine the triage rules without risking a regression on data presentation. The document was a single point of failure for every kind of response the agent could produce.

The approach

A persona that applies universally. Response patterns that apply only when they should.

I redesigned the architecture as two layers, with the boundary between them defined by what stays consistent versus what depends on the task.

The persona layer

The persona defines the agent's core identity: its professional role, domain expertise, communication style, and behavioral traits. These remain consistent across every interaction regardless of which response pattern is active. When a response pattern loads, it operates within the persona's constraints. If the persona establishes that the agent uses direct, evidence-based language, the triage-response pattern structures the information for quick scanning while maintaining that same voice.

The persona sets the baseline for how the agent communicates. Response patterns determine how it organizes and delivers specific types of information within that baseline.

Four response patterns

Each pattern contains the rules for one specific communication structure. The pattern is defined by the shape of the response itself, so the same pattern applies across domains. Any task where the user needs to compare multiple items uses data-presentation, whether they are comparing equipment, work orders, or sensor readings. Any task where the user needs a quick answer with optional depth uses progressive-disclosure.

Pattern	User objective	What the agent does
progressive-disclosure	Solve a problem with optional depth	Lead with a direct finding, then offer deeper reasoning if requested
data-presentation	Compare assets or attributes	Format technical metrics into structured layouts for side-by-side comparison
triage-response	Identify and rank issues	Provide concise summaries with clear priority markers for immediate decision-making
handoff-response	Document and hand off work	Produce a clean summary organized by status, actions taken, and next steps

Four files per pattern

Every response pattern folder contains the same four files. Together, they give the agent everything it needs to recognize when a pattern applies, execute it correctly, and verify the output before delivering.

SKILL.md

The core rules

Defines the specific constraints on response structure, length, and depth that shape how the agent communicates for this pattern. The progressive-disclosure skill, for example, sets rules like leading with the direct answer, capping the initial response at four sentences, and offering deeper detail only when the user asks.

quality-rubric.md

The self-check

The criteria the agent checks its drafted response against before delivering. If any required criterion fails, the agent revises. This catches common problems before the user sees the response.
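
The self-check loop can be sketched in a few lines. This is a minimal illustration, not the production rubric: the `Criterion` type, the two sample criteria (borrowed from the progressive-disclosure rules), the naive sentence counter, and the `revise` callback that stands in for a model call are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    required: bool
    check: Callable[[str], bool]


def count_sentences(text: str) -> int:
    # Naive splitter: breaks after ., ! or ? followed by whitespace.
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


# Hypothetical rubric entries modeled on the progressive-disclosure rules.
RUBRIC = [
    Criterion("four_sentences_or_fewer", True, lambda t: count_sentences(t) <= 4),
    Criterion("offers_more_detail", True, lambda t: "more detail" in t.lower()),
]


def failed_required(draft: str, rubric: list[Criterion]) -> list[str]:
    """Return the names of required criteria the draft does not meet."""
    return [c.name for c in rubric if c.required and not c.check(draft)]


def self_check(
    draft: str,
    rubric: list[Criterion],
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 2,
) -> str:
    """Revise the draft until required criteria pass or rounds run out."""
    for _ in range(max_rounds):
        failures = failed_required(draft, rubric)
        if not failures:
            break
        draft = revise(draft, failures)  # stand-in for asking the model to fix it
    return draft
```

Capping the number of rounds is a design choice in this sketch: it bounds latency, at the cost of occasionally delivering a draft that still fails a criterion.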

prompt-templates.md

The routing signal

Example user requests used for intent classification. At the start of a turn, the system checks the incoming request against these templates to determine which skill folders to dynamically inject into the prompt context.
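
As a rough sketch of that routing step, the example below scores a request against each pattern's templates with simple token overlap. The matching technique is an assumption chosen for readability; a production router would more likely use a trained classifier or embeddings. The template strings, the `route` function, and the threshold are hypothetical.

```python
import re

# Hypothetical template requests, one list per pattern folder.
TEMPLATES = {
    "data-presentation": [
        "compare these two pumps",
        "show sensor readings side by side",
    ],
    "triage-response": [
        "which issues should I fix first",
        "rank the open work orders by priority",
    ],
    "handoff-response": [
        "summarize this for the next shift",
        "write a handoff note",
    ],
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def route(request: str, templates: dict[str, list[str]], threshold: float = 0.3) -> list[str]:
    """Return the patterns whose best-matching template clears the threshold."""
    req = tokens(request)
    scores = {
        name: max(jaccard(req, tokens(example)) for example in examples)
        for name, examples in templates.items()
    }
    return sorted(name for name, score in scores.items() if score >= threshold)
```

A request that matches nothing returns an empty list, which leaves only the persona and any default pattern in the context.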

examples.md

The reference model

Shows the agent what good and bad output looks like for this pattern, with annotations explaining why each version succeeds or fails. The examples demonstrate correct execution through concrete instances.
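
Because every folder has the same shape, loading one can be validated mechanically. The sketch below assumes each pattern lives in its own directory on disk; `PatternFolder` and `load_pattern` are illustrative names, not part of the system described here.

```python
from dataclasses import dataclass
from pathlib import Path

REQUIRED_FILES = ("SKILL.md", "quality-rubric.md", "prompt-templates.md", "examples.md")


@dataclass(frozen=True)
class PatternFolder:
    name: str
    files: dict[str, str]  # file name -> file contents


def load_pattern(folder: Path) -> PatternFolder:
    """Read the four files of one pattern, failing loudly if any is absent."""
    missing = [f for f in REQUIRED_FILES if not (folder / f).is_file()]
    if missing:
        raise FileNotFoundError(f"{folder.name} is missing: {', '.join(missing)}")
    contents = {f: (folder / f).read_text(encoding="utf-8") for f in REQUIRED_FILES}
    return PatternFolder(name=folder.name, files=contents)
```

Failing on a missing file, rather than loading a partial pattern, keeps an incomplete folder from reaching the agent without its rubric or examples.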

Routing only the relevant patterns into the prompt context has a second benefit beyond consistency: the agent runs on a leaner context window. Persona stays static across turns. Patterns load as needed. Prompt payloads stay small, which reduces token usage and keeps inference latency low. Scaling the system means adding new pattern folders, not enlarging the foundational instruction set.
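
The effect on context size is easy to see in a toy version of prompt assembly. Everything here is a placeholder: the persona text, the one-line rule strings, and the word-count token estimate are assumptions for illustration only.

```python
# Placeholder persona and rule text; real files would be far longer.
PERSONA = "You are a maintenance reliability assistant. Use direct, evidence-based language."

SKILLS = {
    "progressive-disclosure": "Lead with the direct answer. Offer more detail at the end.",
    "data-presentation": "Format comparisons as tables with one row per item.",
    "triage-response": "Rank issues by priority and mark each with a priority label.",
    "handoff-response": "Organize by status, actions taken, and next steps.",
}


def build_prompt(persona: str, skills: dict[str, str], active: list[str]) -> str:
    """Persona always comes first; only the active patterns are appended."""
    sections = [persona] + [skills[name] for name in active]
    return "\n\n".join(sections)


def estimate_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words, not a real tokenizer.
    return len(text.split())


lean = build_prompt(PERSONA, SKILLS, ["triage-response"])
full = build_prompt(PERSONA, SKILLS, list(SKILLS))
print(estimate_tokens(lean), "vs", estimate_tokens(full))
```

Adding a fifth pattern grows `SKILLS` but leaves the lean prompt for a triage request unchanged, which is the scaling property described above.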

Inside one skill

One pattern, fully written.

Here is the SKILL.md for progressive-disclosure, the pattern the agent uses for any initial response. The format is intentionally compact: a frontmatter block that defines when the skill applies, a short statement of communication goal, the core rules, and the rules for what happens when this skill is active alongside another.

---
name: progressive-disclosure
description: Structure responses in layers. Lead with the direct
             answer in 4 sentences or fewer, then offer more depth
             progressively. Apply to all initial responses. When
             another pattern is also active, follow the composition
             rules in this skill.
---

## Communication goal

The user needs enough information to understand the situation
and act. They may want more depth, but that should be their
choice. Lead with the answer, keep it brief, and make it clear
that more detail is available.

## Core rules

- Lead every initial response with the direct answer or finding.
- Limit the initial response to 4 sentences or fewer.
- Explicitly offer more detail at the end of every initial response.
- Keep technical reasoning, timelines, and causal chains out of the
  initial response. These belong in follow-up responses when the
  user asks for them.

## When multiple patterns are active

If triage-response is also active, it determines the priority
order of items. Progressive-disclosure controls how much detail
appears for each item. Lead with the ranked list, keep detail
minimal per item, and offer to go deeper on any specific one.

If data-presentation is also active, progressive-disclosure
controls the framing (direct answer first, offer to expand),
while data-presentation controls how comparison data is formatted
(tables, structured layouts).

The composition rules at the bottom are the part that makes the architecture work. A response is rarely just one pattern. A triage that leads into troubleshooting starts with triage-response to rank the issues, then shifts into progressive-disclosure as the user focuses on a specific problem. Each pattern controls a different aspect of the response, so they complement each other rather than conflict.
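
One way to picture the composition rules is as an ownership map: each aspect of a response has exactly one pattern in charge of it. The sketch below encodes the three pairings stated in the SKILL.md. The aspect labels and the `resolve_aspects` function are names invented for this example.

```python
# Aspect labels are illustrative; the pairings follow the composition rules.
ASPECT_OWNER = {
    "item_order": "triage-response",
    "detail_per_item": "progressive-disclosure",
    "framing": "progressive-disclosure",
    "comparison_format": "data-presentation",
}


def resolve_aspects(active: set[str]) -> dict[str, str]:
    """Map each aspect to its owning pattern, for the patterns that are active."""
    return {aspect: owner for aspect, owner in ASPECT_OWNER.items() if owner in active}


controls = resolve_aspects({"triage-response", "progressive-disclosure"})
for aspect, owner in controls.items():
    print(f"{aspect}: {owner}")
```

Because no aspect has two owners, adding a pattern to the active set can only add keys to the result. It never changes which pattern controls an existing one.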

What changed

Refining one pattern stopped breaking all the others.

The architecture changed three things about how the team works on the agent.

Drift

Each response type now follows its own rules. Triage summaries get formatted as ranked lists. Data comparisons arrive as tables. Handoffs include status and next steps. The drift that came from the agent pulling rules from the wrong section is gone.

Scope

Changes stay local. Refining the triage rules is now unlikely to cause a regression on data presentation. Each pattern can be tested and iterated independently, and the evaluation framework catches any regression that does cross over.
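
Scoped testing can be illustrated by mapping changed files to the patterns that need re-evaluation. The `skills/<pattern>/` and `persona/` directory layout is an assumption made for this sketch, as is the `patterns_to_retest` helper.

```python
from pathlib import PurePosixPath

ALL_PATTERNS = frozenset({
    "progressive-disclosure",
    "data-presentation",
    "triage-response",
    "handoff-response",
})


def patterns_to_retest(changed_paths: list[str]) -> set[str]:
    """Return the patterns whose evaluations should rerun for a change set."""
    affected: set[str] = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if parts and parts[0] == "persona":
            # The persona applies everywhere, so every pattern is affected.
            return set(ALL_PATTERNS)
        if len(parts) >= 2 and parts[0] == "skills" and parts[1] in ALL_PATTERNS:
            affected.add(parts[1])
    return affected
```

A fast scoped run like this would complement, not replace, a periodic full evaluation that catches cross-pattern regressions.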

Extension

New patterns slot in without restructuring. Because the persona stays static and patterns load independently, the system scales by addition. A dense, high-velocity triage response and a plain-language executive summary can consume the exact same underlying data layer without their rules ever conflicting.

The architecture also became the foundation for how domain skills integrate with conversation skills. A domain skill like an equipment failure walkthrough can reference progressive-disclosure for its framing and data-presentation for its tables, without having to embed those rules inside the domain skill itself. The conversation patterns become the language layer the rest of the agent's capabilities speak through.
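
A domain skill that references conversation patterns by name, rather than embedding their rules, might look like the following. The `DomainSkill` type, its fields, and the sample skill are hypothetical and only meant to show the reference mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainSkill:
    name: str
    steps: str
    uses: tuple[str, ...]  # conversation patterns referenced by name


def patterns_needed(skills: list[DomainSkill], known_patterns: set[str]) -> list[str]:
    """Collect referenced patterns once each, rejecting unknown names."""
    needed: list[str] = []
    for skill in skills:
        for pattern in skill.uses:
            if pattern not in known_patterns:
                raise ValueError(f"{skill.name} references unknown pattern {pattern!r}")
            if pattern not in needed:
                needed.append(pattern)
    return needed


walkthrough = DomainSkill(
    name="equipment-failure-walkthrough",
    steps="Identify the failed component, then compare recent readings.",
    uses=("progressive-disclosure", "data-presentation"),
)
```

Validating references at load time means a renamed or removed pattern surfaces as an error before any response is generated.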