Progressive Loading
Progressive loading is the architectural decision that makes skills scale. Instead of dumping every skill's full content into the agent's context window, the system loads knowledge in three tiers — metadata first, instructions when needed, supporting files only at the point of use.
The Problem
An agent with 50 skills, each averaging 3,000 tokens of instructions plus 5,000 tokens of reference material, would need 400,000 tokens just for skill content. Most context windows can't handle that, and even if they could, the signal-to-noise ratio would be terrible.
The Solution: Three Tiers
Tier 1 — Skill Discovery (~100 tokens per skill)
The agent always has the YAML frontmatter of every skill in memory. For 50 skills, that's roughly 5,000 tokens — a small fraction of any context window.
Agent context at startup:
├── System prompt
├── Conversation history
└── Skill index (Tier 1)
├── sales-reporting: "Access sales data via reporting API..."
├── deploy-checker: "Check deployment status across environments..."
├── customer-lookup: "Find customer records by name, ID, or email..."
└── ... (47 more, ~100 tokens each)
The agent reads the description field of each skill and decides which one matches the user's query. This is why the description is the most critical line in any skill.
Tier 2 — Full Instructions (<5,000 tokens)
When a skill is triggered, the agent loads the full markdown body of SKILL.md — Steps, Guidelines, Output format. Only the matched skill loads, not all 50.
Agent context after matching "sales-reporting":
├── System prompt
├── Conversation history
├── Skill index (Tier 1) — all 50 skills
└── Active skill (Tier 2) — sales-reporting
├── Steps (1-4)
├── Guidelines
└── Output format
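The two-tier split falls naturally out of the SKILL.md layout: YAML frontmatter for the always-resident index, markdown body for on-trigger loading. A minimal sketch (the file contents here are an in-memory illustration, and the simple `---` split assumes frontmatter is delimited the conventional way):

```python
# Illustrative SKILL.md; real files live on disk, this uses an in-memory copy.
SKILL_MD = """\
---
name: sales-reporting
description: Access sales data via reporting API...
---
## Steps
1. Parse the request.
"""

def split_skill(text: str) -> tuple[str, str]:
    """Split SKILL.md into YAML frontmatter (Tier 1) and markdown body (Tier 2)."""
    _, frontmatter, body = text.split("---", 2)
    return frontmatter.strip(), body.strip()

frontmatter, body = split_skill(SKILL_MD)
# At startup only `frontmatter` enters context; `body` loads when the skill triggers.
```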
Tier 3 — On-Demand Resources (variable)
When a specific step references a file, the agent loads it at that point. Step 3 says "read references/metric-definitions.md" — the file loads when the agent reaches step 3, not before.
Agent context at step 3:
├── System prompt
├── Conversation history
├── Skill index (Tier 1)
├── Active skill (Tier 2)
└── Step 3 resource (Tier 3) — metric-definitions.md
After step 3 completes, the reference content can be released from context if space is needed.
Token Budget Comparison
- Load everything upfront — 400K tokens (50 skills) / 1.6M tokens (200 skills) / 8M tokens (1,000 skills)
- Progressive loading (typical query) — ~8K tokens with 50 skills (the Tier 1 index grows at ~100 tokens per skill; Tiers 2 and 3 cost one skill's worth regardless of the total)
The key insight: context usage is proportional to the active skill's complexity, not the total number of skills.
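The budget comparison above is simple arithmetic; this sketch reproduces the section's numbers (3,000-token instructions, 5,000-token references, ~100-token index entries) so the scaling difference is explicit:

```python
def upfront_tokens(n_skills: int, instructions: int = 3000, references: int = 5000) -> int:
    """Cost of loading every skill's full content into context."""
    return n_skills * (instructions + references)

def progressive_tokens(n_skills: int, index_entry: int = 100, active_skill: int = 3000) -> int:
    """Tier 1 index grows with skill count; Tiers 2-3 cost one active skill."""
    return n_skills * index_entry + active_skill

print(upfront_tokens(50))      # 400000
print(upfront_tokens(1000))    # 8000000
print(progressive_tokens(50))  # 8000
```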
Design Implications
Description quality is everything
Since Tier 1 is the only thing the agent always sees, the description field determines whether a skill gets triggered at all. Invest time here.
Keep SKILL.md body under 5,000 tokens
If your instructions exceed this, you're probably mixing reference material into the body. Move lookup tables, API docs, and examples into references/.
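A rough budget check can catch oversized bodies before they ship. The ~4-characters-per-token ratio below is a common rule of thumb for English prose, not an exact tokenizer, and the function names are illustrative:

```python
def rough_token_count(text: str) -> int:
    """Approximate token count; ~4 characters per token is a rough English heuristic."""
    return len(text) // 4

def check_skill_body(body: str, budget: int = 5000) -> bool:
    """Warn when a SKILL.md body blows the Tier 2 budget."""
    tokens = rough_token_count(body)
    if tokens > budget:
        print(f"Body is ~{tokens} tokens; move reference material into references/.")
    return tokens <= budget
```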
Reference files should be self-contained
When the agent loads references/calculation-rules.md at step 3, it shouldn't need to also load references/api-endpoints.md to understand it. Each file should be independently useful.
Scripts return structured output
The agent needs to parse script output and fold it into its response. JSON or consistent tabular output works best. Unstructured text forces the agent to guess at parsing.
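For example, a reporting script can emit one JSON object instead of free-form text. The payload fields and the sample value below are illustrative, not a fixed schema:

```python
import json

def build_report(metric: str, period: str) -> str:
    """Return the script's output as a JSON string the agent can parse reliably."""
    # Field names and the sample value are hypothetical, for illustration only.
    payload = {"metric": metric, "period": period, "value": 0.034, "unit": "ratio"}
    return json.dumps(payload)

print(build_report("churn", "last-month"))
```

The agent can then `json.loads` the output and fold the fields directly into its response, with no guessing about delimiters or labels.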
Practical Example
A user asks: "What was our churn rate last month?"
- Tier 1: Agent scans all skill descriptions. sales-reporting matches on "churn" and "monthly".
- Tier 2: Agent loads the full Steps/Guidelines/Output format from sales-reporting/SKILL.md.
- Step 1: Agent parses the request — product: all, metric: churn, period: last month. No Tier 3 needed.
- Step 2: Agent runs scripts/fetch-report.py --metric churn --period last-month. Tier 3: the script executes and returns JSON.
- Step 3: Agent needs the churn calculation definition. Loads references/metric-definitions.md (Tier 3). Applies the formula.
- Step 4: Agent formats the output per the rules. No additional loading needed.
Total context used: ~8,000 tokens. Total skills available: as many as you want.