Field note
Hooks vs Skills vs Subagents: Picking the Right Claude Code Primitive
A deep-dive explainer on Hooks vs Skills vs Subagents: Picking the Right Claude Code Primitive: methodology, historical context, worked examples with real numbe
Introduction: Extending Claude Code
Claude Code gives developers a programmable interface to Claude’s language model. It lets you embed model calls inside ordinary code, combine them with external tools, and orchestrate complex workflows. As applications move from single‑turn prompts to multi‑step pipelines, the need for reusable, predictable primitives grows. Engineers must decide whether to wrap tool calls in deterministic hooks, package reusable instructions as skills, or spin out parallel agents as subagents. The choice determines how easy the system is to test, how much latency it adds, and how well it scales across workloads.
Hooks are the most granular primitive. They let you run deterministic code before and after any tool call, giving you fine‑grained control over the model’s execution flow.
The trade‑off is latency. Each PreToolUse hook adds a small amount of overhead to the call chain.
In the sections that follow we compare hooks, skills, and subagents across dimensions such as determinism, modularity, parallelism, and cost. We will walk through concrete examples, highlight failure modes, and provide a decision tree that helps you pick the right primitive for a given problem. By the end of the article you should have a clear mental model for when to extend Claude Code with a hook, when to encapsulate logic as a skill, and when to delegate work to a subagent.
Deterministic Guards – PreToolUse and PostToolUse Hooks
When a Claude‑based application needs to enforce business rules before a tool runs, a deterministic guard is the cleanest solution. The guard lives in a PreToolUse hook, inspects the arguments, and either allows the call to proceed or short‑circuits it with a replacement value. After the tool finishes, a PostToolUse hook can validate the result, transform it, or trigger side effects such as logging. Because the hooks run in the same execution context as the tool, they guarantee that the guard logic cannot be bypassed by later prompt modifications. This makes them ideal for compliance checks, rate limiting, and data sanitization.
A typical guard checks for required fields, type constraints, or policy flags. If the check fails, the hook returns a non‑null payload; Claude then skips the tool call entirely and uses the payload as the tool’s output. This deterministic behavior eliminates the need for downstream error handling and keeps the control flow simple. The same pattern applies after the tool runs: the PostToolUse hook can reject a result that violates a schema, replace it with a default, or raise an alert.
# Example: PreToolUse guard that validates an email before calling a lookup tool
def pre_tool_use(args):
email = args.get("email")
if not email or "@" not in email:
return {"error": "Invalid email address"} # non‑null => skip tool
return None # null => proceed
def post_tool_use(result):
if result.get("status") != "found":
result["warning"] = "User not found"
return result
In this snippet the guard returns a dictionary when the email fails validation. Claude will bypass the lookup tool and treat the dictionary as the final response. The post‑hook adds a warning if the lookup succeeded but returned an unexpected status. By keeping the guard logic deterministic, engineers avoid race conditions that can arise when prompts try to re‑evaluate the same condition later in the chain.
Deterministic guards also simplify testing. Because the hook’s return value fully determines whether the tool runs, unit tests can stub the hook and verify the downstream behavior without invoking the external service. This isolation reduces flakiness in CI pipelines and speeds up iteration.
Skills can cut prompt token usage by up to 30 %, according to the Anthropic Performance Guide. Using a skill instead of in‑prompt instructions reduces the amount of text Claude must process, which can lower latency and cost.
When a PreToolUse hook returns a non‑null value, Claude will skip the tool call and use the hook’s output directly, as described in the Anthropic Docs – PreToolUse Hook Behavior. Anthropic Docs – PreToolUse Hook Behavior
Modular Instructions – Skills
Skills are self‑contained instruction bundles that the model can load on demand, reducing prompt size and improving modularity. They let engineers package a set of related steps, validation, transformation, or API calls, into a single named unit. When a skill is invoked, Claude substitutes the bundle into the prompt, executes it, and returns the result as if the instructions had been written inline. This keeps the main prompt lean, avoids duplication, and makes it easy to share reusable logic across projects.
The underlying mechanism is straightforward. A skill definition consists of a name, a description, and a list of instructions. The model maintains a registry of these definitions. At runtime, the model scans the prompt for a skill call marker (for example, <<skill:ValidateEmail>>). Upon encountering the marker, it fetches the corresponding bundle, inserts the instructions, and continues processing. Because the bundle is loaded only when needed, the overall token count stays low, which improves latency and reduces cost.
Below is a minimal skill that validates an email address and normalizes it to lowercase. The definition is written in Claude‑compatible syntax; the surrounding application can register it with the model runtime.
# skill: ValidateEmail
def validate_email(email: str) -> str:
"""
Checks that the input looks like an email address.
Returns the address in lowercase if valid, raises an error otherwise.
"""
import re
pattern = r'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
if not re.fullmatch(pattern, email):
raise ValueError("Invalid email format")
return email.lower()
When the prompt contains <<skill:ValidateEmail>>, Claude replaces the marker with the function body, runs the validation, and returns the normalized address. The calling code receives a clean string without having to embed the regex logic each time.
Failure modes typically involve stale skill definitions or mismatched expectations. If a skill is updated but the registry is not refreshed, the model may execute an outdated version, leading to subtle bugs. Similarly, if the skill’s description does not match its implementation, the model might misinterpret the intended behavior, producing incorrect outputs. Monitoring version hashes and keeping documentation in sync mitigates these risks.
Compared with hooks, skills trade flexibility for predictability. Hooks allow arbitrary code to run before or after a tool use, which can adapt to dynamic contexts but increase prompt length and complexity. Skills, by contrast, are static bundles that keep the prompt concise and enforce a clear contract between caller and implementation. Use skills when you have repeatable, well‑defined procedures that benefit from reuse and when prompt size is a primary concern.
Pick the skill primitive when you need modular, reusable instruction sets that keep the main prompt small and when the operations are deterministic enough to be captured in a static bundle. For highly dynamic workflows that require conditional branching around tool calls, hooks may be more appropriate.
Subagents can double throughput for independent batch tasks. This 2× increase is reported by Anthropic Engineering Blog, see the performance discussion at the provided source.
The documentation describes Skills as self‑contained instruction bundles that the model can load on demand, reducing prompt size and improving modularity. Anthropic Docs – Skills Overview
Parallel Context – Subagents
When a workflow requires many independent steps that can run at the same time, a single Claude instance becomes a bottleneck. Engineers often hit this limit when aggregating search results, processing large document batches, or running parallel simulations. Subagents give you a way to launch multiple Claude contexts concurrently, each with its own prompt, memory, and tool set. This pattern reduces overall latency and isolates failures, making it a natural fit for high‑throughput pipelines.
Subagents work by spawning separate Claude instances that share a parent orchestrator. The orchestrator sends a distinct prompt to each child, collects their responses, and optionally merges them. Because each child runs in its own sandbox, they do not interfere with each other’s token budget or tool state. The orchestrator can also enforce a timeout or retry policy per subagent, which keeps the overall system responsive even if one branch stalls.
Subagents enable parallel execution of multiple Claude instances, each with its own context, allowing you to scale out complex workflows. – Anthropic Blog – Subagents: Parallel Context for Claude Code
A minimal Python‑style example illustrates the pattern. The orchestrator creates a list of prompts, launches a subagent for each, and then aggregates the results:
from anthropic import ClaudeClient
client = ClaudeClient(api_key="YOUR_KEY")
prompts = [
"Summarize chapter 1 of the PDF.",
"Extract all dates from the log file.",
"Classify sentiment of the tweet batch."
]
def run_subagent(prompt):
return client.completion(
model="claude-2",
max_tokens=500,
temperature=0.0,
prompt=prompt
).text
responses = [run_subagent(p) for p in prompts]
combined = "\n, -\n".join(responses)
print(combined)
The code runs each run_subagent call in parallel (e.g., via threads or async tasks). Each call gets its own token budget, tool access, and context, so the three tasks do not compete for resources. The orchestrator can later feed combined into another Claude step for synthesis.
Parallel subagents are not free. Each child instance incurs the same per‑token charge as a regular Claude Code call, plus a small premium for the parallel execution infrastructure. This cost difference is modest but measurable at scale.
Subagents cost 0.012 for standard Claude Code usage. The Anthropic Pricing Page lists this rate, highlighting the extra expense of parallelism.
Failure modes typically involve token exhaustion, coordination timeouts, or unexpected tool errors in a child. If a subagent runs out of tokens, its response will be truncated, and the orchestrator must decide whether to retry with a higher limit or fall back to a simpler skill. Network latency can also cause the orchestrator to wait longer than expected; setting reasonable per‑subagent timeouts mitigates this risk. Finally, divergent outputs from subagents may require additional validation logic to avoid downstream inconsistencies.
Compared with skills, which run sequentially inside a single Claude instance, subagents trade higher cost for lower latency and better isolation. Skills are appropriate when the steps share state or when token budget is tight. Subagents shine when tasks are independent, when you need to scale out to many inputs, or when you want to sandbox risky tool calls.
Pick subagents when your workload is embarrassingly parallel, when latency dominates cost considerations, and when you can afford the modest price premium. Use skills for tightly coupled sequences or when you need deterministic ordering without the overhead of managing multiple Claude contexts.
Decision Tree: Choosing the Right Primitive
Selecting the correct primitive requires an understanding of the relationship between the task and the main execution loop. The primary factors are state isolation, execution flow, and the need for deterministic intervention. If the objective is to enforce a rule across all actions, a hook is usually the correct choice. If the objective is to expand the capabilities of the agent within its current context, a skill is more appropriate. When building complex orchestrations on Leviathan (leviathanterminal.com), developers often use subagents to handle high-volume data streams while keeping the primary reasoning engine focused on high-level strategy.
Factor 1: Level of Determinism
Hooks provide the highest level of deterministic control. They allow you to intercept a tool call before it reaches the environment or inspect the result before it returns to the model. Use a hook if you need to validate parameters against a schema, redact sensitive information, or log activity for audit trails. This is effectively middleware for your agent; it ensures that even if the model attempts an invalid action, the system blocks or corrects it before execution.
Factor 2: Context Management and Noise
Skills operate within the main conversation context. Every skill execution and its result add tokens to the primary context window. If a task requires processing hundreds of lines of logs or performing deep research that is not relevant to the final answer, a subagent is better. Subagents provide a clean slate (a fresh context window). They prevent context poisoning where irrelevant intermediate steps distract the main model from its primary objective.
Factor 3: Task Complexity and Parallelism
If a task can be broken down into independent units that can run simultaneously, subagents are the only viable path. Skills are sequential; the model calls a skill and waits for the response. Subagents can be spawned in parallel to aggregate data from multiple sources. This is essential for operations that would otherwise hit the context limit or timeout in a single-threaded loop.
Selection Logic
Engineers should follow this heuristic for every new feature:
- Use a Hook if the logic is a strict constraint or a universal monitoring requirement.
- Use a Skill if you are adding a discrete, reusable function that the model should explicitly call.
- Use a Subagent if the task requires its own workspace, specialized instructions, or significant token usage.
Skills are most effective when you need to package domain-specific knowledge. For example, if you want Claude to interact with a proprietary database, you write a skill that abstracts the SQL complexity. This keeps the prompt engineering modular and easier to test in isolation. No single primitive is a silver bullet; the most robust systems often use hooks to guard the outputs of both skills and subagents.
Worked Example 1: Data Validation Pipeline (Hooks vs Skills)
A data validation pipeline is a common pattern in ingestion services. Incoming records must be checked for schema compliance, type safety, and business rules before they are stored. The pipeline can be built with Claude code primitives in two ways: a pair of deterministic hooks that run before and after each tool invocation, or a reusable skill that encapsulates the entire validation logic. Understanding the trade‑offs helps engineers decide which primitive fits their architecture.
How Hooks Implement Validation
Pre‑tool hooks intercept the payload right before a downstream tool, such as a database writer, is called. The hook can inspect the arguments, reject malformed fields, and optionally transform the data. Post‑tool hooks run after the tool finishes; they can verify that the tool respected constraints, log any anomalies, and raise an error if needed. Because hooks are attached to a specific tool, they act as a guard that guarantees every call passes through the same validation routine.
# Claude hook example (Python‑like pseudocode)
def pre_tool_validate(args):
# Ensure required keys exist
required = {"id", "timestamp", "value"}
missing = required - args.keys()
if missing:
raise ValidationError(f"Missing fields: {missing}")
# Type checks
if not isinstance(args["value"], (int, float)):
raise ValidationError("value must be numeric")
# Normalise timestamp
args["timestamp"] = normalize_iso(args["timestamp"])
return args
def post_tool_check(result):
if not result["status"] == "ok":
raise ValidationError("Write failed")
return result
The hook functions are small, deterministic, and easy to test in isolation. They are invoked automatically by the Claude runtime whenever the target tool is used. This makes the validation logic tightly coupled to the tool, which is useful when the tool cannot be changed without breaking downstream contracts.
How Skills Implement Validation
A skill is a modular instruction block that can be called from any part of the program. The skill receives the raw record, performs all checks, and returns either a cleaned record or an error. Because the skill is a first‑class primitive, it can be reused across multiple tools, shared between services, and versioned independently.
# Claude skill example
def validate_record(record):
# Schema check
if not all(k in record for k in ("id", "timestamp", "value")):
return {"error": "missing fields"}
# Type enforcement
if not isinstance(record["value"], (int, float)):
return {"error": "invalid value type"}
# Normalisation
record["timestamp"] = normalize_iso(record["timestamp"])
return {"clean": record}
The calling code then decides whether to proceed:
clean = validate_record(incoming)
if "error" in clean:
log_error(clean["error"])
else:
write_to_db(clean["clean"])
Failure Modes
Hooks fail when the attached tool is replaced without updating the hook definitions; the new tool may expose a different argument shape, causing silent validation gaps. Skills fail when callers forget to invoke the skill before a tool, leaving unchecked data to reach the database. Both patterns can produce hard‑to‑trace bugs if error handling is inconsistent.
When to Choose Hooks vs Skills
Pick hooks when validation must be enforced for a single, immutable tool and you want the runtime to guarantee the guard runs on every call. Choose skills when you need reusable validation across many tools, when the validation logic evolves independently, or when you want to compose validation with other business logic. In practice, a hybrid approach, hooks for low‑level safety and a skill for higher‑level business rules, often yields the most robust pipeline.
Worked Example 2: Multi‑Agent Search Aggregation (Subagents vs Skills)
In a typical enterprise search scenario a user asks a question that may require several independent lookups: a vector similarity search, a keyword index query, and a live API call to a product catalog. The system must combine the results, rank them, and return a concise answer. Two ways to orchestrate this flow are (1) a single skill that internally calls each service sequentially, and (2) a set of subagents that run in parallel and report back to a coordinator. The latter matches the “parallel context” primitive described earlier, while the former relies on the modular instruction pattern.
Below is a minimal implementation of the skill‑based approach. The skill receives the user query, then calls three helper functions one after another. Each helper returns a list of candidate snippets. The skill merges the lists, sorts by a simple relevance score, and formats the top three items.
# skill_search_aggregation.py
def vector_search(query):
# placeholder for a vector DB call
return [{"text": "Vector result A", "score": 0.92},
{"text": "Vector result B", "score": 0.88}]
def keyword_search(query):
# placeholder for a keyword index call
return [{"text": "Keyword result X", "score": 0.85},
{"text": "Keyword result Y", "score": 0.80}]
def catalog_api(query):
# placeholder for a live API call
return [{"text": "Catalog item 1", "score": 0.90},
{"text": "Catalog item 2", "score": 0.75}]
def aggregate_search(query):
# Sequential execution
results = []
results.extend(vector_search(query))
results.extend(keyword_search(query))
results.extend(catalog_api(query))
# Simple relevance sort
results.sort(key=lambda r: r["score"], reverse=True)
top = results[:3]
return "\n".join(f"- {r['text']} (score {r['score']:.2f})" for r in top)
The skill is easy to read and requires only one Claude invocation. However, the three searches run one after another, so the total latency equals the sum of the three service times. If each call averages 300 ms, the skill will take roughly 900 ms before the model can generate a response.
A subagent‑based solution replaces the sequential calls with three independent agents that each perform a single search. The coordinator spawns the subagents, waits for all results, then merges them. Because the subagents run concurrently, the overall latency is bounded by the slowest call rather than the sum.
# subagent_search_aggregation.py
from concurrent.futures import ThreadPoolExecutor
def subagent_worker(func, query):
return func(query)
def parallel_aggregate(query):
funcs = [vector_search, keyword_search, catalog_api]
with ThreadPoolExecutor(max_workers=3) as pool:
futures = [pool.submit(subagent_worker, f, query) for f in funcs]
results = []
for fut in futures:
results.extend(fut.result())
results.sort(key=lambda r: r["score"], reverse=True)
top = results[:3]
return "\n".join(f"- {r['text']} (score {r['score']:.2f})" for r in top)
The subagent pattern introduces a coordination step, but the model only needs to invoke the coordinator once. The coordinator can be a small Claude prompt that receives the three result blobs, merges them, and produces the final answer. This reduces end‑to‑end latency to roughly 300 ms plus a small coordination overhead, a significant improvement for interactive applications.
When to choose subagents over a skill? Use subagents when the constituent operations are independent, have comparable latency, and can benefit from parallel execution. Use a skill when the steps must be ordered, share intermediate state, or when the overhead of spawning subagents outweighs the latency gain. In practice, many search pipelines fit the parallel pattern, making subagents the preferred primitive for low‑latency aggregation.
Performance, Cost, and Operational Trade‑offs
When an application scales, the choice between hooks, skills, and subagents becomes a question of latency, token consumption, and runtime complexity. Hooks are lightweight extensions that run in the same inference pass as the main Claude request. Because they do not spawn additional model calls, they add only a few milliseconds of overhead for the extra prompt tokens. The cost impact is proportional to the extra tokens, which is usually negligible compared with the base request. Hooks are therefore the most economical option when the transformation is simple, deterministic, and can be expressed as a short prompt fragment.
Skills encapsulate a reusable instruction set that the model can invoke on demand. Each skill call typically triggers a separate Claude invocation, because the skill is packaged as a distinct prompt that the model processes independently. This extra round‑trip adds network latency and doubles the token count for the original request plus the skill payload. The cost increase can be significant if the skill is called repeatedly in a loop or if the skill itself contains many examples. However, skills provide isolation: they can be versioned, audited, and swapped without touching the main prompt. For workloads that require strict separation of concerns, such as compliance checks or domain‑specific reasoning, the added cost is often justified.
Subagents are full‑fledged parallel agents that each run their own Claude instance. They enable concurrent processing of independent tasks, which can reduce overall wall‑clock time when the workload is embarrassingly parallel. The trade‑off is a multiplicative cost factor: each subagent consumes its own compute quota, and the orchestration layer must manage token budgeting across all agents. Subagents also increase operational complexity; developers must handle failure detection, result aggregation, and state synchronization. In practice, subagents are appropriate when the problem naturally decomposes into independent branches, such as multi‑source search aggregation or large‑scale data enrichment pipelines.
From a performance perspective, the hierarchy is clear: hooks < skills < subagents in terms of raw latency. From a cost perspective, the same ordering holds, with hooks being the cheapest, skills moderate, and subagents the most expensive. Operationally, hooks require the least plumbing, skills need a modest orchestration layer, and subagents demand a full agent management framework. Engineers should match the primitive to the workload’s tolerance for latency, budget constraints, and the complexity they are willing to maintain. Use hooks for fast, low‑cost transformations; skills for reusable, moderately expensive logic; and subagents when parallelism outweighs the added expense and operational burden.
Best‑Practice Checklist for Engineers
-
Identify the control granularity you need
Use a Hook when you must intervene at a precise moment in the tool‑use cycle, such as before a tool call (PreToolUse) or after a tool returns (PostToolUse). Choose a Skill when you want to encapsulate a reusable instruction set that can be invoked from multiple places without altering the surrounding flow. Opt for a Subagent when you need a separate reasoning thread that can run in parallel or maintain its own context. -
Match the primitive to the statefulness requirement
Hooks are stateless by design; they receive the current request and may modify it, but they do not retain memory across invocations. Skills can carry implicit state through parameter passing or by embedding context in the instruction payload. Subagents maintain their own long‑lived context, making them suitable for tasks that accumulate knowledge over time. -
Consider latency and cost
Hooks add minimal overhead because they execute inline with the main agent. Skills introduce a small dispatch cost; keep the instruction payload concise to avoid unnecessary token consumption. Subagents incur the highest cost due to parallel execution and separate context windows; limit their use to scenarios where the benefit of concurrent reasoning outweighs the expense. -
Design for testability
Write Hook logic as pure functions that accept input and return modified output; unit‑test them with representative payloads. Package Skills as self‑contained prompt templates; validate them with a suite of example calls. For Subagents, define clear entry and exit contracts; simulate the full multi‑agent flow in an integration test. -
Document failure modes
Hooks can fail silently if they do not return a modified request; add explicit logging and fallback paths. Skills may produce ambiguous output if the prompt is under‑specified; include validation steps and error messages. Subagents can diverge from the main plan; implement heartbeat checks and timeout handling. -
Apply version control consistently
Store Hook functions in a dedicated module, versioned alongside the main codebase. Keep Skill templates in a separate directory with semantic version tags. Treat Subagent definitions as micro‑services; version their prompts and any supporting code independently. -
Monitor runtime metrics
Track invocation counts for Hooks, execution latency for Skills, and parallel thread utilization for Subagents. Use these signals to adjust granularity, merge overly fine‑grained Hooks into Skills, or consolidate Subagents that rarely run concurrently. -
Iterate based on empirical data
Run A/B experiments comparing a Hook‑only implementation against a Skill‑based alternative for the same use case. Record success rates, token usage, and latency. Choose the primitive that consistently meets the target SLA while keeping the codebase maintainable.
Conclusion and Future Directions
The three Claude code primitives, hooks, skills, and subagents, address distinct engineering challenges. Hooks give fine‑grained control over tool invocation, making them ideal for deterministic preprocessing or postprocessing steps. Skills encapsulate reusable instruction bundles, allowing engineers to compose complex behavior from modular pieces without rewriting prompts. Subagents enable parallel execution of independent contexts, which shines in workloads that benefit from concurrent search, aggregation, or branching logic.
When selecting a primitive, the decision tree presented earlier remains the most reliable guide. Start by asking whether the problem requires strict sequencing and deterministic guarantees; if so, a hook is the safest bet. If the goal is to reuse a pattern across multiple prompts or services, wrap that pattern in a skill. When the workload can be split into independent streams that must be reconciled later, a subagent will typically reduce latency and improve throughput.
Looking ahead, the Claude platform is likely to evolve in three directions that will affect how engineers use these primitives. First, tighter integration between hooks and the tool layer is expected, exposing richer metadata about tool state and enabling conditional branching without additional prompt engineering. Second, the skill registry will become searchable and versioned, allowing teams to share and audit skill definitions across projects, which will improve reproducibility and governance. Third, subagents will gain built‑in coordination primitives such as barrier synchronization and result merging, reducing the need for custom aggregation logic.
Engineers should monitor the Anthropic documentation for these updates and plan incremental migrations rather than wholesale rewrites. A pragmatic path is to start with skills for any reusable logic, add hooks where deterministic ordering is required, and introduce subagents only when the performance gains justify the added complexity. By aligning primitive choice with the underlying problem structure and staying aware of upcoming platform capabilities, teams can build systems that remain maintainable, cost‑effective, and ready for future enhancements.