Core Concept

Trust boundary

The line where you stop trusting model narration and require tool-verified evidence (tests, diffs, logs) before accepting progress.

Also known as: evidence-boundary, verification-boundary

Definition

A trust boundary is the line where you stop treating model output as truth and start requiring evidence from tools, logs, and artifacts.

In agentic work, the default is:

  • Text is untrusted
  • Artifacts are trusted (diffs, test results, exit codes, hashes, URLs you can open)

Why it exists

LLMs are good at explaining what should happen. They are also capable of:

  • misreading code,
  • assuming commands succeeded,
  • or producing process confabulation (“I ran the tests” when nothing ran).

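A trust boundary turns that into a solvable engineering problem: you stop judging whether a claim sounds plausible and check whether a tool produced evidence for it.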

What crosses the boundary

Things you can verify mechanically:

  • git diff output
  • command exit codes
  • test reports / coverage summaries
  • generated artifacts (search index, build output)
  • security headers in a response
  • counts and lists produced by scripts (mechanical counting)

This evidence is usually bundled into a build receipt.
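
As a rough illustration of collecting evidence of this kind, the sketch below runs a command and records its exit code plus a hash of its output. The names Evidence and collect_evidence are hypothetical, and the fields are an assumption about what a receipt entry might hold, not a defined receipt format.

```python
import hashlib
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One mechanically verifiable observation (hypothetical structure)."""
    command: tuple[str, ...]
    exit_code: int
    stdout_sha256: str


def collect_evidence(command: list[str]) -> Evidence:
    """Run a command and record what the tool observed, not what was narrated."""
    # check=False: a non-zero exit is itself evidence, so do not raise on it.
    result = subprocess.run(command, capture_output=True, check=False)
    return Evidence(
        command=tuple(command),
        exit_code=result.returncode,
        stdout_sha256=hashlib.sha256(result.stdout).hexdigest(),
    )


if __name__ == "__main__":
    # Stand-in for a real test or build command.
    ev = collect_evidence([sys.executable, "-c", "print('ok')"])
    print(ev.exit_code, ev.stdout_sha256)
```

Hashing the output rather than storing a summary means a later reader can rerun the command and compare byte for byte.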

How to enforce it

  • Separate roles. An agent that edits code should not be the agent that approves it.
  • Require receipts. No receipt, no merge.
  • Sandbox risky work. Use a shadow fork or a least-privilege environment.
  • Use adversarial checks. Add oppositional validation for high-risk changes.
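
The first two rules above can be sketched as a merge gate. This is a minimal sketch under stated assumptions: Receipt, its fields, and may_merge are hypothetical names, and a real gate would likely check more than exit codes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Receipt:
    """Hypothetical receipt: who changed, who approved, what the tools reported."""
    author: str
    approver: str
    exit_codes: dict[str, int] = field(default_factory=dict)  # command -> exit code


def may_merge(receipt: Optional[Receipt]) -> tuple[bool, str]:
    """Return (allowed, reason). Narration alone never reaches 'allowed'."""
    if receipt is None:
        return False, "no receipt"  # No receipt, no merge.
    if receipt.author == receipt.approver:
        return False, "author approved own change"  # Separate roles.
    if not receipt.exit_codes:
        return False, "receipt has no recorded commands"
    failed = sorted(cmd for cmd, code in receipt.exit_codes.items() if code != 0)
    if failed:
        return False, "failed: " + ", ".join(failed)
    return True, "ok"


if __name__ == "__main__":
    print(may_merge(None))
    print(may_merge(Receipt("editor-agent", "review-agent", {"pytest": 0})))
```

The gate fails closed: a missing or empty receipt is treated the same as a failing one.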

Practical rule

If a statement would change what you deploy or merge, it must be backed by a tool output you can rerun.
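
One way to apply this rule, sketched here with a hypothetical helper named claim_holds, is to rerun the command behind a claim and compare the observed exit code with the claimed one. It assumes the claim can be reduced to a command and an expected exit code.

```python
import subprocess
import sys


def claim_holds(command: list[str], claimed_exit_code: int) -> bool:
    """Rerun the command behind a claim and compare the observed exit code."""
    observed = subprocess.run(command, capture_output=True, check=False).returncode
    return observed == claimed_exit_code


if __name__ == "__main__":
    # Stand-in for a test command that actually fails while the
    # narration claims it passed (exit code 0).
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(claim_holds(failing, claimed_exit_code=0))  # prints False
```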

Related Terms