Definition
A trust boundary is the line where you stop treating model output as truth and start requiring evidence from tools, logs, and artifacts.
In agentic work, the default is:
- Text is untrusted
- Artifacts are trusted (diffs, test results, exit codes, hashes, URLs you can open)
Why it exists
LLMs are good at explaining what should happen. They are also capable of:
- misreading code,
- assuming commands succeeded,
- or producing process confabulation (“I ran the tests” when nothing ran).
A trust boundary turns that into a solvable engineering problem.
What crosses the boundary
Things you can verify mechanically:
git diffoutput- command exit codes
- test reports / coverage summaries
- generated artifacts (search index, build output)
- security headers in a response
- counts and lists produced by scripts (mechanical counting)
This evidence is usually bundled into a build receipt.
How to enforce it
- Separate roles. An agent that edits code should not be the agent that approves it.
- Require receipts. No receipt, no merge.
- Sandbox risky work. Use a shadow fork or least-privilege environment.
- Use adversarial checks. Add oppositional validation for high-risk changes.
Practical rule
If a statement would change what you deploy or merge, it must be backed by a tool output you can rerun.