
xchecker: Spec Pipeline for LLMs


Originally submitted to the Kiroween hackathon on Devpost.

Keep your code monster under control with xchecker: a Frankenstein spec harness that stitches Rust safety to LLM creativity in an auditable flow.

Inspiration

Victor’s mistake wasn’t building a monster; it was failing to take care of Adam — failing to build the structure and guardrails that would let him grow.

Vibe coding today feels a lot like Victor’s lab: wiring powerful models straight into repos without specs, receipts, or schema enforcement, then acting surprised when the results bite back.

Anthropic and others have been pushing long-running agents and harnesses: tool-first workflows over versioned artifacts instead of one-off prompts. GitHub’s Spec Kit and Amazon’s Kiro move in the same direction from the workspace side.

xchecker is the pipeline version of that idea:


What xchecker does

xchecker is a spec-driven Rust CLI for the upstream part of LLM-driven development — requirements, design, tasks, review, fixups. It turns that work into a structured, auditable pipeline instead of a one-shot prompt.

Core premise:

LLMs are eager juniors. Give them a governed spec lifecycle — clear inputs, explicit phases, strong verification — instead of a single prompt and hope.

That structure is there so LLM “juniors” can move quickly while human reviewers look at finished packets and receipts instead of raw prompts and diffs. Trust, but verify.

Current scope

Right now, xchecker owns the spec side of the SDLC:

Requirements → Design → Tasks → Review → Fixup → Final (you end up with a spec, a plan, and change proposals; not a deployed service)

It does not run builds, tests, or deploys. The spine is designed so downstream phases (Build, Gate, Deploy, etc.) can attach later; this release focuses on upstream thinking and change proposals you can gate and audit.
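
As a rough sketch of what that spine looks like in Rust (the enum and method names here are hypothetical, not xchecker’s actual types), the upstream phases form a simple ordered chain that downstream phases could extend later:

```rust
// Hypothetical sketch: the upstream phases xchecker covers today, with room
// for downstream phases to attach later. Names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Requirements,
    Design,
    Tasks,
    Review,
    Fixup,
    Final,
    // Downstream phases (Build, Gate, Deploy) would slot in here later.
}

impl Phase {
    /// The next phase in the spine, or None once the spec work is final.
    fn next(self) -> Option<Phase> {
        use Phase::*;
        match self {
            Requirements => Some(Design),
            Design => Some(Tasks),
            Tasks => Some(Review),
            Review => Some(Fixup),
            Fixup => Some(Final),
            Final => None,
        }
    }
}

fn main() {
    let mut phase = Phase::Requirements;
    while let Some(next) = phase.next() {
        println!("{:?} -> {:?}", phase, next);
        phase = next;
    }
}
```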

Scope snapshot

| Stage | Status in this release |
| --- | --- |
| Problem → Spec | Implemented (requirements, design) |
| Spec → Tasks | Implemented (task plan, review, fixups) |
| Build / Test | Not implemented (planned downstream) |
| Gate / Deploy | Not implemented (planned downstream) |

Multi-phase harness with an air gap

Each phase is first-class:

LLMs never write directly to the repo:

The junior drafts. The harness controls edits and persistence.
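
A minimal sketch of that air gap, with hypothetical types standing in for xchecker’s real ones: the LLM hands back a draft, and only the harness decides where (and whether) it lands on disk:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical draft returned by an LLM call: content plus the relative
/// path it is intended for. The model never touches the filesystem itself.
struct Draft {
    relative_path: PathBuf,
    content: String,
}

/// The harness owns persistence: it validates the target path and writes
/// the artifact under a dedicated spec directory, never into arbitrary
/// repo locations.
fn persist_draft(spec_dir: &Path, draft: &Draft) -> io::Result<PathBuf> {
    // Reject anything that tries to escape the spec directory.
    if draft.relative_path.components().any(|c| {
        matches!(c, std::path::Component::ParentDir | std::path::Component::RootDir)
    }) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "path escapes spec dir"));
    }
    let target = spec_dir.join(&draft.relative_path);
    if let Some(parent) = target.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(&target, &draft.content)?;
    Ok(target)
}

fn main() -> io::Result<()> {
    let draft = Draft {
        relative_path: PathBuf::from("design.md"),
        content: "# Design\n(placeholder draft)\n".to_string(),
    };
    let written = persist_draft(Path::new("specs/demo"), &draft)?;
    println!("harness wrote {}", written.display());
    Ok(())
}
```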

Deterministic packets instead of fuzzy RAG

Before any LLM call, xchecker builds a Packet: a deterministic, budgeted context window.
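
A minimal sketch of the idea, using illustrative names rather than xchecker’s real Packet API: sources are sorted deterministically and trimmed against a fixed budget, so the same inputs always produce the same context window:

```rust
/// Hypothetical packet types: deterministic ordering plus an explicit byte
/// budget, instead of score-ranked retrieval. Names are illustrative only.
#[derive(Debug)]
struct PacketEntry {
    source_id: String,
    bytes: usize,
}

#[derive(Debug)]
struct Packet {
    entries: Vec<PacketEntry>,
    budget_bytes: usize,
    used_bytes: usize,
}

fn build_packet(mut sources: Vec<(String, String)>, budget_bytes: usize) -> Packet {
    // Deterministic ordering: sort by source id, never by retrieval score.
    sources.sort_by(|a, b| a.0.cmp(&b.0));

    let mut entries = Vec::new();
    let mut used = 0;
    for (source_id, content) in sources {
        let bytes = content.len();
        if used + bytes > budget_bytes {
            break; // Budget exhausted: later sources are deterministically dropped.
        }
        used += bytes;
        entries.push(PacketEntry { source_id, bytes });
    }
    Packet { entries, budget_bytes, used_bytes: used }
}

fn main() {
    let sources = vec![
        ("specs/requirements.md".to_string(), "R1: ...".to_string()),
        ("specs/design.md".to_string(), "D1: ...".to_string()),
    ];
    let packet = build_packet(sources, 4096);
    println!("{:#?}", packet);
}
```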

This is an implementation of Proactively Augmented Generation:

In practice:

And behaviour is explainable:

LLM backend layer

An LlmBackend abstraction hides provider details. In this Kiroween build xchecker runs against Anthropic’s Claude Code command-line tool; the backend is structured so other CLIs/APIs (Gemini CLI, OpenRouter, Anthropic’s HTTP API, etc.) can slot in later without rewriting phase logic.
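
A hedged sketch of what such a trait boundary can look like; the method names and error type here are assumptions for illustration, not the actual LlmBackend signature:

```rust
/// Illustrative trait: phase logic talks to this abstraction, and concrete
/// backends (a Claude Code CLI wrapper today, other CLIs or HTTP APIs later)
/// implement it behind the scenes.
trait LlmBackend {
    /// Send one packet of context plus a phase prompt, get the raw draft back.
    fn complete(&self, packet: &str, prompt: &str) -> Result<String, String>;

    /// Identify the backend for receipts and lockfiles.
    fn identity(&self) -> String;
}

/// Example backend that would shell out to a CLI tool (details elided).
struct CliBackend {
    program: String,
}

impl LlmBackend for CliBackend {
    fn complete(&self, packet: &str, prompt: &str) -> Result<String, String> {
        // A real backend would spawn `self.program`, stream the packet and
        // prompt to it, and capture stdout. Stubbed here for illustration.
        Ok(format!(
            "[{} draft for prompt of {} chars over {} packet bytes]",
            self.program,
            prompt.len(),
            packet.len()
        ))
    }

    fn identity(&self) -> String {
        format!("cli:{}", self.program)
    }
}

fn main() {
    let backend: Box<dyn LlmBackend> = Box::new(CliBackend { program: "claude".into() });
    let draft = backend.complete("packet contents", "Draft the design doc").unwrap();
    println!("{} -> {}", backend.identity(), draft);
}
```

Because phase logic only sees the trait, swapping providers is a backend change, not a rewrite of the pipeline.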

The backend layer:

Receipts, lockfiles, and drift

Every run emits a JCS-canonical JSON receipt with:

xchecker also treats the LLM environment as a dependency, similar to Cargo.lock:

Packets + receipts + lockfiles give “what changed?” answers at the spec and environment level, not just the code diff level.
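
A sketch of the receipt idea, assuming serde (with the derive feature) and serde_json, and illustrative field names rather than xchecker’s real schema. Converting to a serde_json::Value before serializing yields sorted object keys, which approximates JCS (RFC 8785); a real implementation would use a dedicated canonicalizer:

```rust
// Assumed dependencies: serde = { features = ["derive"] }, serde_json.
// Field names are illustrative, not xchecker's actual receipt schema.
use serde::Serialize;

#[derive(Serialize)]
struct Receipt {
    phase: String,
    packet_hash: String,
    backend: String,
    artifacts: Vec<String>,
}

/// Serialize via serde_json::Value so object keys come out sorted; full JCS
/// also pins number and string formatting, which this sketch glosses over.
fn canonical_json(receipt: &Receipt) -> serde_json::Result<String> {
    let value = serde_json::to_value(receipt)?;
    serde_json::to_string(&value)
}

fn main() -> serde_json::Result<()> {
    let receipt = Receipt {
        phase: "design".into(),
        packet_hash: "sha256:...".into(),
        backend: "cli:claude".into(),
        artifacts: vec!["specs/demo/design.md".into()],
    };
    println!("{}", canonical_json(&receipt)?);
    Ok(())
}
```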

Status, gate, and doctor

Docs and tests

Doc validation:

If code and docs drift, tests fail.
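
A sketch of the doc-validation pattern, assuming pulldown_cmark as a dev-dependency and a hypothetical assertion (the real suite’s checks are more specific): parse the markdown, pull out fenced code-block languages, and fail the test if expected examples disappear:

```rust
// Assumed dependency: pulldown_cmark. The README path and the asserted
// languages are hypothetical stand-ins for the real checks.
use pulldown_cmark::{CodeBlockKind, Event, Parser, Tag};

/// Collect the language tags of all fenced code blocks in a markdown string.
fn fenced_languages(markdown: &str) -> Vec<String> {
    Parser::new(markdown)
        .filter_map(|event| match event {
            Event::Start(Tag::CodeBlock(CodeBlockKind::Fenced(lang))) => {
                Some(lang.to_string())
            }
            _ => None,
        })
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn readme_documents_cli_usage() {
        let readme = std::fs::read_to_string("README.md").expect("README.md present");
        let langs = fenced_languages(&readme);
        // If the docs drift and the shell examples disappear, this fails.
        assert!(langs.iter().any(|l| l == "sh" || l == "bash"));
    }
}
```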

The broader suite includes:

The goal is a tool you can put in CI for real projects, not just a demo.


How xchecker relates to Spec Kit, Kiro, and DemoSwarm

All of these live in the same space: “how do we get from rough intent → governed change using AI.”

They make different bets about where to sit:

Short version:

Spec Kit and Kiro set the pattern for spec-first, AI-assisted SDLC in the workspace.

DemoSwarm (separate repo) explores one way to push that pattern into a 6-flow, multi-agent SDLC swarm inside Claude Code.

xchecker explores the same direction from another angle: it takes that spec-and-task loop and grounds it in a deterministic Rust pipeline that can drive those orchestrators as nested phases, with an extra subagent layer and tighter context/receipt handling.

Side-by-side

High-level comparison – where they live and how they run

| Aspect / Layer | Spec Kit | Kiro | DemoSwarm (separate repo) | xchecker |
| --- | --- | --- | --- | --- |
| Home surface | AI surfaces (Claude Code REPL, Copilot chat, Cursor, etc.) | Kiro IDE | Claude Code REPL + repo | CLI + repo |
| Orchestration trigger | Slash commands (/constitution, /specify, /plan, …) | Click task in markdown task list | Slash commands (/flow-1-signal, …) | CLI phases (spec, resume, status, gate) |
| L0: Orchestrator | Human in AI surface steering Spec Kit commands | Kiro runtime reacting to user-selected tasks | Claude Code flow for each /flow-* | Rust CLI orchestrator running phases (spec, resume, status, gate) |
| L1: Primary AI worker | Assistant thread executing Spec Kit commands + tools | Single Claude flow per selected task | Narrow domain subagents per flow | Provider orchestrator flow per phase (Claude Code today, others later) |
| L2: Extra AI tier | Tools / light subagent usage when running inside Claude Code | None – that task flow doesn’t spawn subagents | None – tools inside each agent; agents don’t call other agents | Subagents under the provider orchestrator for micro-scoped work per phase |
| SDLC span | Spec → Plan → Tasks / implementation | Requirements → Design → Tasks / implementation | Signal → Specs → Plan → Build → Gate → Deploy → Wisdom | Problem → Spec → Tasks (upstream) |
| Receipts / gates | Specs and logs; any gating is something you wire in CI around those files | Specs, tasks, and steering files in the repo; rely on your normal test/CI gates | Per-flow swarm artifacts under swarm/runs/<run-id>/, plus agentic gates inside the flows (critics, BDD runs, mutation tests) | JCS-canonical JSON receipts per phase, status --json, and a gate command intended for CI |

Read another way: Spec Kit and DemoSwarm keep the orchestrator inside the assistant, Kiro drives it from a clickable task list in the IDE, and xchecker moves the orchestrator into code so it can wrap those flows, give them deeper subagent trees, and keep the whole upstream SDLC under packets, receipts, and lockfiles.


How I built it

Most of xchecker was built in Kiro’s Spec loop, with ChatGPT providing framing and review. The final stretch used Kiro’s Vibe mode on top of that foundation.

Spec-driven loop

  1. Problem framing & research (ChatGPT). Used ChatGPT to map the problem: how to harness LLMs over specs/receipts/state; how Anthropic’s harness work and prior RAG patterns fit; what guarantees were actually needed.

  2. First-pass specs (Kiro). Fed that framing into Kiro and asked for concrete requirements, design docs, and task breakdowns for xchecker as a long-running spec harness.

  3. Spec review and restructuring (ChatGPT + GitHub). Pulled those specs back into ChatGPT with the GitHub connector. Checked them against the real tree and the architecture I wanted. Tightened phase boundaries and artifact layout so they lined up with what the Rust could support cleanly.

  4. Implementation (Kiro). Used the refined specs as the source of truth and let Kiro drive most of the coding work, mainly on Claude Sonnet 4.5 (with Haiku for codebase research and Opus for the last couple of days):

    • LlmBackend and provider scaffolding
    • Receipt / status / doctor JSON contracts
    • Doc-validation harness using pulldown_cmark
    • WSL and Windows runner integration
    • CI/test plumbing around docs, schemas, and end-to-end flows

    My focus was architecture, invariants, and tests. MCP and hooks were intentionally left out here — they didn’t match what we needed, and I learned about Kiro’s steering at the end of Kiroween.


Challenges


What this build reinforces

Most of the ideas behind xchecker — specs, harnesses, “LLMs as juniors” — predate this repo. This build just made some of the trade-offs very concrete:

Try it out


