Building AI Dev Tooling in Public

This blog documents our hands-on experience building practical agent systems: harness_core, repo-level context mapping, and A/B-tested AI workflows for real engineering teams.

The Boring Infrastructure of Reliable Agents

Why reliability isn’t just prompt engineering: it’s boring DevOps, transport adapters, and careful handling of messy API quirks.

The Dual-Loop: Deterministic Gates vs. Adversarial Review

Why passing the linter isn’t enough: pairing fast deterministic checks with slow, adversarial AI reviews.

AgentRig-on-AgentRig: Self-Verifying AI Workflows

How we test the harness itself: running the full agent lifecycle inside a unit test.

The Inversion of Responsibility: Onboarding AI as an Architect

Why ‘Creative Assistant’ mode fails for real software, and how we inverted the context flow.

From Script to Platform: Decoupling Domain Logic

How we transformed AgentRig from a ContextLab-specific script into a reusable, language-agnostic AI governance platform.

Evidence Contracts Over Placeholder Artifacts

Why strategic tasks fail during agent execution, and how we fixed that with mandatory evidence contracts.

Designing Tasks for Low-Context Coding Agents

Why tasks that are valid JSON can still fail at execution, and the strict briefing contracts that low-context agents require.

From Plan to Backlog: Deterministic Decomposition That Keeps Traceability

How we fixed plan-to-task decomposition so generated backlogs stayed accountable, traceable, and complete.

Why "Looks Good" Plans Fail: Building Enforceable Planning Gates

What broke in our early planning loop, and how we moved from subjective review to enforceable planning gates.

Introducing AgentRig: A Practical AI Dev Tooling Harness

Why we built AgentRig, what it does, and what we are learning while building AI-assisted engineering workflows.