
Hector Flores

Posted on • Originally published at htek.dev

AI Harness v0.6.0 — Harness as Code Gets Its Reference Implementation

Most AI harnesses start as a prompt and a wrapper. They get to v1.0 by
accumulating branches in the wrapper. AI Harness took the opposite path:
codify governance as typed artifacts, then make the wrapper as small as
possible.

v0.6.0 is the first release where that bet looks proven.

If you've been following the Harness as Code thesis,
this is the release where the runtime catches up to the philosophy.

What v0.6.0 actually changes

Four things matter in this release. Everything else is supporting work.

1. Typed artifact bundles are real

Shape A bundles — .harness/{plugins,builtins,overrides}/*.md — are now
first-class. The bundle loader (PR #123) closed the gap where the artifact
registry already understood harness_artifact/v1alpha1 declarations but
serve and validate quietly ignored them.

One file = one capability bundle. Tools, hooks, and prompts that belong to
the same governance unit live in the same artifact.

2. The agent loop is hardened

A strict finish_reason guard now sits at the top of the loop (PRs #121, #123):

| finish_reason | Behavior |
| --- | --- |
| stop, end_turn, "" | Fall through to a final answer |
| length | Retriable error: context truncated |
| content_filter | Hard error: no silent recovery |
| anything else, no tool calls | Retriable error: no silent stop |

No more "agent quietly stopped on turn 14 and we don't know why."
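
The table above can be read as a small classification function. The following is a minimal sketch of that idea, not the project's actual code: the `Outcome` type, the `classifyFinish` name, and the `hasToolCalls` flag are hypothetical, and the "continue the loop when tool calls are present" branch is an assumption, since the table only covers the case without tool calls.

```go
package main

import "fmt"

// Outcome is a hypothetical name for what the loop does after a model turn.
type Outcome int

const (
	FinalAnswer    Outcome = iota // fall through to a final answer
	ContinueLoop                  // assumption: tool calls present, keep looping
	RetriableError                // surfaced as an error the caller may retry
	HardError                     // surfaced as an error with no recovery
)

func (o Outcome) String() string {
	return [...]string{"final_answer", "continue_loop", "retriable_error", "hard_error"}[o]
}

// classifyFinish mirrors the four rows of the table. The first three rows
// ignore hasToolCalls because the table does not say how they interact.
func classifyFinish(finishReason string, hasToolCalls bool) Outcome {
	switch finishReason {
	case "stop", "end_turn", "":
		return FinalAnswer
	case "length":
		return RetriableError // context truncated
	case "content_filter":
		return HardError // no silent recovery
	}
	if !hasToolCalls {
		return RetriableError // unknown reason and nothing to run: no silent stop
	}
	return ContinueLoop
}

func main() {
	cases := []struct {
		reason string
		tools  bool
	}{
		{"stop", false},
		{"length", false},
		{"content_filter", false},
		{"weird", false},
		{"tool_calls", true},
	}
	for _, c := range cases {
		fmt.Printf("%-16q tools=%-5t -> %s\n", c.reason, c.tools, classifyFinish(c.reason, c.tools))
	}
}
```

The point of the shape is that every branch returns an explicit outcome, so an unrecognized finish_reason can never end the loop without an error.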

3. Reference docs are complete

Every public surface has an exhaustive reference page now:

  • harness.md frontmatter — every field, every default, every validate() check
  • Tool artifact schema — file shape, parameters, Starlark dialect, async reserved
  • Hook artifact schema — full event catalog, payload shapes, decision contract, when: semantics
  • Starlark built-ins — every builtin from scripting.Engine.makeBuiltins, per-module
  • CLI — every subcommand, flag, env var, exit code

No more "read the source." The docs are now the contract.

4. The live bot is governed

@htekdevaiharness on Telegram runs the
same Shape A bundles you'd ship to your own team:

$ harness validate -v
21 tools registered (across harness.md + 2 plugin bundles)
5 hooks registered

That count comes from a notes-bundle (note save/list + audit hook) and a
safety-bundle (command guard + output redactor + status tool) both loaded
as typed artifacts. The same loader. The same precedence rules. The same
docs you'd read.

Why typed artifact bundles matter

This is the conceptual centerpiece, and it's where AI Harness takes the
strongest position against everything else in the category.

Most "extension" systems give you one file per capability and pretend that's
the answer. The reality: a real capability is rarely one tool. It's a tool
plus a hook plus a guard plus a default prompt fragment. Splitting those
across four files breaks composability — you can no longer move "the safety
capability" between repos as one diff.

Shape A bundles fix that. Each .md file declares a single capability bundle:

---
artifact: harness_artifact/v1alpha1
kind: plugin            # plugin | builtin | override
name: safety-bundle
priority: 40
---

# Safety bundle

Tools, hooks, and prompts that govern destructive operations.

## Tool: command_guard
...

## Hook: tool.pre / output_redactor
...

Composition is deterministic. Precedence is declared at the kind level:

override > harness > builtin > plugin > model

Per-turn evaluation re-checks each artifact's when: predicate every turn,
not just at startup. An artifact that's inactive on turn 3 can light up on
turn 4 without restarting the agent.

This is the line that separates "extensions" from Harness as Code:
the unit of governance is the bundle, not the individual file. You can
review one diff. You can move one folder. You can audit one artifact. The
runtime composes them deterministically.
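
To make "deterministic composition" concrete, here is a minimal sketch of ordering artifacts by kind precedence and re-checking when: on every turn. It is an illustration under assumptions, not the runtime's implementation: the `Artifact` struct, `kindRank`, and `activeInOrder` are hypothetical names, the predicate is simplified to a function of the turn number, and the tie-break "higher priority wins within a kind" is a guess because the post does not say which direction priority sorts. The `local-override` artifact and the priority of `notes-bundle` are invented sample values.

```go
package main

import (
	"fmt"
	"sort"
)

// kindOrder encodes the precedence quoted in the post, strongest first.
var kindOrder = []string{"override", "harness", "builtin", "plugin", "model"}

// kindRank returns the position of a kind in kindOrder; unknown kinds sort last.
func kindRank(kind string) int {
	for i, k := range kindOrder {
		if k == kind {
			return i
		}
	}
	return len(kindOrder)
}

// Artifact is a hypothetical stand-in for a loaded bundle.
type Artifact struct {
	Name     string
	Kind     string
	Priority int
	When     func(turn int) bool // nil means always active
}

// activeInOrder re-evaluates every When predicate for the given turn and
// returns the active artifacts, strongest kind first.
func activeInOrder(all []Artifact, turn int) []Artifact {
	var active []Artifact
	for _, a := range all {
		if a.When == nil || a.When(turn) {
			active = append(active, a)
		}
	}
	sort.SliceStable(active, func(i, j int) bool {
		ri, rj := kindRank(active[i].Kind), kindRank(active[j].Kind)
		if ri != rj {
			return ri < rj
		}
		// Assumption: within one kind, the higher priority number wins.
		return active[i].Priority > active[j].Priority
	})
	return active
}

func main() {
	arts := []Artifact{
		{Name: "notes-bundle", Kind: "plugin", Priority: 50, When: func(turn int) bool { return turn >= 4 }},
		{Name: "safety-bundle", Kind: "plugin", Priority: 40},
		{Name: "local-override", Kind: "override", Priority: 10},
	}
	for _, turn := range []int{3, 4} {
		fmt.Printf("turn %d:", turn)
		for _, a := range activeInOrder(arts, turn) {
			fmt.Printf(" %s(%s)", a.Name, a.Kind)
		}
		fmt.Println()
	}
}
```

Because the filter runs inside the per-turn call rather than at load time, an artifact that is inactive on turn 3 shows up on turn 4 with no restart, which is the behavior described above.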

Things you can actually inspect now

Three commands that didn't quite work two releases ago are now the
daily drivers:

harness validate -v

Registers every artifact, runs every parser, prints a per-bundle tool/hook
count. On the live bot today: 21 tools / 5 hooks across harness.md +
two plugin bundles. If the number doesn't match what you expect, your
bundle isn't loading. That's the loop.

harness context --verbose

Shows what the agent saw on a given turn:

  • which chunks were assembled into the system prompt
  • where each chunk came from (which artifact, which file)
  • which artifacts were active vs inactive
  • which when: predicates passed
  • total token spend, broken down by source

Context observability is not an afterthought. It is shipped.
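
As a rough picture of the "token spend, broken down by source" line, the sketch below groups prompt chunks by the artifact they came from. Everything in it is illustrative: the `Chunk` struct and `tokensBySource` are hypothetical names, the token numbers are made-up sample values, and the real command's output format is not shown in this post.

```go
package main

import (
	"fmt"
	"sort"
)

// Chunk is a hypothetical record of one piece of the assembled system prompt.
type Chunk struct {
	Artifact string // which artifact contributed the chunk
	File     string // which file it was read from
	Tokens   int    // token count for this chunk (sample values below)
}

// tokensBySource sums token counts per artifact and also returns the total.
func tokensBySource(chunks []Chunk) (map[string]int, int) {
	per := make(map[string]int)
	total := 0
	for _, c := range chunks {
		per[c.Artifact] += c.Tokens
		total += c.Tokens
	}
	return per, total
}

func main() {
	chunks := []Chunk{
		{Artifact: "harness.md", File: "harness.md", Tokens: 420},
		{Artifact: "safety-bundle", File: ".harness/plugins/safety-bundle.md", Tokens: 180},
		{Artifact: "notes-bundle", File: ".harness/plugins/notes-bundle.md", Tokens: 95},
		{Artifact: "safety-bundle", File: ".harness/plugins/safety-bundle.md", Tokens: 60},
	}
	per, total := tokensBySource(chunks)

	// Sort names so the report is stable from run to run.
	names := make([]string, 0, len(per))
	for name := range per {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		fmt.Printf("%-14s %4d\n", name, per[name])
	}
	fmt.Printf("%-14s %4d\n", "total", total)
}
```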

harness artifacts

Flat list of every loaded artifact with its priority, kind, source file,
and active/inactive state. Useful when you need to answer "is this hook
actually firing?" without grepping through bundles.

What's still off the menu

Honesty matters. v0.6.0 is not a "we figured it all out" release.

  • Compaction engine vs hooks — open question (#69 / roadmap). The leading candidate is hooks-driven compaction in v0.7.
  • Memory persistence — flat-files today; SQLite is on the table for v0.7.
  • Sub-agent supervision — primitive level, not orchestration level. Phase 7 territory.
  • Async tool calls: the async: key is reserved in the tool schema (parsed but not propagated through ToolConfig). Wired in Phase 3.
  • agent.stop hook event — the strict finish_reason guard ships in v0.6.0, but the proper hook primitive (issue #104) is held for v0.7.0 so it can get its own design pass.

If you need any of those today, you're early. That's fine. The core's
shape is what we're committing to in v0.6.0; the edges are still moving.

The pre-1.0 schema-evolution clause stays in effect: artifact frontmatter
fields can still change between minor releases. The CHANGELOG calls every
break out explicitly.

How to try it

go install github.com/htekdev/ai-harness/cmd/harness@latest
harness init my-agent
cd my-agent
harness validate -v
harness serve --source stdin

Then drop a Shape A bundle into .harness/plugins/:

---
artifact: harness_artifact/v1alpha1
kind: plugin
name: my-first-bundle
priority: 50
---

## Tool: hello
Say hello and exit.

## Hook: tool.post / log-everything
Print every tool call to stderr.

Re-run harness validate -v. The tool/hook count should go up. That's the
loop. That's the whole product surface.
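
If you want intuition for why the count moves, the sketch below reads a bundle like the one above and counts its `## Tool:` and `## Hook:` headings. It is only an approximation under assumptions: `BundleSummary` and `summarize` are hypothetical names, the frontmatter handling is a naive key/value split rather than a YAML parser, and the real loader's parsing rules are not described in this post.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// BundleSummary is a hypothetical view of one Shape A bundle file.
type BundleSummary struct {
	Name  string
	Kind  string
	Tools []string
	Hooks []string
}

// summarize scans a bundle: simple "key: value" frontmatter between the
// first two "---" lines, then "## Tool:" and "## Hook:" headings in the body.
func summarize(src string) BundleSummary {
	var s BundleSummary
	inFront, closed := false, false
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "---" && !closed:
			if inFront {
				closed = true // second delimiter ends the frontmatter
			}
			inFront = !inFront
		case inFront:
			key, val, ok := strings.Cut(line, ":")
			if !ok {
				continue
			}
			if i := strings.Index(val, " #"); i >= 0 {
				val = val[:i] // drop a trailing inline comment
			}
			val = strings.TrimSpace(val)
			switch strings.TrimSpace(key) {
			case "name":
				s.Name = val
			case "kind":
				s.Kind = val
			}
		case strings.HasPrefix(line, "## Tool:"):
			s.Tools = append(s.Tools, strings.TrimSpace(strings.TrimPrefix(line, "## Tool:")))
		case strings.HasPrefix(line, "## Hook:"):
			s.Hooks = append(s.Hooks, strings.TrimSpace(strings.TrimPrefix(line, "## Hook:")))
		}
	}
	return s
}

// example is the bundle from this section, verbatim.
const example = `---
artifact: harness_artifact/v1alpha1
kind: plugin
name: my-first-bundle
priority: 50
---

## Tool: hello
Say hello and exit.

## Hook: tool.post / log-everything
Print every tool call to stderr.
`

func main() {
	s := summarize(example)
	fmt.Printf("%s (%s): %d tools, %d hooks\n", s.Name, s.Kind, len(s.Tools), len(s.Hooks))
}
```

This is useful mainly as a sanity check on your own files: if a heading is misspelled, a scan like this will not see it, and the count reported by harness validate -v is the authoritative signal that the bundle did or did not load.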

The bigger arc

  • v0.4.0 was the first usable harness.
  • v0.5.0 was the first one with proper claims verification (Ralph loop at the delegation boundary).
  • v0.6.0 is the first one where the artifact model, the loop, and the docs all line up with the Harness as Code thesis.

That's the milestone worth marking. v0.7 is async, memory persistence, and
the compaction engine. After that, v1.0 is a positioning question, not an
engineering one.

Where to go next

If you've been waiting for "the small one with real governance," this is it.
