Hi, I'm Ryan, CTO at airCloset.
Disclaimer: "cortex" in this article is the internal codename for an AI platform built in-house at airCloset. It ...
the harness as guardrail is what makes non-engineer PRs viable. without it you're just hoping. curious what breaks first when a PMO touches a prompt that's load-bearing for 3+ workflows downstream.
The scenario shouldn't be reachable by design — load-bearing prompts (Auto Review dimension prompts, Self-Healing triage prompts, etc.) live in the "harness itself" layer, not the "build on top of the harness" layer that non-engineer PRs operate in. Same boundary as the post (stack vs. on top of), applied at the prompt layer. If a PR did cross that line, the Auto Review's impact dimension would catch it via the knowledge graph — it traces the prompt's downstream usage and refuses the merge until a human design review approves the change to the harness layer itself.
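To make the shape of that check concrete, here is a minimal sketch in Python. It is not the actual cortex code: the `ImpactGate` class, the layer labels, and the path-to-layer mapping are all hypothetical names chosen for illustration, and the knowledge graph is reduced to a plain dictionary of prompt path to consuming workflows.

```python
from dataclasses import dataclass, field

# Hypothetical layer labels; the real layout is not described in this thread.
HARNESS = "harness"
ON_TOP = "on_top"


@dataclass
class ImpactGate:
    # Assumed mapping of file path -> layer ("harness" or "on_top").
    layer_of: dict
    # Assumed knowledge-graph view: prompt path -> set of consuming workflows.
    consumers: dict = field(default_factory=dict)

    def review(self, changed_paths, human_design_approved=False):
        """Return (allowed, reasons) for a PR that touches changed_paths."""
        reasons = []
        for path in changed_paths:
            # Paths missing from layer_of pass in this sketch; a real gate
            # would more likely fail closed on unknown paths.
            if self.layer_of.get(path) == HARNESS:
                used_by = sorted(self.consumers.get(path, set()))
                reasons.append(
                    f"{path} is in the harness layer; used by {used_by}"
                )
        # Harness-layer changes merge only after a human design review.
        allowed = not reasons or human_design_approved
        return allowed, reasons
```

The design choice worth noting is that the gate never asks who authored the PR. It only asks which layer the changed files belong to, so the same rule applies to engineers and non-engineers alike.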
the knowledge graph tracing impact is the part that operationalizes the boundary - you can't rely on convention when non-engineers are authoring. Auto Review blocking a merge that crosses layer boundaries is the right enforcement mechanism, not documentation and code review.
Right — and the dependency on the knowledge graph here goes deeper than enforcement. Without it, you can't even tell whether a given prompt is load-bearing across multiple workflows in the first place. Convention-based review (docs + human eyes) collapses well before the prompt layer; nobody can hold "this prompt feeds three downstream workflows" in their head reliably. The graph is what makes "load-bearing" a checkable property at all, and the enforcement (Auto Review blocking on impact) just rides on top of that. Without the graph, there's nothing for the gate to check.
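As a rough illustration of what "checkable" could mean here, the sketch below walks a dependency graph from a prompt and counts the workflows it reaches. Everything in it is an assumption made for the example: the adjacency-dict representation, the function names, and the threshold of three workflows (borrowed from the "3+ workflows" phrasing earlier in this thread) are not taken from cortex.

```python
from collections import deque


def downstream_workflows(graph, start, workflows):
    """Breadth-first walk over dependency edges.

    graph: dict mapping a node to the nodes that consume it (assumed shape).
    workflows: set of node names that count as workflows.
    Returns the set of workflows reachable from start, directly or indirectly.
    """
    seen = {start}
    queue = deque([start])
    reached = set()
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt in seen:
                continue  # also guards against cycles in the graph
            seen.add(nxt)
            if nxt in workflows:
                reached.add(nxt)
            queue.append(nxt)
    return reached


def is_load_bearing(graph, prompt, workflows, threshold=3):
    """A prompt is 'load-bearing' here if it feeds at least threshold workflows."""
    return len(downstream_workflows(graph, prompt, workflows)) >= threshold
```

With a graph like this, two prompts that look identical on disk get different answers from the gate, because the walk follows indirect consumers as well as direct ones.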
yeah load-bearing prompt detection is the right frame - a prompt feeding three workflows looks identical to an isolated one until something breaks. the graph surfaces that dependency before it becomes a production incident. without it you are doing blast radius analysis by memory, which does not scale past 5 workflows.
Right, "looks identical until it breaks" is exactly the failure mode that makes convention-based review useless at scale. The 5-workflow ceiling matches what I've seen too; once a codebase has more than a handful of cross-references between prompts and downstream consumers, nobody can hold the dependency graph in their head, and blast radius decisions degrade from "informed" to "guessed." Surfacing it before an incident is what turns the graph from a debugging tool into a gating mechanism: it stops being something you query after the fact and starts being something the merge can't proceed without.
The harness idea is the important part. If the workflow has strong constraints, tests, and review points, more people can author safely without pretending every author is also an engineer.
Exactly. "Without pretending every author is also an engineer" is the line — that pretense is what breaks down at scale. The harness is what lets the same workflow safely accept changes from very different authors without lowering the quality bar.
That is the right boundary. The harness should make contribution safer, not flatten everyone into the same role. A non-engineer can bring domain taste, examples, constraints, and edge cases; the harness translates that into checks the system can actually enforce.
That is a much better model than asking every author to become half-engineer just to participate.
Beautifully put — "the harness translates that into checks the system can actually enforce" is exactly the design problem. The series wrap-up I have coming next week is almost entirely about that translation work, which I've come to think of as the whole job. You named it cleaner than I did.
That translation work feels like the real product. The hard part is not giving non-engineers a prettier text box; it is turning domain intent into enforceable constraints without making the author learn the machinery.
Looking forward to the wrap-up. That boundary between expression and enforcement is where these systems either become empowering or quietly dangerous.
"Translation work feels like the real product" is exactly how I've come to think of it. And the "prettier text box" trap is real — you can ship a sleek interface on top of the same untranslated mess, and it just routes the failure further downstream. The hard work isn't in the surface, it's in turning intent into something the system enforces while the author keeps thinking in their own language.
Your "empowering or quietly dangerous" framing — I'd phrase it slightly differently: this is fundamentally about the quality of the harness design itself. A well-built harness empowers; a sloppy one becomes a hazard. That's an old systems-engineering truth, not AI-specific — what's new is the kind of author we're handing the harness to, not the principle.
This works when the harness turns intent into reviewable structure. A non-engineer can author a change, but the system still needs hard gates around tests, ownership, and risky surfaces. Otherwise “anyone can author” becomes “no one owns the failure.”
Exactly the right framing. The harness in cortex separates authorship from failure ownership: anyone (PMO included) can author the change, but the failure ownership sits at the system layer, not the PR layer. PR-level correctness is owned by Auto Review + the test / lint gates you mentioned. System-level reliability — "did this keep running in production" — is owned by the engineers who designed the harness, backed by Self-Healing for routine fixes and human escalation for what Self-Healing can't crack. "Anyone can author" only works because "someone owns the system" is true one layer up.
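A toy sketch of that split, purely for illustration: the `Owner` enum, the stage names, and the `routine` flag are hypothetical and only mirror the roles named in this comment.

```python
from enum import Enum


class Owner(Enum):
    # Hypothetical labels for the owners named in the comment above.
    AUTO_REVIEW_AND_GATES = "auto_review_and_gates"
    SELF_HEALING = "self_healing"
    HARNESS_ENGINEERS = "harness_engineers"


def route_failure(stage, routine=False):
    """Pick who owns a failure from where it happened, not from who authored it.

    stage: "pr" for PR-level correctness, "production" for system-level
    reliability. routine marks production failures Self-Healing can fix.
    """
    if stage == "pr":
        return Owner.AUTO_REVIEW_AND_GATES
    if stage == "production":
        # Routine fixes go to Self-Healing; everything else escalates to humans.
        return Owner.SELF_HEALING if routine else Owner.HARNESS_ENGINEERS
    raise ValueError(f"unknown stage: {stage!r}")
```

The author of the change is deliberately not a parameter: ownership is decided by the layer where the failure shows up.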
The four-mechanism flywheel is convincing, and the scope fall-through catch is a great example of the gate doing real work. The open question for me is the one you flag yourself: the harness still needs engineers to extend it, and those engineers got their judgment from years of exactly the implementation work you're now routing to non-engineers. So who lays the rails in ten years? The harness buys huge leverage today but quietly depends on a generation of engineers it doesn't reproduce.
Sharp question — and one I keep coming back to. My current bet: the rails-laying side stays engineering work for a long while, and the next generation of engineers grows by doing exactly that. The implementation work isn't going away; it's moving up one level, from "implement this feature" to "design the gate that catches the next class of bug like this." That's still hard, still requires years to mature, still builds the same muscle in a different shape. Engineering doesn't reproduce by accident — it reproduces by doing real work, and there's plenty of real work at the harness layer. Whether the shape of that judgment ends up identical to today's, I doubt — but the path isn't broken either.
standing up a stack is hard, building on top of one isn't 😀
Yes, exactly. The harness is what makes the "building on top of one" half safe enough to hand off. Without the harness, even that side stays engineering work.