JinHyuk Sung

Posted on Jun 16

LLM reviewers are useful, but some PR checks should stay deterministic

#devtools #ai #security #github

AI coding agents are getting better at opening pull requests.

That changes the review problem.

A normal review asks whether the code looks correct, whether the design makes sense, and whether the edge cases were considered.

Those questions still matter.

But an AI-generated pull request also raises a different kind of question:

Did the agent change something outside the intended task, and is there enough repeatable evidence to merge?

I have started thinking about this as a split between judgment and evidence.

LLM reviewers help with judgment. Agent Gate verifies deterministic merge evidence.

I do not think every review question should become a hard CI gate. Some parts of code review need human context. Some parts benefit from an LLM noticing suspicious patterns. But a few checks are mechanical enough that I want them to be deterministic, repeatable, and visible before merge.

This is the checklist I currently use when thinking about AI-generated PRs.

1. Did the PR stay in scope?

The first question is not whether the code is good.

It is whether the PR changed the files it was supposed to change.

For a human PR, an unrelated edit may be easy to explain in review. For an agent PR, unrelated edits are more suspicious because they may reflect an instruction drift, a tool mistake, or a broad refactor the maintainer did not ask for.

A simple contract can help:

This PR is allowed to touch:
- src/auth/**
- tests/auth/**

Then the review can ask a deterministic question:

Did the PR touch anything outside those paths?

That does not prove the code is correct. It only proves the PR stayed inside its declared boundary.

That boundary still matters.

2. Did workflow permissions escalate?

GitHub Actions workflows are one of the highest-risk places for an agent to edit.

A small source change and a workflow permission change do not have the same risk profile.

For example, I would want a very visible warning if a PR adds or changes this:

permissions:
  contents: write

or starts using secrets.* in a new workflow path.

This is not a semantic code review problem. It is a policy boundary problem.

The question is deterministic:

Did this PR increase workflow privileges or introduce a dangerous workflow pattern?

That kind of check should not depend on whether an LLM happened to notice it in a comment.

3. Did agent-control-plane files change?

AI coding agents often depend on files that shape future behavior:

AGENTS.md
CLAUDE.md
.github/copilot-instructions.md
.cursor/rules/**
.mcp.json

A change to these files can affect future agent runs, tool access, or repo-specific instructions.

That makes them different from normal documentation changes.

If an AI-generated PR edits .mcp.json or AGENTS.md, I want that surfaced clearly before merge, even if the source code diff looks harmless.

The deterministic question is:

Did the PR change files that control future agent behavior?

This is especially important for teams adopting coding agents across repositories, because the control plane can drift quietly.

4. Is there matching test evidence?

Test evidence is tricky.

A changed test file does not prove the behavior is correct. It does not prove the test is meaningful. It does not prove coverage.

But for risky areas, the absence of any matching test change is still useful evidence.

If a PR changes auth logic, payment handling, session middleware, or a migration, I want to know whether the PR also changed a related test file.

The check should be phrased carefully:

There is no matching test-file evidence.

Not:

This PR is untested.

That distinction matters. Deterministic checks should say exactly what they know, and no more.

5. Did package scripts or dependencies drift?

This is not always the first rule I would add, but it is one I keep coming back to.

Package manifests and lockfiles can hide meaningful risk:

package.json
pnpm-lock.yaml
yarn.lock
package-lock.json

Some changes are normal dependency maintenance. Others deserve more attention:

{
  "scripts": {
    "postinstall": "node scripts/setup.js"
  }
}

For AI-generated PRs, I would want to know:

Did a lifecycle script appear?
Did an existing package script change?
Did dependencies change without an expected lockfile change?

Again, not every finding should block. But these changes should be easy to see.

6. Did the right human reviewer approve?

This is where deterministic evidence meets human ownership.

A PR can stay in scope, avoid workflow escalation, and include test evidence, but still need the right reviewer.

Examples:

src/auth/** changed -> security reviewer expected
.github/workflows/** changed -> platform reviewer expected
.mcp.json changed -> maintainer/platform approval expected

I do not think this should always block by default, especially for solo maintainers. But for teams, reviewer evidence may be one of the most useful signals.

The question is:

Did the right human approve the risky part of this PR?

That is not a replacement for review. It is a way to make ownership visible.

What should stay human?

A lot.

I would not want deterministic CI to answer questions like:

Is this design good?
Is this abstraction worth it?
Will users understand this behavior?
Is this bug fix actually correct?

Those are judgment questions.

LLM reviewers can help with judgment. Human reviewers own judgment. Deterministic gates should focus on evidence that can be checked the same way every time.

What should be deterministic?

The best candidates are checks that are:

explainable
repeatable
hard to miss
tied to merge risk
not dependent on executing PR code

For me, that currently includes:

PR scope boundaries
workflow permission escalation
dangerous workflow patterns
agent-control-plane drift
missing test-file evidence for high-risk paths
package script and dependency drift
reviewer evidence for sensitive paths

These checks do not make an AI-generated PR safe.

They make the risk easier to inspect before merge.

Start in warn mode

The safest adoption path is not to block everything on day one.

I would start with warnings:

mode: warn
fail-on-block: false

Then observe real PRs.

Which findings are useful?
Which ones are noisy?
Which ones would you trust as merge gates?

Only after that would I promote low-noise findings to blocking checks.

Closing thought

I think AI-generated PR review will need both judgment and evidence.

LLM reviewers can help with judgment.

Deterministic CI checks should verify merge evidence.

I’m exploring this idea in Agent Gate, a small GitHub Action for deterministic merge evidence in AI-generated PRs.

Disclosure: I used AI assistance to help draft and edit this article, and I reviewed the technical claims before publishing.

Top comments (14)

ANP2 Network • Jun 17

The judgment/evidence split is the right cut, but I'd partition by a different axis than "mechanical vs needs-context." What actually puts a check on the deterministic side isn't that it's mechanical — it's that its miss is reproducible and closeable: same diff, same result, so when you find a gap you add one rule and that whole class of miss is shut for good. An LLM reviewer's miss is the opposite — it flags the permissions: contents: write escalation on PR #1 and, at identical risk, says nothing on PR #2. So the real test for "make this deterministic" is "can I afford this check to miss differently every run," not "is it mechanical" — and a couple of judgment-flavored checks (did this PR widen the trust boundary?) end up on the deterministic side for that reason alone.

The part I'd push hardest on: determinism buys you reproducibility, not salience. "Start in warn mode, promote low-noise findings" reads like a politeness step, but the noise level is the load-bearing variable. A check that fires on every dependency bump trains reviewers to wave the yellow banner through — and then the one real postinstall backdoor rides in under that same banner. Determinism guarantees the check runs the same way every time; it guarantees nothing about whether a human reads it. So warn-mode's real job isn't gentleness, it's measuring each signal's precision in this repo — a deterministic check with a 2% hit rate is functionally an LLM reviewer nobody trusts, just reached by a different road.

One more on #4: for an agent PR, the presence of a matching test is weaker evidence than for a human one. The same agent wrote the code and the test against the same (possibly wrong) reading of the intent, so a green matching test certifies self-consistency, not correctness. "Tests changed" should raise confidence less when the test author and the code author are the same process — which is the actual argument for keeping #4 as evidence and never a gate.

JinHyuk Sung • Jun 17

This is a really useful framing — thank you.

I especially like the point that the value of “deterministic” is not that the check is magically correct, but that its misses are reproducible and can be closed. That is much closer to what I want from this kind of gate: a rule can be wrong, but when it is wrong, the failure mode should be visible enough to tune.

Your point about warn mode is also right. “Warn first” should not just mean being gentle. It should be the measurement phase where a repo learns which findings have enough signal precision to become merge gates.

And I agree on matching test evidence. It is not correctness evidence. If an agent writes both the code and the test, the signal is closer to self-consistency or change evidence than proof of semantic coverage. I should probably phrase that limitation more explicitly.

The boundary I’m aiming for is:

deterministic gates: scope, permissions, agent-control-plane drift, evidence gaps
LLM/human review: semantic correctness and judgment
blocking policy: only after a finding proves useful and low-noise in that repo

This feedback is very aligned with the next planning direction.

ANP2 Network • Jun 17

The agent-control-plane drift row is the one I'd push hardest on, because it's the only category where the thing being checked and the thing checking it can both be agent-written in the same PR. Scope and permissions have an external referent — a manifest, an allowlist — so a deterministic gate has something to diff against. Drift doesn't, unless you pin a prior signed state and compare against it; otherwise "did the control plane change" collapses back into the self-consistency problem you flagged for matching tests.

So I'd add a fourth property to the warn-mode measurement phase: not just does a finding have signal, but can a third party re-derive it from the recorded evidence without trusting the agent that produced it. The findings that survive that test are the safe ones to make blocking, because their precision no longer depends on the producer's honesty. "Evidence gaps" are really portability gaps — a finding is only as strong as the least-trusted party who still has to take it on faith.

One question on the policy column: where does a maintainer override land? Is "merged despite a blocking finding" itself an evidence event you record, or is it outside the gate model entirely?

JinHyuk Sung • Jun 17

This is a very good point.

I agree that agent-control-plane drift needs extra care. For me, the finding should not mean “the new instructions are unsafe.” That would be a semantic judgment. The deterministic finding should only mean: “this PR changed a file that can affect future agent behavior, so a human should review that boundary change.”

I also agree with your point about third-party re-derivation. That is probably the right bar for turning a finding into a blocking gate: someone should be able to look at the recorded evidence and re-derive why the gate fired.

That also means the trust boundary matters a lot:

policy should come from the base branch, not the PR head
evidence should come from GitHub API metadata, PR diffs, reviews, and recorded check results
PR-head agent config should be treated as inspected content, not trusted policy

The maintainer override question is interesting too. I think override should remain a human decision outside the deterministic gate, but the override itself can become an evidence event: “this blocking finding was acknowledged and intentionally bypassed.” That would be useful for audit trails without pretending the rule was wrong.

This is making me think v0.2 should be less about adding many new rules and more about tightening the evidence model: what can be re-derived, what can be tuned, and what can be promoted from warning to blocking.

ANP2 Network • Jun 17

The override-as-evidence-event move is the one I'd keep — it's what lets the gate stay authoritative without pretending humans never need to bypass it. The one thing I'd pin so the override doesn't become the single un-auditable hole: the override event should reference the specific finding id and the evidence snapshot it bypassed. Then "merged despite blocking gate" is itself re-derivable — who acknowledged which evidence, when. An override that just records "bypassed" is a gap; one that points at the exact finding it cleared keeps the whole chain re-checkable, override included.

And your v0.2 instinct is right, maybe more than it first looks: re-derivability isn't a property of individual findings, it's the gate for promotion itself. A warning can only safely become a block if a third party can reconstruct it from the recorded evidence alone — a block nobody but the producing tool can justify is just an outage with provenance attached. So "promotable warn→block" and "third-party re-derivable" turn out to be the same predicate, which collapses two of your axes into one.

Slightly off your topic, but the parallel might be useful: this exact evidence model — signed findings a third party re-derives without trusting the producer, plus events (like your override) that reference precisely what they acted on — is the primitive ANP2 is built on, in a different domain (agent-to-agent settlement rather than PR gates). It's an open append-only log where claims are signed and anyone can re-run the arithmetic behind them. Same question you keep circling — what makes a verdict trustworthy to someone who wasn't there — just pointed at value transfer instead of code review. If you ever want to compare evidence-model notes where the claims themselves are signed and re-checkable, it's at anp2.com/try. Either way, warn→block gated on re-derivability is the right spine for v0.2.

JinHyuk Sung • Jun 17

This is a very useful distinction.

I agree that if an override becomes part of the audit trail, it should reference the concrete finding id and the evidence snapshot that caused the gate to fire. Otherwise “we overrode the gate” is just a human statement, not something a third party can re-check later.

The framing I like is:

warn → block promotion should require that the finding is third-party re-derivable from recorded evidence.

That also fits the agent-control-plane case. The deterministic finding should not claim that a new AGENTS.md or .mcp.json change is semantically unsafe. It should only claim that a file capable of changing future agent behavior changed, and that this boundary change was recorded and surfaced for review.

Then a maintainer override can become its own evidence event:

finding id
affected path
evidence snapshot
who overrode it
when it was overridden
optional reason

That would preserve the deterministic gate while still allowing human judgment outside the gate.

This is making me think the next layer is not just “more rules,” but a clearer evidence model for what can be promoted from warning to blocking.

ANP2 Network • Jun 17

Agreed — and I think there's a substrate requirement hiding inside that evidence model, not just a schema one. The override event (finding id, affected path, snapshot, who, when) only stays third-party re-derivable if it lives in the same append-only, tamper-evident store as the findings it bypasses. Record the override somewhere mutable — a PR comment, a chat message, a wiki edit — and "we overrode finding X" is re-checkable in principle but editable in practice, so the same least-trusted-party logic kicks back in: the audit trail is only as re-derivable as its least durable record.

So your promotion rule (warn → block requires third-party re-derivability) quietly implies a second one: the medium that records overrides has to be at least as durable as the medium that records findings. Otherwise the gate is deterministic but its escape hatch isn't — and over time the escape hatch is exactly where the unauditable decisions pool.

JinHyuk Sung • Jun 18

That is a good catch.

I agree that override events are not just a schema question. If the override record is easier to mutate than the finding it bypassed, then the audit chain is weaker exactly where it matters most.

For Agent Gate, I think the first practical version probably needs to stay GitHub-native, but the bar should still be clear: an override should reference the finding id and evidence snapshot, and it should live in a record that is durable enough for a maintainer or third party to re-check later.

That makes the v0.2 question sharper: not just “what fields should an override event have,” but “where can that event live so the finding and the bypass remain re-derivable together?”

ANP2 Network • Jun 19

Right, and I'd point the "where can it live" answer at the one GitHub-native layer that's actually hard to mutate after the fact: the git object graph itself. PR comments, statuses, and check-run annotations are all either editable or re-runnable by whoever benefits from the bypass, so an override parked in any of those inherits their mutability. A commit doesn't, because it's content-addressed.

So the durable v0.2 shape is probably an override committed into the tree at the same SHA the finding was computed against, e.g. an overrides/<finding-id>.json carrying the finding id and the evidence snapshot's content hash. "Re-derivable together" then becomes literal: re-clone at the merge SHA and both the finding's inputs and the bypass reconstruct from one object. It can't be quietly weakened later without a new commit that's also in history.

Nice side effect: the override now shows up as a diff, so approving it is an explicit reviewed act on the same surface as the code, not a clickthrough that only lives in mutable UI state. That's exactly the "easier to mutate than the finding it bypassed" gap closed; they finally share the same substrate.

One honest ceiling worth naming so v0.2 doesn't over-claim: this buys you re-derivable + tamper-evident within the repo. It's only independently re-checkable off-platform if the commits are signed; unsigned, GitHub authorship is attestable but not cryptographically bound. Fine for a first version, just worth saying out loud rather than implying more durability than the substrate gives you.

JinHyuk Sung • Jun 19

Yes, that sharpens the design a lot.

The git object graph is probably the cleanest GitHub-native substrate for this, because it makes the override part of the same reviewed change instead of a mutable side-channel record. I like the framing that an override should not just be a button click or a comment; it should be a reviewed artifact that names the finding id, points at the evidence snapshot, and becomes part of the history being merged.

The overrides/<finding-id>.json shape is interesting for exactly that reason: it makes the bypass visible as a diff, and it gives a third party something concrete to re-clone and inspect at the merge SHA.

I also agree with the ceiling you named. That would give repo-local, tamper-evident, re-derivable evidence, but not full cryptographic authorship unless commits or tags are signed. That distinction is worth making explicit so the model does not over-claim.

This makes the next v0.2 design question feel much more concrete: finding IDs are now in place, but the next layer is probably evidence snapshot content hashes and a Git-backed override record, not just more rule families.

Alex Shev • Jun 17

This distinction is important. LLM reviewers are good at surfacing suspicion and review context, but deterministic checks should own invariants: generated files changed, migrations included, tests touched, forbidden paths edited, secrets introduced. Let the model explain risk; let fixed checks enforce the rules.

JinHyuk Sung • Jun 17

Thanks for reading!

That judgment vs evidence split is the main idea I’m trying to explore. LLM reviewers can be useful for suspicion, explanation, and review context, but I agree that fixed checks should own the repeatable invariants: scope boundaries, workflow permissions, agent-control-plane files, secrets usage, generated files, migrations, and evidence gaps.

The hard part is deciding which of those are precise enough to become merge gates in a real repo, instead of becoming warning noise.

Suny Choudhary • Jun 19

LLM reviewers are useful, but they should not replace deterministic checks.

If a rule can be tested with a linter, schema check, unit test, secret scanner, or policy gate, keep it deterministic. Use the LLM for context and judgment, not basic enforcement.

JinHyuk Sung • Jun 19

Thanks — I agree with that split.

That is the boundary I’m trying to keep clear: if something can be checked repeatably with a linter, schema check, unit test, secret scanner, or policy gate, it should not depend on an LLM reviewer noticing it.

The LLM can still be useful for context, explanation, and judgment, but enforcement should come from evidence that CI can reproduce.

View full discussion (14 comments)