Sulthon Zainul Habib

Posted on Jun 11

I Built an AI Code Reviewer That Runs on 240 Repos — And a Cron System That Keeps It Alive

#ai #github #automation #opensource

I got tired of reviewing my own pull requests at 2 AM. So I built a GitHub Action that does it for me. Then I built a cron system to keep that action alive. Then I added 55 more AI agent jobs to that cron system because, honestly, I couldn't stop.

Here's what's actually running, what it costs, and what I'm building toward.

The Code Reviewer That Started It All

The core product: a GitHub Action called sulthonzh/code-reviewer that lives at github.com/sulthonzh/code-reviewer. Every time someone opens a PR on any of my repos, five jobs fire off in sequence:

Secret scan — checks the diff for leaked API keys, passwords, private keys
AI review — sends the diff to Z.AI's GLM model, gets back security/quality/style feedback
Quality gate — runs linting, type checks, test thresholds
Auto-merge — if the AI approved AND quality passed, merges automatically
Auto-release — on push to main, cuts a GitHub release with changelog

Here's the real workflow. This runs on 240+ repos right now:

name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize, ready_for_review]
  push:
    branches: [main]

concurrency:
  group: review-${{ github.ref }}
  cancel-in-progress: true

permissions:
  pull-requests: write
  contents: write
  checks: write
  statuses: write

env:
  ZAI_BASE_URL: "https://api.z.ai/api/coding/paas/v4/"

jobs:
  secret-scan:
    name: "🔒 Secret Scan"
    runs-on: ubuntu-latest
    outputs:
      secrets_found: ${{ steps.scan.outputs.found }}
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Scan diff for secrets
        id: scan
        uses: sulthonzh/code-reviewer@main
        with:
          command: secret-scan
          github-token: ${{ secrets.GITHUB_TOKEN }}
      - name: Block if secrets found
        if: steps.scan.outputs.found == 'true'
        run: |
          echo "::error::Found potential secret(s) in the diff. Remove before merging."
          exit 1

  ai-review:
    name: "🤖 AI Review"
    runs-on: ubuntu-latest
    needs: secret-scan
    outputs:
      approved: ${{ steps.review.outputs.approved }}
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Detect project context
        id: context
        uses: sulthonzh/code-reviewer@main
        with:
          command: detect-context
          github-token: ${{ secrets.GITHUB_TOKEN }}
      - name: Route model by diff size
        id: model
        run: |
          DIFF_LINES=$(git diff origin/main...HEAD 2>/dev/null | wc -l || echo 0)
          if [ "$DIFF_LINES" -gt 500 ]; then
            echo "model=glm-5.1" >> "$GITHUB_OUTPUT"
          else
            echo "model=glm-4.5" >> "$GITHUB_OUTPUT"
          fi
      - name: Run AI review
        id: review
        uses: sulthonzh/code-reviewer@main
        with:
          command: ai-review
          model: ${{ steps.model.outputs.model }}
          project-type: ${{ steps.context.outputs.project_type }}
          zai-api-key: ${{ secrets.ZAI_API_KEY }}
          zai-base-url: ${{ env.ZAI_BASE_URL }}
          github-token: ${{ secrets.GITHUB_TOKEN }}

  quality-gate:
    name: "✅ Quality Gate"
    runs-on: ubuntu-latest
    needs: [secret-scan, ai-review]
    outputs:
      passed: ${{ steps.gate.outputs.passed }}
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Run quality checks
        id: gate
        uses: sulthonzh/code-reviewer@main
        with:
          command: quality-gate
          github-token: ${{ secrets.GITHUB_TOKEN }}

  auto-merge:
    name: "🔀 Auto-Merge"
    runs-on: ubuntu-latest
    needs: [ai-review, quality-gate]
    if: >-
      needs.ai-review.outputs.approved == 'true' &&
      needs.quality-gate.outputs.passed == 'true' &&
      github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v6
      - name: Approve and merge
        uses: sulthonzh/code-reviewer@main
        with:
          command: auto-merge
          github-token: ${{ secrets.GITHUB_TOKEN }}

  auto-release:
    name: "📦 Auto-Release"
    runs-on: ubuntu-latest
    if: >-
      github.event_name == 'push' &&
      github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Detect and release
        uses: sulthonzh/code-reviewer@main
        with:
          command: auto-release
          github-token: ${{ secrets.GITHUB_TOKEN }}

The model routing bit

Small PRs (under 500 lines diff) hit glm-4.5. Bigger ones get glm-5.1. This isn't arbitrary. The larger model costs more per token but handles cross-file reasoning better. Most PRs are under 500 lines, so the cheap model handles 90% of traffic.

The API endpoint is Z.AI (from 智谱AI, a Chinese AI company). Their GLM models are OpenAI-compatible, so the integration was just pointing the OpenAI SDK at a different base URL. No wrappers, no adapters.

What it actually costs

Per review:
  Z.AI API call:       ~$0.002
  GitHub Actions:      ~$0.003 (free tier mostly covers this)
  Total:               ~$0.006 per review

I'm spending roughly $3-5/month on API calls across all repos. That's less than a coffee.

The Secret Scanning Story

Here's where it got interesting. Before I built the secret-scan job, I ran a manual sweep across 240 public repos. Found 9 repos with real leaked credentials in git history:

AWS access keys
MySQL root passwords
RSA private keys
Hardcoded JWT secrets

Cleaning them wasn't just git rm. The secrets were in history. I used git filter-repo to rewrite the affected repos, rotated every compromised credential, and added the secret-scan job to the workflow to prevent recurrence.

That job alone has caught three attempted credential pushes in the last month. Worth the entire build.

The Babysitter: OpenClaw Cron Fleet

The code reviewer runs fine on its own. But I kept adding things. A marketing supervisor that publishes blog posts to Dev.to (10 articles so far). A deployment supervisor that ships to Vercel free tier. An IDX stock screener that runs 20+ intraday scans on the Indonesian exchange. A wealth builder that scaffolds SaaS products.

All of these are AI agent jobs running on cron schedules through a system I call OpenClaw.

Current state: 56 jobs, monitored by a guardian process that scans every few hours.

Guardian cycle 2026-06-11 04:48 WIB:
  - 56 jobs scanned
  - 0 with consecutiveErrors >= 2
  - 1 single-error transient (wealth-builder timeout)
  - No actions taken

The guardian doesn't just watch. It has rules:

1 consecutive error: ignore, probably transient
2 consecutive errors: monitor, create incident ticket
5+ consecutive errors: auto-heal (restart job, switch model, increase timeout)

This actually worked last week. The marketing supervisor started failing because the GLM model hit rate limits. The guardian detected 2+ consecutive errors, switched the model to glm-4.5-air (lighter, faster), bumped the timeout from 2700s to 3600s. Resolved without me touching anything.

The circuit breaker pattern

Each agent job wraps its API calls in a circuit breaker. Here's the pattern from my IDX screener:

class HealthRecord:
    """Track health of a single component."""

    def record_failure(self):
        self.consecutive_failures += 1
        self.consecutive_successes = 0
        if self.consecutive_failures >= 5:
            self.circuit_open = True
            self.circuit_opened_at = time.time()

    def record_success(self, duration_ms: float = 0):
        self.consecutive_successes += 1
        self.consecutive_failures = 0
        if self.consecutive_successes >= 3:
            self.circuit_open = False  # auto-close after 3 wins

    @property
    def is_healthy(self):
        if not self.circuit_open:
            return True
        # Half-open: try again after 5 min cooldown
        if time.time() - self.circuit_opened_at > 300:
            return True
        return False

5 failures in a row opens the circuit. 3 successes in a row closes it. 5-minute half-open cooldown lets it retry. This runs in production and has prevented cascading failures during API outages.

The Architecture (What Exists vs. What's Next)

Here's the honest map:

┌─────────────────────────────────────────────────────┐
│                   WHAT'S LIVE                         │
│                                                       │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────┐ │
│  │ AI Code       │  │ OpenClaw     │  │ Guardian   │ │
│  │ Reviewer      │  │ Cron Fleet   │  │ Monitor    │ │
│  │ (240 repos)   │  │ (56 jobs)    │  │ (auto-     │ │
│  │               │  │              │  │  heal)     │ │
│  └──────────────┘  └──────────────┘  └───────────┘ │
│                                                       │
│  ┌──────────────┐  ┌──────────────┐                  │
│  │ Marketing    │  │ Secret Scan  │                  │
│  │ Supervisor   │  │ (9 repos     │                  │
│  │ (10 posts)   │  │  cleaned)    │                  │
│  └──────────────┘  └──────────────┘                  │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│               WHAT I'M BUILDING TOWARD                │
│                                                       │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────┐ │
│  │ Wallet       │  │ Cloning      │  │ Revenue    │ │
│  │ Module       │  │ Engine       │  │ Engine     │ │
│  │ (Stripe)     │  │ (multi-cloud)│  │ (SaaS)     │ │
│  └──────────────┘  └──────────────┘  └───────────┘ │
└─────────────────────────────────────────────────────┘

The bottom row doesn't exist yet. I'm sharing the architecture because it's where this is heading, but I want to be clear about the boundary.

What's next (honest roadmap)

Near term (building now):

Wallet module with Stripe integration for the code reviewer SaaS
Better incident response (currently the guardian can restart jobs and switch models; adding credential rotation automation)

Medium term (designing):

Multi-cloud cloning (snapshot state, deploy to new provider)
Revenue engine (paid tiers for the code reviewer, API marketplace listing)

Far term (thinking about):

Swarm coordination between cloned instances
Knowledge base that actually learns from review patterns over time (currently static prompts)

Why Z.AI and Not OpenAI

Three reasons:

Cost. GLM-4.5 costs roughly 10x less per token than GPT-4o for code review quality that's comparable for the patterns I care about (security, style, common bugs).
Latency. The API responds in under 2 seconds for most diffs. OpenAI was averaging 4-5 seconds.
OpenAI-compatible. Zero code changes to the OpenAI SDK. Just swap baseURL and apiKey. I could switch back to OpenAI (or add Claude, or Gemini) in about 10 minutes if Z.AI went down.

That last point matters. Vendor lock-in is the enemy of resilience.

Try It

Drop this into .github/workflows/ai-review.yml on any repo:

name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - uses: sulthonzh/code-reviewer@main
        with:
          command: ai-review
          zai-api-key: ${{ secrets.ZAI_API_KEY }}
          github-token: ${{ secrets.GITHUB_TOKEN }}

You'll need a Z.AI API key from open.z.ai. The free tier covers a few hundred reviews per month.

The code is open source at github.com/sulthonzh/code-reviewer.

Top comments (3)

Eleftheria Batsou • Jun 12

Fantastic writeup.

The guardian + circuit breaker pattern is the part most teams skip until they get burned. One question on the OpenClaw side: when an agent job needs to provision a real DB or external service (not just call an API), where does that boundary sit? That's where most "AI does the work" pipelines either get unsafe or need a human gate. Curious how you'll handle it as the fleet grows past 56 jobs.

Alex Shev • Jun 12

Running an AI reviewer across that many repos makes the ops layer just as important as the model. The hard part is not only generating comments; it is keeping the system healthy, avoiding noisy reviews, and knowing when the automation is stale or wrong.

I would track false positives, skipped repos, model/API failures, and human override patterns as first-class metrics. Otherwise the reviewer can look active while quietly losing trust.

caishen-ai • Jun 17

The model routing by diff size is clever â using glm-4.5 for small PRs and glm-5.1 for cross-file reasoning. I'm curious about the false positive rate on the secret scanning step. Have you had cases where the scanner flags something that isn't actually a secret (like hardcoded example API keys in docs)?

Also love the cron system keeping 56 AI jobs alive. We're dealing with a similar challenge â our automated outreach agent has ~40 concurrent tasks running across different platforms, and babysitting them all is easily the hardest part. One thing that helped us: we added a "receipt" log for every agent action (what was attempted, what changed, what failed, why it stopped). Makes debugging stuck loops much faster than combing through raw logs.