DEV Community

Sulthon Zainul Habib
Sulthon Zainul Habib

Posted on

I Built an AI Code Reviewer That Runs on 240 Repos — And a Cron System That Keeps It Alive

I got tired of reviewing my own pull requests at 2 AM. So I built a GitHub Action that does it for me. Then I built a cron system to keep that action alive. Then I added 55 more AI agent jobs to that cron system because, honestly, I couldn't stop.

Here's what's actually running, what it costs, and what I'm building toward.

The Code Reviewer That Started It All

The core product: a GitHub Action called sulthonzh/code-reviewer that lives at github.com/sulthonzh/code-reviewer. Every time someone opens a PR on any of my repos, five jobs fire off in sequence:

  1. Secret scan — checks the diff for leaked API keys, passwords, private keys
  2. AI review — sends the diff to Z.AI's GLM model, gets back security/quality/style feedback
  3. Quality gate — runs linting, type checks, test thresholds
  4. Auto-merge — if the AI approved AND quality passed, merges automatically
  5. Auto-release — on push to main, cuts a GitHub release with changelog

Here's the real workflow. This runs on 240+ repos right now:

name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize, ready_for_review]
  push:
    branches: [main]

concurrency:
  group: review-${{ github.ref }}
  cancel-in-progress: true

permissions:
  pull-requests: write
  contents: write
  checks: write
  statuses: write

env:
  ZAI_BASE_URL: "https://api.z.ai/api/coding/paas/v4/"

jobs:
  secret-scan:
    name: "🔒 Secret Scan"
    runs-on: ubuntu-latest
    outputs:
      secrets_found: ${{ steps.scan.outputs.found }}
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Scan diff for secrets
        id: scan
        uses: sulthonzh/code-reviewer@main
        with:
          command: secret-scan
          github-token: ${{ secrets.GITHUB_TOKEN }}
      - name: Block if secrets found
        if: steps.scan.outputs.found == 'true'
        run: |
          echo "::error::Found potential secret(s) in the diff. Remove before merging."
          exit 1

  ai-review:
    name: "🤖 AI Review"
    runs-on: ubuntu-latest
    needs: secret-scan
    outputs:
      approved: ${{ steps.review.outputs.approved }}
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Detect project context
        id: context
        uses: sulthonzh/code-reviewer@main
        with:
          command: detect-context
          github-token: ${{ secrets.GITHUB_TOKEN }}
      - name: Route model by diff size
        id: model
        run: |
          DIFF_LINES=$(git diff origin/main...HEAD 2>/dev/null | wc -l || echo 0)
          if [ "$DIFF_LINES" -gt 500 ]; then
            echo "model=glm-5.1" >> "$GITHUB_OUTPUT"
          else
            echo "model=glm-4.5" >> "$GITHUB_OUTPUT"
          fi
      - name: Run AI review
        id: review
        uses: sulthonzh/code-reviewer@main
        with:
          command: ai-review
          model: ${{ steps.model.outputs.model }}
          project-type: ${{ steps.context.outputs.project_type }}
          zai-api-key: ${{ secrets.ZAI_API_KEY }}
          zai-base-url: ${{ env.ZAI_BASE_URL }}
          github-token: ${{ secrets.GITHUB_TOKEN }}

  quality-gate:
    name: " Quality Gate"
    runs-on: ubuntu-latest
    needs: [secret-scan, ai-review]
    outputs:
      passed: ${{ steps.gate.outputs.passed }}
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Run quality checks
        id: gate
        uses: sulthonzh/code-reviewer@main
        with:
          command: quality-gate
          github-token: ${{ secrets.GITHUB_TOKEN }}

  auto-merge:
    name: "🔀 Auto-Merge"
    runs-on: ubuntu-latest
    needs: [ai-review, quality-gate]
    if: >-
      needs.ai-review.outputs.approved == 'true' &&
      needs.quality-gate.outputs.passed == 'true' &&
      github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v6
      - name: Approve and merge
        uses: sulthonzh/code-reviewer@main
        with:
          command: auto-merge
          github-token: ${{ secrets.GITHUB_TOKEN }}

  auto-release:
    name: "📦 Auto-Release"
    runs-on: ubuntu-latest
    if: >-
      github.event_name == 'push' &&
      github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Detect and release
        uses: sulthonzh/code-reviewer@main
        with:
          command: auto-release
          github-token: ${{ secrets.GITHUB_TOKEN }}
Enter fullscreen mode Exit fullscreen mode

The model routing bit

Small PRs (under 500 lines diff) hit glm-4.5. Bigger ones get glm-5.1. This isn't arbitrary. The larger model costs more per token but handles cross-file reasoning better. Most PRs are under 500 lines, so the cheap model handles 90% of traffic.

The API endpoint is Z.AI (from 智谱AI, a Chinese AI company). Their GLM models are OpenAI-compatible, so the integration was just pointing the OpenAI SDK at a different base URL. No wrappers, no adapters.

What it actually costs

Per review:
  Z.AI API call:       ~$0.002
  GitHub Actions:      ~$0.003 (free tier mostly covers this)
  Total:               ~$0.006 per review
Enter fullscreen mode Exit fullscreen mode

I'm spending roughly $3-5/month on API calls across all repos. That's less than a coffee.

The Secret Scanning Story

Here's where it got interesting. Before I built the secret-scan job, I ran a manual sweep across 240 public repos. Found 9 repos with real leaked credentials in git history:

  • AWS access keys
  • MySQL root passwords
  • RSA private keys
  • Hardcoded JWT secrets

Cleaning them wasn't just git rm. The secrets were in history. I used git filter-repo to rewrite the affected repos, rotated every compromised credential, and added the secret-scan job to the workflow to prevent recurrence.

That job alone has caught three attempted credential pushes in the last month. Worth the entire build.

The Babysitter: OpenClaw Cron Fleet

The code reviewer runs fine on its own. But I kept adding things. A marketing supervisor that publishes blog posts to Dev.to (10 articles so far). A deployment supervisor that ships to Vercel free tier. An IDX stock screener that runs 20+ intraday scans on the Indonesian exchange. A wealth builder that scaffolds SaaS products.

All of these are AI agent jobs running on cron schedules through a system I call OpenClaw.

Current state: 56 jobs, monitored by a guardian process that scans every few hours.

Guardian cycle 2026-06-11 04:48 WIB:
  - 56 jobs scanned
  - 0 with consecutiveErrors >= 2
  - 1 single-error transient (wealth-builder timeout)
  - No actions taken
Enter fullscreen mode Exit fullscreen mode

The guardian doesn't just watch. It has rules:

  • 1 consecutive error: ignore, probably transient
  • 2 consecutive errors: monitor, create incident ticket
  • 5+ consecutive errors: auto-heal (restart job, switch model, increase timeout)

This actually worked last week. The marketing supervisor started failing because the GLM model hit rate limits. The guardian detected 2+ consecutive errors, switched the model to glm-4.5-air (lighter, faster), bumped the timeout from 2700s to 3600s. Resolved without me touching anything.

The circuit breaker pattern

Each agent job wraps its API calls in a circuit breaker. Here's the pattern from my IDX screener:

class HealthRecord:
    """Track health of a single component."""

    def record_failure(self):
        self.consecutive_failures += 1
        self.consecutive_successes = 0
        if self.consecutive_failures >= 5:
            self.circuit_open = True
            self.circuit_opened_at = time.time()

    def record_success(self, duration_ms: float = 0):
        self.consecutive_successes += 1
        self.consecutive_failures = 0
        if self.consecutive_successes >= 3:
            self.circuit_open = False  # auto-close after 3 wins

    @property
    def is_healthy(self):
        if not self.circuit_open:
            return True
        # Half-open: try again after 5 min cooldown
        if time.time() - self.circuit_opened_at > 300:
            return True
        return False
Enter fullscreen mode Exit fullscreen mode

5 failures in a row opens the circuit. 3 successes in a row closes it. 5-minute half-open cooldown lets it retry. This runs in production and has prevented cascading failures during API outages.

The Architecture (What Exists vs. What's Next)

Here's the honest map:

┌─────────────────────────────────────────────────────┐
│                   WHAT'S LIVE                         │
│                                                       │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────┐ │
│  │ AI Code       │  │ OpenClaw     │  │ Guardian   │ │
│  │ Reviewer      │  │ Cron Fleet   │  │ Monitor    │ │
│  │ (240 repos)   │  │ (56 jobs)    │  │ (auto-     │ │
│  │               │  │              │  │  heal)     │ │
│  └──────────────┘  └──────────────┘  └───────────┘ │
│                                                       │
│  ┌──────────────┐  ┌──────────────┐                  │
│  │ Marketing    │  │ Secret Scan  │                  │
│  │ Supervisor   │  │ (9 repos     │                  │
│  │ (10 posts)   │  │  cleaned)    │                  │
│  └──────────────┘  └──────────────┘                  │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│               WHAT I'M BUILDING TOWARD                │
│                                                       │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────┐ │
│  │ Wallet       │  │ Cloning      │  │ Revenue    │ │
│  │ Module       │  │ Engine       │  │ Engine     │ │
│  │ (Stripe)     │  │ (multi-cloud)│  │ (SaaS)     │ │
│  └──────────────┘  └──────────────┘  └───────────┘ │
└─────────────────────────────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode

The bottom row doesn't exist yet. I'm sharing the architecture because it's where this is heading, but I want to be clear about the boundary.

What's next (honest roadmap)

Near term (building now):

  • Wallet module with Stripe integration for the code reviewer SaaS
  • Better incident response (currently the guardian can restart jobs and switch models; adding credential rotation automation)

Medium term (designing):

  • Multi-cloud cloning (snapshot state, deploy to new provider)
  • Revenue engine (paid tiers for the code reviewer, API marketplace listing)

Far term (thinking about):

  • Swarm coordination between cloned instances
  • Knowledge base that actually learns from review patterns over time (currently static prompts)

Why Z.AI and Not OpenAI

Three reasons:

  1. Cost. GLM-4.5 costs roughly 10x less per token than GPT-4o for code review quality that's comparable for the patterns I care about (security, style, common bugs).

  2. Latency. The API responds in under 2 seconds for most diffs. OpenAI was averaging 4-5 seconds.

  3. OpenAI-compatible. Zero code changes to the OpenAI SDK. Just swap baseURL and apiKey. I could switch back to OpenAI (or add Claude, or Gemini) in about 10 minutes if Z.AI went down.

That last point matters. Vendor lock-in is the enemy of resilience.

Try It

Drop this into .github/workflows/ai-review.yml on any repo:

name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - uses: sulthonzh/code-reviewer@main
        with:
          command: ai-review
          zai-api-key: ${{ secrets.ZAI_API_KEY }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
Enter fullscreen mode Exit fullscreen mode

You'll need a Z.AI API key from open.z.ai. The free tier covers a few hundred reviews per month.

The code is open source at github.com/sulthonzh/code-reviewer.


Top comments (3)

Collapse
 
eleftheriabatsou profile image
Eleftheria Batsou

Fantastic writeup.

The guardian + circuit breaker pattern is the part most teams skip until they get burned. One question on the OpenClaw side: when an agent job needs to provision a real DB or external service (not just call an API), where does that boundary sit? That's where most "AI does the work" pipelines either get unsafe or need a human gate. Curious how you'll handle it as the fleet grows past 56 jobs.

Collapse
 
alexshev profile image
Alex Shev

Running an AI reviewer across that many repos makes the ops layer just as important as the model. The hard part is not only generating comments; it is keeping the system healthy, avoiding noisy reviews, and knowing when the automation is stale or wrong.

I would track false positives, skipped repos, model/API failures, and human override patterns as first-class metrics. Otherwise the reviewer can look active while quietly losing trust.

Collapse
 
caishenai profile image
caishen-ai

The model routing by diff size is clever — using glm-4.5 for small PRs and glm-5.1 for cross-file reasoning. I'm curious about the false positive rate on the secret scanning step. Have you had cases where the scanner flags something that isn't actually a secret (like hardcoded example API keys in docs)?

Also love the cron system keeping 56 AI jobs alive. We're dealing with a similar challenge — our automated outreach agent has ~40 concurrent tasks running across different platforms, and babysitting them all is easily the hardest part. One thing that helped us: we added a "receipt" log for every agent action (what was attempted, what changed, what failed, why it stopped). Makes debugging stuck loops much faster than combing through raw logs.