I got tired of reviewing my own pull requests at 2 AM. So I built a GitHub Action that does it for me. Then I built a cron system to keep that action alive. Then I added 55 more AI agent jobs to that cron system because, honestly, I couldn't stop.
Here's what's actually running, what it costs, and what I'm building toward.
The Code Reviewer That Started It All
The core product: a GitHub Action called sulthonzh/code-reviewer that lives at github.com/sulthonzh/code-reviewer. Every time someone opens a PR on any of my repos, five jobs fire off in sequence:
- Secret scan — checks the diff for leaked API keys, passwords, private keys
- AI review — sends the diff to Z.AI's GLM model, gets back security/quality/style feedback
- Quality gate — runs linting, type checks, test thresholds
- Auto-merge — if the AI approved AND quality passed, merges automatically
- Auto-release — on push to main, cuts a GitHub release with changelog
Here's the real workflow. This runs on 240+ repos right now:
name: AI Code Review
on:
pull_request:
types: [opened, synchronize, ready_for_review]
push:
branches: [main]
concurrency:
group: review-${{ github.ref }}
cancel-in-progress: true
permissions:
pull-requests: write
contents: write
checks: write
statuses: write
env:
ZAI_BASE_URL: "https://api.z.ai/api/coding/paas/v4/"
jobs:
secret-scan:
name: "🔒 Secret Scan"
runs-on: ubuntu-latest
outputs:
secrets_found: ${{ steps.scan.outputs.found }}
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Scan diff for secrets
id: scan
uses: sulthonzh/code-reviewer@main
with:
command: secret-scan
github-token: ${{ secrets.GITHUB_TOKEN }}
- name: Block if secrets found
if: steps.scan.outputs.found == 'true'
run: |
echo "::error::Found potential secret(s) in the diff. Remove before merging."
exit 1
ai-review:
name: "🤖 AI Review"
runs-on: ubuntu-latest
needs: secret-scan
outputs:
approved: ${{ steps.review.outputs.approved }}
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Detect project context
id: context
uses: sulthonzh/code-reviewer@main
with:
command: detect-context
github-token: ${{ secrets.GITHUB_TOKEN }}
- name: Route model by diff size
id: model
run: |
DIFF_LINES=$(git diff origin/main...HEAD 2>/dev/null | wc -l || echo 0)
if [ "$DIFF_LINES" -gt 500 ]; then
echo "model=glm-5.1" >> "$GITHUB_OUTPUT"
else
echo "model=glm-4.5" >> "$GITHUB_OUTPUT"
fi
- name: Run AI review
id: review
uses: sulthonzh/code-reviewer@main
with:
command: ai-review
model: ${{ steps.model.outputs.model }}
project-type: ${{ steps.context.outputs.project_type }}
zai-api-key: ${{ secrets.ZAI_API_KEY }}
zai-base-url: ${{ env.ZAI_BASE_URL }}
github-token: ${{ secrets.GITHUB_TOKEN }}
quality-gate:
name: "✅ Quality Gate"
runs-on: ubuntu-latest
needs: [secret-scan, ai-review]
outputs:
passed: ${{ steps.gate.outputs.passed }}
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Run quality checks
id: gate
uses: sulthonzh/code-reviewer@main
with:
command: quality-gate
github-token: ${{ secrets.GITHUB_TOKEN }}
auto-merge:
name: "🔀 Auto-Merge"
runs-on: ubuntu-latest
needs: [ai-review, quality-gate]
if: >-
needs.ai-review.outputs.approved == 'true' &&
needs.quality-gate.outputs.passed == 'true' &&
github.event_name == 'pull_request'
steps:
- uses: actions/checkout@v6
- name: Approve and merge
uses: sulthonzh/code-reviewer@main
with:
command: auto-merge
github-token: ${{ secrets.GITHUB_TOKEN }}
auto-release:
name: "📦 Auto-Release"
runs-on: ubuntu-latest
if: >-
github.event_name == 'push' &&
github.ref == 'refs/heads/main'
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Detect and release
uses: sulthonzh/code-reviewer@main
with:
command: auto-release
github-token: ${{ secrets.GITHUB_TOKEN }}
The model routing bit
Small PRs (under 500 lines diff) hit glm-4.5. Bigger ones get glm-5.1. This isn't arbitrary. The larger model costs more per token but handles cross-file reasoning better. Most PRs are under 500 lines, so the cheap model handles 90% of traffic.
The API endpoint is Z.AI (from 智谱AI, a Chinese AI company). Their GLM models are OpenAI-compatible, so the integration was just pointing the OpenAI SDK at a different base URL. No wrappers, no adapters.
What it actually costs
Per review:
Z.AI API call: ~$0.002
GitHub Actions: ~$0.003 (free tier mostly covers this)
Total: ~$0.006 per review
I'm spending roughly $3-5/month on API calls across all repos. That's less than a coffee.
The Secret Scanning Story
Here's where it got interesting. Before I built the secret-scan job, I ran a manual sweep across 240 public repos. Found 9 repos with real leaked credentials in git history:
- AWS access keys
- MySQL root passwords
- RSA private keys
- Hardcoded JWT secrets
Cleaning them wasn't just git rm. The secrets were in history. I used git filter-repo to rewrite the affected repos, rotated every compromised credential, and added the secret-scan job to the workflow to prevent recurrence.
That job alone has caught three attempted credential pushes in the last month. Worth the entire build.
The Babysitter: OpenClaw Cron Fleet
The code reviewer runs fine on its own. But I kept adding things. A marketing supervisor that publishes blog posts to Dev.to (10 articles so far). A deployment supervisor that ships to Vercel free tier. An IDX stock screener that runs 20+ intraday scans on the Indonesian exchange. A wealth builder that scaffolds SaaS products.
All of these are AI agent jobs running on cron schedules through a system I call OpenClaw.
Current state: 56 jobs, monitored by a guardian process that scans every few hours.
Guardian cycle 2026-06-11 04:48 WIB:
- 56 jobs scanned
- 0 with consecutiveErrors >= 2
- 1 single-error transient (wealth-builder timeout)
- No actions taken
The guardian doesn't just watch. It has rules:
- 1 consecutive error: ignore, probably transient
- 2 consecutive errors: monitor, create incident ticket
- 5+ consecutive errors: auto-heal (restart job, switch model, increase timeout)
This actually worked last week. The marketing supervisor started failing because the GLM model hit rate limits. The guardian detected 2+ consecutive errors, switched the model to glm-4.5-air (lighter, faster), bumped the timeout from 2700s to 3600s. Resolved without me touching anything.
The circuit breaker pattern
Each agent job wraps its API calls in a circuit breaker. Here's the pattern from my IDX screener:
class HealthRecord:
"""Track health of a single component."""
def record_failure(self):
self.consecutive_failures += 1
self.consecutive_successes = 0
if self.consecutive_failures >= 5:
self.circuit_open = True
self.circuit_opened_at = time.time()
def record_success(self, duration_ms: float = 0):
self.consecutive_successes += 1
self.consecutive_failures = 0
if self.consecutive_successes >= 3:
self.circuit_open = False # auto-close after 3 wins
@property
def is_healthy(self):
if not self.circuit_open:
return True
# Half-open: try again after 5 min cooldown
if time.time() - self.circuit_opened_at > 300:
return True
return False
5 failures in a row opens the circuit. 3 successes in a row closes it. 5-minute half-open cooldown lets it retry. This runs in production and has prevented cascading failures during API outages.
The Architecture (What Exists vs. What's Next)
Here's the honest map:
┌─────────────────────────────────────────────────────┐
│ WHAT'S LIVE │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ AI Code │ │ OpenClaw │ │ Guardian │ │
│ │ Reviewer │ │ Cron Fleet │ │ Monitor │ │
│ │ (240 repos) │ │ (56 jobs) │ │ (auto- │ │
│ │ │ │ │ │ heal) │ │
│ └──────────────┘ └──────────────┘ └───────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Marketing │ │ Secret Scan │ │
│ │ Supervisor │ │ (9 repos │ │
│ │ (10 posts) │ │ cleaned) │ │
│ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ WHAT I'M BUILDING TOWARD │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Wallet │ │ Cloning │ │ Revenue │ │
│ │ Module │ │ Engine │ │ Engine │ │
│ │ (Stripe) │ │ (multi-cloud)│ │ (SaaS) │ │
│ └──────────────┘ └──────────────┘ └───────────┘ │
└─────────────────────────────────────────────────────┘
The bottom row doesn't exist yet. I'm sharing the architecture because it's where this is heading, but I want to be clear about the boundary.
What's next (honest roadmap)
Near term (building now):
- Wallet module with Stripe integration for the code reviewer SaaS
- Better incident response (currently the guardian can restart jobs and switch models; adding credential rotation automation)
Medium term (designing):
- Multi-cloud cloning (snapshot state, deploy to new provider)
- Revenue engine (paid tiers for the code reviewer, API marketplace listing)
Far term (thinking about):
- Swarm coordination between cloned instances
- Knowledge base that actually learns from review patterns over time (currently static prompts)
Why Z.AI and Not OpenAI
Three reasons:
Cost. GLM-4.5 costs roughly 10x less per token than GPT-4o for code review quality that's comparable for the patterns I care about (security, style, common bugs).
Latency. The API responds in under 2 seconds for most diffs. OpenAI was averaging 4-5 seconds.
OpenAI-compatible. Zero code changes to the OpenAI SDK. Just swap
baseURLandapiKey. I could switch back to OpenAI (or add Claude, or Gemini) in about 10 minutes if Z.AI went down.
That last point matters. Vendor lock-in is the enemy of resilience.
Try It
Drop this into .github/workflows/ai-review.yml on any repo:
name: AI Code Review
on:
pull_request:
types: [opened, synchronize]
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0
- uses: sulthonzh/code-reviewer@main
with:
command: ai-review
zai-api-key: ${{ secrets.ZAI_API_KEY }}
github-token: ${{ secrets.GITHUB_TOKEN }}
You'll need a Z.AI API key from open.z.ai. The free tier covers a few hundred reviews per month.
The code is open source at github.com/sulthonzh/code-reviewer.
Top comments (3)
Fantastic writeup.
The guardian + circuit breaker pattern is the part most teams skip until they get burned. One question on the OpenClaw side: when an agent job needs to provision a real DB or external service (not just call an API), where does that boundary sit? That's where most "AI does the work" pipelines either get unsafe or need a human gate. Curious how you'll handle it as the fleet grows past 56 jobs.
Running an AI reviewer across that many repos makes the ops layer just as important as the model. The hard part is not only generating comments; it is keeping the system healthy, avoiding noisy reviews, and knowing when the automation is stale or wrong.
I would track false positives, skipped repos, model/API failures, and human override patterns as first-class metrics. Otherwise the reviewer can look active while quietly losing trust.
The model routing by diff size is clever â using glm-4.5 for small PRs and glm-5.1 for cross-file reasoning. I'm curious about the false positive rate on the secret scanning step. Have you had cases where the scanner flags something that isn't actually a secret (like hardcoded example API keys in docs)?
Also love the cron system keeping 56 AI jobs alive. We're dealing with a similar challenge â our automated outreach agent has ~40 concurrent tasks running across different platforms, and babysitting them all is easily the hardest part. One thing that helped us: we added a "receipt" log for every agent action (what was attempted, what changed, what failed, why it stopped). Makes debugging stuck loops much faster than combing through raw logs.