Andrew

Posted on Jun 20 • Originally published at andrew.ooo

codebase-memory-mcp Review: 99% Token Cut for Code Agents



TL;DR

codebase-memory-mcp is a single-binary MCP server that indexes a codebase into a persistent knowledge graph (about 6 seconds for Django, about 3 minutes for the full Linux kernel), answers structural queries in under 1ms, and cuts agent token spend by ~99% on the kinds of "where is this called from?" / "what does this affect?" questions that otherwise drown your model in grep-and-read loops. The repo sits at 8,703 GitHub stars, 4,212 of them added this week (#3 on GitHub Trending), and comes with an arXiv preprint reporting 83% answer quality at 10× fewer tokens and 2.1× fewer tool calls than file-by-file exploration across 31 real-world repos.

What makes this different from the dozen other "code graph for AI" projects: it's written in pure C with zero runtime dependencies and ships as a single static binary for macOS, Linux, and Windows. That binary vendors 158 tree-sitter grammars, adds Hybrid LSP semantic type resolution for 11 major languages (Python, TypeScript/JavaScript/JSX/TSX, PHP, C#, Go, C, C++, Java, Kotlin, Rust), and comes with an install command that auto-detects and configures 11 coding agents: Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, and Kiro.

Key facts:

8,703 GitHub stars, +4,212 this week (#3 GitHub Trending)
Linux kernel (28M LOC, 75K files) full-indexed in 3 minutes → 4.81M nodes, 7.72M edges
Sub-1ms structural queries via in-memory SQLite, <10ms name search, ~150ms dead-code detection on full graphs
14 MCP tools: search, trace, architecture, impact analysis, Cypher-like queries, dead code, cross-service HTTP/gRPC/GraphQL linking, ADR management
5,604 tests passing, SLSA 3 provenance, OpenSSF Scorecard badged, VirusTotal scanned every release
MIT licensed, available on npm, PyPI, Homebrew, Scoop, Winget, Chocolatey, AUR, go install
arXiv:2603.27277 preprint with reproducible benchmarks across 31 repos

Why grep-and-read loops are killing your agent bill

If you've watched a coding agent answer "what calls ProcessOrder?" in a non-trivial codebase, you've seen the pathology. It opens five files, greps for the symbol, opens five more, follows imports, greps again, and by the time it produces a half-correct answer it has consumed 50,000 tokens — most of them spent re-reading the same file headers, license blocks, and unrelated functions.

The paper behind codebase-memory-mcp (arXiv:2603.27277) puts numbers on this. Across 31 real-world repositories and five structural questions per repo, file-by-file exploration consumed ~412,000 tokens versus ~3,400 tokens when the same questions were answered from a pre-built knowledge graph. That's roughly a 120× reduction, or 99.2% fewer tokens, depending on how you frame it. The answer-quality evaluation is less one-sided: the graph-backed agent scored 83% versus 92% for file exploration, a 9-point drop, in exchange for about 10× fewer tokens and 2.1× fewer tool calls.
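The two framings of the structural-query result are the same gap expressed two ways. A quick check, using only the approximate ~412,000 and ~3,400 token figures quoted above:

```python
# Approximate token figures quoted from the arXiv:2603.27277 summary above.
file_exploration_tokens = 412_000
graph_tokens = 3_400

# Framing 1: how many times fewer tokens the graph path uses.
reduction_factor = file_exploration_tokens / graph_tokens

# Framing 2: percentage of tokens saved relative to file exploration.
percent_saved = 100 * (1 - graph_tokens / file_exploration_tokens)

print(f"{reduction_factor:.0f}x fewer tokens")  # 121x fewer tokens
print(f"{percent_saved:.1f}% fewer tokens")     # 99.2% fewer tokens
```

The "120×" in the text is this 121× rounded down.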

The structural insight: most "code understanding" questions are graph queries in disguise. "What calls X?" is inbound traversal. "What does Y affect?" is outbound traversal. "Where is this HTTP route defined?" is a node lookup. None of these need an LLM to read source code line by line — they need a pre-built graph and a query engine. codebase-memory-mcp is that graph + query engine, exposed over MCP so any compatible agent can use it.
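To make the "graph queries in disguise" point concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The nodes/edges schema, the table and column names, and the sample functions are all hypothetical, not codebase-memory-mcp's actual storage layout. "What calls X?" becomes a one-hop join on inbound CALLS edges; "what does Y affect?", framed as outbound traversal as above, becomes a recursive walk.

```python
import sqlite3

# Hypothetical schema: one row per function, one row per CALLS edge.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE edges (src INTEGER, dst INTEGER, kind TEXT);
""")

calls = [
    ("handleRequest", "ProcessOrder"),
    ("retryWorker", "ProcessOrder"),
    ("ProcessOrder", "ChargeCard"),
    ("ChargeCard", "LogPayment"),
]
names = sorted({n for pair in calls for n in pair})
conn.executemany("INSERT INTO nodes(name) VALUES (?)", [(n,) for n in names])
conn.executemany(
    """INSERT INTO edges(src, dst, kind)
       SELECT a.id, b.id, 'CALLS' FROM nodes a, nodes b
       WHERE a.name = ? AND b.name = ?""",
    calls,
)

def callers_of(name):
    """'What calls X?': inbound one-hop lookup."""
    rows = conn.execute(
        """SELECT caller.name FROM edges e
           JOIN nodes caller ON caller.id = e.src
           JOIN nodes callee ON callee.id = e.dst
           WHERE e.kind = 'CALLS' AND callee.name = ?
           ORDER BY caller.name""",
        (name,),
    )
    return [r[0] for r in rows]

def reachable_from(name):
    """'What does Y affect?': transitive outbound traversal via a recursive CTE."""
    rows = conn.execute(
        """WITH RECURSIVE reach(id) AS (
               SELECT e.dst FROM edges e JOIN nodes n ON n.id = e.src
               WHERE n.name = ? AND e.kind = 'CALLS'
               UNION  -- UNION (not UNION ALL) deduplicates, so cycles terminate
               SELECT e.dst FROM edges e JOIN reach r ON e.src = r.id
               WHERE e.kind = 'CALLS'
           )
           SELECT n.name FROM reach JOIN nodes n ON n.id = reach.id
           ORDER BY n.name""",
        (name,),
    )
    return [r[0] for r in rows]

print(callers_of("ProcessOrder"))      # ['handleRequest', 'retryWorker']
print(reachable_from("ProcessOrder"))  # ['ChargeCard', 'LogPayment']
```

Neither query reads a single line of source code; once the edges exist, both questions are plain lookups.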

How it works

You: "what calls ProcessOrder?"

Agent calls: trace_path(function_name="ProcessOrder", direction="inbound")

codebase-memory-mcp: executes graph query in <1ms, returns structured results

Agent: presents the call chain in plain English
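Under the hood, MCP is JSON-RPC 2.0, and invoking a server tool is a tools/call request. Here is a minimal sketch of the message an agent would send for the trace_path step above, assuming the stdio transport (one JSON message per line). The tool name and its two arguments come from the example; the helper name is hypothetical, and trace_path's full parameter set shouldn't be inferred from this.

```python
import json

def tools_call(request_id, tool, arguments):
    """Build an MCP 'tools/call' request as a single JSON-RPC 2.0 line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # stdio transport is newline-delimited, so the JSON itself must not contain newlines.
    return json.dumps(message, separators=(",", ":")) + "\n"

line = tools_call(1, "trace_path", {"function_name": "ProcessOrder", "direction": "inbound"})
print(line, end="")
```

A real client first completes MCP's initialize handshake before sending this; agents like Claude Code handle all of that for you.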

The deliberate design choice: no built-in LLM. Some other code graph tools embed one for natural-language-to-query translation, which means extra API keys, extra cost, and another model to configure. With MCP, the agent you're already talking to is the query translator. codebase-memory-mcp is a pure structural analysis backend; the intelligence layer is whatever agent you point at it.

What's inside the binary:

158 vendored tree-sitter grammars compiled in — no installs, nothing that breaks on system updates
Hybrid LSP semantic type resolution for 11 languages — a lightweight C reimplementation of major language-server type-resolution algorithms (compatible with tsserver, pyright, gopls, Roslyn, JDT, rust-analyzer)
Bundled Nomic nomic-embed-code embeddings (40K tokens, 768d int8) for semantic search — no API key, no Ollama, no Docker
In-memory SQLite with FTS5 full-text search and cbm_camel_split tokenizer (camelCase / snake_case aware)
Aho-Corasick fused pattern matching for the indexing pipeline
LZ4 compression for RAM-resident graph storage; memory released back to the OS after indexing
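The cbm_camel_split tokenizer is described only as camelCase / snake_case aware. As a rough illustration of why that matters for FTS5, here is a hypothetical Python splitter (not the project's C implementation) that turns an identifier into the sub-words a search for "order" or "http" should hit:

```python
import re

# Runs of capitals not followed by a lowercase letter ("HTTP" in getHTTPResponse),
# capitalized or lowercase words ("Response", "get"), or digit runs.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

def split_identifier(name):
    """Split camelCase, PascalCase, snake_case and SCREAMING_CASE into lowercase tokens."""
    tokens = []
    for part in re.split(r"[_\-.\s]+", name):
        tokens.extend(m.group(0).lower() for m in _WORD.finditer(part))
    return tokens

for ident in ["ProcessOrder", "getHTTPResponse", "user_created_at", "EVENTS.USER_CREATED"]:
    print(ident, "->", split_identifier(ident))
```

Indexing these tokens alongside the raw name is what lets a full-text search for "order" match ProcessOrder. Whether cbm_camel_split treats acronyms and digits exactly this way is an assumption.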

Quick start (60 seconds)

One-line install (macOS / Linux):

curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash

With the optional 3D graph visualization UI:

curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash -s -- --ui

Windows (PowerShell):

Invoke-WebRequest -Uri https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.ps1 -OutFile install.ps1
.\install.ps1

Restart your coding agent. Say "Index this project" — done. The install command auto-detects every coding agent on the box and writes the MCP server entry, instruction file, and any pre-tool hooks each one needs. On macOS it also handles quarantine attributes and ad-hoc codesigning automatically.

The 14 MCP tools (what your agent gets)

A condensed map of what shows up when your agent connects:

Tool	Purpose
`index_repository`	Build or refresh the graph for a path
`search_graph`	Structural search: regex names, label filters, degree bounds
`search_code`	Graph-augmented grep over indexed files only
`semantic_query`	Vector search via bundled Nomic embeddings (11-signal scoring)
`trace_path`	Inbound or outbound traversal from any node
`get_architecture`	Languages, packages, entry points, routes, hotspots, clusters in one call
`detect_changes`	Git diff → affected symbols + risk classification
`manage_adr`	Persist Architecture Decision Records across sessions
`cypher_query`	Cypher-like graph queries (`MATCH (f:Function)-[:CALLS]->(g)...`)
`dead_code`	Functions with zero callers, excluding entry points

Plus four more for cross-service linking and graph maintenance. The full list is in the README.
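detect_changes maps a git diff to affected symbols. The project's actual algorithm and its risk classification aren't described here, so what follows is a hypothetical sketch of the core step only: read the new-side line ranges from unified-diff hunk headers, then report every symbol whose (hypothetical) file and line span overlaps one of them.

```python
import re

# New-side part of a unified-diff hunk header: "@@ -a,b +c,d @@" (",d" is optional).
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

def changed_ranges(diff_text):
    """Map each changed file (new-side path) to a list of (first, last) line ranges."""
    ranges, current, prev = {}, None, ""
    for line in diff_text.splitlines():
        if line.startswith("+++ ") and prev.startswith("--- "):
            path = line[4:]
            # "+++ /dev/null" means the file was deleted; nothing to map on the new side.
            current = path[2:] if path.startswith("b/") else None
        elif current is not None:
            m = HUNK.match(line)
            if m:
                start = int(m.group(1))
                count = int(m.group(2)) if m.group(2) is not None else 1
                # A pure deletion has count 0; approximate it by the line it follows.
                end = start + count - 1 if count else start
                ranges.setdefault(current, []).append((start, end))
        prev = line
    return ranges

def affected_symbols(ranges, symbols):
    """symbols: iterable of (name, file, first_line, last_line) taken from the graph."""
    hits = []
    for name, path, first, last in symbols:
        if any(first <= end and start <= last for start, end in ranges.get(path, [])):
            hits.append(name)
    return hits

diff = """\
diff --git a/app/orders.py b/app/orders.py
--- a/app/orders.py
+++ b/app/orders.py
@@ -10,3 +10,4 @@ def process_order(order):
"""
symbols = [
    ("process_order", "app/orders.py", 8, 20),
    ("refund", "app/orders.py", 30, 40),
]
print(affected_symbols(changed_ranges(diff), symbols))  # ['process_order']
```

From there, an inbound traversal from each hit gives the blast radius.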

Real code: a Cypher-like query against your own codebase

Once installed and indexed, your agent can do this directly (and so can you, via CLI):

codebase-memory-mcp cli cypher_query '{
  "query": "MATCH (f:Function)-[:CALLS]->(g:Function) WHERE f.name = '\''handleRequest'\'' RETURN g.name, g.file_path"
}'

Or a regex name search across 158 languages:

codebase-memory-mcp cli search_graph '{"name_pattern": ".*Handler.*", "label": "Function"}'

For agents, the same calls go through MCP and return structured JSON the agent stitches into a natural-language answer.
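The '\'' sequences in the Cypher example are only shell quoting. If you script the CLI from Python instead, passing an argument list to subprocess sidesteps quoting entirely. A minimal sketch: the codebase-memory-mcp cli <tool> <json> invocation shape is taken from the examples above, the helper names are hypothetical, and nothing is assumed about the output beyond it being text on stdout.

```python
import json
import shutil
import subprocess

BINARY = "codebase-memory-mcp"

def cli_args(tool, params):
    """Build argv for 'codebase-memory-mcp cli <tool> <json>' with no shell quoting."""
    return [BINARY, "cli", tool, json.dumps(params)]

def run_tool(tool, params):
    """Run one CLI tool call and return its raw stdout (format not assumed here)."""
    if shutil.which(BINARY) is None:
        raise FileNotFoundError(f"{BINARY} is not on PATH")
    result = subprocess.run(cli_args(tool, params), capture_output=True, text=True, check=True)
    return result.stdout

query = {
    "query": "MATCH (f:Function)-[:CALLS]->(g:Function) "
             "WHERE f.name = 'handleRequest' RETURN g.name, g.file_path"
}
print(cli_args("cypher_query", query))
```

With the binary installed, run_tool("search_graph", {"name_pattern": ".*Handler.*", "label": "Function"}) mirrors the second shell example.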

Cross-service intelligence (the part that surprised me)

Most code-graph tools stop at the single-repo level. codebase-memory-mcp goes further:

HTTP routes ↔ call sites — links REST handlers to the code that calls them, with a confidence score.
gRPC, GraphQL, tRPC service detection, including protobuf Route node extraction.
Channel detection — EMITS / LISTENS_ON edges for Socket.IO, EventEmitter, and generic pub-sub across 8 languages, with constant resolution so EVENTS.USER_CREATED is matched correctly.
Cross-repo edges (CROSS_*) — index multiple repos under the same store and the graph stitches them together. The optional 3D UI variant has a multi-galaxy layout for this.
Infrastructure-as-code — Dockerfiles, Kubernetes manifests, and Kustomize overlays are first-class graph nodes with Resource and Module types and IMPORTS edges.

If you operate a service mesh and your agent has access to the meshed repos, this turns "what services consume the new auth header?" into a single graph query instead of a half-day grep session across 12 repos.
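How codebase-memory-mcp scores route-to-call-site links isn't spelled out here. As a toy illustration of the idea only, the sketch below turns route templates like /users/{id} into regexes and matches them against URL paths found at call sites. The confidence values (1.0 when the HTTP method also matches, 0.5 on path alone), the handler names, and the call-site labels are invented for the example.

```python
import re

def route_regex(template):
    """Compile '/users/{id}' into a regex where each {param} matches one path segment."""
    parts = re.split(r"(\{[^/{}]+\})", template)
    return re.compile("".join("[^/]+" if p.startswith("{") else re.escape(p) for p in parts))

def link_calls(routes, calls):
    """routes: (method, template, handler); calls: (method, path, call_site).
    Returns (call_site, handler, confidence) for every path match."""
    compiled = [(m, route_regex(t), h) for m, t, h in routes]
    links = []
    for method, path, site in calls:
        path = path.split("?", 1)[0]  # ignore query strings
        for r_method, rx, handler in compiled:
            if rx.fullmatch(path):
                confidence = 1.0 if method.upper() == r_method.upper() else 0.5
                links.append((site, handler, confidence))
    return links

routes = [
    ("GET", "/users/{id}", "users.get_user"),
    ("POST", "/users", "users.create_user"),
]
calls = [
    ("GET", "/users/42", "web/profile.ts:18"),
    ("POST", "/users/42", "scripts/migrate.py:7"),
    ("POST", "/users?invite=1", "web/signup.ts:33"),
]
for link in link_calls(routes, calls):
    print(link)
```

The method mismatch on the second call still produces a link, just a weaker one, which is the kind of result a confidence score lets the agent weigh.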

Performance (M3 Pro)

Operation	Time	Notes
Linux kernel full index	3 min	28M LOC, 75K files → 4.81M nodes, 7.72M edges
Linux kernel fast index	1m 12s	1.88M nodes
Django full index	~6s	49K nodes, 196K edges
Cypher query	<1ms	Relationship traversal
Name search (regex)	<10ms	SQL LIKE pre-filtering
Dead code detection	~150ms	Full graph scan + degree filtering
Trace call path (depth=5)	<10ms	BFS traversal
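The Notes column names two classic graph operations: a depth-bounded breadth-first search for call-path tracing and an in-degree filter for dead code. Here is a minimal in-memory sketch of both in plain Python; the adjacency-dict representation, function names, and entry-point set are assumptions for illustration, whereas the real tool runs these against its SQLite-backed graph.

```python
from collections import deque

def reverse(graph):
    """Invert a caller -> callees adjacency dict into callee -> callers."""
    rev = {}
    for src, dsts in graph.items():
        rev.setdefault(src, set())
        for dst in dsts:
            rev.setdefault(dst, set()).add(src)
    return rev

def trace(graph, start, direction="outbound", max_depth=5):
    """BFS from start; returns {node: depth} for nodes within max_depth hops."""
    if direction not in ("inbound", "outbound"):
        raise ValueError("direction must be 'inbound' or 'outbound'")
    adj = graph if direction == "outbound" else reverse(graph)
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # depth bound reached: don't expand further
        for nxt in adj.get(node, ()):
            if nxt not in depths:
                depths[nxt] = depths[node] + 1
                queue.append(nxt)
    del depths[start]
    return depths

def dead_code(graph, entry_points):
    """Functions with zero callers, excluding known entry points."""
    called = {dst for dsts in graph.values() for dst in dsts}
    nodes = set(graph) | called
    return sorted(nodes - called - set(entry_points))

calls = {
    "main": {"handleRequest"},
    "handleRequest": {"ProcessOrder"},
    "ProcessOrder": {"ChargeCard"},
    "ChargeCard": set(),
    "legacyExport": {"ChargeCard"},
}
print(trace(calls, "ProcessOrder", direction="inbound"))  # {'handleRequest': 1, 'main': 2}
print(dead_code(calls, entry_points={"main"}))            # ['legacyExport']
```

The entry-point exclusion matters: without it, main would be reported as dead code because nothing calls it.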

The RAM-first pipeline is unusual: LZ4-compressed reads, in-memory SQLite, single dump at the end, and the process releases memory back to the OS after indexing completes. Persistent storage lives in ~/.cache/codebase-memory-mcp/ and a background watcher does git-aware incremental re-indexing when files change.
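The watcher's internals aren't documented in this post. One plausible way to make incremental re-indexing "git-aware" is to key each file by its git blob ID (SHA-1 over a "blob <size>\0" header plus the file bytes, in git's default SHA-1 object format) and re-parse only files whose ID changed. The sketch below is that assumption written out with hypothetical helper names, not the project's code.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Same ID 'git hash-object' computes for a file's contents (SHA-1 object format)."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()

def plan_reindex(previous: dict, current: dict):
    """Compare {path: blob_id} snapshots; return which files to parse or drop."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    modified = sorted(p for p in current.keys() & previous.keys() if current[p] != previous[p])
    return {"parse": added + modified, "drop": removed}

before = {"a.py": git_blob_id(b""), "b.py": git_blob_id(b"x = 1\n")}
after = {"a.py": git_blob_id(b""), "b.py": git_blob_id(b"x = 2\n"), "c.py": git_blob_id(b"")}
print(plan_reindex(before, after))  # {'parse': ['c.py', 'b.py'], 'drop': []}
```

Reusing git's own object IDs means unchanged files can be recognized straight from the index or a checkout without re-parsing them.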

Team-shared graph artifact (skip the reindex)

A clever bit of operational design: .codebase-memory/graph.db.zst is an optional, zstd-compressed snapshot of the knowledge graph that you can commit to your repo. When a teammate clones and runs codebase-memory-mcp for the first time, the artifact is decompressed and only the local diff is incrementally indexed — no full reindex.

Format: SQLite with indexes stripped, VACUUM INTO compacted, zstd 1.5.7 compressed (8–13:1 typical)
Two tiers: best (zstd -9) on explicit index_repository, fast (zstd -3) by the watcher
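No merge pain: on first export, a .gitattributes entry with merge=ours is written for the artifact automatically. Git has no built-in merge driver named ours, though, so confirm each clone has one defined (for example with git config merge.ours.driver true), or merges of the artifact won't use it.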
Opt-in: if you'd rather have everyone reindex from scratch, add .codebase-memory/ to .gitignore
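The export steps listed above (strip indexes, compact with VACUUM INTO, compress) map onto a few lines of Python's sqlite3. This sketch substitutes the standard library's gzip for zstd, since zstd isn't in the stdlib on most Python versions, and uses gzip levels 9 and 3 to stand in for the best/fast tiers. Function and file names are hypothetical, and rebuilding the stripped indexes after import is left out.

```python
import gzip
import os
import sqlite3
import tempfile

def export_snapshot(conn, out_path, level=9):
    """Write a compacted, index-free, compressed copy of conn's main database."""
    conn.commit()  # VACUUM INTO cannot run inside an open transaction
    with tempfile.TemporaryDirectory() as tmp:
        copy_path = os.path.join(tmp, "graph.db")
        conn.execute("VACUUM INTO ?", (copy_path,))
        copy = sqlite3.connect(copy_path, isolation_level=None)  # autocommit mode
        try:
            # Explicit indexes only; automatic ones (sql IS NULL) can't be dropped.
            names = [r[0] for r in copy.execute(
                "SELECT name FROM sqlite_master WHERE type = 'index' AND sql IS NOT NULL")]
            for name in names:
                copy.execute('DROP INDEX "{}"'.format(name.replace('"', '""')))
            copy.execute("VACUUM")  # reclaim the pages the indexes used
        finally:
            copy.close()
        with open(copy_path, "rb") as src:
            data = src.read()
    with open(out_path, "wb") as dst:
        dst.write(gzip.compress(data, compresslevel=level))

def import_snapshot(snapshot_path, db_path):
    """Decompress a snapshot to db_path, verify it, and return an open connection."""
    with open(snapshot_path, "rb") as src:
        data = gzip.decompress(src.read())
    with open(db_path, "wb") as dst:
        dst.write(data)
    conn = sqlite3.connect(db_path)
    status = conn.execute("PRAGMA integrity_check").fetchone()[0]
    if status != "ok":
        conn.close()
        raise ValueError(f"corrupt snapshot: {status}")
    return conn
```

Stripping indexes before compressing trades a rebuild step on import for a smaller committed file; PRAGMA integrity_check is one simple way to reject a damaged snapshot before anything queries it.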

This is similar in spirit to graphify's graphify-out/ directory but as a single compressed file with explicit two-tier export and integrity-checked import.

Community reactions

The reception is unusually strong for a tool that does one thing well:

GitHub Trending #3 in the AI/dev category this week with 4,212 stars in seven days on top of a base of ~4.5K.
5,604 tests passing, SLSA Level 3 build provenance, OpenSSF Scorecard badged, and every release scanned by 70+ antivirus engines via VirusTotal — unusually serious supply-chain hygiene for a 4-week-old viral project.
The accompanying arXiv preprint with reproducible benchmarks lends real credibility — most "code graph for AI" projects make claims; this one publishes the methodology.
Hacker News and Reddit r/LocalLLaMA discussions have focused on the pure-C, zero-dependency angle as the differentiator vs. graphify (TypeScript/Node) and similar tooling. Single static binary + 158 vendored grammars is genuinely operationally easier.

The skeptical takes are worth holding in mind too: the arXiv paper's 83% answer quality vs. 92% for file exploration is a real 9-point drop. For exploratory questions where the agent needs to read prose comments or inline docstrings, raw file access still wins on quality. The right mental model is "graph queries for structural questions, file reads for narrative questions" — and the project actively encourages this split.

Honest limitations

No built-in LLM, by design. You need an MCP-compatible agent. If your stack doesn't speak MCP, this isn't for you (yet).
9-point answer-quality drop vs. file exploration on the arXiv benchmark. Token savings buy you 99% off the bill, not 99% off the work.
Hybrid LSP covers 11 languages, not all 158. The other 147 languages get tree-sitter AST parsing only, which is excellent for structure but weaker on type resolution.
Windows SmartScreen will warn on the unsigned binary the first time you run it — expected, mitigated by published SHA-256 checksums and VirusTotal scans.
Graph UI is a separate binary variant. If you want the 3D visualization at localhost:9749, you need the -ui- archive, not the standard one.
Indexing is RAM-hungry mid-run. On a 28M-LOC monorepo you'll want headroom (no pun intended) even though memory is released after the indexing pass completes.

When to use codebase-memory-mcp, when to skip

Great fit if you…

Use Claude Code, Codex, Cursor, or any MCP-compatible agent on a non-trivial codebase.
Pay real money for tokens on structural questions ("what calls X?", "what does Y affect?", "where is route Z defined?").
Operate multiple repos or a service mesh and want cross-repo edges.
Want a single binary you can install and forget — no Docker, no API keys, no runtime.

Skip it if you…

Work entirely in a single small repo where grep-and-read is already cheap.
Use an agent stack that doesn't speak MCP.
Need 92%+ answer quality on long, narrative-style questions where reading inline comments matters more than structure.

FAQ

Q: How does this compare to graphify, Understand-Anything, or other "code knowledge graph" tools?
A: Same problem, different operational profile. Most alternatives are Node/TypeScript with npm install chains; codebase-memory-mcp is pure C as a single static binary with 158 grammars and Hybrid LSP for 11 languages compiled in. The cross-service HTTP/gRPC/GraphQL linking and the IaC indexing (Dockerfiles, K8s, Kustomize) are also broader than what most competitors ship. The arXiv preprint makes the benchmarks reproducible.

Q: Does my code leave my machine?
A: No. All processing happens locally. The bundled Nomic embedding model is compiled into the binary; SQLite storage lives in ~/.cache/codebase-memory-mcp/. The only outbound traffic is an optional startup update check, which can be disabled.

Q: How big is the index?
A: It depends on graph density, but typical mid-size repos compress to a few MB in the .codebase-memory/graph.db.zst artifact. The Linux kernel produces 4.81M nodes and 7.72M edges — large, but still queryable in milliseconds because SQLite is in-memory during operation.

Q: Does it work with self-hosted models like Llama via Ollama?
A: Yes — through whichever MCP-compatible agent you use to drive it. The MCP server is model-agnostic; it just answers graph queries. Claude Code, Codex, Cursor, OpenCode, and others all work, and several of them support routing to local models.

Q: Is the team-shared graph.db.zst artifact safe to commit?
A: Yes, if you want to. It's an opaque SQLite snapshot, and the auto-written .gitattributes line uses merge=ours so concurrent edits don't produce binary-merge conflicts. The savings — teammates skipping a full reindex on first run — are usually worth the few MB.

Q: What if my project is a polyglot monorepo?
A: That's the sweet spot. Multi-language manifest resolution (package.json, go.mod, Cargo.toml, pyproject.toml, composer.json, pubspec.yaml, pom.xml, build.gradle, mix.exs, *.gemspec) is built in, and CROSS_* edges link nodes across the indexed fleet. Cross-service HTTP_CALLS and EMITS / LISTENS_ON edges connect services that talk over HTTP, gRPC, GraphQL, tRPC, or pub-sub channels.

Try it today

curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash

Restart Claude Code (or any agent in the supported list), and tell it: "Index this project." Then ask the question that usually triggers a 50-file grep tour — "what calls our auth middleware?", "what's affected by changing this DB schema?" — and watch the agent answer from a single trace_path call instead.

Repo: github.com/DeusData/codebase-memory-mcp · Paper: arXiv:2603.27277 · License: MIT