Siddharth Pandey

Posted on Jun 13

Why Infrawise Uses Deterministic Analysis Instead of an LLM

#aws #ai #opensource #typescript

Ask your AI coding assistant which Global Secondary Indexes exist on your Orders table. It will read your repository, find a few QueryCommand calls, and answer — fluent, specific, and confident. It also has no way to know. GSI definitions live in AWS, not in your source files. The model isn't lying; the fact simply isn't available to it, so it generates the most statistically plausible substitute and delivers it in the same tone it uses for things it actually knows.

That failure mode is why Infrawise (available on npm), an MCP server that gives AI coding assistants infrastructure context, contains no LLM calls at all. Every answer it serves comes from AST parsing, schema introspection, rule-based analyzers, and graph correlation. The LLM is only ever a consumer of that context, never a producer of it. This post is about why that boundary exists, and what it looks like in code.

Infrastructure questions are lookups, not generation

There are two kinds of questions you can ask a tool. "How should I model sessions in DynamoDB?" is a judgment question — many defensible answers, context matters, an LLM is genuinely useful. "Does the Sessions table have a GSI on userId?" is a fact question. It has exactly one correct answer, and that answer is sitting in a DescribeTable response.

When you route a fact question through a generative model, you convert a lookup with a perfectly accurate source into a prediction with an unknown error rate. The motivating examples in the Infrawise README are all of this shape: an assistant suggesting a .scan() on an Orders table with 50 million rows, recommending a GSI on status that already exists, or not noticing that five functions are already hammering the same partition key. None of these are reasoning failures. They are missing-fact failures, and no amount of model quality fixes them — a better model just produces a more convincing wrong answer.

So Infrawise draws a hard line: facts get extracted deterministically, and the model receives them through MCP tool calls instead of guessing.

What deterministic extraction looks like

Infrawise builds its picture of your system from three sources, none of which involve a model.

Your code, through the compiler's eyes. scanRepository() in src/context/index.ts loads the repo with ts-morph — using your own tsconfig.json when one exists — and walks every CallExpression node in every source file. It doesn't regex for the word "scan". It matches call structure against known client patterns: a DYNAMO_OPERATIONS set covering both SDK v2 method names (query, scan, getItem) and SDK v3 command classes (ScanCommand, QueryCommand, PutItemCommand), query/execute/exec calls on PostgreSQL and MySQL clients, and MongoDB collection methods — where find and aggregate are classified as scan-type operations and the rest as queries. The output is a list of extracted operations: this function performs this operation type against this table.
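To make "matching call structure" concrete, here is a minimal sketch of the classification step only. It is not Infrawise's actual code: the DYNAMO_OPERATIONS name comes from the description above, but the set's exact contents, the classifyDynamoCall and toOperation helpers, and the ExtractedOperation shape are assumptions made for illustration. The input is the callee name that an AST walk would already have pulled out of a call node.

```typescript
// Illustrative sketch. Only the DYNAMO_OPERATIONS name is taken from the post;
// the contents and the helpers below are assumptions.
const DYNAMO_OPERATIONS = new Set<string>([
  "query", "scan", "getItem",                      // SDK v2 method names
  "QueryCommand", "ScanCommand", "PutItemCommand", // SDK v3 command classes
]);

type OperationType = "scan" | "query" | "other";

interface ExtractedOperation {
  functionName: string; // enclosing function that makes the call
  operation: OperationType;
  table: string;
}

// Exact-name lookup: "scanRepository" is not a scan just because it
// contains the word "scan".
function classifyDynamoCall(calleeName: string): OperationType | undefined {
  if (!DYNAMO_OPERATIONS.has(calleeName)) return undefined;
  const base = calleeName.replace(/Command$/, "").toLowerCase();
  if (base === "scan") return "scan";
  if (base === "query") return "query";
  return "other";
}

function toOperation(
  functionName: string,
  calleeName: string,
  table: string,
): ExtractedOperation | undefined {
  const operation = classifyDynamoCall(calleeName);
  return operation ? { functionName, operation, table } : undefined;
}

// "This function performs this operation type against this table."
const extracted = toOperation("listAllOrders", "ScanCommand", "Orders");
```

The point of the exact-name set is that an unknown callee yields no operation at all, instead of a guess.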

Your databases, through their own catalogs. The PostgreSQL adapter doesn't ask a model to summarize your schema. It runs the same introspection queries you would run by hand — information_schema.tables for tables, information_schema.columns for columns, pg_indexes for indexes, and the constraint tables for keys. The docs recommend pointing it at a dedicated read-only user, and the DynamoDB side needs only dynamodb:ListTables and dynamodb:DescribeTable permissions. What comes back isn't a description of your schema; it is your schema.
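As a rough sketch of what catalog introspection looks like, the snippet below runs three queries against the standard PostgreSQL catalogs named above. The catalog and column names are PostgreSQL's own. Everything else is an assumption for illustration: the Queryable interface is a hypothetical minimal client shape (a real driver's query method is similar but not identical), and introspectSchema is not Infrawise's adapter.

```typescript
// Hypothetical minimal client shape, so the sketch needs no database driver.
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<{ rows: Record<string, unknown>[] }>;
}

// The same queries you could run by hand in psql.
const INTROSPECTION_QUERIES = {
  tables: `SELECT table_name
             FROM information_schema.tables
            WHERE table_schema = $1 AND table_type = 'BASE TABLE'`,
  columns: `SELECT table_name, column_name, data_type, is_nullable
              FROM information_schema.columns
             WHERE table_schema = $1`,
  indexes: `SELECT tablename, indexname, indexdef
              FROM pg_indexes
             WHERE schemaname = $1`,
};

// Read-only: every statement is a SELECT against a catalog view.
async function introspectSchema(client: Queryable, schema: string = "public") {
  const tables = await client.query(INTROSPECTION_QUERIES.tables, [schema]);
  const columns = await client.query(INTROSPECTION_QUERIES.columns, [schema]);
  const indexes = await client.query(INTROSPECTION_QUERIES.indexes, [schema]);
  return { tables: tables.rows, columns: columns.rows, indexes: indexes.rows };
}
```

Because the function only issues SELECT statements on catalog views, it works with the dedicated read-only user the docs recommend.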

Correlation, through a graph. Both streams land in a SystemGraph: typed nodes for tables, functions, indexes, queues, topics, lambdas, buckets, secrets, parameters, and log groups, connected by typed edges like query, scan, and uses_index. The graph is what turns two boring fact lists into something an analyzer can interrogate — not just "this table exists" and "this function scans something," but "listAllOrders() scans the Orders table, and no index covers that access."
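A stripped-down version of that idea fits in a few lines. This is a sketch under stated assumptions, not the real SystemGraph: the node and edge kind names follow the list above, but the id format, the method names (addNode, addEdge, edgesInto, getNode), and the edge direction (caller to target) are all hypothetical.

```typescript
// Illustrative only: method names, id format, and edge direction are assumptions.
type NodeKind =
  | "table" | "function" | "index" | "queue" | "topic"
  | "lambda" | "bucket" | "secret" | "parameter" | "log_group";
type EdgeKind = "query" | "scan" | "uses_index";

interface GraphNode { id: string; kind: NodeKind; name: string; }
interface GraphEdge { from: string; to: string; kind: EdgeKind; }

class SystemGraph {
  private nodes = new Map<string, GraphNode>();
  private edges: GraphEdge[] = [];

  addNode(node: GraphNode): void {
    this.nodes.set(node.id, node);
  }

  // Refuse dangling edges so every edge correlates two known facts.
  addEdge(edge: GraphEdge): void {
    if (!this.nodes.has(edge.from) || !this.nodes.has(edge.to)) {
      throw new Error(`edge references unknown node: ${edge.from} -> ${edge.to}`);
    }
    this.edges.push(edge);
  }

  getNode(id: string): GraphNode | undefined {
    return this.nodes.get(id);
  }

  edgesInto(targetId: string, kind?: EdgeKind): GraphEdge[] {
    return this.edges.filter(
      (e) => e.to === targetId && (kind === undefined || e.kind === kind),
    );
  }
}

const graph = new SystemGraph();
// From schema introspection:
graph.addNode({ id: "table:Orders", kind: "table", name: "Orders" });
// From the AST walk:
graph.addNode({ id: "fn:listAllOrders", kind: "function", name: "listAllOrders" });
graph.addEdge({ from: "fn:listAllOrders", to: "table:Orders", kind: "scan" });

// "Which functions scan the Orders table?"
const scanners = graph
  .edgesInto("table:Orders", "scan")
  .map((e) => graph.getNode(e.from)?.name);
```

Neither source could answer that last question alone: the catalog knows the table exists, the AST knows a function scans something, and only the edge joins them.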

Rules, not vibes

The analysis layer is where most tools would reach for a model — and where Infrawise stays deterministic. The analyzer index exports 27 rule classes covering DynamoDB, PostgreSQL, MySQL, MongoDB, SQS, S3, Lambda, RDS, secrets, log retention, and Terraform drift. Each one is an ordinary class with an analyze(graph) method that walks the graph and emits findings.

FullTableScanAnalyzer follows scan-type edges to DynamoDB table nodes and emits a high-severity finding naming the table and every calling function. MissingGSIAnalyzer flags tables that receive query edges but have no uses_index edge — medium severity, because it might be intentional. HotPartitionAnalyzer fires when a table is accessed by five or more distinct code paths (the threshold is a constructor parameter, defaulting to 5).
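Here is roughly what two of those rules could look like. Treat it as a sketch, not the project's source: the class names and the default threshold of 5 come from the description above, while the Graph and Finding shapes, the rule ids, and the severity assigned to the hot-partition finding are assumptions for the example.

```typescript
// Sketch only. Graph/Finding shapes, rule ids, and the hot-partition
// severity are assumptions; class names and the default of 5 are from the post.
type Severity = "high" | "medium" | "low" | "verify";
interface Edge { from: string; to: string; kind: "query" | "scan" | "uses_index"; }
interface Graph { tables: string[]; edges: Edge[]; }
interface Finding { rule: string; severity: Severity; table: string; functions: string[]; }

class FullTableScanAnalyzer {
  analyze(graph: Graph): Finding[] {
    const findings: Finding[] = [];
    for (const table of graph.tables) {
      const callers = graph.edges
        .filter((e) => e.kind === "scan" && e.to === table)
        .map((e) => e.from);
      if (callers.length > 0) {
        findings.push({
          rule: "full-table-scan",
          severity: "high",
          table,
          functions: Array.from(new Set(callers)), // every calling function, once
        });
      }
    }
    return findings;
  }
}

class HotPartitionAnalyzer {
  private readonly threshold: number;
  constructor(threshold: number = 5) {
    this.threshold = threshold;
  }

  analyze(graph: Graph): Finding[] {
    const findings: Finding[] = [];
    for (const table of graph.tables) {
      const callers = new Set(graph.edges.filter((e) => e.to === table).map((e) => e.from));
      if (callers.size >= this.threshold) {
        findings.push({
          rule: "hot-partition",
          severity: "medium", // assumed severity, not stated in the post
          table,
          functions: Array.from(callers),
        });
      }
    }
    return findings;
  }
}

// Fixture: one scanner plus four query paths, five distinct callers in total.
const fixture: Graph = {
  tables: ["Orders"],
  edges: [
    { from: "listAllOrders", to: "Orders", kind: "scan" },
    { from: "getOrder", to: "Orders", kind: "query" },
    { from: "updateOrder", to: "Orders", kind: "query" },
    { from: "cancelOrder", to: "Orders", kind: "query" },
    { from: "shipOrder", to: "Orders", kind: "query" },
  ],
};

const scanFindings = new FullTableScanAnalyzer().analyze(fixture);
const hotFindings = new HotPartitionAnalyzer().analyze(fixture);
const relaxedFindings = new HotPartitionAnalyzer(6).analyze(fixture);
```

With this fixture the scan rule reports listAllOrders, the default hot-partition rule fires at exactly five callers, and raising the threshold to 6 silences it.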

Two properties fall out of this design that a model can't give you:

Findings are testable. Every analyzer is a pure function of the graph. Feed it a fixture, assert on the output, done. There's no eval harness, no sampling temperature, no "run it three times and hope." If FullTableScanAnalyzer regresses, a unit test catches it.

Failures are contained and honest. runAllAnalyzers() wraps each analyzer in its own try/catch — one analyzer crashing logs a warning while the rest keep running. The combined findings are then sorted by a fixed severity order: high, medium, low, and notably verify — a severity that exists precisely so a deterministic system can say "I detected a pattern but can't confirm the intent" instead of bluffing. An LLM has no equivalent of verify; everything it says arrives with the same confident fluency.
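The containment pattern is small enough to sketch in full. The runAllAnalyzers name and the severity order are taken from the description above; the Analyzer interface, the warn callback, and the SEVERITY_ORDER table are assumptions for this example, not the project's actual signatures.

```typescript
// Sketch of per-analyzer isolation plus a fixed severity sort.
// Interface and parameter names are illustrative assumptions.
type Severity = "high" | "medium" | "low" | "verify";
interface Finding { rule: string; severity: Severity; message: string; }
interface Analyzer<G> { name: string; analyze(graph: G): Finding[]; }

const SEVERITY_ORDER: Record<Severity, number> = { high: 0, medium: 1, low: 2, verify: 3 };

function runAllAnalyzers<G>(
  graph: G,
  analyzers: Analyzer<G>[],
  warn: (message: string) => void = () => {},
): Finding[] {
  const findings: Finding[] = [];
  for (const analyzer of analyzers) {
    try {
      findings.push(...analyzer.analyze(graph));
    } catch (err) {
      // One broken rule is reported and skipped; the rest still run.
      const reason = err instanceof Error ? err.message : String(err);
      warn(`analyzer ${analyzer.name} failed: ${reason}`);
    }
  }
  return findings.sort((a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]);
}

const warnings: string[] = [];
const results = runAllAnalyzers(
  {},
  [
    { name: "unsure", analyze: () => [{ rule: "pattern", severity: "verify", message: "intent unclear" }] },
    { name: "broken", analyze: () => { throw new Error("boom"); } },
    { name: "scan", analyze: () => [{ rule: "full-table-scan", severity: "high", message: "Orders is scanned" }] },
  ],
  (message) => warnings.push(message),
);
```

In this run the broken analyzer produces one warning, and the two surviving findings come back with the high-severity one first and the verify one last.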

The LLM is the consumer, not the analyst

None of this means LLMs are useless here. It means they belong at a specific layer. Infrawise exposes the graph and findings through 15 MCP tools: get_infra_overview for a quick snapshot, analyze_function to trace a single function's tables, queues, secrets, and trigger event shapes, suggest_gsi to generate a ready-to-use GSI definition for a table and attribute, postgres_index_suggestions for index advice, and so on. The assistant decides when to ask and what to do with the answer. It never produces the answer.
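On the wire, "the assistant decides when to ask" means the client sends an MCP tools/call request, which is a JSON-RPC 2.0 message. The sketch below builds one for suggest_gsi. The envelope follows the MCP specification; the argument names (table, attribute) are hypothetical, since the post does not list the tool's input schema, and buildToolCall is a helper invented for this example.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool call.
// The argument names passed below are hypothetical.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The assistant asks a fact question instead of guessing the answer.
const request = buildToolCall(1, "suggest_gsi", { table: "Orders", attribute: "status" });
const wireMessage = JSON.stringify(request);
```

The model chooses the tool name and arguments; everything in the response is computed by the deterministic layer.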

The plumbing is deliberately boring: analysis results are cached as JSON files under .infrawise/cache, and the infrawise stdio process your editor spawns re-runs the analysis when the cache is older than 24 hours. Run infrawise start --claude once and it writes .mcp.json so Claude Code reconnects automatically on every future launch.
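The staleness check behind that 24-hour rule can be as simple as the sketch below. The TTL value is from the post; the CacheEntry shape, the analyzedAt field name, and the isStale helper are assumptions for illustration, not the actual cache format.

```typescript
// Sketch of a 24-hour cache TTL check. Field and function names are assumptions.
const CACHE_TTL_MS = 24 * 60 * 60 * 1000;

interface CacheEntry<T> {
  analyzedAt: string; // ISO 8601 timestamp written when the analysis ran
  data: T;
}

function isStale<T>(
  entry: CacheEntry<T>,
  now: Date = new Date(),
  ttlMs: number = CACHE_TTL_MS,
): boolean {
  const analyzedAtMs = new Date(entry.analyzedAt).getTime();
  // An unreadable timestamp is treated as stale so the analysis re-runs.
  if (Number.isNaN(analyzedAtMs)) return true;
  return now.getTime() - analyzedAtMs > ttlMs;
}

const entry: CacheEntry<string[]> = {
  analyzedAt: "2024-06-13T00:00:00Z",
  data: ["full-table-scan on Orders"],
};

const sixHoursLater = isStale(entry, new Date("2024-06-13T06:00:00Z"));   // false
const nextDayPlusOne = isStale(entry, new Date("2024-06-14T00:00:01Z")); // true
```

Passing the clock in as a parameter keeps the check a pure function, so it can be tested the same way as the analyzers.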

This division of labor generalizes well beyond one project. The model handles intent ("the user wants this query to be cheaper") and synthesis ("given these findings, here's the migration plan"). The deterministic layer handles every claim that has a ground truth. The test is simple: if asking the same question twice should yield the same answer, don't generate the answer — look it up.

If your AI assistant writes code against AWS or a database, give it facts instead of letting it guess. Infrawise is available on GitHub (github.com/Sidd27/infrawise) and on npm.

Key takeaways

A fact question routed through a generative model turns a lookup with a perfect source into a prediction with an unknown error rate. Route facts around the model, not through it.
AST-level extraction (ts-morph walking CallExpression nodes) catches what schema introspection alone can't see — which function scans which table, and how.
Rule-based analyzers are unit-testable and fail loudly per rule; model-based analysis is neither.
A deterministic system can emit a verify severity when it isn't sure. A model can't reliably tell you when it's guessing.
Put the LLM at the boundary: it consumes structured facts over MCP and decides what to do next — it never gets to invent the facts.

Top comments (2)

HARD IN SOFT OUT • Jun 13

This is the clearest articulation of the fact‑vs‑judgment boundary I've seen. The distinction between "how should I model sessions?" (good for LLM) and "does this table have a GSI?" (bad for LLM) is exactly where teams get into trouble. The verify severity is a brilliant touch — admitting uncertainty is more honest than a confident bluff.

Two directions that could extend this:

The "ground truth staleness" problem. You cache analysis for 24 hours, but infrastructure drifts in real time. A table could get a new GSI five minutes after the cache writes. A hybrid approach — deterministic analysis on‑demand for critical checks, with a freshness header — would let the assistant say "my info is 6 hours old, let me refresh." That's more useful than a static TTL.
Cross‑service correlation beyond edges. You have typed connections (query, uses_index), but causal chains like "Lambda A writes to queue → Lambda B processes it → writes to DynamoDB → table X" aren't captured. A simple path‑tracing analyzer that warns about end‑to‑end latency or double‑scanning across a pipeline would catch whole‑system inefficiencies that single‑function analysis misses.

One small improvement: the rule classes are clean, but the threshold for HotPartitionAnalyzer (default 5) might be too low for real systems. A configurable per‑table "expected concurrent paths" would reduce false positives. Also, the verify severity is great — but what's the UI for it in the assistant? Does the tool emit a warning that the human has to resolve, or does it return a structured flag? Clarifying that would help implementers.

And the dark joke (because the LLM confidently inventing a GSI is too real):

Dev: "Does our Orders table have a GSI on customerId?"

AI: "Yes, it's called gsi_customerId. It's defined in CloudFormation."

Dev: "We don't use CloudFormation."

AI: "Then it's in the CDK stack."

Dev: "We don't use CDK either."

AI: "In that case, you should add one. I've drafted the SAM template."

Excellent engineering. This should be required reading for anyone building AI tooling for infrastructure.

Siddharth Pandey • Jun 14 • Edited

This might be the highest signal-to-noise comment the post has gotten, so thank you. I liked all three enough to turn them into tracked enhancements:

Ground-truth staleness -> #55 (github.com/Sidd27/infrawise/issues/55). You nailed the real gap. Refreshing already exists (infrawise analyze, --no-cache, and auto-refresh once the 24h TTL lapses), but the assistant has no idea the data is six hours old, so it never thinks to refresh. The cache entries already carry a timestamp internally, it's just not surfaced through MCP. So the fix is your freshness header: expose analyzedAt + age, and a "stale, go refresh" nudge past the TTL. Deterministic facts are only trustworthy if their freshness is known, which is the whole thesis, so this one stung a little.
Cross-service path tracing -> #56 (github.com/Sidd27/infrawise/issues/56). Agreed, and it's an extension rather than a new subsystem. The graph already has typed edges (publishes_to, query, scan), so an analyzer can walk Lambda A -> queue -> Lambda B -> table X and flag whole-pipeline patterns like a double scan across the chain. Keeping it to static structural correlation, not runtime latency, so it stays on the deterministic side of the fact/judgment line.
HotPartition threshold -> #57 (github.com/Sidd27/infrawise/issues/57). Caught me red-handed. The constructor accepts a threshold and then I cheerfully call new HotPartitionAnalyzer() with nothing, so the "configurable" 5 is configurable in spirit only. A per-table override is the right call. Filed as a good-first-issue, in case you want to be the person who fixes the thing you found.
The verify severity UI. It's a structured flag, not a blocking prompt. It rides on the finding object through the MCP responses and is filterable via --severity verify, so the assistant gets it as data and decides what to do, rather than the tool slamming on a human-in-the-loop brake. Uncertainty made machine-readable, not annoying.

Thanks again for pointing these out, genuinely useful. And the GSI dialogue hurt because I have watched an AI confidently draft the SAM template for infrastructure that does not exist.