TL;DR
- Kong, Portkey, and LiteLLM each hit real walls when AI usage spreads across multiple teams — mostly around cost attribution, environment isolation, and MCP/agent governance
- The choice between them depends more on what you're already running than on feature lists
- TrueFoundry handles the full stack (routing + MCP + model deployment) in one system, which simplifies things but also means more to adopt upfront
Six months ago our team started evaluating AI gateways in earnest. Not for a demo — we had five squads shipping LLM features, three different cloud providers in play, and a security review asking uncomfortable questions about who had called which model, when, and with what prompt.
I spent a few weeks actually running Kong, Portkey, and LiteLLM in staging. This is what I found.
What "AI gateway" means in practice
An AI gateway sits between your application code and your LLM providers. At minimum it handles:
- Routing requests to the right model/provider
- Retrying and failing over when a provider is down
- Rate limiting by user, team, or application
- Tracking token usage and cost
- Some form of access control
Where things get more complicated is when you're running agents. An agent workflow that calls ten MCP tools across a single conversation has different governance requirements than a simple chat endpoint — you need to know which tool was called, by which agent, under which identity, and whether it hit any policy. A routing proxy doesn't cover that. A few of the tools below do.
Kong AI Gateway
Kong has been the default API gateway for Kubernetes-based microservices for years. The AI gateway layer adds LLM-specific plugins on top of that foundation.
The good: If you're already running Kong, adding AI routing through the same control plane is genuinely low-friction. The plugin model (Lua-based, composable) is mature and customizable. Token-level rate limiting via the AI Rate Limiting Advanced plugin is the right abstraction for LLM workloads — limits based on actual token consumption, not HTTP request counts. Kong 3.13+ also added MCP and A2A (agent-to-agent) protocol support.
The friction: The open-source version doesn't include the GUI, advanced analytics, or the advanced AI rate limiting plugin — those are Konnect Enterprise. So if you're not already paying for Kong, you're evaluating two things simultaneously: the gateway and the commercial tier. Teams coming to Kong fresh for AI routing often find the operational overhead hard to justify.
More structurally: Kong treats LLM calls as HTTP requests with some extra metadata. It doesn't have native concepts for prompt lifecycle, agent tracing, or MCP server discovery. You can wire things together with plugins, but you're composing separate pieces.
When it makes sense: You already run Kong. You want one control plane for API and AI traffic. You're not starting from scratch.
Portkey
Portkey is AI-native — built from the start for LLM applications rather than adapted from general API management. That shows in the developer experience.
The good: Setup is genuinely fast. The routing config is readable. The observability dashboard surfaces token cost breakdowns in a way that's actually useful when you're debugging why a particular model is expensive. Prompt versioning with a playground is a real quality-of-life improvement. Semantic caching, retries, and fallbacks work out of the box.
The friction: Portkey's design is application-scoped, which is fine for one team but creates gaps at org scale. Environment isolation (dev vs staging vs production) isn't a native concept. Cost attribution across multiple teams needs workarounds. Budget limits per org are an Enterprise plan feature — teams on lower tiers can set per-key budgets, but not workspace-level enforcement. Log retention is 30 days on the Production tier, which doesn't meet compliance requirements in most regulated industries without upgrading.
There's also an ownership question worth naming: Palo Alto Networks completed an acquisition of Portkey. That's worth factoring into a long-term evaluation. It might mean better enterprise integration and security tooling. It might mean pricing changes or slower product velocity. It's an open question and I'd want to know the roadmap before putting Portkey in critical infrastructure.
When it makes sense: Small team, shipping fast, developer experience is the priority, governance requirements are simple.
LiteLLM
LiteLLM is the tool most AI engineers try first. OpenAI-compatible API across 100+ models, MIT license, Docker image, large community. It's genuinely the easiest starting point.
The good: Provider coverage is broad. The translation layer is clean — you write standard OpenAI-format requests and LiteLLM handles routing them to Anthropic, Bedrock, Gemini, self-hosted models, whatever. Virtual keys with per-key budgets and rate limits work once configured. The admin dashboard handles basic team management.
The friction: The governance features that actually matter at scale — SSO, RBAC, team-level budget enforcement — are behind the Enterprise license. You can't connect Okta to the open-source version. At 20 engineers that's manageable. At 200, you're either paying for the license or sharing master keys in Slack.
Config is YAML-heavy. That's fine when one engineer owns it, but it doesn't scale cleanly when multiple teams need to modify routing rules independently. Distributed rate limiting requires Redis — if Redis has a problem, your rate limit enforcement degrades. There's also no SLA and no formal audit trail support in the open-source build.
LiteLLM recently changed their support model, noting it no longer fits their scale. Worth tracking if you're depending on support as part of your infrastructure decision.
When it makes sense: Individual engineer or small team prototyping, self-hosted model access, comfort with managing your own infrastructure, willing to absorb the enterprise license cost later.
TrueFoundry
TrueFoundry's gateway connects to 1,000+ LLMs through a single OpenAI-compatible endpoint — OpenAI, Anthropic, Gemini, Bedrock, Azure OpenAI, Groq, Mistral, xAI, Together AI, self-hosted, and more. The routing layer handles load balancing by weight or latency, automatic fallback chains, and retries.
Architecture detail that matters: The gateway is built on Hono with no external calls in the hot path. Auth, rate limiting, and load balancing all run in-memory. Config syncs from the control plane via NATS. Rate limiting uses a sliding window token bucket — per-minute windows, 5-second bucket granularity — all enforced in-memory without DB lookups per request. The benchmarks show 350+ RPS on 1 vCPU / 1 GB RAM with 7–12ms added latency at full load with tracing on. The in-memory design is why it stays fast as RBAC rules and budget checks pile up.
Where it handles the org-scale problems: RBAC scopes to users, teams, and applications — not just API keys. Budget limits apply simultaneously at user, team, and application level; the most restrictive wins. You can see per-model cost attribution, not just aggregate spend. Guardrails (PII detection, prompt injection filtering, content moderation) are built in without external credentials. External providers — Azure Content Safety, Bedrock Guardrails, OpenAI Moderations, Google Model Armor — are available for teams with specific compliance requirements. Guardrails apply in validate mode (inspect, optionally block) or mutate mode (inspect and modify), and they cover MCP tool results as well as LLM responses.
The MCP layer: This is the part that doesn't exist in the other tools at the same depth. TrueFoundry has a dedicated MCP Gateway that gives you a central registry for all MCP servers. OAuth 2.0 auth is managed in one place — one token per user, auto-refreshed across all registered servers. You can compose "Virtual MCP Servers" — a logical endpoint that exposes a curated subset of tools from multiple real MCP servers, so a finance agent only sees the tools it's supposed to see. Every tool call gets a full audit trail with user identity, tool invoked, and policies evaluated.
The trade-off: TrueFoundry is more to adopt upfront than LiteLLM or Portkey. It's a platform that also handles model deployment on GPUs, fine-tuning, and agent lifecycle management. If you want just a routing proxy, the surface area is bigger than you need. If you're already asking "how do we govern our agents' tool access" or "how do we deploy and route to our own models from the same place," then the additional surface area starts being useful rather than burdensome.
Deployment options: SaaS, VPC-hosted, on-prem, and air-gapped (the air-gapped setup uses a forward proxy config documented in the architecture docs). SOC 2, HIPAA, and ITAR certified.
Feature comparison
| Kong | Portkey | LiteLLM | TrueFoundry | |
|---|---|---|---|---|
| Self-hosted | ✅ | Limited | ✅ | ✅ |
| Managed SaaS | ✅ (Konnect) | ✅ | ❌ | ✅ |
| VPC / on-prem | ✅ | Enterprise tier | ✅ | ✅ |
| Air-gapped | ❌ | ❌ | ❌ | ✅ |
| SOC 2 | ✅ | ✅ | ❌ | ✅ |
| HIPAA | ✅ | ✅ | ❌ | ✅ |
| ITAR | ❌ | ❌ | ❌ | ✅ |
| Token-level rate limiting | ✅ (plugin) | ✅ | ✅ | ✅ |
| Per-team budget enforcement | ✅ (Konnect) | Enterprise tier | Enterprise license | ✅ |
| MCP support | Plugin (3.13+) | Partial | ❌ | Native |
| Audit trail per tool call | ❌ | ❌ | ❌ | ✅ |
| Model deployment | ❌ | ❌ | ❌ | ✅ |
A few notes on this table: "Limited" and "Partial" are my characterizations based on docs at the time of writing — check current docs before making a decision. The Kong and Portkey enterprise features depend on which pricing tier you're on.
Where each tool actually lands
After running all four in staging and living with the tradeoffs, here's the honest picture:
Kong is a reasonable answer if — and only if — you're already running Kong. The AI plugins extend an existing investment well. But if you're starting fresh and evaluating gateways specifically for AI workloads, Kong's operational overhead and plugin-assembly model is hard to justify when purpose-built options exist. It's not an AI gateway that happens to do APIs; it's an API gateway that happens to do AI. That order matters.
Portkey is genuinely good for a single team moving fast. The DX is the best of the four. But the acquisition by Palo Alto Networks is a real consideration, not a footnote. When a product gets absorbed into a large security vendor, the roadmap focus shifts — pricing tends to move upmarket, iteration slows, and developer-first features get deprioritized in favor of enterprise sales. It may work out fine, but locking in Portkey as core infrastructure right now means betting on that outcome. For a team that needs a stable, actively-developed gateway over a 2–3 year horizon, that's not a comfortable bet to make today.
LiteLLM is where most teams start, and it earns that. For solo engineers and small teams it's hard to beat — MIT license, zero friction, broad provider coverage. The problem is it doesn't grow with you cleanly. The features you actually need at org scale (SSO, RBAC, audit trails, team-level budget enforcement) are all behind the enterprise license, and the YAML-heavy config model starts creating coordination problems as more teams touch it. LiteLLM is a great proof-of-concept tool that becomes a migration project when your AI usage matures.
TrueFoundry is the only one of the four that was designed assuming AI usage spreads across multiple teams, involves agents using tools, and needs to hold up under compliance scrutiny. The in-memory rate limiting, hierarchical RBAC, MCP governance layer, and the fact that it covers deployment as well as routing — these aren't features bolted on later. They're the thing it was built to do. The honest tradeoff is setup complexity upfront. But if you're past the "one team, one LLM call" stage, that upfront complexity pays back quickly against the alternative: stitching together LiteLLM for routing, a separate deployment platform, and something custom for MCP governance.
If you're starting a new AI infrastructure evaluation today and don't already have Kong in your stack, TrueFoundry is the one I'd run seriously. Not because the others are bad — they're each good at specific things — but because it's the only one where growing from "route LLM calls" to "govern agent workflows at org scale" doesn't require switching tools.
What's your current gateway setup, and where did it start breaking for you? Especially curious if anyone's navigated the LiteLLM-to-something-else migration — that transition seems to catch teams off guard. Drop it in the comments.
Top comments (2)
Great comparison. One thing I appreciate about TrueFoundry is that it goes beyond basic LLM routing and acts as a complete AI Gateway with governance, observability, access controls, rate limiting, routing, fallbacks, and MCP integration. For teams running production AI workloads, having all of these capabilities in a single platform can simplify operations significantly.
Some comments may only be visible to logged-in visitors. Sign in to view all comments.