Rirash

Posted on Jun 17

WhatsApp Has 2 Million Connections Per Server. What's Your Number?

#elixir #erlang

The engineering case studies that prove what the BEAM can actually do

When engineers debate programming languages, the conversation often stays abstract. Performance benchmarks. Theoretical comparisons. "In principle, the actor model should scale better than..."

Let's stop being abstract. There are companies running the BEAM at a scale most engineers will never encounter, with documented results. The numbers are not theoretical. The engineers who built these systems have written about them. Let's look at the evidence.

WhatsApp: 2 Million Connections Per Server, 50 Engineers

In 2014, Facebook acquired WhatsApp for $19 billion. At the time, WhatsApp had roughly 450 million monthly users. By 2015 it was serving 900 million users, with an engineering team of about 50.

The technical foundation: Erlang. Not for political reasons or engineering tribalism. When WhatsApp was founded in 2009, Erlang was one of the few practical options for what they were building, and the team started from ejabberd, an existing open-source chat server written in Erlang.

The numbers that engineers cite most:

2 million simultaneous connections per server — a benchmark their engineers explicitly reported
50 billion messages per day at peak
50 engineers building and maintaining this infrastructure

The 50-engineer number is the most striking. At scale comparable to mid-sized social networks, companies typically have thousands of engineers. WhatsApp did it with 50.

The rationale attributed to co-founder and CEO Jan Koum is a single line: "Erlang is the right tool for the job." When you need millions of concurrent connections per node with predictable latency and automatic fault recovery, the BEAM is a natural foundation.

The engineering blog posts from the WhatsApp team describe the specific properties that made Erlang the right choice:

Each connection is a BEAM process (~2KB)
Process isolation means one bad connection can't affect others
The preemptive scheduler keeps latency predictable even at extreme load
Supervisors automatically restart the processes that handle connections when they fail
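
The ~2KB figure invites a quick back-of-envelope check. The sketch below uses Python purely as a calculator and multiplies the two numbers quoted above. `PROCESS_BYTES` takes the ~2KB figure at face value, and the function name is made up for this illustration. It counts only the baseline cost of the processes themselves; socket buffers, message queues, and per-connection application state would add to the total.

```python
# Back-of-envelope: baseline memory for one lightweight process per connection.
# Assumption: each idle BEAM process costs about 2 KB, as quoted in the text.
PROCESS_BYTES = 2 * 1024
CONNECTIONS = 2_000_000


def baseline_process_memory_gib(connections: int,
                                bytes_per_process: int = PROCESS_BYTES) -> float:
    """Return the memory, in GiB, consumed by the processes alone."""
    return connections * bytes_per_process / 1024**3


if __name__ == "__main__":
    gib = baseline_process_memory_gib(CONNECTIONS)
    print(f"{CONNECTIONS:,} connections -> about {gib:.1f} GiB of process overhead")
```

Under those assumptions, the process overhead for 2 million connections comes to roughly 3.8 GiB, which fits comfortably in the RAM of a single large server.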

This is not a language that works at scale despite being weird. It's a language that was built for exactly this scale.

Discord: 5 Million Concurrent Users on Elixir

Discord launched in 2015 and chose Elixir from the start. The company, led by co-founder and CEO Jason Citron, has explained the decision: they were building a real-time communication platform, and the BEAM's process model was the natural fit.

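By 2017, Discord was serving 5 million concurrent users on their Elixir platform, as described in their own engineering post "How Discord Scaled Elixir to 5,000,000 Concurrent Users." In 2020, the Elixir blog published a case study documenting the experience: "Real time communication at scale with Elixir at Discord."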

Key quotes from Discord engineers:

"What we do in Discord would not be possible without Elixir."
"We wouldn't be able to build it with five engineers if it was a C++ codebase."

The second quote is the remarkable one. Discord's infrastructure team was small — and they attributed that to the BEAM's built-in reliability. When the runtime handles process supervision and crash recovery, your infrastructure team doesn't need to build those systems.

Later, Discord did add Rust for specific latency-sensitive components (their member list sorting and presence system). But they kept Elixir for the core real-time messaging infrastructure. This is significant: when Discord needed maximum performance, they reached for Rust, not Go or C++. They kept Elixir for the system that handles millions of concurrent connections.

Bleacher Report: Infrastructure Costs Cut 8x

Bleacher Report runs sports content for millions of concurrent users. Before Elixir, they ran Ruby infrastructure. After the migration:

Server count reduced dramatically
Infrastructure costs fell significantly
Response times improved substantially

Their engineering team wrote about the migration. The key insight: the Ruby infrastructure required many servers at peak traffic. The Elixir replacement handled the same traffic with far fewer servers. Each Elixir server handled what previously required multiple Ruby servers.

The specific mechanism: a Phoenix server handles concurrent connections in lightweight BEAM processes. A Ruby Puma server uses OS threads or processes. At the same traffic level, Elixir uses a fraction of the memory and CPU.
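
One way to see why the concurrency model matters is Little's Law: the number of requests in flight equals the arrival rate multiplied by the time each request spends in the server. A thread-pool server can hold at most as many requests in flight as it has threads, while a process-per-request server is bounded mainly by memory. The sketch below works through that arithmetic in Python. Every number in it (request rate, latency, pool sizes) is hypothetical and chosen only for illustration; none of them are Bleacher Report's figures.

```python
import math


def in_flight(requests_per_second: float, seconds_per_request: float) -> float:
    """Little's Law: average number of requests being served at once."""
    return requests_per_second * seconds_per_request


def servers_needed(requests_per_second: float, seconds_per_request: float,
                   slots_per_server: int) -> int:
    """Servers required when each can hold `slots_per_server` requests in flight."""
    concurrent = in_flight(requests_per_second, seconds_per_request)
    return math.ceil(concurrent / slots_per_server)


if __name__ == "__main__":
    # Hypothetical workload: 20,000 req/s, each spending ~250 ms mostly waiting on I/O.
    rps, latency = 20_000, 0.25
    # Hypothetical thread-pool server: 4 workers x 16 threads = 64 slots.
    print("thread pool:", servers_needed(rps, latency, slots_per_server=64))
    # Hypothetical server with room for 50,000 lightweight processes.
    print("lightweight processes:", servers_needed(rps, latency, slots_per_server=50_000))
```

With these made-up inputs, 5,000 requests are in flight at once, so the thread-pool configuration needs 79 servers and the lightweight-process configuration needs 1. The model ignores CPU limits, so it shows only the concurrency ceiling, not a full capacity plan.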

Klarna: Financial Infrastructure at Bank Scale

Klarna, the Swedish fintech company, uses Erlang for their payment processing infrastructure. They process billions of dollars in transactions through systems built on the BEAM.

Financial infrastructure has requirements that make the BEAM uniquely appropriate:

Zero tolerance for data corruption: BEAM processes share no memory, so a bug in one transaction can't corrupt another's state
Predictable latency: Payment confirmation needs to happen in milliseconds consistently, not just on average
Never-down requirements: Payments infrastructure can't be offline for scheduled maintenance — the BEAM's hot code upgrade capability addresses this
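
The difference between "fast on average" and "consistently fast" is easy to see with a few numbers. The sketch below uses made-up latency samples (not Klarna data) to show how a mean can look healthy while the tail does not. `percentile` is a simple nearest-rank helper written for this example.

```python
import math
from statistics import mean


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up latencies in ms: 98 fast requests and 2 stalled by a long pause.
    latencies = [10.0] * 98 + [900.0] * 2
    print(f"mean = {mean(latencies):.1f} ms")
    print(f"p50  = {percentile(latencies, 50):.1f} ms")
    print(f"p99  = {percentile(latencies, 99):.1f} ms")
```

Here the mean is 27.8 ms and the median is 10 ms, yet the 99th percentile is 900 ms. A payment system judged on its average would look fine while one customer in fifty waits almost a second.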

Klarna uses Erlang's hot code loading to deploy updates without downtime. New code is loaded into the running system, without dropping connections or requiring a restart.

Riot Games: 7.5 Million Concurrent LoL Players

Riot Games runs League of Legends' chat infrastructure on Erlang. At peak, this handles 7.5 million concurrent players.

The choice was explicit: they evaluated multiple options and chose Erlang because no other option could handle the specific combination of requirements:

Millions of persistent client connections (Riot's chat is built on XMPP rather than WebSockets)
Low-latency message delivery
Automatic recovery from node failures
Efficient memory usage at scale

Their engineering team documented that Erlang was the only technology they evaluated that met all four requirements simultaneously.

Remote.com: Unicorn Scale Monolith

Remote.com, the global HR platform, built their entire product on Elixir. They grew from startup to unicorn ($3B+ valuation) without ever decomposing their application into microservices.

In January 2025, José Valim featured Remote as an official Elixir case study. Remote's engineering team made the case that:

The Elixir monolith was simpler to develop and deploy than microservices would have been
Phoenix's performance was more than adequate for their traffic levels
OTP's supervision model gave them reliability properties that would have required significant infrastructure work in other stacks

Remote is particularly significant because it's not a "we needed millions of connections" story. It's a "we built a serious business software product and Elixir was simply the right choice" story.

BBC: Serving Hundreds of Millions with a Small Team

The BBC serves web and app traffic to hundreds of millions of users globally. They've adopted Elixir for significant parts of their infrastructure, explicitly for the small-team-large-scale properties.

Their engineering teams report that Phoenix applications handle traffic spikes gracefully, without the horizontal scaling drama that Node.js or Ruby applications often go through under sudden load.

What These Case Studies Have in Common

Pattern one: small teams, massive scale. WhatsApp's 50 engineers. Discord's small infrastructure team. This isn't coincidence. The BEAM's built-in reliability reduces the engineering work required to keep systems running.

Pattern two: long-lived connections at extreme concurrency. WhatsApp's 2M connections per server. Discord's 5M concurrent users. Riot's 7.5M concurrent League players on chat. The BEAM's process model, where each connection is a process and each process starts at roughly 2KB, makes this scale achievable.

Pattern three: reliability without complexity. Financial systems (Klarna) that can't go down and can't corrupt data. Media systems (BBC) that can handle traffic spikes. Business applications (Remote) that iterate quickly without infrastructure drama.

Pattern four: the engineers chose it deliberately. These aren't companies that backed into Erlang/Elixir by accident. They evaluated alternatives and made an explicit, documented choice. The engineers who built these systems continue to advocate for the technology.

The Skeptic's Objection

"These are outliers. You don't need to handle 2 million connections per server."

Fair. But consider the counterargument: if you're designing a system that might grow, don't you want to choose a foundation that scales well rather than one that will require architectural rewrites under load?

Every scaling crisis looks avoidable in hindsight. The teams who avoided the crisis chose foundations that handled growth gracefully. The BEAM is one of those foundations.

The other objection: "These companies had expert Erlang/Elixir engineers."

True. But the point is that those expert engineers consistently choose the BEAM when the requirements include concurrent connections, reliability, and small team sizes. That's an informative signal.

Your System's Requirements

Let me ask you to think about your own system:

How many concurrent connections do you handle at peak?
What happens when one part of your system crashes — does it affect other parts?
How much of your engineering team's time goes to infrastructure reliability rather than features?
What would happen if you could handle 10x your current load on the same hardware?

If any of these questions make you uncomfortable, the case studies above are worth reading carefully. These aren't academic exercises — they're engineering decisions made by teams with real problems, documented honestly.

Where to Go From Here

Read the original source material:

Discord's "Real-Time Communication at Scale with Elixir at Discord" (elixir-lang.org blog, 2020)
Remote's case study (elixir-lang.org blog, January 2025)
WhatsApp's Erlang engineering blog posts (search "WhatsApp Erlang")
Riot Games' "Chat Service Architecture" series on the Riot Games Technology blog (covers the Erlang-based chat servers)

These aren't marketing posts making claims. They're engineers describing systems they built, the problems they faced, and the solutions that worked.

The BEAM's reputation is earned. These companies didn't choose it for fun. They chose it because it was the right tool for genuinely hard problems — and it delivered.

The Monterail blog post "Eight Famous Companies Using Elixir — And Why They Made the Switch" collects additional case studies in one place. The official Elixir production stories at elixir-lang.org are worth bookmarking — they're primary sources from engineering teams.
