BlueWhale-Quant-Lab

Posted on Jun 17

How to Benchmark API Latency to Any Endpoint (Polymarket Case Study)

#python #performance #networking #devops

"Just ping it" is bad latency advice. ICMP gets deprioritized behind CDNs and tells you almost nothing about real request latency. This is how to benchmark API latency properly, with a real case study: finding where Polymarket's order book lives.

Why `ping` lies

ping measures ICMP echo round-trip. But:

CDNs and load balancers often rate-limit or deprioritize ICMP, so the number is noisy or misleadingly high/low.
It ignores TLS handshake cost, which dominates short HTTPS requests.
It tells you nothing about server processing time, which only shows up in time-to-first-byte (TTFB).

For an API, measure what the API actually does: TCP connect, TLS, and time-to-first-byte.

A proper latency harness (Python)

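import http.client
import json
import socket
import ssl
import statistics
import time

def percentiles(xs):
    """Summarize latency samples (ms) using nearest-rank percentiles."""
    xs = sorted(xs)
    n = len(xs)

    def rank(p):
        # Nearest rank: the ceil(n * p / 100)-th smallest sample, in integer math.
        # Unlike int(n * 0.95) - 1, this stays correct for small n.
        return xs[max(0, -(-n * p // 100) - 1)]

    return {
        "min": round(xs[0], 2),
        "p50": round(statistics.median(xs), 2),
        "p95": round(rank(95), 2),
        "p99": round(rank(99), 2),
        "max": round(xs[-1], 2),
    }

def tcp_connect_ms(host, port=443):
    """Time one TCP handshake in ms.

    create_connection() also resolves the hostname, so DNS time is
    included unless the resolver already has the name cached.
    """
    t = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        elapsed = time.perf_counter() - t
    return elapsed * 1000

def ttfb_ms(host, path="/"):
    """Time a cold HTTPS GET in ms: TCP connect + TLS + request, up to the first body byte."""
    t = time.perf_counter()
    c = http.client.HTTPSConnection(host, 443, timeout=5,
                                    context=ssl.create_default_context())
    try:
        c.request("GET", path)
        r = c.getresponse()
        r.read(1)
        elapsed = time.perf_counter() - t
    finally:
        c.close()  # always release the socket, even on timeout
    return elapsed * 1000

def bench(host, n=200):
    """Run n fresh connections per metric and summarize each."""
    return {
        "tcp_connect": percentiles([tcp_connect_ms(host) for _ in range(n)]),
        "ttfb":        percentiles([ttfb_ms(host) for _ in range(n)]),
    }

print(json.dumps(bench("clob.polymarket.com"), indent=2))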
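
The harness above reports TCP connect and an all-in TTFB, so the TLS handshake is folded into the second number. If you want to see how much of the TTFB is TLS, one option is to time the three phases on a single connection. This is a minimal sketch, not part of the original harness: `phase_deltas_ms` and `phase_timings_ms` are names I made up, and the hand-written HTTP/1.1 request is an assumption that a plain GET is enough to get a first byte back.

```python
import socket
import ssl
import time

PHASES = ["tcp_connect", "tls_handshake", "first_byte"]

def phase_deltas_ms(marks):
    """Turn four cumulative perf_counter() marks into per-phase durations in ms."""
    return {name: (b - a) * 1000 for name, a, b in zip(PHASES, marks, marks[1:])}

def phase_timings_ms(host, path="/", port=443, timeout=5):
    """Time TCP connect, TLS handshake, and request-to-first-byte on one connection."""
    ctx = ssl.create_default_context()
    marks = [time.perf_counter()]
    with socket.create_connection((host, port), timeout=timeout) as raw:
        marks.append(time.perf_counter())          # TCP handshake done
        # wrap_socket() on a connected socket performs the TLS handshake immediately.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            marks.append(time.perf_counter())      # TLS handshake done
            request = (
                f"GET {path} HTTP/1.1\r\n"
                f"Host: {host}\r\n"
                "Connection: close\r\n\r\n"
            )
            tls.sendall(request.encode("ascii"))
            tls.recv(1)                            # block until the first response byte
            marks.append(time.perf_counter())
    return phase_deltas_ms(marks)
```

Calling `phase_timings_ms("clob.polymarket.com")` returns one dict per run; collect a few hundred and feed each phase through `percentiles()` as before.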

Read p99 and jitter, not just the average

The average is marketing. What kills a trading bot is the p99 — your latency during the volatile windows you actually trade in. Always report min / p50 / p95 / p99 / max. A box with p50=1.2 ms but p99=12 ms is worse than a steady p50=3 ms box.
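
To see why the average misleads, here is a small sketch on made-up data shaped after the two boxes described above. The sample lists are hypothetical, not measurements: a "spiky" box that sits at 1.2 ms but hits 12 ms on 3% of requests, and a "steady" box around 3 ms.

```python
import statistics

def summarize(samples):
    """Mean, median, and nearest-rank p99 of latency samples (ms)."""
    xs = sorted(samples)
    n = len(xs)
    p99 = xs[max(0, -(-n * 99 // 100) - 1)]  # ceil(n * 0.99)-th smallest sample
    return {"mean": statistics.fmean(xs), "p50": statistics.median(xs), "p99": p99}

# Hypothetical samples, chosen only to illustrate the point.
spiky = [1.2] * 97 + [12.0] * 3    # fast most of the time, slow 3% of the time
steady = [3.0] * 95 + [3.4] * 5    # slower, but consistent

spiky_stats = summarize(spiky)
steady_stats = summarize(steady)

for name, stats in [("spiky", spiky_stats), ("steady", steady_stats)]:
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

On the mean and the median the spiky box looks about twice as fast. On p99 it is several times slower, and that is the number you live with when the market moves.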

Check jitter over time, too:

ping -i 0.5 -c 600 clob.polymarket.com | tail -3   # watch min/avg/max/mdev spread
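
That command still goes over ICMP. If you would rather watch jitter on the TCP path, a rough equivalent can be sketched in Python. The function names, the 60-sample window, and the use of a population standard deviation as the "spread" figure are my own choices here, not something the ping output defines for you.

```python
import socket
import statistics
import time

def tcp_probe_ms(host, port=443, timeout=5):
    """One TCP connect time in ms (includes DNS if the name is not cached)."""
    t = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        return (time.perf_counter() - t) * 1000

def jitter_windows(probe, count=600, interval=0.5, window=60):
    """Call probe() count times, interval seconds apart, and summarize each window."""
    samples = []
    for i in range(count):
        samples.append(probe())
        if i < count - 1:
            time.sleep(interval)
    summaries = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        summaries.append({
            "min": min(chunk),
            "avg": statistics.fmean(chunk),
            "max": max(chunk),
            "spread": statistics.pstdev(chunk),  # population std dev of the window
        })
    return summaries
```

`jitter_windows(lambda: tcp_probe_ms("clob.polymarket.com"))` runs for about five minutes with the defaults. A window whose `max` and `spread` jump while `min` stays flat is the jitter you are looking for.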

The case study: where is Polymarket's CLOB?

I ran the harness from VPS boxes in several regions:

| Region | TCP connect p50 | TTFB p50 |
| --- | --- | --- |
| Amsterdam | ~1.4 ms | ~6 ms |
| Frankfurt | ~9 ms | ~16 ms |
| US-East | ~90 ms | ~110 ms |
| Singapore | ~168 ms | ~195 ms |

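Light in fiber covers roughly 200 km per millisecond one way, so each millisecond of round-trip time allows at most ~100 km of distance. A ~1.4 ms TCP connect therefore puts the endpoint within ~140 km of the Amsterdam box, and since real routes are never straight lines, the true distance is shorter still. So whatever answers the TCP handshake is in or near the Amsterdam metro: bounded by physics, not vibes. (The connect time alone cannot tell you whether the matching engine is co-located or sits behind an edge proxy. The low TTFB suggests it is close, and the hosting decision is the same either way.)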
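
The distance bound is simple arithmetic, sketched below for the four connect times in the table. The 200 km/ms figure is an assumed one-way speed of light in fiber (about two thirds of c), and the result is an upper bound only: routing detours and handshake processing both eat into it.

```python
FIBER_KM_PER_MS = 200.0  # assumed one-way propagation speed in fiber

def max_distance_km(rtt_ms, km_per_ms=FIBER_KM_PER_MS):
    """Upper bound on one-way distance for a given round-trip time."""
    one_way_ms = rtt_ms / 2.0  # the signal has to go there and come back
    return one_way_ms * km_per_ms

# TCP connect p50 values from the table above, in ms.
connect_p50 = [("Amsterdam", 1.4), ("Frankfurt", 9), ("US-East", 90), ("Singapore", 168)]

for region, rtt in connect_p50:
    print(f"{region:<10} RTT {rtt:>5} ms -> at most {max_distance_km(rtt):,.0f} km away")
```

Only the Amsterdam number gives a bound tight enough to name a metro area. The others are consistent with an Amsterdam endpoint but do not prove it on their own.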

Turning the benchmark into a decision

The whole point of benchmarking is to act on it. For Polymarket, the data says: host in Amsterdam. I moved my bot to an AMS-metro VPS and the connect time went from ~90 ms to ~1.2 ms.
Disclosure: the link to the Amsterdam box I use is an affiliate link, and I earn a referral. The numbers above are from this box.

Reusable checklist

✅ Measure TCP connect + TTFB, not just ICMP.
✅ Report percentiles, especially p99.
✅ Test jitter over minutes, at different times of day.
✅ Compare multiple regions with hourly VPS boxes.
✅ Convert sub-2 ms numbers into "same metro" conclusions via the fiber speed limit.
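
For the multi-region comparison, the last step can be as small as a sort. The sketch below uses the p50 values from the case-study table as input. The dictionary layout and the helper names are hypothetical, and in practice you would rank on p99 as well, which the table does not list.

```python
# p50 values (ms) copied from the case-study table.
results = {
    "Amsterdam": {"tcp_connect_p50": 1.4, "ttfb_p50": 6.0},
    "Frankfurt": {"tcp_connect_p50": 9.0, "ttfb_p50": 16.0},
    "US-East":   {"tcp_connect_p50": 90.0, "ttfb_p50": 110.0},
    "Singapore": {"tcp_connect_p50": 168.0, "ttfb_p50": 195.0},
}

def rank_regions(results, key="ttfb_p50"):
    """Region names ordered from lowest to highest value of the chosen metric."""
    return sorted(results, key=lambda region: results[region][key])

def same_metro_candidates(results, threshold_ms=2.0):
    """Regions whose TCP connect is low enough to imply the same metro area."""
    return [r for r, v in results.items() if v["tcp_connect_p50"] < threshold_ms]

print("ranking:", rank_regions(results))
print("same metro:", same_metro_candidates(results))
```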

This harness works for any endpoint — exchanges, RPCs, your own APIs. Polymarket just happens to have a satisfying answer: Amsterdam.

Numbers from my own 2026 tests. Not financial advice.
