Gabriel Anhaia

Posted on Jun 13

select With Timeouts: 3 Channel Patterns That Prevent Goroutine Leaks

#go #concurrency #backend #goroutines

You've seen the shape. A select with two arms: one waits for a
result on a channel, the other gives up after a timeout. It reads
like the safe version. Add a deadline, never block forever, ship
it.

Then pprof shows a goroutine count that climbs all day and never
comes back down. The handlers return on time. The callers get their
timeout errors. Somewhere underneath, goroutines are parked on a
channel send that nobody will ever receive.

A select with a timeout protects the reader. It does nothing
for the writer. If the writer is a goroutine you spawned to
produce the value, and you walked away on the timeout, that
goroutine is now trying to send into a channel with no receiver. It
blocks on that send. It never returns. That is the leak.

Three patterns below fix three versions of this. Each one comes
with a repro you can run with go run and watch leak.

Why a blocked send leaks

Start with the mechanism, because every pattern after this is a
variation on it.

An unbuffered channel send blocks until a receiver is ready. That
is the whole contract. So when you spawn a goroutine to do work and
report back on an unbuffered channel, and the parent abandons the
receive, the goroutine is stuck:

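package main

import (
    "errors"
    "fmt"
    "runtime"
    "time"
)

// slowValue stands in for slow work: it takes 2s to produce a result.
func slowValue() int {
    time.Sleep(2 * time.Second)
    return 42
}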

func get() (int, error) {
    ch := make(chan int) // unbuffered
    go func() {
        ch <- slowValue() // blocks until someone receives
    }()

    select {
    case v := <-ch:
        return v, nil
    case <-time.After(500 * time.Millisecond):
        return 0, errors.New("timeout")
    }
}

The timeout fires at 500ms. get returns an error. The caller is
happy. But slowValue keeps sleeping, wakes at 2s, and tries
ch <- 42. No one is on the other end of ch anymore. The send
blocks. The goroutine parks. It is still there an hour later.

You can watch it. Drop this in main:

func main() {
    runtime.GC()
    fmt.Println("before:", runtime.NumGoroutine())
    for i := 0; i < 1000; i++ {
        get()
    }
    time.Sleep(3 * time.Second)
    runtime.GC()
    fmt.Println("after:", runtime.NumGoroutine())
}

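You start at 1 and end near 1001: the main goroutine plus a
thousand producers, each holding its stack and the channel its
closure captured, all parked on a dead send. The calls run one
after another, so the loop takes about 500 seconds (1000 × 500ms);
lower the count for a quicker run. That is the bug the next three
patterns prevent.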
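
A goroutine count tells you something leaked, not where. One way to confirm that the extra goroutines are parked on a send is a goroutine dump. The sketch below is an illustration, not part of the repro above: leakOnce, countParkedSends, and leakAndCount are hypothetical names, and the durations are shortened so it finishes quickly. It writes the goroutine profile at debug level 2, which prints each goroutine's wait state, and counts the ones in chan send.

```go
package main

import (
	"bytes"
	"errors"
	"runtime/pprof"
	"strings"
	"time"
)

// leakOnce mirrors get from the text with shorter (assumed) durations:
// the producer needs 50ms, the reader gives up after 5ms.
func leakOnce() (int, error) {
	ch := make(chan int) // unbuffered, so a late send has nowhere to go
	go func() {
		time.Sleep(50 * time.Millisecond)
		ch <- 42 // parks forever once the reader has left
	}()

	select {
	case v := <-ch:
		return v, nil
	case <-time.After(5 * time.Millisecond):
		return 0, errors.New("timeout")
	}
}

// countParkedSends dumps all goroutine stacks (debug=2 uses the same
// format as a panic trace) and counts those waiting in "chan send".
// It returns -1 if the profile could not be written.
func countParkedSends() int {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return -1
	}
	return strings.Count(buf.String(), "[chan send")
}

// leakAndCount calls leakOnce n times, waits for the producers to
// reach their send, and reports how many goroutines are stuck there.
func leakAndCount(n int) int {
	for i := 0; i < n; i++ {
		leakOnce()
	}
	time.Sleep(200 * time.Millisecond)
	return countParkedSends()
}
```

Calling leakAndCount(5) from main should report the five abandoned producers. In a real service the same profile is usually read through net/http/pprof rather than printed by hand.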

Pattern 1: buffer the channel so the send never blocks

The smallest fix. Give the channel a buffer of one. The producer
can always complete its send, even if no one ever receives.

func get() (int, error) {
    ch := make(chan int, 1) // buffered, capacity 1
    go func() {
        ch <- slowValue() // never blocks: buffer has room
    }()

    select {
    case v := <-ch:
        return v, nil
    case <-time.After(500 * time.Millisecond):
        return 0, errors.New("timeout")
    }
}

One argument added: make(chan int, 1). Now when the timeout
wins the race, slowValue still wakes at 2s and still sends 42, but
the buffer accepts it. The send completes. The goroutine returns.
The value sits in the buffer until the channel is garbage
collected, which is fine: it is one int, not a parked stack.

Run the same main loop and the after-count drops back to near
1. The leak is gone.

The rule: any time a goroutine sends a single result that a
select might abandon, the result channel needs a buffer of at
least one. This is the most common channel-leak fix in real Go
code, and it costs nothing.
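
If the rule shows up in many places, it can live in one helper. What follows is a minimal sketch, assuming Go 1.18 or later for generics. callWithTimeout, errTimeout, and demo are hypothetical names for this illustration, not something from a library. The only load-bearing detail is the capacity of 1 on the result channel.

```go
package main

import (
	"errors"
	"time"
)

// errTimeout is a hypothetical sentinel error for this sketch.
var errTimeout = errors.New("timeout")

// callWithTimeout runs f in its own goroutine and waits at most d for
// the result. The result channel has capacity 1, so the goroutine can
// always complete its send and exit, even after the caller gave up.
func callWithTimeout[T any](d time.Duration, f func() T) (T, error) {
	ch := make(chan T, 1)
	go func() {
		ch <- f()
	}()

	t := time.NewTimer(d)
	defer t.Stop()

	select {
	case v := <-ch:
		return v, nil
	case <-t.C:
		var zero T
		return zero, errTimeout
	}
}

// demo shows both outcomes with assumed durations: a fast call that
// beats a 100ms deadline and a 200ms call that misses a 10ms one.
func demo() (int, error, error) {
	fast, fastErr := callWithTimeout(100*time.Millisecond, func() int {
		return 42
	})
	_, slowErr := callWithTimeout(10*time.Millisecond, func() int {
		time.Sleep(200 * time.Millisecond)
		return 7
	})
	return fast, fastErr, slowErr
}
```

Note that f still runs to completion after a timeout. The helper only guarantees that the goroutine exits once f returns.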

It only covers the "one result, fire and forget" case. If the
goroutine produces a stream of values, or you want it to actually
stop doing work when you walk away, a buffer is not enough. That
is the next pattern.

Pattern 2: signal the producer with context.Done()

A buffer lets the producer finish and exit. It does not tell the
producer to give up early. If slowValue is an expensive call
(a query, an HTTP round trip, a loop over a big dataset), you want
it to stop the moment the reader walks away, not run to completion
and discard the result.

Thread a context in and have the producer watch ctx.Done():

func get(ctx context.Context) (int, error) {
    ctx, cancel := context.WithTimeout(
        ctx, 500*time.Millisecond)
    defer cancel()

    ch := make(chan int, 1)
    go func() {
        v, err := slowValueCtx(ctx)
        if err != nil {
            return // ctx cancelled, drop the result
        }
        ch <- v // buffered, so this never blocks
    }()

    select {
    case v := <-ch:
        return v, nil
    case <-ctx.Done():
        return 0, ctx.Err()
    }
}

The producer now calls slowValueCtx, which respects the context:

func slowValueCtx(ctx context.Context) (int, error) {
    select {
    case <-time.After(2 * time.Second):
        return 42, nil
    case <-ctx.Done():
        return 0, ctx.Err()
    }
}

Two things work together. The buffer of 1 keeps the success path
from blocking on a late send. The ctx.Done() check inside
slowValueCtx lets the work itself bail at the deadline instead of
sleeping the full two seconds. When the timeout fires, the context
cancels, slowValueCtx returns immediately with ctx.Err(), the
goroutine sees the error and returns without sending. Nothing
parks, and no wasted work runs after the deadline.
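
The same idea extends to the stream case mentioned earlier, where a buffer of one cannot help. The sketch below is an assumption-laden illustration: produce and takeN are hypothetical names, and the WaitGroup is there only so the example can prove the producer exits. Each send races against ctx.Done(), so the producer has a way out no matter when the reader leaves.

```go
package main

import (
	"context"
	"sync"
)

// produce is a hypothetical stream producer: it sends 0, 1, 2, ... on
// an unbuffered channel until ctx is cancelled. Every send races
// against ctx.Done(), so the goroutine can always exit.
func produce(ctx context.Context, wg *sync.WaitGroup) <-chan int {
	out := make(chan int)
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer close(out)
		for i := 0; ; i++ {
			select {
			case out <- i:
			case <-ctx.Done():
				return // reader walked away; stop producing
			}
		}
	}()
	return out
}

// takeN reads n values, then cancels and waits for the producer to
// exit. wg.Wait() would hang here if the producer's send had no
// ctx.Done() arm next to it.
func takeN(n int) []int {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	ch := produce(ctx, &wg)

	got := make([]int, 0, n)
	for len(got) < n {
		got = append(got, <-ch)
	}
	cancel()
	wg.Wait()
	return got
}
```

takeN(3) returns the first three values and only comes back once the producer goroutine has finished.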

Note the time.After inside slowValueCtx. It is a stand-in for
real cancellable work (a DB call, an HTTP request) you would have
in production. When ctx.Done() closes, the select returns and the
timer is simply abandoned, which is harmless in a one-shot call
like this. In a loop it is a different story, so time.After is
worth a closer look on its own.

The time.After trap

time.After(d) is convenient and it has a sharp edge. It creates a
time.Timer that fires after d, and before Go 1.23 that timer could
not be garbage collected until it fired, even if your select had
already returned through a different arm.

In a one-shot function that is nothing. In a hot loop it piles up:

func poll(ctx context.Context, ch <-chan int) {
    for {
        select {
        case v := <-ch:
            handle(v)
        case <-time.After(30 * time.Second):
            // new 30s timer allocated EVERY iteration
            log.Println("no data in 30s")
        case <-ctx.Done():
            return
        }
    }
}

Every time ch delivers a value, the select returns through the
first arm and the 30-second timer from time.After is abandoned. On
Go versions before 1.23 it stays in the runtime's timer heap for
its full 30 seconds before it can be collected, so a busy channel
means thousands of live timers stacking up. It is not a goroutine
leak, but it is wasted memory and CPU in the same family.

The fix is a time.Timer you own and reset:

func poll(ctx context.Context, ch <-chan int) {
    t := time.NewTimer(30 * time.Second)
    defer t.Stop()

    for {
        select {
        case v := <-ch:
            handle(v)
            if !t.Stop() {
                <-t.C // drain if it already fired
            }
            t.Reset(30 * time.Second)
        case <-t.C:
            log.Println("no data in 30s")
            t.Reset(30 * time.Second)
        case <-ctx.Done():
            return
        }
    }
}

One timer for the life of the loop. As of Go 1.23 an unreferenced
time.After timer becomes eligible for garbage collection
immediately, without waiting to fire, so the memory pressure is
lower than on old versions. The owned-timer version is still the
one to reach for in any loop, because it is explicit about
lifetime and works the same on every Go version.
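
The stop, drain, reset sequence is easy to get subtly wrong, so it can help to keep it in one place. Here is a minimal sketch with hypothetical names (resetTimer, firesEarly). It uses a non-blocking drain, which is an assumption-free way to stay safe whether or not the caller already received from t.C. That also makes it usable from the arm that just read the tick.

```go
package main

import "time"

// resetTimer is a hypothetical helper that stops t, discards a value
// that may already be sitting in t.C, and rearms t for d. The
// non-blocking drain makes it safe whether or not the caller has
// already received from t.C.
func resetTimer(t *time.Timer, d time.Duration) {
	if !t.Stop() {
		select {
		case <-t.C: // stale tick from before the Stop
		default: // nothing buffered; the tick was already consumed
		}
	}
	t.Reset(d)
}

// firesEarly reports whether a timer that already expired delivers a
// stale tick right after being rearmed with resetTimer.
func firesEarly() bool {
	t := time.NewTimer(10 * time.Millisecond)
	defer t.Stop()
	time.Sleep(30 * time.Millisecond) // let the timer expire unread

	resetTimer(t, 200*time.Millisecond)
	select {
	case <-t.C:
		return true // a stale value leaked through
	case <-time.After(50 * time.Millisecond):
		return false // the timer is waiting for the new deadline
	}
}
```

In the poll loop, both arms could then call resetTimer(t, 30*time.Second) instead of repeating the sequence inline.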

Pattern 3: a done channel to fan in many producers

The hardest version: you have several goroutines doing work, you
want the first result that comes back, and you want every other
goroutine to stop and exit once you have it. A timeout might also
win the race. Either way, no goroutine should leak.

This is the fan-in with a shared done channel. Here the done
channel is ctx.Done(): cancel the context once, every producer
sees it, every producer leaves.

func first(ctx context.Context, urls []string) (string, error) {
    ctx, cancel := context.WithTimeout(
        ctx, 500*time.Millisecond)
    defer cancel() // cancelling closes ctx.Done() for every producer

    // buffered to len(urls): every producer can send
    // its result without blocking, even the losers.
    results := make(chan string, len(urls))

    for _, u := range urls {
        go func(u string) {
            r, err := fetch(ctx, u)
            if err != nil {
                return
            }
            select {
            case results <- r:
            case <-ctx.Done(): // winner already returned, or deadline hit
            }
        }(u)
    }

    select {
    case r := <-results:
        return r, nil // cancel() in defer stops the rest
    case <-ctx.Done():
        return "", ctx.Err()
    }
}

Two safety nets, and you need both.

The results channel is buffered to len(urls). Even if every
producer finishes at the same instant, each one has a slot, so no
send blocks waiting for a receiver that already left with the
winner.

Inside each producer the send is itself a select against
ctx.Done(). With a buffer of len(urls) the send can never block,
so the ctx.Done() arm is belt-and-suspenders: it keeps the
producer from hanging if the design changes later, say the buffer
shrinks or a producer starts sending more than once. When first
returns, the deferred cancel() cancels the context, fetch returns
early for every loser, and the producers that were still in flight
exit.

fetch has to respect the context for the early-exit to work:

func fetch(ctx context.Context, url string) (string, error) {
    req, err := http.NewRequestWithContext(
        ctx, http.MethodGet, url, nil)
    if err != nil {
        return "", err
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    b, err := io.ReadAll(resp.Body)
    return string(b), err
}

http.NewRequestWithContext ties the request to ctx. When
cancel() runs, the in-flight HTTP call returns with a cancellation
error, the producer returns, the goroutine is gone. No leak even
when nine out of ten fetches were still running.
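
To watch the losers exit without touching the network, the same shape can be run against a fake. The sketch below is an illustration under stated assumptions: fakeFetch, firstFake, and raceDemo are hypothetical names, the delays are made up, and the WaitGroup exists only so the demo can prove that every producer returns.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// fakeFetch is a hypothetical stand-in for fetch: it succeeds after
// delay, or fails early if ctx is cancelled first.
func fakeFetch(ctx context.Context, delay time.Duration) error {
	t := time.NewTimer(delay)
	defer t.Stop()
	select {
	case <-t.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// firstFake follows the same shape as first and returns the index of
// the fastest producer. The WaitGroup is only for the demo.
func firstFake(ctx context.Context, delays []time.Duration, wg *sync.WaitGroup) (int, error) {
	ctx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
	defer cancel()

	results := make(chan int, len(delays))
	for i, d := range delays {
		wg.Add(1)
		go func(i int, d time.Duration) {
			defer wg.Done()
			if err := fakeFetch(ctx, d); err != nil {
				return
			}
			select {
			case results <- i:
			case <-ctx.Done():
			}
		}(i, d)
	}

	select {
	case i := <-results:
		return i, nil
	case <-ctx.Done():
		return -1, ctx.Err()
	}
}

// raceDemo races three fake fetches and then waits for all producers.
// It reports the winner and whether everything wrapped up quickly.
func raceDemo() (winner int, quick bool, err error) {
	var wg sync.WaitGroup
	delays := []time.Duration{
		300 * time.Millisecond, 20 * time.Millisecond, 2 * time.Second,
	}

	start := time.Now()
	winner, err = firstFake(context.Background(), delays, &wg)
	wg.Wait() // would take 2s if the losers ignored cancellation
	quick = time.Since(start) < time.Second
	return winner, quick, err
}
```

The 20ms producer should win, and wg.Wait() should return almost immediately because cancel() has already released the other two.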

The checklist

Three patterns, one shared idea: a select timeout protects the
reader, and you have to protect the writer separately. Run these
checks on the channel code you already have.

Any channel a goroutine sends a one-shot result on, where a select might walk away — give it a buffer of 1.
Any producer doing real work — thread a context and have it watch ctx.Done(), so the work stops instead of finishing into the void.
Any time.After inside a for loop — replace it with an owned time.Timer you Reset, so timers don't pile in the heap.
Any fan-in of multiple producers — buffer the result channel to the producer count, guard each send with ctx.Done(), and let a deferred cancel() stop the losers.
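
One way to run these checks mechanically is to compare goroutine counts around a call, the same idea as the main function earlier but packaged for reuse. This is a rough sketch, not a replacement for a proper leak detector: leakedGoroutines, abandon, and check are hypothetical names, and the durations are assumptions chosen to keep the run short.

```go
package main

import (
	"runtime"
	"time"
)

// leakedGoroutines runs f, then polls until the goroutine count is
// back at its starting value or settle has elapsed. It returns how
// many extra goroutines are still alive at that point.
func leakedGoroutines(f func(), settle time.Duration) int {
	before := runtime.NumGoroutine()
	f()

	deadline := time.Now().Add(settle)
	for {
		extra := runtime.NumGoroutine() - before
		if extra <= 0 {
			return 0
		}
		if time.Now().After(deadline) {
			return extra
		}
		time.Sleep(5 * time.Millisecond)
	}
}

// abandon starts a producer that sends after 20ms on a channel with
// the given capacity, then returns without ever receiving.
func abandon(capacity int) {
	ch := make(chan int, capacity)
	go func() {
		time.Sleep(20 * time.Millisecond)
		ch <- 42 // blocks forever when capacity is 0
	}()
}

// check reports how many goroutines abandon leaves behind.
func check(capacity int) int {
	return leakedGoroutines(func() { abandon(capacity) }, 300*time.Millisecond)
}
```

check(0) should report one stuck goroutine and check(1) none. The count is process-wide, so unrelated background goroutines can skew it in a larger program.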

A goroutine that can't send and can't see a cancel is a goroutine
that never returns. Give every producer one of those two exits and
the leaks close.

If this was useful

Channels and select look small until you trace a goroutine leak
back through a timeout that only ever protected one side of the
conversation. The Complete Guide to Go Programming works through
channels, select, timers, and the scheduler from the ground up,
including how the runtime parks and wakes goroutines on a blocked
send. Hexagonal Architecture in Go shows where to put this kind
of concurrency so it stays testable instead of scattered across
handlers.