Ugur Aslim

Posted on Jun 15 • Originally published at uguraslim.com

SQLAlchemy Relationship Lazy Loading Strategies in Multi-Tenant FastAPI: N+1 Queries and the Cost of Joinedload

#sqlalchemy #fastapi #performance #postgres

SQLAlchemy Relationship Lazy Loading Strategies in Multi-Tenant FastAPI: N+1 Queries and the Cost of Joinedload

I burned a week debugging production performance issues at CitizenApp before I realized: the problem wasn't my query strategy—it was that I was using the same strategy everywhere. I had blindly adopted joinedload across all relationships, thinking "eager is always better than lazy." I was wrong.

Here's the uncomfortable truth: there is no universal SQLAlchemy loading strategy. The "best" approach depends on your concurrency model, tenant count, and whether you're serving a single user or 50 concurrent requests. Let me show you what I learned.

The N+1 Problem Everyone Knows (And Still Gets Wrong)

Let's say you have a CitizenApp-style multi-tenant structure:

from sqlalchemy import Column, Integer, String, ForeignKey, select
from sqlalchemy.orm import relationship, Session

class Tenant(Base):
    __tablename__ = "tenants"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Organization(Base):
    __tablename__ = "organizations"
    id = Column(Integer, primary_key=True)
    tenant_id = Column(Integer, ForeignKey("tenants.id"))
    name = Column(String)
    tenant = relationship("Tenant")

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    organization_id = Column(Integer, ForeignKey("organizations.id"))
    name = Column(String)
    organization = relationship("Organization")

This naive endpoint will trigger N+1 queries:

@app.get("/users")
async def list_users(session: Session):
    users = session.query(User).all()  # 1 query
    return [
        {
            "id": u.id,
            "name": u.name,
            "org_name": u.organization.name  # N queries (one per user)
        }
        for u in users
    ]

Everyone knows this. But here's what I missed: the "fix" can be worse than the disease.

The Joinedload Trap I Fell Into

My first instinct was to use joinedload:

from sqlalchemy.orm import joinedload

@app.get("/users")
async def list_users(session: Session):
    users = session.query(User).options(
        joinedload(User.organization)
    ).all()
    return [...]

This works perfectly for 5 users. For 50 concurrent requests with 100 users each?

The query becomes a 6-table LEFT JOIN with 50 concurrent threads all acquiring locks. Here's what that query looks like:

SELECT users.id, users.organization_id, organizations.id, organizations.name, organizations.tenant_id
FROM users
LEFT JOIN organizations ON users.organization_id = organizations.id
LEFT JOIN tenants ON organizations.tenant_id = tenants.id
-- ... repeated for each relationship

Under concurrent load, PostgreSQL's planner starts thrashing. Lock contention explodes. You get better "query counts" but worse wall-clock latency.

The Three Strategies and When They Actually Win

1. Selectinload: The Multi-Tenant Default

I prefer selectinload for most multi-tenant endpoints. Here's why:

from sqlalchemy.orm import selectinload

@app.get("/users/{tenant_id}")
async def list_users(tenant_id: int, session: Session):
    users = session.query(User).where(
        User.organization.has(Organization.tenant_id == tenant_id)
    ).options(
        selectinload(User.organization),
        selectinload(User.organization).selectinload(Organization.tenant)
    ).all()
    return [...]

What happens:

Query 1: SELECT * FROM users WHERE organization_id IN (...)
Query 2: SELECT * FROM organizations WHERE id IN (...)
Query 3: SELECT * FROM tenants WHERE id IN (...)

Why this wins for multi-tenant:

Each query is simple and fast
No lock contention from massive JOINs
Scales linearly with result set size, not complexity
Under concurrent load, each query hits separate index scans
PostgreSQL planner stays predictable

Benchmark on CitizenApp (1000 users, 50 concurrent requests):

Joinedload: 2.3s (lock wait time spike)
Selectinload: 1.1s
Lazy loading N+1: 8.4s

2. Joinedload: When It Actually Makes Sense

Joinedload wins in specific cases:

@app.get("/users/{user_id}")
async def get_user(user_id: int, session: Session):
    # Single record fetch - joinedload is superior
    user = session.query(User).where(User.id == user_id).options(
        joinedload(User.organization).joinedload(Organization.tenant)
    ).first()
    return user

Single-record scenarios where you need everything in one round-trip. You're not fighting lock contention, and the JOIN is targeted.

Real-world case: User profile page. You fetch 1 user and always need their org + tenant data. One efficient query beats two round-trips.

3. Lazy Loading + Explicit Filtering

Here's where I was completely wrong about "lazy is always bad":

@app.get("/users/{tenant_id}")
async def list_users(tenant_id: int, session: Session):
    # Don't eagerly load everything - let the app layer decide
    users = session.query(User).where(
        User.organization.has(Organization.tenant_id == tenant_id)
    ).all()

    # Only serialize what we actually use
    return [
        {
            "id": u.id,
            "name": u.name,
            # Skip org loading if response schema doesn't need it
        }
        for u in users
    ]

If your response schema doesn't include the relationship, loading it is waste. I've seen endpoints that eagerly load 5 relationships and return 2 in JSON. That's pure overhead.

The Gotcha: Async Sessions Break Everything

This burned me hard. With FastAPI async endpoints:

@app.get("/users")
async def list_users(session: AsyncSession):  # AsyncSession!
    users = await session.execute(
        select(User).options(selectinload(User.organization))
    )
    return users.scalars().all()  # This fails silently

Async SQLAlchemy has footguns:

Selectinload may not execute the second query if you don't access the relationship in the same session context
Joinedload works better with async because it's one query, but you lose the concurrency benefit
You must expunge objects or you'll get DetachedInstanceError

My solution at CitizenApp: I explicitly fetch related data in separate queries and assemble in Python:

@app.get("/users/{tenant_id}")
async def list_users(tenant_id: int, session: AsyncSession):
    # Query users
    users_result = await session.execute(
        select(User).where(
            User.organization.has(Organization.tenant_id == tenant_id)
        )
    )
    users = users_result.scalars().all()

    # Query organizations in bulk
    org_ids = [u.organization_id for u in users]
    orgs_result = await session.execute(
        select(Organization).where(Organization.id.in_(org_ids))
    )
    orgs_map = {o.id: o for o in orgs_result.scalars()}

    return [
        {
            "id": u.id,
            "organization": orgs_map[u.organization_id],
        }
        for u in users
    ]

This is verbose but predictable. No surprises under load.

What I Actually Do Now

For CitizenApp (9 concurrent AI features, multi-tenant):

List endpoints: selectinload for 1-2 levels, then lazy-load in serializer if needed
Detail endpoints: joinedload to single record
Reporting queries: Raw SQL with explicit SELECT columns (no ORM overhead)
Async endpoints: Explicit multi-query pattern, assemble in Python

The counterintuitive win: fewer total queries doesn't always mean faster endpoints. Lock contention, network round-trips, and serialization all matter.

Stop defaulting to joinedload everywhere. Measure. Profile. Use selectinload for multi-tenant list queries.

Top comments (1)

Mustafa ERBAY • Jun 15

This is a great example of why production experience matters.

I’ve seen teams celebrate reducing 100 queries to 1 query, only to discover they replaced a simple workload with a massive JOIN that became the new bottleneck under concurrency.

The most dangerous phrase in software architecture is probably:

“We found the best practice.”

Most performance problems start when a context-dependent optimization becomes a universal rule.

Thanks for sharing the benchmarks and trade-offs instead of another “always use X” article.