DEV Community

Ugur Aslim
Ugur Aslim

Posted on • Originally published at uguraslim.com

SQLAlchemy Relationship Lazy Loading Strategies in Multi-Tenant FastAPI: N+1 Queries and the Cost of Joinedload

SQLAlchemy Relationship Lazy Loading Strategies in Multi-Tenant FastAPI: N+1 Queries and the Cost of Joinedload

I burned a week debugging production performance issues at CitizenApp before I realized: the problem wasn't my query strategy—it was that I was using the same strategy everywhere. I had blindly adopted joinedload across all relationships, thinking "eager is always better than lazy." I was wrong.

Here's the uncomfortable truth: there is no universal SQLAlchemy loading strategy. The "best" approach depends on your concurrency model, tenant count, and whether you're serving a single user or 50 concurrent requests. Let me show you what I learned.

The N+1 Problem Everyone Knows (And Still Gets Wrong)

Let's say you have a CitizenApp-style multi-tenant structure:

from sqlalchemy import Column, Integer, String, ForeignKey, select
from sqlalchemy.orm import relationship, Session

class Tenant(Base):
    __tablename__ = "tenants"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Organization(Base):
    __tablename__ = "organizations"
    id = Column(Integer, primary_key=True)
    tenant_id = Column(Integer, ForeignKey("tenants.id"))
    name = Column(String)
    tenant = relationship("Tenant")

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    organization_id = Column(Integer, ForeignKey("organizations.id"))
    name = Column(String)
    organization = relationship("Organization")
Enter fullscreen mode Exit fullscreen mode

This naive endpoint will trigger N+1 queries:

@app.get("/users")
async def list_users(session: Session):
    users = session.query(User).all()  # 1 query
    return [
        {
            "id": u.id,
            "name": u.name,
            "org_name": u.organization.name  # N queries (one per user)
        }
        for u in users
    ]
Enter fullscreen mode Exit fullscreen mode

Everyone knows this. But here's what I missed: the "fix" can be worse than the disease.

The Joinedload Trap I Fell Into

My first instinct was to use joinedload:

from sqlalchemy.orm import joinedload

@app.get("/users")
async def list_users(session: Session):
    users = session.query(User).options(
        joinedload(User.organization)
    ).all()
    return [...]
Enter fullscreen mode Exit fullscreen mode

This works perfectly for 5 users. For 50 concurrent requests with 100 users each?

The query becomes a 6-table LEFT JOIN with 50 concurrent threads all acquiring locks. Here's what that query looks like:

SELECT users.id, users.organization_id, organizations.id, organizations.name, organizations.tenant_id
FROM users
LEFT JOIN organizations ON users.organization_id = organizations.id
LEFT JOIN tenants ON organizations.tenant_id = tenants.id
-- ... repeated for each relationship
Enter fullscreen mode Exit fullscreen mode

Under concurrent load, PostgreSQL's planner starts thrashing. Lock contention explodes. You get better "query counts" but worse wall-clock latency.

The Three Strategies and When They Actually Win

1. Selectinload: The Multi-Tenant Default

I prefer selectinload for most multi-tenant endpoints. Here's why:

from sqlalchemy.orm import selectinload

@app.get("/users/{tenant_id}")
async def list_users(tenant_id: int, session: Session):
    users = session.query(User).where(
        User.organization.has(Organization.tenant_id == tenant_id)
    ).options(
        selectinload(User.organization),
        selectinload(User.organization).selectinload(Organization.tenant)
    ).all()
    return [...]
Enter fullscreen mode Exit fullscreen mode

What happens:

  1. Query 1: SELECT * FROM users WHERE organization_id IN (...)
  2. Query 2: SELECT * FROM organizations WHERE id IN (...)
  3. Query 3: SELECT * FROM tenants WHERE id IN (...)

Why this wins for multi-tenant:

  • Each query is simple and fast
  • No lock contention from massive JOINs
  • Scales linearly with result set size, not complexity
  • Under concurrent load, each query hits separate index scans
  • PostgreSQL planner stays predictable

Benchmark on CitizenApp (1000 users, 50 concurrent requests):

  • Joinedload: 2.3s (lock wait time spike)
  • Selectinload: 1.1s
  • Lazy loading N+1: 8.4s

2. Joinedload: When It Actually Makes Sense

Joinedload wins in specific cases:

@app.get("/users/{user_id}")
async def get_user(user_id: int, session: Session):
    # Single record fetch - joinedload is superior
    user = session.query(User).where(User.id == user_id).options(
        joinedload(User.organization).joinedload(Organization.tenant)
    ).first()
    return user
Enter fullscreen mode Exit fullscreen mode

Single-record scenarios where you need everything in one round-trip. You're not fighting lock contention, and the JOIN is targeted.

Real-world case: User profile page. You fetch 1 user and always need their org + tenant data. One efficient query beats two round-trips.

3. Lazy Loading + Explicit Filtering

Here's where I was completely wrong about "lazy is always bad":

@app.get("/users/{tenant_id}")
async def list_users(tenant_id: int, session: Session):
    # Don't eagerly load everything - let the app layer decide
    users = session.query(User).where(
        User.organization.has(Organization.tenant_id == tenant_id)
    ).all()

    # Only serialize what we actually use
    return [
        {
            "id": u.id,
            "name": u.name,
            # Skip org loading if response schema doesn't need it
        }
        for u in users
    ]
Enter fullscreen mode Exit fullscreen mode

If your response schema doesn't include the relationship, loading it is waste. I've seen endpoints that eagerly load 5 relationships and return 2 in JSON. That's pure overhead.

The Gotcha: Async Sessions Break Everything

This burned me hard. With FastAPI async endpoints:

@app.get("/users")
async def list_users(session: AsyncSession):  # AsyncSession!
    users = await session.execute(
        select(User).options(selectinload(User.organization))
    )
    return users.scalars().all()  # This fails silently
Enter fullscreen mode Exit fullscreen mode

Async SQLAlchemy has footguns:

  1. Selectinload may not execute the second query if you don't access the relationship in the same session context
  2. Joinedload works better with async because it's one query, but you lose the concurrency benefit
  3. You must expunge objects or you'll get DetachedInstanceError

My solution at CitizenApp: I explicitly fetch related data in separate queries and assemble in Python:

@app.get("/users/{tenant_id}")
async def list_users(tenant_id: int, session: AsyncSession):
    # Query users
    users_result = await session.execute(
        select(User).where(
            User.organization.has(Organization.tenant_id == tenant_id)
        )
    )
    users = users_result.scalars().all()

    # Query organizations in bulk
    org_ids = [u.organization_id for u in users]
    orgs_result = await session.execute(
        select(Organization).where(Organization.id.in_(org_ids))
    )
    orgs_map = {o.id: o for o in orgs_result.scalars()}

    return [
        {
            "id": u.id,
            "organization": orgs_map[u.organization_id],
        }
        for u in users
    ]
Enter fullscreen mode Exit fullscreen mode

This is verbose but predictable. No surprises under load.

What I Actually Do Now

For CitizenApp (9 concurrent AI features, multi-tenant):

  • List endpoints: selectinload for 1-2 levels, then lazy-load in serializer if needed
  • Detail endpoints: joinedload to single record
  • Reporting queries: Raw SQL with explicit SELECT columns (no ORM overhead)
  • Async endpoints: Explicit multi-query pattern, assemble in Python

The counterintuitive win: fewer total queries doesn't always mean faster endpoints. Lock contention, network round-trips, and serialization all matter.

Stop defaulting to joinedload everywhere. Measure. Profile. Use selectinload for multi-tenant list queries.

Top comments (1)

Collapse
 
merbayerp profile image
Mustafa ERBAY

This is a great example of why production experience matters.

I’ve seen teams celebrate reducing 100 queries to 1 query, only to discover they replaced a simple workload with a massive JOIN that became the new bottleneck under concurrency.

The most dangerous phrase in software architecture is probably:

“We found the best practice.”

Most performance problems start when a context-dependent optimization becomes a universal rule.

Thanks for sharing the benchmarks and trade-offs instead of another “always use X” article.