Context Engineering for Enterprise AI, Part 2: The Memory Layer That Makes Agents Useful

#ai #llm #dotnet #python

Published on PrepStack.

Your AI agent forgets everything the moment a request ends. That's not a model limitation — it's a missing memory layer. This is Part 2 of my Context Engineering series.

The reframe

The model is a stateless function. Memory is an enterprise database with embeddings bolted on — owned, scoped, audited, and forgettable. Not a chat transcript you keep pasting back into the prompt.
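To make "owned, scoped, audited, and forgettable" concrete, here is a minimal in-memory sketch of what a memory record and its store might look like. The names (`MemoryRecord`, `MemoryStore`, `tenant_id`, `expires_at`, and so on) are illustrative assumptions, not the schema from this series, and a real system would sit on a database rather than a Python dict.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    """One remembered fact. All field names are hypothetical."""
    record_id: str
    tenant_id: str                         # owned: belongs to exactly one tenant
    user_id: str                           # scoped: and to one user in that tenant
    text: str
    created_at: datetime
    expires_at: Optional[datetime] = None  # forgettable: optional expiry


@dataclass
class MemoryStore:
    """Toy stand-in for a database-backed memory layer."""
    records: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def _audit(self, action, tenant_id, user_id, detail):
        # audited: every operation leaves a timestamped trail
        stamp = datetime.now(timezone.utc)
        self.audit_log.append((stamp, action, tenant_id, user_id, detail))

    def write(self, record):
        self.records[record.record_id] = record
        self._audit("write", record.tenant_id, record.user_id, record.record_id)

    def read(self, tenant_id, user_id, now=None):
        now = now or datetime.now(timezone.utc)
        hits = [
            r for r in self.records.values()
            if r.tenant_id == tenant_id
            and r.user_id == user_id
            and (r.expires_at is None or r.expires_at > now)
        ]
        self._audit("read", tenant_id, user_id, len(hits))
        return hits

    def forget_user(self, tenant_id, user_id):
        # Delete everything held for this user and record that it happened.
        doomed = [
            rid for rid, r in self.records.items()
            if r.tenant_id == tenant_id and r.user_id == user_id
        ]
        for rid in doomed:
            del self.records[rid]
        self._audit("forget", tenant_id, user_id, len(doomed))
        return len(doomed)
```

The point of the sketch is that scope and expiry live on the record itself, so reads and deletes are ordinary filtered queries rather than something the prompt has to ask for.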

The architecture

The memory layer is built across ASP.NET Core (system-of-record + governance on Azure SQL) and a Python FastAPI service (embeddings + semantic recall on Azure AI Search):

Tiered memory — working → short-term (Redis + SQL) → long-term episodic + semantic (vector index)
A salience-gated write policy: store what matters, not every turn (long-term writes cut by ~85%)
Retrieval blends similarity + recency, packed into the Part 1 token budget
Tenant + user as hard query filters — never prompt instructions
Right-to-be-forgotten that fans out to SQL + vectors + cache in under 5 minutes (GDPR Art. 17)
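The write gate, the hard tenant/user filter, and the similarity + recency blend from the list above could be sketched roughly as follows. This is a plain-Python illustration, not the Azure AI Search implementation: the `Memory` class, the 0.6 salience threshold, the 72-hour half-life, the 0.3 recency weight, and the greedy packing are all assumptions chosen for the example, and the salience score itself is assumed to come from some upstream classifier or heuristic that is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    tenant_id: str
    user_id: str
    text: str
    embedding: list    # vector from an embedding model (not shown)
    age_hours: float   # time since the memory was written
    tokens: int        # precomputed token count of `text`


def should_store(salience: float, threshold: float = 0.6) -> bool:
    """Write gate: persist a turn only if its salience clears the threshold."""
    return salience >= threshold


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(query_vec, m, half_life_hours=72.0, recency_weight=0.3):
    """Blend similarity with an exponentially decaying recency term."""
    recency = 0.5 ** (m.age_hours / half_life_hours)
    similarity = cosine(query_vec, m.embedding)
    return (1 - recency_weight) * similarity + recency_weight * recency


def retrieve(memories, query_vec, tenant_id, user_id, token_budget):
    # Hard filter first: rows from other tenants or users never reach ranking.
    scoped = [
        m for m in memories
        if m.tenant_id == tenant_id and m.user_id == user_id
    ]
    ranked = sorted(scoped, key=lambda m: score(query_vec, m), reverse=True)

    # Greedy packing: take the best-scoring memories that still fit the budget.
    packed, used = [], 0
    for m in ranked:
        if used + m.tokens <= token_budget:
            packed.append(m)
            used += m.tokens
    return packed
```

The ordering matters more than the exact weights: scoping happens as a filter before any scoring, so a highly similar memory from another tenant is never a candidate in the first place.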

The results

Metric	Outcome
Context tokens at turn 30	~3,500 (vs ~14,000 history-stuffing)
Cost per query	$0.021 → $0.008
Cross-tenant leaks (red-team)	0
Memory retrieval p95	74 ms
Cross-session continuity	resumes the prior thread, no re-explaining
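
For a sense of scale, the token and cost rows can be turned into relative reductions. The inputs are the approximate figures from the table, so the percentages are rough; the `reduction` helper is just a name picked for this example.

```python
def reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return 1 - after / before


token_cut = reduction(14_000, 3_500)   # context tokens at turn 30
cost_cut = reduction(0.021, 0.008)     # cost per query, in dollars

print(f"context tokens: {token_cut:.0%} fewer")   # context tokens: 75% fewer
print(f"cost per query: {cost_cut:.0%} lower")    # cost per query: 62% lower
```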