published on PrepStack.*
Your AI agent forgets everything the moment a request ends. That's not a model limitation — it's a missing memory layer. This is Part 2 of my Context Engineering series.
The reframe
The model is a stateless function. Memory is an enterprise database with embeddings bolted on — owned, scoped, audited, and forgettable. Not a chat transcript you keep pasting back into the prompt.
The architecture
Built across ASP.NET Core (system-of-record + governance on Azure SQL) and a Python FastAPI service (embeddings + semantic recall on Azure AI Search):
- Tiered memory — working → short-term (Redis + SQL) → long-term episodic + semantic (vector index)
- A salience-gated write policy — store what matters, not every turn (long-term writes cut ~85%)
- Retrieval blends similarity + recency, packed into the Part 1 token budget
- Tenant + user as hard query filters — never prompt instructions
- Right-to-be-forgotten that fans out to SQL + vectors + cache in under 5 minutes (GDPR Art. 17)
The results
| Metric | Outcome |
|---|---|
| Context tokens at turn 30 | ~3,500 (vs ~14,000 history-stuffing) |
| Cost per query | $0.021 → $0.008 |
| Cross-tenant leaks (red-team) | 0 |
| Memory retrieval p95 | 74 ms |
| Cross-session continuity | resumes the prior thread, no re-explaining |
The model is stateless. Memory is infrastructure you own — scoped, audited, and forgettable.
Read the full breakdown — with all the C# and Python code — on PrepStack:
https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-2-memory-layer
Top comments (0)