Your AI agent remembered the user's name.
Then it forgot what it was doing.
Here's the setup:
User asks the agent: book the cheapest flight to NYC, search hotels under $150/night, then compare total trip cost.
By step 3, the agent calls the LLM with 8,000 tokens of raw conversation history — and still answers as if it's turn 1.
You need a memory architecture before this ships. Which one do you pick?
A) In-context window only: the full conversation is resent in the prompt on every call. Simple. Breaks at ~15 turns or 8K tokens, whichever comes first.
B) Vector memory store: embed past turns, retrieve the top-k by semantic similarity at query time. Works great until "NYC flight" pulls a memory about a past NYC trip instead of the current task.
C) Episodic memory with summarization: compress old turns into structured event summaries, inject the relevant ones per request. More complex to build. Much harder to confuse.
D) Redis session state: a structured key-value store that the agent reads and writes explicitly. Deterministic. Requires the agent to know what to store and when.
One of these collapses past 15 turns. One retrieves the wrong context at exactly the wrong moment. One is the right answer for task-oriented agents.
Pick A, B, C, or D — and tell me where you've hit this in production. Full breakdown in the comments.
Top comments (5)
C — Episodic memory with summarization
Right answer for task-oriented agents. The pattern:
→ Keep the last N turns in context (short-term)
→ Compress older turns into structured event summaries (episodic)
→ Inject only relevant summaries per new request
It's the pattern many production agent stacks converge on, and frameworks such as LangChain ship summary-based conversation memory for it. More engineering upfront, but of the four options it's the one that degrades most gracefully as conversations grow.
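The three arrows above can be sketched in a few lines. This is a minimal illustration, not a production design: the `Episode` schema, the `task` tags, and the truncating `summarize` placeholder are all assumptions made for the example (a real agent would call an LLM to summarize and would need a smarter relevance check).

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Structured summary of an older turn (hypothetical schema)."""
    task: str      # e.g. "flight", "hotel"
    summary: str   # compressed description of what happened


def summarize(text: str, max_words: int = 8) -> str:
    """Placeholder summarizer: truncates. A real agent would call an LLM here."""
    words = text.split()
    return " ".join(words[:max_words]) + ("..." if len(words) > max_words else "")


@dataclass
class EpisodicMemory:
    max_recent: int = 4                           # N turns kept verbatim
    recent: list = field(default_factory=list)    # short-term buffer
    episodes: list = field(default_factory=list)  # compressed older turns

    def add_turn(self, task: str, text: str) -> None:
        self.recent.append((task, text))
        # Once the short-term buffer overflows, compress the oldest turn.
        while len(self.recent) > self.max_recent:
            old_task, old_text = self.recent.pop(0)
            self.episodes.append(Episode(old_task, summarize(old_text)))

    def build_context(self, relevant_tasks: set) -> list:
        # Inject only summaries tagged with a relevant task, then recent turns.
        lines = [f"[summary:{e.task}] {e.summary}"
                 for e in self.episodes if e.task in relevant_tasks]
        lines += [text for _, text in self.recent]
        return lines
```

With `max_recent=2`, adding a flight turn, two hotel turns, and a cost turn leaves the last two turns verbatim and two summaries behind them; `build_context({"flight"})` then injects only the flight summary ahead of the recent turns.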
A — In-context window only
Collapses fast. At ~15 turns or 8K tokens you're either truncating early context or hitting the model's limit. Works for demos. Fails in production agents with multi-step tasks. The "just increase context length" argument ignores cost and the lost-in-the-middle problem: LLMs tend to use information buried in the middle of a long prompt less reliably than information near the start or end.
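To see why truncation hurts, here is a rough sketch of the usual "keep the newest turns that fit" strategy. The word-count `count_tokens` is a crude stand-in for a real tokenizer, and the tiny budget is chosen only to make the effect visible.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_budget(turns: list, budget: int) -> list:
    """Keep the most recent turns that fit in `budget` tokens; drop older ones."""
    kept = []
    used = 0
    for turn in reversed(turns):          # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break                         # everything older is discarded
        kept.append(turn)
        used += cost
    return list(reversed(kept))           # restore chronological order
```

Note what gets dropped first: the oldest turn, which in a multi-step task is usually the original instruction.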
B — Vector memory store
The retrieval problem is nastier than it looks. Semantic similarity isn't the same as task relevance. In a multi-step booking task, embedding "find hotels under $150" might retrieve a memory from a different user session if the store isn't scoped per session, or a past conversation about a different city entirely. Great for knowledge retrieval. Unreliable for procedural task state.
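A toy reproduction of that failure, under loud assumptions: `embed` is a bag-of-words counter rather than a real embedding model, and the `task_id` metadata field is hypothetical. The point is only that the highest-similarity memory can belong to the wrong task, and that a metadata filter applied before ranking is one way to guard against it.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_match(query: str, memories: list, task_id=None) -> dict:
    """Return the most similar memory, optionally restricted to one task."""
    pool = [m for m in memories if task_id is None or m["task_id"] == task_id]
    q = embed(query)
    return max(pool, key=lambda m: cosine(q, embed(m["text"])))


memories = [
    {"task_id": "old-trip", "text": "find hotels under $150 in chicago"},
    {"task_id": "current", "text": "hotels in nyc near the airport under budget"},
]

unfiltered = top_match("find hotels under $150", memories)
scoped = top_match("find hotels under $150", memories, task_id="current")
```

Here `unfiltered` is the old Chicago memory, because it shares more words with the query, while `scoped` is the current NYC one.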
D — Redis session state
Solid and deterministic. The catch: the agent has to explicitly decide what to write to state and when to read it. That's a non-trivial design problem — you're building a working memory protocol on top of your tool calls. Most teams underestimate the schema design this requires.
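A minimal sketch of what that "working memory protocol" might look like. A plain dict stands in for Redis here so the example runs anywhere; the `SCHEMA` keys and the `SessionState` class are hypothetical, and mapping `write`/`read` onto a real Redis client is left as an assumption.

```python
import json


class SessionState:
    """Explicit key-value working memory for one session (dict-backed stand-in)."""

    # Hypothetical schema: the only keys the agent is allowed to write.
    SCHEMA = {"current_step", "flight", "hotel", "total_cost"}

    def __init__(self):
        self._store = {}

    def write(self, key: str, value) -> None:
        if key not in self.SCHEMA:
            raise KeyError(f"unknown state key: {key}")
        # Serialize to a string, as a string-valued KV store would require.
        self._store[key] = json.dumps(value)

    def read(self, key: str, default=None):
        raw = self._store.get(key)
        return default if raw is None else json.loads(raw)

    def snapshot(self) -> dict:
        """Everything known so far, ready to inject into the next prompt."""
        return {k: json.loads(v) for k, v in self._store.items()}


state = SessionState()
state.write("flight", {"to": "NYC", "price": 212})
state.write("hotel", {"nightly": 139, "nights": 3})
flight, hotel = state.read("flight"), state.read("hotel")
state.write("total_cost", flight["price"] + hotel["nightly"] * hotel["nights"])
```

Rejecting unknown keys is the design choice that matters: it forces the schema question (what does step 3 need from steps 1 and 2?) to be answered up front rather than discovered in production.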