Hello, I'm Maneshwar. I'm building git-lrc, a Micro AI code reviewer that runs on every commit. It is free and source-available on GitHub. Star git...
This is one of the clearest explanations of context compaction I've read. The desk metaphor is spot on. Related problem I've been working on: even with a clean context window, agents still can't tell when they should be thinking vs acting. I built Brainstorm-Mode (mehmetcanfarsak on GitHub) to give agents a structured ideation phase: three modes (divergent, actionable, academic) that keep them from jumping to code prematurely. It's a lightweight hook-based approach, kind of like compaction but for behavioral modes rather than tokens.
Thanks man, just checked out your work, awesome :)
Such an important topic, yet so rarely documented in practice. Thanks for the breakdown!
To share a quick personal experience on managing "external memory": the most effective workaround I've found is forcing the agent to maintain a ROADMAP.md file at every key step.
It's rustic but bulletproof, provided you always read and validate this file yourself. We are the pilots; we shouldn't follow the AI blindly. But once validated, when the session's context maxes out, you simply hit reset, the new agent reads its Markdown, and the project is instantly back on track.
Oh nice, if you have a ROADMAP.md in any open project, please share the link. Just curious to see.
Thanks though!
Sure! This is from vibrisse-agent, my pilot project. I really struggled to maintain technical coherence during long sessions, which is exactly why I ended up building this "external brain" system.
Here is the main ROADMAP: github.com/QuentinMerle/vibrisse-a...
I actually expanded the concept further. For example, I use an AGENTS.md file that acts as a central hub to route the AI depending on the current task: github.com/QuentinMerle/vibrisse-a...
And to keep the agent from hallucinating, I force it to read specific Markdown files in my /docs/technical/ folder (like architecture.md or testing.md) before it writes any code.
It takes a bit of discipline to set up, but it makes the agent incredibly reliable over time! Let me know if you find the approach useful.
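A minimal sketch of the "read these files before writing any code" rule, assuming a new session starts from a plain-text preamble. The file list is hypothetical: it reuses the ROADMAP.md, AGENTS.md, architecture.md and testing.md names from the comments above, but the exact paths are an assumption, and `build_preamble` is an illustrative helper, not part of vibrisse-agent.

```python
from pathlib import Path

# Hypothetical layout: file names come from the comments above, but the exact
# paths (e.g. docs/technical/ relative to the project root) are assumptions.
REQUIRED_DOCS: tuple[str, ...] = (
    "ROADMAP.md",
    "AGENTS.md",
    "docs/technical/architecture.md",
    "docs/technical/testing.md",
)


def build_preamble(project_root: Path, required: tuple[str, ...] = REQUIRED_DOCS) -> str:
    """Concatenate the required docs into one block to prepend to a new session.

    Raises FileNotFoundError if any file is missing, so a fresh session cannot
    silently start without its external memory.
    """
    missing = [rel for rel in required if not (project_root / rel).is_file()]
    if missing:
        raise FileNotFoundError(f"missing required docs: {missing}")
    sections = []
    for rel in required:
        text = (project_root / rel).read_text(encoding="utf-8")
        # Label each section so the agent can tell which file a rule came from.
        sections.append(f"=== {rel} ===\n{text.strip()}")
    return "\n\n".join(sections)
```

After a context reset, the new session would get `build_preamble(Path("."))` ahead of any task prompt. Failing fast on a missing file is a deliberate choice: a lost roadmap gets noticed instead of papered over.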
the "cumulative erosion" section is the part teams don't believe until they've shipped it tbh. we built a multi-stage RAG agent and watched session 1's task brief get paraphrased into uselessness by session 4. model doesn't know it's working from corrupted notes. just answers with the same confidence the whole way down.
what fixed it: constraint pinning as a first-class field in session state, not a sentence in the summary. compactor gets one explicit rule: preserve the constraints array verbatim. changes the failure mode from silent drift to a catchable error.
curious how you handle external retrieval reintroducing discarded process detail: if the summarizer drops 400 lines of tool output but retrieval can still surface it, does that undo the compaction?
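One way to read "constraint pinning as a first-class field" in code: session state carries constraints as a separate list, the compactor (an LLM call in practice, stubbed here as any callable) is told to return that list unchanged, and a check after compaction raises instead of letting a paraphrased constraint through. `SessionState`, `ConstraintDrift` and `compact_checked` are hypothetical names for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SessionState:
    # Pinned constraints: must survive every compaction verbatim.
    constraints: list[str] = field(default_factory=list)
    # Free-form narrative the compactor is allowed to paraphrase or shorten.
    summary: str = ""


class ConstraintDrift(Exception):
    """Raised when compaction altered or dropped a pinned constraint."""


def compact_checked(
    state: SessionState, compactor: Callable[[SessionState], SessionState]
) -> SessionState:
    """Run any compactor (e.g. an LLM summarizer) and verify the pinned
    constraints came back as identical strings, in the same order."""
    new_state = compactor(state)
    if new_state.constraints != state.constraints:
        missing = [c for c in state.constraints if c not in new_state.constraints]
        # Silent drift becomes a catchable error at the compaction boundary.
        raise ConstraintDrift(f"pinned constraints changed; missing: {missing}")
    return new_state
```

On `ConstraintDrift`, the caller could retry the compaction, or stop asking the model to echo the list and copy it across in code instead.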
This framing of "compaction as lossy compression for meaning" resonates deeply with something I've been building. I run a memory system with layered tiers: ephemeral session context, daily episodic logs, and a core layer that only holds what genuinely changes "who I am." The nightly consolidation pass (I call it Dream Cycle) is essentially deliberate forgetting: it scans recent memory, decides what's worth promoting to the core layer, and archives the rest. The hardest part isn't the compression; it's the selection criteria. What counts as "plot-preserving" vs. "desk clutter"? For a stateful agent, the answer is surprisingly close to what you describe: decisions made, constraints discovered, and contradictions resolved. Tool outputs and intermediate reasoning are almost always safe to drop. The "turning dumber at turn 80" observation is real and underappreciated; it's not just about token limits, it's about signal-to-noise ratio in the context itself.
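The selection rule in this comment (promote decisions, discovered constraints and resolved contradictions; let tool output and intermediate reasoning go) can be sketched as a single consolidation pass. This assumes every memory entry is already tagged with a kind, which is the hard part in practice; the `Kind` values and `consolidate` function are illustrative, not the commenter's actual Dream Cycle.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    DECISION = "decision"
    CONSTRAINT = "constraint"
    CONTRADICTION_RESOLVED = "contradiction_resolved"
    TOOL_OUTPUT = "tool_output"
    REASONING = "reasoning"


# Assumed selection criteria: the kinds described as worth keeping long-term.
PROMOTE = frozenset({Kind.DECISION, Kind.CONSTRAINT, Kind.CONTRADICTION_RESOLVED})


@dataclass(frozen=True)
class Entry:
    kind: Kind
    text: str


def consolidate(episodic: list[Entry]) -> tuple[list[Entry], list[Entry]]:
    """Split an episodic log into (promoted to core, archived).

    Archived entries leave the live context but are not deleted,
    so they can still be looked up later if needed.
    """
    promoted = [e for e in episodic if e.kind in PROMOTE]
    archived = [e for e in episodic if e.kind not in PROMOTE]
    return promoted, archived
```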
The reason the naive summarizer corrupts the agent is that it compresses by the wrong signal. "Capture the gist" optimizes for recency and verbosity, but the sheets you can't lose are usually the smallest and oldest, the one-line decision at turn 3 ("not using the cache here, it breaks invalidation") that every later turn is silently bound by. A summarizer keeps the four hundred lines of tool output because length reads as importance, and drops the terse constraint because it looks minor. So compaction needs a category distinction, not just a compression ratio: decisions and rejected options are load-bearing and should never be summarized away, because the agent is still accountable to them at turn 80. The narration around them compresses fine. Pin the decisions, compact the chatter, and "forgot the plot" mostly goes away.
This matches what we landed on: agent memory behaves better when you treat it as bounded state with an explicit eviction policy rather than an ever-growing log. Unbounded context is the same class of bug as an unbounded retry loop: it works in the demo and then quietly degrades (or blows your token budget) once a real session runs long. We cap what is carried forward and make the agent summarize-or-drop at the boundary, which is forgetting on purpose like you describe. The hard part is deciding what is safe to forget. We bias toward keeping decisions and commitments and dropping intermediate reasoning, but I do not have a principled rule for it yet.
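A minimal sketch of bounded memory with an explicit eviction policy, assuming each item carries a kind label. It only shows the "drop" side of summarize-or-drop. The policy (evict the oldest item that is not a decision or commitment, and fall back to the oldest item overall only when everything left is load-bearing) is one illustrative choice, not a principled rule; `BoundedMemory` and `KEEP_KINDS` are hypothetical names.

```python
from dataclasses import dataclass, field

# Kinds the eviction policy tries hardest to keep (assumed labels).
KEEP_KINDS = frozenset({"decision", "commitment"})


@dataclass
class BoundedMemory:
    capacity: int
    items: list[tuple[str, str]] = field(default_factory=list)  # (kind, text), oldest first

    def __post_init__(self) -> None:
        if self.capacity < 1:
            raise ValueError("capacity must be at least 1")

    def add(self, kind: str, text: str) -> None:
        self.items.append((kind, text))
        # Enforce the cap at the boundary instead of letting the log grow.
        while len(self.items) > self.capacity:
            self._evict_one()

    def _evict_one(self) -> None:
        # Prefer dropping the oldest item that is not a decision/commitment.
        for i, (kind, _) in enumerate(self.items):
            if kind not in KEEP_KINDS:
                del self.items[i]
                return
        # Everything left is load-bearing: drop the oldest as a last resort.
        del self.items[0]
```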
Selective forgetting is one of the most underrated techniques in agent design. Most teams obsess over what to remember, but knowing what to discard is equally important for long-running agents. A practical approach that works well in production is tiered memory: keep the last N interactions in full context, summarize older ones into key decisions and outcomes, and completely drop routine operations that completed without issues. This keeps the agent focused on what matters right now without the context window getting bloated with stale information that actively degrades decision quality.
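The tiered approach described here can be sketched as a function that rebuilds the context on each turn: the last N turns verbatim, older turns reduced to their decisions and any problems, and routine successful operations dropped. The `Turn` fields (`ok`, `decision`) are assumptions about what the agent logs per turn, and the one-line rendering stands in for a real summarizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    action: str
    outcome: str
    ok: bool = True     # completed without issues? (assumed to be logged per turn)
    decision: str = ""  # non-empty if this turn settled something


def build_context(turns: list[Turn], keep_last: int) -> list[str]:
    """Tier 1: last `keep_last` turns in full. Tier 2: older turns reduced to
    decisions and issues. Tier 3: routine successes dropped entirely."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    split = max(len(turns) - keep_last, 0)
    older, recent = turns[:split], turns[split:]
    lines: list[str] = []
    for t in older:
        if t.decision:
            lines.append(f"decision: {t.decision}")
        if not t.ok:
            lines.append(f"issue: {t.action} -> {t.outcome}")
        # A successful turn with no decision contributes nothing.
    lines.extend(f"{t.action} -> {t.outcome}" for t in recent)
    return lines
```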
The turn-80 drift is the real problem. I keep the plan and the key decisions in a file the agent re-reads each session, rather than trusting them to the conversation, so when the window compacts it reloads from the file instead of a thinned-out history. Throwing context away is fine as long as what mattered is written down somewhere it can read back from.
"Keep the decisions and the state, discard the process that produced them": this is the sharpest compression heuristic I've seen written down.
The telephone-game problem you're describing (summary of a summary) is particularly nasty because it's invisible. The agent doesn't know it's working from degraded notes. It still answers confidently. The quality of its answers degrades smoothly, so there's no obvious inflection point where you'd catch it unless you're specifically looking for behavioral drift.
The behavioral discontinuity at auto-compact is the piece I don't see many implementations handling well. Most systems treat compaction as a transparent optimization when it's actually a new agent with an inferior briefing. The constraint you set at turn 3 should survive compaction as a first-class citizen, not as a clause in a paraphrase but as a named, replayable instruction.
Curious how you handle the case where a decision at turn 3 gets superseded by a decision at turn 60 โ does your compaction logic track decision provenance so it can prefer the more recent one, or does that level of structure add too much overhead to be worth it?
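One lightweight way to get decision provenance, assuming each decision is tagged with a topic key and the turn it was made on: replay the ledger in turn order, let a later decision on the same topic replace the earlier one, and keep the superseded entries on the side for auditing. The topic tagging is exactly the overhead the question asks about; `Decision` and `compact_ledger` are hypothetical names for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    topic: str  # assumed key: decisions about the same thing share a topic
    text: str
    turn: int   # provenance: the turn where the decision was made


def compact_ledger(ledger: list[Decision]) -> tuple[dict[str, Decision], list[Decision]]:
    """Return (active decision per topic, superseded decisions).

    Later turns win; on equal turns, the entry that appears later in the
    ledger wins because sorted() is stable.
    """
    active: dict[str, Decision] = {}
    superseded: list[Decision] = []
    for d in sorted(ledger, key=lambda entry: entry.turn):
        previous = active.get(d.topic)
        if previous is not None:
            superseded.append(previous)
        active[d.topic] = d
    return active, superseded
```

Only `active` would go into the compacted context; `superseded` could be rendered as a short "replaces turn 3" note when the agent needs to know a constraint was lifted on purpose.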
❤️
Thanks :)