The Context Window: an LLM's Short-Term Memory, Explained

#ai #beginners #llm #machinelearning

A chatbot feels like it remembers you. It doesn't — it's stateless. Everything it "knows" is just text resent each call, up to a fixed limit: the context window. When the box fills, the oldest messages fall off the edge and are genuinely gone.

🪟 Watch tokens fall off: https://dev48v.infy.uk/ai/days/day8-context-window.html

The model is stateless

reply = model(allMessagesSoFar);  // the app resends the whole history every turn

"Memory" is just text you keep pasting back in.
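To make that concrete, here is a minimal sketch of a chat loop. `fakeModel` and `createChat` are made-up names for illustration, and `fakeModel` is a toy stand-in rather than a real LLM API: it can only answer from the messages it is handed.

```javascript
// Hypothetical stand-in for a real LLM call: it only sees what is passed in.
function fakeModel(messages) {
  const names = messages
    .filter((m) => m.role === "user")
    .map((m) => /my name is (\w+)/i.exec(m.content))
    .filter(Boolean)
    .map((match) => match[1]);
  return names.length > 0
    ? `Hello again, ${names[names.length - 1]}!`
    : "I don't know your name.";
}

// The app, not the model, owns the history and resends it every turn.
function createChat(model) {
  const history = [];
  return function send(userText) {
    history.push({ role: "user", content: userText });
    const reply = model(history); // the whole history goes in on every call
    history.push({ role: "assistant", content: reply });
    return reply;
  };
}

const send = createChat(fakeModel);
send("My name is Ada.");
const withHistory = send("What is my name?");

// Same question, but without the earlier turns pasted back in.
const withoutHistory = fakeModel([{ role: "user", content: "What is my name?" }]);

console.log(withHistory);    // Hello again, Ada!
console.log(withoutHistory); // I don't know your name.
```

The second call shows the "amnesia": nothing changed in the model, only the text it was given.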

The window is a hard token limit

Prompt + conversation + pasted docs + the reply must all fit inside a fixed number of tokens (chunks of text, roughly word fragments). When the chat grows past that limit, something has to go, and most chat apps drop the oldest messages first. In the demo, faded messages have scrolled OUT and the model literally can't see them. Ask about something dropped and it truly has no idea.
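A rough sketch of how an app might trim history to fit. Everything here is an assumption for illustration: `estimateTokens` uses a crude "about 4 characters per token" guess (real tokenizers differ), and `fitToWindow` is a hypothetical helper, not any library's API.

```javascript
// Crude assumption: ~4 characters per token. Real tokenizers give different counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the system prompt, reserve room for the reply,
// then add messages newest-first until the budget runs out.
function fitToWindow(systemPrompt, messages, maxTokens, reservedForReply) {
  let budget = maxTokens - reservedForReply - estimateTokens(systemPrompt);
  const kept = [];
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (cost > budget) break; // this message and everything older is dropped
    budget -= cost;
    kept.unshift(messages[i]); // keep chronological order
  }
  return kept;
}

const chat = [
  { role: "user", content: "My dog is called Biscuit." },
  { role: "assistant", content: "Nice to meet Biscuit!" },
  { role: "user", content: "What should I feed a puppy?" },
  { role: "assistant", content: "Puppy food, three meals a day." },
];

// A deliberately tiny 30-token window, with 8 tokens reserved for the reply.
const visible = fitToWindow("You are a helpful vet.", chat, 30, 8);
console.log(visible.map((m) => m.content));
```

With this tiny window only the last two messages survive, so the model would never see the dog's name.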

It's also the cost meter

You're billed per token in the window, every call. Pasting a whole book each turn is slow and expensive, so it's not just that you CAN'T fit unlimited text: you wouldn't WANT to.
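Some back-of-the-envelope arithmetic shows why resending history adds up. This sketch assumes each turn adds a fixed number of tokens and counts input tokens only; the price is a made-up placeholder, not any provider's real rate.

```javascript
// Sum the input tokens billed over a conversation when the history is resent each turn.
function totalInputTokens(tokensPerTurn, turns) {
  let total = 0;
  let windowSize = 0;
  for (let i = 0; i < turns; i++) {
    windowSize += tokensPerTurn; // the history grows by one turn
    total += windowSize;         // and the whole window is billed again
  }
  return total;
}

const PRICE_PER_MILLION_INPUT_TOKENS = 3.0; // hypothetical USD price, for illustration only

const billedTokens = totalInputTokens(500, 20);
const estimatedCost = (billedTokens / 1e6) * PRICE_PER_MILLION_INPUT_TOKENS;

console.log(billedTokens); // 105000
console.log(estimatedCost.toFixed(3));
```

Twenty turns of 500 tokens is only 10,000 tokens of actual conversation, but 105,000 tokens get billed because every call carries everything before it.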

"Lost in the middle"

Even within the limit, models tend to attend best to the START and END; facts buried in the middle of a huge context can be overlooked. Bigger isn't automatically better.

Managing it is the skill

Summarise old turns + keep recent ones verbatim + use RAG (retrieval-augmented generation) to fetch only the relevant chunks instead of pasting everything. Understanding the window explains chatbot "amnesia" and most prompt-engineering tactics.
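Here is one way those three tactics could fit together, as a toy sketch. All names are hypothetical: `summarise` just truncates each old message (a real app would ask the model to condense them), and `retrieve` scores chunks by shared words, where real RAG typically uses embeddings.

```javascript
// Hypothetical summariser: keeps the first few words of each old message.
function summarise(messages) {
  return "Summary of earlier chat: " +
    messages.map((m) => m.content.split(" ").slice(0, 4).join(" ")).join(" / ");
}

// Toy retrieval: rank chunks by how many words they share with the question.
function retrieve(question, chunks, topK) {
  const questionWords = new Set(question.toLowerCase().match(/\w+/g) || []);
  return chunks
    .map((chunk) => ({
      chunk,
      score: (chunk.toLowerCase().match(/\w+/g) || [])
        .filter((w) => questionWords.has(w)).length,
    }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.chunk);
}

// Summary of old turns + best-matching chunk + recent turns verbatim + the new question.
function buildContext(history, question, chunks, keepRecent) {
  const cut = Math.max(0, history.length - keepRecent);
  const context = [];
  if (cut > 0) {
    context.push({ role: "system", content: summarise(history.slice(0, cut)) });
  }
  for (const doc of retrieve(question, chunks, 1)) {
    context.push({ role: "system", content: "Reference: " + doc });
  }
  return [...context, ...history.slice(cut), { role: "user", content: question }];
}

const pastTurns = [
  { role: "user", content: "Hi, I ordered a blue kettle last week." },
  { role: "assistant", content: "Thanks! I can see that order." },
  { role: "user", content: "It arrived with a dent." },
  { role: "assistant", content: "Sorry about that. Want a refund?" },
];
const docs = [
  "Refund window: 30 days from delivery.",
  "Shipping takes 5 business days.",
  "Support is open on weekdays.",
];

const context = buildContext(pastTurns, "How long is the refund window?", docs, 2);
console.log(context.map((m) => m.content));
```

The model gets five short messages instead of the full chat plus every document, and the one chunk it needs sits right next to the question.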

Fill the box.