In third grade I had to write a how-to for making a peanut butter and jelly sandwich. I thought I'd nailed it. Four steps:
- Get the bread.
- Get the peanut butter and the jelly.
- Spread the peanut butter on one slice, the jelly on the other.
- Put them together.
My teacher read it, looked up, and asked one question:
Where did the knife come from?
I had the vision. I'd skipped the environment. The knife was real in my head (I'd seen it in the drawer that morning), so I assumed it was real on the page. It wasn't.
three phases of getting this wrong with LLMs
When I started taking prompt engineering seriously last year, I realized I was repeating the third-grade mistake. The arc was familiar enough that I think most people walk it.
Phase one: the one-liner. Single sentence. Expect the model to read your mind. When it fails, fight with it in the next turn instead of fixing the prompt.
Phase two: the notebook. Start saving prompts that worked. Notice that consistency matters. Notice that some prompts are asking for work the model can't actually do without more setup around them.
Phase three: the environment. Realize the prompt isn't an instruction. It's a room. The model can only use what's in the room. If the knife isn't in the room, the sandwich doesn't get made, no matter how clearly you described the spreading motion.
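One way to make the "room" idea concrete is to list what each step needs and compare that against what the prompt actually supplies. This is only a toy sketch in Python: the `missing_from_room` helper, the `needs` field, and the sandwich data are all made up for illustration and don't come from any real library.

```python
def missing_from_room(steps, room):
    """Return the items the steps rely on that the room does not contain."""
    needed = set()
    for step in steps:
        needed.update(step["needs"])
    return sorted(needed - set(room))


# Hypothetical data: my third-grade how-to, with each step's real dependencies.
steps = [
    {"do": "get the bread", "needs": ["bread"]},
    {"do": "get the peanut butter and the jelly", "needs": ["peanut butter", "jelly"]},
    {"do": "spread one on each slice", "needs": ["bread", "peanut butter", "jelly", "knife"]},
    {"do": "put them together", "needs": ["bread"]},
]

# What the page (the prompt) actually put in the room.
room = ["bread", "peanut butter", "jelly"]

missing = missing_from_room(steps, room)
print(missing)  # ['knife']
```

The check is trivial on purpose. The hard part is writing down the `needs` honestly, which is exactly the step I skipped in third grade.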
what I write now
Three things stacked on top of each other:
- Context: where the ingredients live. What the model has access to.
- Constraints: how to use the tools, and how not to.
- Acceptance criteria: what "finished sandwich" actually looks like, in enough detail that the model can self-check.
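Here's roughly how that stack could look as a small template, sketched in Python. Treat it as one possible layout under my own assumptions, not a standard: the `PromptSpec` class, its field names, and the section labels are hypothetical, and the rendered string would still need to be passed to whatever model client you use.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the three layers plus the task itself."""
    task: str
    context: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)

    def render(self) -> str:
        sections = [
            ("Task", [self.task] if self.task else []),
            ("Context", self.context),
            ("Constraints", self.constraints),
            ("Acceptance criteria", self.acceptance_criteria),
        ]
        # Refuse to render a prompt with an empty layer: that's the missing knife.
        empty = [name for name, items in sections if not items]
        if empty:
            raise ValueError(f"missing sections: {', '.join(empty)}")
        blocks = []
        for name, items in sections:
            lines = "\n".join(f"- {item}" for item in items)
            blocks.append(f"{name}:\n{lines}")
        return "\n\n".join(blocks)


spec = PromptSpec(
    task="Make a peanut butter and jelly sandwich.",
    context=["Bread is on the counter.", "A butter knife is in the top drawer."],
    constraints=["Use the butter knife for spreading, not a spoon."],
    acceptance_criteria=["Two slices, peanut butter and jelly facing inward."],
)
prompt = spec.render()
print(prompt)
```

Raising on an empty section is a deliberate choice in this sketch: it forces the "where did the knife come from?" question before the model ever sees the prompt.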
There's no magic word. There's no clever phrasing trick. Prompt engineering is the same skill as writing a good bug report or a clear design doc: assume the reader doesn't have your context, then put the context in the document.
Top comments (1)
Strong framing on the “prompt as environment” model. The knife analogy maps well to real agent failures in production systems—missing context is usually the root cause, not model capability.
I’ve been working on agentic workflows where this exact problem shows up in orchestration layers: tools, memory, and constraints often aren’t explicitly surfaced to the model, so execution degrades in predictable ways.
There’s a meaningful overlap here in approach. Your breakdown of context / constraints / acceptance criteria aligns closely with how we structure agent runtime environments for multi-step task execution.
Would be interesting to collaborate on this direction—specifically around:
- designing a shared “agent environment spec” so prompts are no longer isolated instructions
- building a reusable structure for tool + memory + constraint injection
- stress-testing these phases against real agent workflows (not just prompt experiments)
If you’re open to it, I’d be interested in exploring a joint prototype around structured agent environments and failure-mode reduction in production LLM systems.