If you’ve used AI for anything more than quick lookups, you’ve noticed it’s inconsistent. Sometimes sharp, sometimes generic, sometimes wrong in ways that make you wonder if it understood a single thing you said.
Most people blame the prompt. Or themselves. Or the tool. I did too, for months.
The actual problem is more interesting than any of those – and once you see it, you can’t unsee it.
Your words aren’t the only thing in the room
When you type a message to Claude or ChatGPT, you’re looking at a text box. That’s what you see. What the AI sees is completely different.
Every AI conversation runs inside something called a context window – a fixed block of working memory. Everything goes in there. Your messages, the AI’s responses, any documents you’ve shared, hidden system instructions, tool definitions, memory retrieved from past conversations. All of it, stacked on top of each other, competing for the same limited space.
You’re writing in what feels like an empty room. The AI is reading in a crowded one.
That distinction matters more than almost anything else about how AI works. Because you’re not just competing with your own earlier messages for the AI’s attention. You’re competing with everything else that got loaded in before you even started typing.
The attention problem
Here’s the part most people don’t know.
The context window has a hard limit – somewhere between 100,000 and a million tokens depending on the model. A token is roughly three-quarters of a word, so even the low end works out to around 75,000 words. That's a lot of space. Should be fine, right?
It isn’t. Because the AI doesn’t use that space evenly.
Research from Chroma in mid-2025 tested 18 different models and found a consistent pattern: performance degrades as the window fills up. Not at the end. Much earlier. Once roughly 40% of the window is in use, things start getting worse. The AI pays most attention to what’s at the beginning and end. The middle gets lost.
They called it context rot.
So your 30-minute conversation technically fits in the window. But the AI stopped paying equal attention to all of it a while ago. The nuance you established in minute twelve? It’s in the middle. It might as well not be there.
That’s why you get the repetition. The generic responses. The feeling that the AI forgot what you told it. It didn’t forget – it’s just not looking at that part anymore.
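To put rough numbers on that, here's a back-of-the-envelope sketch in Python. It uses the three-quarters-of-a-word rule of thumb and assumes a 200,000-token window; both figures are approximations, and real tokenizers and limits vary by model.

```python
# Back-of-the-envelope estimate of context-window usage.
# Assumes ~1 token per 0.75 words and a 200,000-token window;
# both are rough, and real figures vary by model.

def words_to_tokens(word_count: int) -> int:
    return round(word_count / 0.75)   # roughly 4 tokens for every 3 words

WINDOW = 200_000

# A pasted ~50,000-word report plus ~10,000 words of back-and-forth.
used = words_to_tokens(50_000 + 10_000)
print(f"~{used:,} tokens, {used / WINDOW:.0%} of the window")
# -> ~80,000 tokens, 40% of the window: technically plenty of headroom,
#    but already at the point where attention to the middle degrades.
```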
What’s actually competing for space
This is where it gets practical.
If your words aren’t the only thing in the context window, what else is in there? Roughly seven categories:
System instructions – how the AI should behave. Its role, constraints, formatting rules. On platforms like Claude, this can be substantial. You never see it, but it’s taking up space.
Your conversation history – every message you’ve sent and every response the AI generated. This is usually the biggest consumer. A 30-minute back-and-forth generates a lot of tokens.
Retrieved documents – anything you’ve shared or that the AI pulled in. Upload a PDF or paste a long document and you’ve just consumed a big chunk of the window.
Tool definitions – if the AI has access to web search, code execution, file handling, or other tools, each one comes with a description of what it does and how to call it. More tools, more tokens, less room for your actual work.
Memory – facts the AI retrieved about you from past conversations. Your name, your preferences, things you’ve discussed before.
Examples – demonstrations of desired behaviour. These are powerful but expensive in tokens.
State – where you are in a multi-step process. What’s done, what’s pending.
Every one of these is eating into the same attention budget. And most of them aren’t optional – the AI needs them to function. So the space available for your actual conversation is smaller than you think, and it’s getting smaller with every message.
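To make that concrete, here's an illustrative accounting of a single request's window. Every number is invented for the example; actual sizes depend on the platform, which tools are enabled, and how long you've been talking.

```python
# Illustrative token budget for one request. All figures are made up
# for the sake of the example; real sizes vary widely by platform,
# enabled tools, and conversation length.

budget = {
    "system instructions": 3_000,
    "tool definitions":    5_000,    # each tool ships a description and schema
    "memory":              1_000,    # facts retrieved from past conversations
    "examples":            2_000,    # demonstrations of desired behaviour
    "retrieved documents": 40_000,   # that PDF you pasted
    "conversation so far": 25_000,   # every earlier message and reply
    "state":                  500,   # where you are in a multi-step task
}

WINDOW = 200_000
used = sum(budget.values())

for item, tokens in sorted(budget.items(), key=lambda kv: -kv[1]):
    print(f"{item:<22}{tokens:>8,}  ({tokens / WINDOW:.1%})")
print(f"{'total':<22}{used:>8,}  ({used / WINDOW:.1%} of the window, before your next message)")
```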
Why “just add more context” backfires
There’s a natural instinct when AI output disappoints: give it more information. Share more documents. Write longer prompts. Paste in more background.
This often makes things worse.
Irrelevant context doesn’t just waste space. It actively dilutes attention. The Chroma research found that adding information the AI doesn’t need for the current task increases errors – not because it confuses the model, but because it spreads attention thinner across more material.
Every token should be load-bearing. If it’s not contributing to the task at hand, it’s actively hurting.
That’s a different way of thinking about AI work. Most people add context hoping some of it helps. The better approach is to curate – only include what earns its place, and remove everything that doesn’t.
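If you wanted to make that curation mechanical, it might look something like this crude sketch: score each candidate piece of context against the task, and drop whatever doesn't earn its place. The keyword-overlap scoring here is purely illustrative; a real system would use embeddings or a retriever, but the point is the filter, not the maths.

```python
# Crude illustration of curating context: only include pieces that
# plausibly bear on the current task. Real relevance scoring would use
# embeddings or a retriever; this keyword overlap is just for illustration.

def relevance(task: str, snippet: str) -> float:
    task_words = set(task.lower().split())
    snippet_words = set(snippet.lower().split())
    if not snippet_words:
        return 0.0
    return len(task_words & snippet_words) / len(task_words)

def curate(task: str, candidates: list[str], threshold: float = 0.2) -> list[str]:
    # Keep only snippets that clear a minimum relevance bar.
    return [s for s in candidates if relevance(task, s) >= threshold]

task = "draft the pricing section of the proposal"
candidates = [
    "notes on pricing tiers and the discount we agreed for annual billing",
    "full minutes of last quarter's all-hands meeting",
    "the client's feedback on the draft proposal structure",
]
for kept in curate(task, candidates):
    print("include:", kept)
```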
What this means for how you work
Three things follow from understanding context.
Position matters. Critical information should go at the start or end of your input, not buried in the middle. The middle is where things get lost. If you’re giving the AI instructions and reference material, put the instructions first and last (there’s a rough sketch of this layout after these three points).
Shorter is usually better. Not because brevity is a virtue, but because attention is finite. A focused conversation with clean handoffs between sessions beats a marathon conversation where the AI’s attention degrades steadily. I start fresh sooner than most people think is necessary – and the output is better for it.
Phase your work. Don’t load everything at once. If you’re doing research and then writing, the writing session doesn’t need all the research context. Only load what’s relevant to the current task. What was useful in the last phase might be noise in this one.
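Here's the rough layout sketch promised above, for the first point: instructions at the start, reference material in the middle, and the instructions restated at the end, where attention is strongest. The template is generic, not tied to any particular model or product.

```python
# Generic prompt layout: instructions first and last, reference material
# in the middle where degraded attention does the least damage.

def build_prompt(instructions: str, reference: str) -> str:
    return "\n\n".join([
        f"TASK:\n{instructions}",                      # start: high attention
        f"REFERENCE MATERIAL:\n{reference}",           # middle: most likely to be skimmed
        f"REMINDER, apply the task above to the reference material:\n{instructions}",  # end: high attention
    ])

prompt = build_prompt(
    instructions="Summarise the risks in this contract in five bullet points.",
    reference="...the pasted contract text goes here...",
)
print(prompt)
```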
None of this is difficult. It’s a shift in how you think about what the AI is working with. Instead of “how do I write a better prompt?” the question becomes “what does the AI need to see right now – and what should it not see?”
The compounding problem
Here’s one more thing that catches people out.
When conversations run very long in Claude, you’ll sometimes see a message: “this conversation has been compacted to free up space.” That’s the AI automatically summarising earlier parts of the conversation to make room.
It’s lossy. Details get dropped. Nuance gets flattened. Decisions you made together get compressed into summaries that miss the reasoning behind them. The conversation has quietly rewritten its own history, and you might not notice until the AI does something that contradicts what you agreed on an hour ago.
When you see that message, the conversation is past its useful life. The better approach is to manage transitions yourself – before the AI does it for you.
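One simple way to manage the transition yourself is to ask the model for a structured handoff before you close the session, then open the next session with it. What follows is a generic illustration of that idea, not the full method; the companion post below covers that properly.

```python
# A minimal handoff: ask for a structured summary at the end of a long
# session, then start the next session from that summary instead of the
# full (and partly degraded) history. Generic illustration only.

HANDOFF_REQUEST = """
Before we wrap up, write a handoff note for a fresh session:
1. What we are trying to achieve
2. Decisions made so far, and the reasoning behind them
3. Constraints and preferences I've stated
4. What's done, what's in progress, and the immediate next step
Keep it under 400 words and don't add anything we didn't discuss.
"""

def start_fresh_session(handoff_note: str, next_task: str) -> str:
    # The opening message of the new conversation.
    return f"Context from a previous session:\n{handoff_note}\n\nNow: {next_task}"

print(start_fresh_session("...(the note the model wrote)...", "draft the executive summary"))
```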
That’s where context capsules come in. I’ve written a companion post on the practical technique: how to package what matters from a conversation so the next session starts informed instead of blank.
The bigger point
Context isn’t a technical detail. It’s the operating environment for everything the AI does. The quality of your output is bounded by the quality of your context – not your prompt, not the model, not how cleverly you phrase things.
Most people are optimising the wrong layer. They’re refining their prompts when they should be managing their context. The prompt matters. But it’s one component in a much larger information architecture that the AI is processing all at once.
I think that’s the shift that separates people who get consistent results from AI from people who get lucky sometimes. It’s not about asking better questions. It’s about understanding what the AI is actually working with – and being deliberate about it.
I’m still learning this myself. But it’s changed how I work more than any prompting technique I’ve tried.
Want the practical technique? Context Capsules: How to Transfer AI Context Between Chats covers the method I use daily.

