Every time an AI agent breaks, the same conversation happens. The model isn’t smart enough. It hallucinated. It chose the wrong tool. It forgot what we told it.
The model is almost never the problem.
I build with Claude Code most days – scaffolding websites, running multi-step workflows, iterating on systems that span dozens of files. When something goes wrong, and it does go wrong, my first instinct used to be the same as everyone else’s. Blame the model. Wonder if I should switch to a different one. Wait for the next release.
Then I started paying attention to what was going wrong. Not the symptom – the structure.
The agent didn’t forget my instructions. My instructions weren’t in the context when it needed them. The agent didn’t pick the wrong tool. It had thirty tools loaded and couldn’t distinguish which three mattered for this step. The agent didn’t hallucinate a file path. The file tree had been compressed out of the context window two turns ago.
Same model. Same capability. Different context, different result.
The pattern everyone’s finding
This isn’t just my observation. The teams building the most capable agent systems in the world are arriving at the same conclusion.
Anthropic’s engineering team published their context engineering guide after years of building Claude Code. Their framing is direct: building with language models is becoming less about finding the right words for your prompts and more about deciding what configuration of context is most likely to produce the desired behaviour. Not smarter models. Better context.
Manus – now owned by Meta after a multi-billion-dollar acquisition in late 2025 – rebuilt their entire agent framework four times. Not because the models changed. Because they kept discovering better ways to shape context. Their founder describes the process as experimental science, not engineering in any traditional sense. They tried fine-tuning early on and abandoned it. The lesson was stark: context engineering let them ship improvements in hours. Fine-tuning took weeks – and the improvements became irrelevant when the next model dropped.
Different teams, same conclusion. Anthropic, Manus, LangChain, independent practitioners – all landing on the same insight. The model is the engine. Context is the steering.
What “context failure” actually looks like
If you’ve only used AI through a chat interface, context failures are the conversation going weird. The AI repeats itself, misses something obvious, gives generic answers. Annoying but recoverable – you just start a new chat.
In agent systems, context failures are structural. The agent has autonomy. It’s making decisions, calling tools, writing files, executing code in a loop. When the context degrades, the agent doesn’t just give a bad answer. It takes bad actions. It builds on wrong assumptions. It compounds errors across steps until the output is unsalvageable. A few failure modes show up again and again.
Context bloat. Every tool call adds observations to the context. A web search returns a page of results. A file read dumps content in. After fifty steps, the context window is packed – and most of what’s in there is stale. The agent is swimming in old data that’s drowning out current instructions. Manus measured this: their average input-to-output token ratio is 100:1. One hundred tokens of context for every one token of action.
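To make that concrete, here is a minimal sketch (in Python) of one mitigation: pruning stale observations so old tool output stops crowding out current instructions. The message shape and function names are illustrative assumptions, not any particular framework’s API.

```python
# A sketch of observation pruning, assuming messages are simple dicts
# with a "role" and "content". Nothing here is a real framework API.

def prune_stale_observations(messages, keep_recent=3,
                             stub="[old observation removed to save context]"):
    """Keep the most recent tool observations in full; replace the rest
    with a short stub so the window holds instructions, not stale dumps."""
    tool_positions = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = set(tool_positions[:-keep_recent]) if keep_recent else set(tool_positions)
    return [
        {**m, "content": stub} if i in stale else m
        for i, m in enumerate(messages)
    ]
```

Summarising instead of stubbing is the more common production choice; either way, the point is that the window should hold what the next decision needs, not everything that ever happened.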
Attention decay. The transformer architecture that powers every major model computes a relationship between every pair of tokens in the window. At 10,000 tokens, that’s 100 million relationships. At 100,000, it’s 10 billion. The model doesn’t attend to all of that evenly. Research consistently shows performance drops well before the window fills up. Your agent technically has room. It just stopped using it effectively.
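The arithmetic behind those numbers is just quadratic growth in pairwise token relationships:

```python
# Pairwise token relationships grow with the square of the window size.
for window in (10_000, 100_000, 200_000):
    print(f"{window:,} tokens -> {window ** 2:,} pairwise relationships")
# 10,000 tokens -> 100,000,000 pairwise relationships
# 100,000 tokens -> 10,000,000,000 pairwise relationships
# 200,000 tokens -> 40,000,000,000 pairwise relationships
```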
Tool overload. Give an agent too many tools and it struggles to choose the right one. The tool definitions alone eat context. Manus found that loading tools dynamically made things worse – removing a tool definition mid-session confused the model because previous actions still referenced it. The solution wasn’t more tools. It was fewer tools, better scoped.
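Here is a sketch of what “fewer tools, better scoped” can look like: every definition stays registered for the session, but only the subset relevant to the current step is exposed. The registry, the tags and select_tools() are illustrative assumptions, not Manus’s implementation or a real library.

```python
# Illustrative tool scoping: definitions stay stable for the whole
# session; only the *allowed* subset changes from step to step.

ALL_TOOLS = {
    "browser_search": {"tags": {"research"}},
    "browser_open":   {"tags": {"research"}},
    "file_read":      {"tags": {"files"}},
    "file_write":     {"tags": {"files"}},
    "shell_exec":     {"tags": {"execution"}},
    # ...a real system might have dozens more
}

def select_tools(step_tags, max_tools=5):
    """Return the names of tools relevant to this step's intent."""
    relevant = [name for name, spec in ALL_TOOLS.items()
                if spec["tags"] & step_tags]
    return relevant[:max_tools]

print(select_tools({"files"}))  # ['file_read', 'file_write']
```

The same effect can also be achieved at decode time, by constraining which tool names the model is allowed to emit, which avoids touching the definitions at all.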
Lost goals. Complex tasks require the agent to maintain a plan across many steps. But the plan was stated at the beginning of the conversation, and the middle of the context window is exactly where attention is weakest. The agent executes step 47 without remembering why it’s doing any of this.
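One mitigation follows directly from that observation: recite the plan at the end of the context on every turn, where attention is strongest, instead of leaving it buried at the start. A minimal sketch, with illustrative names:

```python
# A sketch of plan recitation. The message shape and field names are
# assumptions for illustration only.

def recite_plan(history, plan, current_step):
    """Append a fresh restatement of the plan as the last thing the
    model sees this turn, so the goal never drifts into the weak
    middle of the window."""
    checklist = "\n".join(
        f"- [{'x' if i < current_step else ' '}] {item}"
        for i, item in enumerate(plan)
    )
    recitation = (
        f"Current plan:\n{checklist}\n"
        f"You are on step {current_step + 1} of {len(plan)}. "
        f"Keep the overall goal in mind."
    )
    return history + [{"role": "user", "content": recitation}]
```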
Why this matters beyond developers
You might read this and think it’s a technical problem for engineers. It isn’t.
If you’re a business owner using AI to produce content, manage data, or run any kind of multi-step process, you’re running into context failures. You just don’t have the vocabulary for it yet.
When your AI assistant “forgets” what you told it earlier in the conversation – context failure. When a workflow produces inconsistent results with the same inputs – context failure. When an AI tool works brilliantly on simple tasks and falls apart on complex ones – probably context failure.
The instinct is always the same: the tool isn’t good enough yet. Wait for the next version. Try a different product.
The tools are already good enough. The context isn’t.
What changes when you see this
Once you frame failures as context problems rather than model problems, different questions become obvious. Instead of “which model should I use?” you ask “what does the model need to know at this step?” Instead of building longer conversations, you build structured handoffs. Instead of loading everything upfront, you retrieve what’s needed just in time.
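Roughly what that reframing looks like in code: each step gets a context assembled from scratch, with a handoff summary and just-in-time retrieval standing in for “everything so far”. The names and the toy keyword retrieval below are assumptions, not a specific framework.

```python
# A sketch of per-step context assembly with just-in-time retrieval.
# Step, retrieve_relevant, and the keyword scoring are illustrative.

from dataclasses import dataclass

@dataclass
class Step:
    handoff: str   # what earlier steps concluded, summarised
    query: str     # what this step needs to look up
    request: str   # the actual ask for this step

def retrieve_relevant(store, query, k=3):
    """Toy retrieval: rank stored snippets by keyword overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(store, key=lambda s: -len(words & set(s.lower().split())))
    return "\n".join(ranked[:k])

def context_for_step(instructions, step, store):
    """Build a fresh, scoped context for one step of a workflow."""
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": f"Handoff from previous steps:\n{step.handoff}"},
        {"role": "user", "content": f"Reference material:\n{retrieve_relevant(store, step.query)}"},
        {"role": "user", "content": step.request},
    ]
```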
This is the shift from prompt engineering to context engineering. Not a rebrand – a fundamentally different discipline. Prompts are what you say. Context is everything the model can see. Managing the latter is harder, more architectural, and more consequential than crafting the former.
I didn’t arrive at this through theory. I arrived at it through watching my own agent sessions fail – and noticing that the failures had patterns. The patterns pointed to context, not capability.
The uncomfortable part
If you’re waiting for smarter models to solve your problems, you’ll keep waiting. Models will improve – they always do. But the context problem doesn’t shrink as models get smarter. It shifts. Smarter models can handle more autonomy, but more autonomy means more steps, which means more context to manage, which means more opportunities for context to degrade.
Manus rebuilt their framework four times in six months. Not because the models got worse. Because the models got better, and the existing context architecture couldn’t keep up.
The teams that succeed with AI agents aren’t the ones with access to the best models. Everyone has access to the best models. The teams that succeed are the ones that engineer their context – that treat the information environment around the model as the primary design surface, not an afterthought.
That’s the shift. It’s not glamorous. But it’s where the actual leverage is.

