Agents fail for many reasons. Tool misuse, poor planning, broken loops. But the most common failure mode I encounter in production is simpler: the agent reasons from stale or missing information.
It is not stupid. It is confused.
The grounding problem
A long-running agent is, by default, blind to anything that happens after its context window was constructed. It knows what it was told at the start of the task. It does not know what has changed since.
For short tasks, this is fine. For tasks that span minutes, hours, or tool calls that modify the world, it is a significant problem. The agent will confidently reason from premises that are no longer true.
The naive fix is to inject more information: append current state at each step, dump database records into the context, fetch everything that might be relevant. This works until it doesn't — usually when you run out of tokens, when the signal-to-noise ratio degrades, or when you realize you are paying for 50,000 tokens of context when the agent only ever looks at the last 500.
Grounding is a retrieval problem
The right frame is not "how do I inject more information" but "how do I inject exactly the information this agent needs to reason correctly at this step?"
This reframes grounding as a retrieval problem. You need:
- A model of what the agent is currently trying to do
- A function that maps that intent to a set of relevant facts
- A compression step that renders those facts into the minimum viable context
The first is just the agent's current goal or subtask. The second is retrieval — semantic search, structured lookup, or a mix. The third is where most of the engineering work lives.
Compression is not summarization
Compression in this context is not "generate a summary." It is "given a retrieval result, produce the tokens that maximally inform the agent's next action."
These are different. A summary preserves broad coverage. A compression for agent grounding optimizes for decision relevance. If the agent is deciding whether to approve a transaction, you want the facts that bear on that decision, rendered in a format that minimizes reasoning friction.
I find it useful to think of this as writing the context the way you would write a briefing document: what does this decision-maker need to know, and what can they safely not know?
The pattern
Concretely, the pattern I have settled on:
- At each significant agent step, re-evaluate what information the agent needs
- Run targeted retrieval against current state, not historical context
- Apply a compression pass that scores and trims retrieved facts by relevance to the current goal
- Inject the compressed result into the context at the relevant position
This adds latency. It also produces agents that are dramatically more reliable in long-running tasks, because they are reasoning from accurate information rather than increasingly stale context.
The token cost is usually lower too. Targeted, compressed grounding beats context dumps in both quality and efficiency.