Grounding Agents in Real Time

Agents fail for many reasons. Tool misuse, poor planning, broken loops. But the most common failure mode I encounter in production is simpler: the agent reasons from stale or missing information.

It is not stupid. It is confused.

The grounding problem

A long-running agent is, by default, blind to anything that happens after its context window was constructed. It knows what it was told at the start of the task. It does not know what has changed since.

For short tasks, this is fine. For tasks that span minutes, hours, or tool calls that modify the world, it is a significant problem. The agent will confidently reason from premises that are no longer true.

The naive fix is to inject more information: append current state at each step, dump database records into the context, fetch everything that might be relevant. This works until it doesn't — usually when you run out of tokens, when the signal-to-noise ratio degrades, or when you realize you are paying for 50,000 tokens of context when the agent only ever looks at the last 500.

Grounding is a retrieval problem

The right frame is not "how do I inject more information" but "how do I inject exactly the information this agent needs to reason correctly at this step?"

This reframes grounding as a retrieval problem. You need:

A model of what the agent is currently trying to do
A function that maps that intent to a set of relevant facts
A compression step that renders those facts into the minimum viable context

The first is just the agent's current goal or subtask. The second is retrieval — semantic search, structured lookup, or a mix. The third is where most of the engineering work lives.

Compression is not summarization

Compression in this context is not "generate a summary." It is "given a retrieval result, produce the tokens that maximally inform the agent's next action."

These are different. A summary preserves broad coverage. A compression for agent grounding optimizes for decision relevance. If the agent is deciding whether to approve a transaction, you want the facts that bear on that decision, rendered in a format that minimizes reasoning friction.

I find it useful to think of this as writing the context the way you would write a briefing document: what does this decision-maker need to know, and what can they safely not know?

The pattern

Concretely, the pattern I have settled on:

At each significant agent step, re-evaluate what information the agent needs
Run targeted retrieval against current state, not historical context
Apply a compression pass that scores and trims retrieved facts by relevance to the current goal
Inject the compressed result into the context at the relevant position

This adds latency. It also produces agents that are dramatically more reliable in long-running tasks, because they are reasoning from accurate information rather than increasingly stale context.

The token cost is usually lower too. Targeted, compressed grounding beats context dumps in both quality and efficiency.