How AI coding agents work—and what to remember if you use them

This context limitation naturally constrains how much of a codebase an LLM can process at once, and if you load lots of huge code files into the AI model (which the LLM has to re-process every time you submit a new prompt), you can burn through tokens or usage limits quickly.
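To get a feel for the numbers, consider a rough back-of-the-envelope sketch in Python (the file names and sizes are hypothetical, and the ~4-characters-per-token rule is only a common approximation):

```python
# Rough sketch: why re-sending large files with every prompt burns tokens fast.
# Uses the common ~4 characters per token heuristic; real tokenizers vary.

# Hypothetical sizes (in bytes) of files loaded into the model's context
file_sizes = {"app.py": 120_000, "models.py": 80_000, "vendor/bundle.js": 900_000}

per_turn_tokens = sum(size // 4 for size in file_sizes.values())
print(f"~{per_turn_tokens:,} tokens re-processed per prompt")

# Because the model re-reads the whole context on each turn, usage scales
# linearly with the length of the conversation.
for turns in (1, 10, 50):
    print(f"{turns:>2} turns -> ~{per_turn_tokens * turns:,} tokens")
```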

Tricks of the trade

To get around these limitations, coding agent creators use several tricks. For example, the AI models are fine-tuned to write code that drives other software tools. Instead of passing an entire file through the LLM, an agent can write a Python script to extract data from images or files, which saves tokens and avoids inaccurate results.
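Here is the kind of small throwaway script an agent might write and run (a hypothetical sketch; the file name and "region" column are made up), reading back only the script's compact output instead of the raw file:

```python
# Sketch of an agent-written helper: summarize a large CSV and print only a
# few lines, so the LLM reads the compact output instead of ingesting the
# whole file into its context.
import csv
from collections import Counter

path = "sales.csv"  # hypothetical large input file
counts = Counter()
with open(path, newline="") as f:
    for row in csv.DictReader(f):
        counts[row["region"]] += 1  # assumes the file has a "region" column

# Only these few lines ever enter the model's context
for region, n in counts.most_common(5):
    print(f"{region}: {n} rows")
```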

Anthropic's documentation notes that Claude Code uses this approach to perform complex data analysis on large databases, writing targeted queries and using Bash commands such as "head" and "tail" to analyze large amounts of data without ever loading the full data objects into its context window.
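The same idea in miniature, sketched in Python (the log file name is hypothetical): sample the edges of a huge file with head and tail and hand the model only those slices.

```python
# Sketch: peek at the start and end of a huge file without reading it all,
# the way an agent might before deciding what targeted query to run next.
import subprocess

path = "server.log"  # hypothetical multi-gigabyte file

head = subprocess.run(["head", "-n", "20", path],
                      capture_output=True, text=True).stdout
tail = subprocess.run(["tail", "-n", "20", path],
                      capture_output=True, text=True).stdout

# Only ~40 lines of output go into the model's context, not the full file
print("--- first 20 lines ---\n" + head)
print("--- last 20 lines ---\n" + tail)
```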

(In a sense, these AI agents are supervised but semi-autonomous tool-using programs, a natural extension of the tool-use concept we first saw emerge in early 2023.)

Another major leap in agent capability comes from dynamic context management. Agents handle this in several ways that aren't fully disclosed for proprietary coding models, but we know the most important technique they use: context compression.

Command line version of OpenAI Codex running in a macOS terminal window. Credit: Benj Edwards

When a coding LLM approaches its context limit, this technique compresses the conversation history by summarizing it, losing detail in the process but boiling the history down to its key points. Anthropic's documentation describes this "compression" as a high-fidelity distillation of the context contents that preserves key details, such as architectural decisions and unresolved bugs, while discarding redundant tool outputs.
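As a rough illustration of the idea (not Anthropic's actual implementation; the summarize function here is a placeholder for what would really be another model call):

```python
# Minimal sketch of context compression, assuming a summarize() step that a
# real agent would implement as an LLM call. Thresholds are made up.

CONTEXT_LIMIT = 100_000   # tokens the model can hold
COMPACT_AT = 0.8          # compress once the context is 80% full

def estimate_tokens(messages: list[str]) -> int:
    return sum(len(m) // 4 for m in messages)  # ~4 chars/token heuristic

def summarize(messages: list[str]) -> str:
    # Placeholder: a real agent would ask the model to distill these
    # messages, keeping architectural decisions and unresolved bugs while
    # dropping redundant tool output.
    return f"[summary of {len(messages)} earlier messages]"

def maybe_compress(history: list[str]) -> list[str]:
    if estimate_tokens(history) < CONTEXT_LIMIT * COMPACT_AT:
        return history  # plenty of room left; keep everything verbatim
    recent = history[-10:]  # keep the most recent messages as-is
    return [summarize(history[:-10])] + recent
```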

This means that AI coding agents periodically "forget" much of what they were doing whenever this compression occurs, but unlike older LLM-based systems, they aren't left completely unaware of what happened: they can quickly reorient themselves by reading the existing code, notes left in files, change logs, and so on.
