10 Things the Best AI Coding Agents Do Differently

What separates a toy demo from a production-grade AI coding agent? After studying the architecture of the most capable open-source agents, here are the patterns that matter.

By Westringia

Everyone is building AI agents. Most of them plateau at “chat with your codebase” and never get meaningfully better. But a handful of open-source coding agents have crossed a threshold where developers actually trust them with real work — refactoring modules, fixing bugs across files, scaffolding entire features.

What are they doing that the rest aren’t?

After studying the architecture of several top-tier open-source coding agents, here are ten patterns that keep showing up. Not theory — structural decisions you can apply the next time you build an agent, whether it’s a coding assistant, a support bot, or an internal tool.

1. They split “primary” and “subagent” as first-class concepts

The best agents don’t run as one monolithic loop. They distinguish between primary agents (user-facing, conversational) and subagents (specialized workers that get delegated to). A coding agent might have a primary agent for building and a separate one for planning. Behind those, subagents handle focused tasks — deep codebase exploration, multi-step research — and report back.

Then there’s a third, often invisible tier: housekeeping agents. These handle context compaction, session titling, and conversation summarization. They’re unglamorous, but they’re the reason long sessions remain coherent instead of degenerating into confused repetition.

The lesson: every agent you build should answer the question “is this user-facing or delegated?” If you only have one agent doing everything, your context window is doing double duty as both a scratchpad and a conversation — and it’s bad at both.
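
Here’s a minimal sketch of that boundary in Python, with run_llm and SEARCH_TOOLS as hypothetical stand-ins for a provider call and a set of read-only tools. The point is structural: the subagent’s exploration burns its own context window, and only a compact report crosses back into the primary’s history.

# Sketch of the primary/subagent boundary. run_llm() and SEARCH_TOOLS are
# hypothetical stand-ins for a provider call and a set of read-only tools.

def run_subagent(task: str) -> str:
    """Run a focused worker in a fresh context; return only its report."""
    messages = [
        {"role": "system", "content":
         "You are a codebase-exploration subagent. Investigate the task "
         "and reply with a concise report of what you found."},
        {"role": "user", "content": task},
    ]
    # Intermediate tool calls and scratch work live in THIS context window,
    # never in the primary agent's conversation.
    return run_llm("cheap-model", messages, tools=SEARCH_TOOLS)

def handle_task_tool(primary_history: list, task: str) -> None:
    # The primary sees only the distilled report, not the exploration.
    primary_history.append({"role": "tool", "content": run_subagent(task)})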

2. They use permission modes, not sandboxes

A read-only “planning” agent isn’t a separate model or a rewritten system prompt. It’s the same agent with file editing set to “deny” and shell commands set to “ask.” The best agents express risk as a permission matrix — per-agent, per-tool, with glob-pattern granularity:

{
  "bash": {
    "*": "ask",
    "git status *": "allow",
    "git diff *": "allow"
  }
}

This is strictly better than a binary sandbox toggle. git status should never need human approval. rm -rf always should. Glob patterns let you express that without writing any code.
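
To make that concrete, here’s a sketch of how the matrix might be evaluated, using Python’s fnmatch for glob matching. Sorting patterns longest-first is one simple way to make specific rules beat the catch-all; nothing here is taken from any particular agent’s implementation.

import fnmatch

BASH_PERMISSIONS = {
    "*": "ask",
    "git status *": "allow",
    "git diff *": "allow",
}

def resolve_permission(command: str, matrix: dict[str, str]) -> str:
    # Longest pattern first, so "git status *" beats the catch-all "*".
    for pattern in sorted(matrix, key=len, reverse=True):
        if fnmatch.fnmatchcase(command, pattern):
            return matrix[pattern]
    return "deny"  # no pattern matched: fail closed

assert resolve_permission("git status --short", BASH_PERMISSIONS) == "allow"
assert resolve_permission("rm -rf build", BASH_PERMISSIONS) == "ask"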

The lesson: risk surface belongs in configuration, not in branching logic. One agent definition plus a permission matrix gives you “analyst,” “reviewer,” and “full developer” without forking your prompts.

3. They define agents as files in the repo

The most capable agents aren’t configured in a SaaS dashboard. Their agent definitions live as markdown or JSON files checked into the repository — project-level overrides in a .agents/ directory, global defaults in ~/.config/. Each file declares the model, prompt, temperature, permissions, and step budget.
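
For illustration, here’s what such a file might look like: a hypothetical .agents/reviewer.md. The schema is invented here and real tools differ on field names, but the shape is constant: frontmatter for the knobs, body for the prompt.

---
# Hypothetical schema; field names vary by tool.
model: anthropic/claude-sonnet-4
temperature: 0.1
max_steps: 25
permissions:
  edit: deny            # reviewing, not rewriting
  bash:
    "*": ask
    "git diff *": allow
---
You are a code reviewer. Read the diff, check it against the conventions
in the project memory file, and report issues. Do not edit any files.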

The lesson: agent definitions are code. They should be versioned, reviewed in pull requests, and overridable per-project. If your agent’s behavior is configured somewhere you can’t git diff, you’ve already lost reproducibility.

4. They ship a project memory file

A single markdown file — checked into the root of the repository — captures the project’s structure, conventions, and coding patterns. Every session loads it into context automatically. No embeddings, no vector database, no retrieval pipeline. Just a flat file that developers maintain like any other piece of documentation.
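
As a sketch, this is the kind of content that earns its place in such a file (the file name, project, and conventions here are invented for illustration):

# AGENTS.md (hypothetical example)

- Stack: TypeScript, Fastify, Postgres. Tests run with Vitest.
- Layout: src/routes/ for HTTP handlers, src/services/ for business logic.
- Conventions: no default exports; all errors go through src/lib/errors.ts.
- Always run npm test before declaring a fix done.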

The lesson: the right abstraction for project-specific knowledge is the simplest one that works. A memory file in the repo is cheap, version-controlled, auditable, and universally understood. Only reach for retrieval when a flat file genuinely stops scaling — and for most codebases, it won’t.

5. They cap agentic loops with a step budget

Every serious agent has a max_steps parameter: a hard limit on how many tool-call iterations the agent can take before it’s forced to respond with text only. It’s the simplest possible guardrail, and it prevents two failure modes at once — runaway API costs and infinite tool loops where the agent keeps trying the same broken approach.
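
A sketch of the loop, with call_model and run_tool as hypothetical stand-ins for the provider call and tool dispatch:

MAX_STEPS = 25

def agent_loop(messages: list) -> str:
    for _ in range(MAX_STEPS):
        response = call_model(messages, tools_enabled=True)  # hypothetical helper
        if not response.tool_calls:      # plain-text answer: we're done
            return response.text
        for call in response.tool_calls:
            messages.append(run_tool(call))  # hypothetical tool dispatch
    # Budget exhausted: force a final text-only turn instead of looping forever.
    messages.append({"role": "user", "content":
                     "Step budget exhausted. Summarize what you did and stop."})
    return call_model(messages, tools_enabled=False).text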

The lesson: every agent loop you build should have a step budget. It takes five minutes to implement and saves you from the 3 AM page about a $400 loop that accomplished nothing. If you don’t have this, add it before you add anything else.

6. They integrate language servers, not just grep

The best agents don’t just ship grep and find. They integrate with the Language Server Protocol out of the box, giving the agent access to type information, go-to-definition, find-references, and real-time diagnostics from the compiler.

An agent with find_references via LSP is categorically different from one relying on regex. It knows that UserService in auth.ts is the same class as UserService in routes.ts. Regex can only guess.
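
Here’s a sketch of the gap, with lsp standing in for a hypothetical wrapper around a real language server. The regex version can only match text; the LSP version resolves the symbol at an exact position.

import re
from pathlib import Path

def find_references_regex(symbol: str, root: str) -> list[str]:
    # Text match only: also catches comments, strings, and unrelated
    # symbols that happen to share a name.
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    return [str(p) for p in Path(root).rglob("*.ts")
            if pattern.search(p.read_text(errors="ignore"))]

def find_references_lsp(lsp, file: str, line: int, col: int) -> list[dict]:
    # Semantic match: the server resolves the declaration under the cursor,
    # so two UserService classes in different files are linked only if they
    # really are the same class. lsp is a hypothetical client wrapper.
    return lsp.references(file, line, col)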

The lesson: the quality ceiling of your agent is set by the quality of its tools, not by prompt engineering. If you’re trying to make your agent smarter by tweaking system prompts, you’re optimizing the wrong thing. Give it better tools instead.

7. They separate the client from the server

The agent runs as a background process. The terminal UI, the desktop app, and the IDE extension are all just clients talking to the same session. This isn’t overengineering — it’s what makes features like shared sessions, mobile access, and headless CI runs fall out naturally.
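
A minimal sketch of the shape, using FastAPI as an assumed dependency and a hypothetical run_agent entry point. Every client, whether a TUI, an IDE extension, or a CI job, speaks the same small HTTP surface.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
SESSIONS: dict[str, list[dict]] = {}  # session_id -> message history

class UserMessage(BaseModel):
    content: str

@app.post("/sessions/{session_id}/messages")
def post_message(session_id: str, msg: UserMessage) -> dict:
    history = SESSIONS.setdefault(session_id, [])
    history.append({"role": "user", "content": msg.content})
    reply = run_agent(history)  # hypothetical agent entry point
    history.append({"role": "assistant", "content": reply})
    return {"reply": reply}

@app.get("/sessions/{session_id}")
def get_session(session_id: str) -> list[dict]:
    # A second client can attach to a running session by reading its history.
    return SESSIONS.get(session_id, [])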

The lesson: if there’s any chance you’ll want a second interface to your agent (a web dashboard, a Slack bot, a CI step), design it as a server from day one. The cost is marginal, and retrofitting it later means rewriting your state management.

8. They treat context compaction as an agent task

When a conversation gets long, the best agents don’t just truncate old messages or use a sliding window. They run a dedicated compaction agent — a smaller, cheaper model with a specific prompt — that produces a structured summary of what happened so far. The main agent then continues with the summary instead of the raw history.
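
A sketch of the trigger, with estimate_tokens, call_model, and render as hypothetical helpers and the threshold chosen arbitrarily:

COMPACTION_THRESHOLD = 80_000  # tokens; pick a number below your context limit

def maybe_compact(messages: list) -> list:
    if estimate_tokens(messages) < COMPACTION_THRESHOLD:
        return messages
    summary = call_model(
        model="cheap-fast-model",
        messages=[
            {"role": "system", "content":
             "Summarize this coding session: decisions made, files changed, "
             "open problems, and what should happen next."},
            {"role": "user", "content": render(messages[:-10])},
        ],
    )
    # Keep the summary plus the most recent turns; drop the raw middle.
    return [{"role": "system", "content": f"Session so far: {summary}"}] + messages[-10:]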

The lesson: context management is itself an agentic task. Hand-rolled heuristics (drop messages older than N turns, keep the last K tool results) lose important information unpredictably. A prompted summarization step costs more per invocation, but far less than a confused agent redoing work it has already finished.

9. They’re model-agnostic by design

The best agents don’t bake in vendor-specific assumptions. They abstract the provider, let users configure different models per agent, and treat the model as a swappable component. The planning agent can run on something cheap. The coding agent gets the most capable model available. Users can A/B test without changing any prompts.
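
A sketch of that routing, with stubbed provider calls and invented model names. The table is the configuration surface; when a better model ships, the code doesn’t change.

def complete_openai(model: str, messages: list) -> str:
    raise NotImplementedError("wire to the OpenAI SDK here")

def complete_anthropic(model: str, messages: list) -> str:
    raise NotImplementedError("wire to the Anthropic SDK here")

PROVIDERS = {"openai": complete_openai, "anthropic": complete_anthropic}

AGENT_MODELS = {  # editable config: swap models without touching code
    "plan":  "openai/gpt-4.1-mini",        # cheap model for read-only planning
    "build": "anthropic/claude-sonnet-4",  # most capable model for edits
}

def complete(agent: str, messages: list) -> str:
    provider, model = AGENT_MODELS[agent].split("/", 1)
    return PROVIDERS[provider](model, messages)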

The lesson: today’s best model is tomorrow’s commodity. If switching providers requires a code change, you’ll always be a release behind. Abstract early, and let configuration — not code — decide which model powers which agent.

10. They give boring prompting advice

The official documentation for the best coding agents doesn’t recommend elaborate prompt-engineering techniques. The guidance is simple: give the agent plenty of context and examples, talk to it like you would a junior developer on your team, and iterate on plans before jumping to implementation.

The lesson: the best prompt-engineering advice is still boring. Context plus examples plus explicit constraints beats clever tricks every time. If your agent is failing, the first question should be “did I tell it enough?” — not “do I need a fancier prompt template?”


The meta-lesson

None of these patterns are individually revolutionary. There’s no single clever trick that makes an agent great. What separates the best from the rest is that every concern — permissions, context, delegation, housekeeping, provider choice — is an explicit, configurable, inspectable primitive.

The temptation when building agents is to cram everything into one system prompt and one main loop. Resist it. Factor concerns into named components. Make them configurable. Make them visible.

Your agents will get dramatically easier to debug, and debugging is where you’ll spend most of your time. The architecture that wins isn’t the most sophisticated one — it’s the one where, when something goes wrong, you can point to exactly which piece failed and why.