Here is something nobody talks about when they compare AI coding agents: the architecture determines your API bill, not the model.
Lazy loading is a major reason OpenCode is cheaper to run than Hermes and OpenClaw. But the deeper reason it wins on cost is architectural: OpenCode is stateless and execution-focused, while Hermes and OpenClaw are stateful, always-on automation engines. That distinction changes everything.
When you scale up to hundreds of specialized skills, how an agent handles them completely alters your monthly bill. And if you get the architecture wrong, you are paying for tokens you never even used.
- OpenCode is stateless. It fires up for a task, loads what it needs, executes, and terminates. You never pay for idle context.
- OpenClaw and Hermes are stateful. They maintain persistent memory, heartbeat loops, and autonomous background processes that accumulate context over hours and days.
- Lazy loading amplifies the difference. Instead of injecting every skill profile into every turn, OpenCode loads skill content only on demand. With many skills, the savings compound.
- Flat-rate subscriptions change the math. OpenCode Go at $10/month pairs lazy loading with fixed pricing, replacing an unpredictable per-token bill with a predictable monthly fee (within the plan's usage limits).
The Token Math: How Lazy Loading Saves Money
A standard agent framework typically inlines its system prompts, tool definitions, and skill contents directly into the initial context window. Every turn pays for every token in that window, even if the agent is just saying "got it" or asking a one-line clarification.
OpenCode flips this. It loads only skill names and short descriptions into the context at the start, which for a library of around a hundred skills comes to a few thousand tokens. The full skill file loads only when the agent actively needs it. If the agent never uses a skill during a session, you never pay the token cost of its contents.
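To make the difference concrete, here is a minimal Python sketch of the two strategies, under stated assumptions. It is not OpenCode's implementation: `Skill`, `LazySkillContext`, and `estimate_tokens` are hypothetical names, and the four-characters-per-token heuristic is a rough stand-in for a real tokenizer. The full-injection path inlines every skill body; the lazy path keeps a name-and-description index resident and appends a body only after `load()` is called.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)


@dataclass
class Skill:
    name: str
    description: str  # short summary: always resident under lazy loading
    body: str         # full instructions: loaded only when needed


def full_injection_context(skills: list[Skill]) -> str:
    # Standard approach: every skill's full body is inlined into the prompt.
    return "\n\n".join(f"# {s.name}\n{s.body}" for s in skills)


class LazySkillContext:
    """Keeps only names and descriptions in context; loads full bodies on demand."""

    def __init__(self, skills: list[Skill]) -> None:
        self._skills = {s.name: s for s in skills}
        self._loaded: set[str] = set()

    def load(self, name: str) -> None:
        # Called only when the agent decides it actually needs this skill.
        if name not in self._skills:
            raise KeyError(f"unknown skill: {name}")
        self._loaded.add(name)

    def render(self) -> str:
        index = "\n".join(f"- {s.name}: {s.description}" for s in self._skills.values())
        bodies = "\n\n".join(
            f"# {n}\n{self._skills[n].body}" for n in sorted(self._loaded)
        )
        return f"{index}\n\n{bodies}" if bodies else index


# 100 hypothetical skills, each with a ~1,750-token body under the heuristic above.
skills = [
    Skill(f"skill-{i}", "one-line summary of what this skill does", "detailed step " * 500)
    for i in range(100)
]
full = estimate_tokens(full_injection_context(skills))
lazy = LazySkillContext(skills)
before = estimate_tokens(lazy.render())
lazy.load("skill-7")
after = estimate_tokens(lazy.render())
print(f"full injection: ~{full:,} tokens")
print(f"lazy, no skill loaded: ~{before:,} tokens; after loading one: ~{after:,} tokens")
```

The exact numbers matter less than the shape: the lazy context stays close to the size of the index until a skill is actually loaded, and then grows by roughly one skill body rather than a hundred.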
Here is what that looks like in practice:
| Framework Component | Standard Agent (Full Injection) | OpenCode (Lazy Loading) |
|---|---|---|
| System prompt and tool definitions | ~25,000 tokens | ~25,000 tokens |
| 100+ skill profiles | ~50,000+ tokens (full text) | ~4,000 tokens (names and summaries) |
| Active skill content | Already loaded | ~1,000 tokens (loaded on demand) |
| Total base context per turn | ~75,000+ tokens | ~30,000 tokens |
The token counts are illustrative estimates based on typical system prompt sizes and average skill file lengths; your actual numbers will vary by implementation. The exact figures matter less than the pattern: with a large skill library, inlining everything can more than double your base context. Every turn, every time.
Prompt caching helps with repeated context, but cached reads are still billed at a fraction of the normal input price. A smaller base context means cheaper cache hits and cheaper turns when the cache misses. The savings compound across every session.
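A rough per-turn cost model shows why a smaller prefix still helps when caching is on. This sketch assumes cached prefix reads are billed at 10% of the normal input price (the multiplier is an assumption; check your provider's pricing), ignores cache-write surcharges and output tokens, and uses a hypothetical $3.00 per million input tokens. The 75,000 and 30,000 base sizes come from the table above; the 2,000 fresh tokens per turn are made up for illustration.

```python
def turn_input_cost(base_tokens: int, new_tokens: int, price_per_mtok: float,
                    cache_hit: bool, cache_read_multiplier: float = 0.1) -> float:
    """Input cost in dollars for one turn.

    base_tokens: stable prefix (system prompt, tools, skills) that can be cached.
    new_tokens: fresh tokens this turn (user message, tool output), billed at full price.
    cache_read_multiplier: assumed fraction of the normal price charged for cached reads.
    Cache-write surcharges are ignored for simplicity.
    """
    prefix_rate = cache_read_multiplier if cache_hit else 1.0
    billed = base_tokens * prefix_rate + new_tokens
    return billed * price_per_mtok / 1_000_000


PRICE = 3.00  # hypothetical $ per million input tokens
for label, base in [("full injection", 75_000), ("lazy loading", 30_000)]:
    hit = turn_input_cost(base, 2_000, PRICE, cache_hit=True)
    miss = turn_input_cost(base, 2_000, PRICE, cache_hit=False)
    print(f"{label:>15}: hit ${hit:.4f}  miss ${miss:.4f}")
```

With these assumed numbers, a cache hit costs $0.0285 for the full-injection context versus $0.0150 for the lazy one, and a miss costs $0.2310 versus $0.0960.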
And here is the thing about lazy loading that does not get enough attention: the advantage grows with your skill library. With five skills, the difference is modest. With more than a hundred, it dominates your bill. Every skill you add is another block of idle tokens that OpenCode skips and the other frameworks keep charging you for.
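The scaling claim can be checked with simple arithmetic. This sketch derives per-skill sizes from the illustrative table (about 500 tokens per full skill file and 40 per name-and-summary entry), assumes exactly one skill is loaded on demand per turn, and assumes both costs grow linearly with library size. None of these figures are measured.

```python
SYSTEM_AND_TOOLS = 25_000  # ~tokens, from the table above
FULL_SKILL = 500           # ~tokens per full skill file (50,000 / 100), assumed
SKILL_SUMMARY = 40         # ~tokens per name + description (4,000 / 100), assumed
ACTIVE_SKILL = 1_000       # ~tokens for one skill loaded on demand


def full_injection(n_skills: int) -> int:
    # Every skill's full text sits in the context on every turn.
    return SYSTEM_AND_TOOLS + n_skills * FULL_SKILL


def lazy_loading(n_skills: int, active: int = 1) -> int:
    # Only summaries are resident, plus the skills actually loaded this turn.
    return SYSTEM_AND_TOOLS + n_skills * SKILL_SUMMARY + active * ACTIVE_SKILL


for n in (5, 25, 100, 300):
    full, lazy = full_injection(n), lazy_loading(n)
    print(f"{n:>4} skills: full {full:>7,}  lazy {lazy:>7,}  ratio {full / lazy:.2f}x")
```

At 100 skills this reproduces the table's 75,000 versus 30,000 tokens (2.50x). At 5 skills the gap shrinks to 27,500 versus 26,200, which is the modest case described above.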
The Architectural Trap of OpenClaw and Hermes
Lazy loading keeps OpenCode's context clean. But the cost spikes people experience with OpenClaw and Hermes are typically tied to something else entirely: statefulness and loop behavior.
OpenClaw: The Always-On Token Burner
OpenClaw, built by Peter Steinberger, is a personal AI assistant that lives in your chat apps: WhatsApp, Telegram, Discord. It remembers everything, manages your calendar, checks you in for flights, and processes your email. It runs a heartbeat system that periodically checks in and takes autonomous action.
This is incredibly powerful. It is also why costs can spiral if you are not paying attention.
OpenClaw is built for multi-agent coordination across channels. It lacks tight loop-throttling out of the box. If it gets stuck in an autonomous debugging or notification loop without a heartbeat schedule configured properly, it accumulates context over hours and days. Users have reported unintended API bills from leaving OpenClaw agents running unmonitored in the background. One user noted burning through their entire Claude Max subscription limit rapidly after setting it up.
The issue is not that OpenClaw is wasteful. It is that OpenClaw is designed to stay on. It is doing what it was built to do. The cost is a side effect of the always-on architecture, not a bug.
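Why an unattended loop gets expensive is mostly arithmetic: if each heartbeat re-sends the whole persistent context and that context grows a little every tick, billed input tokens grow quadratically with the number of ticks. The sketch below models that and adds a simple hard token budget as a guard. It is a hypothetical illustration, not OpenClaw's configuration or API; `HeartbeatSimulation`, the base size, growth per tick, hourly cadence, and 2M-token budget are all assumed.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatSimulation:
    base_context: int     # tokens present at the first heartbeat
    growth_per_tick: int  # tokens appended to persistent context each tick
    token_budget: int     # hypothetical hard cap on billed input tokens

    def run(self, max_ticks: int) -> tuple[int, int]:
        """Return (ticks_completed, tokens_billed), stopping before the budget is exceeded."""
        context = self.base_context
        billed = 0
        for tick in range(max_ticks):
            # Every tick re-sends the whole accumulated context as input.
            if billed + context > self.token_budget:
                return tick, billed
            billed += context
            context += self.growth_per_tick
        return max_ticks, billed


def unthrottled_tokens(base: int, growth: int, ticks: int) -> int:
    # Closed form of sum(base + k * growth for k in range(ticks)): quadratic in ticks.
    return ticks * base + growth * ticks * (ticks - 1) // 2


# Hourly heartbeat for three days: no cap vs. a 2M-token budget.
print(unthrottled_tokens(30_000, 2_000, 72))
print(HeartbeatSimulation(30_000, 2_000, 2_000_000).run(max_ticks=72))
```

Under these assumptions, 72 hourly ticks with no cap bill 7,272,000 input tokens, while the budgeted run stops after 32 ticks at 1,952,000. In practice you would enforce a cap like this through your provider's spend limits or your own wrapper, alongside a sensible heartbeat schedule.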
Hermes Agent: The Memory Heavyweight
Hermes, built by Nous Research, takes statefulness even further. It is a self-improving agent with a built-in learning loop. It creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations via FTS5 full-text search, and builds a deepening model of who you are across sessions using Honcho dialectic user modeling.
To keep its persistent goals and memory syncs alive, Hermes carries dense, long-running context. It can pair with external systems like Obsidian for knowledge management. It has a built-in cron scheduler that runs unattended automations. All of this means Hermes inherently processes far more tokens over time than a single-session tool.
Interestingly, Hermes explicitly positions itself as an OpenClaw alternative. It offers a hermes claw migrate command that imports your OpenClaw settings, memories, skills, and API keys. This is a market signal: when a competitor builds a migration path specifically targeting another tool's users, it often means they see a vulnerability. In this case, that vulnerability is cost and architecture.
OpenCode: The Stateless Task Executor
OpenCode functions like a command-line utility. You spin it up and give it a project task; it dynamically loads the skill it needs, writes the fix, and finishes. It does not maintain an expensive, always-on background presence. It handles tasks in isolated, tightly scoped sessions.
You never pay for idle thinking. No heartbeat loops. No persistent-memory token accumulation. No background cron jobs burning through your API credits while you sleep.
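The stateless advantage can be sketched with a simple growth model: a stateless tool starts every task from a fresh context and discards it afterwards, while a single long-lived context carries every earlier task's history forward. This is a simplified upper-bound comparison. It assumes a stateful agent never compacts or summarizes its memory (real ones do), ignores idle heartbeat costs entirely, and uses a hypothetical workload.

```python
def session_input_tokens(base: int, growth: int, turns: int) -> int:
    """Billed input tokens for one session whose context grows by `growth` each turn."""
    return turns * base + growth * turns * (turns - 1) // 2


def stateless_total(tasks: int, turns_per_task: int, base: int, growth: int) -> int:
    # Each task starts from a fresh context and is discarded afterwards.
    return tasks * session_input_tokens(base, growth, turns_per_task)


def stateful_total(tasks: int, turns_per_task: int, base: int, growth: int) -> int:
    # One long-lived context carries every previous task's history forward (no compaction).
    return session_input_tokens(base, growth, tasks * turns_per_task)


# Hypothetical workload: 20 tasks of 10 turns, 30k-token base, 2k tokens added per turn.
print(stateless_total(20, 10, 30_000, 2_000))
print(stateful_total(20, 10, 30_000, 2_000))
```

For 20 ten-turn tasks, the isolated sessions bill 7,800,000 input tokens and the one continuous context bills 45,800,000. Compaction narrows that gap in real stateful agents, but isolation still keeps each task's cost bounded by its own length.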
| Dimension | OpenCode | OpenClaw | Hermes |
|---|---|---|---|
| Architecture | Stateless | Stateful | Stateful |
| Primary use case | Task execution | Personal assistant | Self-improving agent |
| Idle cost | Zero | Heartbeat loops | Memory syncs plus cron |
| Skill loading | On demand | Context window | Context window |
| Memory model | Session only | Persistent | Persistent plus self-improving |
| Best for | Coding, bug fixes, tasks | Life automation | Research, continuous agents |
Flat-Rate Arbitrage via OpenCode Go
There is another reason OpenCode comes out ahead on cost, and it is not about architecture at all. It is about how you pay for the models.
If you use token-based APIs, costs are unpredictable. You do not know how many turns the agent will take, how large the context will grow, or whether a loop will burn through your credits overnight.
OpenCode Go is a flat-rate subscription: $5 for your first month, then $10 per month. It gives you access to a curated set of open-source coding models including DeepSeek V4 Pro, GLM-5, Kimi K2.5, Qwen3.7 Plus, and MiniMax M2.5. At $10 per month, you get roughly $60 of usage value.
Pair lazy loading with flat-rate pricing and your operational costs become effectively fixed, within the plan's usage limits. You know exactly what you will pay next month. No surprises, no runaway bills, no waking up to find your agent has been in a loop for six hours.
- $5 first month, $10/month ongoing. Flat rate regardless of model choice.
- 14 open-source coding models. Includes DeepSeek V4 Pro, GLM-5.1, Kimi K2.6, Qwen3.7 Max.
- ~$60/month in usage value. Roughly 3,000 to 30,000+ requests per month depending on model.
- Zero retention. Providers follow a no-data-retention policy. Your code is not used for training.
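A quick break-even calculation shows how flat-rate pricing interacts with context size. The $10 fee and ~$60 allowance come from the plan description above; the metered prices ($0.50 and $2.00 per million input and output tokens), the 1,000 output tokens per request, and valuing the allowance at those same rates are hypothetical. Actual Go usage limits are set by the plan, not by this formula.

```python
FLAT_FEE = 10.00         # $ per month, from the plan description above
USAGE_ALLOWANCE = 60.00  # ~$ of usage value per month, from the plan description above


def per_request_cost(input_tokens: int, output_tokens: int,
                     input_price: float, output_price: float) -> float:
    """Pay-per-token cost of one request; prices are in $ per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def break_even_requests(cost_per_request: float, flat_fee: float = FLAT_FEE) -> float:
    # Monthly request count at which metered spending equals the flat fee.
    return flat_fee / cost_per_request


def requests_within_allowance(cost_per_request: float,
                              allowance: float = USAGE_ALLOWANCE) -> int:
    # Requests that fit in the allowance if it were valued at these hypothetical rates.
    return int(allowance / cost_per_request)


# Hypothetical metered prices: $0.50 per M input tokens, $2.00 per M output tokens.
for label, context in [("lazy loading", 30_000), ("full injection", 75_000)]:
    cost = per_request_cost(context, 1_000, 0.50, 2.00)
    print(f"{label:>14}: ${cost:.4f}/request, "
          f"break-even at ~{break_even_requests(cost):.0f} requests, "
          f"~{requests_within_allowance(cost)} requests within the allowance")
```

With these assumed prices, a 30,000-token request costs $0.0170, so metered spending passes $10 at about 588 requests a month; a 75,000-token request costs $0.0395 and passes it at about 253. A smaller context also stretches a fixed allowance further, which is why lazy loading and flat-rate pricing reinforce each other.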
Which One Should You Use?
None of this means OpenClaw or Hermes are bad tools. They are not. They are solving different problems with different architectures, and those architectures carry different cost profiles.
If you need an always-on personal assistant that manages your calendar, processes your email, and responds to you on WhatsApp while you are out walking your dog, OpenClaw is probably the right tool. Just monitor your usage.
If you need an agent that learns from experience, improves its own skills, and builds a deepening model of your preferences across weeks of interaction, Hermes is likely the fit.
But if you need to ship code, fix bugs, and execute discrete tasks without paying for idle context you are not using, OpenCode is the clear winner on cost. The architecture makes the difference. Lazy loading amplifies it. Flat-rate pricing locks it in.
Pick the tool that matches your actual use case. Just know what you are paying for.
Frequently Asked Questions
Does lazy loading really save that much money?
It depends on scale. With a small number of skills, the savings are noticeable but modest. With many skills, the difference compounds because you avoid paying for hundreds of idle skill files on every turn. At small scale, the architecture choice matters more than lazy loading; at large scale, both matter.
Can I use OpenCode Go with other providers?
Yes. OpenCode Go is optional and works alongside any other provider you configure. It is an add-on, not a replacement. You can use Go for most tasks and switch to a different provider when you need a model not included in the Go catalog.
Is OpenClaw bad because it costs more?
No. OpenClaw is excellent at what it does. The cost reflects the architecture: always-on personal assistants that manage your life require persistent context and regular check-ins. That costs tokens. If you need that functionality, the cost is justified. Just understand what you are paying for and monitor your usage.
What about Hermes migration from OpenClaw?
Hermes offers hermes claw migrate for importing settings, memories, skills, and API keys from OpenClaw. If you are already running OpenClaw and are curious about Hermes, this makes switching straightforward. But the architecture remains stateful on both sides, so the fundamental cost profile does not change dramatically.