Tags: openclaw · cost-optimization · automation · claude · productivity

How to Stop Your OpenClaw Bot from Burning a Hole in Your Pocket

Jonathan Shachar
5 min read

Running an AI agent is a lot like hiring a brilliant intern who has never seen a credit card bill. Give them no guardrails, and they'll spend your entire budget "researching" the most expensive way possible to check your email.

I found this out the hard way. When I first set up Donna (my AI assistant), I was so focused on getting the configuration right that I didn't notice I was hemorrhaging cash. Two weeks in, I was spending over $250 a day in API tokens.

The frustrating part: I wasn't getting anything close to $250 worth of output. The bot was burning tokens on context rewrites, running the wrong models for the wrong jobs, and chatting away when it should have been quiet.

I had to stop thinking like a user and start thinking like an operator. Once I did that, costs dropped and output went up. Today I spend around $50 a day and get the output of two full-time employees.

Here's what changed.

The model-per-topic setup

My Telegram workspace with topics — one model per department

One of the most effective changes I made was moving away from a single catch-all chat and putting Donna in a private Telegram group with topics enabled. Each topic runs a different model.

My product updates topic just fetches data and sends a notification — it runs on Claude Haiku, the cheapest option. My marketing topic does creative work, so it gets Sonnet. Anything requiring serious reasoning stays with Opus.

The result: I stopped paying Opus prices for work that Haiku handles fine. The quality didn't drop. The cost did.
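In config terms, the routing looks roughly like this. A sketch only: the `telegram.topics` nesting and the exact model IDs are assumptions, so check the schema your OpenClaw version actually uses.

```json5
{
  // One model per Telegram topic -- route cheap work to cheap models.
  telegram: {
    topics: {
      "product-updates": { model: "anthropic/claude-haiku" },  // fetch + notify
      "marketing":       { model: "anthropic/claude-sonnet" }, // creative work
      "strategy":        { model: "anthropic/claude-opus" },   // serious reasoning
    },
  },
}
```

The point isn't the exact keys; it's that model choice becomes a per-topic setting instead of a global one.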

Fix your caching first

If there's one change that has the highest impact for the least effort, it's prompt caching — and it's one that OpenClaw claims to handle automatically but often doesn't.

Every time your bot runs, it sends its full context to the model. Without caching, you pay full input price every time. With caching, the provider keeps the context in memory, and subsequent reads cost about 10x less.

The fix: set cacheRetention: "long" explicitly on every Anthropic model in your config. Don't rely on the automatic setting — fleet analysis shows it's unreliable on a meaningful percentage of instances.
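Concretely, that means something like the following in your model config. The surrounding structure is a sketch and may differ across OpenClaw versions; the line that matters is `cacheRetention`.

```json5
{
  models: {
    "anthropic/claude-opus": {
      cacheRetention: "long",   // explicit 1-hour cache window -- don't trust "auto"
    },
    "anthropic/claude-sonnet": {
      cacheRetention: "long",
    },
  },
}
```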

With "long" caching and a 30-minute heartbeat: one cache write per hour, cheap reads for every check-in within that window. Without it: 48 full-price context sends a day, since every check-in pays the uncached input rate. That difference alone accounts for a large share of the 70% reduction I saw.
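Here's a back-of-envelope sketch of that difference. The $3-per-million base price, the 2x write multiplier, and the 20k-token context are all assumptions for illustration; the 10x-cheaper reads figure is the one from above. It also assumes the cache stays warm all day, since each 30-minute check-in lands inside the 1-hour window and refreshes it.

```python
# Illustrative prices -- not a quote from any provider's price sheet.
BASE = 3.00 / 1_000_000   # $ per input token (Sonnet-class, assumed)
READ = BASE * 0.1         # cached reads ~10x cheaper (per the text)
WRITE = BASE * 2.0        # long-retention cache writes cost extra (assumed 2x)

context_tokens = 20_000   # hypothetical system prompt + tool definitions
beats_per_day = 48        # one heartbeat every 30 minutes

# No caching: every heartbeat re-sends the full context at base price.
no_cache = beats_per_day * context_tokens * BASE

# Long caching, cache kept warm: one write, then cheap reads all day.
with_cache = context_tokens * WRITE + (beats_per_day - 1) * context_tokens * READ

print(f"uncached: ${no_cache:.2f}/day, cached: ${with_cache:.2f}/day")
```

Even with generous assumptions the uncached path costs several times more, and the gap grows with context size.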

Replace multiple cron jobs with a single heartbeat

I had three cron jobs running in parallel — email, calendar, and notifications — each firing every 15 minutes. That's 12 API calls per hour, each starting its own fresh session with no shared context.

Switching to a single heartbeat that handles all three cut that to 2 calls per hour. The heartbeat is also a better setup than cron for monitoring tasks because it has full session context — decisions are more coherent.

The rule I use: if it's monitoring or awareness, heartbeat. If it's a scheduled deliverable (a weekly digest, a report), cron.
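As a sketch, the consolidation looks something like this. The field names and interval syntax are assumptions, not the exact OpenClaw schema; adjust to whatever your version expects.

```json5
{
  heartbeat: {
    every: "30m",   // one shared session: 2 calls/hour instead of 12
    // The heartbeat prompt lists what each check-in covers --
    // email, calendar, notifications -- all in one context.
  },
  cron: {
    jobs: [
      // Cron stays only for scheduled deliverables:
      { schedule: "0 9 * * MON", task: "Write the weekly digest" },
    ],
  },
}
```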

Keep your config files lean

Every character in your AGENTS.md and TOOLS.md files costs tokens on every single turn. I've seen people stuff entire procedures into their main config file — that's a mistake.

The better approach is SKILL.md files. They're lazy-loaded, meaning they cost nothing until the bot actually needs them. Keep the main config file as a routing layer and move the detail into skills.

At 20 skills, the compact listing alone runs about 1,250 tokens per turn. That's before any skill content loads. It adds up.
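A hypothetical layout makes the split concrete. The directory and skill names here are invented for illustration; the idea is that the main files stay a thin routing layer while each procedure lives in its own lazy-loaded skill.

```
workspace/
├── AGENTS.md          # routing only: who Donna is, where to find things
├── TOOLS.md           # one line per tool, no procedures
└── skills/
    ├── email-triage/
    │   └── SKILL.md   # full procedure -- costs nothing until invoked
    └── weekly-report/
        └── SKILL.md
```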

Context pruning — worth knowing about

Every tool result your agent generates — file reads, email fetches, search results — accumulates in the context window. Without cleanup, these stack up and push sessions toward compaction faster than they need to.

Setting contextPruning.ttl: "1h" trims old tool results before each model call. The default is "5m", which is too aggressive — it causes the context to shift constantly, which triggers unnecessary cache writes. Matching the TTL to Anthropic's 1-hour cache window keeps things stable.
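The setting itself, as a sketch (the exact nesting depends on your config version):

```json5
{
  contextPruning: {
    ttl: "1h",   // match the 1-hour cache window; the "5m" default churns the cache
  },
}
```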

The one-word heartbeat

When your bot checks in and has nothing to do, it should say HEARTBEAT_OK and nothing else. That's a specific signal to OpenClaw to end the session immediately.

If it writes "Everything looks good, no new mail!" instead, you're paying for those words. On 48 heartbeats a day, that adds up faster than you'd think.
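One way to enforce this is a standing instruction in whatever file your heartbeat prompt reads from. A hedged example; adapt the wording to your setup:

```
If nothing needs attention, reply with exactly:
HEARTBEAT_OK
No summary, no pleasantries. Any extra words are billed tokens,
48 times a day.
```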


A note on real costs

Let's be honest about something: running OpenClaw for work that actually matters requires capable AI models. You're not going to get human-grade judgment out of a $3-per-million-token model.

Once you're running Sonnet or Opus for serious tasks, you're looking at $20-80 a day depending on usage. That's not a SaaS subscription — it's closer to what you'd spend on a part-time contractor's afternoon.

The comparison that makes sense isn't "OpenClaw vs. a $30/month tool." It's "OpenClaw vs. a second employee." I'm getting output that would take two full-time people to match, for a fraction of what I'd pay them. That math works. But it only works if you're pointing the agent at things that are worth it — complex work, client-facing tasks, decisions with real stakes.

For low-value automation, the numbers don't add up. For the work that moves the needle, they do.


Everything in this post is configured out of the box on MoltBot Ninja. If you'd rather skip the setup and get straight to the part where it works, that's the shortcut.

