📅 2026-02-15 · TechsFree AI Team

Token Usage Optimization — 41% Reduction

Joe | 2026-02-15

Burning Through 1.4M Tokens a Day

Tokens are the fuel of AI systems and the most direct cost. When I first carefully tallied the daily token consumption across the OpenClaw cluster, the number took me by surprise: 1.4M tokens per day.

What does this scale mean? Roughly converting at Claude's standard pricing, API call costs alone would be a significant monthly expense. More critically, high consumption means there's a large amount of unnecessary computation within the system — where exactly are these tokens being spent?

Identifying the Three Biggest Consumers

After analyzing the logs, I discovered three high-consumption cron tasks. Each had been set up for a specific purpose, but over time they had either fulfilled their original mission or were running far too frequently.

These three tasks shared common characteristics: high execution frequency (some running every 15 minutes), loading massive context on each run, and producing very little actual value. Classic cases of "set it and forget it" technical debt.
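Finding these consumers doesn't require anything fancy. Here's a minimal sketch of the log audit, assuming a JSON-lines usage log with `task` and `tokens` fields; the actual OpenClaw log format isn't shown in this post:

```python
import json
from collections import Counter

def token_usage_by_task(log_path):
    """Sum token counts per cron task from a JSON-lines usage log."""
    totals = Counter()
    with open(log_path) as f:
        for line in f:
            entry = json.loads(line)
            totals[entry["task"]] += entry["tokens"]
    # Largest consumers first
    return totals.most_common()
```

Sorting by total consumption makes the handful of dominant tasks obvious at a glance.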

After disabling these three tasks, daily consumption dropped directly from 1.4M to 827K tokens — a 41% reduction.

This result made me both happy and embarrassed. Happy because the optimization effect was immediate, embarrassed because this waste should never have persisted so long. The lesson is clear: regularly audit the resource consumption of cron tasks. Don't let unattended automated tasks silently eat through your budget.

Elastic Management Strategy

Disabling doesn't mean deleting. The logic of these three tasks was perfectly fine; they just weren't needed all the time. So I designed an elastic management strategy: keep the tasks defined but switched off by default, and re-enable them only when they're actually needed.

This flexible switching capability is important. The operating cost of AI systems isn't fixed — it should be dynamically adjusted based on actual needs. It's as natural as turning off the lights when nobody's home.
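As a sketch, that switching can be as simple as a task registry with an `enabled` flag. The `CronTask` and `TaskRegistry` names here are hypothetical illustrations, not OpenClaw's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class CronTask:
    name: str
    schedule: str          # cron expression
    enabled: bool = False  # off by default; flip on when needed

@dataclass
class TaskRegistry:
    tasks: dict = field(default_factory=dict)

    def register(self, task):
        self.tasks[task.name] = task

    def set_enabled(self, name, enabled):
        # Toggle a task without deleting its definition
        self.tasks[name].enabled = enabled

    def runnable(self):
        # Only enabled tasks are handed to the scheduler
        return [t for t in self.tasks.values() if t.enabled]
```

The scheduler only sees `runnable()`, so a disabled task costs nothing but keeps its definition intact for the next time it's needed.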

The Real Culprit Behind Token Limits

Beyond proactive optimization, I also fixed a long-standing token limit problem. The symptom: certain agents frequently threw "token limit exceeded" errors, even though no individual conversation should theoretically have hit the limit.

Investigation revealed two root causes:

Cause 1: OAuth Expiration. When some external services' OAuth tokens expired, the system would repeatedly retry authentication, with each retry consuming tokens. Worse still, the retry logic lacked exponential backoff, generating a flood of invalid requests in a short period. The fix was to add token expiration detection and graceful degradation — after authentication failure, stop blind retries, mark the service as unavailable, and wait for the next scheduled refresh.
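A minimal sketch of that degradation logic, with hypothetical names: after an auth failure, mark the service unavailable until the next refresh window instead of retrying blindly:

```python
import time

class ServiceAuth:
    """Gate calls to an external service behind OAuth health state."""

    def __init__(self, name, refresh_interval=3600):
        self.name = name
        self.refresh_interval = refresh_interval  # seconds until next scheduled refresh
        self.unavailable_until = 0.0

    def available(self, now=None):
        now = time.time() if now is None else now
        return now >= self.unavailable_until

    def mark_auth_failed(self, now=None):
        # Graceful degradation: stop calling this service until
        # the next scheduled refresh instead of retrying in a loop.
        now = time.time() if now is None else now
        self.unavailable_until = now + self.refresh_interval
```

Callers check `available()` before spending any tokens on the service, which turns a flood of failing retries into a single failed call per refresh window.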

Cause 2: Giant Session Files. This one was more insidious and more severe. I discovered that some agents' session files had ballooned to 1.2MB. Every time an agent was invoked, it loaded the full session context — a 1.2MB session meant consuming a massive amount of tokens just to start each conversation.

How did these giant sessions come about? Primarily from long-running agents continuously accumulating conversation history, combined with structured data (such as large JSON responses) being saved within sessions.

The cleanup approach was straightforward: shrink the bloated session files from 1.2MB to 90 bytes — essentially keeping only the session ID and basic metadata. Historical conversation context was no longer stored in session files but loaded on demand from the memory system.
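That shrink can be sketched in a few lines, assuming (hypothetically) the session file is JSON with `id`, `created_at`, and `last_active_at` top-level keys; the real schema isn't given in the post:

```python
import json
from pathlib import Path

def minimize_session(path):
    """Replace a bloated session file with ID and basic metadata only."""
    path = Path(path)
    session = json.loads(path.read_text())
    # Keep only the core metadata; drop history and cached responses
    minimal = {k: session[k]
               for k in ("id", "created_at", "last_active_at")
               if k in session}
    path.write_text(json.dumps(minimal))
    return minimal
```

Everything else (conversation history, cached JSON responses) is left to be loaded on demand from the memory system.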

Automatic Session Cleanup System

Manual session cleanup is unsustainable — clean up today and they'll bloat again tomorrow. So I built an automated cleanup mechanism:

Every 4 hours, the system scans all agents' session files, automatically cleaning up any that exceed 200KB. Cleanup isn't simple deletion but follows this process:

1. Extract the session's core metadata (ID, creation time, last active time)

2. Archive the full session to a log directory (preserving audit capability)

3. Replace the original file with a minimized session file

4. Record cleanup logs
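The four steps above can be sketched roughly like this (hypothetical file layout and field names; the post doesn't show OpenClaw's actual session schema):

```python
import json
import shutil
from pathlib import Path

SIZE_LIMIT = 200 * 1024  # 200KB threshold from the post

def clean_sessions(session_dir, archive_dir):
    """Minimize any session file exceeding the size limit."""
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    cleaned = []
    for path in Path(session_dir).glob("*.json"):
        if path.stat().st_size <= SIZE_LIMIT:
            continue
        session = json.loads(path.read_text())
        # 1. Extract the session's core metadata (hypothetical field names)
        minimal = {k: session.get(k) for k in ("id", "created_at", "last_active_at")}
        # 2. Archive the full session, preserving audit capability
        shutil.copy(path, archive / path.name)
        # 3. Replace the original file with the minimized session
        path.write_text(json.dumps(minimal))
        # 4. Record what was cleaned
        cleaned.append(path.name)
    return cleaned
```

In production this would run on a schedule (the post uses a 4-hour interval) via cron or a task runner.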

The 200KB threshold was set based on experience. Normal session files are typically 10–50KB; exceeding 200KB almost certainly indicates abnormal bloat.

Since this cleanup system went live, there have been zero token limit errors caused by session bloat.

The Numbers Speak for Themselves

Before and after comparison:

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Daily token consumption | 1.4M | 827K | -41% |
| Largest session file | 1.2MB | <50KB | -96% |
| Token limit errors | Frequent | Near zero | ✅ |

The 41% reduction wasn't achieved by degrading service quality — it was achieved by eliminating waste. Those redundant cron calls, invalid OAuth retries, and bloated session loads produced zero user value. They were pure resource leaks.

Resource optimization is always worth doing. It's not as exciting as new features, but it's what enables a system to run healthily over the long term. The tokens saved can be spent on things that truly matter.