
A Developer Cut Claude's Token Use by 75% — With Broken English


AI Overview

  • Replacing full English sentences with stripped, telegraphic prompts reduces Claude token usage by up to 75%.
  • The technique — nicknamed 'caveman talk' — works because LLMs process meaning, not grammar.
  • Claude's new daily usage caps make token efficiency a practical concern, not just a cost optimization.
  • Tool-first prompting (ask Claude to use a tool instead of explaining) compounds the savings further.

A developer cut the token usage of Anthropic's Claude by up to 75% by instructing the AI to communicate like a "caveman." The prompting technique, shared by a Reddit user, involves using short, 3-6 word sentences and eliminating all filler, directly addressing growing concerns over Claude's rising operational costs and recent usage limit reductions. The approach surfaces as Anthropic faces user backlash over new usage caps and changes to its subscription model, making efficient token use more critical than ever. Users are hitting limits "way faster than expected," especially on paid accounts, according to the BBC.

Optimizing Claude's Token Consumption

The developer's "caveman" prompting method relies on a few straightforward rules to minimize output length and, consequently, token consumption. These rules include using short sentences, typically between three and six words, and strictly avoiding any pleasantries or preambles. Additionally, the technique mandates running tools first, displaying results immediately, and then stopping without any additional narration. For instance, instead of "I will fix the code," the AI is instructed to say "Me fix code." This directness strips away unnecessary linguistic overhead.
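The rules above can be sketched mechanically. The following is a minimal illustration of the "no pleasantries" rule, not the Reddit user's actual tooling; the filler list and the `cavemanize` function name are assumptions:

```python
import re

# Hypothetical filler phrases to strip, per the "no pleasantries" rule.
# Multi-word phrases come first so they match before their fragments.
FILLER = [
    r"\bcould you please\b", r"\bwould you mind\b", r"\bplease\b",
    r"\bkindly\b", r"\bi would like you to\b", r"\bthank you\b",
]

def cavemanize(prompt: str) -> str:
    """Strip pleasantries and collapse whitespace to shorten a prompt."""
    out = prompt.lower()
    for pattern in FILLER:
        out = re.sub(pattern, "", out)
    # Collapse the gaps left behind by removed phrases.
    return re.sub(r"\s+", " ", out).strip()

print(cavemanize("Could you please summarize this document"))
# -> "summarize this document"
```

Actual token counts depend on the model's tokenizer, so a filter like this approximates the spirit of the rules rather than guaranteeing the 75% figure.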

The impact of this method is significant. A typical web search task that might consume around 180 tokens with standard prompting can drop to approximately 45 tokens, a 135-token saving per task, or a 75% reduction. Each "grunt swap" (replacing verbose AI output with a concise phrase) saves between 6 and 10 tokens. Over an entire task, this can add up to a 50-100 token saving, yielding a substantial 50-75% overall burn reduction when combined with a "tool-first" execution strategy.
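The reported figures are internally consistent, as a quick check shows:

```python
# Figures reported in the article for a typical web search task.
standard_tokens = 180   # verbose prompting
caveman_tokens = 45     # "caveman" prompting

saving = standard_tokens - caveman_tokens
reduction = saving / standard_tokens

print(f"{saving} tokens saved per task, a {reduction:.0%} reduction")
# -> "135 tokens saved per task, a 75% reduction"
```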

Why Efficient Prompting Matters Now

The push for token efficiency comes amid significant shifts in Anthropic's service model. The company recently imposed stricter usage limits on its Claude chatbot, frustrating many devoted users. Users, including those on paid subscriptions, now encounter usage caps far more often than before, disrupting their workflows, particularly for programming and coding tasks in Claude Code.

Further complicating matters, Anthropic has stopped allowing its Claude subscriptions to integrate with third-party AI agent platforms like OpenClaw. This decision stems from the "outsized strain" such usage places on Anthropic's compute resources, according to Business Insider Africa. Users of these platforms must now switch to a pay-as-you-go model or use a separate API key, which charges per token rather than offering the flat-rate access previously available with Pro and Max plans. This effectively means higher costs for heavy users, making every token count.

Research Sources

bbc.com

What This Means For You

1. Write prompts like a caveman

Strip conjunctions, articles, and filler. 'Summarize doc, bullet points, no intro' uses a fraction of the tokens of 'Could you please summarize this document using bullet points without an introduction?'

2. Front-load the task instruction

Put the action verb first. The sooner Claude hits the command, the less context it burns parsing preamble.

3. Build token budgets into team prompts

Set internal guidelines capping prompt preamble at two sentences. Token bloat in team workflows compounds at scale and hits usage walls faster.
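A two-sentence preamble cap can even be checked mechanically in a shared prompt library. The sketch below is a rough heuristic of my own, not an established tool; the function names and the colon-as-task-delimiter convention are assumptions:

```python
def preamble_sentences(prompt: str, delimiter: str = ":") -> int:
    """Roughly count sentences before the first task delimiter."""
    preamble = prompt.split(delimiter, 1)[0]
    terminals = sum(preamble.count(ch) for ch in ".!?")
    # An unpunctuated preamble still counts as one sentence.
    return terminals or (1 if preamble.strip() else 0)

def within_budget(prompt: str, max_sentences: int = 2) -> bool:
    """Enforce the suggested two-sentence preamble cap."""
    return preamble_sentences(prompt) <= max_sentences

print(within_budget("Summarize doc: bullet points, no intro"))  # True
print(within_budget("We are a team. We value docs. We need help. Summarize: x"))  # False
```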

FAQ

What is caveman prompting?

Caveman prompting is a technique where you strip grammar, filler words, and sentence structure from AI prompts, writing like a telegram rather than a full sentence. The LLM understands the intent without processing the overhead, reducing token consumption by up to 75%.

Why does it work?

LLMs process meaning, not grammar. Words like 'please', 'could you', and 'I would like you to' add tokens without adding meaning. Stripping them cuts consumption without degrading output quality.

Is the technique still relevant under the new usage caps?

Yes; it's more relevant now than ever. Claude's daily usage caps make token efficiency a hard constraint, not just a cost optimization. Users applying this technique report staying under caps that previously interrupted their workflows.

What is tool-first prompting?

Tool-first prompting means asking Claude to use a specific tool (search, code execution, calculator) instead of asking it to explain or reason through a task. It compounds token savings because the tool does the work with minimal back-and-forth.
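In API terms, a tool-first request pairs a tool definition with a terse instruction to use it and stop. The payload below is a sketch following the general shape of Anthropic's Messages API; the model id, tool definition, and wording are illustrative assumptions, not the Reddit user's exact setup:

```python
# Sketch of a tool-first request payload (illustrative assumptions throughout).
request = {
    "model": "claude-sonnet-example",   # placeholder model id
    "max_tokens": 256,
    "tools": [{
        "name": "web_search",
        "description": "Search web, return results",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }],
    # Caveman-style instruction: run tool, show results, stop. No narration.
    "messages": [{
        "role": "user",
        "content": "Search: Claude usage caps. Show results. Stop.",
    }],
}
```

The savings come from both ends: the terse instruction spends fewer input tokens, and "Show results. Stop." discourages the model from narrating around the tool output.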
