Goodbye to tokenmaxxing and hello to token efficiency

Fresh developments suggest a reset in the race for superiority in artificial intelligence.

Unlike earlier digital technologies, AI services are not close to being free or marginally cheaper.

PHOTO: ST FILE

Tiger Tong

2026 may be the first year the tangible effects of AI are showing up in everyday work. For the past several years, the dominant assumption in the AI industry was simple: More computing power, more data and larger models would produce better results. Success belonged to those able to leverage the power of AI and willing to spend the most.

For that reason, tokenmaxxing – namely, the idea that employees should use as many tokens as possible – has become a buzzword among tech companies. Tokens are the basic unit that AI companies use to bill customers.

While OpenAI, Anthropic and many other AI providers offer individual subscriptions, the bulk of AI revenue comes not from retail users but from enterprises that pay huge baseline subscription fees, plus variable usage-based rates, for tokens. Some tech companies even set up monitoring systems to track how aggressively their staff consume tokens.

That logic is still visible today. Nvidia CEO Jensen Huang estimated that a 1GW AI data centre will cost US$80 billion (S$103.7 billion) to US$100 billion, up from US$50 billion to US$60 billion today. Despite this, the industry’s appetite for computing power continues to grow.

Daily US token consumption has grown 100-fold over the past two years, while China’s has increased 1,000-fold over the same period, from a lower base. All this is excellent news for hardware vendors, but bodes poorly for AI users.

Nvidia CEO Jensen Huang introducing laptop models using RTX Spark GPUs during a keynote speech on the sidelines of the Computex trade show in Taipei on June 1.

PHOTO: REUTERS

Enter token-minimising

Another narrative is gaining traction: how efficiently, and how cheaply, businesses can use AI. In May, Uber reported blowing through its entire 2026 AI budget in just four months. The company has since set new limits on AI use as it recalibrates, as have Amazon, Meta and Walmart.

Reality is biting. Unlike earlier digital technologies, AI services are not close to being free or marginally cheaper. Every query, every task and every agent carries a computational cost and a bill. And despite a massive expansion of AI data centres, the cost of using frontier models is not dropping.

When a company spends a few dollars a month on AI, price differences matter little. But when AI becomes embedded across an entire organisation, the difference between paying US$5,000 and US$50,000 for the same amount of work becomes much harder to ignore.

Such developments point to a new phase in the AI race. While the first stage was marked by the pursuit of frontier intelligence, the second may be about efficiency and maximising value from AI use. This is where a second development becomes important.

Chinese AI firms have increasingly focused on delivering alternatives that might not be the most cutting-edge and powerful in the world, but are significantly cheaper while remaining fit-for-purpose for many commercial applications.

Top US models like Claude Opus and OpenAI GPT are priced more than 10 times higher than some Chinese models, yet the gap in reasoning power between these two groups might not be as wide for most AI applications. On everyday text reasoning, non-complex coding and most business applications, Chinese models are not far behind their American counterparts.

US AI labs do provide tiered models. Anthropic has Opus, Sonnet and Haiku, ranging from the most capable to the most efficient, but these still lose to Chinese open-source models on cost-efficiency.

Good enough

For many companies, advanced AI models are a plus but are not necessary for most everyday operations. What is more likely is that enterprises will experiment with a mixed-usage model – using frontier US models for high-value planning, complex reasoning and difficult coding tasks, while routing the bulk of execution (in summarisation, translation, content filtering, batch processing and routine coding) to cheaper Chinese models.
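The arithmetic behind such a mixed-usage model can be sketched in a few lines. Every figure here is a hypothetical assumption chosen for illustration, not a quoted price: a frontier model at US$10 per million tokens, a cheaper model at US$1 per million tokens (a tenfold gap, in line with the "more than 10 times" difference cited above), and a company consuming 5,000 million tokens a month. The function name `blended_cost` is likewise made up for this example.

```python
def blended_cost(total_million_tokens, frontier_share, frontier_price, budget_price):
    """Monthly bill in US dollars when a share of tokens goes to the frontier model.

    Prices are in US dollars per million tokens. All inputs are hypothetical.
    """
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    frontier_tokens = total_million_tokens * frontier_share
    budget_tokens = total_million_tokens - frontier_tokens
    return frontier_tokens * frontier_price + budget_tokens * budget_price


# Assumed figures, for illustration only.
TOTAL_MILLION_TOKENS = 5_000   # monthly usage, in millions of tokens
FRONTIER_PRICE = 10.0          # US$ per million tokens (assumption)
BUDGET_PRICE = 1.0             # US$ per million tokens (assumption)

all_frontier = blended_cost(TOTAL_MILLION_TOKENS, 1.0, FRONTIER_PRICE, BUDGET_PRICE)
mixed = blended_cost(TOTAL_MILLION_TOKENS, 0.2, FRONTIER_PRICE, BUDGET_PRICE)
all_budget = blended_cost(TOTAL_MILLION_TOKENS, 0.0, FRONTIER_PRICE, BUDGET_PRICE)

for label, cost in [("all frontier", all_frontier),
                    ("20% frontier, 80% cheaper", mixed),
                    ("all cheaper", all_budget)]:
    print(f"{label}: US${cost:,.0f}")
```

Under these assumptions, sending only a fifth of the workload to the frontier model brings the monthly bill from about US$50,000 to about US$14,000, while the all-cheaper case comes to about US$5,000. The real saving depends on actual prices and on how much work truly needs frontier-level reasoning.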

How did the cost of Chinese AI models get so low? Part of the answer lies in making a virtue out of necessity. Chinese labs, severely constrained by technology sanctions, have had to develop creative workarounds and aim for “good enough” performance in order to maintain decent operating margins. Huawei’s AI chip architect Xia Jing estimated that DeepSeek V3’s inference cost was only US$0.14 per million tokens – less than the price DeepSeek charges.

Chinese chipmakers have also had to work around restrictions on the extreme ultraviolet lithography machines the country can import. That has resulted in less advanced but indigenously produced deep ultraviolet lithography machines being tested in Chinese foundries, along with new chip and transistor designs.

All these efforts will take time to prove themselves, but the relentless focus on efficiency could deliver further dividends, as AI systems upgrade their capabilities through recursive improvements. While Elon Musk expects a Chinese lab to reach the level of Fable 5, the best model in the world, by the first quarter of 2027, Tang Jie, one of Zhipu’s founders, argues it will not take that long.

Pursuit of broad-based adoption

A final note: The AI race is often framed in terms of China or the US winning. Such a zero-sum framing is unproductive as the goal should be wider adoption and lower costs, so as to spur innovation, enterprise development and broad-based growth.

If AI costs fall by 90 per cent while capability remains largely intact, adoption could spread from technology companies to accountants, lawyers, insurers, logistics firms and small and medium-sized enterprises. The focus therefore should not be on who builds the best model, but on how many workers find AI embedded in their daily workflows.

Getting costs down will be key, especially if commercial operations pivot towards agentic AI. A recent Gartner report projects that 40 per cent of enterprise applications will feature specialised AI agents by the end of 2026. Instead of merely answering prompts, these systems proactively plan, reason and execute intricate workflows.

Because such autonomous loops require continuous background reasoning and tool integrations, agentic deployments consume an estimated 20 times more tokens than standard chatbots. Commercial success therefore hinges on token efficiency: keeping the computational cost of iterative reasoning loops low enough to protect returns on investment.
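A rough sketch shows how the 20-times multiplier interacts with model pricing. Only the multiplier comes from the estimate above. The task volume, the tokens per task and both prices are hypothetical assumptions, as is the function name `monthly_cost`.

```python
AGENT_TOKEN_MULTIPLIER = 20  # estimate cited in the article


def monthly_cost(tasks, tokens_per_task, price_per_million, multiplier=1):
    """Monthly bill in US dollars for a given workload and per-token price."""
    tokens = tasks * tokens_per_task * multiplier
    return tokens / 1_000_000 * price_per_million


# Assumed workload and prices, for illustration only.
TASKS = 1_000_000        # tasks per month (assumption)
TOKENS_PER_TASK = 2_000  # tokens for a chatbot-style answer (assumption)
FRONTIER_PRICE = 10.0    # US$ per million tokens (assumption)
BUDGET_PRICE = 1.0       # US$ per million tokens (assumption)

chatbot_frontier = monthly_cost(TASKS, TOKENS_PER_TASK, FRONTIER_PRICE)
agent_frontier = monthly_cost(TASKS, TOKENS_PER_TASK, FRONTIER_PRICE,
                              AGENT_TOKEN_MULTIPLIER)
agent_budget = monthly_cost(TASKS, TOKENS_PER_TASK, BUDGET_PRICE,
                            AGENT_TOKEN_MULTIPLIER)

print(f"chatbot on frontier model: US${chatbot_frontier:,.0f}")
print(f"agent on frontier model:   US${agent_frontier:,.0f}")
print(f"agent on cheaper model:    US${agent_budget:,.0f}")
```

With these assumed numbers, moving from chatbot to agent on the same frontier model raises the bill from about US$20,000 to about US$400,000 a month. Running the agent on a model priced at a tenth of that rate brings it back to about US$40,000, which is still double the original chatbot bill.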

One thing remains clear: The earlier mantra of tokenmaxxing is giving way to one focused on token efficiency.

  • Tiger Tong advises on China strategies at Delorean Partners.
