What Building a Custom AI Image Model Taught Me About the Real Cost of Tokens

April 26, 2026


New technology has always changed how we work. But it rarely changes what good management looks like. If anything, AI has reminded me that strong fundamentals are always required: good people, solid process, and the right technology.

When I started building custom AI image models, I thought I would simply learn AI. What the work actually did was reinforce the principles of good resource management, business operations, and security operations, and then teach me to build AI on top of them. The technology was new. The lessons weren’t.

What Is a Token, and Why Should a Business Leader Care?

Let’s get this out of the way without the jargon.

In large language models, a token is roughly a word or part of a word. Every time you prompt an AI, you spend tokens — on your input and on the output it generates. Image generation works similarly: every render, every iteration, every failed attempt burns compute cycles. Whether you’re paying in dollars per API call or in electricity and GPU time on your own hardware, nothing is free. There is always a cost.
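To make that concrete, here is a rough sketch of the arithmetic in Python. Every number you would feed it (token counts, per-million-token prices, GPU wattage, electricity rate) is a placeholder you supply yourself, not a quote from any provider or a measurement from our hardware. The local figure also deliberately ignores the purchase price of the machine.

```python
def api_call_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Dollar cost of one API call, billed separately for input and output tokens.

    Prices are expressed per one million tokens; the values are assumptions
    supplied by the caller, not real provider pricing.
    """
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


def local_run_cost(gpu_watts, seconds, price_per_kwh):
    """Electricity cost of one local generation run.

    Counts only the power drawn by the GPU for the duration of the run.
    Hardware purchase, cooling, and staff time are left out on purpose.
    """
    kilowatt_hours = (gpu_watts / 1000) * (seconds / 3600)
    return kilowatt_hours * price_per_kwh


# Hypothetical inputs, for illustration only.
print(api_call_cost(1000, 500, 2.0, 8.0))
print(local_run_cost(400, 30, 0.15))
```

Neither number means much on its own. The point is that both can be written down, which is the first step toward managing them.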

Most organizations today are spending tokens the way employees used to make long-distance calls on a company phone, or the way cloud services get provisioned — without thinking about it until the bill arrives. And by then, the habits are already set.

The Image Model as a Business Case Study

A few years ago, Eight Buffalo Media Group invested in a Linux-based system equipped with a high-end GPU card to serve as our primary AI platform. I outlined part of that journey here. That single hardware decision changed everything about how we approached AI costs. It reduced internet bandwidth usage, gave us unlimited working hours on our own terms, and brought our token cost down to roughly the price of electricity. Open models on physical infrastructure meant we controlled the environment — and the feedback loop.

Building the publicly available 8Buff SDXL models, and the many custom models we never released, required iterative runs, parameter tuning, and an enormous number of failed generations before we got good ones. We generated tens of thousands of images and videos that were ultimately deleted. Each failure taught us something. And because we were running on our own infrastructure with open models, each failure cost us almost nothing compared to what the same experimentation would have cost in paid API credits.

But here’s what the process really taught us: you don’t truly understand what something costs until you’re the one directly paying for it.

Before we used any external API services — image-to-video generation, for example — we established a standard: our models had to follow prompts at least 75% of the time and produce images suitable for training data. That benchmark was measured subjectively at first by our own creators, and later validated by image review software that described image content and compared it against known prompts. It wasn’t a perfect science, but it gave us a quality gate. Nothing left our internal environment for an external paid service unless it had earned it.
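A quality gate like that can be sketched in a few lines of Python. This is not the tooling we actually ran; the `Review` record and the function names are hypothetical, and the only detail taken from our process is the 75% threshold. Whether an image "followed the prompt" is assumed to come from a human reviewer or from review software.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """One reviewed generation (hypothetical record layout)."""
    prompt: str
    followed_prompt: bool  # verdict from a human reviewer or review software


def adherence_rate(reviews):
    """Fraction of reviewed generations that followed their prompt."""
    if not reviews:
        return 0.0  # no evidence yet, so nothing has earned a pass
    return sum(r.followed_prompt for r in reviews) / len(reviews)


def passes_gate(reviews, threshold=0.75):
    """True when the model meets the prompt-adherence standard."""
    return adherence_rate(reviews) >= threshold


# Hypothetical batch: three of four generations followed the prompt.
batch = [
    Review("red barn at sunset", True),
    Review("buffalo in snow", True),
    Review("city street in rain", False),
    Review("portrait, studio light", True),
]
print(passes_gate(batch))
```

An empty batch fails the gate by design: a model with no reviews has not earned a trip to a paid service.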

That discipline kept our learning process cheap. More importantly, it gave us control of a base model we still use daily — even though it is now three years old. Which brings me back to something I’ve believed for a long time: good people, process, and technology will outperform the newest, fastest, and most expensive option. Every time.

The Three Hidden Token Costs Most Businesses Are Missing

As I wrote about in AI Won’t Simplify Your Work — It Will Intensify It, the biggest AI challenges aren’t technical. They’re managerial. Token waste is no different.

1. Talk to Humans First — Then Prompt

Overly long, unfocused system prompts that get repeated across every API call are one of the most common and invisible sources of token waste. They usually exist because the team hasn’t had a conversation about what they actually need the AI to do. A well-facilitated fifteen-minute team discussion can eliminate pages of redundant prompt engineering. Communication is still cheaper than compute.
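The arithmetic behind that waste is simple enough to sketch in Python. The function names and any token counts you plug in are hypothetical; measure your own prompts before drawing conclusions.

```python
def system_prompt_overhead(system_prompt_tokens, calls):
    """Total input tokens spent re-sending the same system prompt on every call."""
    return system_prompt_tokens * calls


def tokens_saved_by_trimming(tokens_before, tokens_after, calls):
    """Input tokens saved across all calls after shortening the system prompt."""
    if tokens_after > tokens_before:
        raise ValueError("trimmed prompt is longer than the original")
    return (tokens_before - tokens_after) * calls


# Hypothetical: a 1,200-token system prompt trimmed to 300 tokens, over 10,000 calls.
print(system_prompt_overhead(1200, 10_000))
print(tokens_saved_by_trimming(1200, 300, 10_000))
```

The saving scales with call volume, which is why a prompt nobody has reread in months can quietly become one of the larger items on the bill.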

2. Be Ruthless About Priorities

Overproduction is the silent killer of AI budgets. Generating ten variations when two would do isn’t creativity — it’s a planning gap. Before any generation happens, your team should be able to answer: what decision does this output support? If they can’t, the generation shouldn’t start. We learned this the hard way with thousands of deleted images.

3. Lead With Strategy, Not Validation Loops

Asking AI to check AI sounds efficient. In practice, it compounds cost fast and often compounds errors with it. Validation loops that substitute for human judgment — or worse, for a clear strategy — are expensive in tokens and even more expensive in the decisions they produce. AI rewards clarity. It punishes vague thinking with very polished, very wrong answers.

Token Cost Is Really a Judgment Cost

This is the point most budget conversations miss entirely.

The reason tokens get wasted isn’t a technology problem. It’s a strategy problem. Teams without clear goals generate endlessly. Leaders who haven’t defined what “good output” looks like let their people spin — and let the meter run.

If you’ve read anything I’ve written about the DIKW model — data, information, knowledge, wisdom — this is where it lands in practice. Data and information have never been cheaper to produce than they are right now. AI has made them practically free. But wisdom — knowing what to build, when to stop, and what’s actually worth spending on — that hasn’t gotten any cheaper at all. That still costs human judgment. And human judgment still requires leadership, strategy, and communication.

Token costs add up fast without a clear objective to build toward. Spending to build tools and services that directly support business objectives is not only necessary; it is also how you measure whether your AI investment is working. Spending to produce content no one will ever use is just a more expensive version of a problem that existed in every organization long before AI arrived.

What This Means for Your AI Budget Right Now

A few practical takeaways for leaders:

• Create low-cost environments for learning. Before your teams spend production tokens, give them a sandboxed space to experiment. Whether that’s local open-source models, a capped API environment, or structured internal pilots, people need to burn tokens cheaply before they burn them well.

• Audit your prompts the way you audit software licenses. Do you know what your teams are actually prompting? What use cases have emerged organically? You may be paying for capabilities you’re not using and missing workflows that are already delivering value.

• Set output standards before you set output volume goals. Define what “good” looks like for your use case before you measure how much you’re producing. Volume without a quality standard is just expensive noise.

• Treat token spend as a signal of strategic clarity. High, unfocused token usage isn’t an AI problem. It’s a leadership signal. It usually means your teams don’t have clear enough priorities to know what’s worth generating.

The Real Question

So what is your strategy for implementing AI? Getting everyone access is a necessary first step, but it is barely the start of the journey.

Talk to your leaders. Understand how their teams are actually using AI, what tools they’re building informally, and where current processes could be transformed rather than just accelerated. Encourage visibility into prompt usage to understand what problems your organization is really trying to solve.

And ask yourself honestly: have you started tracking what you’re spending per decision made with AI, not just per output? I am curious about what you’re seeing.
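If you want a starting point for that tracking, here is a minimal Python sketch of the two metrics side by side. The function names are hypothetical, and "decision" is assumed to mean whatever your organization agrees to count as a decision that an AI output actually informed.

```python
def cost_per_output(total_spend, outputs):
    """Average spend per generated output (image, draft, report, and so on)."""
    if outputs == 0:
        return None  # nothing was generated, so there is no average to report
    return total_spend / outputs


def cost_per_decision(total_spend, decisions):
    """Average spend per business decision the outputs actually supported."""
    if decisions == 0:
        return None  # money was spent, but no decision was informed by it
    return total_spend / decisions


# Hypothetical month: $500 of spend, 1,000 outputs, 20 decisions supported.
print(cost_per_output(500.0, 1000))
print(cost_per_decision(500.0, 20))
```

The gap between the two numbers is the useful part. Outputs can look cheap while the decisions they support turn out to be expensive.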

Written by Ben Tolen & edited by AI. Connect with me on LinkedIn.
