Briefing #38: AI is Entering the Goldilocks Zone
As AI work moves onto hardware you own, its cost becomes something you can steer.
“Can we just run it ourselves?”
I recently worked with a team that asked me this question point-blank. They were part of an organization that had gotten to a point where AI had moved from being an unknown threat to something their people had come to depend upon in the ordinary course of getting work done. Now, with growing unease, they had become aware that AI was becoming a real, rising cost.
Their unease is well justified. Leaders are watching the economy with one eye and their budgets with the other, and nobody wants to be the person defending an open-ended expense when the call comes to tighten up. AI is increasingly that expense. It arrives as a line on the books that grows with use, and use is climbing.
What makes it hard is the question underneath the question. This team wasn’t really asking how to lower their AI bill. They were implicitly asking how they would ever show that the bill was worth paying. I have written before that the returns that matter most from AI are often the ones that don’t show up in a single quarter’s numbers. That’s still true, but it’s cold comfort to a leader being asked this month what they got for last month’s spend.
Hardware is catching up
For most of the past two years, serious AI meant the cloud. You sent your work to someone else’s enormous data center and paid by the token for the privilege. That arrangement made sense, and for the most demanding work it still does. But it carried an unspoken assumption: that all of the work has to go there.
That assumption is starting to break. At Computex this year, NVIDIA introduced a chip it calls RTX Spark, and Microsoft is building it into a new Surface Laptop Ultra, both arriving this fall. The specs matter less than what they signal: a laptop in this class can run capable AI models on the device itself, with no round trip to the cloud. Reporting puts these machines at premium-laptop prices, so they aren’t for everyone yet. What they do tell us is the direction of travel. The hardware on desks and in bags is becoming good enough to do real AI work locally, and to do it reliably.
Certainly not all of it. A frontier reasoning task, or a sprawling multi-step agent, will still want the cloud for a while yet. But a growing share of everyday AI work, like the summarizing, drafting, classifying, and routine analysis that fills most people’s day, can increasingly run on hardware you own and control.
A leader’s job is to know which share that is.
An evolving question for AI leaders
This means there’s a new question AI leaders will need to ask and answer. Instead of “Which model is best?”, the question becomes “Where should my AI work go?”
It’s a question about routing rather than cost-per-token per se. Routing means choosing the right AI model for the right job, rather than the “best” model or the most expensive one. Once routing is a decision you make deliberately, it becomes the thing that actually governs what AI costs you. And it is a more granular decision than choosing a single model for an entire workflow or application. Modern AI workflows are stitched together from many steps, so each step can use a different model, chosen for the complexity of the step and how much compute it takes to get a good, reliable result.
Today, there are already workflows that can decide, step-by-step, what is best to run on device and what may get sent up to a frontier model in the cloud. As hardware becomes increasingly proficient at handling certain AI tasks, routing is poised to become a routine, common mechanic that all workflows will embody.
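For the builders on your team, here is roughly what step-by-step routing can look like. This is a minimal sketch, not any vendor’s actual router: the `Step` type, the `route` function, and the task labels are all hypothetical names chosen for illustration, and the rule it encodes is simply “local unless the task type is on a cloud-only list.”

```python
from dataclasses import dataclass

# Hypothetical destination labels; not tied to any real product or API.
LOCAL = "local"
CLOUD = "cloud"

# Assumption: the task types your team has decided genuinely need a frontier model.
CLOUD_TASKS = {"frontier_reasoning", "multi_step_agent"}


@dataclass
class Step:
    name: str       # human-readable description of the workflow step
    task_type: str  # e.g. "summarize", "classify", "frontier_reasoning"


def route(step: Step) -> str:
    """Run locally by default; send a step to the cloud only if its type requires it."""
    return CLOUD if step.task_type in CLOUD_TASKS else LOCAL


# An illustrative three-step workflow.
workflow = [
    Step("condense meeting notes", "summarize"),
    Step("tag support tickets", "classify"),
    Step("plan a system migration", "frontier_reasoning"),
]

plan = {step.name: route(step) for step in workflow}

for name, destination in plan.items():
    print(f"{name}: {destination}")
```

A real router would weigh more than a task label (data sensitivity, latency, model quality on your own test cases), but the shape is the same: one small decision per step rather than one big decision per application.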
The mantra is evolving from “cloud for everything” to “run locally by default, and reach for the cloud only when the work demands it.”
The cost conversation changes with it. Instead of one undifferentiated meter ticking on everything you do, you get spend that can be attributed to specific steps in specific places. That’s the part that should interest any leader staring at an unexplained AI line item on the P&L.
It is hard to prove the value of a single opaque number, which is why this evolution toward hybrid AI directly benefits AI leaders. The moment cost is broken down by where the work runs and what it produces, you can finally ask an important question of each piece: is this worth it?
Granular cost is steerable cost, and steerable cost is the first thing you need before you can demonstrate a return at all.
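To make “granular cost” concrete, here is a small sketch of attributing spend to a step and a location. Everything in it is an assumption for illustration: the record layout, the `attribute` function, and especially the figures, which are invented and not drawn from any real bill. Costs are kept in whole cents to avoid rounding surprises.

```python
from collections import defaultdict

# Hypothetical usage records: (workflow step, where it ran, cost in cents).
# The numbers are made up purely to show the mechanics.
records = [
    ("summarize", "local", 2),
    ("summarize", "cloud", 40),
    ("classify", "local", 1),
    ("plan", "cloud", 250),
    ("summarize", "cloud", 35),
]


def attribute(usage):
    """Total the spend for each (step, location) pair."""
    totals = defaultdict(int)
    for step, location, cents in usage:
        totals[(step, location)] += cents
    return dict(totals)


breakdown = attribute(records)

# Print the largest items first, so the "is this worth it?" question
# starts with the pieces that cost the most.
for (step, location), cents in sorted(breakdown.items(), key=lambda kv: -kv[1]):
    print(f"{step:<10} {location:<6} {cents / 100:.2f}")
```

The grand total is unchanged. What changes is that each line can now be questioned on its own.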
Not too hot, not too cold
It helps to think about this the way Goldilocks did.
All-cloud is “too hot.” Everything runs in the most expensive place by default, the meter never stops, and you get the runaway bill that started this whole conversation about AI economics.
All-local is “too cold.” You cap your own capability, lock yourself out of the frontier work that genuinely needs the cloud, and over-invest in hardware to make a point.
Hybrid is the porridge in the middle. Run what you can on hardware you control, send up what truly needs the cloud, and route deliberately between the two. It’s the arrangement the technology is steadily making possible, and it’s the one that gives a leader the most control over both capability and cost.
It’s “just right”
You don’t need to buy anything this fall to act on this. Whether it’s RTX Spark or something else (for example, a whole community of AI builders is already assembling makeshift AI clusters from daisy-chained Mac Minis), what’s needed is something that comes naturally to many leaders: planning ahead.
Start by asking your team which of your organization’s AI tasks really need the most powerful models, and which ones could run reliably on something your org could own instead. While you might not get a perfect answer, the act of sorting the work is what begins to make your AI spend legible. And legible spend is what you can manage, defend, and tie to results.
That becomes the basis of an investment plan, and one that you can build to flex as your business needs evolve. Map what runs well locally and what belongs in the cloud. Consider where you are today (100% cloud?) and where you ultimately need to be to get the best cost efficiency for your needs (70% cloud, 30% local? Something else?). That mix will continue to evolve as hardware improves, but it gives you a plan of record today, and one that can be fine-tuned as new hardware and infrastructure solutions come to market. Rather than reacting to new announcements, you’ll be planning for them.
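If it helps to see the arithmetic behind comparing mixes, here is a back-of-the-envelope sketch. The function name `monthly_cost_cents` and every figure are hypothetical: 10,000 tasks a month, 5 cents per task in the cloud, and 1 cent per task locally, where the local figure is assumed to already include amortized hardware and power. Substitute your own numbers; the point is the comparison, not the values.

```python
def monthly_cost_cents(tasks, cloud_percent, cloud_cents_per_task, local_cents_per_task):
    """Blended monthly cost, in cents, for a given cloud/local split.

    cloud_percent is a whole number from 0 to 100; the rest of the work runs locally.
    """
    if not 0 <= cloud_percent <= 100:
        raise ValueError("cloud_percent must be between 0 and 100")
    local_percent = 100 - cloud_percent
    weighted = cloud_percent * cloud_cents_per_task + local_percent * local_cents_per_task
    return tasks * weighted // 100


# Illustrative inputs only (see the assumptions stated above).
TASKS = 10_000
CLOUD_CENTS = 5
LOCAL_CENTS = 1

today = monthly_cost_cents(TASKS, 100, CLOUD_CENTS, LOCAL_CENTS)   # 100% cloud
target = monthly_cost_cents(TASKS, 70, CLOUD_CENTS, LOCAL_CENTS)   # 70% cloud, 30% local

print(f"today:  {today / 100:.2f}")
print(f"target: {target / 100:.2f}")
print(f"difference: {(today - target) / 100:.2f}")
```

Re-running this with a few different splits is a quick way to see how sensitive your plan of record is to the mix, and to revisit it as local hardware costs change.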
The cost of AI has felt, for two years, like a meter you switched on and couldn’t turn down. That’s starting to change. The work is moving closer to home as the spending becomes something you can see and shape. AI economics are tilting back toward the people who pay the bills, slowly but surely.
That’s good news, and it’s worth planning around.
AI, Upfront publishes every Monday. If this was useful, subscribe to get it in your inbox. And if there’s a topic you’d like me to tackle, reply and let me know.



