Field Notes
Every time a new AI model drops, the internet turns into a hallway conversation.
Are you on the newest Opus yet?
Did you see the coding score?
Is this the smartest model now?
I get why people ask. The demos are impressive. The leaderboards move fast. Nobody wants to be the person using last year's brain in this year's race.
But I think a lot of people are asking the wrong question.
The question is not "did the model get smarter?"
The question is: smarter at what?
That distinction matters more than most people realize.
In his Sequoia Ascent 2026 writeup, Andrej Karpathy talks about "jagged intelligence." His point is that model capability does not rise evenly in every direction. It spikes where the task is verifiable, heavily trained, data-rich, and economically valuable.
Coding is the easy example.
Code can be tested. It runs or it fails. The benchmark passes or it does not. That gives labs a clean feedback loop, so coding can improve fast.
Chess is another example he uses. A model can suddenly look much better at chess, but that does not mean intelligence smoothly rose everywhere. It may mean chess got more training attention.
That is the part I keep coming back to.
AI models are not all getting generally better at everything at the same speed.
They are getting better in pockets.
Some pockets are exploding. Some are barely moving. Some are still weirdly bad in ways that make you laugh and then immediately check your bank account because you realize you almost paid premium pricing for the wrong job.
That is why "use the best model" is too vague.
Best for coding may not mean best for writing, search, extraction, pricing, or a decision that touches money, strategy, legal exposure, brand, or a customer relationship.
This is the next layer after "Trust But Verify." Last time, the point was discipline: go back to the source, check the receipts, do not trust a clean summary just because it sounds right.
Now the question is: what intelligence are you verifying?
That is what I mean by specified intelligence.
Specified intelligence is matching the model, tools, context, data, and review gate to the job.
It is not worshipping the largest model.
It is not marrying Claude.
It is not using one chat window like it is the entire operating system.
I have been using a simple framework when I teach this:
Brain. Body. Context. Prompt.
The brain is the model.
The body is what lets it act.
The context is what it can see and remember.
The prompt is the instruction pack.
Most people blame the brain when the problem is somewhere else. Or they overpay for a bigger brain when the job did not need one.
That is like needing to tighten one screw on a desk drawer and deciding you need to buy a $300 power drill.
Could the drill tighten the screw?
Yes.
Do you need that much power and complexity for that job?
No.
A $10 screwdriver gets the same outcome faster, cheaper, and with less drama.
That is the AI cost problem in plain English.
Using Claude for everything is fine when you are learning. Get your reps. Build the habit. Learn what AI feels like.
But once you start using AI as part of real work, the economics change.
If the new model mostly got better at coding and you are not coding, you may not need it for your daily work.
If Sonnet already handles the task cleanly, Opus may be overkill.
If a non-Anthropic model does the specific job faster or cheaper, loyalty becomes expensive.
Do not confuse price with fit.
Playbook
Here is the practical version.
Do not start with the model.
Start with one recurring job.
What needs a reply? What changed since the last meeting? What should the price be for this date? Is this file ready to send?
Then ask five questions.
First, what answer do I actually need?
Second, what data does that answer depend on?
Third, can the result be checked?
Fourth, what happens if the model is wrong?
Fifth, which brain is good enough for this job?
That last phrase matters: good enough for this job.
You do not need a premium reasoning model to rename files, extract dates, classify emails, or summarize a call into a known template.
You probably do want stronger reasoning when the output affects money, a customer, a legal position, a strategic decision, or your public reputation.
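The five questions and the tiering rule above can be written down as a small checklist. This is a minimal sketch, not a standard: the Job fields, the tier labels "small" and "strong", and the functions pick_tier and needs_human_review are all hypothetical names made up for illustration. The review rule (review anything high-stakes or uncheckable) is my assumption about what a reasonable gate looks like.

```python
from dataclasses import dataclass

# Stakes named in the text: money, customer, legal, strategy, reputation.
HIGH_STAKES = {"money", "customer", "legal", "strategy", "reputation"}


@dataclass
class Job:
    """One recurring job, described by the five questions (hypothetical schema)."""
    answer_needed: str        # 1. What answer do I actually need?
    data_sources: list[str]   # 2. What data does that answer depend on?
    checkable: bool           # 3. Can the result be checked?
    touches: set[str]         # 4. What is affected if the model is wrong?


def pick_tier(job: Job) -> str:
    """5. Which brain is good enough? Returns an illustrative tier label."""
    if job.touches & HIGH_STAKES:
        return "strong"
    return "small"


def needs_human_review(job: Job) -> bool:
    """Assumed gate: review when stakes are high or the result cannot be checked."""
    return bool(job.touches & HIGH_STAKES) or not job.checkable


rename = Job("new file names", ["folder listing"], True, set())
wire = Job("approve this transfer?", ["invoice", "bank record"], True, {"money"})

print(pick_tier(rename), needs_human_review(rename))  # small False
print(pick_tier(wire), needs_human_review(wire))      # strong True
```

The point of writing it down is that the tier gets chosen from the job description, not from whichever model launched this week.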
Businesses already understand this.
You do not ask the copywriter to do the tax return. You do not ask the accountant to write the launch campaign. You do not ask the intern to approve the wire transfer.
Different departments exist because different work needs different judgment.
AI should be routed the same way.
One model drafts. Another checks. A cheaper model handles bulk processing. A stronger model reviews the final decision. A tool-connected agent does the job only after the rules, data, and permission boundaries are clear.
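The "one model drafts, another checks" pattern can be sketched in a few lines. This assumes each model is just a function that takes a prompt and returns text. The names draft_then_check, cheap_drafter, and strong_checker are hypothetical, and the two stub functions stand in for real API calls so the example runs on its own.

```python
from typing import Callable

Model = Callable[[str], str]  # hypothetical interface: prompt in, text out


def draft_then_check(task: str, drafter: Model, checker: Model,
                     max_rounds: int = 2) -> str:
    """One model drafts, another checks; revise until the checker approves."""
    draft = drafter(task)
    for _ in range(max_rounds):
        verdict = checker(f"Task: {task}\nDraft: {draft}\nReply OK or list problems.")
        if verdict.strip().upper() == "OK":
            return draft
        # Feed the checker's objections back to the cheaper drafter.
        draft = drafter(f"{task}\nFix these problems: {verdict}")
    raise RuntimeError("checker never approved; escalate to a human")


# Stub models for illustration only; a real setup would call two providers.
def cheap_drafter(prompt: str) -> str:
    return "draft v2" if "Fix these problems" in prompt else "draft v1"


def strong_checker(prompt: str) -> str:
    return "OK" if "draft v2" in prompt else "missing the date"


print(draft_then_check("summarize the call", cheap_drafter, strong_checker))
# prints: draft v2
```

Note the last line of the function: when the checker never signs off, the sketch stops and hands the job to a person instead of shipping the draft anyway.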
That is how you get more AI for less money.
Not by using less of it.
By using the right kind in the right place.
Use Artificial Analysis to compare models by quality, speed, latency, price, and context window. Use OpenRouter rankings to see which models are available and how usage is spread across models and providers.
Do not treat either site like the answer.
Treat them like a map.
Then test the model on your actual workflow.
If the job is SEO, test SEO work. If the job is sales research, test sales research. If the job is pricing, test pricing rules against real calendar data.
The leaderboard tells you where to look.
Your workflow tells you what works.
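Testing on your own workflow can be as simple as a handful of real inputs with known-good answers, run against each candidate, cheapest first. The sketch below assumes that setup. The function names, the three stub "models", and the cost numbers are all invented for illustration; the costs are relative units, not real prices.

```python
import re
from typing import Callable

Model = Callable[[str], str]  # hypothetical interface: prompt in, text out


def pass_rate(model: Model, cases: list[tuple[str, str]]) -> float:
    """Fraction of (input, expected) cases the model answers exactly."""
    if not cases:
        raise ValueError("need at least one test case")
    hits = sum(1 for prompt, expected in cases if model(prompt).strip() == expected)
    return hits / len(cases)


def cheapest_good_enough(candidates, cases, threshold: float = 1.0):
    """candidates: (name, relative_cost, model). Cheapest one that passes, else None."""
    for name, _cost, model in sorted(candidates, key=lambda c: c[1]):
        if pass_rate(model, cases) >= threshold:
            return name
    return None


# Test cases come from YOUR workflow; here, a toy date-extraction job.
cases = [
    ("Invoice due 2026-03-01", "2026-03-01"),
    ("Meeting moved to 2026-04-15.", "2026-04-15"),
]


# Stub models standing in for real API calls.
def tiny(prompt: str) -> str:
    return prompt.split()[-1]  # naive: grabs the last word, punctuation and all


def small(prompt: str) -> str:
    return re.search(r"\d{4}-\d{2}-\d{2}", prompt).group()


candidates = [("big", 25, small), ("small", 5, small), ("tiny", 1, tiny)]
print(cheapest_good_enough(candidates, cases))  # prints: small
```

Here the cheapest stub fails one case, the mid-priced one passes both, and the expensive one never gets called. That is the screwdriver test in code form.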
Orientation
This connects back to "Don't Marry Claude," "The Model Doesn't Matter," and "Trust But Verify."
Claude is not the problem. One-tool dependency is.
The newest model is not the problem. Newest-model reflex is.
The mistake is assuming the intelligence improved in the exact direction your work needed.
Sometimes it did.
Sometimes it did not.
The operator's job is to know the difference.
That is the whole game with specified intelligence.
Name the job. Name the source. Name the risk. Name the review gate. Then pick the brain.
Next time: Emotion Is the Moat. Once everybody can build faster, the question becomes more human: how do you make people feel in a world where execution gets easier?
For now, pick one recurring task and ask:
Am I buying the power drill when this job only needs a screwdriver?
Comment below and tell me where you are still using one AI model for everything.
I read every one.
— Brian