9 Comments
Francesco Baruffi

Great article! Thanks for sharing!

N of 1.

Fascinating piece. Just one question: On what basis are you asserting IQ is commoditized? Recognizing we’re in a period of plateau, with what conviction can we say there’s no architectural breakthrough on the horizon?

Ryan Cunningham

architectural breakthroughs are happening right now: autonomously discovered novel architectures (see Liu et al. 2025, https://arxiv.org/abs/2507.18074), sparse computing, brain-inspired hardware, etc. far be it from me to assert some Fukuyaman "End of History" take

"commoditization" is less about how something is made and more about how it's used. if breakthroughs are at all replicable, they will rapidly diffuse and raise baseline expectations for downstream consumers of that intelligence. open-source accelerates that considerably (https://www.machineyearning.io/p/deepseek-and-the-end-of-an-era)

if you showed GPT-3 to someone in 2012, they'd have said it was AGI (that was still the case for some in 2022: https://www.scientificamerican.com/article/google-engineer-claims-ai-chatbot-is-sentient-why-that-matters/). now, we just kind of shrug at it

N of 1.

Ok. What I’m getting at is that “if” in “if breakthroughs are at all replicable.”

I guess you’re arguing that since every model breakthrough thus far has been replicable, all future breakthroughs will likely be the same. Fast copies shall reign.

Seems like a reasonable assumption. It’s just that a lot of the efficient-optimization thesis seems to hinge on it.

Ryan Cunningham

more or less. models aren't magic, they're math, so researchers get much faster iteration loops when attempting replication (https://epoch.ai/blog/open-models-report).

advancements in novel silicon architectures, or proprietary hardware-software co-designs, would be less easily replicated.

Alex Adamov

Appreciated the article. Thanks for writing it!

Chris Jen

Great article.

Kaiser Y Kuo

Was a pleasure having you on! You were terrific.

Enon

Tokens/Joule is a classic "packer" metric. (As opposed to "mapper", from Alan Carter's *The Programmer's Stone*.) Quantity without quality. It makes a nod towards plausibility by saying "at a given IQ" (meaning intelligence, though IQ is a measure of the rarity of intelligence rather than of intelligence itself), but LLMs have no intelligence, since they have no model of the world or of any situation. It's 100% packer BS with no mapping or understanding at all, just bluff.

Back in 2005, when Peter Voss was recruiting on the SL4 list (precursor to Less Wrong) for an AI psychologist, I recommended Rasch psychometrics for measuring AI intelligence. Rasch metrics measure difficulty and ability on the same scale; IRT and computer-adaptive testing are derived from them. (See my early posts for conversions to and from IQ at different ages, as well as grade-skip tables for gifted children in schools with different ages and class-average percentiles.)

LLMs don't respond to questions as humans do: the probability of getting a given question right often does not follow the same logistic curve with increasing easiness or ability that it does for humans. The LLM is BSing all answers with no understanding, which leads the test to give falsely high intelligence scores to LLMs. When several tasks with less-than-100% probability of being correct are chained, the LLM will not recognize an error in any step, and will be confidently unreliable in completing the chain of tasks.
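A minimal sketch of both points, assuming the standard one-parameter Rasch form and independent, unverified steps (the numbers below are illustrative assumptions, not measurements):

```python
import math

def rasch_p_correct(ability: float, difficulty: float) -> float:
    """One-parameter Rasch model: probability of a correct response
    as a logistic function of (ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def chained_success(p_step: float, n_steps: int) -> float:
    """If steps are independent and errors go unrecognized, whole-chain
    success is p_step ** n_steps -- reliability decays geometrically."""
    return p_step ** n_steps

# Illustrative values (assumed): an ability one unit above item
# difficulty gives ~73% per item, and a 90%-reliable step chained
# ten times succeeds only ~35% of the time.
print(round(rasch_p_correct(ability=1.0, difficulty=0.0), 3))  # 0.731
print(round(chained_success(0.9, 10), 3))                      # 0.349
```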

The coffee analogy in the early part of the article is flawed; "sewage" would be more accurate. Actually it's worse than that: sewage could at least be fertilizer, whereas AI tokens are toxic waste. They must be assessed carefully by a human with sufficient intelligence in order to be half-trusted, and the constant stream of BS pollutes the mind of the assessor.
