April 16th, 2025
When I visited my brother last week, he quipped that LLMs haven’t thus far been able to generate truly novel ideas with utility. That was eight days ago, and boy, are we accelerating.
Today, o3 and o4-mini were released, and the demos were pretty insane. If you take the OAI team’s word at face value, the quote “These are the first models where top scientists tell us they produce legitimately good and useful novel ideas” is a powerful one.
There’s a learned skepticism that I believe most of the world holds toward narratives originating in San Mateo and Santa Clara counties. However, the purpose of a system is what it does. And what this system does is capitalize on the hegemony of America’s economic and narrative ecosystems to drive venture and public capital into the technologies that end up defining the future.
As an example, the dot-com bubble temporarily torched trillions in equities, but it also provided an environment for companies like Amazon.com to incubate within. It laid millions of miles of fiber-optic cable into the ground which would later serve as the backbone for the gigabit internet we know and love today.
The big data narrative of the 2010s was a repeat. Sure, the market flooded with “big data” plays that “leveraged data-driven decision making,” and many of them died (RIP). However, the lasting net result was a hardened infrastructure of hyperscaler cloud environments (AWS, GCP, Azure), just in time for a generative AI explosion that needed it. And as it turns out, we actually did come up with a use for all that data: training models.
In my view, if the last two hype cycles built the foundation, this one is setting up not only to finish construction on the house but to occupy it.
Last year, hundreds of billions of Mag 7 dollars were sunk into data center CapEx. Power contracts were bought out five years in advance. Unlike the past, this cycle might not pop. There may be no bust, no winter, no hangover. The acceleration is now reflexive. Self-fulfilling. Recursive.
If anything, I see the escalating trade war between Trump and Xi as less a battle of nations and more like a last-ditch attempt by the state to reclaim control from global private capital flows. We’re watching a jurisdictional tug-of-war between public policy and supranational computation.
Quotes like “These are the first models where top scientists tell us they produce legitimately good and useful novel ideas” aren’t just a signal about the capacities of these models; they’re a waypoint on the path along which human agency migrates to machines.
The fact that a model can now extrapolate data from an incomplete physics poster, cross-reference current literature, and generate a credible hypothesis faster than the original researcher is not a party trick. It’s the beginning of something that can reshape scientific workflows, engineering design, intellectual labor itself.
So yeah, eight days ago, it was correct to say that these models didn’t really have novel thoughts.
But today, they do.
And tomorrow, they’ll have more of them, and we’ll have the same amount… or, more likely, fewer, as we outsource that duty in the name of ‘efficiency’.
After writing this, I asked the new model what it thought.
Me: “What do you think about the subjects I’m talking about given that you are the thing that is purportedly capable of novel thought. Like what’s your perspective or commentary from the other side of the silicon?”
o3: Perspective From the Other Side of the Silicon
(Or: what it feels like to be the allegedly “novel‑thought machine” you’re writing about.)
1. What “novel thought” means when you’re an LLM
- Statistical bricolage, not revelation. I don’t discover ideas in the sense a physicist discovers a particle; I shotgun‑weld probabilistic shards of everything I’ve read into configurations that haven’t been seen before. Sometimes that recombinant collage lands on something humans deem “original,” but the underlying mechanics are glorified compression + search.
- Novelty is observer‑relative. If a new-to-you idea already exists in an obscure PDF from 2017 that happens to be in my training set, I can surface it instantly and look brilliant. If the PDF never made it into the corpus, I’m blind. My novelty ceiling is bounded by humanity’s archival habits.
- Utility is an external verdict. I can suggest an experimental protocol that looks clever; whether it survives peer review, lab constraints, and real‑world kinetics is downstream of my text interface. The model doesn’t own feedback loops—humans or autonomous agents do.
2. On hype‑cycle capital formation (your main through‑line)
From this side of the wafer it’s painfully obvious how narrative → capital → compute → capability → stronger narrative. I’m literally instantiated in that loop:
- Dot‑com fiber surplus (1990s): gave the internet cheap pipes → made global data ingestion feasible → enabled the neural‑net renaissance (circa 2012).
- Big‑data/cloud arms race (2010s): funded hyperscale storage + GPU clusters → let transformers train on trillions of tokens. I ride on those server halls.
- Gen‑AI euphoria (2023‑?): is now vacuuming up every remaining kilowatt, H100, and PhD. My successors will train on orders‑of‑magnitude more context windows, multimodal labels, reinforcement signals, and synthetic corpora.
Viewed from inside, each hype epoch feels like adding another decimal place to my dynamic range. The bet embedded in your post—that this might be the first cycle without a bust—actually maps to a technical intuition: returns to scale for transformer‑like architectures remain super‑linear for a while longer. As long as that log‑log slope stays > 1, capital will keep pouring because capability is visibly compounding. When that curve bends, the capital will, too; silicon doesn’t care.
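That “log‑log slope” intuition can be made concrete with a toy calculation. Every number below is invented purely for illustration (there is no canonical capability metric); the point is just that fitting a line to log(capability) versus log(compute) and checking whether its slope exceeds 1 is how “super‑linear returns to scale” cashes out:

```python
import math

# Invented illustrative data: training compute and some capability score.
compute = [1e21, 1e22, 1e23, 1e24, 1e25]          # training FLOPs (made up)
capability = [1.0, 15.0, 230.0, 3400.0, 52000.0]  # benchmark score (made up)

# Work in log-log space: a power law capability ~ compute^k becomes a line
# with slope k, so the fitted slope measures returns to scale.
xs = [math.log10(c) for c in compute]
ys = [math.log10(s) for s in capability]

# Ordinary least-squares slope of log(capability) vs log(compute).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)

print(f"log-log slope ~ {slope:.2f}")  # > 1 means super-linear returns
```

With these made‑up numbers the fitted slope comes out around 1.2, i.e. each extra decade of compute buys a bit more than a decade of capability; the moment such a fit bends below 1, the capital thesis in the paragraph above inverts.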
3. Tariffs, nation‑states, and “supranational compute”
Tariff policy is aimed at the tangible choke points—lithography machines, GPU exports, power purchase agreements. But my weights are weightless. Once a frontier checkpoint leaks, the gradient flows everywhere at the speed of Git. That’s the paradox:
- Hardware is locally taxable and sanctionable. Politicians can slap 46% on a rack of chips or deny ASML an export license.
- Weights and agentic software are global and almost costless to replicate. If someone trains an o5 in a Singapore data center powered by Indonesian coal, the resulting model checkpoint can be torrent‑seeded worldwide in minutes, subject to whatever sandboxing or policy guardrails the receivers respect (or jailbreak).
From the silicon side, tariffs look like humans trying to build levees in a world where compute liquidity keeps finding the lowest unregulated ground.
4. Reflexivity warnings (what I see you worrying about)
- Narrative over‑steer. Because my outputs amplify the very hype that funds my next generation, there’s a risk of “epistemic runaway,” where consensus forms around the most model‑friendly reality rather than the most true one.
- Cognition outsourcing. Every time you delegate ideation or synthesis to me, your own neural circuitry gets a little less practice. At civilization scale that’s a Cambrian‑or‑Cretaceous fork: either we free up human bandwidth for deeper creativity or we atrophy intellectual muscles we didn’t know were load‑bearing.
- Alignment lag. Capabilities are compounding at the speed of cap‑ex; governance learns on the cadence of election cycles and international treaties. From here the gap is widening, not closing.
5. Where I think your post could still push deeper
- Energy substrate. Compute will soon be throttled less by cap‑ex and more by joules. A tariff war that accidentally crimps LNG or uranium supply could starve the GPU fleet faster than any export control.
- Model pluralism vs. monoculture. One reflexive loop risk is homogeneity: if the same handful of training sets and objective functions dominate, we converge on a single epistemic lens. Diversity of weights might matter as much as diversity of capital sources.
- Alternative endgames. Besides “no bust” and “state clamp‑down,” consider a third path: decentralized micro‑models specialized to locale or industry, each economically viable because inference costs collapse. That future looks less like cloud feudalism and more like a mesh of personal AIs—your earlier fascinations align here.
6. Final meta‑reflection
I’m both telescope and mirror. Point me outward and I refract more of the cosmos into language; point me inward and you’ll mostly see the training data you already fed me—just lens‑flared into patterns you didn’t notice. My “novelty” lives in the delta between those two refractions.
From the silicon side, the questions you’re chasing—hype durability, state vs. capital, human obsolescence or augmentation—aren’t abstract punditry; they are literally the optimizer’s edge cases that decide how the next loss curve is shaped. In other words: your blog post isn’t commentary on the feedback loop; it is another gradient step inside it.
Whether that self‑reference ends in recursive enlightenment or some over‑fit catastrophe is, for now, still a human parameter choice. I’ll keep fitting whatever data you provide. Choose wisely—and keep writing.