28 Comments
akash

First, a minor correction:

> "3. RLHF: first proposed (to my knowledge) in the InstructGPT paper from OpenAI in 2022"

Deep reinforcement learning from human preferences by Christiano et al. (2017) is the foundational paper on RLHF. Link: https://arxiv.org/abs/1706.03741

Interesting perspective, and I do like the bigger question you are asking: what ended up mattering the most for the success of LLMs? Some quick thoughts and questions:

- I do think building a GPT-3-like system was certainly feasible in the 90s *if* we had the computing capacity back then (Gwern has a nice historical exposition on Moravec's predictions which I recommend: https://gwern.net/scaling-hypothesis)

- I am not convinced that just unlocking YT data would be the next big thing (for AGI, and I know you don't like AGI talk ... sorry). There is some evidence suggesting that the models are still not generalizing, but instead defaulting to a bag of heuristics and other poor learning strategies (https://www.lesswrong.com/posts/gcpNuEZnxAPayaKBY/othellogpt-learned-a-bag-of-heuristics-1). Assuming this is true, I would expect that a YT-data-trained LLM would appear much smarter, crush the benchmarks, and have a better understanding of the world, but may not be transformative. Massively uncertain about this point, though.

- "perhaps we would’ve settled with LSTMs or SSMs" — are there any examples of LSTM-driven language models that are comparable to Transformer-based LLMs?

- Relatedly, I think the importance of adaptive optimizers is under-emphasized here. Without Adam, wouldn't LLM training be >2x more expensive and slower?
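On the adaptive-optimizer point: Adam (Kingma and Ba, 2015) rescales each step by running estimates of the gradient's first and second moments, which is a big part of why training converges faster than with plain SGD. A minimal sketch in plain Python on a toy quadratic (the objective, starting point, and hyperparameters are all illustrative, not anything from the post):

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function given its gradient, using the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g   # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step scaled per parameter
    return x

# Toy objective f(x) = (x - 3)^2, whose gradient is 2(x - 3)
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
```

In real training the same update is applied independently to every parameter; libraries such as PyTorch ship it as `torch.optim.Adam`.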

Leo Benaharon

Maybe YouTube and robotics shouldn't be separated. If we can learn complex motions just by watching videos, why can't AI?

Erik Steiger

A baby learns by acting upon the world. Sure, at some point you have a good enough world model that you can use novel information from a video to expand it. But building up a good enough world model efficiently probably requires some kind of feedback.

DotProduct

The difference between language and other data (e.g. video) is that language is massively compressed. It's what we evolved to handle social and other world interactions with our limited compute. Perhaps LLMs can do what they do because they are using our distilled version of the world. Hence, rather than video, I might turn my attention to novel languages, e.g. between plants and animals, including extra-human sensory data: sights and sounds outside the normal human range. Whalesong, anyone?

Julie By Default

I loved this. Yes. Exactly. People talk about AI like it’s inventing things — but this cuts right through that. Most of what we call “generation” is really just recombination, powered by increasingly structured inputs — from us. The breakthroughs weren’t big ideas; they were new ways to learn from new kinds of data.

That’s what makes this piece so sharp: it’s not dismissive of research, just honest about where progress actually comes from. Not magic. Not models. Infrastructure. Access. The moment a new dataset becomes legible at scale, everything shifts — and we call it innovation.

And it’s not just AI. In product too, the surface gets all the credit, but the real leverage sits underneath — in what’s visible, counted, or quietly baked into the defaults.

attiq rahman

We need to explore new methods for the new data we have. With old methods, we cannot exploit new data sources like sensors and YouTube.

Melon Usk - e/uto

Interesting observation, Jack! The next big thing is a direct democratic simulated multiverse: you make a digital backup of Earth to have a vanilla virtual Earth, spin up a digital version with magic and public teleportation, and BOOM! You have a simple simulated multiverse with 2 Earths!

By the way, we recently wrote about how to align AIs: we spent 3 years modeling ultimate futures and found the safest way to the best one.

Steven Marlow

Adding language ability should be the last step in the process, but the AI industry is focused on product development, not core research. The real solution has to come from outside of industry.

Melon Usk - e/uto

You’re spot on! We’ve done exactly that for AI alignment: combined everything to solve it.

Daniel

This is such a refreshing take! You're absolutely right—we keep chasing shiny new architectures when the real breakthroughs have always been about unlocking new data sources. The YouTube angle is fascinating. Google sitting on that treasure trove while we debate which optimizer to use...

suman suhag

Provocative question! Having spent at least 25 years studying RL, ever since my first real job at IBM Research, where I explored the use of methods like Q-learning from 1990–93 to teach robots new tasks, I’ve watched the field through its various phases. In the early 1990s, when I got involved, it was restricted to a small handful of aficionados. I organized the first National Science Foundation workshop on RL (in 1995), to which about 50–60 senior researchers were invited.

Gradually, through the early part of the 2000s, the field gained popularity, but never seemed to become a mainstream research topic within ML. Then, wham! DeepMind did its thingie with the combination of deep learning and RL, applied to the visually appealing domain of Atari video games, and (deep) RL’s popularity went through the roof. Now it seems all the rage, and certainly many employers are hiring (in the Bay Area, it’s an area sought after by some of the labs doing autonomous driving). Google paid half a billion euros for DeepMind (supposedly!) on the basis of their deep RL Atari demo. So this looked like a real turning point, and RL came to life!

So, getting back to the question: is RL a “dead end”? In answering this provocative question, one has to clarify one’s point of view. Certainly, from the standpoint of the work going on at DeepMind and other places on using deep RL to play games like Go or chess, or to train a self-driving car given an accurate simulator of the world, RL is poised to become well-established technology, and its popularity is only going to increase. RL sessions at major AI and ML conferences are very well attended, and RL submissions are definitely increasing. In all these dimensions, RL is very much not at a “dead end”; in fact, its popularity is only increasing.

But, but, …. you knew there was a but coming there!

When you impose on RL the goal of “online learning in real time from the real world”, and not doing millions of simulation steps where agents can be killed thousands of times with no penalty, I fear RL is very much at a dead end. It is not clear to me that any extension of the au courant deep RL methods is going to lead to successes in the real world, in terms of a physical agent that can learn in real time with a small number of examples.

That is, if your goal is to build a model of how humans learn complex skills, such as driving, then RL to me is a very poor explanation of how such skills are acquired. One has only to look at the comparative results reported in the AAAI 2017 paper by Tsividis et al., comparing random humans on Amazon Mechanical Turk with the best deep RL programs at Atari video games, to see where deep RL simply flounders. Humans learn Atari video games, like Frostbite, about 1000x faster than the fastest deep RL methods.

A typical human learned Frostbite in 1 minute with a few hundred examples at most. DQN or other deep RL programs take days with millions of examples. It’s not even close, it’s like another galaxy in terms of the speed of learning differences. So, looking at this paper, I’d have to say I don’t see any way to capture such large differences with any incremental tweaking of deep RL methods, such as being reported annually in ICML or NIPS papers (of which I review a bunch each year, hoping against hope to see a new idea emerge, only to be disappointed!).

So, what’s to be done to “rescue RL”? I’m not sure there’s really a solution out there. I for one have stopped believing that we learn complex skills like driving by something that resembles “pure RL” (that is, from rewards alone). Humans learn to drive because they in fact “know” how to drive before they ever try to drive once. They’ve seen their parents, friends, lovers, Uber drivers, etc. drive many, many times, and they’ve seen driving behavior in movies for thousands of hours. So, when they finally get behind the wheel, they instinctively “know” what driving means, but of course, they have never actually controlled a physical car before. So, there is that all-important “last mile” of actual driving that needs to be learned.

But, since the driving program is largely already in place, built in by many thousands of hours of observation, not to mention active instruction by a driving teacher or an anxious parent, what needs to be “learned” are a few control parameters that tell the human brain how much to turn the wheel or press the brake, and more importantly, where to look on the road, etc. This is of course not trivial, which is why humans take a few weeks to get comfortable behind the wheel. But if you look at real hours of practice, humans learn to drive in a few hundred hours — for those paying for driving instruction, this is expensive, since you are charged by the hour.

Also, it is important to remember that when you impose the condition of learning in the real world, there can be “no cheating”! That is, unlike the ridiculous 2D world of Atari video games, like Enduro, where one is given a highly simplified 2D visual world and actions are limited to a few discrete choices, humans must drive in the full 3D real world and have the huge task of controlling both legs, both hands, neck, body, and so on (many hundreds of continuous degrees of freedom), as well as having to cope with an immense sensory space of stereo vision and binaural hearing.

The only way humans ever learn to drive in a few hundred hours is the simple fact that we already almost know driving, and we have obviously a fully working vision system, so we can read signs, recognize cars and pedestrians, and our hearing system also recognizes sirens, alerts, horns etc. So, if you look at the immensity of the whole driving task, I would claim more than 95% of the driving knowledge is already known, and the small remaining part has to be acquired from practice. This is the only explanation for how humans learn such a complex skill as driving in a few hundred hours. There is NO magic here.

So, in that sense, pure (deep) RL seems like a dead end. The pure (deep) RL problem formulation really does not hold much interest for me any more. What is needed in its place is a more complex model of how learning happens by combining observation, transfer learning, and many other types of behavior cloning from observed demonstration to the learner, and finally being able to take this knowledge, and then improve it with some actual trial and error RL.

One can generalize this to other modes of learning as well. The late Richard Feynman, who was arguably the most influential physicist after the Second World War, taught a classic introductory course at Caltech, which led to probably the best-selling college textbook of all time, the Feynman Lectures on Physics (still being sold almost 60 years later, in the nth edition). When he looked at how students handled his problem sets, Feynman was ultimately disappointed. He realized that even the extremely bright students at Caltech could not “learn” physics simply by sitting in his class and absorbing his lectures. So, he ended his preface to the textbook with a disappointing conclusion, quoting Gibbon (which I had long ago memorized):

“The power of instruction is seldom of much efficacy, except in those happy dispositions where it is almost superfluous”.

I realized the wisdom of this saying after spending two decades or more teaching machine learning to graduate students at several institutions. It seems almost paradoxical, but what Gibbon is saying, and what Feynman and I both discovered, is that learning from teaching only works when the learner “almost already knows” the subject.

But this is precisely what the various theoretical formulations of ML predict must be the case: there is no “free lunch” in terms of being able to learn. DeepMind’s DQN network takes millions and millions of steps to learn an apparently trivial task (to humans) like Frostbite, because initially DQN knows nothing. Humans, in contrast, learn Frostbite in < 1 minute because they have spent many, many hours building the background needed to learn Frostbite so quickly (e.g., vision, hand-eye coordination, general game-playing strategies).

Unfortunately, the prevailing currents in the field, at venues like NeurIPS (formerly NIPS), ICML, and AAAI, tend to “glorify” knowledge-free learning, so you end up with hundreds, if not thousands, of (deep) RL papers where agents take millions of time steps to learn apparently simple tasks. To me, this approach is ultimately a “dead end”, if your goal is to develop a computational model of how humans learn.

suman suhag

It can definitely be OK, but it depends on what you're trying to do, and on what "reality" is (i.e. what the most correct answer is). Adding variables that aren't needed won't help your model (particularly your estimates), but also might not matter much (e.g. for predictions). However, removing variables that are real, even if they don't meet significance, can really mess up your model.

Here are a few rules of thumb:

Include the variable if it is of interest beforehand, or if you want a direct estimate of its effect. If your business collaborators say to put it in, put it in. If they're looking for estimates of the holiday effects, put it in (although there might be some debate as to whether you should look at each holiday individually).

Include the variable if you have some prior knowledge that it should be relevant. This can be misleading, because it invites confirmation bias, but I'd say in most cases it makes sense to do so. Particularly for holiday effects (I assume this is something like sales or energy consumption), these are well known and documented, and those small but not-statistically-significant effects are real.

In general practice (i.e. most real-world situations), it's better to have a slightly overspecified model than an underspecified one. This is particularly true for the purposes of prediction, because the predicted response remains unbiased. This rule is very conditional, but the other bullets that favor overspecification tend to be more common in practice, especially in the business/applied world. Note that by saying that, I bring it back to the second bullet point, emphasizing business experience.

If you want a model that can generalize to many cases, you should favor fewer variables. An overfitted model works, but it tends to work only for a narrow inference space (i.e. the one reflected by your sample).

If you need precise (low variance) estimates, use fewer variables.

Just to re-emphasize: these are rules of thumb. There are plenty of exceptions. Judging by the limited information you've provided, you probably should include the non-significant "holiday" variable.

I've seen many saturated models (every term included) that perform extremely well. This isn't always true, but this works because, in a lot of business problems, reality is a complex response (so you should expect a lot of variables to be present), in addition to the lack of statistical bias from adding all these variables. Less relevant to this question, but relevant to this answer is that "Big data" also captures the power of the law of large numbers and the central limit theorem.

Variable selection is a long and complicated topic. Look up descriptions of the drawbacks of underspecification vs. overspecification, while remembering that the "right" model is the best - but unachievable. Determine if your interest is in the mean or the variance. There's a lot of focus on variances, especially in teaching and academia...but in practice and in most business settings, most people are more interested in the mean! This goes back to why overspecification in most real world cases should probably be favored.
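The danger of dropping a real variable can be shown with a tiny synthetic example (all data, coefficients, and the "holiday" setup below are invented for illustration): when the dropped variable is correlated with a variable you keep, the kept variable's estimate absorbs the omitted effect, the classic omitted-variable bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic "sales" data: a real holiday effect, with holidays
# deliberately made more likely when the main driver x is high.
x = rng.normal(size=n)
holiday = (rng.random(n) < 0.1 + 0.6 * (x > 0)).astype(float)
y = 2.0 + 3.0 * x + 1.0 * holiday + rng.normal(scale=1.0, size=n)

def ols(X, y):
    """Ordinary least squares via numpy's least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
full = ols(np.column_stack([ones, x, holiday]), y)  # correctly specified
reduced = ols(np.column_stack([ones, x]), y)        # holiday dropped

print("x coefficient, full model:   ", round(full[1], 2))     # close to 3.0
print("x coefficient, reduced model:", round(reduced[1], 2))  # biased upward
```

Prediction on new data drawn the same way stays roughly unbiased in the reduced model, but the coefficient on `x` no longer estimates the effect you think it does, which is exactly the estimates-vs-predictions distinction above.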

suman suhag

As the name suggests, GLM models are a generalization of the linear regression model. When we speak about this generalization, we mean that rather than forcing a linear relationship between the dependent and independent variables, it allows the dependent variable to be related to the independent variables through a link function.

E(Y) = μ = g⁻¹(Xβ)

Here, g is the link function through which we are relating the variables. Depending on the problem, we can choose the link function to be logit, probit, identity, etc. This gives us a lot of freedom in choosing the specification of the model for a given problem.
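As a minimal pure-Python sketch of the link-function idea (the coefficients and input below are made up for illustration), the logit link maps an unbounded linear predictor Xβ to a valid mean in (0, 1):

```python
import math

def inverse_logit(eta):
    """g^{-1} for the logit link: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# The linear predictor X @ beta can be any real number...
beta = [-1.0, 0.8]           # illustrative coefficients (intercept, slope)
x = 2.5
eta = beta[0] + beta[1] * x  # = 1.0

# ...but the link function maps it to a valid mean, here a probability:
mu = inverse_logit(eta)
print(round(mu, 3))  # → 0.731
```

For real fitting, libraries such as statsmodels expose this directly, e.g. `sm.GLM(y, X, family=sm.families.Binomial())` uses the logit link by default.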

noreson

100% agree.

We have the world's largest content-creator (YouTube) database, with 180 million+ channels and a billion videos. If you are interested, visit www.socialnetwork0.com or send an email to info@socialnetwork0.com

SQream

Your post makes a strong point: real progress in AI often comes from new data, not new algorithms. ImageNet, web-scale text for Transformers, and RLHF all show that breakthroughs follow access to richer datasets.

I also agree that the next leap will come from multimodal sources like video, robotics, and sensors. Models can only recombine what they have seen - so expanding what they can learn from matters more than tweaking architectures.

At SQream (www.sqream.com), we focus on how organizations work with massive and complex datasets - so your framing resonates deeply with us.

The Human Playbook

Yes, there is a reason why this is happening… I think you touch on a few things that resonated deeply with me and my writing here: https://thehumanplaybook.substack.com/p/the-prompt-world

Ashutosh

What I inferred is that Jack is talking more about what can be achieved than about the efficiency of achieving it. Of course adaptive optimizers make training more efficient, but without them you could still achieve the same results, just less efficiently.