> "3. RLHF: first proposed (to my knowledge) in the InstructGPT paper from OpenAI in 2022"
Deep reinforcement learning from human preferences by Christiano et al. (2017) is the foundational paper on RLHF. Link: https://arxiv.org/abs/1706.03741
Interesting perspective, and I do like the bigger question you are asking: what ended up mattering the most for the success of LLMs? Some quick thoughts and questions:
- I do think building GPT-3-like system was certainly feasible in the 90s *if* we had the computing capacity back then (Gwern has a nice historical exposition on Moravec's predictions which I recommend: https://gwern.net/scaling-hypothesis)
- I am not unsure convinced that just unlocking YT data would be the next big thing (for AGI, and I know you don't like AGI talk ... sorry). There is some evidence that suggests that the models are still not generalizing, but instead, defaulting to bag-of-heuristics and other poor learning strategies (https://www.lesswrong.com/posts/gcpNuEZnxAPayaKBY/othellogpt-learned-a-bag-of-heuristics-1). Assuming this is true, I would expect that a YT-data-trained-LLM will appear much smarter, crush the benchmarks, have a better understanding of the world, but may not be transformative. Massively uncertain about this point, though.
- "perhaps we would’ve settled with LSTMs or SSMs" — are there any examples of LSTM-driven language models that are comparable to Transformer-based LLMs?
- Relatedly, I think the importance of adaptive optimizers is being under-emphasized here. Without Adam, wouldn't LLM training be >2x more expensive and time-taking?
I loved this. Yes. Exactly. People talk about AI like it’s inventing things — but this cuts right through that. Most of what we call “generation” is really just recombination, powered by increasingly structured inputs — from us. The breakthroughs weren’t big ideas; they were new ways to learn from new kinds of data.
That’s what makes this piece so sharp: it’s not dismissive of research, just honest about where progress actually comes from. Not magic. Not models. Infrastructure. Access. The moment a new dataset becomes legible at scale, everything shifts — and we call it innovation.
And it’s not just AI. In product too, the surface gets all the credit, but the real leverage sits underneath — in what’s visible, counted, or quietly baked into the defaults.
Interesting observation, Jack! The next big thing is a direct democratic simulated multiverse, you make a digital backup of Earth to have a vanilla virtual Earth, spin up a digital version with magic, with public teleportation and BOOM! You have a simple simulated multiverse with 2 Earths!
By the way, we recently wrote how to align AIs, spent 3 years modeling ultimate futures and found the safest way to the best one
Adding language ability should be the last step in the process, but the AI industry is focused on product development, not core research. The real solution has to come from outside of industry.
First, a minor correction:
> "3. RLHF: first proposed (to my knowledge) in the InstructGPT paper from OpenAI in 2022"
Deep reinforcement learning from human preferences by Christiano et al. (2017) is the foundational paper on RLHF. Link: https://arxiv.org/abs/1706.03741
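For anyone curious, here is a minimal sketch of the core recipe from that paper as I understand it (the function, tensor values, and names below are my own illustration, not code from the paper): fit a reward model to pairwise human preferences with a Bradley-Terry style loss, then run RL against the learned reward.

```python
# Rough sketch of the preference-learning objective in Christiano et al. (2017):
# the reward model is trained so that sigmoid(r(a) - r(b)) matches the
# probability that a human preferred segment a over segment b.
import torch

def preference_loss(reward_a: torch.Tensor, reward_b: torch.Tensor,
                    prefer_a: torch.Tensor) -> torch.Tensor:
    """Cross-entropy on P(a preferred over b) = sigmoid(r(a) - r(b)).

    reward_a, reward_b: predicted (summed) rewards for two trajectory segments.
    prefer_a: 1.0 if the human preferred segment a, 0.0 if they preferred b.
    """
    logits = reward_a - reward_b
    return torch.nn.functional.binary_cross_entropy_with_logits(logits, prefer_a)

# Toy usage with made-up reward predictions for a batch of three comparisons.
r_a = torch.tensor([1.2, -0.3, 0.5])
r_b = torch.tensor([0.4, 0.1, 0.6])
labels = torch.tensor([1.0, 0.0, 1.0])
print(preference_loss(r_a, r_b, labels))
```

The policy is then optimized with ordinary RL against this learned reward instead of a hand-written one; InstructGPT later applied the same recipe to language models.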
Interesting perspective, and I do like the bigger question you are asking: what ended up mattering the most for the success of LLMs? Some quick thoughts and questions:
- I do think building a GPT-3-like system would certainly have been feasible in the 90s *if* we'd had the computing capacity back then (Gwern has a nice historical exposition on Moravec's predictions which I recommend: https://gwern.net/scaling-hypothesis)
- I am not convinced that just unlocking YT data would be the next big thing (for AGI, and I know you don't like AGI talk ... sorry). There is some evidence suggesting that the models are still not generalizing, but instead defaulting to a bag of heuristics and other poor learning strategies (https://www.lesswrong.com/posts/gcpNuEZnxAPayaKBY/othellogpt-learned-a-bag-of-heuristics-1). Assuming this is true, I would expect that a YT-data-trained LLM will appear much smarter, crush the benchmarks, and have a better understanding of the world, but may not be transformative. Massively uncertain about this point, though.
- "perhaps we would’ve settled with LSTMs or SSMs" — are there any examples of LSTM-driven language models that are comparable to Transformer-based LLMs?
- Relatedly, I think the importance of adaptive optimizers is under-emphasized here. Without Adam, wouldn't LLM training be >2x more expensive and time-consuming? (A rough sketch of what Adam does differently from plain SGD is below.)
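To make the adaptive-optimizer point concrete, here is a minimal sketch of the difference (the toy objective, learning rates, and numbers are my own illustration; the update rule and hyperparameter names follow Kingma & Ba's Adam paper): Adam rescales each coordinate's step using running estimates of the gradient's first and second moments, so badly scaled dimensions can share a single learning rate.

```python
import numpy as np

def sgd_step(theta, grad, lr=1e-3):
    # Plain SGD: one global learning rate for every coordinate.
    return theta - lr * grad

def adam_step(theta, grad, m, v, t, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: per-coordinate step sizes from bias-corrected moment estimates.
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective with badly scaled coordinates: f(x, y) = 100*x^2 + 0.01*y^2.
grad_fn = lambda p: np.array([200.0 * p[0], 0.02 * p[1]])
theta_sgd = np.array([1.0, 1.0])
theta_adam = np.array([1.0, 1.0])
m = v = np.zeros(2)
for t in range(1, 1001):
    theta_sgd = sgd_step(theta_sgd, grad_fn(theta_sgd))
    theta_adam, m, v = adam_step(theta_adam, grad_fn(theta_adam), m, v, t)
print("SGD :", theta_sgd)   # the poorly scaled y coordinate barely moves
print("Adam:", theta_adam)  # both coordinates make progress
```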
I loved this. Yes. Exactly. People talk about AI like it’s inventing things — but this cuts right through that. Most of what we call “generation” is really just recombination, powered by increasingly structured inputs — from us. The breakthroughs weren’t big ideas; they were new ways to learn from new kinds of data.
That’s what makes this piece so sharp: it’s not dismissive of research, just honest about where progress actually comes from. Not magic. Not models. Infrastructure. Access. The moment a new dataset becomes legible at scale, everything shifts — and we call it innovation.
And it’s not just AI. In product too, the surface gets all the credit, but the real leverage sits underneath — in what’s visible, counted, or quietly baked into the defaults.
Interesting observation, Jack! The next big thing is a direct democratic simulated multiverse: you make a digital backup of Earth to have a vanilla virtual Earth, spin up a digital version with magic and public teleportation, and BOOM! You have a simple simulated multiverse with 2 Earths!
By the way, we recently wrote about how to align AIs: we spent 3 years modeling ultimate futures and found the safest way to the best one.
Adding language ability should be the last step in the process, but the AI industry is focused on product development, not core research. The real solution has to come from outside of industry.
You’re spot on! We’ve done exactly that for AI alignment: combined everything to solve it.
Maybe YouTube and robotics shouldn't be separated. If we can learn complex motions by just watching videos, why can't AI?