Time to leave the library

Continuing the learning experience theme of my post about Genie 3, I enjoyed this conversation “Is human data enough? With David Silver” on YouTube. Professor Hannah Fry interviewed Silver, who works at Google DeepMind and specialises in reinforcement learning.

The problem of AI companies running out of human-generated text data is already well discussed (and still pretty wild if you stop to think about it). Silver pushed it further: not only is the data running out, but learning from what humans have written, and even from what humans say they prefer (reinforcement learning from human feedback, or RLHF), is limiting. It keeps AI stuck remixing our thoughts and ideas.

He should know. AlphaGo was trained on human expert games plus self-play; it worked fantastically well (beating the world champion!), but it was leaning heavily on us. AlphaGo Zero, by contrast, skipped the human data entirely and just learned by playing itself. Within days it wasn’t just better, it was alien, easily defeating all previous versions of AlphaGo.

That’s the leap Silver and his co-author Richard Sutton call the Era of Experience: agents that learn not from static datasets or our thumbs-up feedback, but from their own continuous interactions with the world. Think less “parrot” and more “curious kid in a playground”: breaking things, trying weird stuff, learning lessons and understanding things we didn’t teach them.

The upside? Genuine discovery, innovation and long-term learning. The downside? It’s a lot harder to align a system that’s not just copying homework. The future isn’t about endlessly fine-tuning what humans already know; it’s about AI getting out of the library, falling over, and figuring out for itself how to stand up again.