
Towards Artificial General Intelligence Part 2: When a Million Experts Learn on the Job | by Justin “Red” Lebrun | Aug, 2024


For an intelligence, AI can be really dumb sometimes.

AI (smart and knowledgeable): the hippocampus is not only crucial for forming new memories, it also plays a significant role in spatial navigation and context retrieval.

Also AI (dumb as rocks): there are two “R”s in strawberry.

This little exchange highlights two of the biggest problems with today's AI models: they can't learn on the spot, and even when they're wrong (hallucinating), they're damn confident about it (Very American 🇺🇸). Fortunately, we have some very clever organic intelligences working to make these artificial ones more capable, and they're doing this not by making bigger models, but better model infrastructures.

Let’s dive into two recent AI innovations that are trying to address these issues: PEER and TTT, and why, eventually, they might become the best of friends.

Imagine having a library with millions of books, each written by an expert on a hyper-specific topic. Need to know about how sound would have echoed in the now-ruins of the Ancient Greek Parthenon? There’s a book for that. Want to understand how we can use cosmic rays to peek inside pyramids? You’re not gonna believe it, there’s a book for that too.

Not even kidding.

How does it work? PEER builds a large AI model from over a million smaller "expert" modules (link to full paper). When faced with a task, the system uses a sophisticated routing mechanism to quickly identify and activate only the most relevant experts, rather than using the entire model. The PEER model builds on an AI architecture called "mixture of experts", but expands it to "mixture of a million experts."

And it's pretty computationally light, too. It's like having millions of specialized consultants on call, but only paying for the ones you actually use.
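To make the retrieve-then-mix idea concrete, here's a minimal NumPy sketch. Everything in it is a toy stand-in: the expert count, dimensions, and brute-force key scoring are my own simplifications, not the paper's implementation — real PEER uses single-neuron experts retrieved via *product keys* so that a million experts can be searched without scoring every one.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 1_000, 8   # tiny stand-ins for "a million"

# Each "expert" is a tiny function of the input: one hidden neuron,
# i.e. a down-projection u_i and an up-projection v_i.
U = rng.normal(size=(N_EXPERTS, D)) * 0.1   # expert down-projections
V = rng.normal(size=(N_EXPERTS, D)) * 0.1   # expert up-projections
keys = rng.normal(size=(N_EXPERTS, D))      # one retrieval key per expert

def peer_layer(x):
    """Route input x to the TOP_K most relevant experts and mix their outputs."""
    scores = keys @ x                               # similarity of x to every key
    top = np.argpartition(scores, -TOP_K)[-TOP_K:]  # indices of the best experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                        # softmax over selected experts only
    acts = np.maximum(U[top] @ x, 0.0)              # each expert's activation (ReLU)
    return (weights * acts) @ V[top]                # weighted sum of expert outputs

out = peer_layer(rng.normal(size=D))
print(out.shape)  # (16,)
```

Note that the other 992 experts are never touched: the compute cost scales with `TOP_K`, not `N_EXPERTS` — that's the "only pay for the consultants you use" property.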

So far so good… but there’s a catch. Like physical books, these AI “experts” are static. They’re incredibly knowledgeable, but they’re about as adaptable as a history textbook from 1995 that still lists Pluto as a planet (It’s not.)

To update them, you’d have to retrain the model — rebuild the entire library just to update a few books. As you can imagine, not cheap. If only the library could employ an editor to revise the books on the spot…

Well, maybe now it can. This is where a totally different approach to AI architecture comes in: Test Time Training (TTT) layers. (link to full paper)

Imagine you’re using a translation AI that initially struggles with medical terminology. With TTT, the AI could improve its medical translations on the fly as it encounters new terms, without needing to be taken offline for retraining.

How does it work? TTT layers incorporate a small, adaptive model within the main AI system. This adaptive model acts like a short-term memory, rapidly adjusting to new information. When the AI encounters input it can’t handle well, the adaptive model updates itself to better process this new information. Going back to our library analogy, imagine the library also employs a skilled editor who can write notes in the margins — add or change information as needed to always have the best possible source of truth at the ready.

The key innovation here is that this learning happens during the AI’s operation — or “test time” — rather than during a separate training phase. This allows the AI to continually refine its performance in real-time, much like how humans learn from each new experience.
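In code, the core trick is surprisingly small: keep a fast "inner" model whose weights are updated by a gradient step on a self-supervised loss for each incoming input, during inference. The sketch below uses a toy reconstruction loss and a bare linear map; the actual TTT layers embed a learned self-supervised task inside a transformer, so treat everything here as an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
D, LR = 8, 0.1
W = np.zeros((D, D))   # the fast "inner" model: a linear map learned at test time

def ttt_step(W, x):
    """One test-time step: train the inner model on a self-supervised
    loss for the current input, then answer with the fresh weights."""
    err = W @ x - x                  # toy target: reconstruct the input itself
    grad = np.outer(err, x)          # gradient of 0.5 * ||W x - x||^2 w.r.t. W
    W = W - LR * grad                # a gradient step *during inference*
    return W, W @ x

x = rng.normal(size=D)
x /= np.linalg.norm(x)               # unit norm keeps the toy update stable
losses = []
for _ in range(50):                  # the same tricky input keeps arriving...
    W, y = ttt_step(W, x)
    losses.append(float(np.sum((y - x) ** 2)))

print(losses[0] > losses[-1])        # ...and the layer adapts: loss shrinks -> True
```

The important line is the weight update inside `ttt_step`: it runs on every input, with no separate training phase — that's the editor scribbling in the margins.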

Ok, so that’s amazing, why is it not the norm? Well, one of the downsides of TTT is that this constant adaptation still requires significant computational power. It’s as if every time the editor updates a book, they need to quickly skim through several related volumes to ensure the changes are consistent with the broader collection. This process, while not rewriting everything, still takes considerable time and effort.

So, what if we combined PEER and TTT?

We'd get an AI system with millions of tiny, specialized experts, each capable of learning and adapting in real-time. Every individual book in the library would now not only be written by an expert, but also have its own built-in fact-checker and editor.

PEER: mixture of a million experts.
TTT: continuous learning.

PEER + TTT: Mixture of a million learning experts.

To understand how this might work, let’s look inward. More specifically, into your head. In your brain, you have specialized regions for tasks like facial recognition or language processing (experts). These regions aren’t static; they’re constantly forming new neural connections based on your experiences.

For instance, when you learn a new word, you’re not just memorizing it. Your brain is forming new connections between this word and related concepts, potentially altering how you understand and use language. This happens across millions of neurons, each specializing in tiny aspects of cognition, yet all working together and adapting in real-time.

A combined TTT-PEER system could work similarly. Imagine millions of tiny AI “experts,” each specializing in a narrow domain — perhaps one for each concept, idea, or application. When the AI encounters a new usage of an idea, the relevant “expert” could rapidly adapt, updating its understanding. This change would then influence how other related “experts” are used, creating a ripple effect of learning across the system.
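No published system combines the two yet, so the sketch below is pure speculation on my part: PEER-style key retrieval picks an expert, and only that expert takes a TTT-style gradient step. All names, sizes, and the top-1 routing are hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, LR = 8, 100, 0.5

keys = rng.normal(size=(N, D))     # one retrieval key per expert
experts = np.zeros((N, D, D))      # each expert: a tiny adaptable linear map

def step(x):
    """Route to the single most relevant expert, let only it adapt, then answer."""
    i = int(np.argmax(keys @ x))              # PEER-style retrieval (top-1 here)
    err = experts[i] @ x - x                  # toy self-supervised loss, as in TTT
    experts[i] -= LR * np.outer(err, x)       # only the routed expert learns
    return i, experts[i] @ x

x = rng.normal(size=D)
x /= np.linalg.norm(x)                        # unit norm keeps the toy update stable
first, _ = step(x)
for _ in range(20):
    i, y = step(x)

print(i == first)          # the same input keeps routing to the same expert...
print(np.allclose(y, x))   # ...which has now locally learned to handle it
```

Notice the locality: one expert's weights change while the other 99 stay frozen, which is one plausible answer to "how do you learn on the fly without the whole library drifting."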

Yeah, but… While promising, this combined approach faces significant hurdles. One major challenge is maintaining overall coherence when millions of tiny experts are learning and adapting independently. Researchers will need to develop sophisticated coordination mechanisms to ensure these experts work together effectively. And we need to make sure our AI doesn’t become too adaptable. We don’t want it learning “alternative facts” during an election season.

Not today, Satan.

But despite these hurdles, the combination of PEER and TTT represents an exciting new direction in AI research. We might be on the verge of creating AI systems that don’t just regurgitate information, but truly understand and learn from each interaction. Wait, is that… AGI?

…probably not. But definitely a step in the right direction.

At the very least, it would know that there are 3 “r”s in strawberry.


