Generative AI technology has advanced so rapidly over the past few years that some experts are already worried about whether we’ve hit “peak AI.”
On a Harvard Business Review podcast last week, Richard Socher, CEO of the AI search engine You.com, said we can level up large language models by forcing them to respond to certain prompts in code.
Right now, large language models just “predict the next token, given the previous set of tokens,” Socher said — tokens being the smallest data units that have meaning in AI systems. So even though LLMs exhibit impressive reading comprehension and coding skills and can ace difficult exams, AI models still tend to hallucinate — a phenomenon where they convincingly spit out factual errors as truth.
And that’s especially problematic when they’re asked complex mathematical questions, Socher said.
He offered an example a large language model might fumble on: “If I gave a baby $5,000 at birth to invest in some no-fee stock index fund, and I assume some percentage of average annual returns, how much will they have by age two to five?”
A large language model, he said, would just start generating text based on similar questions it had been exposed to in the past. “It doesn’t actually say, ‘well, this requires me to think super carefully, do some real math and then give the answer,’” he explained.
But if you can “force” the model to translate that question into computer code and generate an answer based on the output of that code, you’re more likely to get an accurate answer, he said.
Socher didn’t offer specifics on the process but did say that at You.com, they’ve been able to translate questions into Python. Broadly speaking, programming will “give them so much more fuel for the next few years in terms of what they can do,” he added.
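To make the idea concrete, here is a minimal sketch of the kind of Python a model might produce for Socher's index-fund question. It is an illustration only, not You.com's actual process: the 7% average annual return, annual compounding, and the function name `future_value` are all assumptions chosen for the example.

```python
def future_value(principal: float, annual_return: float, years: int) -> float:
    """Value of a lump sum after compounding once per year, with no fees."""
    return principal * (1 + annual_return) ** years


# The $5,000 comes from Socher's example; the 7% return is a hypothetical
# stand-in for "some percentage of average annual returns."
PRINCIPAL = 5_000
ASSUMED_RETURN = 0.07

# Compute the balance at each age from two through five.
balances = {
    age: future_value(PRINCIPAL, ASSUMED_RETURN, age) for age in range(2, 6)
}

for age, balance in balances.items():
    print(f"Age {age}: ${balance:,.2f}")
```

The point of the approach is that the arithmetic is done by the interpreter rather than predicted token by token, so the model only has to get the translation from question to code right.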
Socher’s comments come as the growing roster of large language models struggles to outsmart OpenAI’s GPT-4. Gemini, “Google’s most capable AI model yet,” barely surpasses GPT-4 across important benchmarks like the MMLU, one of the most popular methods to gauge AI models’ knowledge and problem-solving skills. And while the go-to approach has simply been to scale these models in terms of the data and computing power they’re given, Socher suggests that approach might lead to a dead end.
“There’s only so much more data that is very useful for the model to train on,” he said.