
Could State Space Models Kill Large Language Models? — The Information



The time it takes for a new technology to replace (or try to replace) an incumbent keeps getting shorter (just look at how long it took Google Docs to overthrow Microsoft Word versus the flurry of 15-minute grocery delivery startups that popped up following the success of companies like Instacart). That’s why nobody should bat an eye if the current it girl of AI, the transformer, which powers ChatGPT and its ilk, soon loses its luster.

Perhaps the most promising challenger to transformers is the state space model, which got a lot of buzz from researchers at NeurIPS last month. State space models, or SSMs, address a crucial weakness of transformer models: their staggering computational costs. 

Transformers, the building blocks of large language models, essentially guess the most likely next word in a sentence. They do that via a mechanism dubbed “attention,” in which every single word in a sentence is compared to every other word in that sentence. That process allows the models to see the relationships between words, even if they’re not right next to each other. That’s great and all, but it also means that as the number of words increases, the number of computational steps grows quadratically instead of linearly.
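
For the curious, here’s roughly what that looks like in code. The sketch below is a deliberately stripped-down, single-head version of attention in Python with NumPy; real transformers add learned query, key and value projections, multiple heads and much more. The point it illustrates is the one that matters here: the score table has one entry for every pair of words, so n words means an n-by-n table.

```python
import numpy as np

def toy_self_attention(x: np.ndarray) -> np.ndarray:
    """Stripped-down self-attention over a sequence: x is (n, d), one row per word."""
    n, d = x.shape
    # Compare every word with every other word: an (n, n) matrix of scores.
    scores = (x @ x.T) / np.sqrt(d)
    # Softmax each row so the scores become attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output word is a weighted blend of every word in the sentence.
    return weights @ x

embeddings = np.random.randn(8, 16)          # 8 words, 16-dimensional embeddings
print(toy_self_attention(embeddings).shape)  # (8, 16) out, but an (8, 8) score table inside
```

Double the length of the text and that hidden table quadruples: 8 words means 64 comparisons, 16 words means 256.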

To put that in plain English: It gets expensive really, really fast.
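
So how do state space models dodge the bill? The models that drew crowds at NeurIPS layer a lot of extra machinery on top (including tricks for training them efficiently), but the core idea is the decades-old state space recurrence from control theory: instead of comparing every word with every other word, the model sweeps through the text once, folding each word into a fixed-size memory. The toy matrices below are made up purely to show the shape of the computation; this is not how any particular published SSM implements it.

```python
import numpy as np

def toy_ssm(u: np.ndarray, A: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Classic discrete state space recurrence: x_t = A x_{t-1} + B u_t, y_t = C x_t."""
    state = np.zeros(A.shape[0])
    outputs = []
    for u_t in u:                     # one pass over the sequence, word by word
        state = A @ state + B @ u_t   # fold this word into a fixed-size state
        outputs.append(C @ state)     # read out a result for this position
    return np.stack(outputs)

# Toy sizes: 8 words, 16-dimensional embeddings, a 4-dimensional state (all made up).
u = np.random.randn(8, 16)
A = 0.9 * np.eye(4)
B = 0.1 * np.random.randn(4, 16)
C = 0.1 * np.random.randn(16, 4)
print(toy_ssm(u, A, B, C).shape)      # (8, 16): one output per word, no n-by-n table
```

Each step costs the same no matter how much text came before it, so the total work grows in step with the number of words rather than with its square.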


