
Generative AI on iPhones one step closer thanks to Apple researchers


Apple is working to bring on-device generative AI features to the iPhone. (Image: Notebookcheck)

Rumors point to Apple introducing a generative AI version of Siri, set to debut with iOS 18 on the next-generation iPhone 16 series due in late 2024. The company’s researchers have just detailed one way an iPhone could overcome RAM limitations to run a sophisticated LLM on-device.

Apple researchers have documented (pdf) a new method for running Large Language Models (LLMs) on-device by working around the RAM limitations of mobile hardware. The full version of an LLM like OpenAI’s GPT-4 has around 1.7 trillion parameters and requires powerful servers to handle the processing. However, Google’s new Gemini AI – which it claims can beat GPT-4 – comes in a ‘Nano’ flavor for smartphones and uses quantization techniques to cut the model down to either 1.8 billion or 3.6 billion parameters. One of these variants of Gemini Nano is currently running on Google’s Pixel 8 Pro smartphones (curr. reduced to $799 from Amazon – normally $999).

Qualcomm claims that its new Snapdragon 8 Gen 3 SoC can support generative AI LLMs of up to 10 billion parameters – considerably more capable than what Google has running on the Pixel 8 series, but still a far cry from the 1.7 trillion parameters required to make GPT-4 function as impressively as it does. Quantization, which makes LLMs easier for mobile SoCs to process, also costs them accuracy and effectiveness. As such, the larger the model that can be shoehorned onto a mobile device, the better the LLM is likely to perform.
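As a rough illustration of what quantization trades away, here is a minimal Python sketch of symmetric per-tensor 8-bit quantization. The function names and the single-scale scheme are illustrative assumptions only – production quantizers behind models like Gemini Nano or Llama 2 are considerably more sophisticated:

```python
import numpy as np

# Toy example of symmetric 8-bit quantization: each float32 weight is mapped
# to an int8 value, shrinking storage by 4x at the cost of some precision.
def quantize_int8(weights: np.ndarray):
    scale = np.abs(weights).max() / 127.0                   # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale                     # approximate reconstruction

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)
error = np.abs(weights - dequantize(q, scale)).max()
print(f"storage: {weights.nbytes} B -> {q.nbytes} B, max round-trip error: {error:.4f}")
```

The round-trip error printed at the end is the accuracy cost the article refers to: the smaller the bit width, the smaller the model, but the coarser the reconstructed weights.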

In order for smartphones to handle generative AI tasks on-device, RAM requirements are also considerable. An LLM quantized to 8 bits per parameter with 7 billion parameters (like Meta’s Llama 2, which is supported by the Snapdragon 8 Gen 3) would require a smartphone with at least 7 GB of RAM. The iPhone 15 Pro series features 8 GB of RAM, which suggests an Apple-developed LLM of Llama 2’s size would be at the upper limit of what current iPhones could support. Apple’s researchers have found a way around this onboard RAM limit.
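For a back-of-the-envelope check of that 7 GB figure: the weight footprint is simply the parameter count multiplied by the bits per parameter. The toy calculation below ignores activations, the KV cache and OS overhead, which all add to the real requirement:

```python
# Rough RAM estimate for holding just the model weights (illustrative only).
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    return num_params * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"7B parameters at {bits}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
# 7B parameters at 16-bit: 14.0 GB
# 7B parameters at 8-bit: 7.0 GB
# 7B parameters at 4-bit: 3.5 GB
```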

In a research paper titled “LLM in a flash: Efficient Large Language Model Inference with Limited Memory,” Apple’s generative AI researchers describe a method of using an iPhone’s flash storage to supplement the device’s onboard system RAM. Flash storage bandwidth is not in the same league as LPDDR5/X mobile RAM, but the researchers work around this inherent limitation with a combination of “windowing” (where the AI model reuses data it has already loaded from flash instead of fetching it again) and “row-column bundling” (which groups the LLM’s data so it can be read from flash in larger, more efficient chunks, speeding up reads).
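The paper itself is worth reading for the details, but as a rough mental model the two ideas can be sketched in a few lines of Python. This is purely a conceptual toy, not Apple’s implementation – the shapes, cache policy and names below are invented for the example:

```python
import numpy as np

# Conceptual sketch only: keep recently used neuron weights cached in RAM
# ("windowing") and store each neuron's up-projection row next to its
# down-projection column in flash ("row-column bundling"), so a single
# sequential read fetches both.

hidden, ffn = 64, 256
up = np.random.randn(ffn, hidden).astype(np.float32)     # rows correspond to neurons
down = np.random.randn(hidden, ffn).astype(np.float32)   # columns correspond to neurons

# Bundle: neuron i's row of `up` and column of `down` become one flash record.
flash = np.concatenate([up, down.T], axis=1)              # shape (ffn, 2*hidden)

ram_cache: dict[int, np.ndarray] = {}                     # neuron id -> bundled weights
WINDOW = 128                                              # max neurons kept in RAM

def load_neurons(active: list[int]) -> int:
    """Load the bundled weights of active neurons, reusing cached ones."""
    flash_reads = 0
    for i in active:
        if i not in ram_cache:                            # only a cache miss hits flash
            ram_cache[i] = flash[i]
            flash_reads += 1
    while len(ram_cache) > WINDOW:                        # evict the oldest entries
        ram_cache.pop(next(iter(ram_cache)))
    return flash_reads

print(load_neurons(list(range(100))))       # 100 flash reads: the cache starts cold
print(load_neurons(list(range(50, 150))))   # only ~50 new reads: the rest are reused
```

The second call illustrates the windowing idea: because consecutive tokens tend to activate overlapping sets of neurons, most of the data is already in RAM and only the difference has to be fetched from the slower flash storage.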

Of course, we are yet to see an LLM from Apple, although rumors suggest a smarter, LLM-based version of Siri is set to debut as part of iOS 18 and run on-device on the next-generation iPhone 16 Pro models. When it arrives, there is a good chance Apple will use this kind of RAM extension to deliver a model with as many parameters as it can effectively run on-device. With Samsung upping its generative AI game for the launch of the Galaxy S24 series next month, 2024 is shaping up as the year generative AI becomes commonplace on smartphones.

