Apple has long been rumored to be planning a big AI push into 2024 and beyond, and new research might go a long way toward making that a reality while still meeting Apple’s demands for security and privacy.
To date, large language models (LLMs) like those on which ChatGPT is based have been powered by computers housed in data centers and accessed via a webpage or an iPhone app. They’re huge pieces of software that require equally huge amounts of resources to work properly, making it difficult to run them locally on phones like the upcoming iPhone 16. But running LLMs in data centers raises privacy concerns, and with Apple already working to keep as many Siri requests on-device as possible, it’s no surprise that the company might want to do the same with any LLM implementation it is working on.
Now, a research paper might have the answer, and it could open the door to Apple’s in-house Apple GPT making a debut outside of Apple Park. But if Siri really is going to get a big upgrade, could the 2024 iPhones come too soon?
On-device processing
The research paper, titled “LLM in a flash: Efficient Large Language Model Inference with Limited Memory,” is authored by multiple Apple engineers and discusses how an LLM could be used on devices with limited RAM (or DRAM), like iPhones. The same approach could also bring Siri upgrades to similarly RAM-constrained devices like low-end MacBooks and iPads, not to mention the Apple Watch.
“Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks,” the paper begins. “However, their intensive computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM.”
Flash storage, the storage capacity you choose when buying your iPhone, is much more plentiful than DRAM and can be carved out for storing the LLM data. The paper discusses ways of using a device’s flash storage to supplement DRAM, centering on two main techniques: “windowing” and “row-column bundling.”
The paper explains that “these methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively.”
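To get an intuition for how this might work, the sketch below treats DRAM like a small cache in front of flash: keep the model slices that recent tokens actually needed in memory, fetch anything else from flash only on demand, and bundle related rows and columns so each flash read is larger. This is a loose, hypothetical Python illustration of those two ideas, not Apple’s implementation; the WindowedParameterCache class, the load_from_flash callable, and the neuron IDs are all invented for the example.

```python
# Conceptual sketch only: a toy illustration of "windowing" and "row-column
# bundling" as described in the paper, not Apple's actual implementation.
# The load_from_flash callable and neuron IDs are hypothetical stand-ins.

from collections import OrderedDict


class WindowedParameterCache:
    """Holds weights for recently used neurons in DRAM, reading flash on demand."""

    def __init__(self, capacity, load_from_flash):
        self.capacity = capacity                # how many neurons fit in DRAM
        self.load_from_flash = load_from_flash  # callable: neuron_id -> weights
        self.cache = OrderedDict()              # neuron_id -> bundled weights

    def get(self, neuron_ids):
        """Return weights for the requested neurons, touching flash only on misses."""
        weights = {}
        for nid in neuron_ids:
            if nid in self.cache:
                self.cache.move_to_end(nid)     # reuse what a recent token already loaded
            else:
                # "Row-column bundling": the up-projection row and the
                # down-projection column for a neuron are stored contiguously,
                # so a single larger flash read fetches both at once.
                self.cache[nid] = self.load_from_flash(nid)
            weights[nid] = self.cache[nid]
        self._evict_oldest()
        return weights

    def _evict_oldest(self):
        # A simple least-recently-used policy stands in for the paper's
        # sliding window over recent tokens: free DRAM by dropping neurons
        # that have not been needed for a while.
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)


# Hypothetical usage: fake flash reads backed by a dictionary.
fake_flash = {nid: f"bundled-weights-{nid}" for nid in range(1_000)}
cache = WindowedParameterCache(capacity=256, load_from_flash=fake_flash.__getitem__)
print(cache.get([3, 14, 159]))  # only these three neurons are read from "flash"
```

In the paper itself, the window is defined over the neurons activated by the most recent tokens rather than a simple least-recently-used rule, but the caching intuition is the same: the less often the device has to touch flash, and the bigger each read, the faster inference runs.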
Obvious benefits
The benefits of such an approach are obvious. Storing an LLM on an iPhone would not only remove the need to rely on a remote data center, improving privacy, but it would also be much faster. Removing the latency created by poor data connections is one thing, but the speed increase goes beyond that and could make Siri respond more accurately and more quickly than ever before.
Apple is already rumored to be working on bringing improved microphones to the iPhone 16 lineup, likely in an attempt to ensure Siri hears what people ask of it more clearly. Couple that with the potential for an LLM breakthrough and the 2024 iPhones could have some serious AI chops.