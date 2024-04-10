Fast Facts

We're all expecting Apple to formally unveil AI features for the iPhone, iOS, and other hardware and software at its annual Worldwide Developers Conference in June.

A new research paper now gives us a better idea of what features might be arriving.

Yes, Apple (AAPL) is widely expected to showcase new software – and maybe some hardware infused with artificial intelligence features – at its annual Worldwide Developers Conference, which is scheduled to kick off on June 10.

Even as CEO Tim Cook has been teasing AI’s potential – recently saying that there is “incredible breakthrough potential for generative AI” – Apple has spent the past few months quietly exploring the sector at varying depths.

Apple in February released a new image editing model that can tackle serious, complex photo edits in a matter of seconds. Better yet, you don’t necessarily need crazy editing experience as the model can understand and process natural language to complete the edit.

There have been reports, meanwhile, that Apple has been developing a large language model (LLM) called Ajax. There have also been reports that Apple is in talks with Google to license the company’s own LLMs for use in iOS. While the results of these projects have yet to be seen, they each point to the major opportunity for AI at Apple: an LLM-enhanced Siri.

The Cupertino, Calif., company recently published a study that explores its efforts with a multimodal LLM called Ferret-UI that might be a step along the path to an AI-enabled Siri.

Unlike the LLM-powered chatbots – such as OpenAI’s ChatGPT – Apple’s Ferret-UI is designed to tackle a user interface like iOS on an iPhone, according to the study. Apple writes: “In this paper, we present Ferret-UI, a new MLLM tailored for enhanced understanding of mobile UI screens, equipped with referring, grounding and reasoning capabilities.”

To date, there has been no scientific evidence that LLMs are capable of reasoning.

If Ferret-UI gets integrated with Siri – which isn’t guaranteed – it could offer more minute and advanced controls over interface elements, potentially supercharging features like VoiceOver. It could also unlock Siri’s capabilities by controlling actions and reading data from even more applications.

Figure 1 in “Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs”

Ferret-UI works by magnifying user interface elements so that the model can read and understand what’s presented more accurately. App icons might seem easy for us to read, nuances and all, but Ferret-UI has to magnify and process each element to get a read on it, whether that element is an icon, a widget, or text within an application.

While the research paper doesn’t show what a true release of Ferret-UI might look like, it does show where the advances are made and what they could be used for.

We don’t know exactly how or even if this will be used, but it has me thinking about how this could impact the iPhone user experience. If Ferret-UI were paired with Siri, an iPhone owner could make app-specific requests that involve a bevy of steps, and the assistant would streamline that process.

If it goes that route, it could transform how we interact with our iPhones and other smartphones. It would be even neater if requests could also be processed locally and securely on-device, letting this feature sit nicely alongside existing staples of iOS and the iPhone.

Time will tell, but considering Tim Cook’s comments about AI, this feels like a breakthrough that Apple could integrate within the iPhone, iOS, and its broader software, services, and hardware. You can see the full paper, titled “Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs,” here.

