How Apple Plans to Improve Its AI Models While Protecting User Privacy
Even the most loyal Apple users will admit that the company is lagging behind when it comes to AI, with Siri's big Apple Intelligence upgrade now officially delayed (having been heavily promoted throughout last year). In a new blog post, Apple outlines some of the ways it's hoping to get back on track.

One of the potential reasons for Apple's generative AI struggles may be that it prioritizes user privacy a lot more than the likes of OpenAI and Google. Apple doesn't gather any user data to train its large language models, or LLMs (though it has trained its models on freely available text from the web), and relies heavily on synthetic data for the features that produce AI text from prompts and rework existing writing.

The problem with synthetic data is, well, its artificiality. It lacks the nuance and variation of human writing as it changes over time, and without any text written by actual people for comparison, it's difficult to assess the quality of what the AI is outputting.

As mentioned in the blog post, Apple is now planning improvements to text generation. In basic terms, the idea is that AI-generated, synthetic text will be compared to a selection of actual writing from users, stored on Apple devices, with several layers of protection in place to prevent individual users from being identified, or any personal correspondence from being sent to Apple. The approach essentially grades synthetic text against real writing samples, but only the aggregated grades make it back to Apple.

In fact, what's actually happening doesn't involve words or sentences at all: both the synthetic text and the human writing are converted into "embeddings," which are essentially mathematical representations of the text. Those embeddings carry enough information to rank the quality of the AI text, without anything that amounts to real reading.

All of this information is encrypted as it's transferred, and comparisons are only made on devices where users have opted into Device Analytics (on iOS, the option can be found under Privacy & Security > Analytics & Improvements in Settings). Apple never knows which AI text sample was picked by an individual device, only which samples rank better across all the devices polled.
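To make that concrete, here's a minimal sketch, in Swift, of how this kind of on-device comparison could work. Everything below is hypothetical: the toy embed function, the cosine-similarity scoring, and the bestCandidateIndex helper are stand-ins for illustration, since Apple hasn't published its embedding model or grading code.

import Foundation

// Hypothetical sketch only: Apple's actual embedding model and grading
// pipeline are not public. This shows the general shape of the idea.

// Cosine similarity between two equal-length embedding vectors.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    let magA = a.reduce(0) { $0 + $1 * $1 }.squareRoot()
    let magB = b.reduce(0) { $0 + $1 * $1 }.squareRoot()
    guard magA > 0, magB > 0 else { return 0 }
    return dot / (magA * magB)
}

// Toy stand-in for an embedding model: a 26-dimensional letter-frequency
// vector. A real system would run a language model on-device instead.
func embed(_ text: String) -> [Double] {
    var counts = [Double](repeating: 0, count: 26)
    for scalar in text.lowercased().unicodeScalars {
        let index = Int(scalar.value) - 97 // 97 is the scalar for "a"
        if (0..<26).contains(index) { counts[index] += 1 }
    }
    return counts
}

// Rank synthetic candidates by how closely each one's embedding matches
// the user's local writing samples, and return only the winning index.
func bestCandidateIndex(candidates: [String], localSamples: [String]) -> Int? {
    let localVectors = localSamples.map(embed)
    let scores = candidates.map { candidate -> Double in
        let vector = embed(candidate)
        return localVectors.map { cosineSimilarity(vector, $0) }.max() ?? 0
    }
    return scores.indices.max { scores[$0] < scores[$1] }
}

The property the sketch illustrates is the important one: the only value that would ever leave the device is the index of the winning candidate, never the user's writing or its embeddings.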
Genmoji and other tools
Apple will test its AI outputs against user data without looking too closely.
Credit: Apple
This anonymized grading system can be used to improve text made or rewritten by its generative AI models, Apple says, and should also lead to more accurate, more intelligent summaries of text. Outputs that rank the highest could be tweaked with a different word or two, before being sent back for another round of assessments.

A simpler version of the same approach is already being used by Apple to power its Genmoji AI feature, where you can magic up an octopus on a surfboard or a cowboy wearing headphones. Apple aggregates data from multiple devices to see which prompts are proving popular, while applying safeguards to ensure unique, individual requests aren't seen or tied to specific users or devices.

Again, this only happens on iPhones, iPads, and Macs that have been opted into Device Analytics. By getting devices to report "noisy" signals with no specific user information in them (see the sketch at the end of this article), Apple can improve its AI models based on aggregate data, and users don't have to worry about their Genmoji prompts being discovered.

Similar techniques will soon be used in other Apple Intelligence features, Apple says, including Image Playground, Image Wand, Memories Creation, Writing Tools, and Visual Intelligence, which have all been among the first Apple AI capabilities to actually make it out to devices.

According to Bloomberg, the new and improved systems will be tested in upcoming beta releases of iOS 18.5, iPadOS 18.5, and macOS 15.5. We may well hear more about them, and about Apple Intelligence in general, at this year's Apple Worldwide Developers Conference, which is scheduled to get underway on Monday, June 9.

Meanwhile, Apple's rivals in the AI space aren't showing any signs of slowing down, and they have fewer scruples about using text written by their users to train their AI models further. In recent days we've seen Microsoft push out a range of updates for Copilot (including Copilot Vision and file search), Google add video generation to Gemini, and OpenAI upgrade the memory capabilities of ChatGPT.
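As promised above, here's a small sketch of what "noisy" reporting can look like in practice. It uses randomized response, a classic local differential privacy technique; Apple's production protocol is more sophisticated and isn't public, so the functions and numbers here are illustrative assumptions, not Apple's code.

import Foundation

// Hypothetical sketch of "noisy" reporting in the spirit of local
// differential privacy, using classic randomized response.

// Each device answers "did I use this prompt?" but flips a coin first,
// so no individual report can be trusted, while aggregates stay useful.
func noisyReport(didUsePrompt: Bool) -> Bool {
    if Bool.random() {
        return didUsePrompt  // heads: answer truthfully
    } else {
        return Bool.random() // tails: answer at random
    }
}

// With this scheme, the expected fraction of "true" reports is
// 0.25 + 0.5 * p, where p is the real usage rate, so the server can
// recover the aggregate rate without trusting any single device.
func estimateUsageRate(from reports: [Bool]) -> Double {
    let observed = Double(reports.filter { $0 }.count) / Double(reports.count)
    return min(1, max(0, (observed - 0.25) / 0.5))
}

// Simulate 10,000 opted-in devices where 30 percent used the prompt.
let reports = (0..<10_000).map { _ in
    noisyReport(didUsePrompt: Double.random(in: 0..<1) < 0.3)
}
print(estimateUsageRate(from: reports)) // prints a value close to 0.3

The point is that any single report is deniable, because it might just be the coin talking, yet across thousands of devices the popular prompts still rise to the top.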