Multimodal AI In 2025: From Healthcare To eCommerce And Beyond
[Image: Multimodal AI - Use Cases. Credit: Lutz Finger]

Multimodality is set to redefine how enterprises leverage AI in 2025. Imagine an AI that understands not just text but also images, audio, and other sensor data. Humans are naturally multimodal. However, we are limited in how much input we can process.

Take healthcare as an example. During my time at Google Health, I heard many stories of patients overwhelming doctors with data: imagine a patient with atrial fibrillation (AFib) showing up with five years of detailed sleep data collected from their smartwatch, or a cancer patient arriving with a 20-pound stack of medical records documenting every treatment they've had. Both of these situations are very real. For doctors, the challenge is the same: separating the signal from the noise.

What's needed is an AI that can summarize and highlight the key points. Large language models, like ChatGPT, already do this with text, pulling out the most relevant information. But what if we could teach AI to do the same with other types of data, like images, time series, or lab results?

How Does Multimodal AI Work?

To understand how multimodality works, let's start with the fact that AI needs data both to be trained and to make predictions. Multimodal AI is designed to handle diverse data sources (text, images, audio, video, and even time series) at the same time. By combining these inputs, multimodal AI offers a richer, more comprehensive understanding of the problems it tackles.

Multimodal AI is more of a discovery tool. The different data modalities are stored by the AI. Once a new data point is input, the AI finds topics that are close. For example, by inputting the sleep data from someone's smartwatch alongside information about their AFib episodes, the doctor might find indications of sleep apnea.

Read More: AI Agents In 2025: What Enterprise Leaders Need To Know

Note that this is based on "closeness," not correlation. It is the scaled-up version of what Amazon once popularized: "People who shopped for this item also bought this item." In this case, it's more like: "People with this type of sleep pattern have also been diagnosed with AFib."

Multimodal Explained: Encoders, Fusion and Decoders

A multimodal AI system consists of three main components: encoders, a fusion mechanism, and decoders.

Encoding Any Modality

Encoders convert raw data (e.g., text, images, sound, log files) into a representation the AI can work with. These representations are called vectors, and they are stored in a latent space. To simplify, think of this process as storing an item in a warehouse (the latent space), where each item has a specific location (its vector). Encoders can process virtually anything: images, text, sound, videos, log files, IoT (sensor) information, time series, you name it.

[Image: Encoding Multimodal Information into a Latent Vector Space. Credit: Lutz Finger]

Fusion Mechanism: Combining Modalities

When working with one type of data, like images, encoding is enough. But with multiple types (images, sounds, text, or time-series data), we need to fuse the information to find what's most relevant.

Decoders: Generating Outputs We Understand

Decoders retrieve information from the latent space (the warehouse) and deliver it to us in a form we understand, moving from abstract vectors back to something concrete, for example, an image of a "house."

[Image: Encoding and Decoding Multimodal Data. Credit: Lutz Finger]

If you want to learn more about encoding, decoding, and reranking, join my eCornell Online Certificate course on Designing and Building AI Solutions. It's a no-coding program that explores all aspects of AI solutions.
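To make the warehouse analogy concrete, here is a minimal sketch of that "closeness" lookup: items from different modalities are encoded into one shared latent space, and a new query simply retrieves its nearest neighbors there. The encoder below is a deterministic stub standing in for trained models (in practice you would use something like a CLIP-style image/text encoder plus dedicated encoders for time series); the item names and the dimension size are invented for illustration.

```python
# Minimal sketch: encode items from different modalities into one shared
# latent space and retrieve the "closest" items for a new query.
# The encoder below is a stand-in stub; a production system would use
# trained multimodal encoders.

import numpy as np

DIM = 64  # size of the shared latent space (chosen arbitrarily here)

def stub_encode(raw: str, modality: str) -> np.ndarray:
    """Deterministic placeholder encoder: hashes the input into a vector.
    Real encoders map text, images, or time series into vectors whose
    distances reflect semantic similarity."""
    rng = np.random.default_rng(abs(hash((modality, raw))) % (2**32))
    v = rng.normal(size=DIM)
    return v / np.linalg.norm(v)  # unit-normalize so dot product = cosine similarity

# The "warehouse": every item gets a location (vector) in the latent space.
catalog = [
    ("sleep_record_patient_17", "time_series"),
    ("ecg_afib_episode_03", "time_series"),
    ("clinical_note_sleep_apnea", "text"),
    ("chest_xray_2024_11", "image"),
]
index = np.stack([stub_encode(name, mod) for name, mod in catalog])

def nearest(query: str, modality: str, k: int = 2):
    """Return the k items whose vectors are closest to the query vector."""
    q = stub_encode(query, modality)
    scores = index @ q               # cosine similarities against every stored item
    top = np.argsort(-scores)[:k]
    return [(catalog[i][0], float(scores[i])) for i in top]

print(nearest("five years of smartwatch sleep data", "text"))
```

The stub's similarity scores are of course meaningless; the point is the shape of the pipeline: encode everything once, store the vectors, and answer questions by proximity rather than by correlation.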
Transforming eCommerce with Multimodality

Let's look at another example: eCommerce. Amazon's interface hasn't changed much in 25 years: you type a keyword, scroll through results, and hope to find what you need. Multimodality can transform this experience by letting you describe a product, upload a photo, or provide context to find your perfect match.

Fixing Search with Multimodal AI

At r2decide, a company a few Cornellians and I started, we're using multimodality to merge Search, Browse, and Chat into one seamless flow. Our customers are eCommerce companies tired of losing revenue because their users couldn't find what they needed. At the core of our solution is multimodal AI.

For example, in an online jewelry store, a user searching for "green" would in the past only see green jewelry if the word "green" appeared in the product text. Since r2decide's AI also encodes images into the shared latent space (the warehouse), it finds "green" across all modalities. The items are then re-ranked based on the user's past searches and clicks to ensure they see the most relevant "green" options.

[Image: Using r2decide's multimodal search, users will find what they are looking for. Credit: Lutz Finger]

Users can also search for broader contexts, like "wedding," "red dress," or "gothic." The AI encodes these inputs into the latent space, matches them with suitable products, and displays the most relevant results. This capability even extends to brand names like Swarovski, surfacing relevant items even if the shop doesn't officially carry Swarovski products.

[Image: Using r2decide's multimodal search, users will even find items that "look" like jewelry from a competitor. Credit: Lutz Finger]

AI-Generated Nudges to Give Chat-Like Advice

Alongside search results, r2decide also generates AI-driven nudges: contextual recommendations or prompts designed to enhance the user experience. These nudges are powered by AI agents, as I described in my post on agentic AI yesterday. Their purpose is to guide users effortlessly toward the most relevant options, making the search process intuitive, engaging, and effective.
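The same pattern scales to product search. The sketch below is an illustration rather than r2decide's actual pipeline: each product is encoded twice (from its description and from its photo) into a shared space, the query is matched against both, and candidates are then re-ranked with a simple boost for items the user clicked before. The encoder stub, product names, and boost weight are all assumptions made for the example.

```python
# Illustrative sketch of multimodal product search with click-based
# re-ranking. The encoder is a stub and the re-ranking rule (a fixed
# boost for previously clicked items) is an assumption chosen to keep
# the example short.

import numpy as np

DIM = 64

def stub_encode(content: str, modality: str) -> np.ndarray:
    """Placeholder for a trained text/image encoder producing unit vectors."""
    rng = np.random.default_rng(abs(hash((modality, content))) % (2**32))
    v = rng.normal(size=DIM)
    return v / np.linalg.norm(v)

# Each product contributes two vectors to the shared space: one from its
# description and one from its photo, so "green" can match either.
products = {
    "emerald_pendant": {"text": "emerald pendant, 18k gold",
                        "image": "photo_emerald_pendant.jpg"},
    "silver_ring":     {"text": "classic silver ring",
                        "image": "photo_silver_ring.jpg"},
    "jade_bracelet":   {"text": "bracelet with jade stones",
                        "image": "photo_jade_bracelet.jpg"},
}

def search(query: str, clicked_before: set, k: int = 3):
    q = stub_encode(query, "text")
    results = []
    for name, item in products.items():
        # Best match across modalities: text vector or image vector.
        score = max(stub_encode(item["text"], "text") @ q,
                    stub_encode(item["image"], "image") @ q)
        # Simple behavioral re-ranking: boost items the user engaged with.
        if name in clicked_before:
            score += 0.1
        results.append((name, round(float(score), 3)))
    return sorted(results, key=lambda r: -r[1])[:k]

print(search("green", clicked_before={"jade_bracelet"}))
```

In a real system the behavioral re-ranking would be learned from search and click histories rather than applied as a fixed boost, but the flow (encode, match across modalities, re-rank) stays the same.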
Multimodality in 2025: Infinite Possibilities for Enterprises

Multimodality is transforming industries, from healthcare to eCommerce. And it doesn't stop there. Startups like TC Labs use multimodal AI to streamline engineering workflows, boosting efficiency and quality, while Toyota uses it for interactive, personalized customer assistance.

2025 will be the year multimodal AI transforms how enterprises work. Follow me here on Forbes, or on LinkedIn, for more of my 2025 AI predictions.