

Is your AI app pissing off users or going off-script? Raindrop emerges with AI-native observability platform to monitor performance


As enterprises increasingly look to build and deploy generative AI-powered applications and services for internal or external use (employees or customers), one of the toughest questions they face is understanding exactly how well these AI tools are performing out in the wild.
In fact, a recent survey by consulting firm McKinsey & Company found that only 27% of 830 respondents said their enterprises reviewed all of the outputs of their generative AI systems before they went out to users.
Unless a user actually writes in with a complaint, how is a company to know whether its AI product is behaving as expected?
Raindrop, formerly known as Dawn AI, is a new startup tackling the challenge head-on, positioning itself as the first observability platform purpose-built for AI in production, catching errors as they happen and explaining to enterprises what went wrong and why. The goal? Help solve generative AI’s so-called “black box problem.”
“AI products fail constantly—in ways both hilarious and terrifying,” co-founder Ben Hylak wrote recently on X. “Regular software throws exceptions. But AI products fail silently.”
Raindrop seeks to offer a category-defining tool akin to what observability company Sentry provides for traditional software. Traditional exception-tracking tools don’t capture the nuanced misbehaviors of large language models or AI companions; Raindrop aims to fill that gap.
“In traditional software, you have tools like Sentry and Datadog to tell you what’s going wrong in production,” Hylak told VentureBeat in a video call interview last week. “With AI, there was nothing.”
Until now — of course.
How Raindrop works
Raindrop offers a suite of tools that allow teams at enterprises large and small to detect, analyze, and respond to AI issues in real time.
The platform sits at the intersection of user interactions and model outputs, analyzing patterns across hundreds of millions of daily events while operating under SOC 2 controls with encryption enabled, protecting the data and privacy of both end users and the company offering the AI solution.
“Raindrop sits where the user is,” Hylak explained. “We analyze their messages, plus signals like thumbs up/down, build errors, or whether they deployed the output, to infer what’s actually going wrong.”
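Raindrop has not published its SDK, but conceptually, capturing those user-side signals looks something like the hypothetical Python sketch below. The client class, `track` method, and field names are illustrative assumptions, not the company’s actual interface.

```python
# Hypothetical sketch of logging user-side signals to an AI observability service.
# The client, method names, and fields below are illustrative, not Raindrop's real SDK.
import time
import uuid


class ObservabilityClient:
    """Minimal stand-in for an AI observability SDK client."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.buffer: list[dict] = []

    def track(self, event: dict) -> None:
        # A real SDK would batch these events and ship them to a backend.
        self.buffer.append(event)


client = ObservabilityClient(api_key="rd_test_key")  # placeholder key

client.track({
    "event_id": str(uuid.uuid4()),
    "timestamp": time.time(),
    "conversation_id": "conv_123",
    "user_message": "The generated SQL still references a table that doesn't exist.",
    "model_output": "SELECT * FROM orders_archive ...",
    # Implicit and explicit feedback signals the platform can learn from:
    "signals": {
        "thumbs": "down",          # explicit rating
        "build_error": True,       # the generated code failed to compile
        "output_deployed": False,  # the user never shipped the suggestion
    },
})
```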
Raindrop uses a machine learning pipeline that combines LLM-powered summarization with smaller bespoke classifiers optimized for scale.
Promotional screenshot of Raindrop’s dashboard. Credit: Raindrop.ai
“Our ML pipeline is one of the most complex I’ve seen,” Hylak said. “We use large LLMs for early processing, then train small, efficient models to run at scale on hundreds of millions of events daily.”
Customers can track indicators like user frustration, task failures, refusals, and memory lapses. Raindrop uses feedback signals such as thumbs-down ratings, user corrections, or follow-up behavior (like failed deployments) to identify issues.
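The article does not detail the pipeline’s internals, but the pattern Hylak describes, an expensive LLM labeling a sample of traffic and a small model distilled from those labels to score the full volume, can be roughly sketched as follows. The issue labels, placeholder LLM call, and scikit-learn classifier are assumptions for illustration only.

```python
# Rough illustration of the "large LLM labels a sample, small model runs at scale" pattern.
# Labels, prompts, and the choice of a linear classifier are assumptions, not Raindrop's design.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

ISSUE_LABELS = ["user_frustration", "task_failure", "refusal", "memory_lapse", "ok"]


def llm_label(message: str) -> str:
    """Placeholder for an expensive LLM call that assigns one of ISSUE_LABELS."""
    # In practice this would call a hosted LLM with a classification prompt.
    return "user_frustration" if "again" in message.lower() else "ok"


# Stage 1: label a small sample of traffic with the large model.
sample = [
    "You forgot my name again, we discussed this yesterday",
    "Thanks, that worked perfectly",
    "Please summarize this document for me",
]
labels = [llm_label(m) for m in sample]

# Stage 2: distill into a small, cheap classifier that can score
# hundreds of millions of events without calling the LLM each time.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(sample, labels)

print(clf.predict(["Why does it keep asking me the same question again?"]))
```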
Fellow Raindrop co-founder and CEO Zubin Singh Koticha told VentureBeat in the same interview that while many enterprises relied on evaluations, benchmarks, and unit tests for checking the reliability of their AI solutions, there was very little designed to check AI outputs during production.

“Imagine in traditional coding if you’re like, ‘Oh, my software passes ten unit tests. It’s great. It’s a robust piece of software.’ That’s obviously not how it works,” Koticha said. “It’s a similar problem we’re trying to solve here, where in production, there isn’t actually a lot that tells you: is it working extremely well? Is it broken or not? And that’s where we fit in.”

For enterprises in highly regulated industries or for those seeking additional levels of privacy and control, Raindrop offers Notify, a fully on-premises, privacy-first version of the platform aimed at enterprises with strict data handling requirements.
Unlike traditional LLM logging tools, Notify performs redaction both client-side via SDKs and server-side with semantic tools. It stores no persistent data and keeps all processing within the customer’s infrastructure.
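Notify’s redaction internals are not public, but client-side scrubbing of the kind described, stripping sensitive values before an event ever leaves the customer’s infrastructure, typically resembles the sketch below; the patterns and names are illustrative assumptions rather than Raindrop’s implementation.

```python
# Illustrative client-side redaction before an event leaves the customer's infrastructure.
# The patterns below are examples only; a production system would use broader PII detection.
import re

# Order matters: scrub specific tokens (keys) before looser numeric patterns (phone numbers).
REDACTION_PATTERNS = {
    "api_key": re.compile(r"\b(sk|rd)_[A-Za-z0-9]{16,}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace sensitive substrings with typed placeholders before transmission."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}_REDACTED]", text)
    return text


message = "Contact me at jane.doe@example.com, my key is sk_1234567890abcdef1234"
print(redact(message))
# -> "Contact me at [EMAIL_REDACTED], my key is [API_KEY_REDACTED]"
```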
Raindrop Notify provides daily usage summaries and surfaces high-signal issues directly within workplace tools like Slack and Teams, without the need for cloud logging or complex DevOps setups.
Advanced error identification and precision
Identifying errors, especially with AI models, is far from straightforward.
“What’s hard in this space is that every AI application is different,” said Hylak. “One customer might build a spreadsheet tool, another an alien companion. What ‘broken’ looks like varies wildly between them.” That variability is why Raindrop’s system adapts to each product individually.
Each AI product Raindrop monitors is treated as unique. The platform learns the shape of the data and behavior norms for each deployment, then builds a dynamic issue ontology that evolves over time.
“Raindrop learns the data patterns of each product,” Hylak explained. “It starts with a high-level ontology of common AI issues—things like laziness, memory lapses, or user frustration—and then adapts those to each app.”
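Raindrop has not published that ontology, but the adaptive structure Hylak describes, a shared set of base issue categories that each deployment specializes over time, could be modeled roughly along these lines; the category names and helper function are assumptions for illustration.

```python
# Illustrative model of a per-product issue ontology that starts from shared base
# categories and grows app-specific subcategories over time. Names are assumptions.
from dataclasses import dataclass, field


@dataclass
class IssueCategory:
    name: str
    description: str
    children: list["IssueCategory"] = field(default_factory=list)


# Shared starting point across all monitored products.
BASE_ONTOLOGY = [
    IssueCategory("laziness", "Model truncates or skips part of the requested task"),
    IssueCategory("memory_lapse", "Model forgets context established earlier in the session"),
    IssueCategory("user_frustration", "User signals annoyance, repeats themselves, or gives up"),
    IssueCategory("refusal", "Model declines a request it should be able to handle"),
]


def specialize(base: list[IssueCategory], app_specific: dict[str, IssueCategory]) -> list[IssueCategory]:
    """Attach app-specific subcategories under the matching base category."""
    for category in base:
        if category.name in app_specific:
            category.children.append(app_specific[category.name])
    return base


# Example: a coding-assistant deployment refines "memory_lapse" into a concrete failure mode.
coding_assistant_ontology = specialize(
    BASE_ONTOLOGY,
    {"memory_lapse": IssueCategory(
        "forgotten_variable",
        "Generated code references a variable defined earlier but since dropped",
    )},
)
```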
Whether it’s a coding assistant that forgets a variable, an AI alien companion that suddenly refers to itself as a human from the U.S., or even a chatbot that starts randomly bringing up claims of “white genocide” in South Africa, Raindrop aims to surface these issues with actionable context.
The notifications are designed to be lightweight and timely. Teams receive Slack or Microsoft Teams alerts when something unusual is detected, complete with suggestions on how to reproduce the problem.
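The exact alert format is not documented, but delivering such a notification through a standard Slack incoming webhook is straightforward; this minimal sketch uses a dummy webhook URL and placeholder payload fields, not Raindrop’s actual format.

```python
# Minimal sketch of pushing a detected-issue alert to Slack via a standard incoming webhook.
# The webhook URL and payload fields are placeholders, not Raindrop's actual alert format.
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder


def send_issue_alert(issue: dict) -> None:
    """Post a short, reproducible description of a detected issue to a Slack channel."""
    text = (
        f":rotating_light: *{issue['title']}* ({issue['count']} affected conversations)\n"
        f"Pattern: {issue['pattern']}\n"
        f"To reproduce: {issue['repro_hint']}"
    )
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)


# Example call (requires a real webhook URL):
# send_issue_alert({
#     "title": "Coding assistant forgets declared variables",
#     "count": 142,
#     "pattern": "Generated code references identifiers defined earlier in long sessions",
#     "repro_hint": "Sessions over ~40 turns where a variable is declared early and reused late",
# })
```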
Over time, this allows AI developers to fix bugs, refine prompts, or even identify systemic flaws in how their applications respond to users.
“We classify millions of messages a day to find issues like broken uploads or user complaints,” said Hylak. “It’s all about surfacing patterns strong and specific enough to warrant a notification.”
From Sidekick to Raindrop
The company’s origin story is rooted in hands-on experience. Hylak, who previously worked as a human interface designer on visionOS at Apple and as an avionics software engineer at SpaceX, began exploring AI after encountering GPT-3 in its early days back in 2020.
“As soon as I used GPT-3—just a simple text completion—it blew my mind,” he recalled. “I instantly thought, ‘This is going to change how people interact with technology.’”
Alongside fellow co-founders Koticha and Alexis Gauba, Hylak initially built Sidekick, a VS Code extension with hundreds of paying users.
But building Sidekick revealed a deeper problem: debugging AI products in production was nearly impossible with the tools available.
“We started by building AI products, not infrastructure,” Hylak explained. “But pretty quickly, we saw that to grow anything serious, we needed tooling to understand AI behavior—and that tooling didn’t exist.”
What started as an annoyance quickly evolved into the core focus. The team pivoted, building out tools to make sense of AI product behavior in real-world settings.
In the process, they discovered they weren’t alone. Many AI-native companies lacked visibility into what their users were actually experiencing and why things were breaking. With that, Raindrop was born.
Raindrop’s pricing, differentiation and flexibility have attracted a wide range of initial customers
Raindrop’s pricing is designed to accommodate teams of various sizes.
A Starter plan is available at $65/month, with metered usage pricing. The Pro tier, which includes custom topic tracking, semantic search, and on-prem features, starts at $350/month and requires direct engagement.
While observability tools are not new, most existing options were built before the rise of generative AI.
Raindrop sets itself apart by being AI-native from the ground up. “Raindrop is AI-native,” Hylak said. “Most observability tools were built for traditional software. They weren’t designed to handle the unpredictability and nuance of LLM behavior in the wild.”
This specificity has attracted a growing set of customers, including teams at Clay.com, Tolen, and New Computer.
Raindrop’s customers span a wide range of AI verticals—from code generation tools to immersive AI storytelling companions—each requiring different lenses on what “misbehavior” looks like.
Born from necessity
Raindrop’s rise illustrates how the tools for building AI need to evolve alongside the models themselves. As companies ship more AI-powered features, observability becomes essential—not just to measure performance, but to detect hidden failures before users escalate them.
In Hylak’s words, Raindrop is doing for AI what Sentry did for web apps—except the stakes now include hallucinations, refusals, and misaligned intent. With its rebrand and product expansion, Raindrop is betting that the next generation of software observability will be AI-first by design.
