Composo helps enterprises monitor how well AI apps work
techcrunch.com
AI and the large language models (LLMs) that power it have a ton of useful applications, but for all their promise, they're not very reliable. No one knows when this problem will be solved, so it makes sense that we're seeing startups find an opportunity in helping enterprises make sure the LLM-powered apps they're paying for work as intended.

London-based startup Composo feels it has a head start in trying to solve that problem, thanks to its custom models that can help enterprises evaluate the accuracy and quality of apps that are powered by LLMs.

The company's similar to Agenta, Freeplay, Humanloop and LangSmith, which all claim to offer a more solid, LLM-based alternative to human testing, checklists and existing observability tools. But Composo claims it's different because it offers both a no-code option and an API. That's notable because this widens the scope of its potential market: you don't have to be a developer to use it, and domain experts and executives can evaluate AI apps for inconsistencies, quality and accuracy themselves.

In practice, Composo combines a reward model trained on the output a person would prefer to see from an AI app with a defined set of criteria that are specific to that app, creating a system that essentially evaluates outputs from the app against those criteria. For instance, the client behind a medical triage chatbot can set custom guidelines to check for red-flag symptoms, and Composo can score how consistently the app does so.

The company recently launched a public API for Composo Align, a model for evaluating LLM applications on any criteria.

The strategy seems to be working somewhat: It has names like Accenture, Palantir and McKinsey in its customer base, and it recently raised $2 million in pre-seed funding. The small amount raised here is not uncommon for a startup in today's venture climate, but it is notable because this is AI Land, after all, where funding to such companies is abundant.
But according to Composo's co-founder and CEO, Sebastian Fox, the relatively low number is because the startup's approach is not particularly capital intensive.

"For the next three years at least, we don't foresee ourselves raising hundreds of millions because there's a lot of people building foundation models and doing so very effectively, and that's not our USP," Fox, a former McKinsey consultant, said. "Instead, each morning, if I wake up and see a news piece that OpenAI has made a huge advance in their models, that is good for my business."

With the fresh cash, Composo plans to expand its engineering team (led by co-founder and CTO Luke Markham, a former machine learning engineer at Graphcore), acquire more clients and bolster its R&D efforts. "The focus from this year is much more about scaling the technology that we now have across those companies," Fox said.

British AI pre-seed fund Twin Path Ventures led the round, which also saw participation from JVH Ventures and EWOR (the latter had backed the startup through its accelerator program). "Composo is addressing a critical bottleneck in the adoption of enterprise AI," a spokesperson for Twin Path said in a statement.

That bottleneck is a big problem for the overall AI movement, particularly in the enterprise segment, Fox said. "People are over the hype of excitement and are now thinking, 'Well, actually, does this really change anything about my business in its current form? Because it's not reliable enough, and it's not consistent enough. And even if it is, you can't prove to me how much it is,'" he said.

That bottleneck could make Composo more valuable to companies that want to implement AI but could incur reputational risk from doing so.
Fox says that's why his company chose to be industry agnostic while still resonating in the compliance, legal, health care and security spaces.

As for its competitive moat, Fox feels that the R&D required to get here is not trivial. "There's both the architecture of the model and the data that we've used to train it," he said, explaining that Composo Align was trained on a large dataset of expert evaluations.

There's still the question of what tech giants could do if they simply tapped their massive war chests to tackle this problem, but Composo thinks it has a first-mover advantage. "The other [thing] is the data that we accrue over time," Fox said, referring to how Composo has built evaluation preferences.

Because it assesses apps against a flexible set of criteria, Composo also sees itself as better suited to the rise of agentic AI than competitors that use a more constrained approach. "In my opinion, we are definitely not at the stage where agents work well, and that's actually what we're trying to help solve," Fox said.