FUTURISM.COM
OpenAI's Most Advanced AI Release Stumped by New York Times Word Game
Connections claims another victim.

Awful General Intelligence

While OpenAI CEO Sam Altman claims that the company already has the building blocks for artificial general intelligence, its most advanced publicly available AI system was caught majorly lacking by a puzzle that countless people solve every day.

As Walter Bradley Center for Natural and Artificial Intelligence senior fellow Gary Smith writes for Mind Matters, OpenAI's o1 "reasoning" model failed spectacularly when tasked with solving the New York Times' notoriously tricky Connections word game.

The game's rules are deceptively simple. Players are given 16 terms and tasked with sorting them into four groups of four based on what they have in common. But because the themes relating them can be as obvious as "book subtitles" or as esoteric as "words that start with fire," the game can be quite challenging.

As Smith explained, he had o1 and other large language models (LLMs) from Google, Anthropic, and Microsoft (which is powered by OpenAI's tech) try to solve the Connections puzzle of the day.

Surprisingly (if you buy into AI hype, at least) they all failed. That's especially true of o1, which has been immensely hyped as the company's next-level system, but which apparently can't reason its way through an NYT word game.

Connect Four

When he fed that day's Connections challenge into the model, o1 did, to its credit, get some of the groupings right. But its other "purported combinations verge[d] on the bizarre," Smith found.

In one instance, o1 grouped the words "boot," "umbrella," "blanket," and "pant" and said the relating theme was "clothing or accessories."
Three out of four ain't bad, of course, but who's wearing a blanket, except as some sort of out-there fashion statement?

After doing the entire exercise over with the same set of words, the LLM confidently said that "breeze," "puff," "broad," and "picnic" were "types of movement or air." Points for the first two, but we're as puzzled as Smith by the latter two.

Overall, Smith rightfully assessed o1 as proffering "many puzzling groupings" alongside its "few valid connections."

It's also a telling demonstration of some familiar AI shortfalls: the technology can often impress when regurgitating information that's already well documented in its training data, but it frequently struggles with novel queries.

Our semi-professional take: if OpenAI is indeed reaching the precipice of AGI, or has already achieved the start of it, as one of its employees claimed at the end of last year, the company is clearly keeping it under wraps, because this simply ain't it.