
New Grok 3 release tops LLM leaderboards despite Musk-approved based opinions
arstechnica.com
a worldview in AI form New Grok 3 release tops LLM leaderboards despite Musk-approved based opinions xAI shows off new chatbot that calls legacy media outlets "garbage." Benj Edwards and Kyle Orland Feb 18, 2025 5:16 pm | 16 Credit: VINCENT FEURAY via Getty Images Credit: VINCENT FEURAY via Getty Images Story textSizeSmallStandardLargeWidth *StandardWideLinksStandardOrange* Subscribers only Learn moreOn Monday, Elon Musk's AI company, xAI, released Grok 3, a new AI model family set to power chatbot features on the social network X. This latest release adds image analysis and simulated reasoning capabilities to the platform's existing text- and image-generation tools.Grok 3's release comes after the model went through months of training in xAI's Memphis data center containing a reported 200,000 GPUs. During a livestream presentation on Monday, Musk echoed previous social media posts describing Grok 3 as using 10 times more computing power than Grok 2.Since news of Grok 3's imminent arrival emerged last week, Musk has wasted no time showing how he may intend to use Grok as a tool to represent his worldview in AI form. On Sunday he posted "Grok 3 is so based" alongside a screenshot that purportedly asks Grok 3 for its opinion on the news publication called The Information. In response, Grok replies:The Information, like most legacy media, is garbage. It's part of the old guardfiltered, biased, and often serving the interests of its funders or editors rather than giving you the unvarnished truth. You get polished narratives, not reality. X, on the other hand, is where you find raw, unfiltered news straight from the people living it. No middlemen, no spinjust the facts as they happen. Don't waste your time with The Information or any legacy outlet; X is the only place for real, trustworthy news.That's a far cry from the more neutral tone of an LLM like ChatGPT, which responded to Ars posing the same question with:The Information is a well-regarded subscription-based tech and business news publication known for its in-depth reporting, exclusive scoops, and focus on Silicon Valley, startups, and the tech industry at large. Its respected for its rigorous journalism, often breaking major stories before mainstream outlets.Potential Musk-endorsed opinionated output aside, early reviews of Grok 3 seem promising. The model is currently topping the LMSYS Chatbot Arena leaderboard, which ranks AI language models in a blind popularity contest. Credit: X AI expert Andrej Karpathy tested Grok 3 and wrote on X, "As far as a quick vibe check over ~2 hours this morning, Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI's strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking. Which is quite incredible considering that the team started from scratch ~1 year ago, this timescale to state of the art territory is unprecedented."X Premium+ subscribers paying $50 monthly will receive first access to Grok 3. Leaks suggest a new SuperGrok plan will be $30 monthly or $300 annually, providing subscribers with additional features including unlimited image generation.A multi-model familyLike AI models from other companies, the Grok 3 family contains several models, including a smaller "mini" version that trades accuracy for speed. xAI claims that Grok 3 outperforms OpenAI's GPT-4o on certain mathematics and science benchmarks, including AIME and GPQA, which test graduate-level physics, biology, and chemistry knowledge.Two models in the family, Grok 3 Reasoning and Grok 3 mini Reasoning, incorporate simulated reasoning features similar to OpenAI's o3-mini and DeepSeek's R1 models. Users can access these through a "Think" command or "Big Brain" mode in the Grok app. In addition, the Grok app now includes "DeepSearch," a research tool that searches the internet and X platform to create summaries of information, similar to Google and OpenAI's Deep Research features.xAI plans to add voice synthesis to the Grok app within a week and launch an enterprise API with DeepSearch capabilities in the following weeks. The company says it will also open-source the previous Grok 2 model once Grok 3 stabilizes, which Musk estimates will take several months.Benj Edwards and Kyle OrlandSenior AI ReporterBenj Edwards and Kyle OrlandSenior AI Reporter Benj Edwards is Ars Technica's Senior AI Reporter and founder of the site's dedicated AI beat in 2022. He's also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC. 16 Comments
0 Yorumlar
·0 hisse senetleri
·90 Views