An OpenAI open model shows how much the company (and AI) has changed in two years
www.fastcompany.com
Welcome to AI Decoded, Fast Company's weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week here.

OpenAI says it will release an open-source model. But why now?

OpenAI CEO Sam Altman said Monday that his company intends to release a powerful new open-weight language model with reasoning in the next few months. That would mark a major shift for a company that has kept its models proprietary and secret since 2019. The announcement wasn't a total surprise: After the groundbreaking Chinese open-source model DeepSeek-R1 showed up in January, Altman said during a Reddit AMA that he realized his company was on the wrong side of history and suggested an OpenAI open-source model was a real possibility.

Open models typically come with a permissive license that requires little or no payment to the model developer. Open-weight models can be more cost-effective for corporations trying to leverage AI, since they allow businesses to host (and secure) the models themselves, avoiding the often risky prospect of sending proprietary data through an API to a third-party provider and paying fees to do it. More businesses are moving in this direction, especially those holding sensitive user data in regulated industries.

The catch: A corporate user doesn't have to pay to use the open model. Some AI labs release open models to gain credibility in the market, potentially paving the way to eventually sell API access to their more powerful closed models. By releasing open models early on, the French AI company Mistral established itself as a top-tier AI lab and a legitimate alternative to U.S. players.
Some AI labs release open-source models, then earn consulting fees by helping large enterprises deploy and optimize the models over time.

Meta's Llama models are the most widely deployed open models, though the company restricts reuse and redistribution and keeps the training data and code secret, meaning they are not open source by definition. Meta had different reasons for giving away its models. Unlike Mistral and others, it makes money by surveilling users and targeting ads, not by renting out AI models. Zuckerberg continues funding Llama research because the models are a disruptive force in the industry and earn Meta the right to be called an AI company.

OpenAI now has its own reasons for releasing an open-weight model. Eighteen months ago, OpenAI was the undisputed champion of state-of-the-art AI models. But in the time since, the release of LLMs like Google's formidable Gemini 2.0 and DeepSeek's open-source R1 has cracked the competition wide open.

The market has changed, and OpenAI itself has evolved. Like Meta, OpenAI doesn't depend directly and solely on its models for its revenue. Selling access to its models via an API is no longer the company's main source of revenue. Now, most of its revenue, not to mention its staggering $300 billion valuation, comes from selling subscriptions to ChatGPT (most of them to individual consumers). OpenAI's real superpower is being a household-name consumer AI brand.

OpenAI will definitely continue pouring massive resources into developing ever-better models, but its main reason for doing so isn't to collect rent from developers for direct access to them; rather, it's to keep making ChatGPT smarter for consumers.

AI video generation is getting scary good

AI-video-generation tools are rapidly leaping over the uncanny valley, making it increasingly difficult for everyday internet users to distinguish between real and generated video.
This could bode well for smaller companies looking to produce glossy, creative, or ambitious ads at a fraction of the normal cost. But it could spell bad news if bad actors use the technology in phishing scams or to spread disinformation. It's also yet another threat to the film sector's livelihood.

The issue is back in the spotlight following several announcements, starting with Runway's release of its new Gen-4 video-generation system, which the company says produces "production ready" video. AI startup Runway says the new system of models understands much of the world's physics (a claim supported by its video of a man being overtaken by an ocean wave). The company also touts improvements in video consistency and realism, as well as user control during the generation process. Runway posted a demo video of Gen-4's control tools, which makes the production process look pretty easy, even for non-technical users. Some of the samples of finished videos posted on X look somehow more real than real (see Jean Baudrillard, Simulacra and Simulation).

Runway faces some stiff competition in the AI video space from perennial contenders including Google's Veo 2 model, OpenAI's Sora, Adobe Firefly, Pika, and Kling.

A new math benchmark aims to beat test-question contamination

People in the AI community have been debating for some time whether our current methods of testing models' math skills are broken. The concern is that while existing math benchmarks contain some very hard problems, those problems (and their solutions) tend to get published online pretty quickly. This, of course, makes the problem-solution sets fair game for AI companies sweeping up training data for their next models. The worry is that, come evaluation time, the models may have already encountered the test problems and answers in their training data.

A new benchmark called MathArena was designed to eliminate those issues.
MathArena takes its math problems from very recent math competitions and Olympiads, which have obvious incentives to keep their problems secret. The MathArena researchers also created their own standard method of administering the evaluation, meaning the AI model developers can't give their own models an edge via changes to the evaluation setup.

MathArena has just released the results of its most recent benchmark, which includes questions from the 2025 USA Math Olympiad. Here's one of the questions: "Let H be the orthocenter of the acute triangle ABC, let F be the foot of the altitude from C to AB, and let P be the reflection of H across BC. Suppose that the circumcircle of triangle AFP intersects line BC at two distinct points, X and Y. Prove that C is the midpoint of XY." Ouch. And to make matters worse, the test requires not only the correct answer but a description of each reasoning step the model took along the way.

The results are, well, ugly. Some of the most powerful and celebrated models in the world took the test, and none scored above 5%. The top score went to DeepSeek's R1 model, which earned a 4.76%. Google's Gemini 2.0 Flash Thinking model scored 4.17%. Anthropic's Claude 3.7 Sonnet (Thinking) scored 3.65%. OpenAI's most recent thinking model, o3-mini, scored 2.08%. (Update: Shortly after this writing, Google's new Gemini 2.5 Pro model scored an impressive 24.4% on the benchmark, besting other top models by a wide margin.)

The results suggest one of several possibilities: Maybe MathArena contains far harder questions than other benchmarks, or LLMs aren't great at explaining their reasoning steps, or earlier math benchmark scores are questionable because the LLMs had already seen the answers. Looks like LLMs still have some homework to do.

More AI coverage from Fast Company:

An AI watchdog accused OpenAI of using copyrighted books without permission
Amazon unveils Nova Act, an AI agent that can shop for you
What is AI thinking? Anthropic researchers are starting to figure it out
How Hebbia is building AI for in-depth research

Want exclusive reporting and trend analysis on technology, business innovation, future of work, and design? Sign up for Fast Company Premium.