• Instagram announces a blatant CapCut clone
    www.theverge.com
    Instagram head Adam Mosseri just announced a video editing app called Edits. Mosseri said the app is meant to rival CapCut, a video editing app that went offline along with TikTok. Edits is available for preorder on the iOS App Store. "There's a lot going on right now, but no matter what happens, it's our job to provide the best possible tools for creators," Mosseri said in a video posted to Instagram. He goes on to describe the app: "Edits is more than a video editing app; it's a full suite of creative tools. There will be a dedicated tab for inspiration, another for keeping track of early ideas, a much higher-quality camera (which I used to record this video), all the editing tools you'd expect, the ability to share drafts with friends and other creators, and, if you decide to share your videos on Instagram, powerful insights into how those videos perform." The insights he mentions include a live insights dashboard, a breakdown of follower and non-follower engagement, and metrics for how often users skip specific videos. According to its App Store listing, the app will also include editing tools for green screens and video overlays, both common features of TikTok videos. Below is a collection of screenshots Meta provided to The Verge via email. While Mosseri doesn't say as much in his video, the announcement feels like a clear push to get the app into people's minds while the future of TikTok and other ByteDance-owned apps like CapCut remains in question. Edits will be available starting March 13th, 2025.
  • OmniThink: A Cognitive Framework for Enhanced Long-Form Article Generation Through Iterative Reflection and Expansion
    www.marktechpost.com
    LLMs have made significant strides in automated writing, particularly in tasks like open-domain long-form generation and topic-specific reports. Many approaches rely on Retrieval-Augmented Generation (RAG) to incorporate external information into the writing process. However, these methods often fall short because of fixed retrieval strategies, which limit the depth, diversity, and utility of the generated content; this lack of nuanced and comprehensive exploration results in repetitive, shallow, and unoriginal outputs. While newer methods like STORM and Co-STORM broaden information collection through role-playing and multi-perspective retrieval, they remain confined by static knowledge boundaries and fail to leverage the full potential of LLMs for dynamic, context-aware retrieval. Unlike humans, who naturally reorganize and refine their cognitive frameworks through reflective practice, machine writing lacks such iterative processes. Reflection-based frameworks like OmniThink aim to address these shortcomings by letting models adjust their retrieval strategies and deepen their understanding of a topic dynamically. Recent research has highlighted the importance of integrating diverse perspectives and reasoning across multiple sources to generate high-quality outputs. While prior methods, such as multi-turn retrieval and roundtable simulations, have made progress in diversifying information sources, they often fail to adapt flexibly as the model's understanding evolves. Researchers from Zhejiang University, Tongyi Lab (Alibaba Group), and the Zhejiang Key Laboratory of Big Data Intelligent Computing introduced OmniThink, a machine-writing framework that mimics the human cognitive processes of iterative reflection and expansion. By emulating how learners progressively deepen their understanding, OmniThink dynamically adjusts its retrieval strategies to gather diverse, relevant information. This approach increases knowledge density while maintaining coherence and depth.
Evaluated on the WildSeek dataset using a new knowledge density metric, OmniThink demonstrated improved article quality. Human evaluations and expert feedback affirmed its potential for generating insightful, comprehensive long-form content, addressing key challenges in automated writing. Open-domain long-form generation entails creating detailed articles by retrieving and synthesizing information from open sources. Traditional methods involve two steps: retrieving topic-related data via search engines, then generating an outline before composing the article. However, issues like redundancy and low knowledge density persist. OmniThink addresses them by emulating human-like iterative expansion and reflection, building an information tree and a concept pool to organize relevant, diverse data. Through a three-step process (information acquisition, outline structuring, and article composition), OmniThink ensures logical coherence and rich content. It uses semantic similarity to retrieve relevant data and refines drafts to produce concise, high-density articles. OmniThink performs strongly when generating both articles and outlines, excelling in metrics like relevance, breadth, depth, and novelty, particularly when using GPT-4o. Its dynamic expansion and reflection mechanisms improve information diversity, knowledge density, and creativity, enabling deeper knowledge exploration. Its outline generation also shows better structural coherence and logical consistency, an improvement attributed to its Concept Pool design. Human evaluations confirm OmniThink's superior performance compared to baselines like Co-STORM, especially in breadth.
However, subtle improvements in novelty are less evident to human evaluators, highlighting the need for more refined evaluation methods to assess advanced model capabilities accurately. In conclusion, OmniThink is a machine writing framework that mimics human-like iterative expansion and reflection to produce well-structured, high-quality long-form articles. Unlike traditional retrieval-augmented generation methods, which often produce shallow, redundant, and unoriginal content, OmniThink improves knowledge density, coherence, and depth by progressively deepening its understanding of a topic, much as humans learn. Automatic and human evaluations support these gains, and because the approach is model-agnostic, it can be integrated with existing frameworks. Future work aims to incorporate methods that combine deeper reasoning, role-playing, and human-computer interaction to further address the challenges of generating informative and diverse long-form content. Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
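    The knowledge density metric rewards articles that pack more distinct information into fewer words; as we understand it, it is roughly the number of unique knowledge units divided by text length. The sketch below illustrates that idea under loud assumptions: the article has already been split into atomic knowledge units (a step not described here), whitespace tokens stand in for a real tokenizer, and deduplication is exact string matching after normalization. The function names are hypothetical, not OmniThink's code.

```python
# Hypothetical sketch of a knowledge-density style metric: unique knowledge units per token.
# Assumes the article has already been split into atomic knowledge units (not shown here).

def normalize(unit: str) -> str:
    """Lowercase, trim surrounding spaces/periods, and collapse internal whitespace."""
    return " ".join(unit.lower().strip(" .").split())


def knowledge_density(article: str, units: list[str]) -> float:
    """Return the number of distinct knowledge units divided by the article's token count."""
    tokens = article.split()  # whitespace tokens stand in for a real tokenizer
    if not tokens:
        return 0.0
    unique_units = {normalize(u) for u in units}  # exact-match dedup after normalization
    return len(unique_units) / len(tokens)


article = "The model retrieves sources. The model retrieves sources. Then it writes."
units = ["The model retrieves sources", "the model retrieves sources.", "Then it writes"]
print(knowledge_density(article, units))  # 2 unique units / 11 tokens
```

    The repeated sentence adds tokens but no new unit, so it lowers the score, which is the redundancy the metric is meant to penalize. A realistic version would likely need semantic rather than exact-match deduplication.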
  • Mastering Data Scaling: The Only Guide You'll Ever Need (Straight from My Journey)
    towardsai.net
    Author(s): Suraj Jha. Originally published on Towards AI, January 19, 2025. How I Finally Conquered Data Scaling: Learn from My Real-World Experience. Scaling is one of the fundamental steps in data preprocessing, and it becomes essential when a dataset will serve as input to machine learning models. Scaling transforms raw data into a form that machine learning models can work with effectively, which can improve model performance and makes comparisons between features meaningful. Let's see how scaling fits into data cleaning and preprocessing; it's easiest to understand with a real-world example. Imagine a dataset in which one feature is age (ranging from 0 to 60) and another is annual income (ranging from $10,000 to $5,000,000). Models trained with gradient descent struggle with features like these because large-scale features tend to dominate the optimization process, which ultimately skews the results. Scaling puts all features on comparable ranges so that the model treats them equally, which can improve its accuracy and convergence speed. No single scaling type solves every problem, so there are four types of scaling methods. … It is useful when we know the bounds (min and max values) of the data and want to draw the relationship between … (Read the full blog for free on Medium.)
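    The article's list of four scaling methods is cut off above, so the sketch below does not claim to reproduce it. It shows two widely used methods, min-max normalization and z-score standardization, applied to the age and income example in pure Python. The sample rows are hypothetical values chosen to fall inside the ranges the article gives.

```python
from statistics import mean, pstdev


def min_max_scale(values: list[float]) -> list[float]:
    """Rescale to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def standardize(values: list[float]) -> list[float]:
    """Z-score: (x - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical rows: age in years and annual income in dollars.
ages = [20.0, 35.0, 50.0]
incomes = [10_000.0, 250_000.0, 5_000_000.0]

print(min_max_scale(ages))  # [0.0, 0.5, 1.0]
print(min_max_scale(incomes))
print(standardize(ages))
```

    After scaling, both features live on comparable ranges, so neither dominates gradient updates. Note the trade-off: min-max scaling is sensitive to outliers such as the $5,000,000 earner, which squeezes the other incomes toward 0, while standardization centers each feature at 0 with unit variance instead of fixing its bounds.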
  • Meta announces new Edits app amidst TikTok and CapCut ban, launching next month
    9to5mac.com
    Today, Meta is announcing Edits, a new mobile video editing platform. This comes just hours after TikTok (and, as a result, CapCut) was banned in the US, though TikTok is currently being restored, at least temporarily. Meta has a history of sitting on new features until they're forced to move, and this is yet another example. In a surprise midday Sunday announcement, the company unveiled Edits, a mobile-first editing tool meant to serve as an alternative to the popular editing tool CapCut. Adam Mosseri, head of Instagram, announced the new Edits app today on Threads, describing it as a tool "for those of you who are passionate about making videos on your phone." He also stated that "there's a lot going on right now, but no matter what happens it's our job to provide the best possible tools for creators," likely in reference to the TikTok and CapCut ban. In terms of features, Edits is going to offer the following, per Mosseri's post:
      • A dedicated inspiration tab
      • Another tab to keep track of your ideas
      • A much higher quality camera for shooting videos
      • All the editing tools you'd expect
      • Ability to share drafts with friends and other creators
      • Easy export to Instagram, and powerful insights for Reels content
    Other than these details, we don't have much to go on. Hopefully, Meta will share more in the coming weeks. The company says it has been testing the app with creators recently. Edits is available for pre-order on the App Store starting today and will be available on iOS sometime in February, according to Meta. However, the App Store pre-order page lists the launch as March. No timeline is specified for an Android launch, other than that it'll come soon. Follow Michael: X/Twitter, Bluesky, Instagram
  • New Cybertruck Wrap Adds Solar Panels All Over It
    futurism.com
    Hear us out.
    Solar Wrap
    While Tesla CEO Elon Musk has promised for years to add a solar roof to the Cybertruck, the company has yet to actually offer such an option. In 2019, the year the brutalist pickup was first announced, Musk claimed that solar cells covering the truck's bed could add 15 miles of range per day. Extendable "fold out wings" could add even more, he said. But the concept appears to have been lost to time, or to the realities of engineering. Fortunately, as convention-goers spotted at this year's Consumer Electronics Show, a company called Sunflare is trying to turn the idea of an at least partially solar-powered Cybertruck into a reality. The company's Solar Car Film uses copper indium gallium selenide (CIGS) solar cells to effectively wrap the entire Cybertruck in power-generating panels. The YouTubers behind the channel "Electric Revolution" were told that the option will cost $10,000, including a 5 kW battery inverter. How much range such a wrap will add to the truck's battery is unclear. Under ideal conditions, it might add anywhere from 12 to 18 miles of range per day, according to Torque News' back-of-the-envelope (and likely generous) calculations. That's not immense, but the wrap, if it turns out to be an actual product and not just vaporware, could come in handy in a variety of scenarios, such as off-the-grid camping.
    Trickle Charge
    Other roof-based solar charging solutions, such as the panels installed on the roof of Toyota's Prius Prime, offer only a handful of miles of range per day, and that's only if the car sits in full sun for hours at a time. The California-based Sunflare has also been developing off-grid power solutions for campers and camper vans. But whether its solar wrap will manage to turn the Cybertruck into an energy-efficient long-range EV remains dubious.
The 6,600-pound stainless steel monstrosity has a below-average real-world range of around 230 miles on a single charge, putting it well below the competition. Though the concept is elegant, adding a handful of miles with the help of a solar panel isn't going to flip that equation on its head. At most, it could slow down the rate of "vampire drain," the gradual loss of charge while a battery sits unused, which some Cybertruck owners have been noticing. That doesn't mean solar panel accessories have no place in the automotive world. Particularly during more adventurous driving and overlanding, access to a steady source of free renewable energy can prove extremely useful.
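    Range estimates like the Torque News figure come from simple energy arithmetic: the energy the panels deliver to the battery in a day, divided by the energy the truck uses per mile. A minimal sketch of that calculation follows; every input (rated array output, peak-sun hours, a derating factor for real-world losses, and consumption per mile) is a hypothetical placeholder, not a Sunflare, Tesla, or Torque News figure.

```python
# Hypothetical back-of-the-envelope estimate of range added per day by on-vehicle solar.

def solar_miles_per_day(array_watts: float, peak_sun_hours: float,
                        derate: float, wh_per_mile: float) -> float:
    """Daily solar energy reaching the battery (Wh) divided by consumption (Wh per mile)."""
    # derate lumps together shading, heat, and conversion losses into one factor
    daily_wh = array_watts * peak_sun_hours * derate
    return daily_wh / wh_per_mile


# Placeholder inputs chosen for illustration only.
miles = solar_miles_per_day(array_watts=1_000, peak_sun_hours=5, derate=0.8, wh_per_mile=450)
print(f"{miles:.1f} miles/day")  # 4,000 Wh / 450 Wh per mile
```

    Because the result scales linearly with every input, halving the hours of full sun halves the miles gained, which is why such estimates depend so heavily on where the truck is parked.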
  • Best Coffee Makers for 2025
    www.cnet.com
    Start your day with the perfect cup of coffee, brewed fresh from one of these top coffee makers.
  • Never-before-seen Astro Bot level debuts at weekend PlayStation Tournament
    www.eurogamer.net
    Bot's not to love? News by Vikki Blake, Contributor. Published on Jan. 19, 2025. Team Asobi has casually dropped a never-before-seen Astro Bot speedrun stage. Debuted at this weekend's PlayStation tournament, the challenge saw speedrunners explore the unfamiliar level to secure the fastest clear time in a bid to win the final. The two finalists were given five attempts to set the fastest time across a devilishly difficult level, using the flower umbrella and slo-mo to speed up the run as much as possible. Team Circle's Rhys secured a win with an astonishing run-time of 34.51.6s, narrowly beating Team Square's equally impressive 34.84.1s. There was no word from the commentary team on whether the general public will eventually get the chance to run the gauntlet, too, but Bot fans are hopeful Team Asobi hasn't yet given up on patching in a cheeky cameo or two, given that several games noted in the credits have yet to be discovered. "I've built my entire working life around [video games], my career dedicated to celebrating the best and highlighting the worst," Tom O wrote in his feature, 'Astro Bot made the best play for nostalgia this year, and I don't care if you think it's a big advert'. "As a job, it's as tiring and exhausting, sometimes exasperating, as any other, and it's easy to lose sight of why I chose this path all those years ago. Just for a moment, in the darkened early hours, a dim light from the DualSense painting the room in an ethereal glow, I was 12 again, and it was incredible. For that, I love Astro Bot, and I guess I love PlayStation too."
  • Why Trump's new love of TikTok is dangerous
    www.salon.com
    Not too long ago, Donald Trump was a big fan of banning TikTok, the Chinese-owned social media app that went offline in the U.S. early Sunday under a controversial ban. On Friday, the Supreme Court upheld the law, passed by bipartisan majorities last April, largely due to concerns that the Chinese government used the platform to spy on Americans. President Joe Biden signed that law, but only four years after Trump, while still president, tried and failed to ban the app through executive order. TikTok allows "the Chinese Communist Party access to Americans' personal and proprietary information, potentially allowing China to track the locations of Federal employees and contractors, build dossiers of personal information for blackmail, and conduct corporate espionage," Trump said in the 2020 order. There's good reason to believe Trump's personal reasons weren't so noble. For one thing, he's racist against Chinese people and apparently believes COVID-19 was somehow their fault, instead of seeing them as the first victims of a mutated virus. However, while U.S. intelligence services are frustratingly tight-lipped about the specific evidence, both common sense and the testimony of more trustworthy politicians who have seen the intel, including Biden, suggest that the accusation of foreign spying is almost certainly true. Nor is this a "free speech" issue. The right to speak out, even online, has not changed. The government's authority here is to determine which foreign companies are allowed to operate within our borders, a nearly ironclad power. Trump, meanwhile, has changed his tune about TikTok, but not because he disbelieves the intelligence reports or because he is a free trade absolutist. (Hardly that, as his love of tariffs demonstrates.)
No, it's because he's learned in the past four years that TikTok is a shockingly efficient disseminator of disinformation, which is Trump's main stock-in-trade. "I'm now a big star on TikTok," he bragged in September, vowing to protect the site from being banned. He's also buddied up with TikTok chief executive Shou Chew, inviting him to join the murderers' row of tech billionaires attending the inauguration. "It's been a great platform for him and his campaign to get his America first message out," Mike Waltz, Trump's incoming national security adviser, said Thursday. "We will put measures in place to keep TikTok from going dark." Chew then took to TikTok to publicly credit Trump with working to save the platform. On Sunday, TikTok rewarded Trump for his support with blatant propaganda. The app went dark, as expected, but when users tried to open it, they got this message: "We are fortunate that President Trump has indicated that he will work with us on a solution to reinstate TikTok once he takes office." TikTok is good for Trump, and for one simple reason: It is a maelstrom of disinformation so gargantuan that even Elon Musk-controlled Twitter fails to compete. It's a train wreck of B.S., from people claiming sunscreen and vaccines don't work to bizarre videos claiming demons infect everything to old-fashioned authoritarian lies. The company claims to stand for "free speech," but the Chinese government censors information that doesn't serve its political goals. The algorithm is hidden from public view, but it's easy to see that it favors divisive, emotionally manipulative and misleading information. It ratchets up culture war tensions and stokes arguments while undermining people's ability to focus on developing solutions.
Hundreds of millions of people willingly plug into an app that feeds them the demoralizing propaganda authoritarians have been trying to shove down our throats forever. It's a fascist's dream. The surveillance aspect of TikTok has received more political and legal attention than its efficiency at hijacking people's thoughts and emotions. People don't like to hear they're being manipulated, especially when the manipulation is working. We all want to feel like we're properly skeptical and careful media consumers. Unfortunately, TikTok's algorithms expertly exploit that desire by pumping out videos that promise viewers the "real story" and information "they" don't want you to hear. Conspiracy theorists love someone who thinks they're a skeptic. But even when people are rational enough to reject the constant drumbeat of disinformation, there are signs the site is undermining them in subtle ways that are bad for their mental health and the larger body politic. For Slate on Thursday, Scaachi Koul wrote about her attachment to TikTok, describing it as an app that "burned hours of my life" and echoing the refrain popular with users, "All I do on this app is cry for strangers." I have to quote at length to do justice to what sounds frankly overwhelming, though she appears to mean it as praise for the site: Soldiers coming home from service, teenagers being gifted their first car, babies being named after a late uncle. Bleaker, more gut-wrenching videos had this comment, too: videos of orphaned children in Gaza with their arms or legs missing, bodies shaking with a fear they'll never lose. I read the same comment on TikToks featuring people who lost their homes in the Eaton fire, on that now-viral video of that L.A. resident finding his dog still alive in the rubble of his home.
It's on videos of people sitting alone at their birthday parties because no one came, clips of little kids doing their first somersaults, footage of an elderly woman returning to the house that she used to own, now bombed and decimated. I liked feeling like I could walk into a stranger's life and see them on the best day they ever had: a graduation, a birth, an engagement, successfully moving their ex-boyfriend out by throwing all his shit in the yard. Whatever a good day meant to these strangers, I got to witness a little piece of it, usually from the comfort of my own bed, late at night while I ran away from sleep. What was I hoping for in those moments? To borrow others' feelings to amplify my own. Koul defends an "algorithm [that] seemed to want to make me sob" for giving her "the brutality and the beauty of being a person in the world." From my more jaundiced view, however, the experience sounds more like an emotional roller coaster designed to sap constructive energy. That's a lot of people whose emotions she's digesting in 15-second bursts. Those emotions are detached from the context that gives our feelings deeper meaning. Having one long conversation with a good friend almost certainly grounds you more deeply in your humanity than mile-a-minute drive-bys of disassociated, ping-ponging emotions from strangers. What is all the feeling for, if you're too drained to do anything about it? I'm not the only skeptic of how the shallow manipulations of TikTok are dissuading people from having more meaningful, if slower-moving, experiences in the real world. In a long and disturbing Atlantic article about how Americans are spending more time alone than ever before in recorded history, Derek Thompson writes, "A popular trend on TikTok involves 20-somethings celebrating in creative ways when a friend cancels plans, often because they're too tired or anxious to leave the house."
While he sympathizes with the occasional need to chill at home, he also notes it's unsettling that it's such a wildly popular discourse. Apparently, a lot of folks feel seeing people in the real world is too taxing, and it's easier to redirect their urge for connection to an app that offers only an inch-deep simulacrum. This, too, is an authoritarian's dream: people who exhaust all their emotions on an endless hamster wheel of random strangers, while becoming further disconnected from investment in their real-world community. Koul writes, "I can't think of a better use of all that time" than weeping over people whose names she doesn't know. And not to be a fuddy-duddy, but I can think of many better uses, including using that desire to connect with people to motivate charity work, political organizing, or just throwing a dinner party. These connections give us energy and move us not just to cry, but to take action. I don't want to pick on Koul, who is a lovely person and clearly has a lot of empathy. That's why I'm so alarmed by TikTok. This isn't Twitter, which is awash in trolls responding to incentives that encourage antisocial emotions like bullying, and is losing users for it. TikTok manipulates people by exploiting their better selves and repurposing them to ugly ends. The algorithm feeds people endless videos to turn their emotions up high, exhausting their empathy, so they have less to offer those they can actually help. It appeals to people's desire to think for themselves by redirecting that urge to disinformation. Places like Twitter mobilize the worst people, but TikTok does something even more sinister. It demobilizes, distracts, and depresses those who want to do better. No wonder Trump loves it. By Amanda Marcotte. Amanda Marcotte is a senior politics writer at Salon and the author of "Troll Nation: How The Right Became Trump-Worshipping Monsters Set On Rat-F*cking Liberals, America, and Truth Itself."
Follow her on Bluesky @AmandaMarcotte and sign up for her biweekly politics newsletter, Standing Room Only.
  • Turns out the devs behind Crysis made its highest settings so hard to run because they were thinking about the PC you would own in 2010, not the one you were stuck with in 2007
    www.vg247.com
    You Know The Meme. Do I even need to ask the question? News by Oisin Kuhnke, Contributor. Published on Jan. 19, 2025. Crysis was such an infamously hard game to run at its max settings that it became a meme, but apparently you weren't even meant to do so at release. "Can it run Crysis?" That's probably a meme you saw everywhere in gaming spaces about a decade ago. In fact, it's so pervasive that the only thing I know about the game is that it was meant to be hard to run; I have no idea what the story even is. But with the hardware advances since its release almost two decades ago in 2007, the joke doesn't really work anymore, because most computers can, in fact, run Crysis now. As it turns out, that's by design: the developers intended the game to be future-proof, so that it would keep looking better as hardware improved. Speaking to PC Gamer as part of a larger retrospective on the game, Crysis director and Crytek founder Cevat Yerli discussed the meme and the game's legacy, saying that he wanted to "make sure Crysis does not age, that [it] is future proofed, meaning that if I played it three years from now, it should look better than today." The game's highest graphics settings were designed with the hardware you might own in 2010 and beyond in mind, even though people of course still tried to crank them up back in 2007. "A lot of people tried to maximize Crysis immediately. And I'm like, 'Oh, that's not why we built the Ultra mode, or Very High'." Luckily, as prevalent as the joke once was, Yerli never took it to heart, saying, "It was this ambivalent kind of meme that was good and bad, but I actually enjoyed it.
Last year, Jensen [Huang] for Nvidia announced a new GPU, and they said, 'Yes, and it can run Crysis.'" So if you played the game back in 2007 on medium settings, maybe you can finally find out whether your current rig can run Crysis (I imagine it can).
  • AI benchmarking organization criticized for waiting to disclose funding from OpenAI
    techcrunch.com
    An organization developing math benchmarks for AI didn't disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community. Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI's mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3. In a post on the forum LessWrong, a contractor for Epoch AI going by the username "Meemi" says that many contributors to the FrontierMath benchmark weren't informed of OpenAI's involvement until it was made public. "The communication about this has been non-transparent," Meemi wrote. "In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark." On social media, some users raised concerns that the secrecy could erode FrontierMath's reputation as an objective benchmark. In addition to backing FrontierMath, OpenAI had access to many of the problems and solutions in the benchmark, a fact Epoch AI didn't divulge prior to December 20, when o3 was announced. In a reply to Meemi's post, Tamay Besiroglu, associate director of Epoch AI and one of the organization's co-founders, asserted that the integrity of FrontierMath hadn't been compromised, but admitted that Epoch AI made a mistake in not being more transparent. "We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible," Besiroglu wrote. "Our mathematicians deserved to know who might have access to their work.
Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI." Besiroglu added that while OpenAI has access to FrontierMath, it has a verbal agreement with Epoch AI not to use FrontierMath's problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also has a separate holdout set that serves as an additional safeguard for independent verification of FrontierMath benchmark results, Besiroglu said. "OpenAI has been fully supportive of our decision to maintain a separate, unseen holdout set," Besiroglu wrote. However, muddying the waters, Epoch AI lead mathematician Elliot Glazer noted in a post on Reddit that Epoch AI hasn't been able to independently verify OpenAI's FrontierMath o3 results. "My personal opinion is that [OpenAI's] score is legit (i.e., they didn't train on the dataset), and that they have no incentive to lie about internal benchmarking performances," Glazer said. "However, we can't vouch for them until our independent evaluation is complete." The saga is yet another example of the challenge of developing empirical benchmarks to evaluate AI and securing the necessary resources for benchmark development without creating the perception of conflicts of interest.
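    A holdout set works as a safeguard because a model that was trained on, or tuned to, the public problems should score noticeably worse on problems it has never seen. The sketch below illustrates that comparison, assuming per-problem pass/fail results are available for both sets. The function names, the sample results, and the 10-point tolerance are all hypothetical; this is not Epoch AI's actual verification procedure.

```python
# Hypothetical sketch: compare a model's accuracy on public benchmark problems with its
# accuracy on an unseen holdout set. Not Epoch AI's actual verification procedure.

def accuracy(results: list[bool]) -> float:
    """Fraction of problems solved (0.0 for an empty list)."""
    return sum(results) / len(results) if results else 0.0


def contamination_gap(public: list[bool], holdout: list[bool]) -> float:
    """Public-set accuracy minus holdout accuracy; a large positive gap is a warning sign."""
    return accuracy(public) - accuracy(holdout)


def looks_consistent(public: list[bool], holdout: list[bool], tolerance: float = 0.10) -> bool:
    """True when the model does not do much better on problems it may have seen."""
    return contamination_gap(public, holdout) <= tolerance


# Hypothetical per-problem results (True = solved).
public_results = [True] * 9 + [False] * 1    # 90% on public problems
holdout_results = [True] * 5 + [False] * 5   # 50% on unseen problems
print(looks_consistent(public_results, holdout_results))  # False: 0.4 gap exceeds 0.10
```

    A fixed tolerance ignores sampling noise; with benchmarks of a few hundred problems, a real check would more plausibly use a statistical test on the two proportions.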