• Alibaba's Qwen2.5-Max challenges U.S. tech giants, reshapes enterprise AI
    venturebeat.com
    Alibaba's Qwen2.5-Max AI model sets new performance benchmarks in enterprise-ready artificial intelligence, promising reduced infrastructure costs and improved efficiency for business applications.
  • Jack Dorsey is back with Goose, a new, ultra-simple, open-source AI agent-building platform from his startup Block
    venturebeat.com
    Goose's capabilities are enhanced through Block's collaboration with Anthropic, a leading AI safety and research company.
  • Square Enix condemns Final Fantasy 14 "stalking" mod and threatens legal action
    www.gamesindustry.biz
    Square Enix condemns Final Fantasy 14 "stalking" mod and threatens legal action. Producer and director Naoki Yoshida confirms it cannot be used to access address or payment information. News by Vikki Blake, Contributor. Published on Jan. 28, 2025.

    Final Fantasy 14 producer and director Naoki Yoshida has issued a statement to players condemning the "Playerscope" mod, which is accused of tracking players across characters without their consent. In the statement, Yoshida confirmed the existence of "third-party tools that are being used to check [Final Fantasy] 14 character information that is not displayed during normal game play," acknowledging the tool can be used to disclose a segment of a Final Fantasy 14 character's internal account ID, which is then used in an attempt to further correlate information on other characters on the same account. According to IGN, the only way to opt out of having your information scraped is to join the tool's private Discord channel.

    In response, Square Enix's development and operations teams have requested that the tool in question be removed and deleted from public mod repositories (which it seemingly has been, although it could still be shared privately), and the company is now pursuing legal action against the creator(s). "Aside from character information that can be checked in-game and on the Lodestone, we have received concerns that personal information registered on a user's Square Enix account, such as address and payment information, could also be exposed with this tool," Yoshida explained. "Please rest assured that it is not possible to access this information using these third-party tools. We strive to offer and maintain a safe environment for our players, which is why we ask everyone to refrain from using third-party tools. We also ask that players do not share information about third-party tools, such as details about their installation methods, or take any other actions to assist in their dissemination." Yoshida closed by reminding players that the use of third-party tools is prohibited by Final Fantasy 14's user agreement.
  • Staff take to LinkedIn to reveal "mass layoffs" at Moon Active
    www.gamesindustry.biz
    Staff take to LinkedIn to reveal "mass layoffs" at Moon Active. Redundancies primarily affect the Lithuanian office, although staff from Poland, Armenia, Ukraine, Türkiye, and Israel were also impacted. Image credit: Moon Active Games. News by Vikki Blake, Contributor. Published on Jan. 28, 2025.

    Moon Active has reportedly let go of at least 20 employees, primarily from its Lithuanian office, although staff from Poland, Armenia, Ukraine, Türkiye, and Israel have also been impacted. The Lithuanian team had once worked for Melsoft before Moon Active acquired it in 2020. While the company has yet to formally confirm the layoffs, App2Top reports that dozens of former staff "simultaneously" announced they were looking for work on LinkedIn, and a Glassdoor whistleblower revealed "it was nice to work there for five years until [...] mass layoffs of [sic] January 2025." In November 2021, Coin Master developer Moon Active raised $300 million in funding.

    The swath of job cuts from last year seems to be continuing in 2025. In January alone, we've reported on the closures of Freejam, Splash Damage, Piranha Games, and Jar of Sparks, as well as 185 jobs cut by Ubisoft. Earlier today, we reported that Fast Travel Games had cut 30 jobs, primarily in its publishing, marketing, and admin departments, and that Phoenix Labs laid off the "majority" of its workforce.
  • Astro Bot and Black Myth: Wukong take the lead in GDCA 2025 nominations
    www.gamedeveloper.com
    We've reached the 25th year of the Game Developers Choice Awards, and we're celebrating with our incredible group of finalists, featuring a robotic adventure and Chinese mythology. Organizers of the Game Developers Conference have unveiled this year's finalists for the 25th annual Game Developers Choice Awards, the leading video game awards that are nominated by, voted on, and decided by video game developers within the industry. The ceremony takes place at GDC 2025, the premier industry event for game developers and professionals.

    This year's nominations are led by Astro Bot (Team ASOBI / Sony Interactive Entertainment) and Black Myth: Wukong (Game Science), receiving seven nominations each. Both are finalists for the highly coveted Game of the Year award, alongside Balatro (LocalThunk / Playstack), Helldivers 2 (Arrowhead Game Studios / PlayStation Publishing LLC), Final Fantasy VII Rebirth (Square Enix), and Metaphor: ReFantazio (ATLUS / SEGA / Studio Zero).

    Astro Bot, developed by Team ASOBI and published by Sony Interactive Entertainment, is a 3D platforming title that was released alongside PlayStation's 30th anniversary. A love letter to the many iconic characters and worlds from PlayStation's wide roster of franchises, it has been praised by fans and critics alike for its unique and energetic level design, incredible soundtrack, smart integration with the PlayStation 5's hardware, and charming aesthetics. Black Myth: Wukong, developed and published by Game Science, is an action RPG rooted in Chinese mythology and inspired by the literary classic Journey to the West. The game features skill-based combat and the challenging gameplay that fans of the soulslike subgenre have come to appreciate.

    Other notable finalists include Metroidvania title Animal Well (Billy Basso / Bigmode) with five nominations and two honorable mentions, poker-inspired roguelike deck builder Balatro (LocalThunk / Playstack) with four nominations and two honorable mentions, and IGF 2024 finalist 1000xRESIST (Sunset Visitor / Fellow Traveller) with three nominations. In addition to the categories listed below, the GDC audience is encouraged to cast votes for their favorite game of the year with the Audience Award.

    The Game Developers Choice Awards is part of GDC 2025, which will be held in San Francisco March 17-21. The ceremony takes place on Wednesday, March 19, right after the 2025 Independent Games Festival Awards, which starts at 6:30pm PT. Here's this year's list of finalists and honorable mentions:

    Best Audio: Animal Well (Billy Basso / Bigmode); Astro Bot (Team ASOBI / Sony Interactive Entertainment); Black Myth: Wukong (Game Science); Final Fantasy VII Rebirth (Square Enix); Senua's Saga: Hellblade II (Ninja Theory / Xbox Game Studios). Honorable mentions: Balatro (LocalThunk / Playstack), Lorelei and the Laser Eyes (Simogo / Annapurna Interactive), Metaphor: ReFantazio (ATLUS / SEGA / Studio Zero), Neva (Nomada Studio / Devolver Digital), Silent Hill 2 (Bloober Team SA / KONAMI).

    Best Debut: 1000xRESIST (Sunset Visitor / Fellow Traveller); Animal Well (Billy Basso / Bigmode); Balatro (LocalThunk / Playstack); Pacific Drive (Ironwood Studios / Kepler Interactive); Tiny Glade (Pounce Light). Honorable mentions: Manor Lords (Slavic Magic / Hooded Horse), Mullet Madjack (HAMMER95 / Epopeia Games), The Plucky Squire (All Possible Futures / Devolver Digital), Tiny Glade (Pounce Light).

    Best Design: Animal Well (Billy Basso / Bigmode); Astro Bot (Team ASOBI / Sony Interactive Entertainment); Balatro (LocalThunk / Playstack); Black Myth: Wukong (Game Science); Lorelei and the Laser Eyes (Simogo / Annapurna Interactive). Honorable mentions: Final Fantasy VII Rebirth (Square Enix), Helldivers 2 (Arrowhead Game Studios / PlayStation Publishing LLC), Satisfactory (Coffee Stain Studios / Coffee Stain Publishing), The Legend of Zelda: Echoes of Wisdom (Grezzo, Nintendo Entertainment Planning & Development / Nintendo), UFO 50 (Mossmouth).

    Innovation Award: Animal Well (Billy Basso / Bigmode); Astro Bot (Team ASOBI / Sony Interactive Entertainment); Balatro (LocalThunk / Playstack); Black Myth: Wukong (Game Science); UFO 50 (Mossmouth). Honorable mentions: Helldivers 2 (Arrowhead Game Studios / PlayStation Publishing LLC), Lorelei and the Laser Eyes (Simogo / Annapurna Interactive), Thank Goodness You're Here! (Coal Supper / Panic), The Plucky Squire (All Possible Futures / Devolver Digital), Tiny Glade (Pounce Light).

    Best Narrative: 1000xRESIST (Sunset Visitor / Fellow Traveller); Black Myth: Wukong (Game Science); Like a Dragon: Infinite Wealth (Ryu Ga Gotoku Studio / SEGA); Metaphor: ReFantazio (ATLUS / SEGA / Studio Zero); Mouthwashing (Wrong Organ / Critical Reflex). Honorable mentions: Final Fantasy VII Rebirth (Square Enix), Frostpunk 2 (11 bit studios), Life is Strange: Double Exposure (Deck Nine Games / Square Enix), Neva (Nomada Studio / Devolver Digital), Senua's Saga: Hellblade II (Ninja Theory / Xbox Game Studios).

    Best Technology: Astro Bot (Team ASOBI / Sony Interactive Entertainment); Black Myth: Wukong (Game Science); Helldivers 2 (Arrowhead Game Studios / PlayStation Publishing LLC); Senua's Saga: Hellblade II (Ninja Theory / Xbox Game Studios); Tiny Glade (Pounce Light). Honorable mentions: Animal Well (Billy Basso / Bigmode), Call of Duty: Black Ops 6 (Treyarch, Raven Software, Beenox, High Moon Studios, Activision Shanghai, Sledgehammer Games, Infinity Ward, Demonware / Activision), Dragon Age: The Veilguard (BioWare / Electronic Arts), Satisfactory (Coffee Stain Studios / Coffee Stain Publishing), Tekken 8 (Bandai Namco Studios Inc / Bandai Namco Entertainment).

    Social Impact: 1000xRESIST (Sunset Visitor / Fellow Traveller); Astro Bot (Team ASOBI / Sony Interactive Entertainment); Frostpunk 2 (11 bit studios); Life is Strange: Double Exposure (Deck Nine Games / Square Enix); Neva (Nomada Studio / Devolver Digital). Honorable mentions: Closer the Distance (Osmotic Studios / Skybound Games), Distant Bloom (Ember Trail / Kinda Brave), Dragon Age: The Veilguard (BioWare / Electronic Arts), Tales of Kenzera: Zau (Surgent Studios / Electronic Arts).

    Game of the Year: Astro Bot (Team ASOBI / Sony Interactive Entertainment); Balatro (LocalThunk / Playstack); Black Myth: Wukong (Game Science); Final Fantasy VII Rebirth (Square Enix); Helldivers 2 (Arrowhead Game Studios / PlayStation Publishing LLC); Metaphor: ReFantazio (ATLUS / SEGA / Studio Zero). Honorable mentions: Animal Well (Billy Basso / Bigmode), Like a Dragon: Infinite Wealth (Ryu Ga Gotoku Studio / SEGA), Satisfactory (Coffee Stain Studios / Coffee Stain Publishing), UFO 50 (Mossmouth).

    Any video game released and made publicly available during the 2024 calendar year was eligible for free nomination for the 2025 Game Developers Choice Awards. All nominees and winners are selected by the awards' International Choice Awards Network (ICAN), an invitation-only organization composed of leading game creators from all parts of the industry. Winners will be announced at the GDCA ceremony on Wednesday, March 19, with a simultaneous broadcast on GDC Twitch. Both the GDCA and IGF ceremonies are available to watch for all GDC 2025 passholders. GDC and Game Developer are affiliated organizations under Informa.
  • Climate change made the Los Angeles wildfires more likely
    www.theverge.com
    Climate change helped set the stage for the devastating Los Angeles fires this month, a new study by 32 researchers shows. The Palisades and Eaton wildfires broke out in early January and killed at least 28 people, destroying 16,000 structures. Hot, dry conditions and extraordinarily powerful winds fanned the flames. Those conditions were made about 35 percent more likely by greenhouse gas emissions from fossil fuels warming the planet, according to the study, and fire risk will only grow unless the pollution causing climate change stops. "Realistically, this was a perfect storm when it comes to conditions for fire disasters," John Abatzoglou, professor of climatology at the University of California, Merced, said in a press call today.

    In today's climate, the extreme weather that drove January's infernos can be expected about every 17 years, according to the study. The study was conducted by the World Weather Attribution initiative, an international collaboration of scientists that researches the role climate change plays in disasters around the world. The researchers compare historical weather data and climate models against what likely would have happened if the planet weren't 1.3 degrees Celsius warmer today, on average, than it was before the Industrial Revolution.

    If the planet warms by another 1.3 degrees Celsius, which could happen within 75 years under current policies, the kind of weather that exacerbated the fires this month becomes another 35 percent more probable. The length of the dry season in the region has already grown by about 23 days, according to the researchers, which increases the chances of arid weather coinciding with the powerful Santa Ana winds that typically pick up in cooler months. While those winds return each year, they were catastrophically strong this month, reaching hurricane strength at upwards of 100 miles per hour. For now, scientists don't have enough research to know how climate change affected the Santa Ana winds specifically. Their research only shows that fire season is encroaching further into windy season because of climate change, and that made these fires more likely.
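The study's "every 17 years" return period and 35 percent likelihood shifts combine as simple probability arithmetic. The sketch below only restates the numbers quoted above; the derived annual probabilities are basic arithmetic, not figures from the study itself.

```python
# Restating the study's figures as return-period arithmetic.
# Inputs are the numbers quoted in the article; the rest is basic probability.

return_period_years = 17                     # today's climate: such conditions ~every 17 years
annual_prob_today = 1 / return_period_years  # ~0.059 per year

# Conditions were made ~35% more likely by warming to date, so the
# no-warming counterfactual probability is today's divided by 1.35.
annual_prob_no_warming = annual_prob_today / 1.35

# Another 1.3 degrees C of warming would make them a further 35% more probable.
annual_prob_plus_warming = annual_prob_today * 1.35

print(f"{annual_prob_today:.1%} per year today")          # 5.9% per year today
print(f"{annual_prob_plus_warming:.1%} with more warming")  # 7.9% with more warming
```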
  • ByteDance Introduces UI-TARS: A Native GUI Agent Model that Integrates Perception, Action, Reasoning, and Memory into a Scalable and Adaptive Framework
    www.marktechpost.com
    GUI agents seek to perform real tasks in digital environments by understanding and interacting with graphical interfaces such as buttons and text boxes. The biggest open challenges lie in enabling agents to process complex, evolving interfaces, plan effective actions, and execute precision tasks such as finding clickable areas or filling in text boxes. These agents also need memory systems to recall past actions and adapt to new scenarios. One significant problem facing modern, unified end-to-end models is the absence of integrated perception, reasoning, and action within seamless workflows, backed by high-quality data covering that breadth of interaction. Lacking such data, these systems struggle to adapt to diverse, dynamic environments and to scale.

    Current approaches to GUI agents are mostly rule-based and heavily dependent on predefined rules, frameworks, and human involvement, which makes them neither flexible nor scalable. Rule-based agents, like Robotic Process Automation (RPA), operate in structured environments using human-defined heuristics and require direct access to systems, making them unsuitable for dynamic or restricted interfaces. Framework-based agents use foundation models like GPT-4 for multi-step reasoning but still depend on manual workflows, prompts, and external scripts. These methods are fragile, need constant updates for evolving tasks, and lack seamless integration of learning from real-world interactions. Native agent models try to bring perception, reasoning, memory, and action together under one roof, reducing human engineering through end-to-end learning. Still, these models rely on curated data and training guidance, which limits their adaptability: they cannot learn autonomously, adapt efficiently, or handle unpredictable scenarios without manual intervention.

    To address these challenges, researchers from ByteDance Seed and Tsinghua University proposed the UI-TARS framework to boost native GUI agent models. It integrates enhanced perception, unified action modeling, advanced reasoning, and iterative training, which helps reduce human intervention while improving generalization. It enables detailed understanding with precise captioning of interface elements using a large dataset of GUI screenshots, introduces a unified action space to standardize platform interactions, and utilizes extensive action traces to enhance multi-step execution. The framework also incorporates System-2 reasoning for deliberate decision-making and iteratively refines its capabilities through online interaction traces.

    The researchers designed the framework around several key principles. Enhanced perception ensures GUI elements are recognized accurately, using curated datasets for tasks such as element description and dense captioning. Unified action modeling links element descriptions with spatial coordinates to achieve precise grounding. System-2 reasoning incorporates diverse logical patterns and explicit thought processes to guide deliberate actions. Iterative training enables dynamic data gathering, interaction refinement, error identification, and adaptation through reflection tuning, for robust and scalable learning with less human involvement.

    The researchers tested UI-TARS, trained on a corpus of about 50B tokens, along various axes including perception, grounding, and agent capabilities. The model was developed in three variants, UI-TARS-2B, UI-TARS-7B, and UI-TARS-72B, with extensive experiments validating their advantages. Compared to baselines like GPT-4o and Claude-3.5, UI-TARS performed better on benchmarks measuring perception, such as VisualWebBench and WebSRC. UI-TARS outperformed models like UGround-V1-7B in grounding across multiple datasets, demonstrating robust capabilities in high-complexity scenarios. On agent tasks, UI-TARS excelled in Multimodal Mind2Web and Android Control, and in environments like OSWorld and AndroidWorld. The results highlighted the importance of System-1 and System-2 reasoning, with System-2 reasoning proving beneficial in diverse, real-world scenarios, although it required multiple candidate outputs for optimal performance. Scaling the model size improved reasoning and decision-making, particularly in online tasks.

    In conclusion, UI-TARS advances GUI automation by integrating enhanced perception, unified action modeling, System-2 reasoning, and iterative training. It achieves state-of-the-art performance, surpassing previous systems like Claude and GPT-4o, and effectively handles complex GUI tasks with minimal human oversight. This work establishes a strong baseline for future research, particularly in active and lifelong learning, where agents can autonomously improve through continuous real-world interactions. Check out the paper for full details; all credit for this research goes to the researchers of the project.
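The perceive-reason-act loop with memory that native GUI agent models unify can be sketched in a few lines. Everything below is illustrative: the class and method names are hypothetical stand-ins for learned components, not UI-TARS's actual interfaces, which are described only in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str                 # e.g. "click" or "type"
    target: tuple             # grounded screen coordinates (x, y)

@dataclass
class GuiAgent:
    memory: list = field(default_factory=list)  # past actions; in UI-TARS-style
                                                # training, such traces feed later
                                                # iterative refinement

    def perceive(self, screenshot: dict) -> list:
        # Stand-in for perception: return labeled elements with coordinates.
        return screenshot["elements"]

    def reason(self, goal: str, elements: list) -> Action:
        # Stand-in for deliberate reasoning: ground the goal to an element,
        # linking its description to spatial coordinates.
        for el in elements:
            if goal in el["label"]:
                return Action("click", el["center"])
        raise ValueError("no matching element")

    def act(self, goal: str, screenshot: dict) -> Action:
        elements = self.perceive(screenshot)
        action = self.reason(goal, elements)
        self.memory.append(action)
        return action

screen = {"elements": [{"label": "Submit", "center": (320, 480)}]}
agent = GuiAgent()
a = agent.act("Submit", screen)
print(a.kind, a.target)  # click (320, 480)
```

The real systems replace each stand-in with a learned model; the point here is only the unified loop and the recorded action trace.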
  • TAI #137: DeepSeek r1 Ignites Debate: Efficiency vs. Scale and China vs. US in the AI Race
    towardsai.net
    Author(s): Towards AI Editorial Team Originally published on Towards AI. What happened this week in AI by LouieThis weeks AI discourse centered on DeepSeeks r1 release, which sparked a heated debate about its implications for OpenAI, GPUs, and the broader industry. Meanwhile, Google quietly rolled out an improved version of its own reasoning model Gemini Flash 2.0 Thinking, improving its AIME benchmark score to 73.3% (from ~64% in December). OpenAIs announcement of its planned $500B Stargate data center project a collaboration with SoftBank and Oracle painted a contrasting picture: while DeepSeek refined efficiency, OpenAI appears to be doubling down on scale.We have often covered Deepseeks model releases and technical innovations over the past year, and last week; I outlined r1s reinforcement learning (RL)-driven reasoning, 30x lower API costs than OpenAIs o1, and successful distillation into smaller models. This week, Deepseeks models went viral, its chatbot leaped to the top of the app store, and reactions oscillated between OpenAI is obsolete and DeepSeeks training costs are faked. In particular, many people cottoned on to Deepseeks impressive training costs (just $5.6m direct compute cost for v3 for the final model run announced in December) and lower inference prices for r1 vs OpenAI o1. This led many to question whether huge $bn training clusters are still needed and whether the US has lost its AI lead.We think much of the sudden reaction is overblown, and its entertaining that the r1 price reduction hits the media while the invention and consequences of reasoning models themselves have still gone largely unreported. Deepseek has a very impressive research team with a productive culture and structure (including high vertical integration and fewer silos between teams), but we think the US still has more leading AI researchers and companies. 
The main difference is that the best AI researchers in the US work for companies that are not GPU poor, and expertise has been prioritized to scale quickly over first principles-led improvements and tweaks to LLM architectures and methods. In China, the best researchers instead flocked to Deepseek, which is still GPU-poor (relatively speaking, due to sanctions) and has been focusing on finding next-generation methods to improve training and inference efficiency. Gains from their 10+ breakthroughs over the last 2 years (all publicly shared, many of them already included in the v2 release over 8 months ago) added up to what looks like a cost-efficiency advantage vs. US labs. However, OpenAI and Anthropic reportedly have a 70%+ gross margin, while Deepseek CEO said they price close to cost, so v3 is not 10x more efficient than 4o and r1, not 30x vs. o1 as prices would imply. Nevertheless, it is still significant that such a capable model is now made available open-source and that it could be trained and served at such an affordable price.Why should you care?We think its great to see new LLM techniques, efficiency gains, and such a strong open-source reasoning model. Hopefully, the release will pressure OpenAI to also show its o1 reasoning tokens, reduce prices, and release the much stronger o3! We also see huge potential for the open source community to build on top of these models and, in particular, in reinforcement fine-tuning these models for new domains.However, we dont think this is the end of building larger training clusters. Scaling laws still hold and all else equal, the more compute we put in, the more capable models we get out. Algorithmic and technique efficiency gains on top of this just means we get more out of our GPU clusters. It doesnt mean we wont get even more from larger training runs. More compute still stacks capability on top of all other improvements so there is no loss of incentive to have the biggest cluster. 
It is no surprise to see OpenAI hoping to push towards their $500bn Stargate data center plan! The main news over the past 4 months is just that now we also have new test-time compute scaling laws, which is yet another vector to scale both during training via the RL process and synthetic data generation and at inference time.Lost somewhat in the noise of the pricing a potentially much more significant aspect of r1 we noted this week is that despite being trained via rewards for solving math and LeetCode problems, it has also demonstrated significant improvements in creative writing. The r1 model now tops the eqbench leaderboard for Creative Writing with large gains over V3. We have also heard from many people who are finding Gemini Flash-Thinking 2.0 better than Gemini Pro 2.0 for creative writing tasks. This raises the question of just how far the generalization potential of this new paradigm of reasoning LLMs can take us.Hottest News1. OpenAI Launches Operator, an AI Agent That Performs Tasks AutonomouslyOpenAI launched a research preview of Operator, a general-purpose AI agent that can take control of a web browser and independently perform specific actions. Operator is powered by a new model called a Computer-Using Agent (CUA). Combining GPT-4os vision capabilities with advanced reasoning through reinforcement learning, CUA is trained to interact with the buttons, menus, and text fields people see on a screen.2. Google Releases Update to Gemini 2.0 Flash Thinking ModelGoogle quietly released another update to its own reasoning model Gemini 2.0 Flash Thinking first released in late December. Similar to Deepseek R1, and unlike OpenAI o1, the Flash Thinking model displays its reasoning process. The model is currently available for free while in its experimentation stage. The model climbed to a score of 73.3% on AIME (vs. ~64% in December) and 74.2% on the GPQA Diamond science benchmark (vs. ~66% in December and 58.6% for the non-reasoning Flash 2.0 model).3. 
Anthropic Introduces Citations To Reduce AI ErrorsAnthropic unveiled a new feature for its developer API called Citations. This feature lets Claude ground its answers in source documents. It provides detailed references to the exact sentences and passages used to generate responses, leading to more verifiable, trustworthy outputs. Citations are available only for Claude 3.5 Sonnet and Claude 3.5 Haiku. Additionally, Citations may incur charges depending on the length and number of the source documents.4. Hugging Face Shrinks AI Vision Model SmolVLM to Phone-Friendly SizeHugging Face introduced vision-language models that run on devices as small as smartphones while outperforming their predecessors, which required massive data centers. The companys new SmolVLM-256M model, requiring less than one gigabyte of GPU memory, surpasses the performance of its Idefics 80B model from just 17 months ago a system 300 times larger.5. OpenAI Teams Up With SoftBank and Oracle on $500B Data Center ProjectOpenAI announced it is teaming up with Japanese conglomerate SoftBank and with Oracle, among others, to build multiple data centers for AI in the U.S. The joint venture, the Stargate Project, intends to invest $500 billion over the next four years to build new AI infrastructure for OpenAI in the United States.6. Google Invests Further $1Bn in OpenAI Rival AnthropicGoogle is reportedly investing over $1 billion in Anthropic. This new investment is separate from the companys earlier reported funding round of nearly $2 billion earlier this month, led by Lightspeed Venture Partners, to bump the companys valuation to about $60 billion.Five 5-minute reads/videos to keep you learning1. Building Effective AgentsThis post combines everything Anthropic has learned from working with customers and building agents. It also shares practical advice for developers on building effective agents. The post covers when and how to use agents, the workflow, and more.2. 
Inside DeepSeek-R1: The Amazing Model that Matches GPT-o1 on Reasoning at a Fraction of the CostOne dominant reasoning thesis is that big models are necessary to achieve reasoning. DeepSeek-R1 challenges that thesis by matching the performance of GPT-o1 at a fraction of the compute cost. This article explores the technical details of the DeepSeek-R1 architecture and training process, highlighting key innovations and contributions.3. Agents Are All You Need vs. Agents Are Not Enough: A Dueling Perspective on AIs FutureThe rapid evolution of AI has sparked a compelling debate: Are autonomous agents sufficient to tackle complex tasks, or do they require integration within broader ecosystems to achieve optimal performance? As industry leaders and researchers share insights, the divide between these perspectives has grown more pronounced. This article presents arguments for both sides and provides a middle ground.4. 10 FAQs on AI Agents: Decoding Googles Whitepaper in Simple TermsThe future of AI agents holds exciting advances, and weve only scratched the surface of what is possible. This article explores AI agents by diving into Googles Agents whitepaper and addressing the ten most common questions about them.5. Image Segmentation Made Easy: A Guide to Ilastik and EasIlastik for Non-ExpertsTools like Ilastik and EasIlastik empower users to perform sophisticated image segmentation without writing a single line of code. This article explores what makes them so powerful, walks through how to use them, and shows how they can simplify image segmentation tasks, no matter your level of experience.6. Why Everyone in AI Is Freaking Out About DeepSeekOnly a handful of people knew about DeepSeek a few days ago. Yet, thanks to the release of DeepSeek-R1, its been arguably the most discussed company in Silicon Valley in the last few days. 
This article explains what has led to this popularity.Repositories & ToolsOpen R1 is a fully open reproduction of DeepSeek-R1.PaSa is an advanced paper search agent powered by LLMs that can autonomously make a series of decisions.Top Papers of The Week1. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement LearningDeepSeek-R1-Zero and DeepSeek-R1 are reasoning models that perform comparable to OpenAI-o11217 on reasoning tasks. DeepSeek-R1-Zero is trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT), while DeepSeek-R1 incorporates multi-stage training and cold-start data before RL. Available in sizes 1.5B, 7B, 8B, 14B, 32B, and 70B, DeepSeek-R1-Zero and DeepSeek-R1 are open-sourced and distilled from DeepSeek-R1 based on Qwen and Llama.2. Humanitys Last ExamHumanitys Last Exam (HLE) is a multi-modal benchmark designed to be the final closed-ended academic benchmark with broad subject coverage. HLE is developed by subject-matter experts and comprises 3,000 multiple-choice and short-answer questions across dozens of subjects, including mathematics, humanities, and the natural sciences. Each question has a known, unambiguous, and easily verifiable solution that cannot be quickly answered via Internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE.3. Evolving Deeper LLM ThinkingThis paper explores an evolutionary search strategy for scaling inference time compute in LLMs. It proposes a new approach, Mind Evolution, that uses a language model to generate, recombine, and refine candidate responses. Controlling for inference cost, Mind Evolution significantly outperforms other inference strategies, such as Best-of-N and Sequential Revision, in natural language planning tasks.4. Agent-R\xspace: Training Language Model Agents to Reflect via Iterative Self-TrainingThis paper proposes an iterative self-training framework, Agent-R, that enables language Agents to Reflect on the fly. 
It leverages Monte Carlo Tree Search (MCTS) to construct training samples that recover correct trajectories from erroneous ones. It introduces a model-guided critique construction mechanism: the actor model identifies the first error step in a failed trajectory. Next, it is spliced with the adjacent correct path, which shares the same parent node in the tree.5. Reasoning Language Models: A BlueprintThis paper proposes a comprehensive blueprint that organizes reasoning language model (RLM) components into a modular framework based on a survey and analysis of all RLM works. It incorporates diverse reasoning structures, reasoning strategies, RL concepts, supervision schemes, and other related concepts. It also provides detailed mathematical formulations and algorithmic specifications to simplify RLM implementation.Quick Links1. Meta AI releases Llama Stack 0.1.0, the first stable release of a unified platform designed to simplify building and deploying generative AI applications. The platform offers backward-compatible upgrades, automated provider verification, and a consistent developer experience across local, cloud, and edge environments. It addresses the complexity of infrastructure, essential capabilities, and flexibility in AI development.2. Perplexity launched Sonar, an API service that allows enterprises and developers to integrate the startups generative AI search tools into their applications. 
Perplexity currently offers two tiers for developers: a cheaper and faster base version, Sonar, and a pricier version, Sonar Pro, which is better suited to tough questions.

Who's Hiring in AI
Developer and Technical Communications Lead @Anthropic (Multiple US Locations/Hybrid)
AI Algorithm Intern @INTEL (Poland/Hybrid)
Software Developer 3 @Oracle (Austin, TX, United States)
Data Scientist @Meta (Seattle, WA, USA)
Junior Software Engineer @Re-Leased (Napier, New Zealand)
Designated Technical Support Engineer @Glean (Palo Alto, CA, USA)
Gen AI Engineer | LLMOps @NEORIS (Spain)

Interested in sharing a job opportunity here? Contact [emailprotected].

Think a friend would enjoy this too? Share the newsletter and let them join the conversation. Join over 80,000 subscribers and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI
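Perplexity documents Sonar as an OpenAI-compatible chat-completions API; a minimal request sketch follows. The base URL, model names, and payload shape are assumptions drawn from Perplexity's public documentation, and the API key is a placeholder:

```python
import json
import urllib.request

API_KEY = "pplx-..."  # placeholder: substitute a real Perplexity API key

# Build the request payload; "sonar" is the cheaper base tier,
# "sonar-pro" the pricier tier mentioned above.
payload = {
    "model": "sonar",
    "messages": [
        {"role": "system", "content": "Be precise and concise."},
        {"role": "user", "content": "What is Llama Stack 0.1.0?"},
    ],
}

request = urllib.request.Request(
    "https://api.perplexity.ai/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# response = urllib.request.urlopen(request)  # uncomment with a real key
```

Because the endpoint follows the OpenAI chat-completions shape, existing OpenAI client libraries can typically be pointed at it by overriding the base URL.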
  • Introduction to Machine Learning
    towardsai.net
    Introduction to Machine Learning
    January 28, 2025. Author(s): Carlos da Costa. Originally published on Towards AI.

Lay the foundation for your machine learning journey with this comprehensive introduction. This member-only story is on us. Upgrade to access all of Medium.

Photo by Arseny Togulev on Unsplash

When we hear about machine learning, our minds often jump to exciting technologies like ChatGPT, Gemini, and other generative AI tools. While these applications are impressive, they all share the same foundational principles of machine learning. To truly understand and harness the power of these innovations, it's essential to build a strong foundation in the basics. This blog post introduces the fundamental concepts of machine learning in a beginner-friendly way, for anyone eager to explore the world of machine learning and artificial intelligence.

In this article, we will cover:
What is machine learning?
Training and testing sets
Types of machine learning
Challenges in machine learning

Download the machine learning cheat sheet and roadmap below: "Unlock the world of machine learning with this comprehensive beginner's guide! Featuring both a detailed cheat sheet" (daviddacosta.gumroad.com)

In simple terms, machine learning is the science of teaching a computer to learn from data without being explicitly programmed.

Image by the author, created with napkin.ai

Consider a program designed to predict whether it will rain. In traditional programming, we would manually create rules to evaluate weather patterns. For example, we might check temperature, wind speed, cloud…

Read the full blog for free on Medium.
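The distinction the post draws, hand-written rules versus rules learned from data, can be shown in a few lines. This toy example (a humidity threshold with invented numbers) is illustrative only and is not from the original article:

```python
# Traditional programming: we write the rule ourselves.
def will_rain_rule(humidity: float) -> bool:
    return humidity > 70.0          # hand-picked threshold

# Machine learning: the "rule" (here, a single threshold) is learned
# from labeled examples instead of being written by hand.
examples = [(30, False), (45, False), (60, False),
            (75, True), (85, True), (95, True)]  # (humidity %, rained?)

def learn_threshold(data):
    # Pick the midpoint between the wettest dry day and the driest
    # rainy day: the simplest possible "training" step.
    wettest_dry = max(h for h, rained in data if not rained)
    driest_rainy = min(h for h, rained in data if rained)
    return (wettest_dry + driest_rainy) / 2

threshold = learn_threshold(examples)

def will_rain_learned(humidity: float) -> bool:
    return humidity > threshold
```

Real models learn far richer rules over many features (temperature, wind speed, cloud cover), but the principle is the same: the decision boundary comes from the data, not the programmer.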
Published via Towards AI
  • The Rings of Power Season 2 Viewership Data Isn't Great News for the Show's Five-Season Plan
    www.denofgeek.com
    A new 2024 year-end report from streaming analytics company Luminate raised more than a few eyebrows when it was released earlier this month. Among its many revelations (via Deadline) is the less-than-stellar data concerning major franchise streaming series released last year. According to the report, all Marvel and Star Wars series underperformed last year in terms of total minutes watched when compared to previous streaming series in their respective franchises, including Echo and Agatha All Along, as well as the cancelled The Acolyte. Also add to the list: The Rings of Power season 2, which saw a 60% drop in total minutes watched compared to season 1, which itself reportedly struggled to retain viewers all the way through to its finale in 2022. While it's true that season 2 was always going to struggle to reach the meteoric numbers of season 1's early episodes, when anticipation and interest in the blockbuster series were at their highest, this is still a concerning downturn for an Amazon show that is incredibly expensive to make and has struggled to win over Lord of the Rings fans at large.

Let's be clear here: Amazon is currently full steam ahead on The Rings of Power season 3. The show is still one of the most-watched series on Prime Video, with Amazon studio head Jennifer Salke saying in October that season 2 had been watched by over 55 million viewers (although it's unclear how the company defines a viewer). Salke also said at the time that she expected season 2 to catch up to the first season, which had over 150 million viewers on the service. The second season remained in the top five of Nielsen's list of most-watched original streaming series during its run, with the three-episode premiere topping the list in its first week (although with a 19% drop compared to season 1's massive premiere). It was Prime Video's third-highest opening week in the U.S.
behind the Fallout and The Boys season 4 premieres, according to THR.

Season 3 has also been in the works since last year. "The answer is yes, we're very excited, but we can't say anything other than we're working on season 3," co-showrunner Patrick McKay confirmed when Total Film asked for a status update in December. "We have a story we think is really strong, and we're hoping to turn it around as fast as possible."

The Rings of Power is planned as a five-season series with an estimated price tag of $1 billion. According to a report from THR in September, Amazon is still committed to that plan. Yet it's hard not to wonder what could happen if season 3 continues the downward trend in how much time the audience spends actually watching the series. With such a high production cost on the studio's budget sheet, could the plan eventually change if the number of people watching, and how long they're watching, becomes untenable?

"This desire to paint the show as anything less than a success... it's not reflective of any conversation I'm having internally," Salke told THR in 2022 in response to a similarly concerning viewership report stating that less than 50% of viewers had finished season 1. At the time, Salke promised a second season with more dramatic story turns, and overall, the story did go over better with viewers, with a 49% audience score on Rotten Tomatoes versus season 1's 38%. An improvement, but still not stellar for a $1 billion series set in one of the most beloved fantasy worlds in pop culture.

The season 2 finale set the stage for the wars to come, with Sauron now in control of the forces of Mordor and the nine rings of power for Men, and the Elves regrouping and planning to take the fight back to the Dark Lord. This all sounds like the setup for an excellent third season, and, if viewers show up, fourth and fifth seasons, too.

Lord of the Rings: The Rings of Power is streaming now on Prime Video.