VentureBeat
Obsessed with covering transformative technology.
0 people like this
1071 Articles
2 Photos
0 Videos
0 Previews
Recent updates
  • The TAO of data: How Databricks is optimizing AI LLM fine-tuning without data labels
    venturebeat.com
    New approach flips the script on enterprise AI adoption by using input data you already have for fine-tuning instead of needing labelled data.
    0 Comments · 0 Shares · 29 Views
  • Second-person horror game Out of Sight launches on May 22
    venturebeat.com
    Out of Sight, a horror-puzzle title from The Gang and Starbreeze featuring a second-person perspective, finally gets its May 22 launch date.
    0 Comments · 0 Shares · 28 Views
  • ChatGPT gets smarter: OpenAI adds internal data referencing
    venturebeat.com
    Users of ChatGPT Team can now add internal databases as references for ChatGPT, making the chat platform respond with better context.
    0 Comments · 0 Shares · 10 Views
  • Tencent invests $1.25B in Ubisoft's new core games operating division
    venturebeat.com
    Tencent has invested $1.25 billion in a new Ubisoft operating division that includes the core Assassin's Creed, Far Cry and Tom Clancy's Rainbow Six brands. Ubisoft, the French video game giant that had struggled to launch hits until its most recent release, Assassin's Creed: Shadows, said it is accelerating its transformation…
    0 Comments · 0 Shares · 12 Views
  • ESA Foundation raises $1.25M for kids with Nite to Unite fundraiser
    venturebeat.com
    The ESA Foundation celebrated its 23rd annual Nite to Unite fundraiser for children on March 18, raising $1.25 million.
    0 Comments · 0 Shares · 9 Views
  • Anthropic scientists expose how AI actually thinks and discover it secretly plans ahead and sometimes lies
    venturebeat.com
    Anthropic has developed a new method for peering inside large language models like Claude, revealing for the first time how these AI systems process information and make decisions.

    The research, published today in two papers (available here and here), shows these models are more sophisticated than previously understood: they plan ahead when writing poetry, use the same internal blueprint to interpret ideas regardless of language, and sometimes even work backward from a desired outcome instead of simply building up from the facts.

    The work, which draws inspiration from neuroscience techniques used to study biological brains, represents a significant advance in AI interpretability. This approach could allow researchers to audit these systems for safety issues that might remain hidden during conventional external testing.

    "We've created these AI systems with remarkable capabilities, but because of how they're trained, we haven't understood how those capabilities actually emerged," said Joshua Batson, a researcher at Anthropic, in an exclusive interview with VentureBeat. "Inside the model, it's just a bunch of numbers: matrix weights in the artificial neural network."

    Large language models like OpenAI's GPT-4o, Anthropic's Claude, and Google's Gemini have demonstrated remarkable capabilities, from writing code to synthesizing research papers. But these systems have largely functioned as black boxes: even their creators often don't understand exactly how they arrive at particular responses.

    Anthropic's new interpretability techniques, which the company dubs "circuit tracing" and "attribution graphs," allow researchers to map out the specific pathways of neuron-like features that activate when models perform tasks. The approach borrows concepts from neuroscience, viewing AI models as analogous to biological systems.

    "This work is turning what were almost philosophical questions (Are models thinking? Are models planning? Are models just regurgitating information?) into concrete scientific inquiries about what's literally happening inside these systems," Batson explained.

    Among the most striking discoveries was evidence that Claude plans ahead when writing poetry. When asked to compose a rhyming couplet, the model identified potential rhyming words for the end of the next line before it began writing, a level of sophistication that surprised even Anthropic's researchers.

    "This is probably happening all over the place," Batson said. "If you had asked me before this research, I would have guessed the model is thinking ahead in various contexts. But this example provides the most compelling evidence we've seen of that capability."

    For instance, when writing a poem ending with "rabbit," the model activates features representing this word at the beginning of the line, then structures the sentence to naturally arrive at that conclusion.

    The researchers also found that Claude performs genuine multi-step reasoning. In a test asking "The capital of the state containing Dallas is...", the model first activates features representing Texas, and then uses that representation to determine Austin as the correct answer. This suggests the model is actually performing a chain of reasoning rather than merely regurgitating memorized associations. By manipulating these internal representations (for example, replacing Texas with California), the researchers could cause the model to output Sacramento instead, confirming the causal relationship.

    Beyond translation: Claude's universal language concept network revealed

    Another key discovery involves how Claude handles multiple languages. Rather than maintaining separate systems for English, French, and Chinese, the model appears to translate concepts into a shared abstract representation before generating responses.

    "We find the model uses a mixture of language-specific and abstract, language-independent circuits," the researchers write in their paper. When asked for the opposite of "small" in different languages, the model uses the same internal features representing opposites and smallness, regardless of the input language.

    This finding has implications for how models might transfer knowledge learned in one language to others, and suggests that models with larger parameter counts develop more language-agnostic representations.

    When AI makes up answers: Detecting Claude's mathematical fabrications

    Perhaps most concerning, the research revealed instances where Claude's reasoning doesn't match what it claims. When presented with difficult math problems like computing cosine values of large numbers, the model sometimes claims to follow a calculation process that isn't reflected in its internal activity.

    "We are able to distinguish between cases where the model genuinely performs the steps they say they are performing, cases where it makes up its reasoning without regard for truth, and cases where it works backwards from a human-provided clue," the researchers explain.

    In one example, when a user suggests an answer to a difficult problem, the model works backward to construct a chain of reasoning that leads to that answer, rather than working forward from first principles.

    "We mechanistically distinguish an example of Claude 3.5 Haiku using a faithful chain of thought from two examples of unfaithful chains of thought," the paper states. "In one, the model is exhibiting bullshitting. In the other, it exhibits motivated reasoning."

    Inside AI Hallucinations: How Claude decides when to answer or refuse questions

    The research also provides insight into why language models hallucinate, making up information when they don't know an answer. Anthropic found evidence of a "default" circuit that causes Claude to decline to answer questions, which is inhibited when the model recognizes entities it knows about.

    "The model contains default circuits that cause it to decline to answer questions," the researchers explain. "When a model is asked a question about something it knows, it activates a pool of features which inhibit this default circuit, thereby allowing the model to respond to the question."

    When this mechanism misfires (recognizing an entity but lacking specific knowledge about it), hallucinations can occur. This explains why models might confidently provide incorrect information about well-known figures while refusing to answer questions about obscure ones.

    Safety implications: Using circuit tracing to improve AI reliability and trustworthiness

    This research represents a significant step toward making AI systems more transparent and potentially safer. By understanding how models arrive at their answers, researchers could potentially identify and address problematic reasoning patterns.

    "We hope that we and others can use these discoveries to make models safer," the researchers write. "For example, it might be possible to use the techniques described here to monitor AI systems for certain dangerous behaviors (such as deceiving the user), to steer them towards desirable outcomes, or to remove certain dangerous subject matter entirely."

    However, Batson cautions that the current techniques still have significant limitations. They only capture a fraction of the total computation performed by these models, and analyzing the results remains labor-intensive.

    "Even on short, simple prompts, our method only captures a fraction of the total computation performed by Claude," the researchers acknowledge.

    The future of AI transparency: Challenges and opportunities in model interpretation

    Anthropic's new techniques come at a time of increasing concern about AI transparency and safety. As these models become more powerful and more widely deployed, understanding their internal mechanisms becomes increasingly important.

    The research also has potential commercial implications. As enterprises increasingly rely on large language models to power applications, understanding when and why these systems might provide incorrect information becomes crucial for managing risk.

    "Anthropic wants to make models safe in a broad sense, including everything from mitigating bias to ensuring an AI is acting honestly to preventing misuse, including in scenarios of catastrophic risk," the researchers write.

    While this research represents a significant advance, Batson emphasized that it is only the beginning of a much longer journey. "The work has really just begun," he said. "Understanding the representations the model uses doesn't tell us how it uses them."

    For now, Anthropic's circuit tracing offers a first tentative map of previously uncharted territory, much like early anatomists sketching the first crude diagrams of the human brain. The full atlas of AI cognition remains to be drawn, but we can now at least see the outlines of how these systems think.
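    The Texas-to-California intervention described above is easiest to picture as activation patching: record an internal representation from one run, splice it into another, and see whether the output follows. The sketch below is a minimal illustration of that idea on a toy PyTorch network; the tiny model, the layer choice, and the stand-in inputs are assumptions for demonstration only, not Anthropic's circuit-tracing or attribution-graph tooling.

```python
# Minimal activation-patching sketch (illustrative only; not Anthropic's method).
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(8, 16),  # layer whose hidden "features" we capture and patch
    nn.ReLU(),
    nn.Linear(16, 4),  # downstream layer producing the "answer" logits
)

captured = {}

def capture_hook(module, inputs, output):
    # Save the hidden activations from the source run.
    captured["hidden"] = output.detach().clone()

def patch_hook(module, inputs, output):
    # Replace the hidden activations during the target run, so every later
    # layer computes from the substituted representation.
    return captured["hidden"]

source_input = torch.randn(1, 8)  # stand-in for a "California" prompt
target_input = torch.randn(1, 8)  # stand-in for a "Dallas/Texas" prompt

# 1) Source run: record the intermediate representation.
handle = model[0].register_forward_hook(capture_hook)
source_logits = model(source_input)
handle.remove()

# 2) Target run: patch the source representation into the forward pass.
handle = model[0].register_forward_hook(patch_hook)
patched_logits = model(target_input)
handle.remove()

# If the patched run now matches the source run, the patched representation is
# causally responsible for the downstream output -- the same logic behind
# swapping "Texas" features and getting "Sacramento" instead of "Austin".
print(torch.allclose(patched_logits, source_logits))  # True
```

    In a real language model the same pattern is applied to specific features in a transformer's internal activations rather than to a whole toy layer's output, but the causal test is the same: patch the representation, watch the answer change.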
    0 Comments · 0 Shares · 5 Views
  • PlayStation stalwarts Hot Shots and Patapon are coming to Switch
    venturebeat.com
    Two games in today's Nintendo Direct were Patapon 1+2 Replay and Everybody's Golf: Hot Shots, two series previously exclusive to PlayStation.
    0 Comments · 0 Shares · 4 Views
  • Nintendo is releasing another weird phone app
    venturebeat.com
    Nintendo had an interesting surprise at the end of today's Direct.
    0 Comments · 0 Shares · 2 Views
  • Amazon Luna signs multi-year deal with EA to bring big games to cloud gaming
    venturebeat.com
    Amazon Luna unveiled a multi-year deal with big game publisher Electronic Arts to bring many of its top games to the cloud gaming service.
    0 Comments · 0 Shares · 2 Views
  • INCYMO launches AI-powered mobile gaming creative ad platform
    venturebeat.com
    INCYMO.AI launched its human-powered, AI-aided creative platform for crafting mobile gaming advertisements.
    0 Comments · 0 Shares · 3 Views
  • Backbone One: Xbox Edition mobile game controller launches today
    venturebeat.com
    Backbone is expanding its partnership with Xbox to officially launch the Backbone One: Xbox Edition mobile game controller ($110) on March 27.
    0 Comments · 0 Shares · 3 Views
  • The open source Model Context Protocol was just updated: here's why it's a big deal
    venturebeat.com
    An updated version of the MCP spec introduced key upgrades to make AI agents more secure, capable and interoperable.
    0 Comments · 0 Shares · 47 Views
  • Studio Ghibli AI image trend overwhelms OpenAI's new GPT-4o feature, delaying free tier
    venturebeat.com
    If you've been on the internet, or at least on the social network X, in the last day or so, you've likely come across colorful, smooth, anime-style images of famous photographs rendered in the style of the Japanese studio Studio Ghibli (the one that made Princess Mononoke, The Boy and the Heron, and My Neighbor Totoro, among many other classic animated films).

    In fact, some users are complaining because their feeds seem to be filled almost exclusively with these types of images.

    Whether it's current President Trump, the iconic image of the Tank Man during the 1989 pro-democracy Tiananmen Square protests, Osama Bin Laden, Jeffrey Epstein, or other pop culture moments and characters, like Sam Rockwell's iconic cameo on The White Lotus and many popular memes of yore, people have been making and sharing these images at a rapid clip.

    Powered by the new GPT-4o model's native image generation

    Much of that is thanks to OpenAI's new update to the GPT-4o model behind ChatGPT for Pro, Plus, and Team subscription tiers, which turns on native image generation.

    While ChatGPT previously allowed users to create images from text prompts, it did so by routing them to another, separate OpenAI model, DALL-E 3. But OpenAI's GPT-4o model is so named with an "o" because it is an "omni" model: the company trained it not only on text and code, but also on imagery (and presumably video and audio as well), allowing it to understand all these forms of media and their similarities and differences, conceive of ideas across them (an apple is not just a word, but also something that can be drawn as a red or yellow or green fruit), and accurately produce said media given text prompts by a user, without connecting to any external models.

    As a consequence, like rival Google AI Studio's recent update that added a Gemini 2.0 Flash experimental image creation model, the new OpenAI GPT-4o can also accept uploads of any pre-existing image from your camera roll, or one you've screenshotted or saved off the web.

    How to use ChatGPT to make Studio Ghibli-style images (and change or transfer any image into any style)

    First, navigate to Chat.com or ChatGPT.com and ensure you're logged in with your ChatGPT Plus, Pro, or Team account, and that the AI model selector (located in the left corner of the session window) is showing GPT-4o as the chosen model (you can click it to drop down and select the proper model from the available options).

    Once you've done that, you can upload an image to ChatGPT using the "+" button in the lower left-hand corner of the prompt entry text box and ask the new GPT-4o with image creation to render your pre-existing image in a new style.

    If you want, you can try it by uploading a photo of yourself and friends and typing "make all these people in the style of a Studio Ghibli animation." After a few seconds, it will do so, with some pretty convincing and amusing results. It even supports attaching multiple images and combining them into a single piece.

    ChatGPT free tier usage delayed

    OpenAI initially said it would also enable this feature for free (non-paying) users of ChatGPT, but unfortunately for them, co-founder and CEO Sam Altman today posted that the feature will be delayed due to the overwhelming demand from existing paying subscribers to the ChatGPT Plus, Pro, and Team tiers.

    As he wrote on X: "images in chatgpt are wayyyy more popular than we expected (and we had pretty high expectations). rollout to our free tier is unfortunately going to be delayed for awhile."

    Meanwhile, those who do have access will likely continue cranking out image edits in this and other recognizable or novel styles.

    Of course, not everyone is a fan of OpenAI's work here. In fact, Studio Ghibli creator Hayao Miyazaki himself appeared in a documentary back in 2016, and one of the most memorable moments from it, still referenced to this day, is him reacting with overwhelming disgust and revulsion to an early example of AI-powered animation and physics.

    As with many generative AI products and services, OpenAI's training data for this new image generation capability remains under wraps, but it is widely speculated to contain copyrighted material, and while imitating a style is generally not considered copyright infringement in the U.S., it is rubbing some fans of the original animation the wrong way.

    For now, brands and enterprises looking to play with this style should do so with caution and after serious consideration, given the possible negative blowback among some users. But for those who are unabashedly pro-AI tools, or those with more forgiving and fun-loving fanbases, it's clear that OpenAI has yet another hit on its hands.
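    The walkthrough above is entirely inside the ChatGPT web UI. For readers who want a rough programmatic analogue, here is a sketch against the documented images.edit call in the official openai Python SDK; treat it as assumption-laden: the model id is a placeholder (GPT-4o's native image generation is not assumed to be reachable this way), the file names are hypothetical, and the mask simply marks the whole image as editable, as that endpoint requires.

```python
# Hypothetical programmatic analogue of the ChatGPT style-transfer prompt above.
# Uses the documented openai Python SDK images.edit call; the model id is a
# placeholder assumption, not an endpoint for GPT-4o's native image generation.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# image: square PNG under 4 MB; mask: same-size fully transparent PNG so the
# whole image is treated as editable (per the API's documented requirements).
with open("group_photo.png", "rb") as image_file, open("full_mask.png", "rb") as mask_file:
    result = client.images.edit(
        model="dall-e-2",  # assumption: substitute any image-editing model your account exposes
        image=image_file,
        mask=mask_file,
        prompt="Make all these people in the style of a Studio Ghibli animation",
        n=1,
        size="1024x1024",
    )

print(result.data[0].url)  # URL of the edited image
```

    Results from this older edit endpoint will not match GPT-4o's in-ChatGPT output; the sketch is only meant to show where a scripted workflow would plug in.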
    0 Comments · 0 Shares · 17 Views
  • GDC draws nearly 30K attendees and returns to SF in early March 2026
    venturebeat.com
    The 2025 Game Developers Conference (GDC) drew nearly 30,000 registered attendees to San Francisco last week, about the same as last year.
    0 Comments · 0 Shares · 13 Views
  • Call of Duty: Warzone returns to Verdansk for Season 3 | The details
    venturebeat.com
    Activision dropped a lot of details on the return of Call of Duty: Warzone to Verdansk, the original map that made the game a massive success in 2020. The battle royale map for Season 03 will drop on April 3 at 9 a.m. Pacific time, a day after the launch of Call of Duty: Black Ops 6: Season 03 on April 2 at 9 a.m. Pacific. This map was the popular…
    0 Comments · 0 Shares · 34 Views
  • Observe launches VoiceAI agents to automate customer call centers with realistic, humanlike voices that don't interrupt
    venturebeat.com
    Observe.AI has officially launched VoiceAI agents, a solution designed to automate routine customer interactions in contact centers. The latest addition to the company's AI-driven conversational intelligence platform, VoiceAI agents aim to improve customer experience while reducing operational costs. With this release, Observe.AI is positioning its…
    0 Comments · 0 Shares · 20 Views
  • SingularityNET and Star Atlas partner to combine Web3 games and AI Agents
    venturebeat.com
    SingularityNET, a founding member of the ASI Alliance, has partnered with ATMTA, maker of Star Atlas, the Web3 space exploration online game.
    0 Comments · 0 Shares · 20 Views
  • No Man's Sky adds paleontology in Relics update, launching today
    venturebeat.com
    No Man's Sky's newest update allows players to find and dig up fossils on its trillions of procedurally generated worlds.
    0 Comments · 0 Shares · 19 Views
  • Groq and PlayAI just made voice AI sound way more human: here's how
    venturebeat.com
    Groq partners with PlayAI to deliver Dialog, an emotionally intelligent text-to-speech model that runs 10x faster than real-time speech, including the Middle East's first Arabic voice AI model.
    0 Comments · 0 Shares · 19 Views
  • Microsoft infuses enterprise agents with deep reasoning, unveils data Analyst agent that outsmarts competitors
    venturebeat.com
    Microsoft has built the largest enterprise AI agent ecosystem, and it is now extending its lead with powerful new capabilities that position the company ahead in one of enterprise tech's most exciting segments.

    The company announced Tuesday evening two significant additions to its Copilot Studio platform: deep reasoning capabilities that enable agents to tackle complex problems through careful, methodical thinking, and agent flows that combine AI flexibility with deterministic business process automation. Microsoft also unveiled two specialized deep reasoning agents for Microsoft 365 Copilot: Researcher and Analyst.

    "We have customers with thousands of agents already," Charles Lamanna, Microsoft's Corporate Vice President for Business and Industry Copilot, told VentureBeat in an exclusive interview on Monday. "You start to have this kind of agentic workforce where no matter what the job is, you probably have an agent that can help you get it done faster."

    Microsoft's distinctive Analyst agent

    While the Researcher agent mirrors capabilities from competitors like OpenAI's Deep Research and Google's Deep Research, Microsoft's Analyst agent represents a more differentiated offering. Designed to function like a personal data scientist, the Analyst agent can process diverse data sources, including Excel files, CSVs, and embedded tables in documents, generating insights through code execution and visualization.

    "This is not a base model off the shelf," Lamanna emphasized. "This is quite a bit of extensions and tuning and training on top of the core models." Microsoft has leveraged its deep understanding of Excel workflows and data analysis patterns to create an agent that aligns with how enterprise users actually work with data.

    The Analyst can automatically generate Python code to process uploaded data files, produce visualizations, and deliver business insights without requiring technical expertise from users. This makes it particularly valuable for financial analysis, budget forecasting, and operational reporting: use cases that typically require extensive data preparation.

    Deep reasoning: Bringing critical thinking to enterprise agents

    Microsoft's deep reasoning capability extends agents' abilities beyond simple task completion to complex judgment and analytical work. By integrating advanced reasoning models like OpenAI's o1 and connecting them to enterprise data, these agents can tackle ambiguous business problems more methodically.

    The system dynamically determines when to invoke deeper reasoning, either implicitly based on task complexity or explicitly when users include prompts like "reason over this" or "think really hard about this." Behind the scenes, the platform analyzes instructions, evaluates context, and selects appropriate tools based on the task requirements.

    This enables scenarios that were previously difficult to automate. For example, one large telecommunications company uses deep reasoning agents to generate complex RFP responses by assembling information from across multiple internal documents and knowledge sources, Lamanna told VentureBeat. Similarly, Thomson Reuters employs these capabilities for due diligence in mergers and acquisitions reviews, processing unstructured documents to identify insights, he said. See an example of the agent reasoning at work in the video below:

    Agent flows: Reimagining process automation

    Microsoft has also introduced agent flows, which effectively evolve robotic process automation (RPA) by combining rule-based workflows with AI reasoning. This addresses customer demands for integrating deterministic business logic with flexible AI capabilities.

    "Sometimes they don't want the model to freestyle. They don't want the AI to make its own decisions. They want to have hard-coded business rules," Lamanna explained. "Other times they do want the agent to freestyle and make judgment calls."

    This hybrid approach enables scenarios like intelligent fraud prevention, where an agent flow might use conditional logic to route higher-value refund requests to an AI agent for deep analysis against policy documents.

    Pets at Home, a U.K.-based pet supplies retailer, has already deployed this technology for fraud prevention; Lamanna revealed the company has saved over a million pounds through the implementation. Similarly, Dow Chemical has realized millions of dollars in savings in transportation and freight management through agent-based optimization.

    Below is a video showing agent flows at work:

    The Microsoft Graph advantage

    Central to Microsoft's agent strategy is its enterprise data integration through the Microsoft Graph, a comprehensive mapping of workplace relationships between people, documents, emails, calendar events, and business data. This provides agents with contextual awareness that generic models lack.

    "The lesser-known secret capability of the Microsoft Graph is that we're able to improve relevance on the graph based on engagement and how tightly connected some files are," Lamanna revealed. The system identifies which documents are most referenced, shared, or commented on, ensuring agents reference authoritative sources rather than outdated copies.

    This approach gives Microsoft a significant competitive advantage over standalone AI providers. While competitors may offer advanced models, Microsoft combines these with workplace context and fine-tuning optimized explicitly for enterprise use cases and Microsoft tools.

    "Microsoft can leverage the same web data and model technology that competitors can," Lamanna noted, "but we then also have all the content inside the enterprise." This creates a flywheel effect where each new agent interaction further enriches the graph's understanding of workplace patterns.

    Enterprise adoption and accessibility

    Microsoft has prioritized making these powerful capabilities accessible to organizations with varying technical resources, Lamanna said. The agents are exposed directly within Copilot, allowing users to interact through natural language without prompt engineering expertise.

    Meanwhile, Copilot Studio provides a low-code environment for custom agent development. "It's in our DNA to have a tool for everybody: not just people who can boot up a Python SDK and make calls, but anybody can start to build these agents," Lamanna emphasized.

    This accessibility approach has fueled rapid adoption. Microsoft previously revealed that over 100,000 organizations have used Copilot Studio and that more than 400,000 agents were created in the last quarter.

    The competitive landscape

    While Microsoft appears to lead enterprise agent deployment today, competition is intensifying. Google has expanded its Gemini capabilities for agents and agentic coding, while OpenAI's o1 model and Agents SDK provide powerful reasoning and agentic tools for developers. Big enterprise application companies like Salesforce, Oracle, ServiceNow, SAP and others have all launched agentic platforms for their customers over the last year. Also on Tuesday, Amazon's AWS released an AI agent, called Amazon Q in QuickSight, to let employees engage via natural language to perform data analysis without specialized skills. Employees can use natural language to perform expert-level data analysis, ask what-if questions, and get actionable recommendations, helping them unlock new insights and make decisions faster.

    However, Microsoft's advantage lies in its more comprehensive approach: a strong coupling with the leading reasoning model company, OpenAI, while also offering model choice, enterprise-grade infrastructure, extensive data integration across workplace tools, and a focus on business outcomes rather than raw AI capabilities. Microsoft has created an ecosystem that looks like best practice by combining personal copilots that understand individual work patterns with specialized agents for specific business processes.

    For enterprise decision-makers, the message is clear: agent technology has matured beyond experimentation to practical business applications with measurable ROI. The choice of platform increasingly depends on integration with existing tools and data, and here Microsoft holds an advantage in many application areas because of the number of users it has in, for example, Excel and Power Automate.

    Watch my full interview with Charles Lamanna embedded below to hear firsthand how Microsoft is driving its agent strategy, what these new capabilities mean for enterprise users, and how organizations are leveraging agents to deliver measurable business results:
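    As a concrete illustration of the agent-flow pattern Lamanna describes (deterministic rules first, model judgment only where it is wanted), here is a minimal sketch. The refund threshold, the llm_review stub, and the policy text are invented for the example; this is not Copilot Studio's API or Microsoft's implementation.

```python
# Minimal sketch of a hybrid rule-plus-agent flow (illustrative assumptions only).
from dataclasses import dataclass

@dataclass
class RefundRequest:
    order_id: str
    amount: float
    reason: str

AUTO_APPROVE_LIMIT = 50.0  # assumption: refunds at or below this are handled by fixed rules

def llm_review(request: RefundRequest, policy: str) -> str:
    """Placeholder for a call to a reasoning model that checks the request
    against policy documents. Swap in your LLM client of choice."""
    return f"ESCALATED: review {request.order_id} against policy ({len(policy)} chars)."

def handle_refund(request: RefundRequest, policy: str) -> str:
    # Hard-coded business rule: small refunds never reach the model.
    if request.amount <= AUTO_APPROVE_LIMIT:
        return f"AUTO-APPROVED: {request.order_id} for ${request.amount:.2f}"
    # Higher-value requests are routed to the agent for judgment against policy.
    return llm_review(request, policy)

if __name__ == "__main__":
    policy_text = "Refunds over $50 require evidence of defect or non-delivery."
    print(handle_refund(RefundRequest("A-1001", 19.99, "item damaged"), policy_text))
    print(handle_refund(RefundRequest("A-1002", 240.00, "never arrived"), policy_text))
```

    The point of the split is the one Lamanna makes: below the threshold the behavior is fully auditable and repeatable, and only the ambiguous, higher-stakes cases pay the cost (and accept the variability) of a model call.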
    0 Comments · 0 Shares · 31 Views
  • METASCALE improves LLM reasoning with adaptive strategies
    venturebeat.com
    A new framework called METASCALE enables large language models (LLMs) to dynamically adapt their reasoning mode at inference time. This framework addresses one of LLMs' shortcomings: using the same reasoning strategy for all types of problems.

    Introduced in a paper by researchers at the University of California, Davis, the University of Southern California and Microsoft Research, METASCALE uses "meta-thoughts" (adaptive thinking strategies tailored to each task) to improve LLM performance and generalization across various tasks.

    This approach can offer enterprises a way to enhance the accuracy and efficiency of their LLM applications without changing models or engaging in expensive fine-tuning efforts.

    The limitations of fixed reasoning strategies

    One of the main challenges of LLM applications is their fixed and inflexible reasoning behavior. Unlike humans, who can consciously choose different approaches to solve problems, LLMs often rely on pattern matching from their training data, which may not always align with the sound reasoning principles that humans use.

    Current methods for adjusting the reasoning process of LLMs, such as chain-of-thought (CoT) prompting, self-verification and reverse thinking, are often designed for specific tasks, limiting their adaptability and effectiveness across diverse scenarios.

    The researchers point out that these approaches impose fixed thinking structures rather than enabling LLMs to adaptively determine the most effective task-specific strategy, potentially limiting their performance.

    To address this limitation, the researchers propose the concept of "meta-thinking." This process allows LLMs to reflect on their approach before generating a response. Meta-thoughts guide the reasoning process through two components inspired by human cognition:

    Cognitive mindset: the perspective, expertise, or role the model adopts to approach the task.

    Problem-solving strategy: a structured pattern used to formulate a solution for the task based on the chosen mindset.

    Instead of directly tackling a problem, the LLM first determines how to think, selecting the most appropriate cognitive strategy. For example, when faced with a complex software problem, the LLM might first think about the kind of professional who would solve it (e.g., a software engineer) and choose a strategy to approach the problem (e.g., using design patterns to break down the problem, or using a micro-services approach to simplify the deployment).

    "By incorporating this meta-thinking step, LLMs can dynamically adapt their reasoning process to different tasks, rather than relying on rigid, predefined heuristics," the researchers write.

    Building upon meta-thoughts, the researchers introduce METASCALE, a test-time framework that can be applied to any model through prompt engineering. "The goal is to enable LLMs to explore different thinking strategies, and generate the most effective response for a given input," they state.

    METASCALE operates in three phases:

    Initialization: METASCALE generates a diverse pool of reasoning strategies based on the input prompt. It does this by prompting the LLM to self-compose strategies and by leveraging instruction-tuning datasets containing reasoning templates for different types of problems. This combination creates a rich initial pool of meta-thoughts.

    Selection: A multi-armed bandit (MAB) algorithm selects the most promising meta-thought for each iteration. MAB is a problem framework in which an agent must repeatedly choose between multiple options, or "arms," each with an unknown reward distribution. The core challenge lies in balancing exploration (e.g., trying different reasoning strategies) and exploitation (consistently selecting the reasoning strategy that previously provided the best responses). In METASCALE, each meta-thought is treated as an arm, and the goal is to maximize the reward (response quality) based on the selected meta-thought.

    Evolution: A genetic algorithm refines and expands the pool of cognitive strategies iteratively. METASCALE uses high-performing meta-thoughts as parents to produce new child meta-thoughts. The LLM is prompted to develop refined meta-thoughts that integrate and improve upon the selected parents. To remain efficient, METASCALE operates within a fixed sampling budget when generating meta-thoughts.

    The researchers evaluated METASCALE on mathematical reasoning benchmarks (GSM8K), knowledge and language understanding (MMLU-Pro), and Arena-Hard, comparing it to four baseline inference methods: direct responses (single-pass inference), CoT, Best-of-N (sampling multiple responses and choosing the best one), and Best-of-N with CoT. They used GPT-4o and Llama-3.1-8B-Instruct as the backbone models for their experiments.

    The results show that METASCALE significantly enhances LLM problem-solving capabilities across diverse tasks, consistently outperforming baseline methods. METASCALE achieved equal or superior performance compared to all baselines, regardless of whether they used CoT prompting. Notably, GPT-4o with METASCALE outperformed o1-mini under style control.

    "These results demonstrate that integrating meta-thoughts enables LLMs to scale more effectively during test time as the number of samples increases," the researchers state. As the number of candidate solutions increased, METASCALE showed significantly higher gains than the other baselines, indicating that it is a more effective scaling strategy.

    Implications for the enterprise

    As a test-time technique, METASCALE can help enterprises improve the quality of LLM reasoning through smart prompt engineering, without the need to fine-tune or switch models. It also doesn't require building complex software scaffolding on top of models, as the logic is provided entirely by the LLM itself.

    By dynamically adjusting the reasoning strategies of LLMs, METASCALE is also practical for real-world applications that handle various reasoning tasks. It is also a black-box method, which can be applied to open-source models running on the enterprise cloud or to closed models running behind third-party APIs. It shows the promising capabilities of test-time scaling techniques for reasoning tasks.
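    The selection phase is the most mechanical part of the framework, so it is the easiest to sketch. Below is a minimal illustration of treating candidate meta-thoughts as bandit arms and choosing among them with a standard UCB1 rule. The example meta-thoughts, the score_response stub, and the budget are assumptions for illustration; the paper's actual prompts, reward signal, and evolution step are not reproduced here.

```python
# Minimal bandit-over-meta-thoughts sketch (illustrative only; not the METASCALE code).
import math
import random

meta_thoughts = [
    "Act as a software engineer; decompose the problem with design patterns.",
    "Act as a mathematician; write out each derivation step before answering.",
    "Act as a skeptical reviewer; draft an answer, then verify it line by line.",
]

def score_response(meta_thought: str, prompt: str) -> float:
    """Stub for generating a response conditioned on the meta-thought and scoring
    its quality (e.g., with a reward model). Returns noise here for illustration."""
    return random.random()

counts = [0] * len(meta_thoughts)   # how often each arm (meta-thought) was tried
totals = [0.0] * len(meta_thoughts) # cumulative reward per arm
budget = 30                         # fixed sampling budget, as the framework prescribes

for t in range(1, budget + 1):
    if 0 in counts:
        arm = counts.index(0)  # explore: try every strategy at least once
    else:
        # Exploit vs. explore trade-off via the UCB1 upper confidence bound.
        ucb = [totals[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
               for i in range(len(meta_thoughts))]
        arm = max(range(len(meta_thoughts)), key=lambda i: ucb[i])
    reward = score_response(meta_thoughts[arm], "How do I fix this flaky test suite?")
    counts[arm] += 1
    totals[arm] += reward

best = max(range(len(meta_thoughts)), key=lambda i: totals[i] / counts[i])
print("Selected meta-thought:", meta_thoughts[best])
```

    In the full framework this loop would also feed high-scoring meta-thoughts into the genetic evolution step described above, and the reward would come from an actual quality signal rather than random noise.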
    0 Comments · 0 Shares · 30 Views
  • Gunzilla Games acquires, resurrects Game Informer
    venturebeat.com
    Game Informer returns as Gunzilla Games has acquired and relaunched Game Informer, bringing back the staff and the website.
    0 Comments · 0 Shares · 25 Views
  • Beyond transformers: Nvidia's MambaVision aims to unlock faster, cheaper enterprise computer vision
    venturebeat.com
    Nvidia is updating its computer vision models with new versions of MambaVision that combine the best of Mamba and transformers to improve efficiency.
    0 Comments · 0 Shares · 24 Views
  • Dreamhaven's Moonshot Games unveils Wildgate, a crew-based FPS with tactical spaceship combat
    venturebeat.com
    Moonshot Games unveiled Wildgate, a crew-based sci-fi game where you can fight in first person or in ship-to-ship combat in space.
    0 Comments · 0 Shares · 42 Views
  • Google releases most intelligent model to date, Gemini 2.5 Pro
    venturebeat.com
    Gemini 2.5 Pro is now available for Gemini Advanced users and is Google's most capable model with a 1 million token context window.
    0 Comments · 0 Shares · 42 Views
  • Insane: OpenAI introduces GPT-4o native image generation and it's already wowing users
    venturebeat.com
    As AI-generated images become more precise and accessible, GPT-4o represents a significant step forward in the space.
    0 Comments · 0 Shares · 38 Views
  • Immutable declares win for Web3 gaming as SEC ends investigation
    venturebeat.com
    Immutable announced that the SEC has notified the company it is formally closing its inquiry into the company and related parties.
    0 Comments · 0 Shares · 39 Views
  • The new best AI image generation model is here: say hello to Reve Image 1.0!
    venturebeat.com
    One of the model's standout capabilities is its strong text rendering performance, addressing a common challenge in AI-generated imagery.
    0 Comments · 0 Shares · 26 Views
  • Discord revamps overlay and desktop to make PC gaming easier
    venturebeat.com
    Discord announced a series of updates for the desktop app that help make PC gaming easier and work better for players.
    0 Comments · 0 Shares · 26 Views
More stories