Obsessed with covering transformative technology.
Recent posts
-
VENTUREBEAT.COM
Nvidia launches fully open source transcription AI model Parakeet-TDT-0.6B-V2 on Hugging Face

Nvidia has become one of the most valuable companies in the world in recent years thanks to the stock market noticing how much demand there is for graphics processing units (GPUs), the powerful chips Nvidia makes that are used to render graphics in video games but also, increasingly, to train AI large language and diffusion models. But Nvidia does far more than just make hardware and the software to run it. As the generative AI era wears on, the Santa Clara-based company has been steadily releasing more and more of its own AI models, mostly open source and free for researchers and developers to download, modify and use commercially. The latest among them is Parakeet-TDT-0.6B-v2, an automatic speech recognition (ASR) model that can, in the words of Hugging Face’s Vaibhav “VB” Srivastav, “transcribe 60 minutes of audio in 1 second [mind blown emoji].”

This is the new generation of the Parakeet model Nvidia first unveiled in January 2024 and updated that April, but version two is powerful enough that it currently tops the Hugging Face Open ASR Leaderboard with an average word error rate (the rate at which the model transcribes a spoken word incorrectly) of just 6.05%. To put that in perspective, it approaches proprietary transcription models such as OpenAI’s GPT-4o-transcribe (with a WER of 2.46% in English) and ElevenLabs Scribe (3.3%). And it offers all this while remaining freely available under a commercially permissive Creative Commons CC-BY-4.0 license, making it an attractive proposition for commercial enterprises and indie developers looking to build speech recognition and transcription services into their paid applications.

Performance and benchmark standing
The model has 600 million parameters and combines the FastConformer encoder and TDT decoder architectures. It can transcribe an hour of audio in just one second, provided it is running on Nvidia’s GPU-accelerated hardware. The performance benchmark is measured at an RTFx (real-time factor) of 3386.02 with a batch size of 128, placing it at the top of the current ASR benchmarks maintained by Hugging Face.

Use cases and availability
Released globally on May 1, 2025, Parakeet-TDT-0.6B-v2 is aimed at developers, researchers, and industry teams building applications such as transcription services, voice assistants, subtitle generators, and conversational AI platforms. The model supports punctuation, capitalization, and detailed word-level timestamping, offering a full transcription package for a wide range of speech-to-text needs.

Access and deployment
Developers can deploy the model using Nvidia’s NeMo toolkit. The setup process is compatible with Python and PyTorch, and the model can be used directly or fine-tuned for domain-specific tasks. The open-source license (CC-BY-4.0) also allows commercial use, making it appealing to startups and enterprises alike.

Training data and model development
Parakeet-TDT-0.6B-v2 was trained on a diverse, large-scale corpus called the Granary dataset. This includes around 120,000 hours of English audio, composed of 10,000 hours of high-quality human-transcribed data and 110,000 hours of pseudo-labeled speech.
Sources range from well-known datasets like LibriSpeech and Mozilla Common Voice to YouTube-Commons and Librilight. Nvidia plans to make the Granary dataset publicly available following its presentation at Interspeech 2025.

Evaluation and robustness
The model was evaluated across multiple English-language ASR benchmarks, including AMI, Earnings22, GigaSpeech, and SPGISpeech, and showed strong generalization performance. It remains robust under varied noise conditions and performs well even with telephony-style audio formats, with only modest degradation at lower signal-to-noise ratios.

Hardware compatibility and efficiency
Parakeet-TDT-0.6B-v2 is optimized for Nvidia GPU environments, supporting hardware such as the A100, H100, T4, and V100 boards. While high-end GPUs maximize performance, the model can still be loaded on systems with as little as 2GB of RAM, allowing for broader deployment scenarios.

Ethical considerations and responsible use
Nvidia notes that the model was developed without the use of personal data and adheres to its responsible AI framework. Although no specific measures were taken to mitigate demographic bias, the model passed internal quality standards and includes detailed documentation on its training process, dataset provenance, and privacy compliance. The release drew attention from the machine learning and open-source communities, especially after being publicly highlighted on social media. Commentators noted the model’s ability to outperform commercial ASR alternatives while remaining fully open source and commercially usable. Developers interested in trying the model can access it via Hugging Face or through Nvidia’s NeMo toolkit. Installation instructions, demo scripts, and integration guidance are readily available to facilitate experimentation and deployment.
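As a quick, hedged illustration of the NeMo-based deployment path described above, the sketch below loads the open checkpoint from Hugging Face and transcribes a local file; the file name is a placeholder, and the environment details (the NeMo ASR package and a CUDA-capable Nvidia GPU) are assumptions rather than details from the article.

```python
# Minimal sketch: transcribing a local audio file with Parakeet-TDT-0.6B-v2 via NeMo.
# Assumes `pip install -U "nemo_toolkit[asr]"` and a CUDA-capable Nvidia GPU;
# "meeting.wav" is a placeholder file name, not a file referenced by the article.
import nemo.collections.asr as nemo_asr

# Pull the open checkpoint from Hugging Face and load it as a NeMo ASR model.
asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/parakeet-tdt-0.6b-v2"
)

# Transcribe one or more 16 kHz mono WAV files; output includes punctuation and
# capitalization, and word-level timestamps can also be requested.
outputs = asr_model.transcribe(["meeting.wav"])
first = outputs[0]
# Depending on the NeMo version, entries are plain strings or hypothesis objects.
print(first.text if hasattr(first, "text") else first)
```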
-
VENTUREBEAT.COM
New York Times Games launches leaderboard to celebrate daily wins

New York Times Games has launched its multi-game leaderboard, letting puzzle game solvers celebrate their daily wins with friends. Whether you’re a first-time puzzler or a seasoned solver, you can now showcase your daily triumphs and fuel your friendly rivalries across Wordle, Connections, Spelling Bee and the Mini Crossword in a brand-new leaderboard. With global solvers already frequently sharing scores with friends and family, the multi-game leaderboard will provide an even easier way to connect over the delight of puzzling, the company said. Registered users and subscribers can add fellow solvers to their leaderboard by downloading the New York Times Games app from the app stores on iOS and Android devices. The rollout across devices will take place over the next few weeks.

“We know our solvers love sharing their scores with fellow puzzle enthusiasts — whether in group chats or on social media — so we set out to enhance that experience by creating new ways for them to connect, celebrate and play together,” said Jonathan Knight, head of Games at The New York Times, in a statement. “There’s no better way to celebrate our community of curious solvers than to unite them with this easy way to follow each other’s daily puzzle scores — while bringing some of our celebrity fans in on the fun!”

It’s no secret that New York Times Games has many famous fans, from professional basketball players creating their own gameplay tournaments to actors creating their own crosswords. For the first time, solvers are invited to add a group of New York Times Games’ most dedicated celebrity puzzlers to their leaderboard to share in the joy of solving and see how they stack up. This includes actor Chris Perfetti, who starts each morning with a different game; jazz singer Laufey and actress Lola Tung, who stay connected through puzzling coast-to-coast; plus reality TV friends Luann de Lesseps and Sonja Morgan, who love to keep up with each other by comparing their daily scores. And don’t be surprised if you see more celebrities in leaderboards soon. Have a favorite celebrity who you would love to see solve? Send their name to gamesmarketing@nytimes.com.

“With my schedule, I’m always on the go. I love New York Times Games, like Wordle and the Mini Crossword, to center myself and stay connected with my friends no matter where in the world we are, like Lola Tung,” said Laufey, in a statement. “Whether we’re comparing scores or sharing hints, it’s become somewhat of a tradition and gives us daily topics to yap about. So when I heard New York Times Games was adding a leaderboard to its app, I loved the idea of bringing even more friends (and my fans!) into the mix. It’s such a fun way to stay close no matter the time zone.”

Beginning May 8 at 10 a.m. ET, celebrity puzzlers can be added to leaderboards on a first-come, first-served basis. Keep an eye on this page and their Instagram Stories for direct links; only the first 100 solvers to add each celebrity starting at this time will be able to participate. (Terms and conditions apply.) Celebrities’ scores will be visible in the New York Times Games app until May 11, 2025. Since the launch of the Crossword in 1942, New York Times Games has captivated solvers by providing engaging word and logic games.
In 2014, the Mini Crossword was introduced, followed by Spelling Bee, Letter Boxed, Tiles, Wordle, Connections and Strands, offering puzzles for all skill levels that everyone can enjoy playing every day.
-
VENTUREBEAT.COM
Game veteran Warren Spector raises awareness about working with bipolar condition | interview
Warren Spector shared something more human and acknowledged for the first time to his peers that he is bipolar.
-
VENTUREBEAT.COM
My.Games’ Castle Duels mobile card strategy game hits $1M in monthly revenue

My.Games said its Castle Duels strategy game has hit $1 million in monthly revenue in both March and April. The European game developer and publisher said the revenue reflects steady growth and continued player interest. Since its launch in 2024, Castle Duels has surpassed 2.6 million installs and reached over $6 million in lifetime revenue.

In Castle Duels, players balance offensive and defensive tactics, battling to destroy enemy castles while defending their own. Featuring a dynamic card-based system, players summon units like mages, minotaurs, ghosts, and succubi for fast-paced, real-time combat. With over 30 unique units, from common to legendary, players can build personalized teams and strategize to outsmart their opponents.

“We’re thrilled to see so many players enjoying Castle Duels and helping it grow so quickly,” said Aleksei Moshkovich, lead project manager for Castle Duels at My.Games, in a statement. “We want to extend a heartfelt thank you to our amazing community for their support. This is just the beginning — we have plenty of exciting updates planned to continue delivering a fun and dynamic experience for our players.”

To celebrate this achievement, Castle Duels is rolling out a new update featuring Glyphs and introducing a powerful new Legendary unit, the Chef. Leading the update are Glyphs, a brand-new feature that adds a deeper layer of strategy to the game. These powerful enhancements are designed for Legendary units and come in Common, Rare, Epic, and Legendary tiers. Glyphs boost stats, unlock passive abilities, and enhance unit powers in unique ways. Equip them by unlocking a socket with Minerals and Gold, spin for a random Glyph, and customize your unit’s abilities to dominate the battlefield.

Also joining the fray is the Chef, the latest Legendary unit to heat up the action. A fast and unpredictable attacker, the Chef specializes in dodging blows and delivering lightning-quick dashes that deal massive damage. His special seasoning debuffs enemies, while his lightning dash gives him a chance to strike twice. Available for a limited time in the Starseeking event, the Chef brings deadly flavor to any lineup. The game is available on iOS and Android.
-
VENTUREBEAT.COM
The great cognitive migration: How AI is reshaping human purpose, work and meaning
Humans need to embrace domains where AI still falters, and where human creativity, ethics and emotion remain indispensable.
-
VENTUREBEAT.COM
Not everything needs an LLM: A framework for evaluating when AI makes sense
The answer to 'What customer needs requires an AI solution?' isn’t always 'Yes.' LLMs are still expensive and not always accurate.
-
VENTUREBEAT.COM
Carry1st brings Call of Duty: Mobile tournament to Africa as championship qualifier
Carry1st announced the return of its flagship Call of Duty: Mobile tournament as part of the Carry1st Africa Cup 2025.
-
VENTUREBEAT.COM
Roblox breaks ground on data center in Brazil for early 2026
At Gamescom Latam, Roblox announced it has broken ground on a new data center in Brazil, slated to go live in early 2026.
-
VENTUREBEAT.COM
Run Games surprise-launches Football Heroes League
Run Games both announced and launched its new game, Football Heroes League, in early access on PC during Gamescom Latam.
-
VENTUREBEAT.COM
RSAC 2025: Why the AI agent era means more demand for CISOs
RSAC 2025 made one thing clear: AI agents are entering security workflows, but boards want proof they work.
-
VENTUREBEAT.COM
OpenAI overrode concerns of expert testers to release sycophantic GPT-4o
Once again, it shows the importance of incorporating more domains beyond traditional math and computer science into AI development.
-
VENTUREBEAT.COM
Apple’s court loss to Epic Games is a stunning turnaround | The DeanBeat
It took more than four years, but Epic Games finally prevailed this week in its antitrust case against Apple, the world’s most valuable company with a market value of $3.2 trillion. And now it’s possible that the floodgates are open to bring competition and financial gains for mobile game companies on iOS. Epic CEO Tim Sweeney and his l…
-
VENTUREBEAT.COM
NCSoft makes strategic equity investment in FPS studio Emptyvessel
South Korea's NCSoft completed a strategic equity investment in the Austin, Texas-based development studio Emptyvessel.
-
VENTUREBEAT.COM
Lamborghini joins DreamHack Dallas as a main partner
Lamborghini announced it will be present at DreamHack as a partner, part of its efforts to reach a broader young audience.
-
VENTUREBEAT.COM
Monopoly Go passes $5B in gross bookings at a speed unseen in mobile gaming
Scopely announced today that Monopoly Go! has surpassed $5B in gross bookings, the fastest title in mobile game history to do so.
-
VENTUREBEAT.COM
Hidden costs in AI deployment: Why Claude models may be 20-30% more expensive than GPT in enterprise settings
It is a well-known fact that different model families can use different tokenizers. However, there has been limited analysis on how the process of “tokenization” itself varies across these tokenizers. Do all tokenizers result in the same number of tokens for a given input text? If not, how different are the generated tokens? How significant are the…
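As a rough illustration of the question raised above, the short sketch below counts tokens for the same text under two openly available Hugging Face tokenizers; these are convenient stand-ins, not the GPT or Claude tokenizers the article actually compares.

```python
# Sketch: the same input text can tokenize to different lengths under different tokenizers.
# Uses two publicly available Hugging Face tokenizers purely as stand-ins for illustration.
from transformers import AutoTokenizer

text = "Enterprise invoices often contain IDs like INV-2024-00017 and amounts like $1,234.56."

for name in ["gpt2", "bert-base-uncased"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    print(f"{name}: {len(token_ids)} tokens")
```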
-
VENTUREBEAT.COM
Microsoft launches Phi-4-Reasoning-Plus, a small, powerful, open weights reasoning model
The release demonstrates that with carefully curated data and training techniques, small models can deliver strong reasoning performance.
-
VENTUREBEAT.COM
Gamefam launches Karate Kid Training Simulator on Roblox
Gamefam is launching Karate Kid Training Simulator, a title that will immerse players in the world of the upcoming film Karate Kid: Legends, on Roblox.
-
VENTUREBEAT.COM
Astronomer’s $93M raise underscores a new reality: Orchestration is king in AI infrastructure
Astronomer secures $93 million in Series D funding to solve the AI implementation gap through data orchestration, helping enterprises streamline complex workflows and operationalize AI initiatives at scale.
-
VENTUREBEAT.COM
Salesforce takes aim at ‘jagged intelligence’ in push for more reliable AI
Salesforce unveils AI research tackling "jagged intelligence," introducing new benchmarks, models, and guardrails to make enterprise AI agents more intelligent, trusted, and consistently reliable for business use.
-
VENTUREBEAT.COM
Qwen swings for a double with 2.5-Omni-3B model that runs on consumer PCs, laptops

Chinese e-commerce and cloud giant Alibaba isn’t taking the pressure off other AI model providers in the U.S. and abroad. Just days after releasing its new, state-of-the-art open source Qwen3 large reasoning model family, Alibaba’s Qwen team today released Qwen2.5-Omni-3B, a lightweight version of its preceding multimodal model architecture designed to run on consumer-grade hardware without sacrificing broad functionality across text, audio, image, and video inputs.

Qwen2.5-Omni-3B is a scaled-down, 3-billion-parameter variant of the team’s flagship 7-billion-parameter (7B) model. (Recall that parameters refer to the internal settings governing the model’s behavior and functionality, with more typically denoting more powerful and complex models.) While smaller in size, the 3B version retains over 90% of the larger model’s multimodal performance and delivers real-time generation in both text and natural-sounding speech.

A major improvement comes in GPU memory efficiency. The team reports that Qwen2.5-Omni-3B reduces VRAM usage by over 50% when processing long-context inputs of 25,000 tokens. With optimized settings, memory consumption drops from 60.2 GB (7B model) to just 28.2 GB (3B model), enabling deployment on 24GB GPUs commonly found in high-end desktops and laptops, instead of the larger dedicated GPU clusters or workstations found in enterprises. According to the developers, it achieves this through architectural features such as the Thinker-Talker design and a custom position embedding method, TMRoPE, which aligns video and audio inputs for synchronized comprehension.

However, the licensing terms specify research use only, meaning enterprises cannot use the model to build commercial products unless they first obtain a separate license from Alibaba’s Qwen team. The announcement follows increasing demand for more deployable multimodal models and is accompanied by performance benchmarks showing competitive results relative to larger models in the same series.

The model is now freely available for download. Developers can integrate it into their pipelines using Hugging Face Transformers, Docker containers, or Alibaba’s vLLM implementation. Optional optimizations such as FlashAttention 2 and BF16 precision are supported for enhanced speed and reduced memory consumption.

Benchmark performance shows strong results even approaching much larger parameter models
Despite its reduced size, Qwen2.5-Omni-3B performs competitively across key benchmarks:

Task                                          Qwen2.5-Omni-3B   Qwen2.5-Omni-7B
OmniBench (multimodal reasoning)              52.2              56.1
VideoBench (audio understanding)              68.8              74.1
MMMU (image reasoning)                        53.1              59.2
MVBench (video reasoning)                     68.7              70.3
Seed-tts-eval test-hard (speech generation)   92.1              93.5

The narrow performance gap in video and speech tasks highlights the efficiency of the 3B model’s design, particularly in areas where real-time interaction and output quality matter most.

Real-time speech, voice customization, and more
Qwen2.5-Omni-3B supports simultaneous input across modalities and can generate both text and audio responses in real time. The model includes voice customization features, allowing users to choose between two built-in voices, Chelsie (female) and Ethan (male), to suit different applications or audiences.
Users can configure whether to return audio or text-only responses, and memory usage can be further reduced by disabling audio generation when not needed. The Qwen team emphasizes the open-source nature of its work, providing toolkits, pretrained checkpoints, API access, and deployment guides to help developers get started quickly. The release also follows recent momentum for the Qwen2.5-Omni series, which has reached top rankings on Hugging Face’s trending model list. Junyang Lin from the Qwen team commented on the motivation behind the release on X, stating, “While a lot of users hope for smaller Omni model for deployment we then build this.”

What it means for enterprise technical decision-makers
For enterprise decision-makers responsible for AI development, orchestration, and infrastructure strategy, the release of Qwen2.5-Omni-3B may appear, at first glance, like a practical leap forward. A compact, multimodal model that performs competitively against its 7B sibling while running on 24GB consumer GPUs offers real promise in terms of operational feasibility. But as with any open-source technology, licensing matters — and in this case, the license draws a firm boundary between exploration and deployment.

The Qwen2.5-Omni-3B model is licensed for non-commercial use only under Alibaba Cloud’s Qwen Research License Agreement. That means organizations can evaluate the model, benchmark it, or fine-tune it for internal research purposes, but cannot deploy it in commercial settings, such as customer-facing applications or monetized services, without first securing a separate commercial license from Alibaba Cloud.

For professionals overseeing AI model lifecycles, whether deploying across customer environments, orchestrating at scale, or integrating multimodal tools into existing pipelines, this restriction introduces important considerations. It may shift Qwen2.5-Omni-3B’s role from a deployment-ready solution to a testbed for feasibility: a way to prototype or evaluate multimodal interactions before deciding whether to license commercially or pursue an alternative. Those in orchestration and ops roles may still find value in piloting the model for internal use cases, like refining pipelines, building tooling, or preparing benchmarks, so long as it remains within research bounds. Data engineers or security leaders might likewise explore the model for internal validation or QA tasks, but should tread carefully when considering its use with proprietary or customer data in production environments.

The real takeaway here may be about access and constraint: Qwen2.5-Omni-3B lowers the technical and hardware barrier to experimenting with multimodal AI, but its current license enforces a commercial boundary. In doing so, it offers enterprise teams a high-performance model for testing ideas, evaluating architectures, or informing make-vs-buy decisions, yet reserves production use for those willing to engage Alibaba for a licensing discussion. In this context, Qwen2.5-Omni-3B becomes less a plug-and-play deployment option and more a strategic evaluation tool: a way to get closer to multimodal AI with fewer resources, but not yet a turnkey solution for production.
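For teams evaluating the model within the research-only license discussed above, here is a minimal, hedged sketch of local text-only inference. It assumes the Qwen2.5-Omni support included in recent Hugging Face Transformers releases; the class names, the return_audio flag, and the voice-selection option are recalled from the model card and should be verified there before use.

```python
# Hedged sketch: text-in, text-out evaluation of Qwen2.5-Omni-3B under its research license.
# Assumes a recent Transformers build with Qwen2.5-Omni support and a 24GB-class GPU;
# class names and generate() arguments should be checked against the model card.
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor

model_id = "Qwen/Qwen2.5-Omni-3B"
processor = Qwen2_5OmniProcessor.from_pretrained(model_id)
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

conversation = [
    {"role": "user", "content": [{"type": "text", "text": "In one sentence, what is a multimodal model?"}]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
inputs = processor(text=prompt, return_tensors="pt").to(model.device)

# return_audio=False keeps this a text-only call, which the article notes also trims
# memory use; the model card documents a voice option (Chelsie or Ethan) for audio output.
output_ids = model.generate(**inputs, max_new_tokens=128, return_audio=False)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```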
-
VENTUREBEAT.COM
The ‘era of experience’ will unleash self-learning AI agents across the web — here’s how to prepare

David Silver and Richard Sutton, two renowned AI scientists, argue in a new paper that artificial intelligence is about to enter a new phase, the “Era of Experience.” This is where AI systems rely increasingly less on human-provided data and improve themselves by gathering data from and interacting with the world. While the paper is conceptual and forward-looking, it has direct implications for enterprises that aim to build with and for future AI agents and systems.

Both Silver and Sutton are seasoned scientists with a track record of making accurate predictions about the future of AI. The validity of their predictions can be seen directly in today’s most advanced AI systems. In 2019, Sutton, a pioneer in reinforcement learning, wrote the famous essay “The Bitter Lesson,” in which he argues that the greatest long-term progress in AI consistently arises from leveraging large-scale computation with general-purpose search and learning methods, rather than relying primarily on incorporating complex, human-derived domain knowledge. David Silver, a senior scientist at DeepMind, was a key contributor to AlphaGo, AlphaZero and AlphaStar, all important achievements in deep reinforcement learning. He was also the co-author of a 2021 paper that claimed reinforcement learning and a well-designed reward signal would be enough to create very advanced AI systems.

The most advanced large language models (LLMs) leverage those two concepts. The wave of new LLMs that have conquered the AI scene since GPT-3 has primarily relied on scaling compute and data to internalize vast amounts of knowledge. The most recent wave of reasoning models, such as DeepSeek-R1, has demonstrated that reinforcement learning and a simple reward signal are sufficient for learning complex reasoning skills.

What is the era of experience?
The “Era of Experience” builds on the same concepts that Sutton and Silver have been discussing in recent years, and adapts them to recent advances in AI. The authors argue that the “pace of progress driven solely by supervised learning from human data is demonstrably slowing, signalling the need for a new approach.” And that approach requires a new source of data, which must be generated in a way that continually improves as the agent becomes stronger. “This can be achieved by allowing agents to learn continually from their own experience, i.e., data that is generated by the agent interacting with its environment,” Sutton and Silver write. They argue that eventually, “experience will become the dominant medium of improvement and ultimately dwarf the scale of human data used in today’s systems.”

According to the authors, in addition to learning from their own experiential data, future AI systems will “break through the limitations of human-centric AI systems” across four dimensions:

Streams: Instead of working across disconnected episodes, AI agents will “have their own stream of experience that progresses, like humans, over a long time-scale.” This will allow agents to plan for long-term goals and adapt to new behavioral patterns over time. We can see glimmers of this in AI systems that have very long context windows and memory architectures that continuously update based on user interactions.
Actions and observations: Instead of focusing on human-privileged actions and observations, agents in the era of experience will act autonomously in the real world. Examples of this are agentic systems that can interact with external applications and resources through tools such as computer use and Model Context Protocol (MCP).

Rewards: Current reinforcement learning systems mostly rely on human-designed reward functions. In the future, AI agents should be able to design their own dynamic reward functions that adapt over time and match user preferences with real-world signals gathered from the agent’s actions and observations in the world. We’re seeing early versions of self-designing rewards with systems such as Nvidia’s DrEureka.

Planning and reasoning: Current reasoning models have been designed to imitate the human thought process. The authors argue that “More efficient mechanisms of thought surely exist, using non-human languages that may, for example, utilise symbolic, distributed, continuous, or differentiable computations.” AI agents should engage with the world, observe and use data to validate and update their reasoning process and develop a world model.

The idea of AI agents that adapt themselves to their environment through reinforcement learning is not new. But previously, these agents were limited to very constrained environments such as board games. Today, agents that can interact with complex environments (e.g., AI computer use) and advances in reinforcement learning will overcome these limitations, bringing about the transition to the era of experience.

What does it mean for the enterprise?
Buried in Sutton and Silver’s paper is an observation that will have important implications for real-world applications: “The agent may use ‘human-friendly’ actions and observations such as user interfaces, that naturally facilitate communication and collaboration with the user. The agent may also take ‘machine-friendly’ actions that execute code and call APIs, allowing the agent to act autonomously in service of its goals.”

The era of experience means that developers will have to build their applications not only for humans but also with AI agents in mind. Machine-friendly actions require building secure and accessible APIs that can easily be accessed directly or through interfaces such as MCP. It also means creating agents that can be made discoverable through protocols such as Google’s Agent2Agent. You will also need to design your APIs and agentic interfaces to provide access to both actions and observations, as sketched in the example below. This will enable agents to gradually reason about and learn from their interactions with your applications.

If the vision that Sutton and Silver present becomes reality, there will soon be billions of agents roaming around the web (and soon in the physical world) to accomplish tasks. Their behaviors and needs will be very different from those of human users and developers, and having an agent-friendly way to interact with your application will improve your ability to leverage future AI systems (and also prevent the harms they can cause). “By building upon the foundations of RL and adapting its core principles to the challenges of this new era, we can unlock the full potential of autonomous learning and pave the way to truly superhuman intelligence,” Sutton and Silver write. DeepMind declined to provide additional comments for the story.
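To make the actions-and-observations point concrete, here is a minimal, hypothetical sketch of an agent-facing tool server built with the FastMCP helper from the official MCP Python SDK; the inventory domain, tool names, and fields are invented for illustration and are not drawn from the paper or the article.

```python
# Hypothetical sketch: exposing one "action" and one "observation" to agents over MCP.
# Assumes the official MCP Python SDK (`pip install mcp`); everything about the
# inventory domain below is invented purely for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory-demo")

# In-memory stand-in for a real backend system.
_STOCK = {"sku-123": 42}

@mcp.tool()
def reserve_stock(sku: str, quantity: int) -> dict:
    """Action: reserve units of a SKU and report the resulting state."""
    available = _STOCK.get(sku, 0)
    reserved = min(quantity, available)
    _STOCK[sku] = available - reserved
    # Returning the post-action state gives the agent an observation to learn from.
    return {"sku": sku, "reserved": reserved, "remaining": _STOCK[sku]}

@mcp.tool()
def get_stock(sku: str) -> dict:
    """Observation: read current stock without changing anything."""
    return {"sku": sku, "available": _STOCK.get(sku, 0)}

if __name__ == "__main__":
    mcp.run()  # serve the tools over stdio so an MCP-capable agent can call them
```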
More articles