TAI #143: New Scaling Laws Incoming? Ilya's SSI Raises at $30bn, Manus Takes AI Agents Mainstream
Author(s): Towards AI Editorial Team

Originally published on Towards AI.

What happened this week in AI by Louie

As Ilya Sutskever's Safe Superintelligence (SSI) secures another $2bn round at a hefty $30bn valuation, speculation has grown around what he is working on and whether he will discover yet more groundbreaking scaling laws for AI. While another scaling breakthrough would be exciting, an alternative, pragmatic pathway to progressing AI capabilities continues to emerge: building advanced agents on top of existing foundation models. China-based startup Monica is proving precisely this point with Manus, its invite-only multi-agent product, which has rapidly captured attention despite the company not developing its own base LLM. Instead, Manus stitches together Claude 3.5 Sonnet and custom fine-tuned open-source Qwen models, paired with specialized tools and sandboxes, to autonomously tackle complex real-world tasks.

Manus's architecture neatly divides into two highly specialized layers: the planner, powered by fine-tuned Qwen models optimized for strategic reasoning and task decomposition, and the executor, driven by Claude 3.5 Sonnet alongside a diverse set of 29 dedicated sub-agents. The system demonstrates remarkable capabilities by seamlessly integrating code execution, web browsing, multi-file code management, and interactive frontend generation, features reminiscent of recent advanced tools like Cursor, OpenAI's Operator and Deep Research agents, and Claude's Artifacts UI. Manus's success comes from coherently assembling these previously separate functionalities into a unified agent framework, unlocking greater autonomy and practical utility. Its GAIA benchmark performance reflects this clearly: it scores an impressive 86.5% on simpler Level 1 questions, easily surpassing OpenAI Deep Research's result (74.3%). Even on more complex Level 3 multi-step tasks, Manus leads notably, achieving 57.7% versus OpenAI Deep Research's 47.6%.

Yet, despite Monica's innovation using existing models, even more could be unlocked with improvements to base model intelligence. Ilya Sutskever, previously at Google and OpenAI, has been intimately involved in many of the major deep learning and LLM breakthroughs of the past 10–15 years. SSI's 5x valuation increase to $30bn in less than six months has fueled speculation about what he has been working on (in heavy secrecy, reportedly requiring job candidates to leave phones in a Faraday cage before entering its offices). Ilya has consistently been central to major breakthroughs in deep learning scaling laws and training objectives for LLMs, making it plausible he's discovered yet another one. Indeed, clues from recent interviews suggest precisely this. Ilya himself mentioned in September that he had found "a different mountain to climb," hinting at a new scaling law. "Everyone just says scaling hypothesis," he noted pointedly. "But scaling what?"

Ilya first demonstrated GPU-driven neural network scaling with AlexNet in 2012 alongside Geoffrey Hinton and Alex Krizhevsky, paving the way for dramatic increases in model depth, performance, and computational intensity. While he didn't invent the next-token prediction objective (a much earlier technique) or the transformer architecture introduced in 2017, he laid essential groundwork for transformers with sequence-to-sequence (seq2seq) models.
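For readers who want the mechanics, here is a minimal sketch of that next-token prediction objective, assuming PyTorch; the `model` callable (token ids in, logits out) is hypothetical:

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids: torch.Tensor) -> torch.Tensor:
    """Standard next-token prediction: shift the sequence by one position
    and minimize cross-entropy between predicted logits and the actual
    next token at every position.

    `model` is any callable mapping (batch, seq) token ids to
    (batch, seq, vocab) logits; it is a placeholder here.
    """
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)                    # (batch, seq-1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten all positions
        targets.reshape(-1),                  # flatten matching targets
    )
```

For a decade, "scaling" mostly meant driving this one loss down with more data and more parameters.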
He also crucially pushed OpenAI's strategic decision to massively scale next-token prediction using GPUs and transformers, pushing data scaling bottlenecks (and corresponding useful compute scaling) to the scale of the entire internet. Most recently, Ilya's foundational contributions to test-time compute reportedly laid the groundwork for the development of Q* and o1 by Jakub Pachocki and Szymon Sidor. This approach led to a new training objective, predicting full solutions to verifiable problems, and introduced both a new training scaling regime (reinforcement learning with verifiable rewards, or RLVR) and new inference-time scaling laws.

If Ilya is indeed onto yet another new scaling mechanism (and SSI's rapid valuation jump suggests investors believe he is), this would mark quite a breakout from the many years spent focused only on the next-token prediction objective and on scaling just pre-training data and parameters. Scaling both the new RLVR training method and corresponding inference-time tokens alone might well be sufficient for approaching AGI-like capabilities across many standalone human tasks (particularly together with agent pipelines and LLM developers using reinforcement fine-tuning to customize models for different tasks). New training objectives, on the other hand, could accelerate this and also unlock entirely new types of intelligence and categories of AI capability.

Why should you care?

The convergence of new scaling paradigms and advanced agent architectures suggests an approaching tipping point. Companies like Monica with Manus demonstrate how effectively existing models can be recombined to produce substantial leaps in real-world task performance. At the same time, breakthroughs from Ilya and SSI, or indeed any of the AI labs or even individual researchers, may fundamentally alter what we even think of as scalable AI, setting the stage for a far broader spectrum of intelligence capabilities. For developers and entrepreneurs alike, this dual innovation track (practical agent integration versus groundbreaking foundational shifts) offers compelling paths forward. While waiting for the next great leap, significant competitive advantages can still be gained today by intelligently leveraging and refining existing tools into specialized agents. But make no mistake: if Ilya is indeed pioneering another new scaling law, AI's landscape may soon be reshaped once again.

This issue is brought to you thanks to NVIDIA GTC:

Join Us at NVIDIA GTC, the AI Event of the Year!

NVIDIA GTC is back, and it's shaping up to be one of the biggest AI events of the year! Running from March 17 to 21 in San Jose, CA, GTC will bring together developers, researchers, and business leaders to explore cutting-edge advancements in AI, accelerated computing, and data science.

There's a packed agenda, including:

- Keynote by NVIDIA CEO Jensen Huang, covering AI agents, robotics, and the future of accelerated computing
- The Rise of Humanoid Robots: exploring how AI is pushing robotics forward
- AI & Computing Frontiers with Yann LeCun and Bill Dally: a deep dive into where AI is headed
- Industrial AI & Digitalization: how AI is transforming industries in the physical world
- Hands-on Workshops & Training Labs: practical sessions on AI, GPU programming, and more

Our CTO, Louis-François Bouchard, will be attending, so if you're around, let's connect!

March 17–21, San Jose, CA & Online
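One more technical aside before the news: the RLVR recipe mentioned in the scaling discussion above boils down to sampling candidate solutions and rewarding only those a programmatic checker can verify. A toy sketch, assuming integer-answer math problems; `model_sample` and the answer format are entirely hypothetical:

```python
import re

def verifiable_reward(problem: dict, completion: str) -> float:
    # Reward 1.0 only if the completion's final answer matches the
    # known-correct integer answer; no learned reward model involved.
    match = re.search(r"ANSWER:\s*(-?\d+)", completion)
    return 1.0 if match and int(match.group(1)) == problem["answer"] else 0.0

def collect_verified(model_sample, problems, k=8):
    # Sample k candidate solutions per problem and keep only the verified
    # ones as positive training signal for the next RL update.
    verified = []
    for p in problems:
        for _ in range(k):
            completion = model_sample(p["prompt"])
            if verifiable_reward(p, completion) > 0:
                verified.append((p["prompt"], completion))
    return verified
```

The scaling levers in this simplified picture are the number of samples drawn per problem during training and, at inference time, how many solution tokens the model is allowed to spend, which is where the new inference-time scaling laws come in.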
Hottest News

1. Alibaba Released Its QwQ-32B Model, Based on High-Scale Reinforcement Learning Techniques

Alibaba's Qwen team has introduced QwQ-32B, a 32-billion-parameter AI model designed for advanced reasoning, coding, and math problem-solving. Thanks to reinforcement learning, it performs on par with larger models like DeepSeek R1. QwQ-32B is open-source under Apache 2.0 and available on Hugging Face and ModelScope.

2. AI Pioneers Andrew Barto and Richard Sutton Win 2025 Turing Award for Groundbreaking Contributions to Reinforcement Learning

Andrew Barto and Richard Sutton, pioneers of reinforcement learning, have won the 2024 Turing Award (announced in March 2025) for their groundbreaking contributions to AI. Their work laid the foundation for modern AI systems like chatbots, autonomous vehicles, and personalized recommendations. Their work also bridged AI and neuroscience, revealing insights into dopamine's role in human and machine learning.

3. Microsoft Reportedly Ramps Up AI Efforts To Compete With OpenAI

Microsoft is developing its own AI reasoning models, called MAI, to reduce reliance on OpenAI and enhance its AI offerings. It is reportedly training much larger models than its more famous synthetic-data-focused Phi series. These new models have been tested as potential replacements for OpenAI's technology in Microsoft's 365 Copilot system. Additionally, Microsoft plans to unveil future developments for its Copilot AI companion at a special event on April 4th, marking the company's 50th anniversary.

4. China's Second DeepSeek Moment? Meet Manus, the First General AI Agent

Manus, developed by Chinese startup Monica, is an autonomous AI agent designed to handle complex tasks independently. Since its beta launch on March 6, 2025, it has generated significant buzz, with some comparing its impact to DeepSeek. Available by invitation only, it has sparked excitement among users eager to test its capabilities.

5. Mistral AI Introduced Mistral OCR

Mistral launched Mistral OCR, a multimodal OCR API that converts PDFs into AI-ready Markdown files, facilitating easier AI model ingestion. It outperforms competitors on complex and non-English documents and integrates into RAG systems. Mistral OCR is available on Mistral's API platform and through cloud partners, with on-premise deployment offered for sensitive data handling.

6. Google Search's New AI Mode Lets Users Ask Complex, Multi-Part Questions

Google is enhancing its search experience by introducing expanded AI-generated overviews and a new AI Mode. The AI overviews will now cover a broader range of topics and be accessible to more users, including those not logged into Google. The experimental AI Mode, currently available to Google One AI Premium subscribers, offers a search-centric AI chatbot experience, providing generated answers based on Google's search index.

7. Microsoft Dragon Copilot Provides the Healthcare Industry's First Unified Voice AI Assistant

Microsoft launched Dragon Copilot, a unified AI voice assistant for healthcare. Designed to alleviate clinician burnout and streamline documentation, Dragon Copilot aims to improve efficiency and patient experiences while supporting healthcare workers across various settings with its advanced speech and task automation capabilities. It is rolling out in select regions.

Five 5-minute reads/videos to keep you learning

1. Starter Guide for Running Large Language Models (LLMs)

This article is a practical guide to running LLMs, covering key considerations like balancing model size and dataset requirements using scaling laws such as Chinchilla (a rough sketch of that rule of thumb is below).
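For a feel of the Chinchilla heuristic, here is a minimal sketch, assuming the common C ≈ 6·N·D approximation for training FLOPs and the roughly 20-tokens-per-parameter compute-optimal rule:

```python
def chinchilla_optimal(flop_budget: float) -> tuple[float, float]:
    # Compute-optimal sizing under C ~= 6 * N * D and D ~= 20 * N,
    # where N is parameter count and D is training tokens.
    tokens_per_param = 20
    n_params = (flop_budget / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's own budget (~5.9e23 FLOPs) recovers ~70B params / ~1.4T tokens.
params, tokens = chinchilla_optimal(5.9e23)
print(f"~{params / 1e9:.0f}B parameters, ~{tokens / 1e12:.1f}T tokens")
```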
The guide also highlights the importance of proper dataset preprocessing, such as tokenization and cleaning, to improve efficiency.

2. What Changed in the Transformer Architecture

This article explores key improvements in the Transformer architecture since 2017, focusing on efficiency and scalability. It covers the shift from sinusoidal positional encodings to Rotary Positional Embeddings (RoPE) for better handling of long sequences, the adoption of pre-layer normalization for more stable training, and the introduction of Grouped-Query Attention (GQA) to reduce computational costs.

3. AI's Butterfly Effect: Early Decisions Matter More Than You Think

Based on insights from Polya's Urn Model, this article shows how an initial random bias can have lasting effects on an AI system's learning trajectory. The insights derived from Polya's Urn Model deepen our understanding of the interplay between chance and choice and encourage a more thoughtful approach to managing data biases and long-term trends in complex systems.

4. The Rise of Diffusion LLMs

This article explores diffusion-based LLMs, a novel approach to text generation that refines noisy data into structured outputs. It discusses how these models differ from traditional autoregressive LLMs, their potential benefits in reducing biases and improving efficiency, and their challenges in real-world applications.

5. AI Is Killing Some Companies, yet Others Are Thriving: Let's Look at the Data

This article explores how AI-powered search and chatbots are reshaping the digital landscape, hitting some companies hard while leaving others untouched. It looks at why platforms like WebMD, G2, and Chegg are losing traffic as AI delivers instant answers, while sites like Reddit and Wikipedia remain strong. It also argues that user-generated content and community-driven platforms may have a built-in advantage in an AI-dominated world.

6. DeepSeek-V3/R1 Inference System Overview

The article provides an overview of DeepSeek's inference system for their V3 and R1 models, focusing on optimizing throughput and reducing latency. It also discusses challenges such as increased system complexity due to cross-node communication and the need for effective load balancing across Data Parallelism (DP) instances, along with strategies to address them.

Repositories & Tools

1. MetaGPT is an AI framework that acts like a software team, breaking down a simple request into detailed project plans, code, and documentation.

2. Light-R1 introduces Light-R1-32B, a 32-billion-parameter language model optimized for mathematical problem-solving.

Top Papers of The Week

1. START: Self-taught Reasoner with Tools

The paper introduces START, a self-taught reasoning LLM that integrates external tools. This integration allows START to perform complex computations, self-checking, and debugging, addressing limitations like hallucinations found in traditional reasoning models. It uses Hint-infer (prompting tool use) and Hint-RFT (fine-tuning with filtered reasoning steps) to enhance accuracy. START, built on QwQ-32B, outperforms its base model and rivals top-tier models on math, science, and coding benchmarks.

2. Predictive Data Selection: The Data That Predicts Is the Data That Teaches

Researchers have introduced Predictive Data Selection (PreSelect), a method enhancing language model pretraining by using fastText-based scoring for efficient data selection. Models trained on 30 billion tokens selected with PreSelect outperform those trained on 300 billion vanilla tokens, reducing compute needs tenfold (a loose sketch of the general fastText-filtering pattern follows).
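For intuition only, this is roughly what fastText-based corpus filtering looks like in practice; it is not PreSelect's actual classifier, and the model path, labels, and threshold here are hypothetical:

```python
import fasttext  # pip install fasttext

# Hypothetical binary quality classifier trained in fastText's supervised
# format, with labels __label__keep and __label__drop.
scorer = fasttext.load_model("quality_scorer.bin")

def keep_document(text: str, threshold: float = 0.5) -> bool:
    # fastText's predict() expects single-line input.
    labels, probs = scorer.predict(text.replace("\n", " "))
    return labels[0] == "__label__keep" and probs[0] >= threshold

corpus = ["A clean, well-structured technical article...", "buy now!!! spam"]
filtered = [doc for doc in corpus if keep_document(doc)]
```

The appeal of this pattern is speed: a fastText scorer is cheap enough to run over hundreds of billions of tokens, which is what makes selecting a small, high-quality fraction of a web-scale pool practical.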
PreSelect also surpasses other methods, like DCLM and FineWeb-Edu, at the 3-billion-parameter scale.

3. Unified Reward Model for Multimodal Understanding and Generation

UnifiedReward, a novel model for multimodal understanding and generation assessment, improves image and video preference alignment. By training on a large-scale human preference dataset, UnifiedReward facilitates both pairwise ranking and pointwise scoring.

4. Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers

Babel introduces an open multilingual large language model that covers the top 25 languages and supports over 90% of the global population. Babel employs a layer extension technique and offers two variants: Babel-9B for efficient use and Babel-83B, which sets new standards. Both variants demonstrate superior multilingual task performance compared to similar open LLMs.

5. Large-Scale Data Selection for Instruction Tuning

The paper examines large-scale data selection for instruction tuning, testing methods on datasets of up to 2.5M samples. It finds that many selection techniques underperform random selection at scale, while a simple representation-based method (RDS+) is both effective and efficient.

Quick Links

1. Google debuts a new Gemini-based text embedding model. Google claims that Gemini Embedding surpasses the performance of its previous embedding model, text-embedding-004, and achieves competitive performance on popular embedding benchmarks. Compared to the previous model, Gemini Embedding can accept larger chunks of text and code simultaneously and supports over 100 languages.

2. Cohere released a multimodal open AI model called Aya Vision. It can perform tasks such as writing image captions, answering questions about photos, translating text, and generating summaries in 23 major languages. Cohere is also making Aya Vision available for free through WhatsApp.

3. Anthropic has launched an upgraded Anthropic Console that lets everyone in your company collaborate on AI. The updated platform also introduces extended thinking controls for Claude 3.7 Sonnet, allowing developers to specify when the AI should use deeper reasoning while setting budget limits to control costs.

Who's Hiring in AI

Data Scientist (Python) @Motion Recruitment Partners (Florida, USA)
ML Engineer @Numerator (Remote/India)
Software Engineer, AI Decisioning @Hightouch (Remote/North America)
Gen AI Consultant @Capco (Pune, India)
Natural Language Processing (NLP) Intern @IMO Health (Hybrid/Texas, USA)
Junior Data Scientist Intern @INTEL (Hybrid/Singapore)
Software Engineer, GenAI Enablement @Principal Financial Group (Multiple US Locations)

Interested in sharing a job opportunity here? Contact [emailprotected].

Think a friend would enjoy this too? Share the newsletter and let them join the conversation.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI