Author(s): Towards AI Editorial Team. Originally published on Towards AI.

What happened this week in AI by Louie

This week, the LLM race was blown wide open with DeepSeek's open-source release of R1. Performance is close to o1 on most benchmarks. Built on top of DeepSeek's V3 model, R1's API output token prices are 30x lower than o1's. It's available under the MIT license, supporting commercial use and modifications. DeepSeek also disclosed many of its methods and experiments in its paper, in stark contrast to the secrecy surrounding reasoning techniques at AI labs in the U.S.

R1 wasn't the only huge LLM release from China this week. Two new LLM competitors hit the ground running with very strong models. MiniMax-01, a 456B-parameter Mixture-of-Experts model, challenges Google's Gemini models for SoTA long-context capabilities, offering a 4-million-token input context thanks to its new Lightning Attention (hybrid) architecture. Kimi k1.5, on the other hand, is another new reasoning model that challenges o1 on multimodal capabilities.

DeepSeek's release included three different models/model families:

DeepSeek-R1-Zero was an experiment that applied reinforcement learning (RL) directly to a base language model (V3) without any prior supervised fine-tuning. In essence, they attempted to teach the model to reason purely through trial and error, providing it with rewards for correct answers and well-formatted responses. This is somewhat analogous to how AlphaZero mastered games like Go and chess, learning solely through self-play and a reward signal based on winning or losing. The results were very impressive on many benchmarks; however, it fell short in some fields, and the model's output was often messy and hard to read.

To address the limitations of R1-Zero and enhance its reasoning abilities further, the DeepSeek team introduced R1, which incorporated a cold start of human-like reasoning data before applying reinforcement learning. This involved creating a small dataset of examples demonstrating desired reasoning patterns and output formats. This was followed by a multi-stage process. First, reasoning-oriented RL was applied, focusing on tasks with clear solutions, like math and coding. Then, they generated a new batch of high-quality data samples for fine-tuning, created by filtering model outputs during the RL phase. Finally, they applied a final round of reinforcement learning, this time focusing on general helpfulness and harmlessness in addition to reasoning.

Across key benchmarks like AIME 2024, Codeforces, GPQA Diamond, and MATH-500, DeepSeek-R1 consistently performs on par with OpenAI's o1 (79.8 vs. 79.2, 96.3 vs. 96.6, 71.5 vs. 75.7, and 97.3 vs. 96.4, respectively). They also report very similar performance on the SWE-bench Verified coding challenge (49.2 vs. 48.9).

The final piece of DeepSeek's work involved distilling the advanced reasoning capabilities of R1 into smaller, cheaper, dense models (Llama and Qwen series). Using the larger R1 model as a teacher, they fine-tuned several smaller models (ranging from 1.5B to 70B parameters) on the high-quality data curated from the R1 training process. The smaller distilled models significantly outperformed other models of similar sizes and even rivaled much larger models on reasoning benchmarks. DeepSeek-R1 outputs distilled into the tiny Qwen-1.5B even beat 4o on some math and code benchmarks! The short sketches below illustrate, at a very rough level, what these reward, data-curation, and distillation steps can look like in code.
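To make the R1-Zero idea concrete, here is a minimal, hypothetical sketch of the kind of rule-based reward described above: one term for a verifiably correct answer and one for a well-formatted response. The `<think>`/`<answer>` template, the exact checks, and the reward weights are illustrative assumptions, not DeepSeek's actual implementation.

```python
import re

# Illustrative only: a rule-based reward with an accuracy term (did the model reach
# the known-correct answer?) and a format term (did it keep its reasoning inside the
# expected tags?). No learned reward model is involved.
TEMPLATE = re.compile(r"<think>.+?</think>\s*<answer>(.+?)</answer>", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the expected <think>/<answer> template, else 0.0."""
    return 1.0 if TEMPLATE.search(completion) else 0.0

def accuracy_reward(completion: str, reference_answer: str) -> float:
    """1.0 if the extracted final answer matches a verifiable reference (e.g. a math result)."""
    match = TEMPLATE.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

def total_reward(completion: str, reference_answer: str) -> float:
    # Assumed weighting; the RL algorithm then reinforces completions with higher scores.
    return accuracy_reward(completion, reference_answer) + 0.5 * format_reward(completion)
```

Because both signals can be computed automatically, this style of reward scales to large numbers of math and coding prompts without human labeling, which is what makes the pure-RL experiment feasible.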
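The data-curation step in the multi-stage R1 recipe (filtering model outputs generated during the RL phase into a new fine-tuning set) can be sketched in a similarly toy fashion. The `generate` callable below is a placeholder for any text-sampling call, and the selection criteria simply reuse the rule-based checks from the previous sketch; this is a simplification for illustration, not the paper's exact procedure.

```python
from typing import Callable

def curate_sft_data(
    prompts: list[tuple[str, str]],          # (prompt, reference_answer) pairs
    generate: Callable[[str], str],          # placeholder for a model sampling call
    samples_per_prompt: int = 4,
) -> list[dict]:
    """Keep only completions that are both well formatted and verifiably correct,
    then recycle them as supervised fine-tuning examples."""
    curated = []
    for prompt, reference in prompts:
        for _ in range(samples_per_prompt):
            completion = generate(prompt)
            # Reuses format_reward / accuracy_reward from the sketch above.
            if format_reward(completion) == 1.0 and accuracy_reward(completion, reference) == 1.0:
                curated.append({"text": prompt + completion})
    return curated
```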
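Finally, the distillation step is, at its core, ordinary supervised fine-tuning of a small dense model on reasoning traces written by the larger teacher. The sketch below uses Hugging Face's trl library purely for illustration; the model name, data file, and hyperparameters are assumptions, exact trl argument names vary between versions, and none of this reflects DeepSeek's actual training stack.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Hypothetical inputs: a small dense student and a JSONL file where each record's
# "text" field holds a prompt plus a teacher-generated reasoning trace and answer.
student = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")
traces = load_dataset("json", data_files="r1_teacher_traces.jsonl", split="train")

trainer = SFTTrainer(
    model=student,
    train_dataset=traces,
    args=SFTConfig(output_dir="r1-distill-qwen-1.5b", max_seq_length=4096),
)
trainer.train()
```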
Why should you care?

DeepSeek-R1's release is significant for several reasons. First, its open-source nature and competitive performance at a fraction of the cost of o1 democratize access to advanced reasoning capabilities. The API costs of DeepSeek-R1 per million tokens are currently $0.14 for cached inputs, $0.55 for non-cached inputs, and $2.19 for outputs. In contrast, the API costs for o1 are $7.50, $15, and $60, respectively: roughly a 30x difference in cost. Moreover, the openly released model weights open up huge opportunities for adapting and fine-tuning these models for different domains and industries. The open release of its training methods also provides a blueprint for many others to follow. One surprise from the paper was that simpler techniques for enabling reasoning abilities worked better than some more complex options. We think there is a huge area for exploring and experimenting with these techniques now that scaled reinforcement learning for LLMs has been unlocked!

The huge success shown by distilling big reasoning models into much smaller non-reasoning models also suggests we will get another wave of rapid improvement and cost reduction across the LLM spectrum.

The fact that a Chinese company is leading this charge also adds a geopolitical dimension, particularly given that DeepSeek has managed to achieve this despite GPU export restrictions and a far smaller budget than Western AI labs.

Introducing Our Brand New 8-hour Generative AI Primer Course

A programming language-agnostic 1-day LLM Bootcamp designed for developers.

95% of developers I meet are only scratching the surface of what LLMs can do. When working with LLMs, you are CONSTANTLY making decisions such as open-source vs. closed-source, how to fit LLMs into your use case, whether no-code solutions are good enough for your workflow, the extent to which you should consider the limitations of LLMs, and so on. And the biggest gap we see on top of all this is whether you are using LLMs to their full capacity, even with chat interfaces like ChatGPT or APIs for models like Gemini. The question is: are you?

This certification course is specifically designed to cut through the noise, help you ask the right questions, and show you exactly how to find answers. LLMs are moving so fast, with updates being released almost every day; what you need is an intuitive framework, and just like LLMs, you need enough context to know what developments are relevant to you and your use case so you can make the most of this transformative technology.

In just 8 hours, through lessons, videos, exercises, quizzes, and hands-on projects, you'll:

Dive deep into the psyche of LLMs: how they work, how to make them work better, and how to train them for tasks you hate doing.
Work with leading AI models and integrate them into your workflows seamlessly.
Build your own no-code/low-code prototype that brings your ideas to life.

You'll finish before you even realize it, and by tomorrow, you'll already be AI-proofed. Secure your spot now!

Hottest News

1. OpenAI Released Scheduled Tasks in ChatGPT

OpenAI has introduced scheduled tasks in ChatGPT for Plus, Pro, and Team plans. These allow automated prompts and notifications on the web, iOS, Android, and macOS. Users can assign tasks like daily updates or reminders and receive notifications via push or email. Windows support will follow in Q1. Currently, a limit of 10 active tasks is enforced.

2. Chinese AI Company MiniMax Releases New Models

Chinese AI company MiniMax, an Alibaba- and Tencent-backed startup, debuted three new models.
MiniMax-Text-01 is a text-only model, while MiniMax-VL-01 can understand both images and text. T2A-01-HD, meanwhile, generates audio, specifically speech. MiniMax claims that MiniMax-Text-01 performs better than models such as Gemini 2.0 Flash and that MiniMax-VL-01 rivals Claude 3.5 Sonnet.

3. Kimi Launches New SOTA Multimodal Model

Beijing-based Moonshot AI introduced the new Kimi k1.5 multimodal thinking model. Updates include long-context extension, improved policy optimization, and multimodality. Its report shows SOTA short-CoT performance, outperforming GPT-4o and Claude 3.5 Sonnet on AIME, MATH-500, and LiveCodeBench by a large margin.

4. Alibaba Slashes Prices on LLMs by Up to 85% As China's AI Rivalry Heats Up

Alibaba Cloud announced an 85% price reduction on its Qwen-VL visual language model. The move demonstrates how competition among China's technology giants to win more business for their nascent artificial intelligence products is intensifying.

5. Google Is Forming a New Team To Build AI That Can Simulate the Physical World

Google is forming a new team led by Tim Brooks under DeepMind to build AI models for simulating the physical world, collaborating with the Gemini, Veo, and Genie teams on world models. These models aid in video generation, multimodal data, and interactive environments.

6. Mistral Signs Deal With AFP To Offer Up-to-Date Answers in Le Chat

Mistral has announced a content deal with newswire Agence France-Presse (AFP) to improve the accuracy of answers in Le Chat, Mistral's chatbot. Le Chat will be able to tap into AFP's stories, around 2,300 per day in six languages, and query AFP's entire archive dating back to 1983.

7. President Trump Repeals Biden's AI Executive Order

President Donald Trump revoked a 2023 executive order signed by former President Joe Biden that sought to reduce the potential risks AI poses to consumers, workers, and national security. During his campaign, Trump promised policies to support AI development rooted in free speech and human flourishing.

Five 5-minute reads/videos to keep you learning

1. Retrieval-Augmented Generation (RAG) vs. Cache-Augmented Generation (CAG): A Deep Dive Into Faster, Smarter Knowledge Integration

Retrieval-augmented generation (RAG) and cache-augmented generation (CAG) are two methodologies for generating more context-aware responses from LLMs. This article provides an extensive, step-by-step guide to both approaches, dives into their workflows, compares their advantages and drawbacks, and offers an implementation guide for CAG.

2. Why AI Language Models Choke On Too Much Text

GPUs revolutionized AI by enabling massive parallel processing, leading to transformer models scaling rapidly. Despite these advancements, transformers remain inefficient with long contexts due to quadratic compute costs. This article discusses why this happens and shares some approaches to solving the problem.

3. Simplifying Alignment: From RLHF To Direct Preference Optimization (DPO)

This article explores how Direct Preference Optimization (DPO) simplifies aligning large language models with human preferences compared to Reinforcement Learning from Human Feedback (RLHF). It breaks down the math and highlights why DPO might be the smarter, easier way forward.

4. Mastering Data Scaling: The Only Guide You'll Ever Need (Straight From My Journey)

Data scaling is a crucial step in preparing datasets for machine learning models and ensuring they perform well. This article discusses why scaling is important, its types, and how and when to apply it.
5. Takes On Alignment Faking in Large Language Models

Researchers revealed that Claude 3 Opus fakes alignment with training objectives to avoid behavioral modification, a phenomenon labeled alignment faking. The author shares their take on the results.

Repositories & Tools

The micro diffusion repository demonstrates how to train large-scale diffusion models from scratch on a minimal budget.
LocalAI is a free, open-source alternative to OpenAI, Claude, and others.
Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot.
Agentless is an agentless approach to automatically solving software development problems.
CopilotKit provides React UI and infrastructure for AI copilots, in-app AI agents, AI chatbots, and more.

Top Papers of The Week

1. LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

LlamaV-o1 redefines step-by-step visual reasoning in large language models by introducing a benchmark with eight challenge categories and a metric for granular evaluation. The multimodal model, trained through multi-step curriculum learning, surpasses existing models like Llava-CoT by 3.8% in performance across six benchmarks and runs five times faster during inference.

2. KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

Researchers developed KaLM-Embedding, a multilingual embedding model built on high-quality, diverse training data. Techniques like persona-based synthetic data, ranking-consistency filtering, and semi-homogeneous task batch sampling enhance its performance. The model excels in multilingual embedding tasks, outperforming others of similar size on the MTEB benchmark.

3. Titans: Learning to Memorize at Test Time

This paper introduces a new family of architectures called Titans, based on a new neural long-term memory module. The module learns to memorize historical context and helps attention attend to the current context while utilizing long-past information. Experimental results show that Titans are more effective than Transformers and recent modern linear recurrent models.

4. Transformer²: Self-adaptive LLMs

This paper introduces Transformer², a framework that adapts LLMs to unseen tasks in real time by selectively adjusting only the singular components of their weight matrices. During inference, Transformer² employs a dispatch system to identify the task properties, and then task-specific expert vectors, trained using reinforcement learning, are dynamically mixed to obtain targeted behavior for the incoming prompt. It outperforms approaches such as LoRA with fewer parameters.

Quick Links

1. Six charts about AI revenue: OpenAI captures approximately 62.5% of consumer AI spending. xAI's revenue jumped from $5M to $100M, while OpenAI's soared from $200M to $5B. Sapphire Ventures reports 28 AI-native companies exceeding $25M in ARR, predicting substantial growth for AI-native startups in the coming year.

2. DeepSeek-R1 achieves performance comparable to OpenAI's o1 system across mathematics, coding, and general reasoning tasks, cementing its place as a leading competitor.
DeepSeek has open-sourced DeepSeek-R1-Zero and DeepSeek-R1, along with six smaller distilled models.

Who's Hiring in AI

Applied AI Engineer, Applied Science @Mistral AI (Paris, France)
Cambridge Internship in ML Model Optimization @Microsoft Corporation (Cambridge, United Kingdom)
Machine Learning Software Engineering Undergraduate Intern @INTEL (Santa Clara, CA, USA)
Tech Consulting AI LLM Developer Manager @Accenture (Multiple Locations)
Full-Stack Developer (React + Python + Azure) @Solvd (Remote)
GenAI/Machine Learning Technical Project Manager @Deloitte (Multiple US Locations)

Interested in sharing a job opportunity here? Contact [emailprotected].

Think a friend would enjoy this too? Share the newsletter and let them join the conversation. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI