LAI #72: From Python Groundwork to Function Calling, ICL Theory, and Load Balancing MoEs
Author(s): Towards AI Editorial Team

Originally published on Towards AI.

Good morning, AI enthusiasts! This week's issue bridges two ends of the spectrum: the foundations you need to get started, and the nuanced tools and ideas shaping how we build with AI today. We begin with a clear, approachable guide to Python and core computer science concepts, ideal if you're just starting out or brushing up on the basics. But from there, things go deeper. You'll learn how to train NanoGPT to handle function calling natively, with no prompt tricks required. We explore how to turn raw data into business-ready rules, improve forecasting with adaptive decay, and evaluate LLM performance with statistical rigor. And if you've been following our DeepSeek series, this week's feature on load balancing without auxiliary loss closes the loop with a surprisingly elegant solution.

What's AI Weekly

This week in What's AI, I dive into Python fundamentals and CS concepts. This is meant to be a one-stop starter guide for a total programming beginner. I take things one step at a time and use examples to explain each concept. Don't worry if you don't grasp every concept from this single article; you can always learn more about them in our Python course. Start your learning with this article or watch the video on YouTube, and practice these concepts to really understand them!

— Louis-François Bouchard, Towards AI Co-founder & Head of Community

Learn AI Together Community section!

Featured Community post from the Discord

Blondu0994 has built an all-in-one platform for translations, transcriptions, OCR, PDF/Word/Excel conversions, and electronic signatures. It is powered by AI, fully automated, and runs without commercial APIs. He is looking for feedback, so go check it out and support a fellow community member. If you have any questions about the tool, reach out in the thread!

AI poll of the week!

While the poll shows most of you use 4o, the discussion in the thread has moved from OpenAI to DeepSeek, Perplexity, and Gemini. Is price guiding this decision, or performance? I'm also curious why some of you still aren't using Grok. Tell me in the thread on Discord!

Collaboration Opportunities

The Learn AI Together Discord community is full of collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel! Keep an eye on this section, too; we share cool opportunities every week!

1. Uwaix. wants to do some research in AI and is looking for people who'd like to join them. If you have any topic ideas or want to pursue research, connect with them in the thread!

2. _madara_uchiha_ is exploring numpy and other Python libraries and is looking for an accountability partner available to study for three hours per day. If you have the time and are focusing on the same topics, reach out to him in the thread!

Meme of the week!

Meme shared by bin4ry_d3struct0r

TAI Curated Section

Article of the week

From First Principles: Building Function Calling by Fine-tuning NanoGPT By Suyash Harlalka

This blog provides a detailed walkthrough for implementing function calling by fine-tuning a NanoGPT-like model using only PyTorch and Tiktoken. Unlike methods that require function definitions in the prompt, this approach trains the model to generate structured outputs directly, improving efficiency. It explains dataset requirements, tokenizer adjustments with special tokens, custom loss masking during training, and the overall training execution. The model's progress is illustrated through examples at different training stages. Developers and researchers interested in a low-level understanding of LLM customization, and in implementing function calling without high-level library abstractions, will find this guide informative.
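As a taste of the custom loss masking the walkthrough covers, here is a minimal PyTorch sketch: label positions belonging to the prompt are set to an ignore index so that only the structured function-call span contributes to the next-token loss. The `target_start` convention and tensor shapes here are our own illustrative assumptions, not the article's actual code.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # cross_entropy skips positions with this label

def masked_lm_loss(logits, input_ids, target_start):
    """Compute next-token loss only on the function-call span.

    logits:       (batch, seq_len, vocab) from the model
    input_ids:    (batch, seq_len) token ids
    target_start: (batch,) index where the structured output begins
                  (a hypothetical convention for this sketch)
    """
    # Standard next-token shift: position t predicts token t + 1.
    shift_logits = logits[:, :-1, :]
    labels = input_ids[:, 1:].clone()

    # Mask out everything before the function-call span so the model
    # is only graded on generating the structured output.
    positions = torch.arange(labels.size(1), device=labels.device)
    mask = positions[None, :] < (target_start[:, None] - 1)
    labels[mask] = IGNORE_INDEX

    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=IGNORE_INDEX,
    )
```

During fine-tuning, masking like this keeps the gradient signal focused on producing the structured output rather than on re-modeling the prompt.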
Our must-read articles

1. Extracting Actionable Rules from Raw Data By Nehdiii

This work details methods for extracting interpretable business rules from data using decision tree classifiers, useful when speed or clarity is preferred over complex models. It covers decision tree theory, including Gini impurity, and offers a practical guide using sklearn with a bank marketing dataset. Key steps involve building the model, programmatically parsing the tree structure for rules, and addressing categorical feature encoding. Different strategies, such as count and target encoding (with smoothing for high-cardinality features), are compared. A small rule-extraction sketch follows after this list.

2. Adaptive Decay-Weighted ARMA: A Novel Approach to Time Series Forecasting By Shenggang Li

This article presents Adaptive Decay-Weighted ARMA, a time series forecasting approach that addresses the limitation of traditional models treating all past data equally. It assigns higher importance to recent observations using a decay function in the loss calculation, with the decay rate fixed or learned from the data. The method integrates standard AR lags, moving averages, and seasonal components. Empirical tests on electricity production data show that this technique, particularly with a learned decay factor, achieves lower Mean Absolute Percentage Error (MAPE) than standard AR, ARMA(1,1), and AR-cycle models across various forecast horizons, demonstrating improved predictive accuracy. A decay-weighted fitting sketch appears below.

3. In-Context Learning Explained Like Never Before By Allohvk

This article examines In-Context Learning (ICL), an emergent capability where Large Language Models (LLMs) learn tasks from prompt examples without fine-tuning. It reviews several proposed mechanisms behind the phenomenon: pattern completion, induction heads copying concepts, nearest-neighbor search, and Bayesian inference. A prominent theory suggests attention mechanisms simulate gradient descent during inference, learning by adjusting activations based on prompt examples. Understanding these competing theories provides deeper insight into LLM capabilities.

4. Data-Driven LLM Evaluation with Statistical Testing By Robert Martin-Short

This piece explores using empirical statistical methods, specifically bootstrap and permutation testing, to evaluate improvements in LLM applications. It tackles the challenge of assessing non-deterministic outputs by applying these tests to evaluation metrics, demonstrated through an example of improving medical note summaries based on readability scores. The analysis shows how statistical significance can quantify confidence in iterative prompt changes, given the inherent variability in LLM outputs. This data-driven approach helps confirm whether observed performance gains are meaningful. A minimal permutation-test sketch appears below.

5. DeepSeek-V3 Explained Part 3: Auxiliary-Loss-Free Load Balancing By Nehdiii

As the third part in a series on DeepSeek-V3's architecture (which previously covered Multi-head Latent Attention and DeepSeekMoE), this piece details the model's auxiliary-loss-free load-balancing technique for Mixture-of-Experts (MoE) models. It outlines why load balancing is necessary to prevent issues such as routing collapse and training instability, and reviews prior methods, including auxiliary loss functions (which risk gradient interference) and Expert Choice (which raises causality concerns). DeepSeek's approach instead adjusts gating scores directly with an expert-wise bias based on token assignments, avoiding auxiliary losses while preserving causality. Evaluations indicate the method strikes a favorable balance between model performance and load distribution. A simplified routing sketch closes out this list.
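To make the rule-extraction idea from the first article concrete, here is a minimal sketch that fits a shallow sklearn tree and walks its `tree_` structure to print one IF/THEN rule per leaf. The helper name `extract_rules` is ours, and the built-in breast-cancer dataset stands in for the article's bank marketing data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# A stand-in dataset (the article uses bank marketing data).
data = load_breast_cancer()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(data.data, data.target)

def extract_rules(clf, feature_names, node=0, conditions=()):
    """Recursively walk the fitted tree, yielding one rule per leaf."""
    t = clf.tree_
    if t.children_left[node] == -1:  # -1 marks a leaf in sklearn trees
        pred = int(t.value[node][0].argmax())
        rule = " AND ".join(conditions) or "TRUE"
        yield rule, pred, int(t.n_node_samples[node])
        return
    name, thr = feature_names[t.feature[node]], t.threshold[node]
    yield from extract_rules(clf, feature_names, t.children_left[node],
                             conditions + (f"{name} <= {thr:.2f}",))
    yield from extract_rules(clf, feature_names, t.children_right[node],
                             conditions + (f"{name} > {thr:.2f}",))

for rule, pred, n in extract_rules(clf, data.feature_names):
    print(f"IF {rule} THEN class={pred} (n={n})")
```

Capping `max_depth` keeps each extracted rule short enough to hand to a business stakeholder.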
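The decay-weighting idea in the second article can be illustrated with weighted least squares: each squared error is scaled by a weight that shrinks geometrically with the age of the observation. This is our own simplified AR-only sketch with a fixed decay rate; the article's full method also handles moving-average and seasonal terms and can learn the decay from data.

```python
import numpy as np

def decay_weighted_ar_fit(y, p=3, decay=0.98):
    """Fit AR(p) coefficients by weighted least squares, where each
    observation's weight shrinks geometrically with its age, so
    recent points dominate the fit."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Design matrix of lagged values: row i holds y[t-1], ..., y[t-p]
    # for target t = p + i.
    X = np.column_stack([y[p - k - 1 : n - k - 1] for k in range(p)])
    target = y[p:]
    # The oldest observation has the largest age, hence smallest weight.
    ages = np.arange(len(target))[::-1]
    w = decay ** ages
    # Weighted least squares via row scaling by sqrt(weight).
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], target * sw, rcond=None)
    return coef
```

With `decay=1.0` this reduces to an ordinary AR fit, which makes the effect of down-weighting old data easy to compare.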
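For the evaluation piece, here is a generic two-sided permutation test on a difference in mean metric, of the kind the article applies to readability scores from two prompt variants. The inputs are hypothetical score arrays; the article also uses bootstrap resampling, which this sketch omits.

```python
import numpy as np

def permutation_test(scores_a, scores_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in mean metric
    (e.g., readability of summaries from prompt A vs. prompt B)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        # Under the null, group labels are exchangeable: reshuffle them.
        rng.shuffle(pooled)
        diff = pooled[:len(a)].mean() - pooled[len(a):].mean()
        count += abs(diff) >= abs(observed)
    return observed, count / n_perm

# Hypothetical usage with per-example readability scores:
# diff, p_value = permutation_test(readability_v2, readability_v1)
```

A small p-value suggests the metric gain from the prompt change is unlikely to be noise from the LLM's non-deterministic outputs.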
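Finally, a simplified sketch of the auxiliary-loss-free balancing mechanism described in the DeepSeek-V3 piece: a per-expert bias is added to the gating scores only when selecting the top-k experts, and is nudged after each batch according to observed expert load. This is an illustration under our own assumptions (the step size `gamma` and the uniform-load target), not DeepSeek's actual implementation.

```python
import torch

def biased_topk_routing(scores, bias, k):
    """Pick top-k experts per token using bias-adjusted scores.
    The bias influences only *selection*; the combine weights still
    come from the raw gating scores, so no auxiliary loss gradients
    flow into the model.
    scores: (tokens, experts) gating affinities
    bias:   (experts,) load-balancing bias, updated outside the graph
    """
    _, idx = (scores + bias).topk(k, dim=-1)   # biased selection
    gates = scores.gather(-1, idx)             # unbiased combine weights
    return idx, gates

@torch.no_grad()
def update_bias(bias, idx, n_experts, gamma=1e-3):
    """After each batch, nudge the bias down for overloaded experts
    and up for underloaded ones."""
    load = torch.bincount(idx.flatten(), minlength=n_experts).float()
    target = idx.numel() / n_experts           # ideal uniform load
    bias -= gamma * torch.sign(load - target)
    return bias
```

Because the bias affects only expert selection, the combine weights and their gradients come from the raw gating scores, which is how the method sidesteps the gradient interference that auxiliary balancing losses introduce.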
If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI