
Phi-4 Reasoning Models
Author(s): Naveen Krishnan
Originally published on Towards AI.
Image Source — Author
Massive large language models (LLMs) have dominated headlines, showcasing impressive feats of text generation and comprehension. A new wave is cresting, however: the era of Small Language Models (SLMs). These compact yet surprisingly capable models are challenging the status quo, proving that exceptional performance doesn't always require colossal size and resources. Microsoft has been at the forefront of this movement with its Phi family of models. Following the success of Phi-3, which demonstrated remarkable abilities for its size, Microsoft has taken another significant leap forward with the introduction of the Phi-4 reasoning models.
These new additions, Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, are specifically engineered to excel at complex reasoning tasks, a domain previously thought to be the exclusive territory of much larger models. They represent a pivotal moment, bringing sophisticated reasoning capabilities within reach of a wider range of applications and hardware.
The Phi-4 Reasoning Family: More Than Just Words
Before getting into the specifics of the Phi-4 models, it's helpful to understand what sets "reasoning models" apart. While standard language models excel at predicting the next word in a sequence, reasoning models are trained to go further. They leverage techniques like inference-time scaling to tackle complex problems that require breaking tasks down into multiple steps, performing internal checks, and logically connecting pieces of information. This often involves generating intermediate steps, a "chain of thought," before arriving at a final answer, a capability typically associated with much larger, resource-intensive frontier models. The Phi-4 reasoning models represent a significant advancement by bringing these sophisticated reasoning abilities into the more efficient framework of SLMs, achieved through meticulous data curation, distillation from larger models, reinforcement learning, and a focus on high-quality synthetic datasets.
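In practice, a reasoning model's output often interleaves its chain of thought with the final answer, so applications usually need to separate the two. The sketch below assumes the reasoning is wrapped in `<think>...</think>` delimiters, a convention used by several open reasoning models; treat the exact tag as an assumption, not something this article confirms.

```python
def split_reasoning(text: str, open_tag: str = "<think>", close_tag: str = "</think>"):
    """Split a model response into (chain_of_thought, final_answer).

    Assumes the model wraps its intermediate reasoning in <think>...</think>;
    if no such block is present, the whole text is treated as the answer.
    """
    start = text.find(open_tag)
    end = text.find(close_tag)
    if start == -1 or end == -1 or end < start:
        return "", text.strip()
    thought = text[start + len(open_tag):end].strip()
    answer = text[end + len(close_tag):].strip()
    return thought, answer

# Example with a synthetic response:
raw = "<think>2 + 2 = 4, then 4 * 3 = 12.</think>The answer is 12."
thought, answer = split_reasoning(raw)
print(answer)  # The answer is 12.
```

Keeping the trace separate lets you log or display the reasoning on demand while showing users only the final answer.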
Phi-4-reasoning: The Foundation
The cornerstone of this new lineup is Phi-4-reasoning. With 14 billion parameters, it's significantly smaller than many models boasting similar reasoning prowess. It is trained via supervised fine-tuning (SFT) on carefully curated reasoning demonstrations, a process that specifically teaches the model to generate detailed reasoning chains, effectively utilizing additional computational steps during inference to solve complex problems. The key achievement here is demonstrating that smaller models, when trained on exceptionally high-quality and targeted data (including synthetic datasets), can effectively compete with their much larger counterparts in demanding reasoning tasks, particularly in mathematics and science.
Phi-4-reasoning-plus: Enhanced Accuracy Through Reinforcement
Building directly upon the foundation laid by Phi-4-reasoning, the "plus" variant introduces an additional layer of refinement through reinforcement learning (RL). While Phi-4-reasoning is trained via supervised fine-tuning (SFT) on reasoning examples, Phi-4-reasoning-plus undergoes further RL training that rewards accurate final answers, trading somewhat longer reasoning traces for higher accuracy.
Phi-4-mini-reasoning: Compact Power for the Edge
Addressing the growing need for capable AI on devices with limited resources, this 3.8-billion parameter model is specifically optimized for mathematical reasoning and step-by-step problem-solving in environments where computational power or network latency is a constraint. Phi-4-mini-reasoning strikes a crucial balance between efficiency and advanced reasoning ability, making it an ideal candidate for applications like embedded tutoring systems, on-device AI assistants (like those envisioned for Copilot+ PCs), and lightweight deployments on mobile or edge systems where complex reasoning is needed without relying on cloud connectivity.
Benchmarks: Performance Highlights
Microsoft’s technical reports and blog posts highlight impressive results across a range of demanding benchmarks, demonstrating that these SLMs punch well above their weight class.
Phi-4-reasoning performance across representative reasoning benchmarks spanning mathematical and scientific reasoning
Both the 14B parameter Phi-4-reasoning and Phi-4-reasoning-plus models showcase remarkable capabilities. They consistently outperform not only the base Phi-4 model but also significantly larger open-weight models like the 70-billion parameter DeepSeek-R1-Distill-Llama-70B across various reasoning benchmarks. This includes complex mathematical reasoning tasks, such as those found in the AIME 2025 (the qualifier for the USA Math Olympiad), where they reportedly achieve performance better than even the full 671-billion parameter DeepSeek-R1 Mixture-of-Experts model. They also show strong performance on Ph.D.-level science questions and general capability benchmarks like MMLU-Pro (knowledge and language understanding), HumanEvalPlus (coding), IFEval (instruction following), and ArenaHard (general skills).
The accompanying graph compares the performance of various models on popular math benchmarks that require long-form answer generation.
The compact Phi-4-mini-reasoning (3.8B parameters) also holds its own impressively. Designed for efficiency, it still manages to outperform its base model and several larger models (including some 7B and 8B parameter models like OpenThinker-7B and DeepSeek-R1-Distill-Llama-8B) on popular benchmarks such as MATH and GPQA Diamond.
Accessing Phi-4
The Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning models are available in the Azure AI model catalog. From there, they can often be deployed as serverless API endpoints. This pay-as-you-go approach simplifies access, allowing developers to integrate these powerful reasoning capabilities into their applications without managing the underlying infrastructure.
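Under the hood, a serverless deployment is just an HTTPS endpoint that accepts a JSON chat payload. The sketch below assembles such a request; the `/chat/completions` path, the `api-key` header, and the endpoint URL shown are assumptions based on common Azure AI inference conventions, so check your own deployment's details before using them.

```python
import json

def build_chat_request(endpoint: str, api_key: str, model: str, user_prompt: str):
    """Assemble the URL, headers, and JSON body for a chat completions call.

    The '/chat/completions' path and 'api-key' header are assumptions based
    on common Azure AI inference conventions; verify against your deployment.
    """
    url = endpoint.rstrip("/") + "/chat/completions"
    headers = {"Content-Type": "application/json", "api-key": api_key}
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": 4096,
        "temperature": 0.8,
    }
    return url, headers, json.dumps(body)

# Hypothetical endpoint name, for illustration only:
url, headers, payload = build_chat_request(
    "https://my-deployment.inference.ai.azure.com",
    "<API_KEY>",
    "Phi-4-reasoning",
    "I am going to Paris, what should I see?",
)
print(url)
```

From here, any HTTP client can POST the payload; the SDK shown in the next section wraps exactly this kind of call with retries and typed message classes.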
Image Source — Author
Using the Azure AI Inference Python SDK
One of the most straightforward ways to interact with deployed Phi-4 models on Azure is through the azure-ai-inference Python SDK. Here’s a breakdown of how to get started:
Install the SDK: If you haven’t already, install the necessary Python package:
pip install azure-ai-inference
```python
# Required imports
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "<<Your Model Endpoint>>"
model_name = "<<Your Model Name>>"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential("<API_KEY>"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="I am going to Paris, what should I see?"),
    ],
    max_tokens=4096,
    temperature=0.8,
    top_p=0.95,
    presence_penalty=0.0,
    frequency_penalty=0.0,
    model=model_name,
)

print(response.choices[0].message.content)
```
Run a multi-turn conversation:
This sample demonstrates a multi-turn conversation with the chat completion API. When using the model for a chat application, you’ll need to manage the history of that conversation and send the latest messages to the model.
```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import AssistantMessage, SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "<<Your Model Endpoint>>"
model_name = "<<Your Model Name>>"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential("<API_KEY>"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="I am going to Paris, what should I see?"),
        AssistantMessage(content=(
            "Paris, the capital of France, is known for its stunning "
            "architecture, art museums, historical landmarks, and romantic "
            "atmosphere. Here are some of the top attractions to see in Paris:\n\n"
            "1. The Eiffel Tower: The iconic Eiffel Tower is one of the most "
            "recognizable landmarks in the world and offers breathtaking views "
            "of the city.\n"
            "2. The Louvre Museum: The Louvre is one of the world's largest and "
            "most famous museums, housing an impressive collection of art and "
            "artifacts, including the Mona Lisa.\n"
            "3. Notre-Dame Cathedral: This beautiful cathedral is one of the "
            "most famous landmarks in Paris and is known for its Gothic "
            "architecture and stunning stained glass windows.\n\n"
            "These are just a few of the many attractions that Paris has to "
            "offer. With so much to see and do, it's no wonder that Paris is "
            "one of the most popular tourist destinations in the world."
        )),
        UserMessage(content="What is so great about #1?"),
    ],
    max_tokens=4096,
    temperature=0.8,
    top_p=0.95,
    presence_penalty=0.0,
    frequency_penalty=0.0,
    model=model_name,
)

print(response.choices[0].message.content)
```
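The history bookkeeping in a chat loop can be factored into a small helper. This is an illustrative sketch, not part of the SDK: the function name `chat_turn` is hypothetical, and the `send` callable stands in for whatever client call you actually use (such as `client.complete` wrapped to return the reply text).

```python
from typing import Callable

def chat_turn(history: list, user_text: str, send: Callable[[list], str]) -> str:
    """Append the user's message, call the model, and record its reply.

    `history` is a list of {"role": ..., "content": ...} dicts, and `send`
    is any callable mapping the full message list to the reply text.
    """
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    return reply

# Demo with a stubbed model that just reports the user turn count:
history = [{"role": "system", "content": "You are a helpful assistant."}]
echo = lambda msgs: f"reply #{sum(m['role'] == 'user' for m in msgs)}"
print(chat_turn(history, "I am going to Paris, what should I see?", echo))  # reply #1
print(chat_turn(history, "What is so great about #1?", echo))  # reply #2
```

Because the helper mutates `history` in place, each call automatically carries the full conversation context to the model, which is exactly what the multi-turn sample above does by hand.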
Stream the output:
For a better user experience, you will want to stream the response of the model so that the first token shows up early and you avoid waiting for long responses.
```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "<<Your Model Endpoint>>"
model_name = "<<Your Model Name>>"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential("<API_KEY>"),
)

response = client.complete(
    stream=True,
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="I am going to Paris, what should I see?"),
    ],
    max_tokens=4096,
    temperature=0.8,
    top_p=0.95,
    presence_penalty=0.0,
    frequency_penalty=0.0,
    model=model_name,
)

for update in response:
    if update.choices:
        print(update.choices[0].delta.content or "", end="")

client.close()
```
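When streaming, each update carries only a small delta of text, so the client is responsible for accumulating them into the complete reply. The sketch below shows that accumulation in isolation; the stub objects only mimic the shape of the SDK's streaming updates (an assumption made for illustration, so the logic can run without a live endpoint).

```python
from types import SimpleNamespace

def collect_stream(updates) -> str:
    """Accumulate streamed delta chunks into the complete response text."""
    parts = []
    for update in updates:
        if update.choices:  # some updates (e.g. usage info) carry no choices
            parts.append(update.choices[0].delta.content or "")
    return "".join(parts)

# Stub updates mimicking the streaming response shape:
def chunk(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

stream = [chunk("Paris has "), chunk("many sights."), SimpleNamespace(choices=[])]
print(collect_stream(stream))  # Paris has many sights.
```

Guarding on `update.choices` and treating `delta.content` as possibly `None` matters in practice: final bookkeeping chunks often carry no text, and skipping those checks is a common source of streaming bugs.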
Use Cases
The enhanced reasoning capabilities packed into the efficient Phi-4 models unlock a diverse range of potential applications, pushing the boundaries of what AI can achieve, especially in resource-constrained scenarios.
Agentic Applications: The ability to handle complex, multi-step tasks makes Phi-4-reasoning and reasoning-plus strong candidates for powering AI agents. These agents could potentially automate complex workflows, perform sophisticated research, or manage intricate planning tasks that require logical decomposition and execution.
Educational Tools: Phi-4-mini-reasoning, with its strong mathematical focus and compact size, is particularly well-suited for educational applications. Imagine intelligent tutoring systems embedded directly into learning platforms or devices, providing step-by-step guidance and personalized feedback for subjects like math and logic.
Coding and Development: The strong performance on coding benchmarks suggests utility as a powerful coding assistant, capable of understanding complex logic, generating code snippets, debugging, and potentially even assisting with algorithmic problem-solving.
Mathematical & Scientific Problem Solving: The models’ demonstrated prowess in math and science reasoning opens doors for applications in research, data analysis, and engineering, helping professionals tackle complex calculations, simulations, or hypothesis generation.
On-Device AI: Phi-4-mini-reasoning, along with optimizations like Phi Silica for Windows NPUs, paves the way for more powerful AI experiences directly on personal computers (like Copilot+ PCs) and mobile devices. This enables features like offline summarization, advanced text intelligence, and potentially more sophisticated local assistants that don’t rely constantly on cloud connectivity.
The broader impact of the Phi-4 reasoning family lies in democratizing access to advanced AI reasoning. By delivering capabilities previously confined to massive models within a smaller footprint, Microsoft is enabling developers to build more intelligent applications that can run efficiently on a wider variety of hardware, fostering innovation across numerous fields.
Conclusion: Small Models, Big Reasoning
Microsoft’s Phi-4 reasoning models mark a significant stride in the evolution of artificial intelligence. By packing state-of-the-art reasoning capabilities into remarkably efficient Small Language Models, they challenge the notion that only massive parameter counts can yield sophisticated cognitive abilities. From the foundational Phi-4-reasoning, enhanced by supervised fine-tuning, to the accuracy-boosted Phi-4-reasoning-plus refined with reinforcement learning, and the compact Phi-4-mini-reasoning designed for edge deployment, this family offers tailored solutions for diverse needs.