Phi-3
Phi-3 is Microsoft AI's third-generation small language model family, released beginning in April 2024 in three core sizes -- Phi-3-mini (3.8B parameters), Phi-3-small (7B), and Phi-3-medium (14B) -- followed by a multimodal Phi-3-vision variant (4.2B) in May 2024. All variants are open weights under the MIT license, distributed via Hugging Face and Azure AI Studio, and trained on a carefully curated mixture of synthetic and filtered web data that lets them punch well above their parameter count on standard reasoning and language benchmarks. As of April 2026, Phi-4 has superseded Phi-3 as Microsoft's primary small-model recommendation, but the family remains widely deployed in production environments and stands as a significant milestone in demonstrating what curated data quality can achieve at small model sizes.
At a glance
- Lab: Microsoft AI
- Released: April 2024 (Phi-3-mini); Phi-3-small, Phi-3-medium, and Phi-3-vision followed in May 2024
- Modality: Text (all core variants); text and vision (Phi-3-vision)
- Open weights: Yes; MIT license
- Context window: Phi-3-mini and Phi-3-medium available in 4K and 128K token variants; Phi-3-small available in 8K and 128K variants; Phi-3-vision supports 128K tokens
- Pricing: Free for self-hosting; per-token pricing on Azure AI Studio managed endpoints
- Distribution channels: Hugging Face microsoft organization, Azure AI Studio model catalog, Ollama library
Origins
The Phi research line at Microsoft Research began with Phi-1 in mid-2023, a 1.3-billion-parameter model described in the paper "Textbooks Are All You Need." The core premise was that training data quality matters more than scale: a small model trained on synthetic, textbook-grade data could match or exceed a model several times larger trained on the broad, noisy web corpus that had been standard practice. Phi-1 demonstrated this on coding benchmarks, achieving 50.6% pass@1 on HumanEval despite having far fewer parameters than competing code models.
Phi-1.5, also 1.3B parameters, extended the thesis to common sense reasoning and language understanding, showing that the same data-quality approach transferred beyond code. Phi-2, released in December 2023 with 2.7B parameters, widened the scope again. Trained on 1.4 trillion tokens from a mixture of synthetic and web data, Phi-2 matched or outperformed models up to 25 times larger on complex reasoning benchmarks, and was the first Phi model to attract broad commercial interest from developers building production applications.
Phi-3 scaled and formalized the approach. The team, which includes Sébastien Bubeck as a key contributor across the Phi research line, built the Phi-3 training data as a scaled-up version of the Phi-2 dataset: heavily filtered web data selected for educational value, plus synthetic "textbook-like" content generated by larger language models and covering mathematics, coding, common sense reasoning, and general world knowledge. Pre-training ran in two phases -- a first pass emphasizing general knowledge and language understanding, a second pass layering in more synthetic data for logical reasoning and specialized skills. Phi-3-mini trained on 3.3 trillion tokens; Phi-3-small and Phi-3-medium trained on 4.8 trillion tokens each.
Microsoft announced Phi-3-mini in April 2024, presenting it as the first model in its weight class to support a 128K-token context window at 3.8B parameters -- small enough for on-device deployment, including mobile phones. Phi-3-small and Phi-3-medium followed at the Microsoft Build developer conference in May 2024. Phi-3-vision, a 4.2B-parameter multimodal extension combining an image encoder and projector with a Phi-3-mini language backbone, was released the same month.
Capabilities
Phi-3-mini's primary claim at launch was that a 3.8B-parameter model could match or exceed models two to three times its size on reasoning and language tasks. The model handles instruction following, multi-turn dialogue, structured output, code generation, and mathematical reasoning. The 128K context variant extends this to long-document processing within the same parameter budget. Both the 4K and 128K context variants shipped in instruction-tuned form; Microsoft did not release base (non-instruction-tuned) Phi-3 weights.
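The instruction-tuned variants use a simple chat markup in which each turn is wrapped in `<|role|>` ... `<|end|>` delimiters, per the model card's documented format. A minimal sketch of assembling a multi-turn prompt in that format (the helper name is illustrative, not a library API):

```python
def build_phi3_prompt(messages):
    """Assemble a Phi-3 chat prompt from a list of {role, content} dicts.

    Each turn is wrapped in <|role|> ... <|end|> markers, and the prompt
    ends with an open <|assistant|> tag so the model generates the reply.
    """
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = build_phi3_prompt([
    {"role": "user", "content": "Explain tail recursion in one sentence."},
])
```

In practice, `tokenizer.apply_chat_template` in Hugging Face Transformers produces this formatting automatically; the sketch only shows the underlying wire format.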
Phi-3-small at 7B and Phi-3-medium at 14B follow the same design philosophy with larger capacity, and both improve further on mini's benchmarks at the cost of higher inference compute. Each is available in short- and long-context variants (8K and 128K for small; 4K and 128K for medium), and both are fine-tunable through standard open-source tooling.
Phi-3-vision adds multimodal capability by coupling a vision encoder with the Phi-3-mini language model through a trainable connector and projector layer. The result is a 4.2B-parameter model supporting image description, visual question answering, chart and document parsing, and mixed text-and-image reasoning -- a notable capability footprint for a model at that parameter count. Phi-3-vision was trained with supervised fine-tuning and direct preference optimization to improve instruction following and safety properties.
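Phi-3-vision's prompt format extends the text chat markup with numbered image placeholders (`<|image_1|>`, `<|image_2|>`, ...) that refer to images passed separately to the processor. A minimal sketch of building such a prompt, assuming the single-turn format shown on the model card (the helper itself is illustrative):

```python
def build_phi3_vision_prompt(question, n_images=1):
    """Build a single-turn Phi-3-vision prompt.

    Each attached image is referenced by a numbered <|image_N|> placeholder
    placed before the question text; the images themselves are supplied to
    the processor alongside this string.
    """
    placeholders = "".join(f"<|image_{i}|>\n" for i in range(1, n_images + 1))
    return f"<|user|>\n{placeholders}{question}<|end|>\n<|assistant|>\n"

prompt = build_phi3_vision_prompt("What trend does this chart show?")
```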
All Phi-3 models support deployment on standard open-source inference stacks including llama.cpp for consumer hardware, and the quantized variants available through Ollama make Phi-3-mini accessible on Apple Silicon laptops and single consumer-grade GPUs. The MIT license permits commercial use, fine-tuning, and redistribution without the usage restrictions attached to models like Llama 3 and its derivatives.
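The on-device claim follows from simple memory arithmetic: weight storage scales with parameter count times bits per weight. A back-of-envelope sketch (deliberately ignoring KV cache, activations, and quantization metadata, which add real-world headroom requirements):

```python
def approx_weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory needed to hold model weights, in gigabytes.

    Ignores KV cache, activations, and per-block quantization overhead,
    so real deployments need additional headroom on top of this figure.
    """
    return n_params * bits_per_weight / 8 / 1e9

# Phi-3-mini (3.8B parameters) at common precisions:
fp16 = approx_weight_memory_gb(3.8e9, 16)  # ~7.6 GB: mid-range GPU territory
q4   = approx_weight_memory_gb(3.8e9, 4)   # ~1.9 GB: fits laptops and phones
```

This is why 4-bit GGUF quantization is the usual route for running Phi-3-mini on consumer hardware: it cuts the weight footprint by roughly 4x relative to fp16.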
Benchmarks and standing
At launch in April 2024, Phi-3-mini's benchmark profile was notable for a sub-4B model. On MMLU (general knowledge), Phi-3-mini scored 68.8%, exceeding both Mistral 7B (61.7%) and Gemma 7B (63.6%) with fewer parameters. Microsoft positioned Phi-3-mini as matching the overall performance profile of Mixtral 8x7B and GPT-3.5 on selected benchmarks, a claim that generated attention given the size differential.
The larger variants extended the gains. Phi-3-small (7B) achieved 75% on MMLU and 8.7 on MT-Bench (chat quality); Phi-3-medium (14B) reached 78% on MMLU and 8.9 on MT-Bench. These figures placed Phi-3-medium competitively against significantly larger open-weights models available at the same time.
The relevant comparison set for Phi-3-mini at launch was Meta's Llama 3 8B (April 2024) and Mistral AI's Mistral 7B. Microsoft's published numbers showed Phi-3-mini outperforming both on reasoning-heavy tasks despite the parameter deficit, which the company attributed to the curated training data rather than architectural novelty.
By April 2026, Phi-3's benchmark position is primarily of historical interest. Phi-4 and the subsequent Phi-4-mini variants have superseded Phi-3 at equivalent or smaller sizes, and the wider open-weights field has advanced substantially. On current evaluation suites, Phi-3-mini is no longer competitive with 7B-class models released after it, such as Llama 3.1 8B and Qwen 2 7B.
Access and pricing
The full Phi-3 family is available through the microsoft organization on Hugging Face. Each variant ships in instruction-tuned form, with GGUF-quantized files for llama.cpp deployment also published in the microsoft namespace for Phi-3-mini. All weights are under the MIT license, which requires only that the copyright and license notice be retained.
For managed inference without self-hosting, Azure AI Studio provides Phi-3 models as serverless API endpoints with per-token pricing. This path integrates with existing Azure subscription agreements and provides the access controls, content filtering, and monitoring tooling of the Azure AI platform. Azure AI Studio also hosts Phi-3-vision under the same managed endpoint model.
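Per-token billing makes capacity planning on the managed endpoints a simple arithmetic exercise. A sketch with entirely hypothetical rates (actual Azure AI Studio prices vary by model and region and are not reproduced here):

```python
def monthly_token_cost(requests_per_day, avg_in_tokens, avg_out_tokens,
                       in_rate_per_1k, out_rate_per_1k, days=30):
    """Estimate monthly spend for a per-token-billed endpoint.

    Rates are given per 1,000 tokens; all figures used below are
    placeholders, not actual Azure pricing.
    """
    per_request = (avg_in_tokens / 1000 * in_rate_per_1k
                   + avg_out_tokens / 1000 * out_rate_per_1k)
    return requests_per_day * per_request * days

# e.g. 10,000 requests/day, 600 prompt + 150 completion tokens each,
# at hypothetical $0.0003 in / $0.0009 out per 1K tokens:
cost = monthly_token_cost(10_000, 600, 150, 0.0003, 0.0009)  # ~$94.50/month
```

The same arithmetic is the usual basis for comparing managed endpoints against the fixed cost of self-hosted GPU capacity.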
The Ollama library provides pre-packaged quantized variants for local desktop deployment, making Phi-3-mini accessible for single-command setup on consumer hardware. Integration with the broader llama.cpp ecosystem means Phi-3 is supported by most open-source inference frontends.
Fine-tuning is supported through standard frameworks including Hugging Face Transformers and the Azure AI fine-tuning service, which Microsoft documented with Phi-3 as a reference model after the initial release.
Comparison
Direct comparisons relevant to Phi-3's original market position and its legacy standing:
- Phi-4 (Microsoft AI). The direct successor, released December 2024. Phi-4 at 14B parameters significantly outperforms Phi-3-medium on mathematics and reasoning tasks and is Microsoft's current small-model recommendation. For new deployments where Phi-3-medium was the choice, Phi-4 is the upgrade path; for extremely constrained on-device deployment, Phi-4-mini likewise surpasses Phi-3-mini.
- Llama 4 Scout and Maverick (Meta AI). Llama 4's open-weights variants are far larger and more capable than Phi-3 across all benchmarks, representing a different point on the capability-size tradeoff. For organizations that need open weights but can accommodate multi-GPU inference, Llama 4 is the contemporary default; Phi-3 remains relevant only for single-device or constrained-memory deployments.
- Mistral 7B (Mistral AI). At the same 7B-class parameter range as Phi-3-small, Mistral 7B was a direct peer at launch. Phi-3-small benchmarked above Mistral 7B on most tasks, though Mistral's subsequent releases (Mistral Nemo, Mistral Small) have since moved the comparison forward. Both licenses -- Phi-3's MIT and Mistral's Apache 2.0 -- are permissive for commercial deployment.
- Qwen 2 7B (Alibaba Qwen). Released mid-2024, Qwen 2 7B represents a comparable-sized open-weights model from a different lab that benchmarks competitively with Phi-3-small on general tasks and shows stronger multilingual performance. For multilingual or Asian-language deployments at 7B scale, Qwen 2 7B is the more natural peer.
Outlook
Phi-3's future trajectory as of April 2026:
- Legacy production stability. Phi-3 weights are MIT-licensed and broadly cached across deployment infrastructure globally. The model will remain in production use -- particularly Phi-3-mini for constrained-device workloads -- for years regardless of Microsoft's forward roadmap, simply because changing an embedded inference stack carries switching costs.
- Phi-4 as the migration target. Microsoft has made clear that Phi-4 and its mini variants are the forward family for new deployments. Developers with existing Phi-3 integrations face a straightforward migration: Phi-4 is architecturally compatible at similar sizes and substantially more capable. The primary migration barrier is not technical but operational -- retesting and validating outputs for production-critical applications.
- The data-quality thesis confirmed. Phi-3's lasting contribution is less its current benchmark standing than its role in validating the synthetic data approach at scale. Subsequent small-model releases across the industry -- including Google's Gemma series and Meta's smaller Llama variants -- have adopted similar data-curation strategies. The 2023-2024 Phi research line is now a widely cited reference point for the premise that training data quality, not raw parameter count, is the primary lever for small-model capability.
Sources
- Introducing Phi-3: Redefining what's possible with SLMs. Microsoft Azure blog announcement, April 2024.
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. arXiv:2404.14219, Microsoft Research, April 2024.
- Hugging Face: microsoft/Phi-3-mini-128k-instruct. Model card with architecture details, benchmark results, and license.
- Hugging Face: microsoft/Phi-3-vision-128k-instruct. Model card for the multimodal Phi-3-vision variant.
- New models added to the Phi-3 family, available on Microsoft Azure. Microsoft Azure blog, covering Phi-3-small and Phi-3-medium release.
- Phi-2: The surprising power of small language models. Microsoft Research blog, December 2023. Context for the Phi-3 lineage.
- Textbooks Are All You Need. arXiv:2306.11644, Microsoft Research, 2023. Original Phi-1 paper establishing the synthetic data thesis.
- Tiny but mighty: The Phi-3 small language models with big potential. Microsoft Source feature, April 2024.