gpt-oss
gpt-oss is OpenAI's open-weights mixture-of-experts (MoE) language model family released in August 2025, the company's first major open-weights model release since GPT-2 in 2019. It is available in two sizes, gpt-oss-20b and gpt-oss-120b (also identified in some documentation as the 117B variant), both distributed through Hugging Face and GitHub under a permissive license for self-hosted use. As of April 2026, gpt-oss occupies an unusual position in the AI landscape: a frontier lab's open-weights contribution to a tier that had been largely ceded to Meta, DeepSeek, and Alibaba for the preceding six years.
At a glance
- Lab: OpenAI
- Released: August 2025
- Modality: Text
- Open weights: Yes. Distributed under a permissive license. Both variants available on Hugging Face and GitHub.
- Variants: gpt-oss-20b (20B parameters, MoE) and gpt-oss-120b (approximately 117B parameters, MoE)
- Pricing: Free for self-hosting; partner inference endpoints set their own per-token pricing
- Distribution channels: Hugging Face openai organization (gpt-oss-20b and gpt-oss-120b), GitHub openai, partner inference endpoints including Together AI, Fireworks AI, and Groq
Origins
OpenAI released GPT-2 in February 2019, initially in staged form due to concerns about misuse -- a decision that generated considerable debate at the time about whether the lab was withholding capable models from the research community. The staged release eventually concluded with full public availability of GPT-2, but subsequent OpenAI models, including GPT-3 (2020), InstructGPT (2022), GPT-3.5, GPT-4, and the o-series reasoning models, were available only through OpenAI's API and hosted products, with no public weights.
During this period, the open-weights frontier developed independently. Meta released Llama in February 2023 under a restricted research license; the weights leaked publicly within weeks, and Meta moved to formal open-weights distribution with subsequent releases, establishing a large community of derivative models. Llama 2 (July 2023) and Llama 3 (April 2024) brought successive capability improvements under open-weights licensing. DeepSeek released V3 in December 2024 at dramatically lower reported training cost than contemporaneous Western models, and R1 in January 2025 with reinforcement-learning-trained reasoning capabilities matching OpenAI's o1. Alibaba's Qwen line grew from a niche multilingual option into a general frontier-tier competitor by 2025. By mid-2025, multiple open-weights models in the tens-of-billions to hundreds-of-billions parameter range were broadly comparable to or approaching the performance of mid-tier closed frontier models.
OpenAI's position through this period was that of a company holding the leading closed-weights frontier model line while offering no open-weights language model beyond the aging GPT-2 weights. The company framed its closed posture in terms of safety evaluation and responsible deployment, though critics noted the commercial incentive to keep frontier capabilities exclusively behind the API.
The August 2025 release of gpt-oss marked the reversal of that posture. OpenAI released two MoE variants simultaneously: gpt-oss-20b and gpt-oss-120b (internally referenced as the 117B variant). Both were described as "a contribution to the open-source ecosystem," positioning the release as aligned with the open-weights community rather than as a strategic concession. The timing came roughly six months after DeepSeek R1's public success had drawn sustained attention to the cost and capability profile of open-weights models from non-US labs.
Capabilities
Both gpt-oss variants use a mixture-of-experts architecture, in which only a subset of the model's total parameters are active during any given inference step. This allows the models to achieve quality levels associated with larger dense models while keeping per-token compute requirements lower, at the cost of higher total memory for storing the full parameter set.
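To make the routing mechanism concrete, the sketch below implements a generic top-k MoE layer in PyTorch. It is illustrative only: the expert count, dimensions, and k value are placeholders, not gpt-oss's published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k mixture-of-experts layer.

    Expert count, hidden sizes, and k are placeholders, not
    gpt-oss's actual configuration.
    """
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each token against each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                      # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # keep the k best experts per token
        weights = F.softmax(weights, dim=-1)         # normalize the kept scores
        out = torch.zeros_like(x)
        # Only the selected experts run for each token: per-token compute
        # scales with k, while memory must hold all n_experts parameter sets.
        for e, expert in enumerate(self.experts):
            rows, slot = (idx == e).nonzero(as_tuple=True)
            if rows.numel():
                out[rows] += weights[rows, slot, None] * expert(x[rows])
        return out
```

The loop over experts keeps the sketch readable; production MoE implementations batch tokens per expert and add load-balancing losses, details omitted here.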
gpt-oss-20b is the smaller variant, suited to deployment contexts with tighter compute constraints. Its tens-of-billions parameter count places it in a similar quality tier to dense models such as Llama 3 70B or Qwen 3 72B for self-hosted use, and its MoE design is intended to make that quality level tractable on modest multi-GPU server configurations rather than the largest available hardware.
gpt-oss-120b, also documented at approximately 117 billion parameters, targets the higher end of the open-weights self-hosted tier. In text reasoning, coding, and mathematics, it is intended to represent OpenAI's best open-weights offering. The MoE architecture means its active-parameter count during inference is substantially below the full 120B, reducing inference cost compared to a dense model of comparable total size.
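A back-of-envelope calculation shows why the active/total distinction matters: per-token compute for a transformer is commonly approximated as about 2 FLOPs per active parameter, while weight memory scales with total parameters. The active-parameter figure below is an assumption for illustration; the official number is in the model card, not this section.

```python
# Back-of-envelope: per-token compute scales with ACTIVE parameters,
# memory with TOTAL parameters (~2 FLOPs per active parameter per token).
total_params  = 117e9   # gpt-oss-120b total, per the documentation above
active_params = 5e9     # ASSUMED active-per-token count, for illustration only

flops_per_token_moe   = 2 * active_params       # ~1.0e10 FLOPs per token
flops_per_token_dense = 2 * total_params        # ~2.3e11 FLOPs for a dense 117B
weight_memory_gb      = total_params * 2 / 1e9  # ~234 GB at bf16 (2 bytes/param)

print(f"MoE per-token compute:  {flops_per_token_moe:.2e} FLOPs")
print(f"Dense equivalent:       {flops_per_token_dense:.2e} FLOPs")
print(f"Weight memory (bf16):   {weight_memory_gb:.0f} GB")
```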
Both models support instruction following, multi-turn dialogue, code generation, document summarization, and question answering. OpenAI applied its safety evaluation pipeline before release, a point the company has highlighted as differentiating gpt-oss from open-weights releases by labs with different safety processes.
Benchmarks and standing
Specific benchmark scores for gpt-oss variants were not reported in the main Artificial Analysis Intelligence Index composite rankings as of April 2026; the index focuses primarily on API-available models. On the benchmarks reported at launch, gpt-oss-120b showed competitive performance against the leading open-weights models of the same period.
The relevant comparison set for gpt-oss is the open-weights tier rather than the closed frontier. The closed frontier leaders -- GPT-5.5 at 60.24, Claude Opus 4.7 at 57.28, and Gemini 3.1 Pro at 57.18 on the Artificial Analysis Intelligence Index -- represent a capability level gpt-oss does not attempt to match. The practical comparison is against Llama 4 Maverick (mid-40s on the index), DeepSeek V4 Pro (51.51), and Qwen 3 in the same self-hosted parameter range.
On GPQA Diamond (graduate-level scientific reasoning) and AIME 2025 (advanced mathematics), gpt-oss-120b was positioned as competitive with the leading open-weights models in its parameter class. On coding benchmarks such as HumanEval+ and SWE-bench Verified, results had not been independently reproduced at a scale comparable to the Llama 4 or DeepSeek evaluations as of April 2026.
Benchmark positions are point-in-time. The open-weights tier has been moving quickly through 2025 and 2026, with new releases from Meta, DeepSeek, and Alibaba arriving on timescales of months.
Access and pricing
gpt-oss model weights are available through the openai organization on Hugging Face:
- gpt-oss-20b: https://huggingface.co/openai/gpt-oss-20b
- gpt-oss-120b: https://huggingface.co/openai/gpt-oss-120b
The OpenAI GitHub organization hosts reference inference code and documentation for both models. The license permits broad self-hosted use, including commercial applications, without per-token payment.
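For self-hosting, a minimal loading path is the generic Hugging Face transformers interface, sketched below. This assumes the repositories are compatible with the standard AutoModelForCausalLM API; the model cards document the exact hardware, quantization, and library-version requirements.

```python
# Minimal sketch: local inference with gpt-oss-20b via Hugging Face
# transformers. Assumes the repo supports the generic AutoModel API;
# check the model card for hardware and version requirements.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # spread layers across available GPUs
)

messages = [{"role": "user", "content": "Summarize mixture-of-experts in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```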
For organizations that prefer managed inference without self-hosting, partner platforms provide API access:
- Together AI and Fireworks AI offer gpt-oss-20b and gpt-oss-120b via REST API with per-token pricing set by each platform.
- Groq provides gpt-oss inference on its LPU architecture for low-latency, high-throughput serving.
These partner endpoints operate independently from OpenAI's standard API pricing. Per-token rates vary by platform and are not set by OpenAI directly.
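In practice, hosted-inference platforms of this kind typically expose OpenAI-compatible chat-completions endpoints, so the standard client can be repointed at a partner's base URL. The base URL and model identifier below are illustrative assumptions; each platform documents its own values and pricing.

```python
# Sketch: querying a partner-hosted gpt-oss endpoint through an
# OpenAI-compatible API. The base URL and model id are assumptions
# for illustration; consult the platform's own documentation.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # assumed partner endpoint
    api_key="YOUR_PARTNER_API_KEY",
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed platform-side model identifier
    messages=[{"role": "user", "content": "Explain your context window limits."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```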
There is no consumer product surface for gpt-oss comparable to ChatGPT. The models are available only through the weights distribution and partner inference APIs, not through OpenAI's own consumer or enterprise products.
Comparison
Direct competitors to gpt-oss in the open-weights text tier, as of April 2026:
- Llama 4 (Meta AI). The highest-profile open-weights model family as of April 2026. Llama 4 Scout uses 109 billion total parameters with 17 billion active, likewise on a MoE architecture, while the larger Maverick scales the same design to roughly 400 billion total parameters. Both gpt-oss-120b and Scout target the same self-hosted deployment tier. Llama 4 benefits from a larger ecosystem of inference tooling, fine-tuned derivatives, and consumer hardware optimization built over the three prior Llama generations. gpt-oss has the advantage of OpenAI's safety-evaluation pipeline and brand recognition in enterprise contexts.
- DeepSeek V4 (DeepSeek). A 1.6-trillion-parameter MoE with 49 billion active parameters per step, released April 2026. V4 Pro leads most open-weights models on the Artificial Analysis Intelligence Index at 51.51, and DeepSeek's own API is priced substantially below Western frontier labs' offerings. The principal consideration against DeepSeek V4 in regulated or US-federal-procurement contexts is its Chinese-lab origin and the supply-chain and data-residency policy reviews that origin triggers.
- Qwen 3 (Alibaba Qwen). Alibaba's open-weights line benchmarks competitively with Llama 4 Maverick and shows particular multilingual strength on Asian-language tasks. The same origin considerations apply as for DeepSeek.
- Mistral Large 2 (Mistral). An open-weights model from the French frontier lab, distributed under Mistral's research license with commercial terms licensed separately, targeting similar enterprise self-hosted deployments. It operates at a smaller total parameter count than gpt-oss-120b and DeepSeek V4. Mistral's European origin is a consideration for organizations with EU data-residency requirements.
gpt-oss occupies a distinctive position within this field: it is the only open-weights model in this tier released by a lab that also holds the leading closed-weights frontier model (GPT-5.5). That association carries OpenAI's safety evaluation credibility and enterprise brand trust, though it also raises the question of how deliberately OpenAI calibrates the capability gap between its open and closed releases.
Outlook
Open questions for gpt-oss and OpenAI's open-weights strategy over the next 6 to 18 months:
- Successor cadence. Whether OpenAI releases a gpt-oss-2 or analogous next-generation open-weights model, and on what timeline. The six-week release cadence for GPT-5.x closed flagships has not been mirrored on the open-weights side. A single gpt-oss release with no successor would position the August 2025 release as a strategic experiment rather than a sustained open-weights commitment.
- The deliberate capability gap. gpt-oss was not released at the capability level of GPT-5.5 or even GPT-5. OpenAI controls where it draws the line between what the open-weights variant can do and what requires the closed API. Whether that line shifts in successor releases -- whether gpt-oss-2 is meaningfully closer to GPT-5.5 than gpt-oss-120b is to GPT-5 -- will be one of the more closely watched signals in the open-weights tier.
- Licensing dynamics. The permissive license on gpt-oss is a departure from the Llama Community License, which restricts providers above approximately 700 million monthly active users. Whether OpenAI introduces similar restrictions in future releases, or keeps the license fully permissive, will affect how third-party platforms and large-scale deployments treat gpt-oss weights.
- Whether other frontier labs follow. Anthropic and Google DeepMind have not released open-weights frontier-tier models. OpenAI's return to open weights could create pressure on other closed labs, particularly if gpt-oss proves commercially valuable in terms of ecosystem goodwill and indirect API adoption. The precedent from DeepSeek R1's success -- a Chinese open-weights model generating substantial Western developer adoption -- may be the more relevant case study.
- Relationship to OpenAI's enterprise strategy. Sam Altman identified enterprise as OpenAI's top priority in 2026. Open-weights models create a potential substitution path away from OpenAI's API for cost-sensitive enterprise customers. How OpenAI manages that tension -- releasing open weights to compete with DeepSeek and Llama while preserving the closed-API revenue base -- is a structural question for the business model.
Sources
- Hugging Face: openai/gpt-oss-20b. Official model card and weights for the 20B variant.
- Hugging Face: openai/gpt-oss-120b. Official model card and weights for the 120B variant.
- GitHub: openai organization. Reference code and documentation for OpenAI open-source releases including gpt-oss.
- OpenAI: openai.com. Company announcements and research releases.
- Wikipedia: OpenAI. Company history, including GPT-2 release and the subsequent closed-weights period through 2025.
- Wikipedia: GPT-2. Documentation of the 2019 open-weights release that preceded gpt-oss by six years.
- Artificial Analysis Intelligence Index. Composite frontier benchmark scores; April 2026 data used in this profile for closed-frontier comparison points.
- LMArena leaderboard. Head-to-head human preference ELO evaluations for open-weights peer comparisons.