AFM-4.5B

AFM-4.5B is a 4.5-billion-parameter dense open-weights language model from Arcee AI, trained on 8 trillion tokens curated by DatologyAI for enterprise-grade compliance and released under Apache 2.0 with an extended 64,000-token context length.

AFM-4.5B is the inaugural model in the Arcee Foundation Model family, a 4.5-billion-parameter dense language model developed by Arcee AI and released open-weights under the Apache 2.0 license in 2025. The model was trained on 8 trillion tokens of curated data through a partnership with DatologyAI, with a deliberate focus on Western compliance through exclusion of copyrighted books and unclear-license material. AFM-4.5B is positioned as an enterprise-grade small foundation model for instruction-following workloads on commodity infrastructure, with reported throughput of 200-plus CPU tokens per second and benchmark performance characterized by Arcee as outperforming comparably sized peers including Qwen3-4B and Gemma3-4B.

At a glance

  • Lab: Arcee AI
  • Released: 2025. Initial preview followed by general availability of the base and instruction-tuned variants. Context-extension to 64,000 tokens released as a follow-on milestone.
  • Modality: Text. Base and instruction-tuned variants for general-purpose language modeling, instruction-following, and enterprise inference workloads.
  • Open weights: Yes. AFM-4.5B-Base and the instruction-tuned variants are released under the Apache 2.0 license through the Arcee AI organization on Hugging Face.
  • Context window: 4,096 tokens at native pretraining length. Extended to 64,000 tokens through advanced model-merging and distillation techniques applied post-pretraining.
  • Pricing: Free for self-hosted deployment under Apache 2.0. Hosted access through Together AI and through Arcee Cloud direct enterprise contracts.
  • Distribution channels: Arcee AI organization on Hugging Face, Together AI hosted API, and Arcee Cloud direct enterprise platform.

Origins

Arcee AI was founded in June 2023 in Miami, Florida, by Mark McQuade (formerly of Hugging Face's monetization team and Roboflow's field engineering organization), Jacob Solawetz, and Brian Benedict. The company's founding thesis was that enterprise adoption of generative AI was being held back by security, compliance, and deployment concerns that horizontal frontier models from OpenAI and Anthropic could not address through an API alone. The first product line centered on domain-adaptive fine-tuning of third-party open-weights models. A February 2024 merger brought the open-source mergekit toolkit, created by Charles Goddard, into the company, giving Arcee a research surface and an open-source community presence.

The strategic shift toward training foundation models from scratch came in 2025. The Arcee Foundation Model family announcement framed the program as a deliberate move from selling fine-tuned variants of third-party models toward owning the training stack end-to-end. AFM-4.5B was the first production release in the family, with a sparse-activated mixture-of-experts variant (in the 120-to-140-billion-parameter total range) following as the data-center-scale entry.

The training pipeline was built in partnership with DatologyAI, whose data-curation methodology applies a suite of proprietary algorithms including model-based quality filtering, embedding-based curation, target distribution-matching, source mixing, and synthetic data generation. The curation was specifically aimed at producing a training corpus suitable for Western enterprise use, with deliberate exclusion of copyrighted books and material with unclear licensing.

The training infrastructure used Amazon SageMaker HyperPod orchestrated across 512 Nvidia H200 GPUs. Pretraining ran for approximately 5 trillion tokens on the main DatologyAI-curated corpus, followed by an additional 1.5 trillion tokens of midtraining data with an enhanced focus on mathematical reasoning and code generation; a final phase, roughly the last 10 percent of training, mixed targeted distributions of math, code, and reasoning samples at roughly a 1:1 ratio with general web data. Total training amounted to 8 trillion tokens.

The instruction-tuned variant underwent supervised fine-tuning on high-quality instruction datasets, followed by reinforcement learning against verifiable rewards and human preferences. The context-extension milestone, expanding the window from 4k to 64k tokens, was achieved through aggressive experimentation with model merging, distillation, and "model soup" techniques rather than through additional long-context pretraining.
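
For readers unfamiliar with the "model soup" family of techniques, the core idea is weight-space averaging of multiple fine-tuned checkpoints. The sketch below is a generic, minimal illustration of that idea using the Hugging Face transformers API; the checkpoint paths are hypothetical, and this is not Arcee's actual context-extension recipe, which also involved distillation and more elaborate merging.

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical fine-tuned checkpoints to average; substitute real paths or repo ids.
checkpoint_paths = ["ckpt-a", "ckpt-b", "ckpt-c"]

# Load the first checkpoint and use its state dict as the accumulator.
soup = AutoModelForCausalLM.from_pretrained(checkpoint_paths[0], torch_dtype=torch.float32)
avg_state = {k: v.clone() for k, v in soup.state_dict().items()}

# Sum the remaining checkpoints' floating-point parameters.
for path in checkpoint_paths[1:]:
    other = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float32)
    for k, v in other.state_dict().items():
        if avg_state[k].is_floating_point():
            avg_state[k] += v

# Divide by the number of checkpoints to obtain the uniform "soup".
for k in avg_state:
    if avg_state[k].is_floating_point():
        avg_state[k] /= len(checkpoint_paths)

soup.load_state_dict(avg_state)
soup.save_pretrained("./model-soup")  # hypothetical output directory
```

In practice, checkpoints are often weighted unevenly or merged only over selected layers; tooling such as mergekit (discussed below) automates those variants.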

Capabilities

AFM-4.5B is built specifically for enterprise inference workloads on commodity infrastructure. Three capability features distinguish it from peer small open-weights models.

The first is the data-curation provenance. The 8-trillion-token training corpus was curated by DatologyAI with explicit focus on Western enterprise compliance: copyrighted books and unclear-license material were deliberately excluded. The curation provenance is the principal differentiator that Arcee positions against peer small models such as Qwen and Llama, where data-licensing disclosure is more limited. The compliance-oriented training data targets buyers in regulated industries (financial services, healthcare, government) where data-licensing exposure is a procurement-blocking concern.

The second is the throughput-optimized inference profile. Arcee reports throughput above 200 tokens per second on standard server-class CPU configurations, which makes the model practical to deploy without GPU acceleration. The CPU-throughput target matters for enterprise environments that lack GPU infrastructure or that prefer CPU-based deployment for cost or compliance reasons.
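
Throughput figures of this kind depend heavily on the CPU, thread count, and quantization scheme, so they are best verified on the target deployment environment. A minimal measurement harness might look like the following; the Hugging Face repository id is assumed for illustration and should be replaced with the checkpoint actually deployed.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/AFM-4.5B"  # assumed repository id; substitute the deployed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model.eval()

prompt = "List three compliance considerations for deploying language models on-premises."
inputs = tokenizer(prompt, return_tensors="pt")

# Time greedy decoding of a fixed number of new tokens on CPU.
start = time.perf_counter()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"Decoded {new_tokens} tokens in {elapsed:.1f}s ({new_tokens / elapsed:.1f} tok/s)")
```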

The third is the extended context length. Native pretraining at 4,096 tokens was extended to 64,000 tokens through model-merging and distillation, applied post-pretraining as a separate context-extension milestone. The 64k-token context supports document analysis, multi-turn dialogue with extended history, and retrieval-augmented generation workflows that exceed the native pretraining length.

The Arcee mergekit toolkit underpins the context-extension pipeline and is publicly available as an open-source utility, which positions AFM-4.5B's context-extension methodology as reproducible by other practitioners in the open-weights community.
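
As a concrete illustration of how mergekit is typically driven, the sketch below writes a simple linear-merge configuration and invokes the mergekit-yaml command-line entry point. The checkpoint names and weights are hypothetical, and this is not the specific merge-and-distill recipe behind the 64k extension.

```python
import subprocess
import yaml

# Illustrative linear merge of two hypothetical checkpoints; the weights are arbitrary.
merge_config = {
    "merge_method": "linear",
    "dtype": "bfloat16",
    "models": [
        {"model": "org/checkpoint-long-context", "parameters": {"weight": 0.5}},
        {"model": "org/checkpoint-instruct", "parameters": {"weight": 0.5}},
    ],
}

with open("merge.yml", "w") as f:
    yaml.safe_dump(merge_config, f)

# mergekit-yaml <config> <output_dir> is the CLI entry point installed with mergekit.
subprocess.run(["mergekit-yaml", "merge.yml", "./merged-model"], check=True)
```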

Benchmarks and standing

Arcee reports state-of-the-art performance for AFM-4.5B among comparably sized peers at the 4-billion-parameter dense scale.

By the 2-trillion-token mark of training, Arcee reports, AFM-4.5B was already outperforming competing models trained on dramatically larger but noisier datasets. The company's published benchmarks show the model outperforming Qwen3-4B and Gemma3-4B across instruction-following and reasoning evaluations.

On standard evaluations relevant to the 4-billion-parameter dense model class, AFM-4.5B reports competitive scores on MMLU, HumanEval (code generation), GSM8K (mathematical reasoning), MMLU-Pro, and adjacent benchmarks. The instruction-tuned variant additionally reports competitive performance on instruction-following evaluations including IFEval and adjacent benchmarks.

The benchmark profile is positioned for the enterprise-deployment segment rather than for the frontier-tier comparison set. Standard horizontal language model benchmarks (Artificial Analysis Intelligence Index, LMArena, GPQA Diamond, AIME, SWE-bench Verified) place small dense models including AFM-4.5B in the small-model tier rather than at the frontier. The principal comparison set is Microsoft Phi-3 and Phi-4, Qwen-2.5 and Qwen-3 small variants, Llama-3.2-3B and adjacent peers, and the Zamba2 family from Zyphra.

Industry coverage has consistently grouped AFM-4.5B with the enterprise small-language-model segment alongside Mistral, Cohere, and AI21 Labs, with the data-curation provenance and Western compliance positioning as the principal differentiator versus Chinese open-weights peers.

Benchmark leadership in the small-and-efficient segment rotates rapidly. AFM-4.5B's standing is representative of the 2025 enterprise small-model landscape; successor models and peer releases through 2026 continue to compress the segment.

Access and pricing

AFM-4.5B weights are distributed through the Arcee AI organization on Hugging Face under the Apache 2.0 license. Both base (AFM-4.5B-Base) and instruction-tuned variants are available, with adjacent variants including AFM-4.5B-Base-KDA-NoPE for specific positional-encoding configurations.

Self-hosted deployment is free under the open license. The 4.5-billion-parameter dense profile is feasible to deploy on a wide range of commodity hardware, including server-class CPU configurations (consistent with Arcee's reported 200-plus CPU tokens per second throughput) and consumer-grade GPU configurations.
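
A quick back-of-the-envelope calculation shows why the 4.5-billion-parameter scale fits commodity hardware: the weights alone occupy roughly 9 GB at half precision, and common quantization schemes halve that again. The sketch below is plain arithmetic and excludes KV cache, activations, and framework overhead.

```python
# Weights-only memory footprint of a 4.5B-parameter dense model at common precisions.
PARAMS = 4.5e9

for label, bytes_per_param in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{label:>9}: ~{gib:.1f} GiB")
```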

Hosted API access is available through Together AI at standard Together pricing for the parameter scale. Direct enterprise deployment is available through Arcee Cloud, which provides managed hosting, fine-tuning, and compliance controls for regulated-industry buyers.

The Arcee Cloud commercial platform reaches enterprise buyers through Microsoft Azure (via the Microsoft M12 strategic-investor relationship), Samsung Next portfolio relationships, and additional corporate-strategic partner channels.

Comparison

Direct competitors and adjacent open-weights small language models:

  • Microsoft Phi-3 and Phi-4 (Microsoft). The principal small-model peer from a frontier-adjacent lab. Phi-4 is the leading entry in the small-model class on aggregate benchmarks through 2025 and 2026. AFM-4.5B competes on data-curation provenance and Apache 2.0 licensing flexibility versus Phi's Microsoft-tied distribution.
  • Qwen-3 small variants (Alibaba Qwen). Chinese open-weights peers at adjacent parameter scales. AFM-4.5B's positioning emphasizes Western enterprise compliance versus Qwen's Chinese-origin training data and supply chain.
  • Llama-3.2-3B and Llama-3.2-1B (Meta AI). US open-weights peers at adjacent parameter scales. Broad community ecosystem and longer-running tooling support; AFM-4.5B's data-curation transparency is the differentiator.
  • Gemma-2 and Gemma-3 small variants (Google DeepMind). Google open-weights small-model peers. Adjacent benchmark positioning.
  • Mistral AI Ministral and small Mistral variants. Direct European enterprise small-model peer.
  • Zamba2 (Zyphra). Hybrid Mamba2-transformer small-model family at adjacent parameter scale (2.7B and 7B). Different architectural positioning but overlapping enterprise deployment market.
  • DeepSeek small variants (DeepSeek). Chinese open-weights small-model peers.

AFM-4.5B's distinctive position among 2025-vintage open-weights small language models rests on Apache 2.0 licensing, the DatologyAI-curated 8-trillion-token training corpus with its explicit Western compliance focus, the 64,000-token context length achieved through model-merging techniques, and a throughput-optimized inference profile suited to CPU-based enterprise deployment.

Outlook

Open questions for AFM-4.5B over the next 6 to 18 months:

  • Successor model cadence. AFM-4.5B is the first dense entry in the Arcee Foundation Model family. The sparse-activated mixture-of-experts variant (120 to 140 billion total parameters) extends the line at data-center scale. The reported one-trillion-parameter open-weights training plan, contingent on the rumored 2026 Series B close, would mark a substantial scale-up in the model family if executed.
  • Enterprise deployment evidence. Arcee Cloud and the corporate-strategic partner channels (Microsoft, Samsung, Hitachi, Wipro) are the principal commercial distribution surfaces. Named enterprise reference customers and adoption depth are watchable signals.
  • Competitive position against Phi-4 and Qwen. Microsoft Phi-4 and Qwen-3 small variants continue to lead aggregate benchmark performance in the small-model segment. Whether AFM-4.5B's compliance-and-licensing positioning translates into commercial wins where benchmark leadership is held by peers is the central commercial test.
  • Mergekit and post-training tooling differentiation. Arcee's open-source mergekit ecosystem has been a structural advantage for the company's positioning in the post-training community. Continued investment in the toolkit and adjacent post-training tooling is one of the few open-source-community-anchored differentiation vectors available to enterprise small-model labs.
  • Context-extension generalization. The 4k-to-64k context extension via model-merging is a methodologically interesting result. Whether the technique generalizes to larger models or to other architectures remains an open research question.
