StepFun

StepFun is a Chinese artificial intelligence company founded in 2023 by Jiang Daxin, former Chief Scientist at Microsoft Software Technology Center Asia. It focuses on multimodal foundation models and is known for the Step family of text, vision, audio, and video models.

StepFun, known in Chinese as Shanghai Jieyue Xingchen Intelligent Technology (上海阶跃星辰智能科技), is a Chinese artificial intelligence company founded in April 2023 by Jiang Daxin, the former Global Vice President and Chief Scientist at Microsoft Software Technology Center Asia. The company is headquartered in Shanghai and develops the Step family of multimodal foundation models spanning text, vision, audio, and video, with a research focus on architectural innovations including Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD). StepFun closed an approximately $700 million Series B+ funding round in January 2026 and was reported to be exploring a Hong Kong IPO in early 2026 alongside the broader Chinese AI Tiger public-listing wave.

At a glance

  • Founded: April 6, 2023 in Shanghai by Jiang Daxin.
  • Status: Private. Reported to be exploring a Hong Kong IPO as of February 2026.
  • Funding: Approximately $1.7 billion cumulative through April 2026, including a January 2026 Series B+ round of approximately $700 million (RMB 5 billion-plus).
  • CEO: Jiang Daxin (founder; former Global Vice President and Chief Scientist at Microsoft Software Technology Center Asia).
  • Other notable leadership: Senior research and engineering team includes alumni from OpenAI and ByteDance.
  • Open weights: Yes, partial. Selected Step-series releases, including Step-3 and component multimodal variants, have been distributed as open weights through Hugging Face. Closed-weights commercial flagships are available only through StepFun's API.
  • Flagship models: Step-3 (late 2025, 321-billion-parameter mixture-of-experts multimodal reasoning model with 38 billion active parameters), Step-Video (November 2024), Step-Audio (October 2024), Step-1V (multimodal, September 2024).

Origins

StepFun was founded on April 6, 2023, by Jiang Daxin and a small founding team drawn from Chinese research universities and from senior AI engineering positions at international and domestic technology companies. Jiang Daxin had been Global Vice President and Chief Scientist at Microsoft Software Technology Center Asia, with research credentials in natural-language processing, search, and large-scale machine-learning systems. The founding team included alumni of OpenAI and ByteDance among the senior research and engineering ranks.

The company's distinguishing technical bet from the founding period was on multimodal foundation models. Where peer Chinese AI Insurgents in the 2023 cohort (Z.ai / Zhipu AI, Moonshot AI, 01.AI, MiniMax) ranged across text-first and multimodal positions, StepFun's framing emphasized end-to-end multimodal capability across text, image, audio, and video as a single foundation model line.

The 2024 release sequence built out the multimodal portfolio. Step-1V launched in September 2024 as the company's first publicly released multimodal model. Step-Audio and Step-Video followed in October and November 2024, expanding the modality coverage. The closed-weights Step text models released across 2024 and 2025 anchored the commercial language-model line.

Step-3, released in late 2025, was the company's most technically ambitious release. The 321-billion-parameter mixture-of-experts model, with 38 billion active parameters, introduced two architectural innovations that received attention in the research community. Multi-Matrix Factorization Attention (MFA) was reported to cut KV-cache demands, bringing per-token attention cost to approximately 22 percent of DeepSeek V3's, an inference-economics improvement. Attention-FFN Disaggregation (AFD) decoupled attention and feed-forward layers into specialized subsystems for better hardware utilization.
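
For a sense of scale, the figures quoted for Step-3 can be turned into two quick ratios. The sketch below is illustrative arithmetic only, not StepFun's method: `active_fraction` and `capacity_multiplier` are hypothetical helper names, and it assumes the reported 22 percent ratio applies directly to per-token KV-cache bytes, which the reporting does not state precisely.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of a mixture-of-experts model's parameters used per token."""
    if total_params_b <= 0 or not 0 <= active_params_b <= total_params_b:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active_params_b / total_params_b


def capacity_multiplier(cost_ratio: float) -> float:
    """How many more tokens fit in a fixed cache budget when the
    per-token cost drops to `cost_ratio` of a baseline (assumption:
    the ratio applies to cache bytes per token)."""
    if not 0 < cost_ratio <= 1:
        raise ValueError("cost_ratio must be in (0, 1]")
    return 1.0 / cost_ratio


# Figures as reported in the text: 321B total, 38B active, ~22 percent ratio.
STEP3_TOTAL_B = 321
STEP3_ACTIVE_B = 38
REPORTED_RATIO = 0.22

if __name__ == "__main__":
    print(f"active fraction: {active_fraction(STEP3_TOTAL_B, STEP3_ACTIVE_B):.1%}")
    print(f"capacity multiplier: {capacity_multiplier(REPORTED_RATIO):.2f}x")
```

Under these assumptions, roughly 12 percent of the model's parameters are active per token, and a fixed cache budget would hold about 4.5 times as many tokens as the baseline.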

The funding trajectory through 2026 included an early Tencent-led investment that anchored the company's strategic-investor base, follow-on rounds bringing total private funding to approximately $1 billion by late 2024, and the January 2026 Series B+ of approximately $700 million. By February 2026, StepFun was reported to be exploring a Hong Kong IPO alongside the broader Chinese AI Tiger public-listing wave initiated by Z.ai and MiniMax.

Mission and strategy

StepFun's stated mission is to build artificial general intelligence through multimodal foundation models. Jiang Daxin has set out this strategic position in interviews, including a widely cited 2024 South China Morning Post (SCMP) piece in which he argued that Chinese AI development can benefit from larger models and more training data, against the view that algorithmic-efficiency advances alone are sufficient for Chinese labs.

The strategy combines three threads. First, multimodal foundation-model research, with the Step series providing capability across text, image, audio, and video in a unified architecture. Second, architectural-research investment in inference and training efficiency, with Step-3's MFA and AFD innovations as the most recent example. Third, enterprise and developer distribution through the StepFun API platform, with selective open-weights releases for community adoption.

The competitive premise is that multimodal capability will become a primary differentiator in 2026 and beyond, and that StepFun's combined architectural-research depth and multimodal portfolio breadth produce a structural advantage over text-first peer Chinese AI Insurgents. The architectural innovations in Step-3 are the company's most publicly visible technical contribution and are positioned as evidence that StepFun can produce inference-economics wins at scale comparable to DeepSeek's published efficiency advances.

Models and products

  • Step-3. Released late 2025. Multimodal reasoning model with 321 billion total parameters and 38 billion active parameters in a mixture-of-experts architecture. Introduced Multi-Matrix Factorization Attention and Attention-FFN Disaggregation as architectural innovations.
  • Step-Video. Released November 2024. Video-generation model, part of the multimodal Step series.
  • Step-Audio. Released October 2024. Audio-generation and speech model.
  • Step-1V. Released September 2024. Multimodal vision-language model.
  • Step text language models. Closed-weights commercial language-model line covering 2024 and 2025 releases at multiple scales.
  • StepFun API platform. Developer-and-enterprise commercial distribution channel.

The commercial channels are the StepFun API platform, selective Hugging Face open-weights distribution, and enterprise services for paying customers in China and reportedly for some international partners.

Benchmarks and standing

StepFun's Step-3 release in late 2025 was characterized in industry coverage as competitive on multimodal-reasoning benchmarks against contemporary Chinese open-weights releases. Standardized benchmark coverage of the Step family on the most prominent leaderboards (Artificial Analysis Intelligence Index, LMArena, SWE-bench Verified) has been more limited than for DeepSeek, Alibaba Qwen, and MiniMax releases, partly because the closed-weights commercial flagships are not directly evaluable on open leaderboards.

The architectural-research contribution from Step-3, particularly the MFA attention-cost reduction, has been cited in research-paper coverage and in industry analyses comparing Chinese AI labs' inference-efficiency innovations. The AFD framework has been characterized in industry coverage as a meaningful contribution to mixture-of-experts hardware-utilization research.

StepFun's standing in the Chinese AI landscape rests on the multimodal portfolio breadth, the architectural-research outputs from Step-3, the founding-team credentials at Microsoft Research Asia, and the strategic-investor support from Tencent and other backers.

Leadership

As of April 2026, StepFun's senior leadership includes:

  • Jiang Daxin, Chief Executive Officer and founder. Former Global Vice President and Chief Scientist at Microsoft Software Technology Center Asia. Public face for StepFun on the multimodal-AI strategic vision and the company's research direction. Has commented publicly through SCMP and other media on the broader Chinese AI development trajectory.

The senior research and engineering team includes alumni from OpenAI and ByteDance among others, though specific senior-leadership profiles beyond the founder have been less broadly covered in international media than for some peer Chinese AI Insurgents.

Funding and backers

StepFun's funding history through April 2026 includes approximately $1.7 billion cumulative across multiple rounds. The most recent round was the January 2026 Series B+ of approximately RMB 5 billion (approximately $700 million), preceded by earlier rounds bringing the company to a unicorn-threshold valuation in October 2024.

The investor base includes Tencent as a prominent strategic-investor anchor, alongside other Chinese venture and strategic capital. Specific lead-investor identities for individual rounds have been disclosed selectively. If it proceeds, the Hong Kong IPO reported to be under consideration in February 2026 would be the third Chinese AI Tiger public listing, after Z.ai and MiniMax.

Industry position

StepFun occupies a structurally distinctive position among Chinese AI labs. The combination of multimodal portfolio breadth, the architectural-research outputs from Step-3, the founding team's Microsoft Research credentials, and strategic-investor support from Tencent produces a profile that no other Chinese AI Insurgent matches across all four attributes. Industry coverage has frequently characterized StepFun as the most technically distinctive of the Chinese AI Tigers, with the multimodal-first positioning as the principal differentiation.

Strategic risks include the relatively lower public profile and developer-community presence compared to DeepSeek, Alibaba Qwen, Moonshot AI, and MiniMax, the operational complexity of executing a multimodal portfolio with limited engineering scale relative to incumbents, and the open question of whether multimodal-first capability translates to commercial wins against text-first competitors with stronger consumer-and-enterprise distribution. Strategic strengths include the architectural-research outputs, the multimodal portfolio coverage, the founder credentials, and the Tencent strategic relationship.

Competitive landscape

StepFun competes with several Chinese and international AI labs:

  • MiniMax. Direct Chinese multimodal competitor with broader consumer-application distribution. The Hailuo video line competes with Step-Video, and the Speech-02 and Music-01 audio lines compete with Step-Audio.
  • DeepSeek, Alibaba Qwen, Moonshot AI, Z.ai / Zhipu AI, 01.AI. Peer Chinese AI Tigers and Insurgents. StepFun's distinguishing features are the multimodal-first portfolio, the architectural-research contributions, and the Microsoft Research Asia founding lineage.
  • Baidu, Tencent Hunyuan, ByteDance Seed. Chinese internet-incumbent AI labs. Less direct overlap given StepFun's startup operating scale.
  • OpenAI, Google DeepMind, Anthropic. International frontier-model competitors. Less direct overlap on Chinese-domestic distribution.
  • Kuaishou Kling, Alibaba Wan. Chinese video-generation competitors against Step-Video.

Outlook

Several open questions affect StepFun's trajectory in 2026 and 2027:

  • The timing and capability profile of Step-4 or a successor flagship. Step-3 set a credible architectural-research bar; sustaining release cadence is the central technical question.
  • The Hong Kong IPO progression. Listing terms, lead investors, and post-listing capital deployment would be structurally significant signals.
  • Continued research outputs around MFA, AFD, and successor architectural innovations. The technical-research credibility is a key competitive asset.
  • Open-weights versus closed-weights distribution balance for the commercial flagships.
  • Continued senior-talent recruitment, particularly from Microsoft Research Asia, OpenAI, and ByteDance alumni networks.
  • US export-control developments affecting StepFun's compute infrastructure and the broader Chinese AI hardware-and-software ecosystem.

About the author
Nextomoro