Databricks

Databricks is an American enterprise data and AI platform company founded in 2013 by the creators of Apache Spark. It develops the Mosaic AI platform and the DBRX open-weights model and, at a reported $134 billion valuation, is one of the largest private enterprise-software companies globally.

Databricks is an American enterprise data and artificial intelligence platform company founded in 2013 by the original creators of Apache Spark, the open-source distributed-computing framework. The company is headquartered in San Francisco and develops the Lakehouse data platform, the Mosaic AI platform for enterprise generative AI, and the DBRX open-weights large language model. Databricks acquired MosaicML in July 2023 for approximately $1.3 billion, integrating MosaicML's large-language-model training capabilities into the Mosaic AI product line. In December 2025, Databricks closed a $4 billion Series L funding round at a reported $134 billion valuation, making the company one of the largest private enterprise-software organizations globally and one of the most-watched candidates for a 2026 or 2027 IPO.

At a glance

  • Founded: 2013 in Berkeley, California, by Ali Ghodsi, Matei Zaharia, Andy Konwinski, Reynold Xin, Patrick Wendell, Ion Stoica, and Arsalan Tavakoli, the original creators of Apache Spark at UC Berkeley's AMPLab.
  • Status: Private. Reported $134 billion valuation as of December 2025. Active IPO preparation as of January 2026 with $1.8 billion JPMorgan-led debt financing.
  • Funding: Approximately $19 billion cumulative private capital across Series A through Series L. Most recent major rounds: Series J of $10 billion in December 2024 at a $62 billion valuation, Series L of $4 billion in December 2025 at a $134 billion valuation.
  • CEO: Ali Ghodsi (co-founder; Chief Executive Officer since 2016).
  • Other notable leadership: Matei Zaharia (co-founder, Chief Technologist; Apache Spark creator and UC Berkeley professor, previously at Stanford), Ion Stoica (co-founder, Executive Chairman; UC Berkeley professor and AMPLab co-founder), Reynold Xin (co-founder, Chief Architect).
  • Open weights: Yes, partial. DBRX (March 2024) was released open-weights through Hugging Face. Other Mosaic AI commercial flagships are closed-weights and gated through the Databricks platform.
  • Flagship products: DBRX (132-billion-parameter mixture-of-experts model with 36 billion active, March 2024), Mosaic AI (the enterprise GenAI platform), Databricks Lakehouse (the foundational data platform), Databricks SQL, Mosaic AI Agent Framework.

Origins

Databricks was founded in 2013 in Berkeley, California, by the original creators of Apache Spark at UC Berkeley's AMPLab. The seven co-founders (Ali Ghodsi, Matei Zaharia, Andy Konwinski, Reynold Xin, Patrick Wendell, Ion Stoica, and Arsalan Tavakoli) had built Apache Spark as the principal next-generation distributed-computing framework after Hadoop, and Databricks was structured to commercialize Spark and other data-infrastructure technology. The company moved from Berkeley to San Francisco shortly after founding.

From 2013 to 2018, Databricks grew as the principal commercial vendor for Apache Spark, with enterprise adoption among data-engineering and data-science teams. The Lakehouse architectural framing emerged between 2017 and 2019, combining data-warehouse and data-lake capabilities in a single platform model. Ali Ghodsi became CEO in 2016, having previously served as VP of Engineering.

From 2019 to 2022, Databricks scaled revenue substantially, with multiple large funding rounds bringing the company to the multi-billion-dollar private-valuation tier. The company expanded internationally, broadened the platform to cover SQL analytics (Databricks SQL) and machine-learning operations (MLflow), and built enterprise relationships with major customers in financial services, healthcare, retail, and other sectors.

The MosaicML acquisition in July 2023 for approximately $1.3 billion was the principal AI-product acquisition for Databricks. MosaicML had built a leading large-language-model training and deployment platform, with the MPT model line as the flagship open-source-LLM contribution. The acquisition integrated MosaicML's training capabilities into Databricks's enterprise data platform, with the combined organization operating as Mosaic AI within Databricks.

The DBRX release in March 2024 was the principal Mosaic AI open-weights flagship. The model is a mixture-of-experts design with 132 billion total parameters, of which 36 billion are active for any given token. It was released through Hugging Face and was characterized at release as one of the leading openly available LLMs at scale, with reported strong performance on reading comprehension, general knowledge, and other benchmarks.
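The gap between total and active parameters follows from mixture-of-experts routing: each token is sent to only a few expert sub-networks, so only those experts' weights take part in that token's forward pass. The sketch below is a toy illustration in plain Python, not DBRX's implementation. The parameter counts come from the text above; the 16-experts-choose-4 configuration is an assumption taken from Databricks's public description of DBRX, and the router scores and function names are hypothetical.

```python
from math import comb

TOTAL_PARAMS_B = 132   # total parameters, in billions (from the text)
ACTIVE_PARAMS_B = 36   # parameters active per token, in billions (from the text)

# Assumption: 16 experts with 4 selected per token, per Databricks's
# public description of DBRX; this article does not state it.
NUM_EXPERTS = 16
TOP_K = 4


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's weights used for any single token."""
    return active_b / total_b


def top_k_experts(router_scores: list[float], k: int) -> list[int]:
    """Return the indices of the k highest-scoring experts, best first."""
    ranked = sorted(
        range(len(router_scores)),
        key=lambda i: router_scores[i],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical router scores for one token, one score per expert.
scores = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.05, 0.6,
          0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]

chosen = top_k_experts(scores, TOP_K)
share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)

print(f"active share per token: {share:.1%}")
print(f"experts chosen for this token: {chosen}")
print(f"distinct expert combinations: {comb(NUM_EXPERTS, TOP_K)}")
```

The active share (roughly 27 percent) sits slightly above 4/16 because weights outside the experts, such as attention layers, are typically used for every token.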

From 2024 to 2026, Databricks launched additional Mosaic AI products, including the Mosaic AI Agent Framework, Mosaic AI Agent Evaluation, Mosaic AI Tools Catalog, Mosaic AI Model Training, and Mosaic AI Gateway. This platform expansion has been the principal product-development focus, aimed at enterprise adoption of Mosaic AI's agentic and generative-AI capabilities.

The funding trajectory through this period was substantial: a Series J of $10 billion in December 2024 at a $62 billion valuation, a Series L of $4 billion in December 2025 at a $134 billion valuation, and a $1.8 billion JPMorgan-led debt financing in January 2026, a sequence widely interpreted as IPO preparation. As of February 2026, Databricks reported a $5.4 billion annualized revenue run rate with over 65 percent year-over-year growth and AI product revenue surpassing $1.4 billion (approximately 26 percent of total revenue).
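As a quick sanity check, the ratios behind these figures can be reproduced in a few lines. The Python sketch below uses only numbers quoted in this section; the function and constant names are illustrative.

```python
def share(part: float, whole: float) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


def step_up(new_valuation: float, old_valuation: float) -> float:
    """Multiple by which a valuation grew between two rounds."""
    return new_valuation / old_valuation


# All figures in billions of US dollars, as reported in the text.
SERIES_J_VALUATION = 62.0
SERIES_L_VALUATION = 134.0
REVENUE_RUN_RATE = 5.4
AI_PRODUCT_REVENUE = 1.4

ai_share = share(AI_PRODUCT_REVENUE, REVENUE_RUN_RATE)
valuation_step_up = step_up(SERIES_L_VALUATION, SERIES_J_VALUATION)

print(f"AI share of run rate: {ai_share:.1%}")
print(f"Series J to Series L step-up: {valuation_step_up:.2f}x")
```

The AI share works out to just under 26 percent, consistent with the rounded figure in the text, and the valuation slightly more than doubled between the two rounds.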

Mission and strategy

Databricks's stated mission is to help enterprises succeed with data and AI. The Lakehouse-and-AI strategic framing has been remarkably consistent since the company's pivot from pure Spark commercialization, with the Lakehouse providing the data foundation and Mosaic AI providing the enterprise generative AI capability layered on top.

The strategy combines four threads. First, the Lakehouse data platform as the foundational enterprise data infrastructure, integrating data-warehouse and data-lake capabilities. Second, Mosaic AI as the enterprise generative AI platform, including foundation-model training (Mosaic AI Model Training), agentic AI infrastructure (Agent Framework, Agent Evaluation, Gateway), and other capabilities. Third, open-source contributions including Apache Spark, MLflow, Delta Lake, Unity Catalog, and selected open-weights model releases like DBRX. Fourth, ecosystem partnerships across cloud providers (Databricks runs on AWS, Azure, and Google Cloud), AI-tooling vendors, and enterprise consulting firms.

The competitive premise is that enterprise customers benefit from a unified data-and-AI platform that integrates the underlying data infrastructure with the AI-application layer. On this view, the combination of Lakehouse data infrastructure, Mosaic AI training and deployment capabilities, and the broader Databricks platform yields an enterprise offering that pure-play AI companies and pure-play data companies would find difficult to match.

The post-2023 Mosaic AI integration has positioned Databricks as one of the principal enterprise AI providers, alongside OpenAI and Anthropic on the API tier and alongside Snowflake AI, Salesforce AI Research, and other enterprise platforms on the data-and-AI tier.

Models and products

  • DBRX. Released March 2024. 132-billion-parameter mixture-of-experts model with 36 billion active parameters. Open-weights through Hugging Face. The principal Mosaic AI open-source contribution.
  • Mosaic AI Agent Framework. Enterprise platform for building and deploying agentic AI applications.
  • Mosaic AI Agent Evaluation. Evaluation infrastructure for agentic AI quality and reliability.
  • Mosaic AI Tools Catalog. Catalog of AI tools and integrations for the agentic platform.
  • Mosaic AI Model Training. Foundation-model training capability inherited and substantially expanded from the MosaicML acquisition.
  • Mosaic AI Gateway. API gateway and management infrastructure for enterprise AI deployments.
  • Databricks Lakehouse. Foundational data platform integrating data-warehouse and data-lake capabilities.
  • Databricks SQL. SQL analytics platform built on the Lakehouse foundation.
  • MLflow. Open-source machine-learning lifecycle platform.
  • Delta Lake. Open-source storage layer that brings reliability to data lakes.
  • Unity Catalog. Unified governance solution for data and AI.

The principal commercial channels are direct enterprise sales for the Databricks platform, cloud-marketplace distribution through AWS, Azure, and Google Cloud, and partner-and-system-integrator channels.

Benchmarks and standing

DBRX was characterized at March 2024 release as one of the leading open-source LLMs at scale, with reported strong performance on reading comprehension, general knowledge, and other benchmarks. The model has been less prominent on subsequent benchmarks given the rapid evolution of open-source LLM capability through 2024 to 2026 and Databricks's strategic emphasis on enterprise platform capability rather than continued open-weights leadership.

Mosaic AI's commercial standing is anchored by Databricks's enterprise customer base, the $1.4 billion AI product revenue (approximately 26 percent of $5.4 billion total revenue as of February 2026), and the integrated platform value proposition. Industry coverage frequently characterizes Databricks as one of the principal enterprise AI platforms globally, alongside Snowflake on the data-and-AI tier and alongside the frontier-lab API providers on the model-capability tier.

The company's standing in the global AI ecosystem is anchored on the Apache Spark research lineage, the founding-team Berkeley research credentials, the commercial revenue scale, and the Mosaic AI integrated platform.

Leadership

As of April 2026, Databricks's senior leadership includes:

  • Ali Ghodsi, Co-Founder and Chief Executive Officer since 2016. Original Apache Spark creator. Public face for Databricks on platform strategy, AI strategy, and the IPO trajectory.
  • Matei Zaharia, Co-Founder and Chief Technologist. Original Apache Spark creator. Concurrent appointment as a UC Berkeley professor; previously on the Stanford Computer Science faculty.
  • Ion Stoica, Co-Founder and Executive Chairman. UC Berkeley professor and AMPLab co-founder. Also co-founder of Anyscale (the Ray distributed-computing company).
  • Reynold Xin, Co-Founder and Chief Architect. Senior technical leadership for the Databricks platform.
  • Naveen Rao, former MosaicML CEO, who led generative AI at Databricks after the MosaicML acquisition and brought the Mosaic AI commercial leadership into the company. Rao was reported to have left Databricks in 2025 to start a new venture.

The company has hired aggressively across enterprise sales, AI engineering, platform engineering, and other functions, with executive recruitment from peer enterprise-software organizations.

Funding and backers

Databricks's funding history through April 2026 includes approximately $19 billion in cumulative private capital. The most recent major rounds are the Series J of $10 billion in December 2024 at a $62 billion valuation and the Series L of $4 billion in December 2025 at a $134 billion valuation. The $1.8 billion JPMorgan-led debt financing in January 2026 is widely interpreted as IPO preparation.

The investor base is unusually broad, including senior US venture and growth-equity firms, sovereign-wealth funds, and other strategic investors. Lead investors across recent rounds include Andreessen Horowitz, Tiger Global, Insight Partners, Iconiq Growth, Capital Group, Wellington, T. Rowe Price, Singapore's GIC, the Ontario Teachers' Pension Plan, and other organizations.

The reported $5.4 billion annualized revenue run rate as of February 2026, with over 65 percent year-over-year growth, is the main figure cited in support of the $134 billion private valuation and the IPO trajectory. Industry coverage has characterized the most likely IPO timeline as an S-1 filing in Q3 2026 with a public listing in late 2026 or early 2027, though Ali Ghodsi has not publicly committed to specific timing.
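One rough way to relate the run rate to the valuation is the ratio of valuation to annualized revenue. The sketch below computes that ratio and the prior-year run rate implied by the reported growth, using only the figures quoted above. Treating "over 65 percent" as exactly 65 percent is an assumption, so the implied prior-year figure is an upper bound; the function names are illustrative.

```python
VALUATION_B = 134.0      # reported private valuation, $ billions
RUN_RATE_B = 5.4         # reported annualized revenue run rate, $ billions
MIN_YOY_GROWTH = 0.65    # "over 65 percent", treated as a floor (assumption)


def revenue_multiple(valuation: float, run_rate: float) -> float:
    """Valuation expressed as a multiple of annualized revenue."""
    return valuation / run_rate


def implied_prior_run_rate(run_rate: float, growth: float) -> float:
    """Run rate one year earlier, given year-over-year growth."""
    return run_rate / (1.0 + growth)


multiple = revenue_multiple(VALUATION_B, RUN_RATE_B)
prior = implied_prior_run_rate(RUN_RATE_B, MIN_YOY_GROWTH)

print(f"valuation / run rate: {multiple:.1f}x")
print(f"prior-year run rate (at most): ${prior:.2f}B")
```

Because actual growth is reported as higher than 65 percent, the true prior-year run rate would be somewhat below the computed value.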

Industry position

Databricks occupies a structurally distinctive position in the global enterprise software and AI landscape. Its profile combines the Apache Spark research lineage, the integrated Lakehouse-and-Mosaic-AI platform, commercial revenue at scale, continuity of the Berkeley founding team, and the post-MosaicML integration, a combination that few if any other enterprise data-and-AI organizations share.

Industry coverage has frequently characterized Databricks as the principal private enterprise-software company in the AI era, with the post-MosaicML integration positioning the company at the intersection of data infrastructure and generative AI. The company's competitive position relative to Snowflake (the publicly listed data-cloud peer) and to the frontier-AI labs is widely tracked and analyzed.

Strategic risks include the operational complexity of running the data platform, the AI platform, and the wider enterprise organization simultaneously; intensifying competition from cloud-platform AI services (AWS Bedrock, Azure AI, Google Vertex AI) and from frontier-AI labs expanding their enterprise capabilities; and the macroeconomic environment for the planned 2026-2027 IPO. Strategic strengths include the founding-team continuity, the integrated Lakehouse-and-Mosaic-AI platform, the revenue scale and growth trajectory, and the company's position in the Apache Spark and broader open-source ecosystem.

Competitive landscape

Databricks competes with several enterprise software and AI organizations:

  • Snowflake AI. Direct enterprise data-and-AI competitor. Snowflake's Cortex AI and other platform capabilities compete with Mosaic AI on the enterprise GenAI tier.
  • OpenAI, Anthropic, Google DeepMind. Frontier-AI labs whose enterprise APIs compete with Mosaic AI's foundation-model training and deployment capabilities. Databricks's Mosaic AI Gateway integrates third-party API providers including the frontier labs.
  • Salesforce AI Research. Enterprise-AI competitor through the Einstein platform; less direct overlap given Salesforce's CRM-and-business-application focus.
  • AWS, Microsoft Azure, Google Cloud. Cloud-platform AI services compete with Databricks on managed-AI infrastructure. Databricks runs on all three clouds and integrates with cloud-native AI services through Mosaic AI Gateway.
  • Together AI, NVIDIA Research. AI infrastructure providers competing on the foundation-model training and deployment tier.
  • Cohere. Enterprise-AI provider with adjacent positioning. Less direct overlap given Cohere's enterprise-AI-only focus versus Databricks's data-and-AI integrated platform.

Outlook

Several open questions affect Databricks's trajectory in 2026 and 2027:

  • The IPO timing and post-IPO trading. The most likely scenarios include an S-1 filing in Q3 2026 with a late-2026 or early-2027 listing.
  • Continued Mosaic AI product expansion, including the trajectory of the Agent Framework, Agent Evaluation, Gateway, and other platform capabilities.
  • The competitive dynamic with Snowflake on the enterprise data-and-AI platform tier.
  • Continued AI product revenue growth (currently $1.4 billion, approximately 26 percent of total revenue).
  • The strategic relationship with frontier-AI labs through Mosaic AI Gateway and other integration channels.
  • The post-IPO M&A trajectory. Databricks has been an active acquirer (MosaicML, Tabular, BladeBridge, Lilac, Einblick, and Mooncake Labs, among others), and the post-IPO capital base would support continued acquisition activity.
  • Continued senior research-and-engineering talent recruitment, particularly in the Mosaic AI organization.

About the author
Nextomoro

AI Research Lab Intelligence

nextomoro tracks progress for AI research labs, models, and what's next.
