Datology AI

Datology AI is an American data-curation research startup founded in 2023 by former Google DeepMind and Meta researcher Ari Morcos with Matthew Leavitt and Bogdan Gaza. It focuses on automated training-data selection and quality optimization for foundation-model training.
Datology AI is an American artificial intelligence research startup headquartered in Berkeley, California. It was founded in 2023 by Ari Morcos (Co-Founder and Chief Executive Officer; former senior research scientist at Meta AI / FAIR and Google DeepMind), Matthew Leavitt (Co-Founder; former research scientist at MosaicML and Meta), and Bogdan Gaza (Co-Founder; former senior engineering leader at Twitter, ROBLOX, and other consumer-internet organizations). The company develops automated data-curation tooling for foundation-model training, with a focus on training-data selection, quality optimization, and the data-pipeline software that frontier-model training requires. As of April 2026, Datology AI is one of the principal data-curation research startups in the AI ecosystem, with active research output and commercial traction among frontier-AI labs and enterprise foundation-model trainers.

At a glance

  • Founded: 2023 in Berkeley, California, by Ari Morcos, Matthew Leavitt, and Bogdan Gaza.
  • Status: Private. Series A of $46 million in September 2024, led by Radical Ventures with participation from Felicis, Amplify Partners, and other investors.
  • Funding: Approximately $58 million in cumulative private capital across the seed round in 2023 and the Series A in 2024.
  • CEO: Ari Morcos, Co-Founder and Chief Executive Officer.
  • Other notable leadership: Matthew Leavitt, Co-Founder. Bogdan Gaza, Co-Founder.
  • Open weights: N/A. Datology AI is a data-tooling and research company.
  • Flagship outputs: Automated training-data-curation tooling; published research output on training-data selection, dataset quality, and foundation-model-training-data optimization; commercial-customer engagements with frontier-AI labs and enterprise foundation-model trainers.

Origins

Datology AI was founded in 2023 by Ari Morcos, Matthew Leavitt, and Bogdan Gaza on the thesis that the bottleneck in foundation-model training is not model architecture or compute but the quality and selection of training data. Morcos brought research credibility from prior roles at Meta AI / FAIR and Google DeepMind, with research output on training dynamics, neural-network interpretability, and the role of training data in foundation-model capability. Leavitt joined from MosaicML, where he had worked on training-efficiency research before MosaicML's June 2023 acquisition by Databricks. Gaza brought senior engineering-leadership experience from Twitter, ROBLOX, and other consumer-internet organizations.

Seed capital raised in 2023 funded the early research and engineering team and the first customer engagements. The September 2024 Series A of $46 million, led by Radical Ventures with participation from Felicis, Amplify Partners, and other investors, provided growth-stage capital for commercial expansion. Industry coverage characterized the round as one of the principal AI-data-tooling fundraises of the year and pointed to the research track record of Morcos and Leavitt as the main validating factor.

From 2024 to 2026 the company has continued to publish research while working with frontier-AI labs and enterprise foundation-model trainers. Industry coverage has described training-data selection and dataset quality as among the more consequential research areas in the post-scaling-laws environment, where compute and parameter count are no longer the principal capability differentiators.

Mission and strategy

Datology AI's stated mission is to make data the principal lever for foundation-model capability, through automated data-curation tooling that frontier-model trainers and enterprise customers can integrate into their training pipelines. The strategy combines two threads. The first is research on training-data selection, dataset quality, and training-data optimization, which contributes to the broader academic community. The second is commercial tooling that translates that research into enterprise-deployable software, with frontier-AI labs and enterprise foundation-model trainers as the principal customer base.

The competitive premise has two parts: that automated data curation can produce capability gains that compute scaling alone cannot match, and that a research-driven method for discovering and operationalizing data-curation interventions is more sustainable than alternative training-data-tooling approaches.

Models and products

  • Automated training-data-curation tooling. The principal commercial offering. Software platform for foundation-model trainers to integrate automated data-selection-and-quality-optimization into existing training pipelines.
  • Published research output. Active publication program at major AI venues including NeurIPS, ICML, and ICLR. Research areas include training-data selection, dataset deduplication, dataset-quality scoring, and the role of data in foundation-model capability.
  • Commercial-customer engagement. Direct work with frontier-AI labs and enterprise foundation-model trainers across the global AI ecosystem.

Distribution is predominantly through direct enterprise engagement with frontier-AI labs and large enterprise foundation-model trainers, alongside selected academic publications.
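
Two of the research areas listed above, dataset deduplication and dataset-quality scoring, can be illustrated with a toy pipeline. The following is a minimal sketch under stated assumptions and does not represent Datology AI's tooling or methods: the function names, the distinct-word heuristic, and the 0.5 threshold are all hypothetical, chosen only to show the general shape of a curation step.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.strip().lower())


def deduplicate(docs: list[str]) -> list[str]:
    """Exact deduplication on normalized text, keeping the first occurrence."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def quality_score(doc: str) -> float:
    """Hypothetical heuristic: fraction of distinct words, in [0, 1].

    Real quality scorers are typically learned models; this stand-in only
    penalizes highly repetitive text.
    """
    words = normalize(doc).split()
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def curate(docs: list[str], min_score: float = 0.5) -> list[str]:
    """Deduplicate, then keep documents at or above the score threshold."""
    return [d for d in deduplicate(docs) if quality_score(d) >= min_score]


# Illustrative corpus: one near-duplicate and one repetitive document.
corpus = [
    "The cat sat on the mat.",
    "the  cat sat on the mat.",
    "buy buy buy buy now",
    "Gradient descent minimizes a loss function.",
]
curated = curate(corpus)
print(curated)
```

In this sketch the second document is dropped as a duplicate of the first after normalization, and the third is dropped because too few of its words are distinct. Production systems generally use near-duplicate detection and model-based scoring rather than exact hashing and a word-count heuristic.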

Benchmarks and standing

Datology AI's evaluation framework focuses on three measures: the foundation-model capability improvements that the data-curation tooling produces (typically measured via held-out benchmark performance after training on Datology-curated vs. baseline datasets), commercial-customer adoption, and academic-publication impact.
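
The curated-vs-baseline comparison described above amounts to training the same model on two datasets and comparing held-out benchmark scores. The sketch below shows only the comparison step, assuming the scores are already available as dictionaries. The benchmark names and numbers are placeholders invented for illustration, not reported results, and the helper names are hypothetical.

```python
from statistics import mean


def per_benchmark_delta(
    baseline: dict[str, float], curated: dict[str, float]
) -> dict[str, float]:
    """Score difference (curated minus baseline) on benchmarks both runs share."""
    shared = sorted(set(baseline) & set(curated))
    return {name: curated[name] - baseline[name] for name in shared}


def mean_delta(deltas: dict[str, float]) -> float:
    """Average improvement across the shared benchmarks."""
    return mean(deltas.values())


# Placeholder accuracies in [0, 1]; these are NOT measured results.
baseline_scores = {"benchmark_a": 0.50, "benchmark_b": 0.60}
curated_scores = {"benchmark_a": 0.55, "benchmark_b": 0.62}

deltas = per_benchmark_delta(baseline_scores, curated_scores)
average_gain = mean_delta(deltas)
print(deltas, average_gain)
```

A comparison like this is only meaningful when the two runs hold model architecture, compute budget, and evaluation setup fixed, so that the dataset is the single variable that differs.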

Industry coverage has characterized Datology AI as one of the principal data-curation research startups in the AI ecosystem, citing Morcos's research reputation and the company's published research as differentiators against direct AI-data-tooling competitors. The same coverage describes the research direction as consequential given the broader industry shift toward post-scaling-law training research.

Leadership

As of April 2026, Datology AI's senior leadership includes:

  • Ari Morcos, Co-Founder and Chief Executive Officer.
  • Matthew Leavitt, Co-Founder.
  • Bogdan Gaza, Co-Founder.
  • Senior research, engineering, and commercial leadership across the data-curation tooling and customer-engagement organizations.

Funding and backers

Datology AI has raised approximately $58 million in cumulative private capital across the 2023 seed round and the September 2024 Series A of $46 million, which was led by Radical Ventures with participation from Felicis, Amplify Partners, and other investors.

Industry position

Datology AI occupies a distinctive position among AI-data-tooling startups. Its founding leadership carries a Meta AI / FAIR and Google DeepMind research lineage, and its focus on data-curation methodology aligns with the broader post-scaling-law direction in foundation-model research. Industry coverage has characterized it as one of the more consequential AI-data-tooling startups in the research-product landscape.

Competitive landscape

  • Snorkel AI. Programmatic-labeling competitor with focus on enterprise data-development workflows. Different methodological approach (programmatic weak supervision) and different customer base (enterprise vs. frontier-AI-lab).
  • Scale AI. Expert-marketplace AI-data competitor. Different methodological approach (human-expert labeling) and different customer base structure.
  • Hugging Face Datasets, Common Crawl, LAION. Open-source training-data alternatives that reduce frontier-AI-lab dependence on commercial data-tooling vendors.
  • DataComp, MosaicML (now part of Databricks). Adjacent data-curation research initiatives.
  • OpenAI, Anthropic, Google DeepMind, Meta AI / FAIR internal data-curation teams. In-house alternatives at the frontier-AI labs.

Outlook

  • The continued research-publication output on training-data selection and dataset quality through 2026 to 2027.
  • The commercial-customer-base expansion across frontier-AI labs and enterprise foundation-model trainers.
  • The competitive dynamic against in-house data-curation organizations at the frontier-AI labs.
  • Potential subsequent funding rounds and the trajectory toward sustained commercial scale.

About the author
Nextomoro

AI Research Lab Intelligence
