Datology AI

Datology AI is an American artificial intelligence research startup headquartered in Berkeley, California. It was founded in 2023 by Ari Morcos (Co-Founder and Chief Executive Officer; formerly a senior research scientist at Meta AI / FAIR and Google DeepMind), Matthew Leavitt (Co-Founder; formerly a research scientist at MosaicML and Meta), and Bogdan Gaza (Co-Founder; formerly a senior engineering leader at Twitter, Roblox, and other consumer-internet companies). The company builds software that automates the curation of training data for foundation models, focusing on data selection, data-quality optimization, and the data pipelines that large-scale model training depends on. As of April 2026, Datology AI is one of the leading data-curation research startups, with an active publication record and commercial customers among frontier AI labs and enterprises that train their own foundation models.

At a glance

Founded: 2023 in Berkeley, California, by Ari Morcos, Matthew Leavitt, and Bogdan Gaza.
Status: Private. Raised a $46 million Series A in September 2024, led by Radical Ventures with participation from Felicis, Amplify Partners, and other investors.
Funding: Approximately $58 million in cumulative private capital across the seed round in 2023 and the Series A in 2024.
CEO: Ari Morcos, Co-Founder and Chief Executive Officer.
Other notable leadership: Matthew Leavitt and Bogdan Gaza, Co-Founders.
Open weights: Not applicable; Datology AI is a data-tooling and research company rather than a model developer.
Flagship outputs: Automated training-data curation software; published research on training-data selection, dataset quality, and the optimization of foundation-model training data; commercial engagements with frontier AI labs and enterprise foundation-model trainers.

Origins

Datology AI was founded in 2023 by Ari Morcos, Matthew Leavitt, and Bogdan Gaza on the thesis that the main bottleneck in foundation-model training is neither model architecture nor compute, but the quality and selection of training data. Morcos brought research credibility from his roles at Meta AI / FAIR and Google DeepMind, where his work covered training dynamics, neural-network interpretability, and the role of training data in model capability. Leavitt joined from MosaicML, where he had worked on training-efficiency research before Databricks acquired MosaicML in June 2023. Gaza brought engineering-leadership experience from Twitter, Roblox, and other consumer-internet companies.

The 2023 seed round funded the initial build-out of the research and engineering team and the first customer engagements. The September 2024 Series A of $46 million, led by Radical Ventures with participation from Felicis, Amplify Partners, and other investors, provided growth capital for commercial expansion. Industry coverage described it as one of the largest AI data-tooling raises of the year and pointed to the founders' research track record, particularly that of Morcos and Leavitt, as the main reason for investor confidence.

From 2024 to 2026 the company has continued to publish research while expanding its commercial work with frontier AI labs and enterprise foundation-model trainers. Industry coverage has described training-data selection and dataset quality as increasingly important research areas in the post-scaling-law environment, in which compute and parameter count are no longer the main sources of capability differences between models.

Mission and strategy

Datology AI's stated mission is to make data the main lever for improving foundation-model capability, through automated data-curation software that frontier-model trainers and enterprise customers can integrate into their training pipelines. The strategy has two threads. The first is research on training-data selection, dataset quality, and the optimization of foundation-model training data, published for the broader academic community. The second is commercial software that turns this research into tools enterprises can deploy, sold primarily to frontier AI labs and enterprise foundation-model trainers.

The competitive premise is that automated data curation can deliver capability gains that scaling compute alone cannot match, and that a research-driven approach to discovering and operationalizing data-curation interventions is a more durable advantage than other training-data tooling approaches.

Models and products

Automated training-data curation software. The main commercial offering: a platform that lets foundation-model trainers add automated data selection and quality optimization to their existing training pipelines.
Published research. An active publication program at major AI venues, including NeurIPS, ICML, and ICLR. Research areas include training-data selection, dataset deduplication, dataset-quality scoring, and the role of data in foundation-model capability.
Commercial engagements. Work with frontier AI labs and enterprise foundation-model trainers worldwide.
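To make two of the research areas listed above more concrete, the sketch below shows a generic, textbook form of exact deduplication followed by heuristic quality filtering. It is not Datology AI's method, which is not described here. The function names, the character-ratio quality heuristic, and the 0.8 threshold are all hypothetical choices made for illustration; production pipelines typically use near-duplicate detection and learned quality scorers instead.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies hash the same.
    return re.sub(r"\s+", " ", text.strip().lower())


def exact_dedup(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized document (exact-match dedup)."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


def quality_score(doc: str) -> float:
    """Hypothetical heuristic: fraction of characters that are letters or whitespace."""
    if not doc:
        return 0.0
    good = sum(ch.isalpha() or ch.isspace() for ch in doc)
    return good / len(doc)


def curate(docs: list[str], min_score: float = 0.8) -> list[str]:
    # Deduplicate first, then drop documents below the assumed quality threshold.
    return [d for d in exact_dedup(docs) if quality_score(d) >= min_score]


corpus = [
    "The cat sat on the mat.",
    "the cat  sat on the mat.",  # duplicate after normalization
    "$$$ 1234 ### !!!",  # low score under the toy heuristic
    "Data quality matters for training.",
]
print(curate(corpus))  # ['The cat sat on the mat.', 'Data quality matters for training.']
```

Running deduplication before scoring avoids spending scoring effort on copies; the order is a design choice of this sketch rather than a requirement.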

Distribution is predominantly through direct sales to frontier AI labs and large enterprise foundation-model trainers, supplemented by academic publications.

Benchmarks and standing

Datology AI evaluates its work along three lines: the capability improvements its curation tooling produces (typically measured as held-out benchmark performance of models trained on Datology-curated data versus a baseline dataset), commercial customer adoption, and the impact of its academic publications.
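The curated-versus-baseline comparison described above reduces to simple arithmetic once two otherwise identical training runs have been scored. The sketch below computes per-benchmark and mean score differences. The benchmark names and scores are placeholders, not reported results, and the assumption that both runs share architecture and compute budget is ours.

```python
from statistics import mean


def benchmark_deltas(baseline: dict[str, float], curated: dict[str, float]) -> dict[str, float]:
    """Per-benchmark score difference (curated minus baseline) on shared held-out benchmarks."""
    shared = sorted(baseline.keys() & curated.keys())
    return {name: curated[name] - baseline[name] for name in shared}


# Placeholder scores for two hypothetical runs with the same architecture and
# compute budget, differing only in training data. Not real results.
baseline_scores = {"bench_a": 0.50, "bench_b": 0.60}
curated_scores = {"bench_a": 0.55, "bench_b": 0.62}

deltas = benchmark_deltas(baseline_scores, curated_scores)
print(deltas, mean(deltas.values()))
```

Only benchmarks scored for both runs are compared, so a missing result cannot silently inflate or deflate the average.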

Industry coverage has described Datology AI as one of the leading data-curation research startups, citing the research standing of its founders, particularly Morcos, and its published research as differentiators against other AI data-tooling companies. Coverage has also described its research direction as increasingly relevant given the industry's shift toward research on training approaches beyond pure scaling.

Leadership

As of April 2026, Datology AI's senior leadership includes:

Ari Morcos, Co-Founder and Chief Executive Officer.
Matthew Leavitt, Co-Founder.
Bogdan Gaza, Co-Founder.
Senior research, engineering, and commercial leadership across the data-curation tooling and customer-engagement organizations.

Funding and backers

Approximately $58 million in cumulative private capital, raised across the 2023 seed round and the September 2024 Series A of $46 million, which was led by Radical Ventures with participation from Felicis, Amplify Partners, and other investors.

Industry position

Datology AI holds a distinctive position among AI data-tooling startups because of three factors: its founders' research background at Meta AI / FAIR and Google DeepMind, the research credibility Morcos brings in particular, and a focus on data curation that matches the broader move in foundation-model research beyond pure scaling. Industry coverage has described it as one of the more consequential startups in AI data tooling.

Competitive landscape

Snorkel AI. Programmatic-labeling competitor with focus on enterprise data-development workflows. Different methodological approach (programmatic weak supervision) and different customer base (enterprise vs. frontier-AI-lab).
Scale AI. Expert-marketplace AI-data competitor. Different methodological approach (human-expert labeling) and different customer base structure.
Hugging Face Datasets, Common Crawl, LAION. Open-source training-data alternatives that reduce frontier-AI-lab dependence on commercial data-tooling vendors.
DataComp, MosaicML (now part of Databricks). Adjacent data-curation research initiatives.
OpenAI, Anthropic, Google DeepMind, Meta AI / FAIR internal data-curation teams. In-house alternatives at the frontier-AI labs.

Outlook

Continued research output on training-data selection and dataset quality through 2026 and 2027.
Expansion of the commercial customer base across frontier AI labs and enterprise foundation-model trainers.
Competition with in-house data-curation teams at the frontier AI labs.
Potential further funding rounds and the path to sustained commercial scale.

Sources

Datology AI official site. Company reference.
Ari Morcos LinkedIn. Co-Founder and CEO reference.
Matthew Leavitt LinkedIn. Co-Founder reference.
Radical Ventures. Series A lead investor.
DataComp benchmark. Adjacent research initiative reference.

Nextomoro

AI Research Lab Intelligence