Datology AI
Datology AI is an American artificial intelligence research startup headquartered in Berkeley, California. It was founded in 2023 by Ari Morcos (Co-Founder and Chief Executive Officer; former senior research scientist at Meta AI / FAIR and Google DeepMind), Matthew Leavitt (Co-Founder; former research scientist at MosaicML and Meta), and Bogdan Gaza (Co-Founder; former senior engineering leader at Twitter, ROBLOX, and other consumer-internet organizations). The company develops automated data-curation tooling for foundation-model training, with a focus on training-data selection, quality optimization, and the data-pipeline software that frontier-model training requires. As of April 2026, Datology AI is one of the principal data-curation research startups in the AI ecosystem, with an active research program and commercial traction among frontier AI labs and enterprise foundation-model trainers.
At a glance
- Founded: 2023 in Berkeley, California, by Ari Morcos, Matthew Leavitt, and Bogdan Gaza.
- Status: Private. Series A of $46 million in September 2024, led by Radical Ventures with participation from Felicis, Amplify Partners, and other investors.
- Funding: Approximately $58 million in cumulative private capital across the 2023 seed round and the 2024 Series A.
- CEO: Ari Morcos, Co-Founder and Chief Executive Officer.
- Other notable leadership: Matthew Leavitt and Bogdan Gaza, Co-Founders.
- Open weights: N/A. Datology AI is a data-tooling and research company.
- Flagship outputs: Automated training-data-curation tooling; published research on training-data selection, dataset quality, and training-data optimization for foundation models; commercial engagements with frontier AI labs and enterprise foundation-model trainers.
Origins
Datology AI was founded in 2023 by Ari Morcos, Matthew Leavitt, and Bogdan Gaza on the thesis that the bottleneck in foundation-model training is not model architecture or compute but the quality and selection of training data. Morcos brought research credibility from prior roles at Meta AI / FAIR and Google DeepMind, where his work covered training dynamics, neural-network interpretability, and the role of training data in foundation-model capability. Leavitt joined from MosaicML, where he had worked on training-efficiency research before MosaicML's June 2023 acquisition by Databricks. Gaza brought senior engineering-leadership experience from Twitter, ROBLOX, and other consumer-internet organizations.
The 2023 seed round funded the initial build-out of the research and engineering team and the first customer engagements. The September 2024 Series A of $46 million, led by Radical Ventures with participation from Felicis, Amplify Partners, and other investors, provided growth-stage capital for commercial expansion. Industry coverage characterized the round as one of the principal AI-data-tooling fundraises of the year and pointed to the research track record of Morcos and Leavitt as the main validating factor.
From 2024 to 2026, the company has continued to publish research while working commercially with frontier AI labs and enterprise foundation-model trainers. Industry coverage has characterized training-data selection and dataset quality as among the more consequential research areas in the post-scaling-laws environment, in which compute and parameter count are no longer the principal capability differentiators.
Mission and strategy
Datology AI's stated mission is to make data the principal lever for foundation-model capability, through automated data-curation tooling that frontier-model trainers and enterprise customers can integrate into their training pipelines. The strategy combines two threads. The first is research on training-data selection, dataset quality, and training-data optimization, which contributes to the broader academic community. The second is commercial tooling that translates that research into enterprise-deployable software, with frontier AI labs and enterprise foundation-model trainers as the principal customer base.
The competitive premise is twofold: that automated data curation can produce capability gains that compute scaling alone cannot match, and that a research-driven method for discovering and operationalizing data-curation interventions is more sustainable than alternative approaches to training-data tooling.
Models and products
- Automated training-data-curation tooling. The principal commercial offering: a software platform that lets foundation-model trainers integrate automated data selection and quality optimization into existing training pipelines.
- Published research. An active publication program at major AI venues including NeurIPS, ICML, and ICLR. Research areas include training-data selection, dataset deduplication, dataset-quality scoring, and the role of data in foundation-model capability.
- Commercial engagements. Work with frontier AI labs and enterprise foundation-model trainers across the global AI ecosystem.
Distribution is predominantly through direct engagement with frontier AI labs and large enterprise foundation-model trainers, alongside selected academic publications.
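To make two of the research areas listed above more concrete, the following is a minimal Python sketch of exact deduplication followed by a simple quality filter. It is a generic illustration and not Datology AI's method: the function names, the alphabetic-token heuristic, and the `min_score` threshold are all hypothetical choices made for this example.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document, comparing normalized hashes."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


def quality_score(text: str) -> float:
    """Hypothetical heuristic: fraction of whitespace-separated tokens that are purely alphabetic."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(token.isalpha() for token in tokens) / len(tokens)


def curate(docs: list[str], min_score: float = 0.5) -> list[str]:
    """Deduplicate first, then drop documents scoring below the (assumed) threshold."""
    return [doc for doc in deduplicate(docs) if quality_score(doc) >= min_score]


# Toy corpus for illustration only.
corpus = [
    "The cat sat on the mat",
    "the  cat sat on the mat",   # duplicate after normalization
    "$$$ 1234 #### 5678",        # no alphabetic tokens, filtered out
    "Data quality matters for training",
]
curated = curate(corpus)
print(curated)
```

Production curation systems typically go well beyond this, for example with near-duplicate detection and learned quality models; the sketch only shows the general shape of a filter-style pipeline.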
Benchmarks and standing
Datology AI's evaluation framework focuses on three things: the capability improvements that its data-curation tooling produces (typically measured as held-out benchmark performance after training on Datology-curated versus baseline datasets), commercial adoption, and the impact of its academic publications.
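The curated-versus-baseline comparison described above can be sketched as a per-benchmark difference in held-out scores. This is a minimal illustration under stated assumptions: the function names are hypothetical, the benchmark names `task_a`, `task_b`, and `task_c` are placeholders, and the scores are made-up values rather than reported results.

```python
def capability_delta(
    baseline: dict[str, float], curated: dict[str, float]
) -> dict[str, float]:
    """Per-benchmark score difference (curated minus baseline) on shared benchmarks."""
    shared = sorted(baseline.keys() & curated.keys())
    return {name: round(curated[name] - baseline[name], 4) for name in shared}


def mean_delta(deltas: dict[str, float]) -> float:
    """Unweighted average of the per-benchmark differences."""
    return sum(deltas.values()) / len(deltas) if deltas else 0.0


# Placeholder held-out accuracies for two models that are assumed to be
# identical except for the training data; these are NOT real results.
baseline_scores = {"task_a": 0.60, "task_b": 0.50}
curated_scores = {"task_a": 0.65, "task_b": 0.52, "task_c": 0.90}

deltas = capability_delta(baseline_scores, curated_scores)
print(deltas)              # only benchmarks present in both runs are compared
print(mean_delta(deltas))
```

For such a comparison to be meaningful, the two training runs would need to hold architecture, compute budget, and evaluation setup fixed so that the data is the only variable.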
Industry coverage has characterized Datology AI as one of the principal data-curation research startups in the AI ecosystem, citing Morcos's research credibility and the company's published research as differentiators against direct AI-data-tooling competitors. The same coverage has described the research direction as consequential given the broader industry shift toward post-scaling-law training paradigms.
Leadership
As of April 2026, Datology AI's senior leadership includes:
- Ari Morcos, Co-Founder and Chief Executive Officer.
- Matthew Leavitt, Co-Founder.
- Bogdan Gaza, Co-Founder.
- Senior research, engineering, and commercial leadership across the data-curation tooling and customer-engagement organizations.
Funding and backers
Datology AI has raised approximately $58 million in cumulative private capital across the 2023 seed round and the September 2024 Series A of $46 million, which was led by Radical Ventures with participation from Felicis, Amplify Partners, and other investors.
Industry position
Datology AI occupies a distinctive position among AI-data-tooling startups. Its founding leadership carries a Meta AI / FAIR and Google DeepMind research lineage, Morcos anchors its research credibility, and its focus on data-curation methodology aligns with the broader post-scaling-law direction of foundation-model research. Industry coverage has characterized Datology AI as one of the more consequential AI-data-tooling startups in the research-product landscape.
Competitive landscape
- Snorkel AI. Programmatic-labeling competitor focused on enterprise data-development workflows. It takes a different methodological approach (programmatic weak supervision) and serves a different customer base (enterprise rather than frontier-AI-lab).
- Scale AI. Expert-marketplace AI-data competitor. It takes a different methodological approach (human-expert labeling) and has a differently structured customer base.
- Hugging Face Datasets, Common Crawl, LAION. Open-source training-data alternatives that reduce frontier-AI-lab dependence on commercial data-tooling vendors.
- DataComp, MosaicML (now part of Databricks). Adjacent data-curation research initiatives.
- OpenAI, Anthropic, Google DeepMind, Meta AI / FAIR internal data-curation teams. In-house alternatives at the frontier-AI labs.
Outlook
Developments to watch include:
- The continued research-publication output on training-data selection and dataset quality through 2026 to 2027.
- The commercial-customer-base expansion across frontier-AI labs and enterprise foundation-model trainers.
- The competitive dynamic against in-house data-curation organizations at the frontier-AI labs.
- Potential subsequent funding rounds and the trajectory toward sustained commercial scale.
Sources
- Datology AI official site. Company reference.
- Ari Morcos LinkedIn. Co-Founder and CEO reference.
- Matthew Leavitt LinkedIn. Co-Founder reference.
- Radical Ventures. Series A lead investor.
- DataComp benchmark. Adjacent research initiative reference.