Saining Xie
Saining Xie is a computer-vision researcher, co-founder and Chief Science Officer of AMI, and an Assistant Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University. He is the first author of ResNeXt, the senior author of ConvNeXt, and the co-creator with William Peebles of the Diffusion Transformer architecture that underlies OpenAI's Sora and several frontier video-generation systems. As of May 2026, he is on academic leave from NYU and serves as AMI's Chief Science Officer following the company's $1.03 billion seed round announced March 9, 2026.
At a glance
- Education: Bachelor of Science in computer science, Shanghai Jiao Tong University (ACM Honors Class); PhD in computer science, University of California, San Diego (2018), advised by Zhuowen Tu.
- Current roles: Co-founder and Chief Science Officer of AMI since late 2025; Assistant Professor of Computer Science at the NYU Courant Institute since 2023, on leave through Spring and Summer 2026.
- Key contributions: ResNeXt (CVPR 2017, first author); Momentum Contrast (MoCo) (CVPR 2020, with Kaiming He and Ross Girshick); Masked Autoencoders (MAE) (CVPR 2022, with Kaiming He, Xinlei Chen, and others); ConvNeXt (CVPR 2022, senior author); Diffusion Transformers (DiT) (ICCV 2023, with William Peebles); Cambrian-1 (NeurIPS 2024, senior author with Yann LeCun).
- Awards: Marr Prize Honorable Mention, ICCV 2015 (for "Holistically-Nested Edge Detection"); NSF CAREER Award; PAMI Young Researcher Award (CVPR 2025); AISTATS Test-of-Time Award.
- X / Twitter: @sainingxie
- LinkedIn: sainxie
- Personal site: sainingxie.com
- Google Scholar: Saining Xie
Origins
Xie was born in China and completed his undergraduate degree in computer science at Shanghai Jiao Tong University as a member of the ACM Honors Class. He worked as a research assistant in Hongtao Lu's Computational Intelligence Lab, which gave him his earliest exposure to computer-vision and machine-learning research.
He moved to the United States for doctoral study, joining the University of California, San Diego computer science department in 2013. He was advised by Zhuowen Tu, whose group was active in deep representation learning and computer vision. His ICCV 2015 paper "Holistically-Nested Edge Detection" with Tu received the Marr Prize Honorable Mention. He completed his PhD in 2018 with a dissertation titled "Deep Representation Learning with Induced Structural Priors."
Career
Xie joined Facebook AI Research (FAIR) in Menlo Park as a research scientist in 2018, immediately after his PhD. He spent four years in FAIR's computer-vision group, working alongside Kaiming He, Ross Girshick, and Piotr Dollár and collaborating with Trevor Darrell at UC Berkeley. His FAIR work produced several of his most-cited papers: ResNeXt (CVPR 2017, first author, begun during a FAIR internship in his PhD years), Momentum Contrast (MoCo, CVPR 2020), Masked Autoencoders (MAE, CVPR 2022), and ConvNeXt (CVPR 2022, senior corresponding author).
In 2022, while still affiliated with FAIR, Xie began the research project that produced Diffusion Transformers (DiT). The work with William Peebles, then a Berkeley PhD student interning at Meta, replaced the U-Net backbone of latent diffusion models with a transformer operating on latent patches. The paper was rejected at CVPR 2023, then accepted as an oral at ICCV 2023, and became the architectural foundation for OpenAI's Sora and Stable Diffusion 3. Peebles later joined OpenAI and co-led the Sora team.
Xie joined New York University as an Assistant Professor of Computer Science at the Courant Institute in 2023. He established the VISIONx group and continued publishing on multimodal learning and visual representation. The 2024 Cambrian-1 paper, senior-authored with Yann LeCun and led by NYU PhD student Shengbang Tong, presented a vision-centric exploration of multimodal large language models at NeurIPS 2024. He taught the graduate computer-vision course (CSCI-GA 2271) and a course on Learning with Large Language and Vision Models (CSCI-GA 3033-102).
In parallel, Xie joined Google DeepMind's GenAI team as a research scientist on the Nano Banana image and vision research line. The DeepMind period produced "Image Generators are Generalist Vision Learners" (Vision Banana), showing that a generalist multimodal model could match specialist computer-vision systems on segmentation, depth, and surface-normal tasks.
In late 2025, Xie joined AMI as co-founder and Chief Science Officer alongside Yann LeCun and Alexandre LeBrun. The AMI $1.03 billion seed round at a $3.5 billion pre-money valuation was announced March 9, 2026. His NYU page records academic leave through Spring and Summer 2026, with the AMI role taking primary focus.
Affiliations
- Facebook AI Research (FAIR): Research Scientist, 2018 to 2022.
- NYU Courant Institute: Assistant Professor of Computer Science, 2023 to present (on leave Spring and Summer 2026).
- Google DeepMind: Research Scientist (GenAI / Nano Banana team), 2024 to 2025.
- AMI: Co-founder and Chief Science Officer, late 2025 to present.
Notable contributions
Xie's published work spans visual representation learning, generative models, and multimodal systems. Minimal code sketches of the ResNeXt, MoCo, MAE, and DiT mechanisms follow the list below.
- ResNeXt (CVPR 2017). First-author paper with Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Introduced "cardinality" as a third design dimension beyond depth and width by aggregating residual transformations across parallel branches. ResNeXt became one of the standard backbone families for image classification and downstream vision tasks.
- Momentum Contrast (MoCo) (CVPR 2020). With Kaiming He, Haoqi Fan, Yuxin Wu, and Ross Girshick. Built a dynamic dictionary of negative examples with a queue and a momentum encoder, enabling large-scale contrastive self-supervised learning that surpassed supervised pre-training on several downstream benchmarks.
- Masked Autoencoders (MAE) (CVPR 2022). With Kaiming He, Xinlei Chen, Yanghao Li, Piotr Dollár, and Ross Girshick. Applied masked-image modeling to vision transformers, masking 75 percent of patches and reconstructing them with an asymmetric encoder-decoder. Became the standard self-supervised pre-training recipe for ViT-family models.
- ConvNeXt (CVPR 2022). Senior corresponding author with first author Zhuang Liu plus Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, and Trevor Darrell. Revisited convolutional design with the lessons of vision transformers, producing a pure ConvNet that matches or exceeds Swin Transformer accuracy on ImageNet, COCO, and ADE20K.
- Diffusion Transformers (DiT) (ICCV 2023). With William Peebles. Replaced the U-Net backbone of latent diffusion models with a pure transformer operating on latent patches, demonstrating clean compute-versus-quality scaling for image generation. Became the foundation for OpenAI's Sora and Stable Diffusion 3. Xie has cited the paper's CVPR 2023 rejection publicly as a reminder that frontier-impact judgement is hard even for expert reviewers.
- Cambrian-1 (NeurIPS 2024). Senior author with Yann LeCun and a team of NYU students. Introduced the Spatial Vision Aggregator connector and the CV-Bench evaluation. Places the vision representation, rather than the language model, at the center of multimodal-system design.
- Vision Banana (Google DeepMind, 2025). With Kaiming He and others on the Nano Banana team. An instruction-tuned generalist multimodal model that outperforms specialist computer-vision systems on segmentation, depth estimation, and surface-normal prediction.
- Public commentary. Xie's research-philosophy talk "Research as an Infinite Game" (CVPR 2025) and his TUM AI lecture "The Multimodal Future: Why Visual Representation Still Matters" (March 2025) are widely circulated as statements of his research outlook.
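The aggregated-transformations idea is easy to see in code. Below is a minimal PyTorch sketch of a ResNeXt-style bottleneck block, using a grouped 3x3 convolution as the equivalent form of the parallel branches; the channel widths and the cardinality of 32 are illustrative placeholders, not the paper's exact ImageNet configuration.

```python
# Minimal sketch of a ResNeXt-style block (illustrative sizes, not the paper's exact config).
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    def __init__(self, channels=256, bottleneck=128, cardinality=32):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(channels, bottleneck, 1, bias=False),   # 1x1 reduce
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            # grouped 3x3 conv: `cardinality` parallel branches sharing one topology
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1, bias=False),   # 1x1 expand
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # aggregate the branch outputs, then add the residual input
        return self.relu(x + self.transform(x))

print(ResNeXtBlock()(torch.randn(2, 256, 56, 56)).shape)  # torch.Size([2, 256, 56, 56])
```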
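The MoCo mechanism can be sketched just as compactly: a momentum-updated key encoder and a FIFO queue of negative keys feeding an InfoNCE-style loss. In the sketch below the linear "encoders", feature dimension, queue size, and hyperparameters are stand-ins for illustration, not the paper's setup.

```python
# Hedged sketch of a MoCo-style training step: momentum key encoder + FIFO negative queue.
import torch
import torch.nn.functional as F

dim, queue_size, m, tau = 128, 4096, 0.999, 0.07
encoder_q = torch.nn.Linear(512, dim)              # stand-in for a real ConvNet encoder
encoder_k = torch.nn.Linear(512, dim)
encoder_k.load_state_dict(encoder_q.state_dict())  # key encoder starts as a copy
queue = F.normalize(torch.randn(queue_size, dim), dim=1)

def moco_step(im_q, im_k):
    global queue
    q = F.normalize(encoder_q(im_q), dim=1)        # queries (gradients flow here)
    with torch.no_grad():
        # momentum update of the key encoder
        for pq, pk in zip(encoder_q.parameters(), encoder_k.parameters()):
            pk.data = m * pk.data + (1 - m) * pq.data
        k = F.normalize(encoder_k(im_k), dim=1)    # keys (no gradients)
    l_pos = (q * k).sum(dim=1, keepdim=True)       # positive logits: N x 1
    l_neg = q @ queue.t()                          # negative logits: N x K
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long)  # the positive is index 0
    loss = F.cross_entropy(logits, labels)
    queue = torch.cat([k, queue])[:queue_size]     # enqueue new keys, dequeue oldest
    return loss

print(moco_step(torch.randn(8, 512), torch.randn(8, 512)))
```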
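The MAE recipe is likewise short to sketch: drop 75 percent of the patch tokens, encode only the visible ones, and reconstruct the masked patches with a lightweight decoder fed mask tokens. Token counts, dimensions, and depths below are placeholders, and positional embeddings are omitted for brevity.

```python
# Minimal sketch of MAE-style pre-training (placeholder sizes; positional embeddings omitted).
import torch
import torch.nn as nn

num_patches, patch_dim, enc_dim, dec_dim, mask_ratio = 196, 768, 512, 256, 0.75

embed = nn.Linear(patch_dim, enc_dim)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(enc_dim, nhead=8, batch_first=True), num_layers=4)
enc_to_dec = nn.Linear(enc_dim, dec_dim)
mask_token = nn.Parameter(torch.zeros(1, 1, dec_dim))
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dec_dim, nhead=8, batch_first=True), num_layers=2)
predict = nn.Linear(dec_dim, patch_dim)

def mae_step(patches):                               # patches: (B, 196, 768) flat pixels
    B, N, _ = patches.shape
    num_keep = int(N * (1 - mask_ratio))             # keep 25% of patches
    perm = torch.rand(B, N).argsort(dim=1)           # random shuffle per image
    keep, masked = perm[:, :num_keep], perm[:, num_keep:]
    visible = torch.gather(patches, 1, keep.unsqueeze(-1).expand(-1, -1, patch_dim))
    latent = encoder(embed(visible))                 # encoder sees visible tokens only
    dec_in = torch.cat([enc_to_dec(latent),
                        mask_token.expand(B, N - num_keep, -1)], dim=1)
    recon = predict(decoder(dec_in))[:, num_keep:]   # predictions at the masked positions
    target = torch.gather(patches, 1, masked.unsqueeze(-1).expand(-1, -1, patch_dim))
    return ((recon - target) ** 2).mean()            # pixel reconstruction loss on masked patches

print(mae_step(torch.randn(2, num_patches, patch_dim)))
```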
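Finally, the DiT idea in schematic form: the diffusion denoiser is a plain transformer over patch tokens of the latent rather than a U-Net. The sizes below are placeholders, and the simple additive timestep conditioning stands in for the adaLN-Zero conditioning and positional embeddings used in the actual model.

```python
# Schematic sketch of a DiT-style denoiser (placeholder sizes; additive conditioning instead of adaLN-Zero).
import torch
import torch.nn as nn

latent_ch, latent_hw, patch, dim = 4, 32, 2, 384      # e.g. a 32x32x4 VAE latent

patchify = nn.Conv2d(latent_ch, dim, kernel_size=patch, stride=patch)
time_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
blocks = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True), num_layers=6)
unpatchify = nn.Linear(dim, patch * patch * latent_ch)  # per-token noise prediction

def dit_forward(z_t, t):                              # z_t: (B, 4, 32, 32), t: (B,)
    B = z_t.size(0)
    x = patchify(z_t).flatten(2).transpose(1, 2)      # (B, 256, dim) patch tokens
    x = x + time_embed(t.float().view(B, 1, 1))       # simple additive timestep conditioning
    x = blocks(x)                                     # standard transformer blocks
    out = unpatchify(x)                               # (B, 256, patch*patch*latent_ch)
    h = latent_hw // patch                            # fold tokens back into a latent-shaped map
    out = out.view(B, h, h, patch, patch, latent_ch)
    return out.permute(0, 5, 1, 3, 2, 4).reshape(B, latent_ch, latent_hw, latent_hw)

print(dit_forward(torch.randn(2, 4, 32, 32), torch.randint(0, 1000, (2,))).shape)
```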
Investments and boards
- AMI (AI): Co-founder and Chief Science Officer, late 2025 to present. AMI announced a $1.03 billion seed round at a $3.5 billion pre-money valuation on March 9, 2026.
No public personal angel-investor activity is on record in AI, semiconductors, datacenters, software, or energy as of May 2026.
Network
Xie's strongest professional relationships sit in the computer-vision lineage at FAIR. Kaiming He is his most-frequent senior collaborator from the FAIR years and a co-author on ResNeXt, MoCo, MAE, and Vision Banana. Ross Girshick, Piotr Dollár, Yanghao Li, Xinlei Chen, and Haoqi Fan are recurring FAIR co-authors. Trevor Darrell at UC Berkeley collaborated on ConvNeXt. His PhD advisor Zhuowen Tu at UCSD remains a long-running co-author.
The mentee relationship with William Peebles, the Berkeley PhD student who interned at Meta during the DiT project and now co-leads Sora at OpenAI, is the most-watched of his research-supervision relationships. Other mentees include Zhuang Liu (assistant professor at Princeton), Sanghyun Woo (Google DeepMind), and Eric Mintun (Sora team at OpenAI); Shengbang Tong, the Cambrian-1 lead author, is a current NYU PhD advisee.
The AMI founding cohort places Xie alongside Yann LeCun, Alexandre LeBrun, Pascale Fung (chief research and innovation officer), Michael Rabbat (VP, world models), and Laurent Solly (COO). The LeCun connection through both NYU and AMI is the structural anchor of his current career position.
Position in the field
Xie occupies a distinctive position among computer-vision researchers of his cohort. The combination of a frontier-tier publication record across architecture (ResNeXt, ConvNeXt), self-supervised representation learning (MoCo, MAE), and generative modeling (DiT) is unusual; most peers concentrate on one or two of those three lines. DiT in particular has had measurable downstream impact on commercial video-generation systems including Sora.
The AMI Chief Science Officer role places him in a senior research-leadership position at the most-watched non-LLM frontier-research bet of 2026. Industry coverage has placed Xie's recruitment alongside Pascale Fung's as one of the principal validating data points for AMI's $4.5 billion post-money seed valuation, given that the company is pre-product. His research record provides credibility on visual representation and multimodal architecture, areas central to the JEPA world-model thesis AMI is pursuing.
The dual NYU and AMI structure mirrors the model Yann LeCun has long maintained, holding his NYU professorship alongside leadership roles at Meta and now AMI, as have several other senior researchers who combine academic appointments with frontier-lab leadership. Whether Xie returns to full-time NYU teaching after the AMI launch period is an open question.
Outlook
Open questions over the next 6 to 18 months:
- First AMI publications. Whether AMI will publish papers under Xie's name, and whether those papers extend the JEPA family or produce new architectural directions.
- NYU return. Whether Xie returns to NYU teaching in Fall 2026 or extends academic leave further into the AMI launch period.
- Vision in world models. AMI's stated focus on world models built on continuous sensory input aligns directly with Xie's research on visual representation. Whether the company produces a vision-first world model artifact in 2026, and at what scale, is a watchable signal.
- DiT-family commercial deployment. As Sora and other diffusion-transformer systems scale through 2026 and 2027, whether Xie continues to make architectural contributions to the line, or pivots fully to AMI's world-model direction, will indicate the breadth of his research focus.
- Mentee trajectory. William Peebles, Zhuang Liu, and other former mentees hold senior research positions at frontier labs and universities. The pattern of former Xie collaborators in senior roles is itself a watchable indicator of his research influence.
- Public-commentary positioning. Xie's X account and conference talks have consistently emphasized the role of visual representation in multimodal AI. Whether the AMI period sharpens that public position or shifts it toward LeCun-style critique of the LLM paradigm is an open question.
Sources
- Saining Xie personal site. Personal academic and research home page.
- Saining Xie NYU Center for Data Science page. NYU faculty profile.
- Aggregated Residual Transformations for Deep Neural Networks. The 2016 ResNeXt paper, first-authored by Xie with Girshick, Dollár, Tu, and He.
- A ConvNet for the 2020s. The 2022 ConvNeXt paper, senior-authored by Xie with first author Zhuang Liu and collaborators.
- Scalable Diffusion Models with Transformers. The 2022 DiT paper with William Peebles, ICCV 2023 oral.
- Masked Autoencoders Are Scalable Vision Learners. The 2021 MAE paper with Kaiming He, Xinlei Chen, Yanghao Li, Piotr Dollár, and Ross Girshick.
- Momentum Contrast for Unsupervised Visual Representation Learning. The 2019 MoCo paper with Kaiming He, Haoqi Fan, Yuxin Wu, and Ross Girshick.
- Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs. The 2024 Cambrian-1 paper, senior-authored by Xie with first author Shengbang Tong and Yann LeCun.
- Yann LeCun's AMI Labs raises $1.03B to build world models. TechCrunch on the March 2026 AMI seed announcement and leadership team.
- TUM AI Lecture Series - The multimodal future: Why visual representation still matters (Saining Xie). Saining Xie's TUM AI guest lecture, March 2025.
- Diffusion transformers are the key behind OpenAI's Sora -- and they're set to upend GenAI. TechCrunch on the DiT-to-Sora architectural lineage.
- Advanced Machine Intelligence (AMI) is Enabling the Next AI Revolution. Cathay Innovation announcement of the AMI seed and founding leadership.
- Photo: NYU Courant AI faculty page, NYU Courant Institute Saining Xie portrait.