John Schulman

John Schulman is an American computer scientist and reinforcement-learning researcher, born in 1987 or 1988 and raised in Great Neck, New York. He is the Chief Scientist of Thinking Machines Lab, a co-founder of OpenAI, and the lead author of the Trust Region Policy Optimization and Proximal Policy Optimization papers that became standard policy-gradient algorithms in modern reinforcement-learning libraries. As of May 2026, Schulman leads research at Thinking Machines Lab, which he joined in February 2025 after a brief tenure at Anthropic, and he is a frequent public voice on reinforcement learning from human feedback and post-training methodology.

Origins

Schulman was born in 1987 or 1988 and grew up in Great Neck, a town on the North Shore of Long Island in New York. According to his Berkeley News profile, he had an early interest in science fiction, particularly the work of Isaac Asimov, and traced his entry into engineering to a seventh-grade fascination with the BattleBots television program, which featured combat between remote-controlled robots. He has described that period as his first self-directed engineering study, undertaken for a robot that he and a group of school friends planned but never completed.

He attended William A. Shine Great Neck South High School and competed on the United States Physics Olympiad team in 2005. He completed a bachelor's degree in physics at the California Institute of Technology in 2010 before moving to Berkeley for graduate work.

Career

Schulman's PhD work at the University of California, Berkeley under Pieter Abbeel concentrated on policy-gradient methods for deep reinforcement learning. The two foundational papers from this period are Trust Region Policy Optimization, submitted to arXiv in February 2015, and Proximal Policy Optimization, submitted to arXiv in July 2017 from his subsequent position at OpenAI. Both papers list Schulman as the lead author. PPO in particular became one of the most widely adopted reinforcement-learning algorithms in research and engineering practice, and is the default policy-gradient method in many modern RL libraries.
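As a brief technical aside, the relationship between the two algorithms can be stated compactly; notation follows the papers themselves, with pi the policy, A-hat the advantage estimate, and epsilon PPO's clipping parameter:

```latex
% TRPO: maximize the surrogate objective subject to a KL trust-region constraint
\max_{\theta} \; \mathbb{E}_t\!\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}\,\hat{A}_t\right]
\quad \text{s.t.} \quad
\mathbb{E}_t\!\left[\mathrm{KL}\!\left(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t)\,\middle\|\,\pi_\theta(\cdot \mid s_t)\right)\right] \le \delta

% PPO: replace the hard constraint with a clipped surrogate, where
% r_t(\theta) = \pi_\theta(a_t \mid s_t) \,/\, \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\;
\mathrm{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]
```

PPO's clipping removes the need for TRPO's second-order constrained optimization, which is a large part of why it became the practical default.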

In December 2015, before completing his Berkeley PhD, Schulman joined the founding cohort of OpenAI as one of eleven publicly named co-founders alongside Sam Altman, Elon Musk, Ilya Sutskever, Greg Brockman, Trevor Blackwell, Vicki Cheung, Andrej Karpathy, Durk Kingma, Pamela Vagata, and Wojciech Zaremba, with Altman and Musk as co-chairs. He completed and defended his Berkeley thesis in 2016 while continuing at OpenAI. Over the next nine years he led the reinforcement-learning team and was a central technical figure in the company's post-training and reinforcement-learning-from-human-feedback (RLHF) program.

The most widely cited research artifact from his OpenAI period beyond PPO is the March 2022 InstructGPT paper, "Training language models to follow instructions with human feedback," on which he is one of twenty co-authors behind lead author Long Ouyang. The InstructGPT methodology was the immediate precursor to ChatGPT, which OpenAI released as a research preview on November 30, 2022. Industry coverage including the Berkeley News alumni feature in April 2023 has referred to Schulman as the "architect" of ChatGPT in connection with his role on the post-training research that produced the consumer-facing model.

On August 5, 2024, Schulman announced via X that he was leaving OpenAI after nearly nine years to join Anthropic. In the post he framed the move as a way to deepen his focus on AI alignment and return to more hands-on technical work. The Reuters report on the move on August 6, 2024 noted his arrival at Anthropic without specifying a title. The Anthropic period was brief: on February 6, 2025, Fortune reported that Schulman was leaving Anthropic to join Mira Murati's newly launched Thinking Machines Lab as Chief Scientist. The total Anthropic tenure was approximately six months. Schulman has stated publicly through 2025 and 2026 that Thinking Machines plans to release its own models in 2026.

In 2025 Schulman received the Mark Bingham Award for Excellence in Achievement by Young Alumni from UC Berkeley, recognizing his contributions to reinforcement-learning research and the foundation of OpenAI.

Affiliations

  • OpenAI: Co-founder and senior research scientist on reinforcement learning, December 2015 to August 2024.
  • Anthropic: Senior research leader, August 2024 to February 2025.
  • Thinking Machines Lab: Chief Scientist, February 2025 to present.

Notable contributions

Schulman's body of public work centers on policy-gradient reinforcement-learning methodology, the OpenAI co-founding cohort, and the post-training research program that produced InstructGPT and ChatGPT.

  • Trust Region Policy Optimization (February 2015). Lead author with Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. The paper introduced a constrained policy-update procedure that became a foundational approach in deep reinforcement learning.
  • Proximal Policy Optimization (July 2017). Lead author with Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov at OpenAI. PPO simplified the TRPO procedure into a clipped-objective method that is the default policy-gradient algorithm in most modern reinforcement-learning libraries including OpenAI Spinning Up, Stable Baselines3, and CleanRL.
  • OpenAI co-founding (December 2015). One of eleven publicly named co-founders of the lab incorporated as a non-profit research organization in San Francisco. Schulman led the reinforcement-learning team through the company's first nine years.
  • InstructGPT (March 2022). Co-author of the OpenAI paper that introduced the RLHF methodology underlying ChatGPT. Schulman is one of twenty co-authors behind lead author Long Ouyang.
  • ChatGPT (November 2022). Senior research leadership on the post-training program that produced the consumer-facing model. Industry coverage including the Berkeley News alumni feature in April 2023 referred to him as the "architect" of ChatGPT in connection with this work.
  • OpenAI RLHF leadership (2017 to 2024). Continuous research leadership on reinforcement-learning-from-human-feedback methodology across the period that produced InstructGPT, ChatGPT, and successor models.
  • Thinking Machines Lab Chief Scientist (February 2025 to present). Senior research leadership at the AI research and product company founded by Mira Murati in 2024.
  • Mark Bingham Award for Excellence in Achievement by Young Alumni (2025). UC Berkeley alumni recognition for early-career achievement.
  • Public-talk record. Deep Reinforcement Learning lecture from the OpenAI period; Reinforcement Learning from Human Feedback: Progress and Challenges talk; Dwarkesh Patel interview on reasoning, RLHF, and AGI timelines from 2024.
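The clipped-objective idea behind PPO, described in the bullets above, can be sketched in a few self-contained lines; the function name and argument layout here are illustrative and not taken from any particular library:

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Average PPO clipped surrogate loss (to be minimized).

    logp_new / logp_old: per-action log-probabilities under the current
    and behavior policies; advantages: advantage estimates A_t.
    """
    losses = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # r_t = pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        # Take the pessimistic (min) surrogate, then negate to get a loss.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

With identical policies the ratio is 1 and the loss reduces to the negated mean advantage; as the new policy drifts off-policy, the clip bounds how much a single gradient update can exploit the probability ratio.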

Investments and boards

No public personal investor activity in AI, semiconductors, datacenters, software, or energy is on record as of May 2026. Schulman's public footprint is concentrated in his research and operating roles at OpenAI, Anthropic, and Thinking Machines Lab rather than in a parallel investing program.

Network

Schulman's foundational professional relationship is with his Berkeley PhD advisor Pieter Abbeel, with whom he is a co-author on the TRPO paper and on subsequent reinforcement-learning research. Sergey Levine, also at Berkeley and a TRPO co-author, was part of the same Abbeel-led research environment.

His OpenAI co-founding cohort, with whom he worked from December 2015 through August 2024, includes Sam Altman, the chief executive and co-chair; Ilya Sutskever, the chief scientist and fellow co-founder, who founded Safe Superintelligence in June 2024; Andrej Karpathy, founding research scientist; Greg Brockman, president and fellow co-founder; and Wojciech Zaremba, fellow co-founder and reinforcement-learning researcher. Mira Murati, Chief Technology Officer of OpenAI from 2022 to 2024, is now his employer at Thinking Machines Lab, where the founding cohort also includes Barret Zoph, Lilian Weng, and Bob McGrew, all former senior OpenAI staff.

The brief Anthropic period from August 2024 through February 2025 placed Schulman alongside Dario Amodei, Daniela Amodei, Tom Brown, Sam McCandlish, Jared Kaplan, Jack Clark, and Chris Olah, the seven public Anthropic co-founders, all of whom were senior OpenAI researchers in the period before Anthropic's 2021 founding. Among broader frontier-research peers, his InstructGPT and OpenAI alignment-research network includes Paul Christiano, Jan Leike, and Ryan Lowe, all co-authors on the InstructGPT paper.

Position in the field

As of May 2026, Schulman occupies a structurally distinctive position among reinforcement-learning researchers in industry. Lead authorship of TRPO and PPO places him among the small group of researchers whose named contributions are core methodology used in nearly every modern reinforcement-learning library and many large-scale post-training programs across frontier AI labs. Industry coverage frequently characterizes PPO as the default policy-gradient algorithm in deep reinforcement-learning practice.

The OpenAI-to-Anthropic-to-Thinking Machines path is unusual among senior frontier researchers. Schulman is one of a small group of OpenAI co-founders who departed in the 2024 senior-departure cohort that also included Mira Murati, Ilya Sutskever, and Bob McGrew. He is the only one in that group to have spent an interim period at a different frontier lab before joining or founding a new company; the Anthropic tenure of approximately six months between OpenAI and Thinking Machines Lab is the shortest documented frontier-lab employment in his record.

His public profile concentrates on research-focused podcasts, conference talks, and academic lectures rather than mainstream policy commentary or media interviews. The 2024 Dwarkesh Patel interview on reasoning, RLHF, and AGI timelines was widely circulated in the AI engineering community, and the Berkeley News alumni feature in April 2023 is the most-cited general-audience profile of his work.

Outlook

Open questions over the next 6 to 18 months:

  • First Thinking Machines model release. Schulman has publicly stated 2026 as the release year for an in-house model. The capability profile, technical direction, and reinforcement-learning methodology of the first model are central to his public record at the new lab.
  • Tinker evolution. The trajectory of the Tinker fine-tuning platform released in October 2025 and any role Schulman's research takes in shaping its post-training-as-a-service positioning.
  • RLHF research direction. Whether Schulman publishes follow-on work to PPO, InstructGPT, and the OpenAI RLHF program that materially advances the methodology beyond the current frontier-lab default.
  • Public commentary. Frequency and substance of his public-talk and podcast appearances on reinforcement learning, alignment, and frontier-lab strategy as Thinking Machines moves from a pre-product phase into model releases.
  • Senior-talent recruitment. Continued movement of reinforcement-learning researchers and post-training specialists from OpenAI, Anthropic, and Google DeepMind into Thinking Machines Lab.
  • Long-term lab tenure. Whether the February 2025 Thinking Machines Lab move proves more durable than the August 2024 Anthropic move, given the unusually short Anthropic tenure.

About the author

Nextomoro (AI Research Lab Intelligence) tracks progress for AI research labs, models, and what's next.
