Tom McGrath

Tom McGrath is a British machine-learning researcher and the co-founder and chief scientist of Goodfire, the San Francisco mechanistic-interpretability company he established in June 2024 with Eric Ho and Daniel Balsam. The company develops Ember, a hosted interpretability platform that decodes the internal computations of large neural networks and exposes them as programmable features. As of May 2026, McGrath leads research at Goodfire, which closed a $150 million Series B in February 2026 at a $1.25 billion valuation; he joined the company after a five-year tenure at Google DeepMind, where he founded and led the mechanistic-interpretability team.

Origins

McGrath is British and trained as a physicist before transitioning into machine-learning research. He completed his PhD at Imperial College London under Nick Jones, a professor of mathematical physics whose group covers complex systems, biological information processing, and statistical mechanics. McGrath's doctoral research sat at the intersection of statistical physics and information theory, including the 2017 Physical Review Letters paper "Biochemical Machines for the Interconversion of Mutual Information and Work" with Jones, Pieter Rein ten Wolde, and Thomas E. Ouldridge.

The PhD-era physics-of-computation work provided a methodological grounding that McGrath later carried into neural-network interpretability research, where trained networks are treated as physical systems whose internal mechanisms can be reverse-engineered through controlled experiments.

Career

McGrath joined Google DeepMind as a research scientist in 2019, shortly after completing his PhD at Imperial College London. His tenure ran into early 2024, with his work centered on interpretability for reinforcement-learning agents and, later, large language models. His longest-running DeepMind project was the analysis of AlphaZero, the self-play reinforcement-learning system trained to master chess, shogi, and Go without human gameplay data. McGrath was lead author of the resulting PNAS paper in 2022, with co-authors including Andrei Kapishnikov, Nenad Tomašev, Adam Pearce, Demis Hassabis, Been Kim, Ulrich Paquet, and former chess world champion Vladimir Kramnik. The paper used linear probes on AlphaZero's internal state to identify when and where chess concepts emerged in the network, making it one of the first published interpretability studies of a self-trained reinforcement-learning agent.
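The linear-probe methodology can be illustrated with a minimal sketch. Everything here is a synthetic stand-in (random activations, an invented binary concept label); in the paper the activations come from AlphaZero's network and the labels from human-defined chess concepts. The idea is the same: if a simple linear classifier trained on a layer's activations predicts the concept well above chance on held-out data, the concept is linearly decodable at that layer.

```python
# Minimal linear-probe sketch on synthetic stand-in data: fit a linear
# classifier mapping a layer's activations to a binary concept label,
# then measure held-out accuracy.
import numpy as np

rng = np.random.default_rng(0)
n_positions, d_hidden = 1000, 256

# Stand-in for activations from one layer of a trained network; the
# concept, when present, shifts the activation mean (so it is encoded).
concept = rng.integers(0, 2, size=n_positions)          # binary concept label
activations = rng.normal(size=(n_positions, d_hidden))
activations += concept[:, None]

# Train/test split, then a least-squares linear probe with a bias term.
split = 700
X_train, X_test = activations[:split], activations[split:]
y_train, y_test = concept[:split], concept[split:]
Xb_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
Xb_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
w, *_ = np.linalg.lstsq(Xb_train, y_train, rcond=None)

pred = (Xb_test @ w > 0.5).astype(int)
accuracy = (pred == y_test).mean()   # far above chance when the concept is encoded
```

Repeating the fit across layers and training checkpoints, as the paper does, turns probe accuracy into a map of when and where a concept emerges during training.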

Within DeepMind, McGrath founded the mechanistic-interpretability team, the institutional anchor for the lab's interpretability research output; Goodfire's official press materials, the Lightspeed Venture Partners profile, and the Contrary Research dossier each describe him as the team's founder. The team's published output included two notable papers. "The Hydra Effect: Emergent Self-repair in Language Model Computations" (July 2023), a lead-author paper with Matthew Rahtz, János Kramár, Vladimir Mikulik, and Shane Legg, identified a phenomenon in which language-model layers compensate for ablations elsewhere in the computation. "Tracr: Compiled Transformers as a Laboratory for Interpretability" (January 2023), a co-authored paper, introduced a compiler from RASP programs into transformer weights, used as ground truth for evaluating interpretability methods.
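The Hydra-effect measurement logic can be sketched with a toy two-component model (invented functions and coefficients, not the paper's transformers): compare a component's direct contribution against the total effect of ablating it once downstream components are allowed to react. Self-repair shows up as the total effect being much smaller than the direct effect.

```python
# Toy illustration of the Hydra-effect measurement: direct effect of a
# component vs. total effect of ablating it when a downstream component
# partially compensates. All functions and coefficients are hypothetical.
def layer_a(x):
    return 2.0 * x                          # upstream component's contribution

def layer_b(x, a_out):
    # Downstream component that backs up layer_a: it contributes more when
    # layer_a's output is suppressed (a stand-in for the compensation the
    # paper observed in real language models).
    return 1.0 * x + 0.8 * (2.0 * x - a_out)

def model(x, ablate_a=False):
    a_out = 0.0 if ablate_a else layer_a(x)
    return a_out + layer_b(x, a_out)

x = 1.0
direct_effect = layer_a(x)                          # contribution held in isolation
total_effect = model(x) - model(x, ablate_a=True)   # much smaller: self-repair
```

In the toy numbers, ablating `layer_a` removes a contribution of 2.0, but the output only drops by 0.4 because `layer_b` absorbs most of the loss, mirroring the paper's finding that naive ablation understates a component's role.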

McGrath left DeepMind in early 2024 and moved from London to San Francisco to join South Park Commons, the founder community where he committed to making interpretability "useful" by starting a company. He connected with Eric Ho and Daniel Balsam after circulating a document that aligned with their thesis on commercial interpretability tooling. The three founded Goodfire in June 2024 with Ho as chief executive officer, Balsam as chief technology officer, and McGrath as chief scientist. The founding team paired Ho's and Balsam's operating and engineering experience from RippleMatch with McGrath's research credentials from DeepMind.

In January 2024, during the transition out of DeepMind, McGrath published Safety as a Scientific Pursuit on his Banburismus Substack. The essay argued that the AI safety community had largely failed to convince empirically-minded engineers of the risks it identified, and that mechanistic interpretability could close the gap by producing rigorous evidence about neural-network internals.

Goodfire's fundraising through May 2026 included a $7 million seed in August 2024 led by Lightspeed Venture Partners, a $50 million Series A in April 2025 led by Menlo Ventures with Anthropic as a strategic participant, and a $150 million Series B in February 2026 at a $1.25 billion valuation led by B Capital. Cumulative funding through the Series B is approximately $209 million.

Affiliations

  • Google DeepMind: Senior research scientist and founder of the mechanistic-interpretability team, 2019 to early 2024.
  • South Park Commons: Member, March 2024 to June 2024.
  • Goodfire: Co-founder and Chief Scientist, June 2024 to present.

Notable contributions

McGrath's published output is concentrated on mechanistic interpretability, with a chronology that runs from reinforcement-learning agents at DeepMind through transformer language models and on into the Goodfire commercial research direction. Many of the most-cited papers below are co-authored, with McGrath in either the lead-author or senior-author position depending on the project.

  • Biochemical Machines for the Interconversion of Mutual Information and Work (Physical Review Letters, January 2017). Co-authored statistical-physics paper with Nick Jones, Pieter Rein ten Wolde, and Thomas E. Ouldridge on the thermodynamic conversion between information and free energy in biochemical systems. Representative of the PhD-era physics-of-computation work.
  • Acquisition of Chess Knowledge in AlphaZero (PNAS, November 2022). Lead-author paper with Andrei Kapishnikov, Nenad Tomašev, Adam Pearce, Demis Hassabis, Been Kim, Ulrich Paquet, and Vladimir Kramnik analyzing how chess concepts emerge inside AlphaZero's neural network during self-play training. The most-cited paper in McGrath's record.
  • "Tracr: Compiled Transformers as a Laboratory for Interpretability" (NeurIPS 2023, posted January 2023). Co-authored paper with David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, and Vladimir Mikulik introducing a compiler from RASP programs into transformer weights, used as a ground-truth testbed for interpretability methods. The open-source implementation has been adopted as a standard tool by the broader mechanistic-interpretability research community.
  • "The Hydra Effect: Emergent Self-repair in Language Model Computations" (July 2023). Lead-author paper with Matthew Rahtz, János Kramár, Vladimir Mikulik, and Shane Legg identifying a self-repair phenomenon in transformer computations, where ablating one attention layer prompts compensation in another. Cited as a structural finding for circuit-level analysis in language models.
  • Safety as a Scientific Pursuit (Banburismus Substack, January 2024). Long-form essay arguing for empirical mechanistic interpretability as the path through which AI safety can produce rigorous evidence rather than abstract arguments, with implications for the design of open-source releases and full-data access for safety research.
  • Goodfire co-founding (June 2024). Established the mechanistic-interpretability company with Eric Ho as chief executive and Daniel Balsam as chief technology officer; carried the academic-research credentials and the team's interpretability methodology into the commercial founding team.
  • "Understanding and Steering Llama 3 with Sparse Autoencoders" (Goodfire Research, December 2024). Co-authored research preview with Daniel Balsam, Mengyu Deng, and Eric Ho applying sparse-autoencoder feature decomposition to Llama 3.3 70B and Llama 3.1 8B, the technical foundation for the Ember platform launched in April 2025.
  • Public-talk record. Recurring appearances on the Nathan Labenz Cognitive Revolution podcast with Daniel Balsam, including the August 2024 launch episode "Popular Mechanistic Interpretability," the May 2025 episode "Philosophy, Practice & Progress," and the March 2026 post-Series B episode "Don't Fight Backprop." Conference keynote appearances are limited relative to peer figures at frontier laboratories.
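The sparse-autoencoder approach behind the Llama 3 work and the Ember platform can be sketched minimally. This is an untrained toy with random weights and random stand-in activations, not Goodfire's implementation: a real SAE is trained on residual-stream activations from the target model, with the loss shown here driving most feature activations to zero.

```python
# Minimal sparse-autoencoder sketch (random toy data and weights): an
# overcomplete ReLU encoder decomposes model activations into candidate
# features; steering scales a feature before decoding.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features, n_samples = 64, 256, 512

# Stand-in for residual-stream activations collected from a language model.
acts = rng.normal(size=(n_samples, d_model)).astype(np.float32)

W_enc = rng.normal(scale=0.1, size=(d_model, d_features)).astype(np.float32)
W_dec = rng.normal(scale=0.1, size=(d_features, d_model)).astype(np.float32)
b_enc = np.zeros(d_features, dtype=np.float32)

def encode(x):
    # Each column of W_enc is a candidate feature direction; during
    # training, an L1 penalty keeps most feature activations at zero.
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    return f @ W_dec

features = encode(acts)
recon = decode(features)
# Training objective (not optimized here): reconstruction + sparsity.
l1_coeff = 1e-3
loss = np.mean((recon - acts) ** 2) + l1_coeff * np.abs(features).mean()

# "Steering" amounts to rescaling a feature before decoding.
steered = features.copy()
steered[:, 7] *= 3.0                 # amplify a hypothetical feature
steered_recon = decode(steered)
```

Exposing trained features of this kind as adjustable knobs is, at a sketch level, what a platform like Ember means by programmable features.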

Investments and boards

  • Goodfire (AI): Co-founder and Chief Scientist, June 2024 to present. Privately held mechanistic-interpretability company; approximately $209 million in cumulative funding through three rounds, with the $150 million Series B in February 2026 at a $1.25 billion valuation.

As of May 2026, McGrath has no public personal angel-investment or board activity on record in AI, semiconductors, datacenters, software, or energy outside the Goodfire founder role.

Network

McGrath's longest-running professional relationships sit inside the Google DeepMind interpretability and language-model research network. The Hydra Effect co-authors include Matthew Rahtz, János Kramár, and Vladimir Mikulik, all DeepMind researchers in the period when he ran the interpretability team, and Shane Legg, the DeepMind chief AGI scientist and co-founder. The AlphaZero work brought him into a co-authoring relationship with Demis Hassabis, the DeepMind chief executive, alongside Been Kim and Ulrich Paquet on the Google research staff and Vladimir Kramnik as the chess-domain consultant. PhD-era collaborators include Nick Jones at Imperial College London and Pieter Rein ten Wolde and Thomas E. Ouldridge as co-authors on the Physical Review Letters paper.

The closest current professional relationship is with the two Goodfire co-founders, Eric Ho and Daniel Balsam, with whom McGrath has worked daily since the June 2024 founding. The Goodfire advisor relationship with Chris Olah of Anthropic places McGrath in regular contact with the researcher most associated with the mechanistic-interpretability program; the Series A investment from Anthropic, the company's first equity investment in another startup, deepened the institutional relationship beyond the advisor link. The broader interpretability community in which McGrath is a frequent reference point includes Neel Nanda, the independent researcher who overlapped with the DeepMind team during McGrath's tenure, and Lee Sharkey, the Apollo Research co-founder cited alongside McGrath in Fast Company coverage as a fellow pioneer in the field.

Position in the field

As of May 2026, McGrath is the chief scientist of the only commercial mechanistic-interpretability laboratory to have crossed the unicorn-valuation threshold. The structural distinction in his profile is the combination of the founder-of-the-team credential at DeepMind, the lead-author position on the AlphaZero paper, and the chief-scientist seat at the post-Series B Goodfire, placing him at the intersection of academic research and commercial product in the interpretability category.

The career trajectory is unusual relative to peers in the field. McGrath's path runs from a PhD at Imperial College London in statistical physics through industrial research at DeepMind to a co-founded commercial venture, rather than the more common route of post-doctoral academic appointments followed by lab-internal research staff positions. The Imperial-to-DeepMind-to-startup arc has fewer parallels than the Anthropic and OpenAI alumni networks that anchor most of the comparator companies in mechanistic interpretability.

Industry coverage including the Lightspeed Venture Partners profile and the Sequoia Capital Training Data podcast episode with Eric Ho characterizes Goodfire's founding-team structure as a deliberate combination of an operating chief executive with research credentials concentrated in McGrath as chief scientist and Chris Olah as advisor, rather than as a research-led company with the chief executive as the principal scientific voice.

Outlook

Open questions over the next 6 to 18 months:

  • Goodfire research output. Whether the company's interpretability papers continue at the cadence of the Llama 3 sparse-autoencoder preview and the Intentional Design research pillar announced in February 2026, and whether the published work shapes the field's methodology in the way the DeepMind interpretability output did during McGrath's tenure there.
  • Conference and academic profile. Whether McGrath's keynote and long-form-interview record expands beyond the Cognitive Revolution podcast appearances into NeurIPS, ICML, or ICLR keynote engagements, the standard public-credentialing track for senior interpretability researchers.
  • DeepMind interpretability team continuity. Whether the mechanistic-interpretability team McGrath founded sustains its McGrath-era research cadence under successor leadership, and whether Google DeepMind maintains a published-research output that competes with the Anthropic Transformer Circuits Thread.
  • Commercial-research alignment. Whether the chief-scientist role at Goodfire continues to support fundamental research output, or whether the commercial pull of the post-Series B scaling phase compresses the published-research cadence as the company's product-engineering scope expands.
  • Founding-team stability. Whether the Ho, Balsam, and McGrath team remains intact through the post-Series B scaling phase, given the 2025 to 2026 pattern of senior researchers leaving frontier laboratories to found independent ventures.
  • Bridge to Anthropic interpretability. Whether the structural relationship between Goodfire and the Chris Olah advisor role at Anthropic produces published research collaborations or remains a strategic-investor and advisory relationship.

About the author
Nextomoro

AI Research Lab Intelligence

nextomoro tracks progress for AI research labs, models, and what's next.
