TML-Interaction-Small is a real-time multimodal interaction model released by Thinking Machines Lab in May 2026 as the first entry in the company's Interaction Models family. The model handles audio, video, and text input and output concurrently rather than sequentially, with a 200-millisecond micro-turn chunking architecture and a reported 0.40-second turn-taking latency on the company's internally developed FD-bench V1 evaluation. It is positioned against OpenAI's GPT Realtime line (GPT-Realtime-2), Google DeepMind's Gemini Live, and Alibaba Qwen's Qwen 3.5 Omni-plus-realtime as the principal competitive set in the real-time interaction category as of May 2026.
At a glance
- Lab: Thinking Machines Lab.
- Released: May 2026 as a limited research preview, with wider availability planned later in 2026.
- Modality: Concurrent audio, video, and text, with native speech interruption, interjection, and simultaneous-speech handling.
- Open weights: None. Closed weights with research-preview access.
- Architecture: Mixture-of-experts. Approximately 276 billion total parameters with 12 billion active parameters per token, time-aligned micro-turn structure, encoder-free early-fusion across modalities, flow-head audio decoder, and a separate asynchronous background reasoning model for tool calls and long-context grounding.
- Pricing: Not disclosed at preview.
- Distribution channels: Research preview only at launch. No public API, no demo URL, no consumer surface. Access via direct outreach to Thinking Machines.
Origins
TML-Interaction-Small is the first in-house model release from Thinking Machines Lab, the San Francisco research company founded in 2024 by Mira Murati, the former chief technology officer of OpenAI. Thinking Machines launched publicly in February 2025 and released its first product, the Tinker fine-tuning platform, in October 2025. Through late 2025 and early 2026, the company's chief scientist John Schulman publicly stated that in-house models would arrive in 2026; the Interaction Models announcement on the company's blog in May 2026 delivered on that commitment.
The strategic framing in the announcement post organises the model around what Thinking Machines describes as a "collaboration bottleneck" in current AI products. Existing voice and chat interfaces, the post argues, force users into a stop-and-wait turn-taking loop that breaks the flow of natural human collaboration, where two humans regularly speak over each other, interject, pause for thought, and overlap their channels of communication. The Interaction Models family is positioned as the structural response: an architecture where audio, video, and text channels run concurrently at human-conversation latency rather than being serialised behind a single text or speech turn.
The release lands into a maturing real-time interaction segment. OpenAI shipped the original GPT Realtime API in October 2024 and the production-grade GPT-Realtime-2 in May 2026. Google DeepMind's Gemini Live entered general availability across 2025 and progressed through the Gemini 3.0 and 3.1 model generations. Alibaba Qwen's Qwen 2.5 Omni and Qwen 3.5 Omni-plus-realtime defined the open-weights end of the category. TML-Interaction-Small enters this competitive set as the most-anticipated Insurgent-lab entry, with the additional weight of being Thinking Machines's first publicly disclosed model.
Capabilities
The model's capability profile is built around four characteristics that the company's announcement post identifies as the differentiators against turn-based realtime systems.
Concurrent multimodal streams. Audio, video, and text run as parallel input and output channels rather than a single multiplexed stream. The audio path uses dMel signals at conversational quality. The video path tokenises frames as 40 by 40 patches. The text path operates as a standard transformer token stream. The three are early-fused inside the model rather than being routed through separate per-modality encoders.
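The 40 by 40 patch scheme for the video path can be sketched in a few lines. This is purely illustrative: the function name, frame representation, and patch layout below are assumptions for the example, not TML's actual tokenisation pipeline.

```python
# Illustrative sketch only: splitting a video frame into non-overlapping
# 40x40 patches, as described for the video path. Names are assumptions.

def patchify(frame, patch_size=40):
    """Split a 2-D frame (a list of rows) into square patches, row-major."""
    h, w = len(frame), len(frame[0])
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patches.append([row[left:left + patch_size]
                            for row in frame[top:top + patch_size]])
    return patches

# A 120x160 frame yields (120/40) * (160/40) = 3 * 4 = 12 patches.
frame = [[0] * 160 for _ in range(120)]
print(len(patchify(frame)))  # 12
```

In an early-fusion design, patch tokens like these would be interleaved with audio and text tokens in a single shared sequence rather than passed through a separate video encoder.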
Time-aligned micro-turn chunking. The model processes interaction in 200-millisecond chunks, with turn-taking decisions made at the chunk boundary rather than at the end of a complete utterance. This is the architectural mechanism that produces the reported 0.40-second turn-taking latency on FD-bench V1, materially below the 0.8-to-1.5-second range that turn-based realtime systems typically report.
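The arithmetic behind the latency claim is easy to see in a toy loop: if a turn-taking decision is evaluated every 200 milliseconds, a decision at the second chunk boundary lands at 0.4 seconds. The scoring function below is a stand-in for whatever the model actually computes; it is a sketch of the chunking mechanism, not the model's decision logic.

```python
# Hedged sketch of micro-turn chunking: the timeline is cut into fixed
# 200 ms chunks and an end-of-turn decision is evaluated at every chunk
# boundary, rather than after the full utterance. `chunk_scores` stands
# in for per-chunk end-of-turn probabilities (an assumption).

CHUNK_MS = 200

def respond_after(chunk_scores, threshold=0.5):
    """Return latency in seconds until the model decides to take the turn."""
    for i, score in enumerate(chunk_scores, start=1):
        if score >= threshold:          # decision made at this chunk boundary
            return i * CHUNK_MS / 1000
    return None                         # speaker still holds the floor

# A confident end-of-turn signal at the second 200 ms boundary -> 0.4 s,
# matching the reported turn-taking latency figure.
print(respond_after([0.1, 0.8]))  # 0.4
```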
Native interruption, interjection, and simultaneous speech. The model is designed to handle the cases that break turn-based architectures: a user interrupting mid-sentence, a model interjecting with a clarification, the brief overlap when two parties end a turn at almost the same moment. The encoder-free early-fusion design allows the model to maintain state across the interruption point rather than discarding context and restarting.
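The contrast with context-discarding systems can be shown with a minimal state sketch: on interruption, only the in-flight reply is dropped while accumulated conversational state survives. This is an illustration of the behaviour described above, not TML's implementation; all names are invented for the example.

```python
# Minimal sketch: handling an interruption by keeping accumulated state
# instead of discarding context and restarting. Purely illustrative.

class Session:
    def __init__(self):
        self.context = []               # persists across interruptions
        self.pending_reply = None       # the model's in-flight output

    def hear(self, chunk):
        self.context.append(chunk)

    def interrupt(self):
        self.pending_reply = None       # drop only the in-flight reply;
                                        # self.context is left intact

s = Session()
s.hear("plan the trip")
s.pending_reply = "Sure, first we..."
s.interrupt()                           # user cuts the model off mid-sentence
s.hear("actually, by train")
print(s.context)  # ['plan the trip', 'actually, by train']
```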
Tool calls, web search, and generative UI mid-conversation. The Interaction Models architecture includes a separate asynchronous background reasoning model that handles long-horizon decisions (tool calls, web search, generative-UI composition) while the foreground interaction model maintains the real-time conversational loop. This split is the structural answer to a long-standing trade-off in realtime agents: real-time latency and deep reasoning are typically in tension, and previous realtime systems forced users to wait while the model thought. The split-stack design lets background reasoning proceed concurrently with foreground conversation.
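The foreground/background split described above maps naturally onto concurrent tasks: a foreground loop keeps producing conversational output while a slow background call runs to completion. The sketch below uses Python's asyncio to illustrate the shape of the split; the task names and timings are assumptions, not the announced architecture's API.

```python
import asyncio

# Illustrative sketch of the split-stack idea: a foreground loop keeps
# the conversational cadence while a background task handles a slow tool
# call. All names here are invented for illustration.

async def slow_tool_call():
    await asyncio.sleep(0.05)           # stand-in for web search, etc.
    return "tool result"

async def conversation():
    background = asyncio.create_task(slow_tool_call())
    spoken = []
    while not background.done():        # foreground never blocks on the tool
        spoken.append("filler chunk")   # keep the real-time loop alive
        await asyncio.sleep(0.01)
    spoken.append(background.result())  # weave the result back in
    return spoken

result = asyncio.run(conversation())
print(result[-1])  # tool result
```

The point of the sketch is the control flow: the conversational loop never awaits the tool call directly, so reasoning latency is hidden behind ongoing speech rather than inserted into it.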
The published technical details extend to a number of systems-engineering elements: streaming sessions are designed to avoid frequent memory reallocations during long conversations, batch-invariant kernels keep trainer and inference sampler outputs aligned, NVLS communication kernels reduce inter-device latency on the inference cluster, and a custom attention implementation with consistent Split-KV accumulation maintains determinism across batched and unbatched paths.
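One of those elements, avoiding memory reallocation during long sessions, is commonly implemented with a preallocated ring buffer: the buffer is allocated once and indices wrap, so memory use stays fixed no matter how long the stream runs. The class below is a generic sketch of that pattern under that assumption, not TML's session code.

```python
# Hedged sketch of one systems detail mentioned above: writing incoming
# chunks into a preallocated ring buffer instead of growing a list
# without bound. Generic pattern, not TML's implementation.

class RingBuffer:
    def __init__(self, capacity):
        self._buf = [None] * capacity   # allocated once, reused forever
        self._capacity = capacity
        self._next = 0                  # total chunks ever pushed

    def push(self, chunk):
        self._buf[self._next % self._capacity] = chunk
        self._next += 1

    def latest(self, n):
        """Return the most recent n chunks, oldest first."""
        start = max(0, self._next - n)
        return [self._buf[i % self._capacity] for i in range(start, self._next)]

rb = RingBuffer(capacity=4)
for i in range(10):                     # a long stream, fixed memory
    rb.push(i)
print(rb.latest(3))  # [7, 8, 9]
```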
Benchmarks and standing
The Interaction Models announcement introduced FD-bench, an internally developed evaluation suite for real-time interaction quality. On FD-bench V1.5 the company reported TML-Interaction-Small at a score of 77.8, against published comparison settings for OpenAI GPT-4 Realtime, Google DeepMind Gemini 3.1 Flash Live, and Alibaba Qwen 3.5 Omni-plus-realtime in the 45.5 to 54.3 range.
The benchmark is internally developed by Thinking Machines and external reproductions had not been published as of the model's release. The structural caveat that applies to any first-party benchmark applies here: the gap to peer systems will need to be confirmed by independent evaluations (Artificial Analysis, academic researchers, or direct comparison work by the OpenAI realtime team, the Google DeepMind Gemini Live team, and the Alibaba Qwen Omni team) before the headline number can be treated as a category-defining lead. The breadth of the gap (approximately 23 to 32 points on a 100-point scale) is large enough that some part of it is plausibly genuine even after independent-evaluation correction, but the specific magnitude remains a first-party claim.
On the standard text-and-multimodal capability benchmarks used in the broader Atlas (Artificial Analysis Intelligence Index, LMArena, GPQA Diamond, SWE-bench Verified, ARC-AGI Challenge, AIME 2025, HumanEval+), no published TML-Interaction-Small results are available as of the release. The model is positioned for an interaction-quality dimension that those text-and-multimodal leaderboards do not directly measure, and Thinking Machines has not signalled that the model targets leadership on the standard reasoning or coding benchmarks.
Access and pricing
At the May 2026 release, TML-Interaction-Small was available only as a limited research preview. The company's announcement post directed interested users to the Thinking Machines hiring page (a signal that early access is concentrated among potential employees and research collaborators) and to a feedback email address. No public API, no signup form for developer access, no demo URL, and no pricing tier was disclosed.
The company indicated a wider release planned for later in 2026. The distribution mechanism for the wider release (API, embedded SDK, consumer product, partner integration through Google or NVIDIA) was not specified. The strategic-investor relationships Thinking Machines has built across NVIDIA's Vera Rubin compute partnership and the April 2026 Google deal both produce plausible distribution paths; whether the company runs distribution itself, partners on it, or splits across both is an unresolved question for the second half of 2026.
A research grant program for community benchmarking contributions was mentioned in the announcement, but the program structure, eligibility, and grant size were not specified.
Comparison
- GPT-Realtime-2 (OpenAI). The principal commercial competitor on the same real-time speech-to-speech axis. GPT-Realtime-2 reached general availability in May 2026 with a published API, structured pricing, and integration into the OpenAI Realtime API surface. GPT-Realtime-2 is in production deployment; TML-Interaction-Small is in research preview. The capability comparison is currently a one-sided first-party claim; the customer-deployment-readiness comparison favours OpenAI substantially.
- Gemini Live (Google DeepMind). The principal real-time interaction product from a Frontier lab, with integration across the Google product surface and the Gemini consumer app. Gemini 3.1 Flash Live is the variant used as the comparison setting in the Thinking Machines FD-bench V1.5 numbers. Gemini Live's structural advantage is the device-and-product integration (Android, Chrome, Workspace); TML-Interaction-Small's structural advantage as positioned is the concurrent-streams architecture.
- Qwen 3.5 Omni-plus-realtime (Alibaba Qwen). The principal open-weights peer. Qwen Omni's strategic positioning is around developer access and self-hosting; TML-Interaction-Small's closed-weights positioning produces the opposite trade-off, with the company controlling the inference surface and the developer customisation pathway.
- Sesame CSM (Sesame). A smaller-scale open-weights voice model focused on conversational warmth and naturalness rather than concurrent multimodality. Sesame and TML-Interaction-Small address overlapping use cases (companion voice agents) from very different architectural positions.
The competitive question for TML-Interaction-Small's wider 2026 release is which of these peers will have shipped a comparable interaction-architecture upgrade by then. The realtime category moved quickly through 2025 and 2026, and the benchmark gap reported in the announcement is unlikely to hold through year-end as the peer labs respond.
Outlook
Open questions for the next 6 to 18 months:
- Successor variants in the Interaction Models family. The "Small" naming in TML-Interaction-Small implies a Medium or Large variant on the roadmap. The parameter-count scaling, the corresponding latency-versus-quality trade-offs, and the rollout sequencing are unannounced.
- Wider release distribution channel. Whether the broader 2026 release ships as a standalone API, as an SDK embedded in partner products, as a consumer surface from Thinking Machines directly, or as integration into the Google product ecosystem will shape the model's competitive position.
- External FD-bench reproductions. Independent reproductions of the FD-bench V1.5 results by Artificial Analysis, academic researchers, or peer-lab evaluation teams will indicate whether the reported gap holds against the most current GPT Realtime and Gemini Live releases at the time of independent evaluation.
- Tool-use and agent capability. The asynchronous background reasoning model is the architectural element that enables tool calls and generative UI during conversation. The capability ceiling on what tools the background reasoner can call, and the latency-quality trade-off it introduces, are not fully described in the announcement.
- Open-weights stance. Thinking Machines has not released open weights to date, and the Interaction Models family is positioned as closed-weights research preview. Whether any variant in the family follows the open-weights path taken by peers such as Mistral or DeepSeek is an unresolved strategic question.
- Application-integration pilots. The realtime-interaction category's commercial value materialises at the application-layer integrations (customer-service voice agents, conversational education products, accessibility tools, in-vehicle assistants). The first major application-layer deployment of TML-Interaction-Small will be the first concrete commercial signal for the Interaction Models thesis.
Sources
- Thinking Machines Lab: Interaction Models. Primary announcement of the Interaction Models family and TML-Interaction-Small, including the FD-bench V1.5 results and the architectural details.
- VentureBeat: Thinking Machines shows off preview of near-realtime AI voice and video conversation with new interaction models. Industry coverage of the May 2026 preview.
- Companion profile: Thinking Machines Lab for the broader company context including the founding-team alumni from OpenAI and Anthropic, the Tinker fine-tuning platform, and the strategic partnerships with NVIDIA and Google.
- Companion model: GPT-Realtime-2 for the principal commercial competitor in the real-time speech-to-speech category.