DALL-E 3

DALL-E 3 is OpenAI's third-generation text-to-image model, released in October 2023 and notable for its strong prompt fidelity, improved text rendering inside images, and native integration with ChatGPT.

DALL-E 3 is OpenAI's third-generation text-to-image generation model. It was released in October 2023 through ChatGPT Plus, ChatGPT Enterprise, and Microsoft's Bing Image Creator, and was added to the OpenAI API in November 2023. The model generates high-resolution images from natural-language prompts. It focuses in particular on faithfully interpreting complex or nuanced instructions and on rendering legible text within generated images. As of April 2026, DALL-E 3 remains the image-generation model with the largest consumer install base because of its ChatGPT distribution. It trails newer image-generation systems from Black Forest Labs, Google DeepMind, and Midjourney on most quality dimensions.

At a glance

  • Lab: OpenAI
  • Released: October 2023 (ChatGPT Plus and Enterprise, Bing Image Creator); OpenAI API November 2023
  • Modality: Image (text-to-image generation)
  • Open weights: No (closed)
  • Output resolution: Up to 1024x1024 (square), 1792x1024 (landscape), and 1024x1792 (portrait) via the API; HD quality option available for sharper detail
  • Pricing: OpenAI API per-image pricing at https://openai.com/api/pricing/ (standard and HD tiers); bundled within ChatGPT Plus ($20/month), Pro ($200/month), Business, and Enterprise tiers; free tier via Bing Image Creator (limited daily credits)
  • Distribution channels: ChatGPT (web and mobile), Microsoft Bing Image Creator (https://www.bing.com/images/create), Microsoft Copilot, OpenAI API (https://platform.openai.com)

Origins

The DALL-E name dates to January 2021, when OpenAI released the original DALL-E, a 12-billion-parameter transformer trained jointly on image-text pairs. The name is a portmanteau of Salvador Dalí and the Pixar character WALL-E. DALL-E demonstrated that a large model could generate plausible images from text descriptions, including novel combinations of objects not present in any training image.

DALL-E 2 followed in April 2022. It replaced the joint image-text transformer with a diffusion-model architecture conditioned on CLIP image-text embeddings, which produced sharper images at higher resolutions. DALL-E 2 shipped as a standalone web product: users typed prompts into a dedicated interface and refined them by hand, so results depended heavily on prompt-engineering skill.

DALL-E 3 changed the distribution model. OpenAI partnered with Microsoft and integrated the new model into ChatGPT and Bing Image Creator from the start, making ChatGPT the primary user interface for image generation. The consequence was structural. Users no longer had to learn a separate prompt-engineering workflow: they could describe what they wanted in ordinary conversational language, and the chat model would translate the description into a detailed image prompt before passing it to DALL-E 3. This removed prompt engineering as a barrier and opened image generation to non-specialist audiences.

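The technical rationale for the ChatGPT integration comes from OpenAI's accompanying research paper, "Improving Image Generation with Better Captions" (Betker et al., 2023; Aditya Ramesh is among the co-authors). The team trained DALL-E 3 on long, highly descriptive captions generated by an image-captioning model, instead of the short, often inaccurate web-scraped captions used previously. The result was a model substantially better at following detailed or multi-part text descriptions. Because the model learned from long captions, it performs best when prompts resemble them. That makes a language model that expands a user's brief request into a detailed caption a natural front end.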
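The two-step flow described above can be sketched as a small pipeline. First, a chat model expands a conversational request into a descriptive caption. Second, the image model renders that caption. This is a hedged illustration of the architecture, not OpenAI's implementation. The function names and the rewrite instruction are hypothetical, and the two model calls are passed in as functions so the sketch runs offline.

```python
from typing import Callable, Tuple

# Hypothetical rewrite instruction for illustration only; this is NOT
# OpenAI's production prompt.
REWRITE_INSTRUCTION = (
    "Rewrite the user's request as one detailed caption that names every "
    "object, its attributes and position, the visual style, and any text "
    "that must appear in the image."
)


def two_stage_generate(
    user_request: str,
    rewrite: Callable[[str, str], str],
    render: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Stage 1: a chat model expands a short request into a long caption.
    Stage 2: the image model renders that caption.
    Returns the caption actually sent to the image model and the image bytes."""
    caption = rewrite(REWRITE_INSTRUCTION, user_request)
    return caption, render(caption)


# Hypothetical stand-ins for the two models so the pipeline runs offline.
def fake_rewrite(instruction: str, request: str) -> str:
    return f"A detailed flat illustration of {request}, centered, soft lighting"


def fake_render(caption: str) -> bytes:
    return caption.encode("utf-8")  # placeholder for real image bytes


caption, image = two_stage_generate("a fox reading a newspaper", fake_rewrite, fake_render)
print(caption)
```

Returning the caption alongside the image mirrors the OpenAI API, which returns the rewritten prompt with each DALL-E 3 image.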

OpenAI worked with Microsoft on parallel distribution through Bing Image Creator, which launched in March 2023 on an earlier DALL-E model and was upgraded to DALL-E 3 in October 2023. Microsoft Copilot received DALL-E 3 integration as well.

Capabilities

DALL-E 3's most widely noted capability at launch was prompt fidelity: its ability to produce images that closely match complex, specific, or multi-element text prompts. Competing systems at the time, including Midjourney v5 and Stable Diffusion, often produced more aesthetically striking outputs from short stylistic prompts. However, they could struggle to represent every element of a detailed description. DALL-E 3's improved caption training led to consistently better adherence to longer and more precise prompts.

Text rendering inside images was a meaningful differentiator at launch. Prior image-generation models frequently produced illegible or garbled text within generated scenes, such as signs, labels, or titles. DALL-E 3 improved text rendering substantially, making it practical to generate images containing readable words and short phrases, though longer text remained less dependable. This capability had direct practical applications for marketing materials, social content, and slide graphics.

Stylistic range is broad. DALL-E 3 handles photorealistic rendering, flat illustration, painterly styles, cartoon aesthetics, and abstract compositions. The model does not produce a single recognizable house style of the kind Midjourney v5 and v6 were known for; outputs tend to look more neutral or versatile depending on prompt phrasing.

After launch, OpenAI added image editing to DALL-E 3 in ChatGPT. Inpainting lets users select a region of an existing image for targeted modification while preserving the rest. It became available in the ChatGPT interface: users describe the change they want, and the model applies it to the selected area. Outpainting extends an image beyond its original borders with new content consistent with the original, and it has also been offered in some form within ChatGPT. These editing features are specific to ChatGPT. In the OpenAI API, the image edit and variation endpoints support DALL-E 2 but not DALL-E 3.

DALL-E 3 incorporates content safeguards that decline requests for several kinds of image: photorealistic depictions of real named individuals, graphic violence, adult content, images usable for political misinformation, and images in the style of living artists. These restrictions are more conservative than those of some open-weights alternatives.

Benchmarks and standing

Image-generation benchmarking is substantially less standardized than text-model benchmarking. Evaluations typically take three forms:
  • human-preference side-by-side comparisons;
  • FID (Fréchet Inception Distance) scores, which measure how closely the statistics of generated images match those of a reference set of real images;
  • specialized capability tests such as prompt-fidelity scoring and text-within-image legibility.
There is no single widely adopted composite leaderboard equivalent to the Artificial Analysis Intelligence Index for text models.
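FID compares feature statistics rather than individual images. In the standard definition, each image is passed through a pretrained Inception network. Here μ and Σ denote the mean vector and covariance matrix of the resulting feature vectors, with subscript r for the real reference images and subscript g for the generated images. Lower scores indicate closer distributions.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Because the score depends on the choice of reference set and on the number of images sampled, FID values are comparable only within the same evaluation setup.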

The Hugging Face Text-to-Image Leaderboard (https://huggingface.co/spaces/google/image-gen-eval) evaluates models on prompt adherence and image quality across standardized prompt sets. As of 2024 and into 2025, newer models from Black Forest Labs (FLUX.1 Pro and FLUX.2) and Midjourney (V6 and V7) consistently ranked above DALL-E 3 on composite quality scores. DALL-E 3 remained competitive on prompt-fidelity tasks for complex multi-element prompts, where its chat-driven prompt translation gave it a practical advantage in real-world consumer settings.

LMArena operates an image arena alongside its text leaderboards. Based on data available before April 2026, DALL-E 3 ranked behind FLUX.1 Pro and Midjourney V7 on aesthetic quality comparisons, though ahead of Stable Diffusion 3.5 on several task categories. Coverage in the image arena is sparser than in the text arena.
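Arena-style leaderboards turn many pairwise human votes into a single ranking. LMArena has described its ranking method as a Bradley-Terry model fitted to the votes. The sketch below is a minimal Bradley-Terry fit using the standard minorization-maximization (MM) update. It is not LMArena's implementation, which among other things reports confidence intervals and an Elo-like scale. The vote data and function name are hypothetical.

```python
from collections import defaultdict


def bradley_terry(votes, iterations=200):
    """Fit Bradley-Terry strengths p_i, where P(i beats j) = p_i / (p_i + p_j).

    votes: iterable of (winner, loser) pairs from side-by-side comparisons.
    Uses the MM update p_i <- W_i / sum_j n_ij / (p_i + p_j).
    Assumes every model has at least one win; otherwise its strength
    collapses to zero.
    """
    votes = list(votes)
    models = sorted({m for pair in votes for m in pair})
    wins = defaultdict(int)   # total wins per model (W_i)
    games = defaultdict(int)  # comparisons per unordered pair (n_ij)
    for winner, loser in votes:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            denom = sum(
                games[frozenset((i, j))] / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            updated[i] = wins[i] / denom
        # Strengths are identified only up to a constant factor, so rescale
        # them to average 1.0.
        scale = len(models) / sum(updated.values())
        strength = {m: s * scale for m, s in updated.items()}
    return strength


# Hypothetical votes: (winner, loser) from blind side-by-side comparisons.
votes = (
    [("model_a", "model_b")] * 7 + [("model_b", "model_a")] * 3
    + [("model_b", "model_c")] * 6 + [("model_c", "model_b")] * 4
    + [("model_a", "model_c")] * 8 + [("model_c", "model_a")] * 2
)
ratings = bradley_terry(votes)
for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 3))
```

With only two models, the fitted strength ratio equals the ratio of their win counts. With more models, each model's strength reflects who it beat as well as how often.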

On text rendering within images, DALL-E 3 retains a competitive position as a result of its training focus on this capability, though FLUX.2 and Ideogram have narrowed the gap since 2023. Benchmark leadership in image generation is more provisional than in text modeling: methodologies are not standardized and results vary with prompt selection and evaluator population.

Access and pricing

DALL-E 3 is available through several distribution channels.

ChatGPT is the primary consumer surface. Free-tier ChatGPT users have access to image generation with usage limits. ChatGPT Plus subscribers ($20/month) receive more image-generation capacity. Pro subscribers ($200/month) receive higher generation limits and, in many cases, priority access. Business and Enterprise tiers include image generation as part of the full ChatGPT product surface, with administrative controls and compliance features for organizational use. Within ChatGPT, image generation is conversational: users describe what they want, the model proposes a prompt interpretation, and images are returned in the chat thread.

Microsoft Bing Image Creator (https://www.bing.com/images/create) provides free access to DALL-E 3 with daily credit limits that reset. Bing Image Creator requires a Microsoft account but is otherwise free to use, making it the widest-access distribution path for DALL-E 3 outside of the ChatGPT free tier. Microsoft Copilot (https://copilot.microsoft.com) also incorporates DALL-E 3 image generation alongside text assistance, and DALL-E 3 is available through Microsoft 365 Copilot for enterprise customers.

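The OpenAI API at https://platform.openai.com provides programmatic access to DALL-E 3 through the image generations endpoint (model identifier "dall-e-3"). Per-image pricing is published at https://openai.com/api/pricing/ and varies by resolution and quality setting (standard versus HD). The standard 1024x1024 tier is the cheapest; HD quality and the larger landscape and portrait formats cost more. For DALL-E 3, the API returns one image per request. Like ChatGPT, it automatically rewrites the submitted prompt and returns the rewritten text in a revised_prompt field. Developers can build their own prompt construction into their applications and add their own content filtering on top of OpenAI's built-in safeguards.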
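The following is a minimal sketch using only the Python standard library. It assumes the request shape in OpenAI's API reference at the time of writing: a JSON body with model, prompt, n, size, quality and style fields, plus bearer-token authentication. The code builds, but does not send, a DALL-E 3 generation request and checks parameters against the options described above. The function and constant names are hypothetical, and OpenAI may change accepted values, so check the API reference before relying on them. The official openai Python package wraps the same endpoint as client.images.generate(...).

```python
import json
import urllib.request

# Image generations endpoint of the OpenAI REST API.
API_URL = "https://api.openai.com/v1/images/generations"

# Values accepted for model "dall-e-3" per OpenAI's API reference (may change).
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
DALLE3_QUALITIES = {"standard", "hd"}
DALLE3_STYLES = {"vivid", "natural"}
MAX_PROMPT_CHARS = 4000  # documented prompt-length limit for dall-e-3


def build_dalle3_request(prompt, api_key, size="1024x1024",
                         quality="standard", style="vivid"):
    """Return an unsent POST request for a single DALL-E 3 image."""
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt must be 1-4000 characters")
    if size not in DALLE3_SIZES:
        raise ValueError(f"unsupported size for dall-e-3: {size}")
    if quality not in DALLE3_QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    if style not in DALLE3_STYLES:
        raise ValueError(f"unsupported style: {style}")
    body = {
        "model": "dall-e-3",
        "prompt": prompt,
        "n": 1,  # dall-e-3 accepts only one image per request
        "size": size,
        "quality": quality,
        "style": style,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_dalle3_request(
    "A lighthouse at dusk with a hand-painted sign reading 'OPEN'",
    api_key="YOUR_API_KEY",  # placeholder, not a real key
    size="1792x1024",
    quality="hd",
)
# Sending requires a real key and network access:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["data"][0]["url"], result["data"][0]["revised_prompt"])
```

Validating parameters locally catches unsupported DALL-E 3 options, such as a size that only DALL-E 2 accepts, before a request is spent.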

Comparison

The peer set for DALL-E 3 in April 2026 is the leading text-to-image generation systems:

  • Midjourney V7. The dominant prosumer image-generation system as of 2025, with a distinct aesthetic quality and a subscriber base that generates approximately $500 million in annual revenue. Midjourney V7 generally leads on raw aesthetic quality in human-preference evaluations, particularly for stylized and artistic outputs. DALL-E 3's advantage is ChatGPT's conversational interface and the breadth of distribution through Bing Image Creator and Copilot. Midjourney distributes through Discord and its own web product.
  • FLUX.2 (Black Forest Labs). A leading image-generation system from Black Forest Labs, a company founded by researchers who developed Stable Diffusion. The company publishes open-weights variants of its FLUX models alongside API-only versions. FLUX.2 is available through the Black Forest Labs API and through partner integrations including Adobe Firefly. On image quality evaluations, FLUX.2 currently ranks among the top systems on most benchmarks. DALL-E 3 retains an advantage in accessibility via ChatGPT; FLUX.2 requires API access or a partner integration.
  • Imagen 4 (Google DeepMind). Google's frontier image-generation model, available through Google products including Gemini and Google Workspace. Imagen 4 competes on photorealism and prompt fidelity. As of early 2026, Google DeepMind continues to advance the Imagen line, which does not yet have a separate profile here.
  • Stable Diffusion 3.5 (Stability AI). Stability AI's current open-weights flagship. Available on Hugging Face and through the Stability AI API. SD3.5 is competitive on prompt fidelity for simple to mid-complexity prompts and remains widely used in the open-source ecosystem. It generally trails FLUX.2 and Midjourney V7 on composite quality scores.

DALL-E 3's distinctive position across this peer set is its distribution moat: no other image-generation model has the reach of ChatGPT's user base. The model continues to produce a large volume of images daily even as quality leadership has shifted to other systems. For users whose primary interface is ChatGPT, DALL-E 3 is the default image-generation system regardless of how it ranks against models they would need to access separately.

Outlook

Several open questions shape DALL-E 3's trajectory in 2026 and beyond:

  • DALL-E 4 timing and form. OpenAI has not publicly committed to a DALL-E 4 release. It is unresolved whether OpenAI will pursue a standalone image-generation upgrade or a unified multimodal generation capability that subsumes image generation. Sora, OpenAI's video generation model, can also generate still images, and the boundary between DALL-E and Sora's image-generation capabilities may blur in subsequent generations.
  • Integration with Sora. Sora (OpenAI's video model) and DALL-E 3 coexist within ChatGPT, but their relationship at the architectural level is not publicly described. Whether OpenAI consolidates image and video generation into a single system or maintains separate specialized models will affect the DALL-E product line's future.
  • Closing the quality gap. As of April 2026, Midjourney V7 and FLUX.2 lead DALL-E 3 on most aesthetic-quality dimensions. Whether OpenAI prioritizes closing this gap with a new image-generation release or focuses image-generation improvement through multimodal model updates (such as those integrated into GPT-5.5's vision stack) is unclear.
  • Regulatory and provenance pressures. Synthetic image generation faces increasing regulatory attention around watermarking, provenance disclosure, and training data rights. OpenAI has implemented the C2PA (Coalition for Content Provenance and Authenticity) standard in DALL-E 3 outputs distributed through certain channels, and further regulatory requirements in the EU AI Act and comparable frameworks may impose additional provenance obligations on all commercial image-generation systems.
  • Bing Image Creator as a standalone channel. Microsoft's free DALL-E 3 integration via Bing Image Creator is a significant distribution agreement. The terms and duration of this arrangement, and whether Microsoft moves to internally developed image-generation capability under its own model programs, will affect DALL-E 3's reach.

About the author
Nextomoro

AI Research Lab Intelligence

nextomoro tracks progress for AI research labs, models, and what's next.