OpenAI · Premium

GPT Image 1.5

Name: GPT Image 1.5
Brand: OpenAI

OpenAI's flagship natively multimodal image model with industry-leading instruction following, precise region-aware editing, and best-in-class text rendering — now up to 4x faster than its predecessor.

From 10 credits

1536x1024 / 1024x1536

10–30 seconds


What GPT Image 1.5 Can Do

Autoregressive Multimodal Architecture

Built on a unified transformer backbone that processes text and image tokens natively — not a diffusion model — enabling superior reasoning and instruction following.

Region-Aware Editing

Modify specific parts of an image while preserving faces, logos, lighting, and composition exactly as they are. Accepts up to 16 input images per request.

Advanced Text Rendering

Generates legible, correctly styled text at small point sizes with multi-line support up to 800 characters — ideal for posters, banners, and branded graphics.

About GPT Image 1.5

GPT Image 1.5, released in December 2025, is OpenAI's flagship image generation model and the successor to GPT Image 1. Unlike diffusion-based models such as DALL-E 3 or Stable Diffusion, GPT Image 1 and 1.5 use a natively multimodal autoregressive architecture: the same transformer backbone processes text and image tokens together. Because the prompt is interpreted by that same model rather than passed as conditioning to a separate diffusion process, instruction adherence, spatial composition, and layout control improve markedly. Version 1.5 generates images up to 4x faster than GPT Image 1, costs approximately 20% less per API call, and introduces region-aware editing that alters one element while leaving the rest of the image unchanged.

What Sets It Apart

Instruction following is where GPT Image 1.5 truly shines. The model can handle intricate, multi-step prompts — such as "create a 6×6 grid of specific icons and symbols" — and follow them accurately, a task where most competing models fail. Text rendering has been substantially improved over both GPT Image 1 and earlier generation models: the model supports dense, small-point-size text with correct font weight and style, making it suitable for newspaper layouts, poster typography, and UI screenshots. Facial and logo consistency across iterative edits is another standout: when you modify one element of an image, the model preserves lighting, composition, and likeness in the untouched areas — addressing the common "slot machine" problem where older models would regenerate everything with every edit.

GPT Image 1 vs. GPT Image 1.5

Feature	GPT Image 1	GPT Image 1.5
Architecture	Autoregressive multimodal	Autoregressive multimodal
Generation speed	~30–60 seconds	10–30 seconds (up to 4x faster)
API pricing	Baseline	~20% cheaper
Text rendering	Strong	Improved — denser, smaller text
Editing precision	Good	Region-aware, element-specific
Max input images	16	16
Output resolutions	1024x1024, 1024x1536, 1536x1024	1024x1024, 1024x1536, 1536x1024
Quality tiers	Low / Medium / High	Low / Medium / High
Transparent backgrounds	Yes (PNG)	Yes (PNG)
C2PA provenance metadata	Yes	Yes

Both variants available here — GPT Image 1.5 (text-to-image) and GPT Image 1.5 I2I (image-to-image) — are powered by the 1.5 model. Use text-to-image for new creations and I2I for editing or style-transferring an existing image.
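Choosing between the two variants comes down to whether a source image is supplied. A minimal sketch of that routing follows. The variant identifiers are placeholder names chosen for illustration, not confirmed API values, and the 16-image cap is the limit stated on this page.

```python
MAX_INPUT_IMAGES = 16  # input-image limit stated on this page

# Hypothetical identifiers for illustration; the platform's real names may differ.
TEXT_TO_IMAGE = "gpt-image-1.5"
IMAGE_TO_IMAGE = "gpt-image-1.5-i2i"


def pick_variant(input_images: list[str]) -> str:
    """Return the variant to use for a request with the given input images."""
    if len(input_images) > MAX_INPUT_IMAGES:
        raise ValueError(
            f"at most {MAX_INPUT_IMAGES} input images are accepted, "
            f"got {len(input_images)}"
        )
    # No source image means a fresh generation; otherwise edit or restyle.
    return IMAGE_TO_IMAGE if input_images else TEXT_TO_IMAGE
```

Validating the image count before sending a request avoids spending a round trip on a call that would be rejected anyway.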

Tips for Best Results

Be a specification writer, not a poet. Detailed, structured prompts outperform vague creative descriptions. Include lighting direction, color palette, compositional rules, and style references explicitly.

For text in images, spell out every word, specify font style (e.g., "bold serif"), size (e.g., "large headline"), and location (e.g., "centered at the top"). The model can render up to ~800 characters of legible text.

For editing, use the I2I variant and describe precisely which elements to change and which to preserve (e.g., "change the background to a sunset scene, keep the person's face and clothing identical"). The model accepts up to 16 reference images per request.

Choose the quality tier wisely: Low quality at 1024x1024 costs around $0.011 per image and is suitable for rapid iteration; High quality at 1024x1536 costs up to $0.25 and is intended for final production assets.
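The "specification writer" advice can be made concrete with a small prompt builder. The sketch below is one possible way to structure a prompt, not a format the model requires: the `ImageSpec` class, its field names, and the output layout are all hypothetical, and the 800-character check uses the approximate limit quoted above.

```python
from dataclasses import dataclass, field

MAX_TEXT_CHARS = 800  # approximate legible-text limit quoted on this page


@dataclass
class ImageSpec:
    """Hypothetical container for the prompt details the tips recommend."""
    subject: str
    lighting: str = ""
    palette: str = ""
    composition: str = ""
    style: str = ""
    text: str = ""           # exact words to render, if any
    text_style: str = ""     # e.g. "bold serif, large headline"
    text_location: str = ""  # e.g. "centered at the top"
    preserve: list[str] = field(default_factory=list)  # for edits only

    def to_prompt(self) -> str:
        if len(self.text) > MAX_TEXT_CHARS:
            raise ValueError(
                f"text is {len(self.text)} characters; keep it under {MAX_TEXT_CHARS}"
            )
        parts = [f"Subject: {self.subject}."]
        labelled = [
            ("Lighting", self.lighting),
            ("Color palette", self.palette),
            ("Composition", self.composition),
            ("Style", self.style),
        ]
        # Only emit the sections that were actually filled in.
        parts += [f"{label}: {value}." for label, value in labelled if value]
        if self.text:
            details = ", ".join(p for p in (self.text_style, self.text_location) if p)
            suffix = f" ({details})" if details else ""
            parts.append(f'Text to render exactly: "{self.text}"{suffix}.')
        if self.preserve:
            parts.append("Keep unchanged: " + ", ".join(self.preserve) + ".")
        return " ".join(parts)
```

For an edit, filling `preserve` with items such as "the person's face" and "clothing" states explicitly what must stay identical, which mirrors the editing tip above.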

Technical Specifications

Max resolution: 1536x1024 / 1024x1536

Aspect ratios: 1:1 (1024x1024), 3:2 (1536x1024), 2:3 (1024x1536)

Generation speed: 10–30 seconds

Output formats: PNG / JPEG (transparent backgrounds supported for PNG)
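Since only three output sizes are available, a target layout has to be mapped to the nearest one. The helper below is an illustrative sketch: `closest_size` is a hypothetical name, and comparing aspect ratios on a log scale is a design choice made here so that portrait and landscape targets are treated symmetrically.

```python
import math

# The three output sizes listed in the specifications above.
SUPPORTED_SIZES = [
    (1024, 1024),  # 1:1
    (1536, 1024),  # 3:2
    (1024, 1536),  # 2:3
]


def closest_size(target_width: int, target_height: int) -> str:
    """Return the supported size whose aspect ratio is nearest the target's."""
    if target_width <= 0 or target_height <= 0:
        raise ValueError("target dimensions must be positive")
    target = math.log(target_width / target_height)
    # Log-ratio distance treats 2:1 and 1:2 as equally far from square.
    width, height = min(
        SUPPORTED_SIZES,
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - target),
    )
    return f"{width}x{height}"
```

A 16:9 banner target therefore maps to 1536x1024 and would need cropping or padding afterwards to reach its final proportions.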

Model Variants

GPT Image 1.5: text to image

GPT Image 1.5 I2I: image to image

Credit Pricing

Variant	Credits
GPT Image 1.5	10
GPT Image 1.5 I2I	10

1 credit = $0.012
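The per-image cost follows directly from the listed figures: 10 credits per generation at $0.012 per credit. A short sketch of that arithmetic, using `Decimal` to avoid floating-point rounding in currency values (the function names are illustrative only):

```python
from decimal import Decimal

USD_PER_CREDIT = Decimal("0.012")  # rate listed on this page
CREDITS_PER_IMAGE = 10             # same for both variants


def cost_usd(images: int) -> Decimal:
    """Dollar cost of generating the given number of images."""
    if images < 0:
        raise ValueError("images must be non-negative")
    return USD_PER_CREDIT * CREDITS_PER_IMAGE * images


def images_for_budget(budget_usd: str) -> int:
    """Whole number of images a dollar budget covers."""
    return int(Decimal(budget_usd) // cost_usd(1))
```

Under these figures a single image costs $0.12, and a $5 budget covers 41 images.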

Use Cases

Brand & Marketing Assets

Generate consistent on-brand graphics, ad creatives, and product visuals at scale with accurate logo and color preservation across edits.

E-commerce Catalog Generation

Produce product image variants — different scenes, angles, and backgrounds — from a single source image without reshoots.

Graphic Design with Text

Create posters, banners, UI mockups, and infographics where readable, correctly styled text is embedded directly in the image.

Similar Models

Popular

image

Black Forest Labs

Flux 2

Black Forest Labs' production-grade image generation model family delivering 4MP photorealistic output, multi-reference consistency across up to 10 images, and reliable text rendering — all in sub-10-second generation speeds.

text-to-image, image-to-image, photorealistic

From 3 credits

Fast

image

Google

Nano Banana

Google's Gemini Flash-powered image generation and editing model that went viral for its speed, real-world knowledge, and AI-assisted editing capabilities.

text-to-image, image-to-image, fast

From 2 credits

image

Google

Imagen 4

Google DeepMind's leading text-to-image model delivering up to 2K resolution, superior text rendering, and diverse art styles — engineered for professional creative work.

text-to-image, high-quality

From 2 credits
