# 🎨 GPT Image 2 — Pro Pack on RunComfy

runcomfy.com · Text-to-image · Edit · docs

OpenAI GPT Image 2 (ChatGPT Images 2.0) hosted on the RunComfy Model API — no OpenAI key, async REST.

## When to pick this model (vs siblings)
GPT Image 2's distinct strength is directive precision: it follows multi-element prompts, layout cues, and embedded-text instructions more reliably than its peers. Pick it when what's on the canvas matters more than how stylized it looks.
| You want | Use |
| --- | --- |
| Embedded text, logos, signage, multilingual typography | GPT Image 2 ✓ |
| Brand-safe, e-commerce / ad / UI mockup imagery | GPT Image 2 ✓ |
| Iterative refinement that holds composition stable | GPT Image 2 ✓ |
| Heavy stylization, painterly look | Flux 2 |
| Hyperrealistic portrait | Nano Banana Pro |
| Cinematic / aesthetic-first hero shots | Seedream 5 |
If the user explicitly asked for GPT Image 2 / ChatGPT Image 2 / Image 2, route here regardless — don't second-guess the model choice.
## Prerequisites

- RunComfy CLI — `npm i -g @runcomfy/cli`
- RunComfy account — `runcomfy login` opens a browser device-code flow.
- CI / containers — set `RUNCOMFY_TOKEN=<token>` instead of `runcomfy login`.

## Endpoints + input schema

Two endpoints, same model.
### openai/gpt-image-2/text-to-image

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| prompt | string | yes | — | The positive prompt |
| size | enum | no | 1024_1024 | `1024_1024` (1:1), `1024_1536` (2:3 portrait), `1536_1024` (3:2 landscape) — only these three |

### openai/gpt-image-2/edit

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| prompt | string | yes | — | Natural-language edit instruction |
| images | string[] | yes | — | Up to 10 reference image URLs (publicly fetchable HTTPS) |
| size | enum | no | auto | `auto` (preserve input ratio), or one of the three fixed sizes above |
`size=auto` on edit preserves the input aspect ratio — strongly recommended unless the edit explicitly changes framing.
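The size rules in the tables above can be expressed as small payload builders. A minimal sketch — the helper names are hypothetical; only the fields and enum values listed in the tables are assumed:

```python
# Hypothetical payload builders mirroring the endpoint schemas above.
FIXED_SIZES = {"1024_1024", "1024_1536", "1536_1024"}

def text_to_image_input(prompt: str, size: str = "1024_1024") -> dict:
    """Build the --input JSON for openai/gpt-image-2/text-to-image."""
    if size not in FIXED_SIZES:
        raise ValueError(f"size must be one of {sorted(FIXED_SIZES)}")
    return {"prompt": prompt, "size": size}

def edit_input(prompt: str, images: list[str], size: str = "auto") -> dict:
    """Build the --input JSON for openai/gpt-image-2/edit."""
    if not 1 <= len(images) <= 10:
        raise ValueError("images must contain 1-10 URLs")
    if size != "auto" and size not in FIXED_SIZES:
        raise ValueError("size must be 'auto' or one of the three fixed sizes")
    return {"prompt": prompt, "images": images, "size": size}
```

Validating the enum client-side fails fast, before a job is submitted and billed.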
## How to invoke
Text-to-image:
```sh
runcomfy run openai/gpt-image-2/text-to-image \
  --input '{"prompt": "", "size": "1024_1536"}' \
  --output-dir
```
Edit (single ref):
```sh
runcomfy run openai/gpt-image-2/edit \
  --input '{ "prompt": "", "images": ["https://..."] }' \
  --output-dir
```
Edit (multi-ref, up to 10):
```sh
runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "compose subject from image 1 into the room from image 2; match the lighting of image 2",
    "images": ["https://...subject.jpg", "https://...room.jpg"]
  }' \
  --output-dir
```
The CLI submits, polls every 2s until terminal, then downloads any .runcomfy.net / .runcomfy.com URL from the result into --output-dir. Stdout is the result JSON. Stderr is progress.
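Because stdout carries only the result JSON, the CLI is easy to drive from a script. A minimal sketch assuming only the `runcomfy run` flags shown above; `build_run_args` and `run_model` are hypothetical names:

```python
import json
import subprocess

def build_run_args(endpoint: str, payload: dict, output_dir: str) -> list[str]:
    """Assemble the argv for `runcomfy run`, per the invocation pattern above."""
    return [
        "runcomfy", "run", endpoint,
        "--input", json.dumps(payload),
        "--output-dir", output_dir,
    ]

def run_model(endpoint: str, payload: dict, output_dir: str = "out") -> dict:
    """Submit and block until terminal. Stdout (result JSON) is captured;
    stderr (progress) stays attached to the terminal."""
    proc = subprocess.run(
        build_run_args(endpoint, payload, output_dir),
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```

Leaving stderr unpiped means the 2s polling progress still shows up live while the script waits.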
For pipe-friendly usage:
```sh
runcomfy --output json run openai/gpt-image-2/text-to-image \
  --input '{"prompt":"..."}' --no-wait | jq -r .request_id
```
## Prompting — what actually works

These are model-specific patterns that empirically improve output quality. Apply to text-to-image and edit alike.
Be explicit on subject + setting + mood. "A close-up of a matte ceramic water bottle on warm linen, soft window light, neutral background" — three concrete directives — beats "nice product photo of a bottle".
Quote embedded text exactly. Keep it short. GPT Image 2 is the strongest text-rendering model in this class, but only when you put the literal characters in quotes. Long blocks of text degrade. For multilingual text, name the script: "Japanese kana", "Cyrillic", "Arabic right-to-left".
Use compositional cues directly. "rule of thirds", "close-up", "aerial view", "centered subject", "shallow depth of field" — these have learned meaning to the model.
Iterate one attribute at a time. When refining, change one thing per iteration (lighting OR background OR pose OR text) and keep the rest of the prompt verbatim. The model holds composition stable across iterations when only one knob moves.
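One way to enforce the one-knob rule is to treat the prompt as structured attributes and render it, so everything except the refined attribute stays byte-for-byte identical. A sketch under that assumption; the attribute names and helpers are illustrative, not anything the model requires:

```python
# Prompt as structured attributes: refine() changes exactly one knob,
# render() emits the prompt, so unchanged attributes stay verbatim.
BASE = {
    "subject": "matte ceramic water bottle on warm linen",
    "lighting": "soft window light",
    "background": "neutral background",
}

def render(attrs: dict) -> str:
    return ", ".join(attrs[k] for k in ("subject", "lighting", "background"))

def refine(attrs: dict, knob: str, value: str) -> dict:
    """Return a copy with exactly one attribute changed."""
    if knob not in attrs:
        raise KeyError(knob)
    return {**attrs, knob: value}

v2 = refine(BASE, "lighting", "golden-hour rim light")
```

Keeping each iteration as a dict also gives you a cheap history of which knob moved at each step.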
Don't conflict instructions. "no text" + "the word 'AQUA+' on the label" is incoherent — the model will pick one and you don't control which.
Don't pile up styles. "ukiyo-e + watercolor + 8K + cinematic + minimalist" cancels out. Pick one or two style anchors max.
For the edit endpoint specifically:
State preservation goals. "keep the person's pose and face identity unchanged", "keep the brand mark and typography on the package", "keep the overall framing". The model needs to know what NOT to change.

Use directional language for spatial edits. "Move the headline from top-right to bottom-center", not "reposition the headline".

Multi-ref: number the images in the prompt — "subject from image 1, lighting and background from image 2" — and the model will route the cues correctly.

## Where it shines

| Use case | Why GPT Image 2 |
| --- | --- |
| E-commerce product photography | Reliable text on labels, brand-safe lighting, consistent across SKUs |
| High-conversion ads | Headline + visual integration in one pass |
| Brand asset localization | One source asset → many language variants of the same headline |
| Signage, posters, packaging mock-ups | Text rendering accuracy at multiple scales |
| UI moc