# 🎨 GPT Image 2 — Pro Pack on RunComfy

runcomfy.com · Text-to-image · Edit · docs

OpenAI GPT Image 2 (ChatGPT Images 2.0) hosted on the RunComfy Model API — no OpenAI key, async REST.

## When to pick this model (vs siblings)
GPT Image 2's distinct strength is directive precision: it follows multi-element prompts, layout cues, and embedded-text instructions more reliably than its peers. Pick it when what's on the canvas matters more than how stylized it looks.
| You want | Use |
| --- | --- |
| Embedded text, logos, signage, multilingual typography | GPT Image 2 ✓ |
| Brand-safe, e-commerce / ad / UI mockup imagery | GPT Image 2 ✓ |
| Iterative refinement that holds composition stable | GPT Image 2 ✓ |
| Heavy stylization, painterly look | Flux 2 |
| Hyperrealistic portrait | Nano Banana Pro |
| Cinematic / aesthetic-first hero shots | Seedream 5 |
If the user explicitly asked for GPT Image 2 / ChatGPT Image 2 / Image 2, route here regardless — don't second-guess the model choice.
## Prerequisites

- RunComfy CLI — `npm i -g @runcomfy/cli`
- RunComfy account — `runcomfy login` opens a browser device-code flow.
- CI / containers — set `RUNCOMFY_TOKEN=<token>` instead of `runcomfy login`.

## Endpoints + input schema

Two endpoints, same model.
### openai/gpt-image-2/text-to-image

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| prompt | string | yes | — | The positive prompt |
| size | enum | no | 1024_1024 | `1024_1024` (1:1), `1024_1536` (2:3 portrait), `1536_1024` (3:2 landscape) — only these three |

### openai/gpt-image-2/edit

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| prompt | string | yes | — | Natural-language edit instruction |
| images | string[] | yes | — | Up to 10 reference image URLs (publicly fetchable HTTPS) |
| size | enum | no | auto | `auto` (preserve input ratio), or one of the three fixed sizes above |
`size=auto` on edit preserves the input aspect ratio — strongly recommended unless the edit explicitly changes framing.
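The size rules in the tables above can be expressed as small payload builders. A minimal sketch — the helper names are hypothetical; only the fields and enum values listed in the tables are assumed:

```python
# Hypothetical payload builders mirroring the endpoint schemas above.
FIXED_SIZES = {"1024_1024", "1024_1536", "1536_1024"}

def text_to_image_input(prompt: str, size: str = "1024_1024") -> dict:
    """Build the --input JSON for openai/gpt-image-2/text-to-image."""
    if size not in FIXED_SIZES:
        raise ValueError(f"size must be one of {sorted(FIXED_SIZES)}")
    return {"prompt": prompt, "size": size}

def edit_input(prompt: str, images: list[str], size: str = "auto") -> dict:
    """Build the --input JSON for openai/gpt-image-2/edit."""
    if not 1 <= len(images) <= 10:
        raise ValueError("images must contain 1-10 URLs")
    if size != "auto" and size not in FIXED_SIZES:
        raise ValueError("size must be 'auto' or one of the three fixed sizes")
    return {"prompt": prompt, "images": images, "size": size}
```

Validating the enum client-side fails fast, before a job is submitted and billed.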
## How to invoke
Text-to-image:
```sh
runcomfy run openai/gpt-image-2/text-to-image \
  --input '{"prompt": "", "size": "1024_1536"}' \
  --output-dir
```
Edit (single ref):
```sh
runcomfy run openai/gpt-image-2/edit \
  --input '{ "prompt": "", "images": ["https://..."] }' \
  --output-dir
```
Edit (multi-ref, up to 10):
```sh
runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "compose subject from image 1 into the room from image 2; match the lighting of image 2",
    "images": ["https://...subject.jpg", "https://...room.jpg"]
  }' \
  --output-dir
```
The CLI submits, polls every 2s until terminal, then downloads any .runcomfy.net / .runcomfy.com URL from the result into --output-dir. Stdout is the result JSON. Stderr is progress.
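Because stdout carries only the result JSON, the CLI is easy to drive from a script. A minimal sketch assuming only the `runcomfy run` flags shown above; `build_run_args` and `run_model` are hypothetical names:

```python
import json
import subprocess

def build_run_args(endpoint: str, payload: dict, output_dir: str) -> list[str]:
    """Assemble the argv for `runcomfy run`, per the invocation pattern above."""
    return [
        "runcomfy", "run", endpoint,
        "--input", json.dumps(payload),
        "--output-dir", output_dir,
    ]

def run_model(endpoint: str, payload: dict, output_dir: str = "out") -> dict:
    """Submit and block until terminal. Stdout (result JSON) is captured;
    stderr (progress) stays attached to the terminal."""
    proc = subprocess.run(
        build_run_args(endpoint, payload, output_dir),
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```

Leaving stderr unpiped means the 2s polling progress still shows up live while the script waits.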
For pipe-friendly usage:
```sh
runcomfy --output json run openai/gpt-image-2/text-to-image \
  --input '{"prompt":"..."}' --no-wait | jq -r .request_id
```
## Prompting — what actually works

These are model-specific patterns that empirically improve output quality. Apply to text-to-image and edit alike.
Be explicit on subject + setting + mood. "A close-up of a matte ceramic water bottle on warm linen, soft window light, neutral background" — three concrete directives — beats "nice product photo of a bottle".
Quote embedded text exactly. Keep it short. GPT Image 2 is the strongest text-rendering model in this class, but only when you put the literal characters in quotes. Long blocks of text degrade. For multilingual text, name the script: "Japanese kana", "Cyrillic", "Arabic right-to-left".
Use compositional cues directly. "rule of thirds", "close-up", "aerial view", "centered subject", "shallow depth of field" — these have learned meaning to the model.
Iterate one attribute at a time. When refining, change one thing per iteration (lighting OR background OR pose OR text) and keep the rest of the prompt verbatim. The model holds composition stable across iterations when only one knob moves.
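One way to enforce the one-knob rule is to treat the prompt as structured attributes and render it, so everything except the refined attribute stays byte-for-byte identical. A sketch under that assumption; the attribute names and helpers are illustrative, not anything the model requires:

```python
# Prompt as structured attributes: refine() changes exactly one knob,
# render() emits the prompt, so unchanged attributes stay verbatim.
BASE = {
    "subject": "matte ceramic water bottle on warm linen",
    "lighting": "soft window light",
    "background": "neutral background",
}

def render(attrs: dict) -> str:
    return ", ".join(attrs[k] for k in ("subject", "lighting", "background"))

def refine(attrs: dict, knob: str, value: str) -> dict:
    """Return a copy with exactly one attribute changed."""
    if knob not in attrs:
        raise KeyError(knob)
    return {**attrs, knob: value}

v2 = refine(BASE, "lighting", "golden-hour rim light")
```

Keeping each iteration as a dict also gives you a cheap history of which knob moved at each step.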
Don't conflict instructions. "no text" + "the word 'AQUA+' on the label" is incoherent — the model will pick one and you don't control which.
Don't pile up styles. "ukiyo-e + watercolor + 8K + cinematic + minimalist" cancels out. Pick one or two style anchors max.
For the edit endpoint specifically:
State preservation goals. "keep the person's pose and face identity unchanged", "keep the brand mark and typography on the package", "keep the overall framing". The model needs to know what NOT to change.

Use directional language for spatial edits. "Move the headline from top-right to bottom-center", not "reposition the headline".

Multi-ref: number the images in the prompt — "subject from image 1, lighting and background from image 2" — and the model will route the cues correctly.

## Where it shines

| Use case | Why GPT Image 2 |
| --- | --- |
| E-commerce product photography | Reliable text on labels, brand-safe lighting, consistent across SKUs |
| High-conversion ads | Headline + visual integration in one pass |
| Brand asset localization | One source asset → many language variants of the same headline |
| Signage, posters, packaging mock-ups | Text rendering accuracy at multiple scales |
| UI moc