When to Use
- User wants to generate an AI image from a text description
- User says "generate image", "draw", "create picture", "配图"
- User says "生成图片", "画一张", "AI图"
- User needs a cover image, illustration, or concept art
When NOT to Use
- User wants to create audio content (use /podcast, /speech)
- User wants to create a video (use /explainer)
- User wants to edit an existing image (not supported)
- User wants to extract content from a URL (use /content-parser)
Purpose
Generate AI images using the Labnana API. Supports text prompts with optional reference images, multiple resolutions, and aspect ratios. Images are saved as local files.
Hard Constraints
- No shell scripts. Construct curl commands from the API reference files listed in Resources.
- Always read shared/authentication.md for the API key and headers.
- Follow shared/common-patterns.md for error handling.
- Image generation uses a different base URL: https://api.labnana.com/openapi/v1
- Always read config following shared/config-pattern.md before any interaction.
- Output is saved to .listenhub/image-gen/YYYY-MM-DD-{jobId}/ — never to ~/Downloads/
Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call the image generation API until the user has explicitly confirmed.
Step -1: API Key Check
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
Step 0: Config Setup
Follow shared/config-pattern.md Step 0.
If file doesn't exist — ask location, then create immediately:
mkdir -p ".listenhub/image-gen"
echo '{"outputDir":".listenhub","outputMode":"inline"}' > ".listenhub/image-gen/config.json"
CONFIG_PATH=".listenhub/image-gen/config.json"
# (or $HOME/.listenhub/image-gen/config.json for global)
Then run the Setup Flow below.
If file exists — read config, display summary, and confirm:
当前配置 (image-gen):
输出方式:{inline / download / both}
Ask: "使用已保存的配置?" →
确认,直接继续 /
重新配置Setup Flow (first run or reconfigure)
- outputMode: Follow shared/output-mode.md § Setup Flow Question.
Save immediately:
# Follow shared/output-mode.md § Save to Config
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
Interaction Flow
Step 1: Image Description
Free text input. Ask the user:
Describe the image you want to generate.
If the prompt is very short (< 10 words) and the user hasn't asked for verbatim generation, offer to help enrich the prompt. Otherwise, use as-is.
Step 2: Model
Ask:
Question: "Which model?"
Options:
- "pro (recommended)" — gemini-3-pro-image-preview, higher quality
- "flash" — gemini-3.1-flash-image-preview, faster and cheaper, unlocks extreme aspect ratios (1:4, 4:1, 1:8, 8:1)
Step 3: Resolution and Aspect Ratio
Ask both together (independent parameters):
Question: "What resolution?"
Options:
- "1K" — Standard quality
- "2K (recommended)" — High quality, good balance
- "4K" — Ultra high quality, slower generation
Question: "What aspect ratio?"
Options (all models):
- "16:9" — Landscape, widescreen
- "1:1" — Square
- "9:16" — Portrait, phone screen
- "Other" — 2:3, 3:2, 3:4, 4:3, 21:9
If flash model was selected, also offer: 1:4 (narrow portrait), 4:1 (wide landscape), 1:8 (extreme portrait), 8:1 (panoramic)
Step 4: Reference Images (optional)
Question: "Any reference images for style guidance?"
Options:
- "Yes, I have URL(s)" — Provide reference image URLs
- "No references" — Generate from prompt only
If yes, collect URLs (comma-separated, max 14). For each URL, infer mimeType from the suffix and build:
{ "fileData": { "fileUri": "", "mimeType": "" } }
Suffix mapping: .jpg / .jpeg → image/jpeg, .png → image/png, .webp → image/webp, .gif → image/gif
Step 5: Confirm & Generate
Summarize all choices:
Ready to generate image: Prompt: {prompt text}
Model: {pro / flash}
Resolution: {1K / 2K / 4K}
Aspect ratio: {ratio}
References: {yes (N URLs) / no}
Proceed?
Wait for explicit confirmation before calling the API.
Workflow
- Build request: Construct JSON with provider, model, prompt, imageConfig, and optional referenceImages
- Submit: POST https://api.labnana.com/openapi/v1/images/generation with a timeout of 600s
- Extract image: Parse base64 data from response
- Decode and present result
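As a hedged sketch, the build step (including the Step 4 suffix → mimeType mapping) could be done with jq. The `mime_for` helper and the sample URL are illustrative; the provider/model/prompt/imageConfig field names follow the Example section, but the exact referenceImages shape should be verified against shared/api-image.md:

```shell
# Illustrative request builder. The referenceImages shape is an assumption;
# verify it against shared/api-image.md before relying on it.
mime_for() {
  # Map an image URL suffix to a mimeType (Step 4 suffix mapping).
  case "${1##*.}" in
    jpg|jpeg) echo "image/jpeg" ;;
    png)      echo "image/png"  ;;
    webp)     echo "image/webp" ;;
    gif)      echo "image/gif"  ;;
  esac
}

PROMPT="cyberpunk city at night"
REQUEST_BODY=$(jq -n --arg prompt "$PROMPT" \
  '{provider: "google",
    model: "gemini-3-pro-image-preview",
    prompt: $prompt,
    imageConfig: {imageSize: "2K", aspectRatio: "16:9"}}')

# Append an optional reference image as a fileData entry (URL is illustrative):
REF_URL="https://example.com/style-ref.png"
REF=$(jq -n --arg uri "$REF_URL" --arg mime "$(mime_for "$REF_URL")" \
  '{fileData: {fileUri: $uri, mimeType: $mime}}')
REQUEST_BODY=$(echo "$REQUEST_BODY" | jq --argjson ref "$REF" '. + {referenceImages: [$ref]}')
```

The resulting `$REQUEST_BODY` is what the Submit step posts with curl.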
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
inline or both: Decode base64 to a temp file, then use the Read tool.
JOB_ID=$(date +%s)
echo "$BASE64_DATA" | base64 -D > /tmp/image-gen-${JOB_ID}.jpg
Then use the Read tool on /tmp/image-gen-{jobId}.jpg. The image displays inline in the conversation.
Present:
图片已生成!
download or both: Save to the artifact directory.
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 -D > "${JOB_DIR}/${JOB_ID}.jpg"
Present:
图片已生成!已保存到 .listenhub/image-gen/{YYYY-MM-DD}-{jobId}/:
{jobId}.jpg
Base64 decoding (cross-platform):
# Linux
echo "$BASE64_DATA" | base64 -d > output.jpg
# macOS
echo "$BASE64_DATA" | base64 -D > output.jpg
# or
echo "$BASE64_DATA" | base64 --decode > output.jpg
Retry logic: On 429 (rate limit), wait 15 seconds and retry. Max 3 retries.
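A minimal sketch of that rule, assuming a hypothetical wrapper `retry_on_429` around whatever command performs the POST and prints the HTTP status code (e.g. curl with `-w '%{http_code}'`); `RETRY_DELAY` is only an illustrative override to keep the sketch testable:

```shell
# Sketch of the 429 retry rule: up to 3 retries, 15 seconds apart.
# The wrapped command ("$@") must print the HTTP status code, e.g. a curl
# call using -o to capture the body and -w '%{http_code}' for the status.
retry_on_429() {
  attempt=0
  while :; do
    code=$("$@")
    if [ "$code" != "429" ]; then
      echo "$code"
      return 0
    fi
    if [ "$attempt" -ge 3 ]; then
      echo "Still rate limited after 3 retries; giving up." >&2
      echo "$code"
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-15}"
  done
}
```

In practice the wrapped command would be the Submit curl request from the Workflow section.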
Prompt Handling
Default: Pass the user's prompt directly without modification.
When to offer optimization:
- Prompt is very short (a few words) AND user hasn't requested verbatim
- Ask: "Would you like help enriching the prompt with style/lighting/composition details?"
When to never modify:
- Long, detailed, or structured prompts — treat the user as experienced
- User says "use this prompt exactly"
Optimization techniques (if user agrees):
- Style: "cyberpunk" → add "neon lights, futuristic, dystopian"
- Scene: time of day, lighting, weather
- Quality: "highly detailed", "8K quality", "cinematic composition"
- Always use English keywords (models trained on English)
- Show optimized prompt before submitting
API Reference
- Image generation: shared/api-image.md
- Error handling: shared/common-patterns.md § Error Handling
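As a hedged sketch of the error-handling pattern: guard the response before decoding. The `.error` field name here is an assumption; verify the actual error shape in shared/common-patterns.md. The success path mirrors the extraction shown in the Example section:

```shell
# Sample success response; "QUJD" is "ABC" in base64. The .error shape is an
# assumption to check against shared/common-patterns.md.
RESPONSE='{"candidates":[{"content":{"parts":[{"inlineData":{"data":"QUJD"}}]}}]}'
if echo "$RESPONSE" | jq -e '.error' >/dev/null; then
  echo "Image generation failed: $(echo "$RESPONSE" | jq -r '.error.message // .error')" >&2
else
  BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data')
fi
```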
Composability
- Invokes: nothing (direct API call)
- Invoked by: platform skills for cover images (Phase 2)
Example
User: "Generate an image: cyberpunk city at night"
Agent workflow:
- Prompt is short → offer enrichment → user declines
- Ask model → "pro"
- Ask resolution → "2K"
- Ask ratio → "16:9"
- No references
RESPONSE=$(curl -sS -X POST "https://api.labnana.com/openapi/v1/images/generation" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
--max-time 600 \
-d '{
"provider": "google",
"model": "gemini-3-pro-image-preview",
"prompt": "cyberpunk city at night",
"imageConfig": {"imageSize": "2K", "aspectRatio": "16:9"}
}')
BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data')
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 -D > "${JOB_DIR}/${JOB_ID}.jpg"
Decode the base64 data per outputMode (see shared/output-mode.md).