When to Use
- User wants to generate an AI image from a text description
- User says "generate image", "draw", "create picture", "配图"
- User says "生成图片", "画一张", "AI图"
- User needs a cover image, illustration, or concept art
When NOT to Use
- User wants to create audio content (use /podcast, /speech)
- User wants to create a video (use /explainer)
- User wants to edit an existing image (not supported)
- User wants to extract content from a URL (use /content-parser)
Purpose
Generate AI images using the Labnana API. Supports text prompts with optional reference images, multiple resolutions, and aspect ratios. Images are saved as local files.
Hard Constraints
- No shell scripts. Construct curl commands from the API reference files listed in Resources.
- Always read shared/authentication.md for the API key and headers.
- Follow shared/common-patterns.md for error handling.
- Image generation uses a different base URL: https://api.labnana.com/openapi/v1
- Always read config following shared/config-pattern.md before any interaction.
- Output is saved to .listenhub/image-gen/YYYY-MM-DD-{jobId}/ — never to ~/Downloads/
Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call the image generation API until the user has explicitly confirmed.
Step -1: API Key Check
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
Step 0: Config Setup
Follow shared/config-pattern.md Step 0.
If file doesn't exist — ask location, then create immediately:
mkdir -p ".listenhub/image-gen"
echo '{"outputDir":".listenhub","outputMode":"inline"}' > ".listenhub/image-gen/config.json"
CONFIG_PATH=".listenhub/image-gen/config.json"
# (or $HOME/.listenhub/image-gen/config.json for global)
Then run the Setup Flow below.
If file exists — read config, display summary, and confirm:
当前配置 (image-gen):
输出方式:{inline / download / both}
Ask: "使用已保存的配置?" →
确认,直接继续 /
重新配置Setup Flow (first run or reconfigure)
- outputMode: Follow shared/output-mode.md § Setup Flow Question.
Save immediately:
# Follow shared/output-mode.md § Save to Config
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
Interaction Flow
Step 1: Image Description
Free text input. Ask the user:
Describe the image you want to generate.
If the prompt is very short (< 10 words) and the user hasn't asked for verbatim generation, offer to help enrich the prompt. Otherwise, use as-is.
Step 2: Model
Ask:
Question: "Which model?"
Options:
- "pro (recommended)" — gemini-3-pro-image-preview, higher quality
- "flash" — gemini-3.1-flash-image-preview, faster and cheaper, unlocks extreme aspect ratios (1:4, 4:1, 1:8, 8:1)
Step 3: Resolution and Aspect Ratio
Ask both together (independent parameters):
Question: "What resolution?"
Options:
- "1K" — Standard quality
- "2K (recommended)" — High quality, good balance
- "4K" — Ultra high quality, slower generation
Question: "What aspect ratio?"
Options (all models):
- "16:9" — Landscape, widescreen
- "1:1" — Square
- "9:16" — Portrait, phone screen
- "Other" — 2:3, 3:2, 3:4, 4:3, 21:9
If flash model was selected, also offer: 1:4 (narrow portrait), 4:1 (wide landscape), 1:8 (extreme portrait), 8:1 (panoramic)
Step 4: Reference Images (optional)
Question: "Any reference images for style guidance?"
Options:
- "Yes, I have URL(s)" — Provide reference image URLs
- "No references" — Generate from prompt only
If yes, collect URLs (comma-separated, max 14). For each URL, infer mimeType from the suffix and build:
{ "fileData": { "fileUri": "", "mimeType": "" } }
Suffix mapping: .jpg / .jpeg → image/jpeg, .png → image/png, .webp → image/webp, .gif → image/gif
Step 5: Confirm & Generate
Summarize all choices:
Ready to generate image: Prompt: {prompt text}
Model: {pro / flash}
Resolution: {1K / 2K / 4K}
Aspect ratio: {ratio}
References: {yes (N URLs) / no}
Proceed?
Wait for explicit confirmation before calling the API.
Workflow
- Build request: Construct JSON with provider, model, prompt, imageConfig, and optional referenceImages
- Submit: POST https://api.labnana.com/openapi/v1/images/generation with a timeout of 600s
- Extract image: Parse base64 data from response
- Decode and present result
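As a hedged sketch, the build step (including the Step 4 suffix → mimeType mapping) could be done with jq. The `mime_for` helper and the sample URL are illustrative; the provider/model/prompt/imageConfig field names follow the Example section, but the exact referenceImages shape should be verified against shared/api-image.md:

```shell
# Illustrative request builder. The referenceImages shape is an assumption;
# verify it against shared/api-image.md before relying on it.
mime_for() {
  # Map an image URL suffix to a mimeType (Step 4 suffix mapping).
  case "${1##*.}" in
    jpg|jpeg) echo "image/jpeg" ;;
    png)      echo "image/png"  ;;
    webp)     echo "image/webp" ;;
    gif)      echo "image/gif"  ;;
  esac
}

PROMPT="cyberpunk city at night"
REQUEST_BODY=$(jq -n --arg prompt "$PROMPT" \
  '{provider: "google",
    model: "gemini-3-pro-image-preview",
    prompt: $prompt,
    imageConfig: {imageSize: "2K", aspectRatio: "16:9"}}')

# Append an optional reference image as a fileData entry (URL is illustrative):
REF_URL="https://example.com/style-ref.png"
REF=$(jq -n --arg uri "$REF_URL" --arg mime "$(mime_for "$REF_URL")" \
  '{fileData: {fileUri: $uri, mimeType: $mime}}')
REQUEST_BODY=$(echo "$REQUEST_BODY" | jq --argjson ref "$REF" '. + {referenceImages: [$ref]}')
```

The resulting `$REQUEST_BODY` is what the Submit step posts with curl.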
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
inline or both: Decode base64 to a temp file, then use the Read tool.
JOB_ID=$(date +%s)
echo "$BASE64_DATA" | base64 -D > /tmp/image-gen-${JOB_ID}.jpg
Then use the Read tool on /tmp/image-gen-{jobId}.jpg. The image displays inline in the conversation.
Present:
图片已生成!
download or both: Save to the artifact directory.
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 -D > "${JOB_DIR}/${JOB_ID}.jpg"
Present:
图片已生成!已保存到 .listenhub/image-gen/{YYYY-MM-DD}-{jobId}/:
{jobId}.jpg
Base64 decoding (cross-platform):
# Linux
echo "$BASE64_DATA" | base64 -d > output.jpg
# macOS
echo "$BASE64_DATA" | base64 -D > output.jpg
# or
echo "$BASE64_DATA" | base64 --decode > output.jpg
Retry logic: On 429 (rate limit), wait 15 seconds and retry. Max 3 retries.
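A minimal sketch of that rule, assuming a hypothetical wrapper `retry_on_429` around whatever command performs the POST and prints the HTTP status code (e.g. curl with `-w '%{http_code}'`); `RETRY_DELAY` is only an illustrative override to keep the sketch testable:

```shell
# Sketch of the 429 retry rule: up to 3 retries, 15 seconds apart.
# The wrapped command ("$@") must print the HTTP status code, e.g. a curl
# call using -o to capture the body and -w '%{http_code}' for the status.
retry_on_429() {
  attempt=0
  while :; do
    code=$("$@")
    if [ "$code" != "429" ]; then
      echo "$code"
      return 0
    fi
    if [ "$attempt" -ge 3 ]; then
      echo "Still rate limited after 3 retries; giving up." >&2
      echo "$code"
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-15}"
  done
}
```

In practice the wrapped command would be the Submit curl request from the Workflow section.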
Prompt Handling
Default: Pass the user's prompt directly without modification.
When to offer optimization:
- Prompt is very short (a few words) AND user hasn't requested verbatim
- Ask: "Would you like help enriching the prompt with style/lighting/composition details?"
When to never modify:
- Long, detailed, or structured prompts — treat the user as experienced
- User says "use this prompt exactly"
Optimization techniques (if user agrees):
- Style: "cyberpunk" → add "neon lights, futuristic, dystopian"
- Scene: time of day, lighting, weather
- Quality: "highly detailed", "8K quality", "cinematic composition"
- Always use English keywords (models trained on English)
- Show optimized prompt before submitting
API Reference
- Image generation: shared/api-image.md
- Error handling: shared/common-patterns.md § Error Handling
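As a hedged sketch of the error-handling pattern: guard the response before decoding. The `.error` field name here is an assumption; verify the actual error shape in shared/common-patterns.md. The success path mirrors the extraction shown in the Example section:

```shell
# Sample success response; "QUJD" is "ABC" in base64. The .error shape is an
# assumption to check against shared/common-patterns.md.
RESPONSE='{"candidates":[{"content":{"parts":[{"inlineData":{"data":"QUJD"}}]}}]}'
if echo "$RESPONSE" | jq -e '.error' >/dev/null; then
  echo "Image generation failed: $(echo "$RESPONSE" | jq -r '.error.message // .error')" >&2
else
  BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data')
fi
```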
Composability
- Invokes: nothing (direct API call)
- Invoked by: platform skills for cover images (Phase 2)
Example
User: "Generate an image: cyberpunk city at night"
Agent workflow:
- Prompt is short → offer enrichment → user declines
- Ask model → "pro"
- Ask resolution → "2K"
- Ask ratio → "16:9"
- No references
RESPONSE=$(curl -sS -X POST "https://api.labnana.com/openapi/v1/images/generation" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
--max-time 600 \
-d '{
"provider": "google",
"model": "gemini-3-pro-image-preview",
"prompt": "cyberpunk city at night",
"imageConfig": {"imageSize": "2K", "aspectRatio": "16:9"}
}')
BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data')
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 -D > "${JOB_DIR}/${JOB_ID}.jpg"
Decode the base64 data per outputMode (see shared/output-mode.md).