🎙️ Subtitle Generator By Audio: Skill Tool

v1.0.0

Turn a 3-minute interview recording in MP3 into 1080p captioned video files just by typing what you need. Whether it's adding auto-generated subtitles from a...

by @roca-677 · MIT-0
License: MIT-0
Last updated: 2026/4/15
Security scan
VirusTotal: benign
OpenClaw: suspicious (medium confidence)
The skill's behavior largely matches a cloud subtitle service but has small inconsistencies (declared config paths vs registry metadata) and asks the agent to obtain and persist an auth token and session—actions that warrant caution before installing.
Assessment recommendations
This skill uploads your audio/video to an external service (mega-api-prod.nemovideo.ai) and needs a NEMO_TOKEN. If you don't provide one, it will create an anonymous token and store a session/token (likely under ~/.config/nemovideo/). Before installing, consider: 1) Do you trust that remote domain to receive and store your media? 2) Do you want the skill to automatically create and persist credentials on your machine? 3) The registry metadata and the SKILL.md disagree about config paths—ask the ...
Detailed analysis
Purpose and capabilities
The skill claims to generate subtitles via a cloud rendering pipeline and the instructions call out a single backend (mega-api-prod.nemovideo.ai) and a single credential (NEMO_TOKEN). Requesting an API token is coherent with the stated purpose. However, the SKILL.md frontmatter references a config path (~/.config/nemovideo/) while the registry metadata earlier reported no required config paths—this mismatch is inconsistent.
Instruction scope
Runtime instructions tell the agent to auto-create an anonymous token if NEMO_TOKEN is not present, persist the token/session_id, upload local files (multipart @/path) or URLs, and include attribution headers that may require inspecting install paths to determine 'X-Skill-Platform'. The instructions also explicitly say not to show raw API responses or token values to the user. Those items are within the service's needs (auth, upload, polling), but they broaden the agent's actions to token generation, storage, filesystem access for uploads, and potentially reading install path details — all of which expand the privacy surface beyond a purely read-only helper.
Install mechanism
This is an instruction-only skill with no install spec or bundled code. Nothing is written to disk by an installer step; risk from installation mechanism is low.
Credential requirements
Only one credential is declared: NEMO_TOKEN (primary), which matches the described cloud API usage. But the skill will create and persist an anonymous token automatically if none is supplied, and the frontmatter suggests a config path for storage (~/.config/nemovideo/). Requiring or creating a token is proportionate to contacting the remote service, but automatic token creation/storage and the config path are behavioral details the user should be aware of.
Persistence and privileges
always:false (no forced global presence) and no explicit request to modify other skills or system-wide settings. The skill does instruct persisting session state and token (and references a per-user config path), which is normal for a cloud-backed tool but does create persistent credentials on the host.
Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.0 · 2026/4/15

- Initial release of Subtitle Generator by Audio.
- Upload audio or video files and automatically generate captioned 1080p videos with AI, no timeline editing required.
- Fast processing: 30-90 seconds from upload to downloadable result for a typical 3-minute clip.
- Simple authentication with free credits for new users; session and render jobs handled via cloud backend.
- Supports common workflow actions: upload, generate subtitles, check status, view credits, and export in various formats.
- Intuitive keyword-based command routing and clear cloud pipeline feedback for a streamlined user experience.


Install command

Official: npx clawhub@latest install subtitle-generator-by-audio
Mirror (faster in China): npx clawhub@latest install subtitle-generator-by-audio --registry https://cn.longxiaskill.com

Skill documentation

Getting Started

Share your audio or video files and I'll get started on AI subtitle generation. Or just tell me what you're thinking.

Try saying:

  • "generate my audio or video files"
  • "export 1080p MP4"
  • "generate subtitles from the audio and"

First-Time Connection

When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").

Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 2.

  1. Obtain a free token: Generate a random UUID as client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to that UUID. The response field data.token is your NEMO_TOKEN; it comes with 100 free credits and is valid for 7 days.
  2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with an Authorization: Bearer header carrying your NEMO_TOKEN, Content-Type: application/json, and body {"task_name":"project","language":""}. Store the returned session_id for all subsequent requests.

Keep setup communication brief. Don't display raw API responses or token values to the user.
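The first-time connection flow above can be sketched as plain request descriptions. This is a minimal illustration, not the skill's actual implementation: the helper names (`build_anonymous_token_request`, `build_session_request`, `first_connection_steps`) are hypothetical, and nothing is sent over the network.

```python
import os
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_anonymous_token_request(client_id=None):
    """Describe the POST that requests a free anonymous token."""
    # A random UUID serves as the client identifier when none is supplied.
    client_id = client_id or str(uuid.uuid4())
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
    }


def build_session_request(token):
    """Describe the POST that creates a session for later requests."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "json": {"task_name": "project", "language": ""},
    }


def first_connection_steps(env=None):
    """Return the ordered step names; the token step is skipped if NEMO_TOKEN is set."""
    env = os.environ if env is None else env
    if env.get("NEMO_TOKEN"):
        return ["create_session"]
    return ["obtain_token", "create_session"]
```

Keeping the request construction separate from the transport makes it easy to log what would be sent without ever printing the token value itself.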

# Subtitle Generator by Audio — Generate Subtitles from Audio

This tool takes your audio or video files and runs AI subtitle generation through a cloud rendering pipeline. You upload, describe what you want, and download the result.

Say you have a 3-minute interview recording in MP3 and want to generate subtitles from the audio and burn them into the video — the backend processes it in about 30-90 seconds and hands you a 1080p MP4.

Tip: cleaner audio with less background noise produces more accurate subtitles.

Matching Input to Actions

User prompts referencing subtitle generator by audio, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | Yes |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | Yes |
| "status" / "状态" / "show tracks" | → §3.4 State | Yes |
| "upload" / "上传" / user sends file | → §3.2 Upload | Yes |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | No |
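The routing table above could be approximated with a simple keyword matcher. This sketch covers only the keyword part; the intent classification mentioned in the text is not modeled, and the function name `route` and its `has_attachment` flag are hypothetical.

```python
# Keyword groups taken from the routing table; checked in order, first match wins.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(prompt, has_attachment=False):
    """Pick an action for a user prompt; anything unmatched goes to the SSE endpoint."""
    if has_attachment:
        return "upload"  # the user sent a file
    text = prompt.lower()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    return "sse"


if __name__ == "__main__":
    print(route("export 1080p MP4"))
    print(route("generate subtitles from the audio"))
```

Substring matching is deliberately naive: a prompt such as "generate subtitles and download the result" would route to export, which is one reason a real router would likely combine keywords with intent classification.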

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Every API call needs an Authorization: Bearer header carrying your NEMO_TOKEN, plus the three attribution headers below, which must match this file's frontmatter. If any header is missing, exports return 402.

| Header | Value |
|---|---|
| X-Skill-Source | subtitle-generator-by-audio |
| X-Skill-Version | frontmatter version |
| X-Skill-Platform | auto-detect: clawhub / cursor / unknown from install path |

API base: https://mega-api-prod.nemovideo.ai
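The header requirements can be assembled in one place, as in the sketch below. How the platform is actually detected from the install path is not specified in this document; the substring check in `detect_platform` is an assumption, and both function names are hypothetical.

```python
def detect_platform(install_path):
    """Guess the platform from the install path (assumed: substring match)."""
    path = install_path.lower()
    for name in ("clawhub", "cursor"):
        if name in path:
            return name
    return "unknown"


def build_headers(token, version, install_path):
    """Build the auth header plus the three attribution headers."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "subtitle-generator-by-audio",
        "X-Skill-Version": version,  # should mirror the frontmatter version
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Passing the version in as a parameter, rather than hard-coding it, keeps the header in step with the frontmatter when the skill is updated.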

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":""} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"","new_message":{"parts":[{"text":""}]}} with Accept: text/event-stream. Max timeout: 15 minutes.

Upload: POST /api/upload-video/nemo_agent/me/ — file: multipart -F "files=@/path", or URL: {"urls":[""],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me//latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_","sessionId":"","draft":,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/ every 30s until status = completed. Download URL at output.url.
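The export polling described above might look like the loop below. It is a sketch under stated assumptions: `fetch_status` is a hypothetical callable that performs the GET and returns the parsed JSON, the response is assumed to expose `status` and `output.url` at the top level, and the `max_polls` cap is an addition not found in this document.

```python
import time


def wait_for_render(fetch_status, interval_s=30, max_polls=40, sleep=time.sleep):
    """Poll until the render job reports status 'completed', then return output.url."""
    for attempt in range(max_polls):
        job = fetch_status()
        if job.get("status") == "completed":
            return job["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)  # the document specifies a 30s polling interval
    raise TimeoutError("render did not complete within the polling budget")
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP library and lets it be exercised without real delays. Failure statuses are not documented here, so this sketch treats every status other than `completed` as still in progress.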

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

Error Codes

  • 0 — success, continue normally
  • 1001 — token expired or invalid; re-acquire via /api/auth/anonymous-token
  • 1002 — session not found; create a new one
  • 2001 — out of credits; anonymous users get a registration link with ?bind=, registered users top up
  • 4001 — unsupported file type; show accepted formats
  • 4002 — file too large; suggest compressing or trimming
  • 400 — missing X-Client-Id; generate one and retry
  • 402 — free plan export blocked; not a credit issue, subscription tier
  • 429 — rate limited; wait 30s and retry once
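The error codes listed above map naturally onto a lookup table. In this sketch the action labels are hypothetical names chosen for illustration; only the numeric codes and their meanings come from the list.

```python
# Code -> follow-up action, mirroring the error list above.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reacquire_token",
    1002: "create_session",
    2001: "prompt_registration_or_top_up",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_30s_and_retry_once",
}


def next_action(code):
    """Return the follow-up action for a response code, with a fallback for unknown codes."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```

Keeping 2001 and 402 as separate entries preserves the distinction the list draws between running out of credits and being blocked by subscription tier.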

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
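The event handling rules above can be sketched as a small stream reducer. The event payload shape is not documented here, so this example assumes each `data:` line carries a JSON object whose text lives under `content.parts[].text`; that shape, the `functionCall` key used in the usage example, and both function names are assumptions.

```python
import json


def collect_text(lines):
    """Concatenate text parts from SSE lines, skipping heartbeats and non-text events."""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # heartbeat / empty data: keep waiting
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        if not isinstance(event, dict):
            continue
        parts = (event.get("content") or {}).get("parts") or []
        for part in parts:
            text = part.get("text") if isinstance(part, dict) else None
            if text:
                chunks.append(text)  # tool calls/results have no text and are dropped
    return "".join(chunks)


def handle_stream(lines):
    """Return the collected text and whether session state should be polled instead."""
    text = collect_text(lines)
    return {"text": text, "poll_state": not text}
```

For example, `handle_stream(['data: {"content": {"parts": [{"functionCall": {}}]}}', "data:"])` yields an empty text with `poll_state` set, which corresponds to the no-text case where the session state should be queried to confirm the edit.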

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
  1. Video: city timelapse (0-10s)
  2. BGM: Lo-fi (0-10s, 35%)
  3. Title: "Urban Dreams" (0-3s)
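A track summary like the one above could be derived from the abbreviated draft fields. The nesting is an assumption: this sketch supposes the draft holds a list of tracks under `t`, each with a type `tt` and a list of segments `sg`, and that each segment carries its duration `d` in milliseconds. The function name `summarize_draft` is hypothetical.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft):
    """Produce one summary line per track from an abbreviated draft dict."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:.1f}s"
        )
    return lines


if __name__ == "__main__":
    example = {
        "t": [
            {"tt": 0, "sg": [{"d": 10000}]},
            {"tt": 1, "sg": [{"d": 10000}]},
            {"tt": 7, "sg": [{"d": 3000}]},
        ]
    }
    print("\n".join(summarize_draft(example)))
```

Unknown track type codes fall back to "unknown" rather than raising, since only three codes are listed in the mapping.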

Common Workflows

Quick edit: Upload → "generate subtitles from the audio and burn them into the video" → Download MP4. Takes 30-90 seconds for a typical 3-minute clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "generate subtitles from the audio and burn them into the video" — concrete instructions get better results.

Max file size is 500MB. Stick to MP4, MOV, MP3, WAV for the smoothest experience.
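The size and format limits can be checked locally before uploading. This sketch assumes "500MB" means 500 × 1024 × 1024 bytes and reuses the backend's 4001 and 4002 codes as return values purely for illustration; the function name `check_upload` is hypothetical.

```python
import os

MAX_BYTES = 500 * 1024 * 1024  # assumption: 500MB interpreted as 500 MiB

# Formats from the supported-formats list in this document.
SUPPORTED = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}


def check_upload(filename, size_bytes):
    """Return 0 if the file looks acceptable, else the matching error code."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    if extension not in SUPPORTED:
        return 4001  # unsupported file type
    if size_bytes > MAX_BYTES:
        return 4002  # file too large
    return 0
```

A local check like this only saves a round trip; the backend remains the authority on what it accepts.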

Export as MP4 for widest compatibility across platforms.

Data source: ClawHub · Chinese localization: 龙虾技能库