📦 ControlFoley

v1.0.8

Audio Generator: a multi-functional audio generation tool for SFX generation, video-to-audio and text-to-audio, integrating controllable video-to-audio generation, text-to-audio generation and related functions.

by @yjx-research (Jianxuan Yang)
Last updated: 2026/4/23
Security scan
VirusTotal: Benign
OpenClaw: Safe (high confidence)
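The skill's code and instructions are consistent with its stated audio generation purpose; it uploads the provided media to the remote ControlFoley service and returns the generated audio. No unexplained credentials or hidden persistence behavior were found.
Assessment recommendations
This skill uploads any video, audio, or prompt you provide to a remote service (https://controlfoley.ai.xiaomi.com) and saves the returned audio locally. Before installing or using it: (1) confirm that you trust the remote endpoint, and avoid uploading sensitive or private media; (2) the script calls the local curl binary (and requires python3); SKILL.md also mentions that optional conversion needs ffmpeg, so make sure it is installed if you need it; (3) review the referenced upstream GitHub repository and project page yourself to verify provenance and the privacy policy; (4) this skill is not suitable for offline or self-hosted workflows, because it depends on a remote API. ...
Detailed analysis
Purpose and capability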
The skill's name and description (audio SFX, V2A, T2A) align with the included script and API references. Minor inconsistency: the registry metadata lists no required binaries, but SKILL.md and the scripts rely on python3 and call curl (via subprocess). SKILL.md also mentions ffmpeg as optional. These binaries are reasonable for the stated purpose but should be declared in the metadata.
Instruction scope
SKILL.md and the scripts limit actions to submitting tasks to the specified API, polling for status, downloading results, and writing outputs to the chosen output directory. The code checks that input files exist and does not read unrelated system files or environment variables. The main runtime behavior is uploading user-provided media or text to the remote API and saving the returned files.
Install mechanism
There is no install spec; this is an instruction-only skill with a bundled Python script. The skill itself runs no installers, pulls no third-party packages, and performs no arbitrary downloads.
Credential requirements
The skill declares no environment variables or credentials, and the code does not attempt to access secrets. All network communication goes to controlfoley.ai.xiaomi.com (and documented fallback endpoints). No unrelated service credentials are requested.
Persistence and privileges
The skill is not always-enabled and does not modify other skills or system-wide configuration. It runs on invocation and does not request special persistent privileges.
Security is layered; review the code before running it.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.8 · 2026/4/21

- Updated the privacy and security section to add clear guidelines on data handling, processing, and user recommendations.
- Removed duplicated and verbose API usage examples to streamline the documentation.
- Kept the API and CLI usage, parameters, and error handling instructions unchanged.
- No functional or interface changes; documentation improvements only.


Install command

Official: npx clawhub@latest install controlfoley-audio-generator
Mirror: npx clawhub@latest install controlfoley-audio-generator --registry https://cn.longxiaskill.com

Skill documentation

A multi-functional audio generation tool powered by the ControlFoley model. It integrates video sound effect (SFX) generation, video background music composition, text-to-audio and other functions for diversified creative audio generation.

This tool supports four modes: Video-to-Audio (V2A), Text-Controlled Video-to-Audio (TC-V2A), Audio-Controlled Video-to-Audio (AC-V2A), and Text-to-Audio (T2A).

Basic Information

| Field | Value |
|-------|-------|
| Service Operator | Xiaomi LLM Plus Team |
| API Endpoint | https://controlfoley.ai.xiaomi.com |
| Open Source Repo | https://github.com/xiaomi-research/controlfoley |
| Project Page | https://yjx-research.github.io/ControlFoley_web_page/ |
| Online Demo | https://yjx-research.github.io/ControlFoley_web_page/#try-gen |
| Model Weights | https://huggingface.co/YJX-Xiaomi/ControlFoley/ |
| API Key | Not required |
| Script Path | scripts/foley.py |

Prerequisites

python3 --version   # Python 3.x
curl --version      # curl for API submission
ffmpeg -version     # optional, for audio format conversion
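The same prerequisite check can be scripted. The following is a minimal Python sketch, assuming the tools are expected on PATH under the names listed above; `check_prerequisites` is a hypothetical helper and is not part of scripts/foley.py.

```python
import shutil

# Tools named in the Prerequisites list; ffmpeg is marked optional there.
REQUIRED_TOOLS = ("python3", "curl")
OPTIONAL_TOOLS = ("ffmpeg",)


def check_prerequisites(which=shutil.which):
    """Return (missing_required, missing_optional) as lists of tool names.

    `which` is injectable so the check can be exercised without touching PATH.
    """
    missing_required = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    missing_optional = [tool for tool in OPTIONAL_TOOLS if which(tool) is None]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_prerequisites()
    if required:
        print("Missing required tools:", ", ".join(required))
    if optional:
        print("Missing optional tools:", ", ".join(optional))
    if not required and not optional:
        print("All prerequisites found.")
```

An empty first list means the required tools were found; a missing ffmpeg only matters if audio format conversion is needed.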

Modes

| Mode | Command | Input | Output | Description |
|------|---------|-------|--------|-------------|
| V2A | v2a video.mp4 | Video file | .mp4 + .flac | Generate audio matching the video content |
| TC-V2A | v2a video.mp4 --prompt "text" | Video + text | .mp4 + .flac | Generate audio aligned with text prompts while staying synchronized with the video |
| AC-V2A | v2a video.mp4 --ref-audio ref.wav | Video + reference audio | .mp4 + .flac | Generate audio with timbre matching the reference audio while staying synchronized with the video |
| T2A | t2a "prompt" | Text description | .flac | Generate audio from text descriptions |
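The Modes table implies a simple rule for which mode applies to a given set of inputs. A minimal Python sketch of that rule follows. `infer_mode` is a hypothetical helper, not part of scripts/foley.py, and rejecting a text prompt combined with reference audio is an assumption, since the table does not cover that case.

```python
def infer_mode(video=None, prompt=None, ref_audio=None):
    """Map the supplied inputs to a mode name from the Modes table."""
    if video is None:
        # Without a video, only Text-to-Audio is possible.
        if ref_audio is not None:
            raise ValueError("reference audio requires a video input")
        if not prompt:
            raise ValueError("T2A requires a text prompt")
        return "T2A"
    if prompt and ref_audio is not None:
        # Assumption: the table does not describe combining both controls.
        raise ValueError("combining a prompt and reference audio is not documented")
    if prompt:
        return "TC-V2A"
    if ref_audio is not None:
        return "AC-V2A"
    return "V2A"
```

For example, `infer_mode(video="video.mp4", ref_audio="ref.wav")` selects the audio-controlled mode, and a bare prompt selects T2A.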

Usage (CLI version)

1. Text-to-Audio (T2A, default 8s)

python3 scripts/foley.py t2a "dog barking loudly in a park"

2. Video-to-Audio (V2A)

python3 scripts/foley.py v2a input.mp4

3. Text-Controlled Video-to-Audio (TC-V2A)

python3 scripts/foley.py v2a input.mp4 --prompt "footsteps on gravel with birds chirping"

4. Audio-Controlled Video-to-Audio (AC-V2A)

python3 scripts/foley.py v2a input.mp4 --ref-audio reference.wav

5. Specify duration

python3 scripts/foley.py t2a "A mountain stream murmurs, its gentle current lapping against the pebbles." --duration 15

6. Generate multiple candidates

python3 scripts/foley.py t2a "cat purring softly" --count 3

7. Fixed seed (reproducible results)

python3 scripts/foley.py t2a "rain on a tin roof" --seed 42

8. List available models

python3 scripts/foley.py models

Usage (API version)

POST

curl -X POST "https://controlfoley.ai.xiaomi.com/api/v1/v2a/submit" -F "file=@video_path" -F "prompt=footsteps on gravel with birds chirping"

return

{"taskId": "xxx", "message": "Task submitted successfully"}
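The submit call and its response can be wrapped in a few lines of Python. This is a minimal sketch that only builds the curl argument list and parses the reply, so it makes no network request by itself. `build_submit_command` and `parse_task_id` are hypothetical helpers, not functions from scripts/foley.py, and the lower-case `/api/v1/v2a/submit` path is an assumption about the endpoint's spelling.

```python
import json

BASE_URL = "https://controlfoley.ai.xiaomi.com"
SUBMIT_PATH = "/api/v1/v2a/submit"  # assumed lower-case form of the documented path


def build_submit_command(video_path, prompt=None):
    """Return the curl argument list for submitting a video, optionally with a prompt."""
    cmd = ["curl", "-sS", "-X", "POST", BASE_URL + SUBMIT_PATH,
           "-F", f"file=@{video_path}"]
    if prompt:
        # --form-string sends the value literally, so a prompt that starts
        # with "@" or "<" is not interpreted by curl as a file reference.
        cmd += ["--form-string", f"prompt={prompt}"]
    return cmd


def parse_task_id(response_text):
    """Extract taskId from the JSON body returned by the submit endpoint."""
    payload = json.loads(response_text)
    task_id = payload.get("taskId")
    if not task_id:
        raise ValueError(f"no taskId in response: {payload!r}")
    return task_id
```

The list returned by `build_submit_command` can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`, and the captured stdout handed to `parse_task_id`.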

GET

1. Available models

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/models"

return

{"models":[{"name":"ControlFoley","enabled":true}]}

2. Status inquiry

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/status/{taskId}"

return

  • success:
{"urls":["{Domain name}/ControlFoley_output/{taskId}/{filename}"],"status":"success","done":true}
  • processing:
{"status":"processing","done":false}
  • pending:
{"status":"pending","queue_pos":1,"queue_position":1,"total_queue":2,"done":false}
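The three response shapes above suggest a polling loop keyed on the `done` flag. The sketch below is one way to write it in Python, under stated assumptions: `fetch_status` is a hypothetical caller-supplied function that returns the parsed JSON of one status request, the interval and timeout values are illustrative, and a finished task without `urls` is treated as a failure because no failure payload is documented here.

```python
import time


def wait_for_result(fetch_status, interval=2.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the task reports done and return its list of result URLs.

    `sleep` and `clock` are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout
    while True:
        payload = fetch_status()
        if payload.get("done"):
            urls = payload.get("urls")
            if not urls:
                # Assumption: done without URLs means the task did not succeed.
                raise RuntimeError(f"task finished without result URLs: {payload!r}")
            return urls
        if clock() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        # Pending and processing responses both carry done == false.
        sleep(interval)
```

Relying on `done` rather than on the status string keeps the loop independent of the exact status wording returned by the service.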

3. Result download

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/ControlFoley_output/{taskId}/{filename}" --output ./output.flac

4. Status inquiry & result download

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/status_download/{taskId}" --output-dir ./output --output audio.zip

Parameters

T2A (Text-to-Audio)

| Parameter | Description | Default | Example |
|-----------|-------------|---------|---------|
| prompt | Audio description text (required) | | "dog barking in park" |
| --model | Model ID | ControlFoley | --model ControlFoley |
| --duration | Audio length in seconds (max 30) | 8 | --duration 15 |
| --negative | Negative prompt to exclude unwanted sounds | | --negative "noise, human voice" |
| --cfg | CFG strength; higher = stricter prompt adherence | 4.5 | --cfg 6.0 |
| --count | Number of variants to generate (1–5) | 1 | --count 3 |
| --seed | Fixed random seed for reproducibility | | --seed 42 |
| -o/--outdir | Output directory | ./output | -o ./my_audio |
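The limits in the T2A parameter table can be checked before the script is invoked. The following Python sketch builds the argument list for a T2A run and rejects values outside the documented ranges. `build_t2a_args` is a hypothetical helper, the `--model` and `--seed` spellings are assumed English forms of the flags in the table, and the lower bound on duration (greater than zero) is an assumption; only the maximum of 30 is documented.

```python
def build_t2a_args(prompt, model=None, duration=None, negative=None,
                   cfg=None, count=None, seed=None, outdir=None):
    """Return the command list for a T2A run, validating documented limits."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if duration is not None and not (0 < duration <= 30):
        raise ValueError("duration must be at most 30 seconds")
    if count is not None and not (1 <= count <= 5):
        raise ValueError("count must be between 1 and 5")

    args = ["python3", "scripts/foley.py", "t2a", prompt]
    options = [
        ("--model", model),
        ("--duration", duration),
        ("--negative", negative),
        ("--cfg", cfg),
        ("--count", count),
        ("--seed", seed),
        ("--outdir", outdir),
    ]
    # Omitted options are left out so the script applies its own defaults.
    for flag, value in options:
        if value is not None:
            args += [flag, str(value)]
    return args
```

Leaving unset options off the command line means the defaults in the table (8 seconds, CFG 4.5, one variant) stay under the script's control rather than being duplicated in the wrapper.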

V2A (Video-to-Audio)

| Parameter | Description | Default | Example | |-----------|

Data source: ClawHub · Chinese localization: 龙虾技能库