ControlFoley

Name: ControlFoley
Rating: 2

v1.0.8

Audio Generator: a multi-functional audio generation tool for SFX generation, video-to-audio, and text-to-audio. It integrates controllable video-to-audio generation, text-to-audio generation, and related functions.


by @yjx-research (Jianxuan Yang)

Tags: developer tools, code generation, video processing, image processing


Last updated: 2026/4/23

Security scan

VirusTotal: harmless

OpenClaw: safe (high confidence)

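The skill's code and instructions are consistent with its stated audio generation purpose. It uploads the provided media to the remote ControlFoley service and returns the generated audio. No unexplained credentials or hidden persistence behavior were found.

Assessment recommendations

This skill uploads any video, audio, or prompt you provide to a remote service (https://controlfoley.ai.xiaomi.com) and saves the returned audio locally. Before installing or using it, note: (1) confirm that you trust the remote endpoint, and avoid uploading sensitive or private media; (2) the script invokes the local curl binary (and requires python3); SKILL.md also mentions that optional conversion needs ffmpeg, so make sure it is installed if you need it; (3) review the referenced upstream GitHub and project pages yourself to verify provenance and the privacy policy; (4) this skill is not suitable for offline or self-hosted workflows, because it depends on a remote API. ...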

Detailed analysis

ℹ Purpose and capabilities

The skill's name and description (audio SFX, V2A, T2A) align with the included script and API references. Minor inconsistency: the registry metadata lists no required binaries, but SKILL.md and the scripts rely on python3 and call curl (via subprocess). SKILL.md also mentions ffmpeg as optional. These binaries are reasonable for the stated purpose but should be declared in the metadata.

✓ Instruction scope

SKILL.md and the scripts limit actions to submitting tasks to the specified API, polling for status, downloading results, and writing outputs to the chosen output directory. The code checks that input files exist and does not read unrelated system files or environment variables. The main runtime behavior is uploading user-provided media or text to the remote API and saving the returned files.

✓ Install mechanism

There is no install spec; this is an instruction-only skill with a bundled Python script. The skill itself runs no installers, pulls no third-party packages, and performs no arbitrary downloads.

✓ Credential requirements

The skill declares no environment variables or credentials, and the code does not attempt to access secrets. All network communication goes to controlfoley.ai.xiaomi.com (and documented fallback endpoints). No unrelated service credentials are requested.

✓ Persistence and privileges

The skill is not always-enabled and does not modify other skills or system-wide configuration. It runs on invocation and does not request special persistent privileges.

Security is layered; review the code before running it.

Runtime dependencies

No special dependencies

Version

latest v1.0.8 2026/4/21

- Updated the privacy and security section to add clear guidelines on data handling, processing, and user recommendations.
- Removed duplicated and verbose API usage examples to streamline the documentation.
- Kept the API and CLI usage, parameters, and error handling instructions unchanged.
- No functional or interface changes; documentation improvements only.

Install command

Official: npx clawhub@latest install controlfoley-audio-generator

Mirror: npx clawhub@latest install controlfoley-audio-generator --registry https://cn.longxiaskill.com

Skill documentation

A multi-functional audio generation tool powered by the ControlFoley model. It integrates video sound effect (SFX) generation, video background music composition, text-to-audio, and other functions for diverse creative audio generation.

This tool supports four modes: Video-to-Audio (V2A), Text-Controlled Video-to-Audio (TC-V2A), Audio-Controlled Video-to-Audio (AC-V2A), and Text-to-Audio (T2A).

Basic Information

Field	Value
Service Operator	Xiaomi LLM Plus Team
API Endpoint	`https://controlfoley.ai.xiaomi.com`
Open Source Repo	`https://github.com/xiaomi-research/controlfoley`
Project Page	`https://yjx-research.github.io/ControlFoley_web_page/`
Online Demo	`https://yjx-research.github.io/ControlFoley_web_page/#try-gen`
Model Weights	`https://huggingface.co/YJX-Xiaomi/ControlFoley/`
API Key	Not required
Script Path	`scripts/foley.py`

Prerequisites

python3 --version   # Python 3.x
curl --version      # curl for API submission
ffmpeg -version     # optional, for audio format conversion

Modes

Mode	Command	Input	Output	Description
V2A	`v2a video.mp4`	Video file	.mp4 + .flac	Generate audio matching the video content
TC-V2A	`v2a video.mp4 --prompt "text"`	Video + text	.mp4 + .flac	Generate audio aligned with the text prompt while staying synchronized with the video
AC-V2A	`v2a video.mp4 --ref-audio ref.wav`	Video + reference audio	.mp4 + .flac	Generate audio whose timbre matches the reference audio while staying synchronized with the video
T2A	`t2a "prompt"`	Text description	.flac	Generate audio from a text description
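
The mode table above can be read as a small decision rule: the subcommand selects T2A or V2A, and the optional controls refine V2A into TC-V2A or AC-V2A. The following is a minimal sketch of that rule, not code from `scripts/foley.py`. The function name `resolve_mode` is hypothetical, and giving the reference audio precedence when both controls are supplied is an assumption of this sketch; the documentation does not say how that combination is handled.

```python
from typing import Optional


def resolve_mode(command: str, prompt: Optional[str] = None,
                 ref_audio: Optional[str] = None) -> str:
    """Return the mode name implied by a subcommand and its optional controls.

    Hypothetical helper for illustration; not part of scripts/foley.py.
    """
    if command == "t2a":
        return "T2A"
    if command != "v2a":
        raise ValueError(f"unknown subcommand: {command!r}")
    if ref_audio:
        # Assumption: reference audio wins if a prompt is also given.
        return "AC-V2A"
    if prompt:
        return "TC-V2A"
    return "V2A"
```

For example, `resolve_mode("v2a", prompt="footsteps on gravel")` returns `"TC-V2A"`, matching the second row of the table.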

Usage (CLI version)

1. Text-to-Audio (T2A, default 8s)

python3 scripts/foley.py t2a "dog barking loudly in a park"

2. Video-to-Audio (V2A)

python3 scripts/foley.py v2a input.mp4

3. Text-Controlled Video-to-Audio (TC-V2A)

python3 scripts/foley.py v2a input.mp4 --prompt "footsteps on gravel with birds chirping"

4. Audio-Controlled Video-to-Audio (AC-V2A)

python3 scripts/foley.py v2a input.mp4 --ref-audio reference.wav

5. Specify duration

python3 scripts/foley.py t2a "A mountain stream murmurs, its gentle current lapping against the pebbles." --duration 15

6. Generate multiple candidates

python3 scripts/foley.py t2a "cat purring softly" --count 3

7. Fixed seed (reproducible results)

python3 scripts/foley.py t2a "rain on a tin roof" --seed 42

8. List available models

python3 scripts/foley.py models
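
When these commands are driven from another program, it can help to assemble the argument list programmatically rather than concatenating strings. The sketch below builds a T2A invocation from the flags shown in the examples above (`--duration`, `--count`, `--seed`). The function name `build_t2a_command` is hypothetical, and the sketch only constructs the command; it does not run it.

```python
import shlex
from typing import List, Optional


def build_t2a_command(prompt: str, duration: Optional[int] = None,
                      count: Optional[int] = None,
                      seed: Optional[int] = None) -> List[str]:
    """Assemble an argv list for a T2A run of scripts/foley.py.

    Hypothetical helper; only flags shown in the usage examples are emitted.
    """
    argv = ["python3", "scripts/foley.py", "t2a", prompt]
    if duration is not None:
        argv += ["--duration", str(duration)]
    if count is not None:
        argv += ["--count", str(count)]
    if seed is not None:
        argv += ["--seed", str(seed)]
    return argv


def as_shell(argv: List[str]) -> str:
    """Render the argv list as a safely quoted shell command line."""
    return shlex.join(argv)
```

Passing the list form to `subprocess.run` avoids shell quoting problems with prompts that contain spaces or quotes; `as_shell` is only for displaying or logging the command.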

Usage (API version)

POST

curl -X POST "https://controlfoley.ai.xiaomi.com/api/v1/v2a/submit" -F "file=@video_path" -F "prompt=footsteps on gravel with birds chirping"

return

{"taskId": "xxx", "message": "Task submitted successfully"}
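
The submit call above returns a JSON object whose `taskId` is needed for every later request. As a rough sketch, the two halves of that step (building the curl command and extracting the task ID) could look like the following. The helper names are hypothetical, and the lowercase `/api/v1/v2a/submit` path is an assumption based on the endpoint shown in this document.

```python
import json
from typing import List, Optional

# Assumption: the endpoint host and path as given in this document.
BASE_URL = "https://controlfoley.ai.xiaomi.com"


def build_submit_command(video_path: str,
                         prompt: Optional[str] = None) -> List[str]:
    """Build the curl argv for a V2A submission (multipart form upload)."""
    argv = ["curl", "-X", "POST", f"{BASE_URL}/api/v1/v2a/submit",
            "-F", f"file=@{video_path}"]
    if prompt:
        argv += ["-F", f"prompt={prompt}"]
    return argv


def parse_task_id(response_text: str) -> str:
    """Extract taskId from the submit response, failing loudly if absent."""
    payload = json.loads(response_text)
    task_id = payload.get("taskId")
    if not task_id:
        raise ValueError(f"no taskId in response: {payload}")
    return task_id
```

Raising on a missing `taskId` keeps a failed submission from silently turning into a status request for an empty ID.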

GET

1. Available Models

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/models"

return

{"models":[{"name":"ControlFoley","enabled":true}]}

2. Status Inquiry

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/status/{taskId}"

return

Success:

{"urls":["{Domain name}/ControlFoley_output/{taskId}/{filename}"],"status":"success","done":true}

Processing:

{"status":"processing","done":false}

Pending:

{"status":"pending","queue_pos":1,"queue_position":1,"total_queue":2,"done":false}
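
The three status payloads above suggest a simple polling loop: keep asking until `done` is true, then read `urls`. The sketch below illustrates that loop under stated assumptions. `fetch_status` is a hypothetical callable that returns one parsed status payload per call (for example by running the status curl command and parsing its output); the key names `status`, `done`, and `urls` follow the payloads shown here; and treating a finished task whose status is not `success` as an error is a guess, since no failure payload is documented.

```python
import time
from typing import Callable, Dict, List


def wait_for_result(fetch_status: Callable[[], Dict],
                    interval: float = 2.0,
                    max_polls: int = 150) -> List[str]:
    """Poll until the task reports done, then return its result URLs.

    fetch_status is a hypothetical callable supplied by the caller.
    """
    for _ in range(max_polls):
        payload = fetch_status()
        if payload.get("done"):
            # Assumption: any finished state other than "success" is a failure.
            if payload.get("status") != "success":
                raise RuntimeError(f"task ended without success: {payload}")
            return payload.get("urls", [])
        # "pending" and "processing" both mean: wait and ask again.
        time.sleep(interval)
    raise TimeoutError("task did not finish within the polling budget")
```

Injecting `fetch_status` keeps the loop testable without network access, and the `max_polls` cap prevents an unbounded wait if the queue stalls.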

3. Result Download

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/ControlFoley_output/{taskId}/{filename}" --output ./output.flac

4. Status Inquiry & Result Download

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/status_download/{taskId}" --output-dir ./output --output audio.zip
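
The result download request above is built from two placeholders, `{taskId}` and `{filename}`. A small sketch of filling them in and choosing a local save path follows. Both helper names are hypothetical, and the URL layout is an assumption taken from the download example in this document.

```python
from pathlib import Path
from urllib.parse import quote

# Assumption: the endpoint host as given in this document.
BASE_URL = "https://controlfoley.ai.xiaomi.com"


def result_url(task_id: str, filename: str) -> str:
    """Fill the {taskId} and {filename} placeholders, percent-encoding both."""
    return (f"{BASE_URL}/api/v1/v2a/ControlFoley_output/"
            f"{quote(task_id, safe='')}/{quote(filename, safe='')}")


def local_target(outdir: str, filename: str) -> Path:
    """Choose where to save a result inside outdir.

    Only the final path component is kept, so a server-supplied name
    containing directory parts cannot escape the output directory.
    """
    return Path(outdir) / Path(filename).name
```

Stripping directory components from the server-supplied filename is a defensive choice in this sketch, not something the documentation requires.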


Parameters

T2A (Text-to-Audio)

Parameter	Description	Default	Example
prompt	Audio description text (required)	-	"dog barking in park"
--model	Model ID	ControlFoley	--model ControlFoley
--duration	Audio length in seconds (max 30)	8	--duration 15
--negative	Negative prompt to exclude unwanted sounds	-	--negative "noise, human voice"
--cfg	CFG strength; higher = stricter prompt adherence	4.5	--cfg 6.0
--count	Number of variants to generate (1-5)	1	--count 3
--seed	Fixed random seed for reproducibility	-	--seed 42
-o/--outdir	Output directory	./output	-o ./my_audio

V2A (Video-to-Audio)

Parameter	Description	Default	Example
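
The T2A limits listed above (duration at most 30 seconds, 1 to 5 variants, a required prompt) can be checked locally before a request is submitted. The following is a minimal sketch of such a check, not the validation performed by `scripts/foley.py`. The function name `validate_t2a` is hypothetical, and requiring a duration greater than zero is an assumption; the table states only the maximum.

```python
def validate_t2a(prompt: str, duration: float = 8, count: int = 1) -> None:
    """Raise ValueError if the arguments fall outside the documented T2A limits.

    Hypothetical pre-flight check; defaults mirror the parameter table.
    """
    if not prompt.strip():
        raise ValueError("prompt is required")
    # Assumption: duration must be positive; the table gives only max 30.
    if not 0 < duration <= 30:
        raise ValueError(f"duration must be in (0, 30], got {duration}")
    if not 1 <= count <= 5:
        raise ValueError(f"count must be between 1 and 5, got {count}")
```

Calling `validate_t2a("dog barking in park", duration=15, count=3)` returns without error, while `duration=31` or `count=6` raises `ValueError` before any upload happens.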

