📦 ControlFoley
v1.0.8
Audio generator
A multi-functional audio generation tool for SFX generation, video-to-audio, and text-to-audio.
Version
- Updated the privacy and security section to add clear guidelines on data handling, processing, and user recommendations.
- Removed duplicated and verbose API usage examples to streamline documentation.
- Kept the API and CLI usage, parameters, and error handling instructions unchanged.
- No functional or interface changes; documentation improvements only.
A multi-functional audio generation tool powered by the ControlFoley model. It combines video sound effect (SFX) generation, video background music composition, text-to-audio, and related functions for creative audio generation.
This tool supports four modes: Video-to-Audio (V2A), Text-Controlled Video-to-Audio (TC-V2A), Audio-Controlled Video-to-Audio (AC-V2A), and Text-to-Audio (T2A).
Basic Information
| Field | Value |
|---|---|
| Service Operator | Xiaomi LLM Plus Team |
| API Endpoint | https://controlfoley.ai.xiaomi.com |
| Open Source Repo | https://github.com/xiaomi-research/controlfoley |
| Project Page | https://yjx-research.github.io/ControlFoley_web_page/ |
| Online Demo | https://yjx-research.github.io/ControlFoley_web_page/#try-gen |
| Model Weights | https://huggingface.co/YJX-Xiaomi/ControlFoley/ |
| API Key | Not required |
| Script Path | scripts/foley.py |
Prerequisites
python3 --version # Python 3.x
curl --version # curl for API submission
ffmpeg -version # optional, for audio format conversion
Modes
| Mode | Command | Input | Output | Description |
|---|---|---|---|---|
| V2A | v2a video.mp4 | Video file | .mp4 + .flac | Generate audio matching the video content |
| TC-V2A | v2a video.mp4 --prompt "text" | Video + text | .mp4 + .flac | Generate audio aligned with text prompts while staying synchronized with the video |
| AC-V2A | v2a video.mp4 --ref-audio ref.wav | Video + reference audio | .mp4 + .flac | Generate audio whose timbre matches the reference audio while staying synchronized with the video |
| T2A | t2a "prompt" | Text description | .flac | Generate audio from text descriptions |
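The table implies that the mode is decided by the subcommand plus whichever optional inputs are present. A minimal sketch of that mapping follows; `select_mode` is a hypothetical helper, not part of `scripts/foley.py`. The table does not say what happens when both a prompt and a reference audio file are given, so the sketch refuses that combination rather than guess.

```python
from typing import Optional


def select_mode(command: str,
                prompt: Optional[str] = None,
                ref_audio: Optional[str] = None) -> str:
    """Return the mode label from the Modes table for one invocation.

    Hypothetical helper for illustration; the real script may decide differently.
    """
    if command == "t2a":
        return "T2A"
    if command != "v2a":
        raise ValueError(f"unknown command: {command!r}")
    if prompt and ref_audio:
        # Assumption: the table lists no combined mode, so reject it here.
        raise ValueError("prompt and ref_audio together are not covered by the table")
    if ref_audio:
        return "AC-V2A"
    if prompt:
        return "TC-V2A"
    return "V2A"
```

For example, `select_mode("v2a", prompt="footsteps on gravel")` yields `"TC-V2A"`, matching the third row of the table.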
Usage (CLI version)
1. Text-to-Audio (T2A, default 8s)
python3 scripts/foley.py t2a "dog barking loudly in a park"
2. Video-to-Audio (V2A)
python3 scripts/foley.py v2a input.mp4
3. Text-Controlled Video-to-Audio (TC-V2A)
python3 scripts/foley.py v2a input.mp4 --prompt "footsteps on gravel with birds chirping"
4. Audio-Controlled Video-to-Audio (AC-V2A)
python3 scripts/foley.py v2a input.mp4 --ref-audio reference.wav
5. Specify duration
python3 scripts/foley.py t2a "A mountain stream murmurs, its gentle current lapping against the pebbles." --duration 15
6. Generate multiple candidates
python3 scripts/foley.py t2a "cat purring softly" --count 3
7. Fixed seed (reproducible results)
python3 scripts/foley.py t2a "rain on a tin roof" --seed 42
8. List available models
python3 scripts/foley.py models
Usage (API version)
POST
curl -X POST "https://controlfoley.ai.xiaomi.com/api/v1/v2a/submit" -F "file=@video_path" -F "prompt=footsteps on gravel with birds chirping"
Response:
{"taskId": "xxx", "message": "Task submitted successfully"}
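The submit call only returns a task ID, so a client has to pull that ID out of the response and use it for the follow-up requests. The sketch below shows one way to do that with the standard library. `extract_task_id` and `status_url` are hypothetical helpers, the base URL is passed in rather than hard-coded, and the `status` path segment is an assumption taken from the status inquiry route documented in this section.

```python
import json
from urllib.parse import quote


def extract_task_id(response_text: str) -> str:
    """Parse the submit response and return its taskId field."""
    payload = json.loads(response_text)
    task_id = payload.get("taskId")
    if not isinstance(task_id, str) or not task_id:
        raise ValueError(f"no taskId in response: {payload!r}")
    return task_id


def status_url(base_url: str, task_id: str) -> str:
    """Build the status URL for a task (assumed route: <base>/status/<taskId>)."""
    return f"{base_url.rstrip('/')}/status/{quote(task_id, safe='')}"


# Example with the response body shown above.
task_id = extract_task_id('{"taskId": "xxx", "message": "Task submitted successfully"}')
```

Percent-encoding the ID is a defensive choice; the format of real task IDs is not documented here.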
GET
1. Available models
curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/models"
Response:
{"models":[{"name":"ControlFoley","enabled":true}]}
2. Status Inquiry
curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/status/{taskId}"
Response:
- success:
{"urls":["{Domain name}/ControlFoley_output/{taskId}/{filename}"],"status":"success","done":true}
- processing:
{"status":"processing","done":false}
- pending:
{"status":"pending","queue_pos":1,"queue_position":1,"total_queue":2,"done":false}
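Because a task moves through pending and processing before it finishes, a client typically polls the status route until `done` is true and then reads `urls`. The following is a minimal polling sketch under stated assumptions: `wait_for_result` is a hypothetical helper, the HTTP call is injected as `fetch_status` so the loop stays independent of any client library, and the interval and timeout values are arbitrary. It relies only on the `done` and `urls` fields shown in the responses above.

```python
import time
from typing import Callable, Dict, List


def wait_for_result(fetch_status: Callable[[], Dict],
                    interval: float = 2.0,
                    timeout: float = 300.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll until the status payload reports done, then return its urls."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status()
        if payload.get("done"):
            urls = payload.get("urls")
            if not urls:
                # Assumption: a finished task without urls is treated as a failure.
                raise RuntimeError(f"task finished without urls: {payload!r}")
            return list(urls)
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

In real use, `fetch_status` would issue the GET request for one task ID and return the decoded JSON body.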
3. Result Download
curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/ControlFoley_output/{taskId}/{filename}" --output ./output.flac
4. Status Inquiry & Result Download
curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/status_download/{taskId}" --output-dir ./output --output audio.zip
Parameters
T2A (Text-to-Audio)
| Parameter | Description | Default | Example |
|---|---|---|---|
| prompt | Audio description text (required) | — | "dog barking in park" |
| --model | Model ID | ControlFoley | --model ControlFoley |
| --duration | Audio length in seconds (max 30) | 8 | --duration 15 |
| --negative | Negative prompt to exclude unwanted sounds | — | --negative "noise, human voice" |
| --cfg | CFG strength; higher = stricter prompt adherence | 4.5 | --cfg 6.0 |
| --count | Number of variants to generate (1–5) | 1 | --count 3 |
| --seed | Fixed random seed for reproducibility | — | --seed 42 |
| -o/--outdir | Output directory | ./output | -o ./my_audio |
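The limits in the T2A table (duration up to 30 seconds, 1 to 5 variants) can be checked before the script is launched. The sketch below builds an argument list for the `t2a` subcommand; `build_t2a_command` is a hypothetical wrapper, not part of the tool. It only emits flags the caller sets, and it assumes that duration must be positive and that the seed flag is spelled `--seed`. Whether `scripts/foley.py` performs the same checks itself is not stated.

```python
from typing import List, Optional

MAX_DURATION_S = 30        # "max 30" from the parameter table
COUNT_RANGE = range(1, 6)  # "1-5" from the parameter table


def build_t2a_command(prompt: str,
                      duration: Optional[int] = None,
                      count: Optional[int] = None,
                      seed: Optional[int] = None,
                      outdir: Optional[str] = None) -> List[str]:
    """Return an argv list for the t2a subcommand, validating table limits."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    cmd = ["python3", "scripts/foley.py", "t2a", prompt]
    if duration is not None:
        # Assumption: zero or negative durations are invalid.
        if not 0 < duration <= MAX_DURATION_S:
            raise ValueError(f"duration must be in 1..{MAX_DURATION_S} seconds")
        cmd += ["--duration", str(duration)]
    if count is not None:
        if count not in COUNT_RANGE:
            raise ValueError("count must be between 1 and 5")
        cmd += ["--count", str(count)]
    if seed is not None:
        cmd += ["--seed", str(seed)]  # assumed flag spelling
    if outdir is not None:
        cmd += ["--outdir", outdir]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with prompts that contain spaces or punctuation.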
V2A (Video-to-Audio)
| Parameter | Description | Default | Example |
|-----------|