📦 Remotion Word Highlight Subtitles
v0.1.0
Add word-level highlighted subtitles to local short videos using Whisper word timestamps and Remotion rendering.
Overview
This skill turns a local video into a subtitled video using the reusable "fine version": Whisper word timestamps plus Remotion-rendered current-word highlighting. Use this instead of plain SRT burn-in unless the user explicitly asks for simple static subtitles.
Trigger Phrases
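Use this skill for requests like:
- "给这个视频添加字幕:/path/to/video.mp4" (Add subtitles to this video: /path/to/video.mp4)
- "给这个视频加逐词高亮字幕" (Add word-by-word highlighted subtitles to this video)
- "按之前 Remotion 那个字幕方案处理这个视频" (Process this video with the earlier Remotion subtitle approach)
- "重新转写 word timestamps,然后加当前词高亮字幕" (Re-transcribe the word timestamps, then add current-word highlighted subtitles)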
If the user only gives one video path, write the output next to the source video.
Defaults
- Source: local .mp4, .mov, .m4v, or other audio/video file with usable audio.
- Output: _remotion逐词高亮字幕.mp4 in the same directory unless the user names another target.
- Transcription: Whisper with word timestamps, Chinese by default: --language zh --word_timestamps True --output_format json.
- Caption data: one Remotion caption per Whisper segment, with token-level startMs and endMs.
- Caption position: keep all captions in a screenshot-friendly lower band, slightly above the bottom UI area. Start around height * 0.28 bottom padding, then adjust only if the source framing clearly needs it.
- Visual style: bold Chinese UI font, white base text, current token yellow, optional sparse keyword accent, and black shadow/outline for readability.
- Verification: check that the rendered file exists, keeps audio, and matches the source duration closely, and visually inspect stills before accepting the style.
Workflow
1. Inspect the source with ffprobe for width, height, fps, duration, and audio presence.
2. Run Whisper word timestamp transcription. Prefer a cached/local model that works on the machine; turbo is a good default when available.
3. Convert the Whisper JSON to public/captions.json using scripts/whisper_json_to_captions.py.
4. Build or reuse a small Remotion project in the video's folder.
5. Copy the video to public/input.mp4 or encode an H.264 compatibility copy as public/input-h264.mp4.
6. Set the Remotion composition to the exact source width, height, fps, and duration in frames.
7. Before the full render, render and inspect at least two stills from the Remotion composition: one long/two-line caption and one dense keyword/current-token caption. Fix muddy text, excessive outline, large dark backing, bad line breaks, or poor placement before rendering the final video.
8. Render with Remotion to the output path next to the source.
9. Verify the final output with ffprobe; extract and inspect a still from the rendered video to confirm the final encoded file kept the approved subtitle look.
Whisper Command
Use this shape, adjusting model and paths as needed:
whisper "/absolute/path/video.mp4" --model turbo --language zh --word_timestamps True --output_format json --output_dir "/absolute/path"
If Whisper fails on the video container, extract audio first:
ffmpeg -y -i "/absolute/path/video.mp4" -vn -ac 1 -ar 16000 "/absolute/path/video_audio.wav"
Then run Whisper on the WAV and keep the output naming clear.
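The two commands above can be chained so the WAV fallback only runs when needed. The following is a minimal sketch, not part of the skill itself: the function names are hypothetical, and it assumes Whisper names its JSON after the input file's stem and may report a failure without a non-zero exit code, so it checks for the output file rather than trusting the return code alone.

```python
import subprocess
from pathlib import Path


def whisper_cmd(media: Path, out_dir: Path, model: str = "turbo", language: str = "zh") -> list[str]:
    """Build the Whisper CLI call shown above for one media file."""
    return [
        "whisper", str(media),
        "--model", model,
        "--language", language,
        "--word_timestamps", "True",
        "--output_format", "json",
        "--output_dir", str(out_dir),
    ]


def extract_audio_cmd(video: Path, wav: Path) -> list[str]:
    """Build the ffmpeg call that extracts mono 16 kHz audio."""
    return ["ffmpeg", "-y", "-i", str(video), "-vn", "-ac", "1", "-ar", "16000", str(wav)]


def transcribe(video: Path, run=subprocess.run) -> Path:
    """Run Whisper on the video; if no JSON appears, retry on an extracted WAV.

    Returns the path of the transcript JSON. `run` is injectable so the
    flow can be exercised without Whisper or ffmpeg installed.
    """
    out_dir = video.parent
    expected = out_dir / f"{video.stem}.json"
    expected.unlink(missing_ok=True)  # avoid mistaking a stale transcript for success

    result = run(whisper_cmd(video, out_dir))
    if result.returncode == 0 and expected.exists():
        return expected

    # Fallback: extract audio next to the source, then transcribe the WAV.
    wav = video.with_name(f"{video.stem}_audio.wav")
    run(extract_audio_cmd(video, wav), check=True)
    run(whisper_cmd(wav, out_dir), check=True)
    return out_dir / f"{wav.stem}.json"
```

Returning the transcript path keeps the output naming explicit: the fallback produces video_audio.json rather than video.json, and the conversion step should be pointed at whichever file was actually written.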
Caption JSON
Remotion should load public/captions.json with this shape:
[
  {
    "text": "我们用手机随便拍张照片",
    "startMs": 0,
    "endMs": 1600,
    "tokens": [
      { "text": "我们", "startMs": 0, "endMs": 300, "keyword": false },
      { "text": "手机", "startMs": 440, "endMs": 700, "keyword": true }
    ]
  }
]
Use the helper script:
python3 scripts/whisper_json_to_captions.py \
  "/absolute/path/transcript.json" \
  "/absolute/path/remotion-project/public/captions.json" \
  --keyword "提示词" --keyword "Codex"
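The helper script's source is not shown here. As an illustration only, a conversion of this kind might look like the sketch below. It assumes the usual Whisper JSON layout (a segments list whose entries carry start, end, text, and a words list with word, start, and end in seconds); the function names and the substring-based keyword rule are hypothetical, not the script's actual behavior.

```python
import json
from pathlib import Path


def to_ms(seconds: float) -> int:
    """Convert Whisper's float seconds to integer milliseconds."""
    return round(seconds * 1000)


def segments_to_captions(whisper: dict, keywords: list[str]) -> list[dict]:
    """Map each Whisper segment to one caption with token-level timing."""
    captions = []
    for seg in whisper.get("segments", []):
        tokens = []
        for word in seg.get("words", []):
            text = word["word"].strip()
            if not text:
                continue
            tokens.append({
                "text": text,
                "startMs": to_ms(word["start"]),
                "endMs": to_ms(word["end"]),
                # Assumed rule: a token is a keyword if it contains one verbatim.
                "keyword": any(k in text for k in keywords),
            })
        if not tokens:
            continue  # skip segments without word timestamps
        captions.append({
            "text": seg["text"].strip(),
            "startMs": to_ms(seg["start"]),
            "endMs": to_ms(seg["end"]),
            "tokens": tokens,
        })
    return captions


def convert(src: Path, dst: Path, keywords: list[str]) -> None:
    """Read a Whisper transcript and write captions.json as UTF-8."""
    data = json.loads(src.read_text(encoding="utf-8"))
    captions = segments_to_captions(data, keywords)
    dst.write_text(json.dumps(captions, ensure_ascii=False, indent=2), encoding="utf-8")
```

One limitation of this simple rule: a keyword that Whisper splits across several tokens will not be marked, so multi-token keywords would need matching against the joined segment text instead.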
After conversion, quickly scan the transcript for obvious recognition mistakes and fix captions.json before rendering.
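Hand edits to captions.json can break the timing by accident. A small structural check before rendering may help catch that. This sketch is an assumption about what is worth checking, not a rule from the skill, and the function name is hypothetical. It covers timing only and does not judge recognition quality.

```python
def caption_problems(captions: list[dict]) -> list[str]:
    """Return human-readable timing problems found in a captions list."""
    problems = []
    previous_end = 0
    for i, cap in enumerate(captions):
        if cap["startMs"] >= cap["endMs"]:
            problems.append(f"caption {i}: startMs is not before endMs")
        if cap["startMs"] < previous_end:
            problems.append(f"caption {i}: starts before the previous caption ends")
        previous_end = max(previous_end, cap["endMs"])
        if not cap["text"].strip():
            problems.append(f"caption {i}: empty text")
        for j, tok in enumerate(cap["tokens"]):
            if tok["startMs"] > tok["endMs"]:
                problems.append(f"caption {i} token {j}: startMs is after endMs")
            if tok["startMs"] < cap["startMs"] or tok["endMs"] > cap["endMs"]:
                problems.append(f"caption {i} token {j}: timing falls outside the caption")
    return problems
```

An empty list means the timing structure is consistent; any returned message points at the caption and token index to fix.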
Remotion Caption Layer Requirements
The caption layer should:
- Use OffthreadVideo for the source video.
- Load captions.json with delayRender, continueRender, and staticFile.
- Find the active caption by currentMs >= startMs && currentMs < endMs.
- Highlight the active token when currentMs is within the token's timing.
- Use keyword coloring only as a secondary accent; the current spoken token is the main effect.
- Keep letterSpacing: 0.
- Keep all captions at the normal position; do not include special screenshot-sentence placement unless the user explicitly asks.
- Do not use WebkitTextStroke as a thick Chinese subtitle outline. It easily eats the white fill at small resolutions. Prefer multi-direction textShadow; if WebkitTextStroke is used at all, keep it at or below 1.5px and verify a still.
- Do not use a large semi-transparent rounded black rectangle behind the whole caption by default. If the footage truly needs backing, use a very subtle per-line backing, opacity <= 0.16, tight padding, and verify it does not look like a dark banner.
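The lookup rules in the list above can be mirrored outside Remotion to spot-check captions.json at a given frame. This is a sketch of the selection logic only, not the Remotion component. The function names are hypothetical, and treating token timing as the same half-open interval used for captions is an assumption.

```python
from typing import Optional


def frame_to_ms(frame: int, fps: float) -> float:
    """Convert a frame index to milliseconds from the start of the video."""
    return frame * 1000 / fps


def active_caption(captions: list[dict], current_ms: float) -> Optional[dict]:
    """Return the caption with startMs <= current_ms < endMs, if any."""
    for cap in captions:
        if cap["startMs"] <= current_ms < cap["endMs"]:
            return cap
    return None


def active_token_index(caption: dict, current_ms: float) -> Optional[int]:
    """Return the index of the token being spoken, or None in a gap."""
    for i, tok in enumerate(caption["tokens"]):
        if tok["startMs"] <= current_ms < tok["endMs"]:
            return i
    return None
```

With the sample caption from the Caption JSON section, 500 ms falls inside the second token, while 350 ms lies in the gap between the two tokens, so the caption would show with no token highlighted.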
Reject and revise any still where the caption has muddy/gray text, a thick black halo, a large black box, clipped words, awkward wrapping, or placement over the mouth/chin in a talking-head video.
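To choose which frames to render as stills for this review, the captions themselves can suggest candidates. The heuristic below is a hypothetical sketch: it picks the caption with the longest text as the "long" still and the caption with the most keyword tokens as the "dense" still, and it aims each frame at the middle of a token so the current-token highlight is visible. It expects a non-empty list in which every caption has at least one token.

```python
def token_mid_frame(token: dict, fps: float) -> int:
    """Frame index at the midpoint of a token's timing (rounded down)."""
    mid_ms = (token["startMs"] + token["endMs"]) / 2
    return int(mid_ms * fps / 1000)


def pick_still_frames(captions: list[dict], fps: float) -> dict[str, int]:
    """Suggest one frame for a long caption and one for a keyword-dense caption."""
    longest = max(captions, key=lambda c: len(c["text"]))
    densest = max(captions, key=lambda c: sum(1 for t in c["tokens"] if t["keyword"]))

    # Long caption: aim at its middle token.
    long_token = longest["tokens"][len(longest["tokens"]) // 2]

    # Dense caption: aim at its first keyword token, else its first token.
    keyword_tokens = [t for t in densest["tokens"] if t["keyword"]]
    dense_token = keyword_tokens[0] if keyword_tokens else densest["tokens"][0]

    return {
        "long": token_mid_frame(long_token, fps),
        "dense": token_mid_frame(dense_token, fps),
    }
```

The returned frame numbers are only starting points; the still that matters is the one that shows the worst-case wrapping and contrast for the footage.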
Use these style constants as the baseline:
const captionBottom = Math.round(height * 0.28);
const captionFontSize = Math.round(height * 0.032);
c