Website Content Scraped into Obsidian
v0.1.2

Fetch social media content and save it to Obsidian. Supports Twitter/X, Reddit, GitHub, HackerNews, Bilibili, Weibo, Xiaohongshu, and 30+ platforms via bb-browser. Use when: (1) the user asks to fetch/sync social media content into Obsidian; (2) the user asks to set up a scheduled sync; (3) the user provides a list of accounts to track.
claw-social-feed
Fetch social media timelines into Obsidian vaults. Multi-platform, incremental sync, smart filtering, auto-tagging.
Core dependency: bb-browser (via the --OpenClaw flag to reuse the OpenClaw browser session). Supports 36 platforms via bb-browser adapters — see references/platforms.md.
Workflow

```
User config (config.yaml)
        │
        ▼
   fetch_save.py
        │
        ├── Dedup accounts
        ├── Read state.json (last fetch cursor)
        │
        ▼
bb-browser site <platform>/ --OpenClaw --json
        │
        ▼
Filter → Tag → Write to Obsidian
        │
        ▼
  Update state.json
```
Quick Start
- Install bb-browser

```shell
npm install -g bb-browser

# Verify
bb-browser --version
```
- Configure accounts

Edit config.yaml:

```yaml
accounts:
  - platform: twitter
    username: your_target_handle
  - platform: hackernews
    username: your_username

vault_base: ~/Documents/Obsidian Vault/SocialFeed

fetch:
  count: 20

filters:
  min_text_length: 30
  skip_retweet_no_comment: true
  skip_link_only: true
  blocked_keywords: []

tagging:
  enabled: true
  keywords:
    AI / LLM / GPT / Claude: AI
    Python / JavaScript / Rust: coding
```
- Run
- Check output

Content lands in vault_base/@username/ — one .md file per post, with Obsidian YAML frontmatter (platform, author, date, URL, likes, tags).
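For illustration only, a saved note might look like the sketch below — the field values here are invented, and the exact layout fetch_save.py emits may differ:

```markdown
---
platform: twitter
author: dotey
date: 2024-05-01
url: https://x.com/dotey/status/1234567890
likes: 42
tags:
  - AI
---

Post text goes here.
```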
Config Reference

accounts

```yaml
accounts:
  - platform: twitter
    username: dotey
```

- platform: must match a bb-browser supported platform (see references/platforms.md)
- username: the platform-native user identifier
- Deduplication: platform + username must be unique within the list

filters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| min_text_length | int | 30 | Skip posts below this character count |
| skip_retweet_no_comment | bool | true | Skip retweets with no original comment |
| skip_link_only | bool | true | Skip posts that are links/images with little text |
| blocked_keywords | list | [] | Skip posts containing any of these keywords |

tagging
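Taken together, the filter fields act as a single skip-predicate on each post. A minimal Python sketch — the function name and post field names here are hypothetical, not fetch_save.py's actual code:

```python
def should_skip(post: dict, cfg: dict) -> bool:
    """Return True if a post fails any configured filter (illustrative sketch)."""
    text = post.get("text", "")
    # min_text_length: drop posts below the character threshold
    if len(text) < cfg.get("min_text_length", 30):
        return True
    # skip_retweet_no_comment: drop bare retweets with no original commentary
    if cfg.get("skip_retweet_no_comment", True) and post.get("is_retweet") and not post.get("comment"):
        return True
    # skip_link_only: drop link/image posts with little text
    if cfg.get("skip_link_only", True) and post.get("link_only"):
        return True
    # blocked_keywords: drop posts containing any blocked term
    lowered = text.lower()
    if any(k.lower() in lowered for k in cfg.get("blocked_keywords", [])):
        return True
    return False
```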
Auto-tag based on keyword matching (case-insensitive, / separated synonyms = OR):
```yaml
tagging:
  enabled: true
  keywords:
    AI / LLM / 大模型: AI
    skill / skills: skill
    Python / JavaScript: coding
```
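The synonym-group matching can be sketched in a few lines of Python (illustrative only — the real implementation in fetch_save.py may differ):

```python
def auto_tags(text: str, keywords: dict) -> list:
    """Apply a tag when any of its '/'-separated synonyms appears in the text (case-insensitive OR)."""
    lowered = text.lower()
    tags = []
    for synonyms, tag in keywords.items():
        # A hit on any one synonym in the group is enough to apply the tag.
        if any(s.strip().lower() in lowered for s in synonyms.split("/")):
            tags.append(tag)
    return tags
```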
fetch.count

```yaml
fetch:
  count: 20  # default 20, max 100
```

twitter/tweets returns ~20 tweets newest-first by default. For scheduled syncs, set count to 50–100 to avoid missing posts from high-frequency accounts between sync intervals.
Incremental Sync

state.json tracks the last-fetched timestamp per account. On re-run, the script:

- Skips posts with created_at ≤ last_fetch
- Saves only new content
- Updates the last_fetch timestamp

Missed-run compensation: if a cron job missed a run (e.g., the machine was off), the next run will backfill content within catchup_window_days (default 3 days).

To force a re-fetch for an account: delete its entry in state.json, or delete the corresponding .md files.
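The cursor logic above can be sketched as follows — a minimal illustration in which the state.json schema and helper names are assumptions, not the script's actual code:

```python
import json
from pathlib import Path

def load_last_fetch(state_path: Path, account_key: str) -> str:
    """Return the stored last-fetch ISO timestamp for an account, or '' if never fetched."""
    if not state_path.exists():
        return ""
    state = json.loads(state_path.read_text())
    return state.get(account_key, {}).get("last_fetch", "")

def new_posts(posts: list, last_fetch: str) -> list:
    """Keep only posts strictly newer than the cursor (created_at > last_fetch drops ties).

    ISO-8601 timestamps in a uniform format compare correctly as strings.
    """
    return [p for p in posts if p.get("created_at", "") > last_fetch]
```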
Scheduled Sync

To enable automatic sync, ask the agent:

"Sync every morning at 9am" or "Sync every Monday at 8am"

The agent will create a cron job that runs in isolated mode with incremental sync — no duplicates.
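For reference, the crontab entry behind "sync every morning at 9am" would look something like this — the working directory and the exact fetch_save.py invocation are assumptions:

```
# min hour day month weekday  command
0 9 * * * cd /path/to/claw-social-feed && python fetch_save.py >> sync.log 2>&1
```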
Troubleshooting
bb-browser: command not found
The script auto-detects the bb-browser PATH. If it still fails, confirm that the npm global bin directory is on your PATH, or install via npm install -g bb-browser.

twitter/search returns a webpack module error
Use twitter/tweets instead of twitter/search. This is a known bb-browser adapter compatibility issue.

Platform returns 401 Unauthorized
The OpenClaw browser needs to be logged into that platform. Open the site manually in the browser, log in once, then retry.

File already exists but you want to re-fetch
Delete the corresponding entry in state.json, or delete the .md files for that account.