Website Content Scraped into Obsidian
v0.1.2

Fetch social media content and save it to Obsidian. Supports Twitter/X, Reddit, GitHub, HackerNews, Bilibili, Weibo, Xiaohongshu, and 30+ platforms via bb-browser. Use when: (1) the user asks to fetch/sync social media content into Obsidian; (2) the user asks to set up a scheduled sync; (3) the user provides a list of accounts to track.
claw-social-feed
Fetch social media timelines into Obsidian vaults. Multi-platform, incremental sync, smart filtering, auto-tagging.
Core dependency: bb-browser (via the --OpenClaw flag to reuse the OpenClaw browser session). Supports 36 platforms via bb-browser adapters — see references/platforms.md.
Workflow

```
User config (config.yaml)
        │
        ▼
   fetch_save.py
        │
        ├── Dedup accounts
        ├── Read state.json (last fetch cursor)
        │
        ▼
bb-browser site <platform>/ --OpenClaw --json
        │
        ▼
Filter → Tag → Write to Obsidian
        │
        ▼
  Update state.json
```
Quick Start
- Install bb-browser

```shell
npm install -g bb-browser

# Verify
bb-browser --version
```
- Configure accounts

Edit config.yaml:

```yaml
accounts:
  - platform: twitter
    username: your_target_handle
  - platform: hackernews
    username: your_username

vault_base: ~/Documents/Obsidian Vault/SocialFeed

fetch:
  count: 20

filters:
  min_text_length: 30
  skip_retweet_no_comment: true
  skip_link_only: true
  blocked_keywords: []

tagging:
  enabled: true
  keywords:
    AI / LLM / GPT / Claude: AI
    Python / JavaScript / Rust: coding
```
- Run
- Check output

Content lands in vault_base/@username/ — one .md file per post, with Obsidian YAML frontmatter (platform, author, date, URL, likes, tags).
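For illustration only, a saved note might look like the sketch below — the field values here are invented, and the exact layout fetch_save.py emits may differ:

```markdown
---
platform: twitter
author: dotey
date: 2024-05-01
url: https://x.com/dotey/status/1234567890
likes: 42
tags:
  - AI
---

Post text goes here.
```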
Config Reference

accounts

```yaml
accounts:
  - platform: twitter
    username: dotey
```

- platform: must match a bb-browser supported platform (see references/platforms.md)
- username: the platform-native user identifier
- Deduplication: platform + username must be unique within the list

filters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| min_text_length | int | 30 | Skip posts below this character count |
| skip_retweet_no_comment | bool | true | Skip retweets with no original comment |
| skip_link_only | bool | true | Skip posts that are links/images with little text |
| blocked_keywords | list | [] | Skip posts containing any of these keywords |

tagging
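Taken together, the filter fields act as a single skip-predicate on each post. A minimal Python sketch — the function name and post field names here are hypothetical, not fetch_save.py's actual code:

```python
def should_skip(post: dict, cfg: dict) -> bool:
    """Return True if a post fails any configured filter (illustrative sketch)."""
    text = post.get("text", "")
    # min_text_length: drop posts below the character threshold
    if len(text) < cfg.get("min_text_length", 30):
        return True
    # skip_retweet_no_comment: drop bare retweets with no original commentary
    if cfg.get("skip_retweet_no_comment", True) and post.get("is_retweet") and not post.get("comment"):
        return True
    # skip_link_only: drop link/image posts with little text
    if cfg.get("skip_link_only", True) and post.get("link_only"):
        return True
    # blocked_keywords: drop posts containing any blocked term
    lowered = text.lower()
    if any(k.lower() in lowered for k in cfg.get("blocked_keywords", [])):
        return True
    return False
```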
Auto-tag based on keyword matching (case-insensitive, / separated synonyms = OR):
```yaml
tagging:
  enabled: true
  keywords:
    AI / LLM / 大模型: AI
    skill / skills: skill
    Python / JavaScript: coding
```
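The synonym-group matching can be sketched in a few lines of Python (illustrative only — the real implementation in fetch_save.py may differ):

```python
def auto_tags(text: str, keywords: dict) -> list:
    """Apply a tag when any of its '/'-separated synonyms appears in the text (case-insensitive OR)."""
    lowered = text.lower()
    tags = []
    for synonyms, tag in keywords.items():
        # A hit on any one synonym in the group is enough to apply the tag.
        if any(s.strip().lower() in lowered for s in synonyms.split("/")):
            tags.append(tag)
    return tags
```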
fetch.count

```yaml
fetch:
  count: 20  # default 20, max 100
```

twitter/tweets returns ~20 tweets newest-first by default. For scheduled syncs, set count to 50–100 to avoid missing posts from high-frequency accounts between sync intervals.
Incremental Sync

state.json tracks the last-fetched timestamp per account. On re-run, the script:

- Skips posts with created_at ≤ last_fetch
- Saves only new content
- Updates the last_fetch timestamp

Missed-run compensation: if a cron job missed a run (e.g., the machine was off), the next run will backfill content within catchup_window_days (default 3 days).

To force a re-fetch for an account: delete its entry in state.json, or delete the corresponding .md files.
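The cursor logic above can be sketched as follows — a minimal illustration in which the state.json schema and helper names are assumptions, not the script's actual code:

```python
import json
from pathlib import Path

def load_last_fetch(state_path: Path, account_key: str) -> str:
    """Return the stored last-fetch ISO timestamp for an account, or '' if never fetched."""
    if not state_path.exists():
        return ""
    state = json.loads(state_path.read_text())
    return state.get(account_key, {}).get("last_fetch", "")

def new_posts(posts: list, last_fetch: str) -> list:
    """Keep only posts strictly newer than the cursor (created_at > last_fetch drops ties).

    ISO-8601 timestamps in a uniform format compare correctly as strings.
    """
    return [p for p in posts if p.get("created_at", "") > last_fetch]
```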
Scheduled Sync

To enable automatic sync, ask the agent:

"Sync every morning at 9am" or "Sync every Monday at 8am"

The agent will create a cron job that runs in isolated mode with incremental sync — no duplicates.
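For reference, the crontab entry behind "sync every morning at 9am" would look something like this — the working directory and the exact fetch_save.py invocation are assumptions:

```
# min hour day month weekday  command
0 9 * * * cd /path/to/claw-social-feed && python fetch_save.py >> sync.log 2>&1
```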
Troubleshooting
bb-browser: command not found
The script auto-detects the bb-browser PATH. If it still fails, confirm that the npm global bin directory is on your PATH, or install via npm install -g bb-browser.

twitter/search returns a webpack module error
Use twitter/tweets instead of twitter/search. This is a known bb-browser adapter compatibility issue.

Platform returns 401 Unauthorized
The OpenClaw browser needs to be logged into that platform. Open the site manually in the browser, log in once, then retry.

File already exists but you want to re-fetch
Delete the corresponding entry in state.json, or delete the .md files for that account.