Search Engine

Name: Search Engine
Rating: 1

v1.0.0

Design and build any search engine with robust indexing, retrieval logic, relevance controls, and evaluation workflows for production systems.


by @ivangdavila (Iván)·MIT-0

Network tools · Browser automation · System tools · Design tools


License

MIT-0


Free to use, modify, and redistribute, with no attribution required.


Runtime dependencies

No special dependencies

Install command


Official: npx clawhub@latest install search-engine

Mirror: npx clawhub@latest install search-engine --registry https://cn.longxiaskill.com


Skill documentation

Setup

On first use, read setup.md and establish activation behavior, system scope, and data constraints before proposing implementation steps.

When to Use

User needs to create, redesign, or scale a search engine for apps, documentation, products, or internal knowledge bases. Agent handles architecture planning, indexing strategy, retrieval design, relevance controls, evaluation loops, and rollout safety.

Architecture

Memory lives in ~/search-engine/. See memory-template.md for baseline structure and status values.

~/search-engine/
|-- memory.md        # Persistent context, constraints, and active priorities
|-- requirements.md  # Retrieval goals, latency targets, and relevance expectations
|-- experiments.md   # Offline experiments and tuning decisions
`-- incidents.md     # Production issues, root cause, and remediation notes

Quick Reference

Use the smallest relevant file for the task.

Topic | File
Setup and activation behavior | setup.md
Memory template and status model | memory-template.md
Architecture options and component choices | architecture-blueprint.md
Retrieval and ranking strategy patterns | retrieval-patterns.md
Quality measurement and evaluation loops | evaluation-metrics.md
Delivery and rollout gates | implementation-checklist.md

Data Storage

Local notes stay in ~/search-engine/:

- requirements and relevance objectives
- data source assumptions and indexing decisions
- experiment outcomes and deployment safeguards

Core Rules

Start with a Retrieval Contract, Not with Tools

Before selecting engines, define the contract:

- query types to support (keyword, phrase, semantic, hybrid)
- response format, latency budget, and freshness target
- error tolerance and fallback behavior

A search engine without a contract becomes an untestable collection of features.
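
One way to make such a contract testable is to write it down as data. The sketch below is only an illustration: the class name `RetrievalContract`, its fields, and the sample values are assumptions, not part of this skill.

```python
from dataclasses import dataclass
from enum import Enum


class QueryType(Enum):
    KEYWORD = "keyword"
    PHRASE = "phrase"
    SEMANTIC = "semantic"
    HYBRID = "hybrid"


@dataclass(frozen=True)
class RetrievalContract:
    """Hypothetical contract record; field names are illustrative."""
    query_types: frozenset      # QueryType values that must be served
    response_fields: tuple      # fields every hit must carry
    latency_budget_ms: int      # per-query latency budget
    freshness_target_s: int     # maximum age of an indexed change
    fallback: str               # behavior when the primary path fails

    def violations(self, query_type, latency_ms, hit):
        """Return human-readable contract violations for one response."""
        problems = []
        if query_type not in self.query_types:
            problems.append(f"unsupported query type: {query_type.value}")
        if latency_ms > self.latency_budget_ms:
            problems.append(
                f"latency {latency_ms}ms exceeds budget {self.latency_budget_ms}ms"
            )
        missing = [name for name in self.response_fields if name not in hit]
        if missing:
            problems.append(f"missing response fields: {missing}")
        return problems


# Sample values are assumptions chosen for the example.
contract = RetrievalContract(
    query_types=frozenset({QueryType.KEYWORD, QueryType.HYBRID}),
    response_fields=("id", "title", "score"),
    latency_budget_ms=200,
    freshness_target_s=300,
    fallback="keyword-only",
)
```

Calling `contract.violations(...)` inside tests or monitoring turns each clause of the contract into a check that can fail visibly.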

Design Ingestion and Indexing as a Deterministic Pipeline

Every document should pass explicit stages:

- ingestion source validation and deduplication
- normalization and field extraction
- chunking policy with stable identifiers
- indexing with repeatable transforms

Deterministic pipelines reduce drift between environments and simplify debugging.
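
The stages above can be sketched with the standard library alone. The function names, the fixed-size word-window chunking, and the 16-character id length are assumptions made for illustration; the point is that the same input always yields the same records and ids.

```python
import hashlib
import re
import unicodedata


def normalize(text):
    """NFKC-normalize, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def chunk(text, max_words=50):
    """Split into fixed-size word windows (an assumed, simple policy)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def chunk_id(source_id, position, body):
    """Stable id derived only from source id, position, and content."""
    digest = hashlib.sha256(f"{source_id}:{position}:{body}".encode("utf-8"))
    return digest.hexdigest()[:16]


def run_pipeline(docs):
    """docs: iterable of (source_id, raw_text). Returns index-ready records."""
    seen = set()
    records = []
    for source_id, raw in sorted(docs):         # sorted input gives stable output order
        if not source_id or not raw.strip():     # source validation
            continue
        body = normalize(raw)
        fingerprint = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if fingerprint in seen:                  # deduplicate on normalized content
            continue
        seen.add(fingerprint)
        for position, piece in enumerate(chunk(body)):
            records.append({
                "id": chunk_id(source_id, position, piece),
                "source": source_id,
                "text": piece,
            })
    return records
```

Because ids depend only on content and position, re-running the pipeline in another environment produces the same records, which makes diffs between runs meaningful.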

Separate Recall Layers from Precision Layers

Treat retrieval as a staged system:

- broad candidate retrieval first (lexical, vector, or hybrid)
- reranking and business rules second
- formatting and explanation last

Mixing all concerns in one step hides failures and makes tuning unpredictable.
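
A minimal sketch of the three stages as separate functions follows. The scoring is deliberately naive (term overlap), and the names `retrieve_candidates`, `rerank`, and `present` are hypothetical; a real system would substitute its own lexical or vector retrieval and reranker.

```python
def retrieve_candidates(query, docs, limit=50):
    """Recall stage: keep any document sharing at least one query term."""
    terms = set(query.lower().split())
    scored = []
    for doc in docs:
        overlap = len(terms & set(doc["text"].lower().split()))
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: (-pair[0], pair[1]["id"]))
    return [doc for _, doc in scored[:limit]]


def rerank(query, candidates, boosts=None):
    """Precision stage: term-overlap ratio plus optional business boosts."""
    boosts = boosts or {}
    terms = set(query.lower().split())
    results = []
    for doc in candidates:
        words = doc["text"].lower().split()
        base = len(terms & set(words)) / len(words)
        results.append({"id": doc["id"], "score": base + boosts.get(doc["id"], 0.0)})
    results.sort(key=lambda r: (-r["score"], r["id"]))
    return results


def present(results, top_k=3):
    """Presentation stage: attach rank and a short explanation."""
    return [
        {"rank": rank, "id": r["id"], "why": f"score={r['score']:.3f}"}
        for rank, r in enumerate(results[:top_k], start=1)
    ]


# Sample corpus, invented for the example.
docs = [
    {"id": "a", "text": "fast search engine design"},
    {"id": "b", "text": "search"},
    {"id": "c", "text": "cooking recipes"},
]
candidates = retrieve_candidates("search engine", docs)
ranked = rerank("search engine", candidates)
page = present(ranked, top_k=1)
```

Keeping the stages apart means each can be inspected alone: a missing document is a recall problem if it is absent from `candidates`, and a ranking problem if it is present there but low in `ranked`.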

Define Relevance Features as Versioned Policy

Relevance changes must be tracked as policy versions:

- feature weights and boosts
- typo tolerance and synonym policy
- filtering, faceting, and tie-break rules

Never ship silent relevance changes without versioned notes and measured deltas.
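
One possible way to catch silent changes is to fingerprint the tunable settings and compare fingerprints against the version label. The `RelevancePolicy` class and its fields below are assumptions for illustration, not a format this skill prescribes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class RelevancePolicy:
    """Hypothetical policy record; field names are illustrative."""
    version: str
    field_weights: dict = field(default_factory=dict)
    max_typo_edits: int = 1
    synonyms: dict = field(default_factory=dict)
    tie_break: tuple = ("score", "id")
    notes: str = ""

    def fingerprint(self):
        """Hash of the tunable settings, excluding version and notes."""
        payload = asdict(self)
        payload.pop("version")
        payload.pop("notes")
        blob = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:12]


def is_silent_change(old, new):
    """True when settings differ but the version label was not bumped."""
    return old.fingerprint() != new.fingerprint() and old.version == new.version


v1 = RelevancePolicy(version="1", field_weights={"title": 2.0, "body": 1.0})
v1_edited = RelevancePolicy(version="1", field_weights={"title": 3.0, "body": 1.0})
v2 = RelevancePolicy(
    version="2",
    field_weights={"title": 3.0, "body": 1.0},
    notes="raise title weight",
)
```

A check such as `is_silent_change(v1, v1_edited)` could run in review or CI so that any weight, typo, or synonym change forces a version bump and a note.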

Evaluate Offline Before Production Writes

For each relevance or indexing change:

- run benchmark queries with labeled expectations
- measure hit quality, ordering quality, and coverage
- compare against current baseline and note regressions

If evaluation evidence is weak, keep the current configuration and iterate.
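
The loop above can be sketched as follows. Mapping "hit quality" to precision at k, "ordering quality" to mean reciprocal rank, and "coverage" to the share of queries returning any result is an assumption made here; other metrics may fit a given workload better.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are labeled relevant."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant result, or 0.0 if none appears."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def evaluate(run, labels, k=3):
    """run: {query: ranked ids}; labels: {query: set of relevant ids}."""
    queries = sorted(labels)
    count = len(queries)
    return {
        "precision_at_k": sum(precision_at_k(run.get(q, []), labels[q], k) for q in queries) / count,
        "mrr": sum(reciprocal_rank(run.get(q, []), labels[q]) for q in queries) / count,
        "coverage": sum(1 for q in queries if run.get(q)) / count,
    }


def regressions(baseline, candidate, tolerance=0.0):
    """Metrics where the candidate falls below baseline by more than tolerance."""
    return {
        name: candidate[name] - baseline[name]
        for name in baseline
        if candidate[name] < baseline[name] - tolerance
    }


# Labeled queries and runs are invented for the example.
labels = {"q1": {"a"}, "q2": {"b", "c"}}
baseline_run = {"q1": ["a", "x", "y"], "q2": ["b", "c", "z"]}
candidate_run = {"q1": ["x", "a", "y"], "q2": ["b", "c", "z"]}

baseline_metrics = evaluate(baseline_run, labels)
candidate_metrics = evaluate(candidate_run, labels)
report = regressions(baseline_metrics, candidate_metrics)
```

In this sample the candidate returns the same hits but pushes the relevant result for `q1` down one position, so only the ordering metric appears in `report`.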

Build Idempotent Index Operations and Safe Rollback

Index updates must be replay-safe:

- stable document ids and version checks
- resumable batch jobs with checkpoints
- alias-based or dual-index rollback plan

Without idempotency and rollback, incident recovery becomes guesswork.
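
The three properties above can be illustrated with an in-memory stand-in. `AliasedIndex` and `run_batches` are hypothetical names, and the dictionary store is an assumption that replaces a real index backend; only the control flow is meant to carry over.

```python
class AliasedIndex:
    """In-memory stand-in for an index store with a read alias (illustrative)."""

    def __init__(self):
        self.indexes = {}       # index name -> {doc_id: (version, body)}
        self.alias = None       # index name currently serving reads
        self.previous = None    # last alias target, kept for rollback

    def upsert(self, index, doc_id, version, body):
        """Apply a write only if it is newer; replays are no-ops."""
        docs = self.indexes.setdefault(index, {})
        current = docs.get(doc_id)
        if current is not None and current[0] >= version:
            return False
        docs[doc_id] = (version, body)
        return True

    def switch_alias(self, index):
        """Point reads at a new index and remember the old target."""
        self.previous, self.alias = self.alias, index

    def rollback(self):
        """Point reads back at the previous index."""
        if self.previous is None:
            raise RuntimeError("no previous index to roll back to")
        self.alias, self.previous = self.previous, self.alias


def run_batches(store, index, batches, checkpoint):
    """Resume from checkpoint['next_batch'] and update it after each batch."""
    for number in range(checkpoint.get("next_batch", 0), len(batches)):
        for doc_id, version, body in batches[number]:
            store.upsert(index, doc_id, version, body)
        checkpoint["next_batch"] = number + 1
    return checkpoint
```

Because `upsert` ignores writes that are not newer, replaying a batch after a crash leaves the index unchanged, and the alias switch keeps the old index intact until the new one is trusted.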

Match Complexity to Workload Reality

Use the minimum architecture that meets requirements:

- avoid distributed complexity for small datasets
- avoid simplistic models for multilingual or high-noise corpora
- revisit design as scale and usage patterns change

Over-engineering and under-engineering both create expensive rework.

Common Traps

- Starting with vendor selection before defining retrieval requirements -> architecture lock-in with unclear success criteria
- Indexing raw data without field-level normalization -> poor filters, weak facets, and noisy matching
- Tuning relevance on one happy-path query set -> brittle results in real user traffic
- Applying business boosts without guardrails -> top results become commercially biased and less useful
- Shipping retrieval changes without offline baseline comparison -> regressions discovered only by users
- Running full reindex jobs without resumability -> long outages and partial data corruption
- Ignoring multilingual tokenization differences -> severe precision drop for non-English users

Security & Privacy

Data that leaves your machine:

- none by default in this instruction set
- only user-approved integration traffic when the user explicitly connects external services

Data that stays local:

- planning notes and experiment logs under ~/search-engine/
- constraints, relevance decisions, and rollback records

This skill does NOT:

- collect unrelated files or credentials
- require hidden network calls
- bypass user-confirmed environment boundaries

Related Skills

Install with ClawHub install if user confirms:

API - Define stable APIs for indexing, querying, and retrieval orchestration
