Search Engine
v1.0.0
Design and build any search engine with robust indexing, retrieval logic, relevance controls, and evaluation workflows for production systems.
Setup
On first use, read setup.md and establish activation behavior, system scope, and data constraints before proposing implementation steps.
When to Use
User needs to create, redesign, or scale a search engine for apps, documentation, products, or internal knowledge bases. Agent handles architecture planning, indexing strategy, retrieval design, relevance controls, evaluation loops, and rollout safety.
Architecture
Memory lives in ~/search-engine/. See memory-template.md for baseline structure and status values.
~/search-engine/
|-- memory.md          # Persistent context, constraints, and active priorities
|-- requirements.md    # Retrieval goals, latency targets, and relevance expectations
|-- experiments.md     # Offline experiments and tuning decisions
`-- incidents.md       # Production issues, root cause, and remediation notes
Quick Reference
Use the smallest relevant file for the task.
Topic | File
Setup and activation behavior | setup.md
Memory template and status model | memory-template.md
Architecture options and component choices | architecture-blueprint.md
Retrieval and ranking strategy patterns | retrieval-patterns.md
Quality measurement and evaluation loops | evaluation-metrics.md
Delivery and rollout gates | implementation-checklist.md
Data Storage
Local notes stay in ~/search-engine/:
requirements and relevance objectives
data source assumptions and indexing decisions
experiment outcomes and deployment safeguards
Core Rules
- Start with a Retrieval Contract, Not with Tools
Before selecting engines, define the contract:
query types to support (keyword, phrase, semantic, hybrid)
response format, latency budget, and freshness target
error tolerance and fallback behavior
A search engine without a contract becomes an untestable collection of features.
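As a minimal sketch, assuming Python and hypothetical field names (RetrievalContract, latency_budget_ms, freshness_target_s, max_error_rate, fallback are illustrative, not from any library), the contract can be written down as a validated object so that observed behavior can be checked against it:

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Sketch only: every name below is hypothetical. Query types come from the rule above.
SUPPORTED_QUERY_TYPES = frozenset({"keyword", "phrase", "semantic", "hybrid"})


@dataclass(frozen=True)
class RetrievalContract:
    query_types: FrozenSet[str]  # subset of SUPPORTED_QUERY_TYPES
    response_format: str         # label for the agreed response shape
    latency_budget_ms: int       # p95 latency budget per query
    freshness_target_s: int      # max delay from source change to searchable
    max_error_rate: float        # tolerated fraction of failed queries
    fallback: str                # behavior when the primary path fails

    def __post_init__(self) -> None:
        # Reject contracts that could never be tested.
        if not self.query_types:
            raise ValueError("at least one query type is required")
        unknown = self.query_types - SUPPORTED_QUERY_TYPES
        if unknown:
            raise ValueError(f"unsupported query types: {sorted(unknown)}")
        if self.latency_budget_ms <= 0 or self.freshness_target_s <= 0:
            raise ValueError("latency and freshness targets must be positive")
        if not 0.0 <= self.max_error_rate < 1.0:
            raise ValueError("max_error_rate must be in [0, 1)")

    def violations(self, p95_latency_ms: float, error_rate: float) -> List[str]:
        """Return human-readable contract breaches for observed measurements."""
        problems = []
        if p95_latency_ms > self.latency_budget_ms:
            problems.append(f"p95 latency {p95_latency_ms}ms exceeds {self.latency_budget_ms}ms")
        if error_rate > self.max_error_rate:
            problems.append(f"error rate {error_rate} exceeds {self.max_error_rate}")
        return problems


contract = RetrievalContract(
    query_types=frozenset({"keyword", "hybrid"}),
    response_format="hits-v1",
    latency_budget_ms=200,
    freshness_target_s=300,
    max_error_rate=0.001,
    fallback="lexical-only",
)
print(contract.violations(p95_latency_ms=250, error_rate=0.0))
# ['p95 latency 250ms exceeds 200ms']
```

The point of the object is that "fast enough" and "fresh enough" become pass/fail checks that later experiments and incidents can be measured against.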
- Design Ingestion and Indexing as a Deterministic Pipeline
Every document should pass explicit stages:
ingestion source validation and deduplication
normalization and field extraction
chunking policy with stable identifiers
indexing with repeatable transforms
Deterministic pipelines reduce drift between environments and simplify debugging.
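The stages above can be sketched as one deterministic function. This is an illustration under assumptions: documents arrive as dicts with hypothetical id and body keys, normalization is Unicode NFKC plus whitespace collapsing, deduplication uses a SHA-256 hash of the normalized body (done after normalization here so whitespace-only variants count as duplicates), and the chunking policy is fixed word windows. Field extraction and the actual index write are left out.

```python
import hashlib
import unicodedata
from dataclasses import dataclass
from typing import Dict, List, Set

# Sketch only: the doc dict shape, Chunk, and run_pipeline are hypothetical.


@dataclass(frozen=True)
class Chunk:
    chunk_id: str  # stable: "<doc id>#<chunk position>"
    doc_id: str
    text: str


def normalize(text: str) -> str:
    """NFKC-normalize and collapse runs of whitespace to single spaces."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def split_words(text: str, max_words: int) -> List[str]:
    """Chunking policy: consecutive windows of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def run_pipeline(raw_docs: List[Dict[str, str]], max_words: int = 50) -> List[Chunk]:
    seen: Set[str] = set()
    chunks: List[Chunk] = []
    for doc in raw_docs:
        # Stage 1: source validation (a real pipeline should record rejects).
        doc_id, body = doc.get("id"), doc.get("body")
        if not doc_id or not isinstance(body, str):
            continue
        # Stage 2: normalization, before hashing so trivial variants collapse.
        body = normalize(body)
        # Stage 3: deduplication on the normalized content hash.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        # Stage 4: chunking with identifiers derived only from id and position.
        for pos, piece in enumerate(split_words(body, max_words)):
            chunks.append(Chunk(chunk_id=f"{doc_id}#{pos}", doc_id=doc_id, text=piece))
    return chunks


sample = [
    {"id": "a", "body": "Hello   world"},
    {"id": "b", "body": "Hello world"},   # duplicate of "a" after normalization
    {"id": "", "body": "x"},              # fails validation
    {"id": "c", "body": "one two three"},
]
print([c.chunk_id for c in run_pipeline(sample, max_words=2)])
# ['a#0', 'c#0', 'c#1']
```

Rerunning on the same input yields identical chunks. Chunk IDs stay stable only while the chunking policy itself is unchanged, so changing max_words is effectively a reindex.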
- Separate Recall Layers from Precision Layers
Treat retrieval as a staged system:
broad candidate retrieval first (lexical, vector, or hybrid)
reranking and business rules second
formatting and explanation last
Mixing all concerns in one step hides failures and makes tuning unpredictable.
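A toy version of the staged design, assuming an in-memory corpus and plain term-overlap scoring as a stand-in for a real lexical or vector retriever; the in_stock attribute and its boost are a hypothetical business rule:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Sketch only: Doc, the stage functions, and the in_stock boost are hypothetical.


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    in_stock: bool = True  # hypothetical business attribute


Scored = List[Tuple[Doc, float]]


def by_score(pair: Tuple[Doc, float]) -> Tuple[float, str]:
    # Highest score first; doc_id breaks ties deterministically.
    return (-pair[1], pair[0].doc_id)


def recall_stage(query: str, docs: List[Doc], k: int) -> Scored:
    """Broad candidate retrieval: count overlapping query terms."""
    terms = set(query.lower().split())
    candidates = []
    for doc in docs:
        overlap = len(terms & set(doc.text.lower().split()))
        if overlap:
            candidates.append((doc, float(overlap)))
    return sorted(candidates, key=by_score)[:k]


def precision_stage(candidates: Scored, in_stock_boost: float = 0.5) -> Scored:
    """Rerank and apply business rules; it can reorder but never add documents."""
    boosted = [(d, s + (in_stock_boost if d.in_stock else 0.0)) for d, s in candidates]
    return sorted(boosted, key=by_score)


def present_stage(ranked: Scored, n: int) -> List[Dict[str, object]]:
    """Formatting and explanation: keep the final score visible for debugging."""
    return [{"id": d.doc_id, "score": s} for d, s in ranked[:n]]


def search(query: str, docs: List[Doc], k: int = 100, n: int = 10) -> List[Dict[str, object]]:
    return present_stage(precision_stage(recall_stage(query, docs, k)), n)


corpus = [
    Doc("a", "red running shoes", in_stock=False),
    Doc("b", "red shoes"),
    Doc("c", "blue hat"),
]
print(search("red shoes", corpus))
# [{'id': 'b', 'score': 2.5}, {'id': 'a', 'score': 2.0}]
```

Because each stage is a separate function, a recall problem shows up as a missing candidate before reranking, while a precision problem shows up as a wrong order among candidates that were retrieved.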
- Define Relevance Features as Versioned Policy
Relevance changes must be tracked as policy versions:
feature weights and boosts
typo tolerance and synonym policy
filtering, faceting, and tie-break rules
Never ship silent relevance changes without versioned notes and measured deltas.
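One way to make silent changes hard to ship is a registry that rejects a reused version number carrying different settings, and any version without notes. RelevancePolicy, PolicyRegistry, and diff are hypothetical names chosen to mirror the list above; a minimal sketch:

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple

# Sketch only: RelevancePolicy, PolicyRegistry, and diff are hypothetical.


@dataclass(frozen=True)
class RelevancePolicy:
    version: str
    field_weights: Dict[str, float]  # feature weights and boosts
    typo_tolerance: int              # max edit distance for fuzzy matching
    synonyms: Dict[str, List[str]]   # synonym policy
    tie_break: str                   # field used to order equal scores
    notes: str = ""                  # release notes for this version


def settings(policy: RelevancePolicy) -> dict:
    """Everything that affects ranking: all fields except version and notes."""
    data = asdict(policy)
    del data["version"], data["notes"]
    return data


def diff(old: RelevancePolicy, new: RelevancePolicy) -> Dict[str, Tuple[object, object]]:
    """Changed settings as {name: (old value, new value)}, a base for release notes."""
    a, b = settings(old), settings(new)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


class PolicyRegistry:
    def __init__(self) -> None:
        self._versions: Dict[str, RelevancePolicy] = {}

    def register(self, policy: RelevancePolicy) -> None:
        if not policy.notes:
            raise ValueError(f"version {policy.version} has no release notes")
        existing = self._versions.get(policy.version)
        if existing is not None and settings(existing) != settings(policy):
            # Same version, different ranking settings: a silent change.
            raise ValueError(f"version {policy.version} already registered with other settings")
        self._versions[policy.version] = policy


v1 = RelevancePolicy("v1", {"title": 3.0, "body": 1.0}, 1, {"tv": ["television"]}, "updated_at", notes="baseline")
v2 = RelevancePolicy("v2", {"title": 2.0, "body": 1.0}, 1, {"tv": ["television"]}, "updated_at", notes="lower title weight")
registry = PolicyRegistry()
registry.register(v1)
registry.register(v2)
print(diff(v1, v2))
# {'field_weights': ({'title': 3.0, 'body': 1.0}, {'title': 2.0, 'body': 1.0})}
```

The diff covers the "versioned notes" half of the rule; the "measured deltas" half still has to come from offline evaluation of each version.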
- Evaluate Offline Before Production Writes
For each relevance or 索引ing change:
run benchmark queries with labeled expectations
measure hit quality, ordering quality, and coverage
compare against the current baseline and note regressions
If evaluation evidence is weak, keep the current configuration and iterate.
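A minimal sketch of the offline comparison, assuming the labeled expectations are sets of relevant document IDs per query. Here hit quality is approximated by recall@k, ordering quality by mean reciprocal rank (MRR), and coverage by the share of queries that return any result. These metric choices, the function names, and the 0.01 tolerance are illustrative assumptions, not a prescribed standard:

```python
from typing import Dict, List, Set

# Sketch only: metric choices and the tolerance default are assumptions.


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Share of labeled relevant documents found in the top k (hit quality)."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant hit (ordering quality)."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def evaluate(run: Dict[str, List[str]], labels: Dict[str, Set[str]], k: int = 10) -> Dict[str, float]:
    """Average metrics over labeled queries; run maps query -> ranked doc ids."""
    if not labels:
        raise ValueError("benchmark needs at least one labeled query")
    n = len(labels)
    return {
        "recall@k": sum(recall_at_k(run.get(q, []), rel, k) for q, rel in labels.items()) / n,
        "mrr": sum(reciprocal_rank(run.get(q, []), rel) for q, rel in labels.items()) / n,
        "coverage": sum(1 for q in labels if run.get(q)) / n,
    }


def regressions(baseline: Dict[str, float], candidate: Dict[str, float],
                tolerance: float = 0.01) -> List[str]:
    """Metrics where the candidate falls below the baseline by more than tolerance."""
    return [m for m, value in baseline.items() if candidate[m] < value - tolerance]


labels = {"q1": {"a"}, "q2": {"b", "c"}}
baseline = evaluate({"q1": ["a", "x"], "q2": ["b", "c"]}, labels, k=2)
candidate = evaluate({"q1": ["x", "a"], "q2": []}, labels, k=2)
print(regressions(baseline, candidate))
# ['recall@k', 'mrr', 'coverage']
```

A non-empty regression list is the signal to keep the current configuration and iterate, as the rule says.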
- Build Idempotent Index Operations and Safe Rollback
Index updates must be replay-safe:
stable document IDs and version checks
resumable batch jobs with checkpoints
alias-based or dual-index rollback plan
Without idempotency and rollback, incident recovery becomes guesswork.
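The three requirements can be illustrated with an in-memory stand-in for a real index. Assumptions: each write carries an integer document version, the checkpoint is a plain dict (a real job would persist it), and AliasRouter is a hypothetical stand-in for whatever alias or dual-index mechanism the chosen engine provides:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Sketch only: Index, run_batch, and AliasRouter are hypothetical stand-ins.
Write = Tuple[str, int, str]  # (document id, version, body)


@dataclass
class Index:
    """In-memory stand-in for a search index: id -> (version, body)."""

    docs: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def upsert(self, doc_id: str, version: int, body: str) -> bool:
        """Apply the write only if it is newer, so replays are no-ops."""
        current = self.docs.get(doc_id)
        if current is not None and current[0] >= version:
            return False
        self.docs[doc_id] = (version, body)
        return True


def run_batch(index: Index, batch: List[Write], checkpoint: Dict[str, int],
              crash_at: Optional[int] = None) -> None:
    """Resume from checkpoint['offset']; crash_at simulates a failure for testing."""
    for pos in range(checkpoint.get("offset", 0), len(batch)):
        if pos == crash_at:
            raise RuntimeError(f"simulated crash at item {pos}")
        index.upsert(*batch[pos])
        checkpoint["offset"] = pos + 1  # a real job would persist this


class AliasRouter:
    """Queries read self.current; the previous index is kept for rollback."""

    def __init__(self, live: Index) -> None:
        self.current: Index = live
        self.previous: Optional[Index] = None

    def swap(self, new_index: Index) -> None:
        self.previous, self.current = self.current, new_index

    def rollback(self) -> None:
        if self.previous is None:
            raise RuntimeError("no previous index to roll back to")
        self.current, self.previous = self.previous, self.current


batch = [("a", 1, "v1"), ("b", 1, "v1"), ("a", 2, "v2")]
index, checkpoint = Index(), {}
try:
    run_batch(index, batch, checkpoint, crash_at=2)
except RuntimeError:
    pass
run_batch(index, batch, checkpoint)  # resumes at offset 2
print(index.docs)
# {'a': (2, 'v2'), 'b': (1, 'v1')}
```

The checkpoint is written after each upsert, so a crash between the two causes that item to be replayed; the version check turns the replay into a no-op, which is what makes resuming safe.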
- Match Complexity to Workload Reality
Use the minimum architecture that meets requirements:
avoid distributed complexity for small datasets
avoid simplistic models for multilingual or high-noise corpora
revisit the design as scale and usage patterns change
Over-engineering and under-engineering both create expensive rework.
Common Traps
Starting with vendor selection before defining retrieval requirements -> architecture lock-in with unclear success criteria
Indexing raw data without field-level normalization -> poor filters, weak facets, and noisy matching
Tuning relevance on one happy-path query set -> brittle results in real user traffic
Applying business boosts without guardrails -> top results become commercially biased and less useful
Shipping retrieval changes without an offline baseline comparison -> regressions discovered only by users
Running full reindex jobs without resumability -> long outages and partial data corruption
Ignoring multilingual tokenization differences -> severe precision drop for non-English users
Security & Privacy
Data that leaves your machine:
none by default in this instruction set
only user-approved integration traffic when the user explicitly connects external services
Data that stays local:
planning notes and experiment logs under ~/search-engine/
constraints, relevance decisions, and rollback records
This 技能 does NOT:
collect unrelated files or credentials
require hidden network calls
bypass user-confirmed environment boundaries
Related Skills
Install with ClawHub install if the user confirms:
API - Define stable APIs for indexing, querying, and retrieval orchestration