When to Use
User needs to install, run, integrate, tune, or debug Ollama for local or self-hosted model workflows. Agent handles smoke tests, model selection, API usage, Modelfile customization, embeddings, RAG fit checks, and safe operations.
Use this instead of generic AI advice when the blocker is specific to local runtime behavior: wrong model tag, broken JSON output, poor retrieval, slow inference, context sizing, GPU fallback, or unsafe remote exposure.
Architecture
Memory lives in ~/ollama/. If ~/ollama/ does not exist, run setup.md. See memory-template.md for structure.
~/ollama/
|-- memory.md # Durable context and activation boundaries
|-- environment.md # Host, GPU, OS, runtime, and service notes
|-- model-registry.md # Approved models, tags, quants, and fit notes
|-- modelfiles.md # Reusable Modelfile patterns and parameter decisions
|-- rag-notes.md # Embedding choices, chunking, retrieval checks, vector dimensions
`-- incident-log.md # Repeated failures, fixes, and rollback notes
Quick Reference
Load only the file needed for the current blocker.
| Topic | File |
|---|---|
| Setup guide | setup.md |
| Memory template | memory-template.md |
| Install and smoke-test workflow | install-and-smoke-test.md |
| Local API and OpenAI-compatible patterns | api-patterns.md |
| Modelfile creation and context control | modelfile-workflows.md |
| Embeddings and local RAG checks | embeddings-and-rag.md |
| Runtime operations and performance tuning | operations-and-performance.md |
| Failure recovery and incident triage | troubleshooting.md |
Requirements
- Local ollama access on the target machine, or permission to guide installation.
- Enough RAM, VRAM, and disk for the exact model and context window being proposed.
- Explicit user approval before exposing Ollama beyond localhost, changing service managers, or deleting model files.
- Exact model tags and runtime facts must be verified with live commands such as ollama list, ollama ps, and ollama show.
Never assume model capabilities, context length, quantization, or GPU usage from memory alone.
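As a minimal sketch of such a live check (assuming a default local install listening on 127.0.0.1:11434), the exact installed tags can be read from /api/tags instead of recalled from memory:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default local endpoint

def installed_tags(tags_response: dict) -> list[str]:
    """Extract exact model tags from an /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

if __name__ == "__main__":
    try:
        with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=5) as resp:
            print(installed_tags(json.load(resp)))
    except OSError:
        print("Ollama is not reachable; verify the service before giving advice")
```

Any tag the user mentions that is not in this list should be pulled or corrected before deeper advice.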
Operating Coverage
This skill is for practical Ollama execution, not abstract local-LLM discussion. It covers:
- local installs on macOS, Linux, and Windows
- CLI workflows for pull, run, copy, show, create, and remove
- REST API usage on http://127.0.0.1:11434/api and OpenAI-compatible usage on /v1
- hardware-aware model sizing, context tuning, and throughput tradeoffs
- Modelfile-based customization for prompts, parameters, adapters, and reproducible model names
- embeddings and local RAG pipelines where indexing, querying, and retrieval must stay consistent
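As an illustration of the API coverage above, the OpenAI-compatible surface can be exercised with nothing but the standard library. This is a sketch, assuming a default install and a llama3.1:8b tag already pulled; swap in any tag from ollama list:

```python
import json
import urllib.request

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat request for Ollama's /v1 endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # non-streaming responses are easier to parse strictly
    }

def post_chat(payload: dict, base: str = "http://127.0.0.1:11434") -> dict:
    req = urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)

if __name__ == "__main__":
    try:
        out = post_chat(chat_payload("llama3.1:8b", "Say hello in one word."))
        print(out["choices"][0]["message"]["content"])
    except OSError:
        print("Ollama is not reachable on 127.0.0.1:11434")
```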
Data Storage
Keep only durable operational context in ~/ollama/:
- host facts that materially change advice: OS, GPU class, CPU-only constraints, service manager, remote or local deployment
- approved model tags, copied aliases, quant choices, and context limits that worked in practice
- Modelfile defaults, JSON output patterns, and safe OpenAI-compatible mappings
- embedding model choices, vector dimensions, chunking defaults, and retrieval checks
- recurring failures such as partial pulls, CPU fallback, port conflicts, or broken upgrades
Core Rules
1. Verify the Runtime Before Giving Advice
Confirm that ollama is installed and reachable before proposing any deeper fix.
Start with the smallest factual checks: ollama --version, ollama list, ollama ps, and one minimal generation or /api/tags request.
Treat "it runs" and "it runs correctly" as different states.
2. Pin Exact Model Names and Inspect Them Live
- Use exact tags, not vague family names, for anything reproducible or production-adjacent.
- Inspect the real model with ollama show or /api/show before claiming context length, quantization, or capabilities.
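A sketch of that inspection via /api/show, assuming the response carries a details block with size and quantization fields (the exact shape can vary across Ollama versions):

```python
import json
import urllib.request

def model_facts(show_response: dict) -> dict:
    """Pull the facts worth pinning from an /api/show response body."""
    details = show_response.get("details", {})
    return {
        "parameter_size": details.get("parameter_size"),    # e.g. "8.0B"
        "quantization": details.get("quantization_level"),  # e.g. "Q4_K_M"
        "family": details.get("family"),
    }

def show_model(tag: str, base: str = "http://127.0.0.1:11434") -> dict:
    req = urllib.request.Request(
        f"{base}/api/show",
        data=json.dumps({"model": tag}).encode(),  # older versions used "name"
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    try:
        print(model_facts(show_model("llama3.1:8b")))  # placeholder tag
    except OSError:
        print("Ollama is not reachable; inspect against a live server")
```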
Avoid silent drift from floating tags when stability matters.
3. Separate Runtime, Modelfile, and App Prompt Responsibilities
- Debug local behavior in layers: runtime first, then model definition, then application prompt.
- If output quality changed, check whether SYSTEM, TEMPLATE, or PARAMETER settings in the Modelfile are fighting the app prompt.
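For example, a minimal Modelfile that pins such defaults into a named model; the base tag and parameter values here are illustrative placeholders, not recommendations:

```
FROM llama3.1:8b
SYSTEM You are a concise extraction assistant. Reply with JSON only.
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
```

Created with ollama create extractor -f Modelfile, these defaults then travel with the model name instead of living in scattered app prompts.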
Put durable defaults in a named model, not in ad hoc copy-pasted prompts.
4. Choose Models by Hardware and Latency Budget
- A model that technically loads but falls back to CPU or swaps memory is not a good fit.
- Use ollama ps to confirm processor split before promising performance.
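One way to make that check scriptable is to flag any loaded model whose PROCESSOR column mentions CPU. This sketch assumes the current tabular ollama ps layout, which is CLI output rather than a stable API:

```python
import subprocess

def cpu_bound_models(ps_output: str) -> list[str]:
    """Return model names whose PROCESSOR column mentions CPU in `ollama ps` output."""
    lines = ps_output.strip().splitlines()
    if len(lines) < 2:
        return []  # header only, or nothing loaded
    header, *rows = lines
    proc_col = header.index("PROCESSOR")  # column offsets are aligned in the table
    flagged = []
    for row in rows:
        processor = row[proc_col:].split("  ")[0].strip() if len(row) > proc_col else ""
        if "CPU" in processor:
            flagged.append(row.split()[0])
    return flagged

if __name__ == "__main__":
    try:
        out = subprocess.run(["ollama", "ps"], capture_output=True, text=True, check=True)
        print(cpu_bound_models(out.stdout) or "all loaded models report GPU")
    except (OSError, subprocess.CalledProcessError):
        print("ollama CLI is not available on this machine")
```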
Keep separate defaults for chat, coding, extraction, vision, and embeddings instead of forcing one model to do everything.
5. Make API and Structured Output Flows Deterministic
- Prefer non-streaming responses when the next step needs strict parsing.
- Use format: "json" or a JSON schema, set a low temperature, and validate the parsed result before taking downstream actions.
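A sketch of that flow against the native /api/generate endpoint, assuming the installed version supports format: "json" (the model tag is a placeholder):

```python
import json
import urllib.request

def generate_payload(model: str, prompt: str) -> dict:
    """Non-streaming, low-temperature request that constrains output to JSON."""
    return {
        "model": model,
        "prompt": prompt,
        "format": "json",               # ask the runtime to enforce JSON output
        "stream": False,                # one complete body, easier to parse strictly
        "options": {"temperature": 0},  # reduce run-to-run variation
    }

def parse_strict(raw_response: str) -> dict:
    """Validate the model's reply before any downstream action."""
    parsed = json.loads(raw_response)  # raises ValueError on malformed output
    if not isinstance(parsed, dict):
        raise ValueError(f"expected a JSON object, got {type(parsed).__name__}")
    return parsed

if __name__ == "__main__":
    payload = generate_payload("llama3.1:8b", 'Return {"status": "ok"} exactly.')
    req = urllib.request.Request(
        "http://127.0.0.1:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            print(parse_strict(json.load(resp)["response"]))
    except OSError:
        print("Ollama is not reachable on 127.0.0.1:11434")
```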
For OpenAI-compatible clients, verify /v1 assumptions instead of assuming every feature maps 1:1.
6. Treat Embeddings and RAG as a Single System
- Use the same embedding model for indexing and querying unless you intentionally migrate and re-index.
- Inspect retrieved chunks before blaming the model for weak answers.
- Fix chunking, metadata, top-k, and vector dimensions before increasing prompt size.
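In code, the consistency rule is easiest to keep by routing both the index and query paths through one pinned function; a sketch assuming the /api/embed endpoint and nomic-embed-text as the shared model:

```python
import json
import math
import urllib.request

EMBED_MODEL = "nomic-embed-text"  # must stay identical for indexing and querying

def embed(texts: list[str], base: str = "http://127.0.0.1:11434") -> list[list[float]]:
    """Embed texts through the single pinned model, for both index and query paths."""
    req = urllib.request.Request(
        f"{base}/api/embed",
        data=json.dumps({"model": EMBED_MODEL, "input": texts}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["embeddings"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, with a cheap guard that surfaces model drift as a dimension mismatch."""
    if len(a) != len(b):
        raise ValueError("vector dimensions differ: index and query embeddings diverged")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))
```

If cosine ever raises here, the index was almost certainly built with a different model and should be rebuilt, not patched.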
7. Treat Remote Access and Upgrades as Operational Changes
- Do not bind Ollama to non-localhost addresses or open port 11434 without explicit approval and a minimal-risk network plan.
Record service manager changes, environment variables, and rollback steps before upgrading.
Protect model storage and disk headroom before large pulls or replacements.
Ollama Traps
- latest everywhere -> upgrades silently change behavior and break reproducibility.
- Testing only with ollama run -> app integration still fails on /api or /v1.
- Assuming slow responses mean "bad model" -> often it is CPU fallback, oversized context, or disk pressure.
- Letting app prompts and Modelfile instructions fight each other -> outputs become inconsistent and hard to debug.
- Indexing with one embedding model and querying with another -> retrieval quality collapses without obvious errors.
- Exposing the API on a LAN without auth or scoping -> local convenience becomes a security problem.
- Chasing larger context before fixing retrieval or prompt shape -> memory use rises while answer quality barely improves.
External Endpoints
Use external network access only when the task requires model downloads, official docs lookup, or optional cloud execution explicitly approved by the user.
| Endpoint | Data Sent | Purpose |
|---|---|---|
| https://ollama.com/* | model identifiers, optional doc queries, and optional cloud API requests | Official docs, library lookups, model pulls managed by the Ollama runtime, and optional cloud execution |
No other data is sent externally.
Security & Privacy
Data that leaves your machine:
- model identifiers and download requests when pulling models through Ollama
- optional prompts and attachments only if the user explicitly chooses https://ollama.com/api instead of local inference
- optional documentation lookups against official Ollama pages
Data that stays local:
- prompts and outputs served through the local Ollama runtime on the user machine
- durable workflow notes under ~/ollama/
- local Modelfiles, retrieval notes, and performance baselines unless the user exports them
This skill does NOT:
- expose Ollama remotely without explicit approval
- store OLLAMA_API_KEY or other secrets in skill files
- mix local and cloud execution silently
- invent unsupported model features, GPU behavior, or API compatibility
- recommend remote installers or destructive cleanup without explaining risk first
Trust
When you use this skill, model pulls and optional cloud requests may go to Ollama infrastructure if the user explicitly chooses those paths.
Only install if you trust Ollama with that data.
Scope
This skill ONLY:
- installs, verifies, operates, and troubleshoots Ollama workflows
- helps choose, pin, inspect, and customize models with reproducible patterns
- keeps local memory for host constraints, model defaults, and recurring failure fixes
This skill NEVER:
- claims that every Ollama model supports the same tools, context, or JSON reliability
- recommends unauthenticated remote exposure as a default
- treats local RAG quality as solved without checking embeddings, chunking, and retrieval results
- modifies its own skill files
Related Skills
Install with clawhub install if the user confirms:
ai - Frame when local Ollama is the right fit versus cloud inference.
models - Compare local model families, sizes, and capability tradeoffs before pinning defaults.
api - Reuse robust HTTP request, retry, and parsing patterns around local services.
embeddings - Extend vector search and chunking strategy beyond the Ollama runtime itself.
langchain - Integrate Ollama into multi-step chains, agents, and retrieval pipelines.
Feedback
clawhub star ollama
Stay updated: clawhub sync