When to Use
User needs to install, run, integrate, tune, or debug Ollama for local or self-hosted model workflows. Agent handles smoke tests, model selection, API usage, Modelfile customization, embeddings, RAG fit checks, and safe operations.
Use this instead of generic AI advice when the blocker is specific to local runtime behavior: wrong model tag, broken JSON output, poor retrieval, slow inference, context sizing, GPU fallback, or unsafe remote exposure.
Architecture
Memory lives in ~/ollama/. If ~/ollama/ does not exist, run setup.md. See memory-template.md for structure.
~/ollama/
|-- memory.md # Durable context and activation boundaries
|-- environment.md # Host, GPU, OS, runtime, and service notes
|-- model-registry.md # Approved models, tags, quants, and fit notes
|-- modelfiles.md # Reusable Modelfile patterns and parameter decisions
|-- rag-notes.md # Embedding choices, chunking, retrieval checks, vector dimensions
`-- incident-log.md # Repeated failures, fixes, and rollback notes
Quick Reference
Load only the file needed for the current blocker.
| Topic | File |
|---|---|
| Setup guide | setup.md |
| Memory template | memory-template.md |
| Install and smoke-test workflow | install-and-smoke-test.md |
| Local API and OpenAI-compatible patterns | api-patterns.md |
| Modelfile creation and context control | modelfile-workflows.md |
| Embeddings and local RAG checks | embeddings-and-rag.md |
| Runtime operations and performance tuning | operations-and-performance.md |
| Failure recovery and incident triage | troubleshooting.md |
Requirements
- Local ollama access on the target machine, or permission to guide installation.
- Enough RAM, VRAM, and disk for the exact model and context window being proposed.
- Explicit user approval before exposing Ollama beyond localhost, changing service managers, or deleting model files.
- Exact model tags and runtime facts must be verified with live commands such as ollama list, ollama ps, and ollama show.
Never assume model capabilities, context length, quantization, or GPU usage from memory alone.
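As a minimal sketch of such a live check (assuming a default local install listening on 127.0.0.1:11434), the exact installed tags can be read from /api/tags instead of recalled from memory:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default local endpoint

def installed_tags(tags_response: dict) -> list[str]:
    """Extract exact model tags from an /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

if __name__ == "__main__":
    try:
        with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=5) as resp:
            print(installed_tags(json.load(resp)))
    except OSError:
        print("Ollama is not reachable; verify the service before giving advice")
```

Any tag the user mentions that is not in this list should be pulled or corrected before deeper advice.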
Operating Coverage
This skill is for practical Ollama execution, not abstract local-LLM discussion. It covers:
- local installs on macOS, Linux, and Windows
- CLI workflows for pull, run, copy, show, create, and remove
- REST API usage on http://127.0.0.1:11434/api and OpenAI-compatible usage on /v1
- hardware-aware model sizing, context tuning, and throughput tradeoffs
- Modelfile-based customization for prompts, parameters, adapters, and reproducible model names
- embeddings and local RAG pipelines where indexing, querying, and retrieval must stay consistent
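As an illustration of the API coverage above, the OpenAI-compatible surface can be exercised with nothing but the standard library. This is a sketch, assuming a default install and a llama3.1:8b tag already pulled; swap in any tag from ollama list:

```python
import json
import urllib.request

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat request for Ollama's /v1 endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # non-streaming responses are easier to parse strictly
    }

def post_chat(payload: dict, base: str = "http://127.0.0.1:11434") -> dict:
    req = urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)

if __name__ == "__main__":
    try:
        out = post_chat(chat_payload("llama3.1:8b", "Say hello in one word."))
        print(out["choices"][0]["message"]["content"])
    except OSError:
        print("Ollama is not reachable on 127.0.0.1:11434")
```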
Data Storage
Keep only durable operational context in ~/ollama/:
- host facts that materially change advice: OS, GPU class, CPU-only constraints, service manager, remote or local deployment
- approved model tags, copied aliases, quant choices, and context limits that worked in practice
- Modelfile defaults, JSON output patterns, and safe OpenAI-compatible mappings
- embedding model choices, vector dimensions, chunking defaults, and retrieval checks
- recurring failures such as partial pulls, CPU fallback, port conflicts, or broken upgrades
Core Rules
1. Verify the Runtime Before Giving Advice
Confirm that ollama is installed and reachable before proposing any deeper fix.
Start with the smallest factual checks: ollama --version, ollama list, ollama ps, and one minimal generation or /api/tags request.
Treat "it runs" and "it runs correctly" as different states.
2. Pin Exact Model Names and Inspect Them Live
- Use exact tags, not vague family names, for anything reproducible or production-adjacent.
- Inspect the real model with ollama show or /api/show before claiming context length, quantization, or capabilities.
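A sketch of that inspection via /api/show, assuming the response carries a details block with size and quantization fields (the exact shape can vary across Ollama versions):

```python
import json
import urllib.request

def model_facts(show_response: dict) -> dict:
    """Pull the facts worth pinning from an /api/show response body."""
    details = show_response.get("details", {})
    return {
        "parameter_size": details.get("parameter_size"),    # e.g. "8.0B"
        "quantization": details.get("quantization_level"),  # e.g. "Q4_K_M"
        "family": details.get("family"),
    }

def show_model(tag: str, base: str = "http://127.0.0.1:11434") -> dict:
    req = urllib.request.Request(
        f"{base}/api/show",
        data=json.dumps({"model": tag}).encode(),  # older versions used "name"
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    try:
        print(model_facts(show_model("llama3.1:8b")))  # placeholder tag
    except OSError:
        print("Ollama is not reachable; inspect against a live server")
```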
Avoid silent drift from floating tags when stability matters.
3. Separate Runtime, Modelfile, and App Prompt Responsibilities
- Debug local behavior in layers: runtime first, then model definition, then application prompt.
- If output quality changed, check whether SYSTEM, TEMPLATE, or PARAMETER settings in the Modelfile are fighting the app prompt.
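For example, a minimal Modelfile that pins such defaults into a named model; the base tag and parameter values here are illustrative placeholders, not recommendations:

```
FROM llama3.1:8b
SYSTEM You are a concise extraction assistant. Reply with JSON only.
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
```

Created with ollama create extractor -f Modelfile, these defaults then travel with the model name instead of living in scattered app prompts.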
Put durable defaults in a named model, not in ad hoc copy-pasted prompts.
4. Choose Models by Hardware and Latency Budget
- A model that technically loads but falls back to CPU or swaps memory is not a good fit.
- Use ollama ps to confirm processor split before promising performance.
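One way to make that check scriptable is to flag any loaded model whose PROCESSOR column mentions CPU. This sketch assumes the current tabular ollama ps layout, which is CLI output rather than a stable API:

```python
import subprocess

def cpu_bound_models(ps_output: str) -> list[str]:
    """Return model names whose PROCESSOR column mentions CPU in `ollama ps` output."""
    lines = ps_output.strip().splitlines()
    if len(lines) < 2:
        return []  # header only, or nothing loaded
    header, *rows = lines
    proc_col = header.index("PROCESSOR")  # column offsets are aligned in the table
    flagged = []
    for row in rows:
        processor = row[proc_col:].split("  ")[0].strip() if len(row) > proc_col else ""
        if "CPU" in processor:
            flagged.append(row.split()[0])
    return flagged

if __name__ == "__main__":
    try:
        out = subprocess.run(["ollama", "ps"], capture_output=True, text=True, check=True)
        print(cpu_bound_models(out.stdout) or "all loaded models report GPU")
    except (OSError, subprocess.CalledProcessError):
        print("ollama CLI is not available on this machine")
```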
Keep separate defaults for chat, coding, extraction, vision, and embeddings instead of forcing one model to do everything.
5. Make API and Structured Output Flows Deterministic
- Prefer non-streaming responses when the next step needs strict parsing.
- Use format: "json" or a JSON schema, set a low temperature, and validate the parsed result before taking downstream actions.
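A sketch of that flow against the native /api/generate endpoint, assuming the installed version supports format: "json" (the model tag is a placeholder):

```python
import json
import urllib.request

def generate_payload(model: str, prompt: str) -> dict:
    """Non-streaming, low-temperature request that constrains output to JSON."""
    return {
        "model": model,
        "prompt": prompt,
        "format": "json",               # ask the runtime to enforce JSON output
        "stream": False,                # one complete body, easier to parse strictly
        "options": {"temperature": 0},  # reduce run-to-run variation
    }

def parse_strict(raw_response: str) -> dict:
    """Validate the model's reply before any downstream action."""
    parsed = json.loads(raw_response)  # raises ValueError on malformed output
    if not isinstance(parsed, dict):
        raise ValueError(f"expected a JSON object, got {type(parsed).__name__}")
    return parsed

if __name__ == "__main__":
    payload = generate_payload("llama3.1:8b", 'Return {"status": "ok"} exactly.')
    req = urllib.request.Request(
        "http://127.0.0.1:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            print(parse_strict(json.load(resp)["response"]))
    except OSError:
        print("Ollama is not reachable on 127.0.0.1:11434")
```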
For OpenAI-compatible clients, verify /v1 assumptions instead of assuming every feature maps 1:1.
6. Treat Embeddings and RAG as a Single System
- Use the same embedding model for indexing and querying unless you intentionally migrate and re-index.
- Inspect retrieved chunks before blaming the model for weak answers.
- Fix chunking, metadata, top-k, and vector dimensions before increasing prompt size.
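In code, the consistency rule is easiest to keep by routing both the index and query paths through one pinned function; a sketch assuming the /api/embed endpoint and nomic-embed-text as the shared model:

```python
import json
import math
import urllib.request

EMBED_MODEL = "nomic-embed-text"  # must stay identical for indexing and querying

def embed(texts: list[str], base: str = "http://127.0.0.1:11434") -> list[list[float]]:
    """Embed texts through the single pinned model, for both index and query paths."""
    req = urllib.request.Request(
        f"{base}/api/embed",
        data=json.dumps({"model": EMBED_MODEL, "input": texts}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["embeddings"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, with a cheap guard that surfaces model drift as a dimension mismatch."""
    if len(a) != len(b):
        raise ValueError("vector dimensions differ: index and query embeddings diverged")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))
```

If cosine ever raises here, the index was almost certainly built with a different model and should be rebuilt, not patched.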
7. Treat Remote Access and Upgrades as Operational Changes
- Do not bind Ollama to non-localhost addresses or open port 11434 without explicit approval and a minimal-risk network plan.
Record service manager changes, environment variables, and rollback steps before upgrading.
Protect model storage and disk headroom before large pulls or replacements.
Ollama Traps
- latest everywhere -> upgrades silently change behavior and break reproducibility.
- Testing only with ollama run -> app integration still fails on /api or /v1.
- Assuming slow responses mean "bad model" -> often it is CPU fallback, oversized context, or disk pressure.
- Letting app prompts and Modelfile instructions fight each other -> outputs become inconsistent and hard to debug.
- Indexing with one embedding model and querying with another -> retrieval quality collapses without obvious errors.
- Exposing the API on a LAN without auth or scoping -> local convenience becomes a security problem.
- Chasing larger context before fixing retrieval or prompt shape -> memory use rises while answer quality barely improves.
External Endpoints
Use external network access only when the task requires model downloads, official docs lookup, or optional cloud execution explicitly approved by the user.
| Endpoint | Data Sent | Purpose |
|---|---|---|
| https://ollama.com/* | model identifiers, optional doc queries, and optional cloud API requests | Official docs, library lookups, model pulls managed by the Ollama runtime, and optional cloud execution |
No other data is sent externally.
Security & Privacy
Data that leaves your machine:
- model identifiers and download requests when pulling models through Ollama
- optional prompts and attachments only if the user explicitly chooses https://ollama.com/api instead of local inference
- optional documentation lookups against official Ollama pages
Data that stays local:
- prompts and outputs served through the local Ollama runtime on the user machine
- durable workflow notes under ~/ollama/
- local Modelfiles, retrieval notes, and performance baselines unless the user exports them
This skill does NOT:
- expose Ollama remotely without explicit approval
- store OLLAMA_API_KEY or other secrets in skill files
- mix local and cloud execution silently
- invent unsupported model features, GPU behavior, or API compatibility
- recommend remote installers or destructive cleanup without explaining risk first
Trust
When you use this skill, model pulls and optional cloud requests may go to Ollama infrastructure if the user explicitly chooses those paths.
Only install if you trust Ollama with that data.
Scope
This skill ONLY:
- installs, verifies, operates, and troubleshoots Ollama workflows
- helps choose, pin, inspect, and customize models with reproducible patterns
- keeps local memory for host constraints, model defaults, and recurring failure fixes
This skill NEVER:
- claims that every Ollama model supports the same tools, context, or JSON reliability
- recommends unauthenticated remote exposure as a default
- treats local RAG quality as solved without checking embeddings, chunking, and retrieval results
- modifies its own skill files
Related Skills
Install with clawhub install if the user confirms:
ai - Frame when local Ollama is the right fit versus cloud inference.
models - Compare local model families, sizes, and capability tradeoffs before pinning defaults.
api - Reuse robust HTTP request, retry, and parsing patterns around local services.
embeddings - Extend vector search and chunking strategy beyond the Ollama runtime itself.
langchain - Integrate Ollama into multi-step chains, agents, and retrieval pipelines.
Feedback
clawhub star ollama
Stay updated: clawhub sync