llmfit: Local Hardware Detection and LLM Model Recommendation

Rating: 1 (1 review)
Author: Alex Jones

v0.2.2

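Detects local hardware (RAM, CPU, GPU/VRAM) and recommends the best-fitting local LLM models, with quantization optimization, speed estimates, and fit scoring.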

1 · 867 · 7 current · 7 total

by @alexsjones (Alex Jones) · MIT-0

AI model access · System tools

License

MIT-0

Last updated

2026/2/26

Security scan

VirusTotal: Harmless

OpenClaw: Suspicious (medium confidence)

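The skill's runtime instructions match its stated purpose (detecting hardware and recommending local LLMs), but the install metadata is inconsistent and the Homebrew/cargo install source is unclear. Verify it before installing or running the binary.

Assessment and recommendations

The skill appears to do what it claims (run the llmfit CLI and recommend models), but the install metadata is inconsistent and the Homebrew tap is a third-party source. Before installing or running: 1) ask the maintainer for the upstream source or GitHub repository and homepage; 2) inspect the Homebrew formula or cargo package code in a trusted repository; 3) prefer installing from an official, traceable release (GitHub Releases or crates.io) rather than an anonymous tap; 4) if you must run the binary first, run `llmfit --version` and `llmfit --json system` in a sandbox or container and check the output; 5) do not give this tool credentials or elevated privileges. If you would like help auditing the brew formula or package repository, provide the install URL and I will point out risky patterns. ...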

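Detailed analysis

✓ Purpose and capability

The name and description match the runtime instructions: SKILL.md tells the agent to run the llmfit CLI to detect hardware and generate model recommendations. The required binary (llmfit) is consistent with that purpose, and no unrelated credentials or files are requested.

✓ Instruction scope

The instructions are limited to running llmfit commands (system, recommend) and mapping the output to local providers (Ollama, vLLM, LM Studio). They do not direct the agent to read arbitrary system files or leak secrets. The skill suggests editing openclaw.json to configure models, which is consistent with its goal.

⚠ Install mechanism

The install metadata is inconsistent and potentially risky: SKILL.md and the registry list a Homebrew formula 'AlexsJones/llmfit' (a third-party tap) and a second install entry labeled 'cargo install llmfit' but marked kind: 'node' (the registry also lists 'node'). This mismatch (node vs. cargo labels) is confusing and prevents a clear review of the install source. Homebrew taps from unknown owners should be reviewed before use because they install binaries from a third party.

✓ Credential requirements

The skill requests no environment variables or credentials, which is reasonable for a local hardware detection and recommendation tool.

✓ Persistence and privileges

always is false, and the skill neither requests nor automatically modifies other skills' configuration. It only recommends editing openclaw.json (user-driven). Based on the skill metadata and instructions, no privileged or persistent presence is needed.

Security is layered; review the code before running it.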

MIT-0 terms: free to use, modify, and redistribute, with no attribution required.

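Runtime dependencies

No special dependencies

Version

latest · v0.2.2 · 2026/2/17

Updated the Homebrew install instructions and the formula path in the skill metadata. No functional or documentation changes other than the install-method update.

● Harmless

Install command

Official: npx clawhub@latest install llmfit

Mirror (accelerated): npx clawhub@latest install llmfit --registry https://cn.clawhub-mirror.com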

Skill documentation

Hardware-aware local LLM advisor. Detects your system specs (RAM, CPU, GPU/VRAM) and recommends models that actually fit, with optimal quantization and speed estimates.

When to use (trigger phrases)

Use this skill immediately when the user asks any of:

"what local models can I run?"
"which LLMs fit my hardware?"
"recommend a local model"
"what's the best model for my GPU?"
"can I run Llama 70B locally?"
"configure local models"
"set up Ollama models"
"what models fit my VRAM?"
"help me pick a local model for coding"

Also use this skill when:

The user wants to configure models.providers.ollama or models.providers.lmstudio
The user mentions running models locally and you need to know what fits
A model recommendation is needed and the user has local inference capability (Ollama, vLLM, LM Studio)

Quick start

Detect hardware

llmfit --json system

Returns JSON with CPU, RAM, GPU name, VRAM, multi-GPU info, and whether memory is unified (Apple Silicon).

Get top recommendations

llmfit recommend --json --limit 5

Returns the top 5 models ranked by a composite score (quality, speed, fit, context) with optimal quantization for the detected hardware.

Filter by use case

llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 3
llmfit recommend --json --use-case chat --limit 3

Valid use cases: general, coding, reasoning, chat, multimodal, embedding.

Filter by minimum fit level

llmfit recommend --json --min-fit good --limit 10

Valid fit levels (best to worst): perfect, good, marginal.
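The flags shown above can also be assembled programmatically. The following is a minimal Python sketch, not part of llmfit itself: `build_recommend_cmd` is a hypothetical helper, the allowed values are simply the ones listed in this document, and combining `--use-case` with `--min-fit` in a single call is an assumption, since the examples above only show them separately.

```python
from typing import List, Optional

# Values listed in this document; llmfit itself may accept others.
USE_CASES = {"general", "coding", "reasoning", "chat", "multimodal", "embedding"}
MIN_FIT_LEVELS = {"perfect", "good", "marginal"}


def build_recommend_cmd(
    limit: int = 5,
    use_case: Optional[str] = None,
    min_fit: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `llmfit recommend --json` (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    cmd = ["llmfit", "recommend", "--json"]
    if use_case is not None:
        if use_case not in USE_CASES:
            raise ValueError(f"unknown use case: {use_case!r}")
        cmd += ["--use-case", use_case]
    if min_fit is not None:
        if min_fit not in MIN_FIT_LEVELS:
            raise ValueError(f"unknown fit level: {min_fit!r}")
        cmd += ["--min-fit", min_fit]
    cmd += ["--limit", str(limit)]
    return cmd


print(build_recommend_cmd(limit=3, use_case="coding"))
```

Returning a list rather than a shell string lets the result be passed straight to `subprocess.run` without quoting concerns.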

Understanding the output

System JSON

{
  "system": {
    "cpu_name": "Apple M2 Max",
    "cpu_cores": 12,
    "total_ram_gb": 32.0,
    "available_ram_gb": 24.5,
    "has_gpu": true,
    "gpu_name": "Apple M2 Max",
    "gpu_vram_gb": 32.0,
    "gpu_count": 1,
    "backend": "Metal",
    "unified_memory": true
  }
}
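As an illustration of consuming this output, here is a small Python sketch that turns the system JSON into a one-line hardware summary. The field names come from the sample above; `summarize_system` and the summary wording are hypothetical, and the sketch assumes every field in the sample is present.

```python
import json

# The sample system JSON from this document.
SAMPLE = """
{
  "system": {
    "cpu_name": "Apple M2 Max",
    "cpu_cores": 12,
    "total_ram_gb": 32.0,
    "available_ram_gb": 24.5,
    "has_gpu": true,
    "gpu_name": "Apple M2 Max",
    "gpu_vram_gb": 32.0,
    "gpu_count": 1,
    "backend": "Metal",
    "unified_memory": true
  }
}
"""


def summarize_system(raw: str) -> str:
    """Render the `system` object as a short human-readable summary."""
    s = json.loads(raw)["system"]
    parts = [
        f'{s["cpu_name"]} ({s["cpu_cores"]} cores)',
        f'{s["total_ram_gb"]:.1f} GB RAM ({s["available_ram_gb"]:.1f} GB available)',
    ]
    if s.get("has_gpu"):
        # On unified-memory systems the GPU shares system RAM.
        kind = "unified memory" if s.get("unified_memory") else "VRAM"
        parts.append(
            f'{s["gpu_count"]}x {s["gpu_name"]}, '
            f'{s["gpu_vram_gb"]:.1f} GB {kind}, {s["backend"]} backend'
        )
    else:
        parts.append("no GPU detected")
    return "; ".join(parts)


print(summarize_system(SAMPLE))
```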

Recommendation JSON

Each model in the models array includes:

Field	Meaning
`name`	HuggingFace model ID (e.g. `meta-llama/Llama-3.1-8B-Instruct`)
`provider`	Model provider (Meta, Alibaba, Google, etc.)
`params_b`	Parameter count in billions
`score`	Composite score 0–100 (higher is better)
`score_components`	Breakdown: `quality`, `speed`, `fit`, `context` (each 0–100)
`fit_level`	`Perfect`, `Good`, `Marginal`, or `TooTight`
`run_mode`	`GPU`, `CPU+GPU Offload`, or `CPU Only`
`best_quant`	Optimal quantization for the hardware (e.g. `Q5_K_M`, `Q4_K_M`)
`estimated_tps`	Estimated tokens per second
`memory_required_gb`	VRAM/RAM needed at this quantization
`memory_available_gb`	Available VRAM/RAM detected
`utilization_pct`	How much of available memory the model uses
`use_case`	What the model is designed for
`context_length`	Maximum context window
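To show how these fields might be presented to a user, here is a short Python sketch that formats one entry of the models array. `format_model` is a hypothetical helper, and the values in `EXAMPLE` are invented for illustration; they are not real llmfit output.

```python
def format_model(m: dict) -> str:
    """Format one recommendation entry as a single line (hypothetical helper)."""
    return (
        f'{m["name"]} [{m["fit_level"]}, {m["run_mode"]}] '
        f'score {m["score"]:.0f}/100, {m["best_quant"]}, '
        f'~{m["estimated_tps"]:.1f} tok/s, '
        f'{m["memory_required_gb"]:.1f} of {m["memory_available_gb"]:.1f} GB'
    )


# Illustrative values only, not measured or produced by llmfit.
EXAMPLE = {
    "name": "meta-llama/Llama-3.1-8B-Instruct",
    "score": 84,
    "fit_level": "Perfect",
    "run_mode": "GPU",
    "best_quant": "Q5_K_M",
    "estimated_tps": 42.0,
    "memory_required_gb": 6.1,
    "memory_available_gb": 32.0,
}

print(format_model(EXAMPLE))
```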

Fit levels explained

Perfect: Model fits comfortably with room to spare. Ideal choice.
Good: Model fits but uses most available memory. Will work well.
Marginal: Model barely fits. May work but expect slower performance or reduced context.
TooTight: Model does not fit. Do not recommend.
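The fit levels form an ordered scale, so client-side filtering is straightforward. The sketch below, in Python, drops `TooTight` entries and optionally applies a stricter minimum. `recommendable` is a hypothetical helper; treating an unrecognized fit level as not recommendable is an assumption made here for safety.

```python
from typing import Dict, List

# Best to worst, as described above.
FIT_ORDER = ["Perfect", "Good", "Marginal", "TooTight"]
RANK = {level: i for i, level in enumerate(FIT_ORDER)}


def recommendable(models: List[Dict], min_fit: str = "Marginal") -> List[Dict]:
    """Keep models at or above `min_fit`; TooTight is never kept."""
    if min_fit not in FIT_ORDER[:-1]:
        raise ValueError(f"min_fit must be one of {FIT_ORDER[:-1]}")
    cutoff = RANK[min_fit]
    # Unknown fit levels rank below everything and are dropped.
    return [
        m for m in models
        if RANK.get(m["fit_level"], len(FIT_ORDER)) <= cutoff
    ]


models = [
    {"name": "a", "fit_level": "Perfect"},
    {"name": "b", "fit_level": "Marginal"},
    {"name": "c", "fit_level": "TooTight"},
]
print([m["name"] for m in recommendable(models)])
```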

Run modes explained

GPU: Full GPU inference. Fastest. Model weights loaded entirely into VRAM.
CPU+GPU Offload: Some layers on GPU, rest in system RAM. Slower than pure GPU.
CPU Only: All inference on CPU using system RAM. Slowest but works without GPU.

Configuring OpenClaw with results

After getting recommendations, configure the user's local model provider.

For Ollama

Map the HuggingFace model name to its Ollama tag. Common mappings:

llmfit name	Ollama tag
`meta-llama/Llama-3.1-8B-Instruct`	`llama3.1:8b`
`meta-llama/Llama-3.3-70B-Instruct`	`llama3.3:70b`
`Qwen/Qwen2.5-Coder-7B-Instruct`	`qwen2.5-coder:7b`
`Qwen/Qwen2.5-72B-Instruct`	`qwen2.5:72b`
`deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct`	`deepseek-coder-v2:16b`
`deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`	`deepseek-r1:32b`
`google/gemma-2-9b-it`	`gemma2:9b`
`mistralai/Mistral-7B-Instruct-v0.3`	`mistral:7b`
`microsoft/Phi-3-mini-4k-instruct`	`phi3:mini`
`microsoft/Phi-4-mini-instruct`	`phi4-mini`
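The mapping table above translates directly into a lookup. A minimal Python sketch follows; `to_ollama_tag` is a hypothetical helper, and returning `None` for names outside the table is a design choice made here so the caller can fall back to asking the user rather than guessing a tag.

```python
from typing import Optional

# The common mappings listed in this document.
OLLAMA_TAGS = {
    "meta-llama/Llama-3.1-8B-Instruct": "llama3.1:8b",
    "meta-llama/Llama-3.3-70B-Instruct": "llama3.3:70b",
    "Qwen/Qwen2.5-Coder-7B-Instruct": "qwen2.5-coder:7b",
    "Qwen/Qwen2.5-72B-Instruct": "qwen2.5:72b",
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct": "deepseek-coder-v2:16b",
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B": "deepseek-r1:32b",
    "google/gemma-2-9b-it": "gemma2:9b",
    "mistralai/Mistral-7B-Instruct-v0.3": "mistral:7b",
    "microsoft/Phi-3-mini-4k-instruct": "phi3:mini",
    "microsoft/Phi-4-mini-instruct": "phi4-mini",
}


def to_ollama_tag(hf_name: str) -> Optional[str]:
    """Return the Ollama tag for a HuggingFace model ID, or None if unmapped."""
    return OLLAMA_TAGS.get(hf_name)


print(to_ollama_tag("google/gemma-2-9b-it"))
```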

Then update openclaw.json, placing the mapped Ollama tag after the ollama/ prefix:

{
  "models": {
    "providers": {
      "ollama": {
        "models": ["ollama/"]
      }
    }
  }
}

And optionally set as default:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/"
      }
    }
  }
}
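The two config fragments above can be applied as a dictionary merge. This Python sketch is one possible way to do it, not OpenClaw's own tooling: `add_ollama_model` is a hypothetical helper, and it assumes the model identifier is the `ollama/` prefix followed by the Ollama tag. It works on an in-memory dict and leaves reading and writing openclaw.json to the caller.

```python
import copy
import json


def add_ollama_model(config: dict, tag: str, make_default: bool = False) -> dict:
    """Return a copy of `config` with `ollama/<tag>` registered (hypothetical helper)."""
    cfg = copy.deepcopy(config)  # never mutate the caller's config
    model_id = f"ollama/{tag}"   # assumed identifier format
    models = (
        cfg.setdefault("models", {})
        .setdefault("providers", {})
        .setdefault("ollama", {})
        .setdefault("models", [])
    )
    if model_id not in models:
        models.append(model_id)
    if make_default:
        model_cfg = (
            cfg.setdefault("agents", {})
            .setdefault("defaults", {})
            .setdefault("model", {})
        )
        model_cfg["primary"] = model_id
    return cfg


updated = add_ollama_model({}, "llama3.1:8b", make_default=True)
print(json.dumps(updated, indent=2))
```

Using `setdefault` keeps any existing providers and agent settings intact, so the helper can be applied to a config that already contains other entries.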

For vLLM / LM Studio

Use the HuggingFace model name directly as the model identifier with the appropriate provider prefix (vllm/ or lmstudio/).
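For example, the identifier can be built by simple concatenation. This is a minimal Python sketch; `provider_model_id` is a hypothetical helper, and it assumes the prefix and model name are joined with no further escaping.

```python
PREFIXED_PROVIDERS = {"vllm", "lmstudio"}


def provider_model_id(provider: str, hf_name: str) -> str:
    """Join a provider prefix and a HuggingFace model ID (hypothetical helper)."""
    if provider not in PREFIXED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    return f"{provider}/{hf_name}"


print(provider_model_id("vllm", "Qwen/Qwen2.5-72B-Instruct"))
```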

Workflow example

When a user asks "what local models can I run?":

Run llmfit --json system to show hardware summary
Run llmfit recommend --json --limit 5 to get top picks
Present the recommendations with scores and fit levels
If the user wants to configure one, map it to the appropriate Ollama/vLLM/LM Studio tag
Offer to update openclaw.json with the chosen model

When a user asks for a specific use case like "recommend a coding model":

Run llmfit recommend --json --use-case coding --limit 3
Present the coding-specific recommendations
Offer to pull via Ollama and configure
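The first workflow can be sketched end to end in Python. This is an illustration under stated assumptions, not the skill's implementation: `run_llmfit` and `local_model_report` are hypothetical names, the recommend output is assumed to carry its entries under a top-level `models` key, and the runner is injectable so the logic can be exercised without llmfit installed.

```python
import json
import subprocess
from typing import Callable, List


def run_llmfit(args: List[str]) -> dict:
    """Run the llmfit CLI with `args` and parse its JSON output."""
    proc = subprocess.run(
        ["llmfit", *args], capture_output=True, text=True, check=True
    )
    return json.loads(proc.stdout)


def local_model_report(
    run: Callable[[List[str]], dict] = run_llmfit, limit: int = 5
) -> List[str]:
    """Steps 1 to 3 of the workflow: detect hardware, recommend, present."""
    system = run(["--json", "system"])["system"]
    models = run(["recommend", "--json", "--limit", str(limit)])["models"]
    lines = [f'Hardware: {system["cpu_name"]}, {system["total_ram_gb"]} GB RAM']
    for m in models:
        if m["fit_level"] == "TooTight":
            continue  # never recommend models that do not fit
        lines.append(f'{m["name"]} (score {m["score"]}, fit {m["fit_level"]})')
    return lines
```

With llmfit on the PATH, `local_model_report()` would call the real CLI; passing a stub as `run` makes the function testable offline.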

Notes

llmfit detects NVIDIA GPUs (via nvidia-smi), AMD GPUs (via rocm-smi), and Apple Silicon (unified memory).
Multi-GPU setups aggregate VRAM across cards automatically.
The best_quant field gives the optimal quantization for the detected hardware; higher-precision quantizations (Q6_K, Q8_0) give better quality when VRAM allows.
Speed estimates (estimated_tps) are approximate and vary by hardware and quantization.
Models with fit_level: "TooTight" should never be recommended to users.

Data source: ClawHub · Chinese localization: 龙虾技能库


