# Smart Router for Ollama
**v1.0.0**

Intelligent task routing between local and cloud Ollama LLM instances. Use when the user wants cost-efficient AI responses by routing simple tasks to a local Ollama model and complex tasks to a more powerful remote/cloud Ollama instance. Automatically classifies task complexity, detects system capabilities, and delegates to the appropriate model tier. Use for any request where you want to balance latency against capability, or when explicitly asked to use smart routing, local-first routing, or Ollama model selection.
## Smart Router

Routes tasks between a local Ollama instance (fast, cheap) and a remote/cloud Ollama instance (more capable), based on task complexity classification and system capabilities.
## Quick Start

```bash
# 1. Profile your system
python scripts/system_profiler.py

# 2. Check that endpoints are healthy
python scripts/health_check.py

# 3. Route a task
python scripts/route.py "What is quantum computing?"
```
## How It Works

```
User request
    ↓
System profiler (detects compatible models)
    ↓
Health check (verifies endpoints are up)
    ↓
Classify task (1-5 complexity score)
    ↓
├─ Score 1-2 → Local Ollama (fast, cheap)
├─ Score 3-5 → Cloud Ollama (powerful)
└─ Specialist match → Dedicated model
    ↓
Verify model is available (fall back if not)
    ↓
Stream response
```
## Classification Scale

| Score | Complexity | Examples | Routed to |
|-------|------------|----------|-----------|
| 1 | Simple | "What is 2+2?", "Define entropy" | Local |
| 2 | Basic | "Write hello world in Python" | Local |
| 3 | Complex | "Debug this error", "Compare X vs Y" | Cloud |
| 4 | Deep | "Design a system", "Research topic" | Cloud |
| 5 | Expert | "Build from scratch", "Multi-file project" | Cloud |

## File Structure

```
smart-router/
├── SKILL.md                  # This file
├── __init__.py               # Python package interface
├── requirements.txt          # Dependencies
│
├── config/
│   ├── router.yaml           # Main configuration
│   └── system_profile.json   # Auto-generated system specs
│
├── scripts/
│   ├── classify.py           # Task complexity classifier
│   ├── execute.py            # Ollama API client
│   ├── route.py              # Main routing logic
│   ├── system_profiler.py    # Hardware detection
│   └── health_check.py       # Endpoint health verification
│
├── tests/
│   └── test_classifier.py    # Test suite
│
└── references/
    └── classifier-prompt.txt # LLM fallback prompt
```
## Configuration

Edit `config/router.yaml`:
```yaml
# Local Ollama (your machine)
local:
  model: "llama3.2"
  base_url: "http://localhost:11434"

# Cloud Ollama (remote server)
cloud:
  model: "qwen2.5:14b"
  base_url: "http://192.168.1.100:11434"

# Tasks scoring >= this go to cloud
threshold: 3

# Domain specialists (checked first)
specialists:
  code:
    model: "codellama:34b"
    base_url: "http://192.168.1.100:11434"
    triggers: ["code review", "refactor"]

# Performance settings
performance:
  timeout_seconds: 60
  stream_responses: true
  retry_attempts: 2

# Caching
cache:
  enabled: true
  db_path: "cache/router.db"
  ttl_seconds: 86400
```
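To illustrate how these settings drive the routing decision, here is a minimal sketch that loads the file and applies the threshold (assumes PyYAML is installed; key names follow the config above, and `pick_endpoint` is a hypothetical helper, not part of the skill's API):

```python
import yaml

# Load the router configuration (keys follow config/router.yaml above)
with open("config/router.yaml") as f:
    config = yaml.safe_load(f)

def pick_endpoint(score: int) -> dict:
    """Hypothetical helper: pick local or cloud settings from a complexity score."""
    tier = "cloud" if score >= config["threshold"] else "local"
    return config[tier]

endpoint = pick_endpoint(4)
print(endpoint["model"], endpoint["base_url"])  # qwen2.5:14b http://192.168.1.100:11434
```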
## Usage

### CLI

```bash
# Basic routing
python scripts/route.py "What is the capital of France?"

# With profiling (updates the system profile)
python scripts/route.py "Debug this error" --profile

# Custom config
python scripts/route.py "Design a system" --config config/my-router.yaml

# No streaming (wait for the full response)
python scripts/route.py "Summarize this" --no-stream

# Health-check all endpoints
python scripts/health_check.py

# Manual classification
python scripts/classify.py "Write a function"
# Output: "2:basic-task"
```
### Python API

```python
from smart_router import SmartRouter

# Initialize
router = SmartRouter()

# Route with streaming
for chunk in router.route("Explain quantum computing"):
    print(chunk, end='')

# Classify only
score, reason = router.classify("Debug this code")
print(f"Complexity: {score}/5, Reason: {reason}")

# Get configuration
config = router.get_config()
print(f"Local model: {config['local']['model']}")
```
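For reference, `scripts/execute.py` is described above as the Ollama API client. A minimal sketch of streaming a completion from an Ollama endpoint via its standard `/api/generate` route, using the `requests` library (an illustration of the protocol, not the skill's actual implementation):

```python
import json
import requests

def stream_generate(base_url: str, model: str, prompt: str, timeout: int = 60):
    """Yield response chunks from an Ollama /api/generate endpoint as they arrive."""
    resp = requests.post(
        f"{base_url}/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=timeout,
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)          # Ollama streams newline-delimited JSON objects
        yield chunk.get("response", "")   # each object carries a piece of the reply
        if chunk.get("done"):
            break

# Example: stream from the local endpoint defined in router.yaml
for piece in stream_generate("http://localhost:11434", "llama3.2", "Explain quantum computing"):
    print(piece, end="")
```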
## Workflow
### System Profiling

Run once (or whenever the hardware changes):

```bash
python scripts/system_profiler.py
```

This creates `config/system_profile.json` with the following fields (an example follows the list):

- Total/available RAM
- GPU detection (VRAM, name)
- CPU cores
- Compatible model list
- Recommended local model
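A hypothetical example of what the profile might look like (field names and values are illustrative; the actual schema is whatever `system_profiler.py` writes):

```json
{
  "total_ram_gb": 16,
  "available_ram_gb": 11.2,
  "gpu": { "name": "NVIDIA RTX 3060", "vram_gb": 12 },
  "cpu_cores": 8,
  "compatible_models": ["llama3.2", "qwen2.5:7b", "codellama:7b"],
  "recommended_local_model": "llama3.2"
}
```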
### Health Check

Verify the endpoints before use:

```bash
python scripts/health_check.py
```

Checks (see the sketch after this list):

- Ollama version
- Available models
- Response latency
- Connection status
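A minimal sketch of such a probe against one endpoint, using Ollama's standard `/api/version` and `/api/tags` routes (illustrative only; `check_endpoint` is a hypothetical helper, not the script's exact implementation):

```python
import time
import requests

def check_endpoint(name: str, base_url: str, timeout: int = 5) -> dict:
    """Probe an Ollama endpoint for its version, installed models, and latency."""
    start = time.monotonic()
    try:
        version = requests.get(f"{base_url}/api/version", timeout=timeout).json()["version"]
        tags = requests.get(f"{base_url}/api/tags", timeout=timeout).json()
        latency_ms = int((time.monotonic() - start) * 1000)
        return {"name": name, "status": "healthy", "version": version,
                "models": [m["name"] for m in tags["models"]], "latency_ms": latency_ms}
    except requests.RequestException as exc:
        return {"name": name, "status": "unreachable", "error": str(exc)}

print(check_endpoint("local", "http://localhost:11434"))
```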
### Routing

When you submit a task:

1. **Specialist check** — match against specialist triggers
2. **Classification** — pattern-based scoring (1-5)
3. **Model selection** — local (1-2) or cloud (3-5), as sketched after this list
4. **Availability check** — verify the model exists in Ollama
5. **Fallback** — use a compatible model if the preferred one is unavailable
6. **Execution** — stream the response from the selected model
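Roughly, the score-to-tier decision can be reproduced with the public API shown earlier (a hedged sketch: it assumes `get_config()` exposes the top-level `threshold` key from `router.yaml`, and it skips the specialist, availability, and fallback steps that `route()` handles internally):

```python
from smart_router import SmartRouter

router = SmartRouter()
config = router.get_config()

task = "Design a system for log aggregation"
score, reason = router.classify(task)

# Scores at or above the configured threshold go to the cloud endpoint
tier = "cloud" if score >= config["threshold"] else "local"
print(f"score={score} ({reason}) -> {tier}: {config[tier]['model']}")
```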
## Features

### Pattern-Based Classification

Uses regex patterns (not LLM calls) for speed:

- 30ms classification time
- Zero token cost
- Handles false positives ("zip code" ≠ code task)
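A minimal sketch of that idea, with a sample false-positive guard (the patterns and scores here are made up for illustration; the real rules live in `scripts/classify.py`):

```python
import re

# Illustrative rules only; the real classifier has its own pattern set
PATTERNS = [
    (re.compile(r"\b(design|architect|research)\b", re.I), 4),
    (re.compile(r"\b(debug|refactor|compare)\b", re.I), 3),
    (re.compile(r"\b(write|implement)\b.*\b(function|script)\b", re.I), 2),
]
FALSE_POSITIVES = [re.compile(r"\bzip code\b", re.I)]  # "zip code" is not a coding task

def classify(task: str) -> int:
    """Return a 1-5 complexity score from regex rules; default to 1 (simple)."""
    if any(p.search(task) for p in FALSE_POSITIVES):
        return 1
    return max((score for pattern, score in PATTERNS if pattern.search(task)), default=1)

print(classify("Write a function"))     # 2
print(classify("What is my zip code?")) # 1
```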
### System-Aware Model Selection

Automatically detects what your system can run (a rough sketch follows the list):

- No GPU → filters to CPU-compatible models
- 8GB RAM → excludes 70B models
- GPU available → prioritizes GPU-accelerated models
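As an illustration of that filtering step (field names mirror the hypothetical profile shown earlier, and the per-model memory requirements are invented for the example):

```python
import json

# Rough, made-up memory requirements in GB per model tag
MODEL_REQUIREMENTS = {"llama3.2": 4, "qwen2.5:7b": 6, "qwen2.5:14b": 12, "llama3:70b": 48}

def compatible_models(profile: dict) -> list[str]:
    """Keep models that fit in VRAM when a GPU exists, otherwise in available RAM."""
    gpu = profile.get("gpu") or {}
    budget_gb = gpu.get("vram_gb") or profile["available_ram_gb"]
    return [model for model, need_gb in MODEL_REQUIREMENTS.items() if need_gb <= budget_gb]

with open("config/system_profile.json") as f:
    profile = json.load(f)

print(compatible_models(profile))  # e.g. ['llama3.2', 'qwen2.5:7b', 'qwen2.5:14b']
```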
### Health Monitoring

Pre-flight checks prevent routing to dead endpoints:

```
✓ local | Status: healthy     | Latency: 45ms | Models: 5
✗ cloud | Status: unreachable | Error: Connection refused
```
### Automatic Fallbacks

- **Model fallback** — if the configured model is unavailable, a compatible alternative is picked
- **Endpoint fallback** — if the cloud fails, the request is retried locally
- **Error handling** — never crashes, always returns something

### Cost Tracking