# Smart Router for Ollama
**v1.0.0**

Intelligent task routing between local and cloud Ollama LLM instances. Use when the user wants cost-efficient AI responses by routing simple tasks to a local Ollama model and complex tasks to a more powerful remote/cloud Ollama instance. Automatically classifies task complexity, detects system capabilities, and delegates to the appropriate model tier. Use for any request where you want to balance latency against capability, or when explicitly asked to use smart routing, local-first routing, or Ollama model selection.
## Smart Router

Routes tasks between a local Ollama instance (fast, cheap) and a remote/cloud Ollama instance (more capable), based on task complexity classification and system capabilities.
## Quick Start

```bash
# 1. Profile your system
python scripts/system_profiler.py

# 2. Check that endpoints are healthy
python scripts/health_check.py

# 3. Route a task
python scripts/route.py "What is quantum computing?"
```
## How It Works

```
User request
    ↓
System profiler (detects compatible models)
    ↓
Health check (verifies endpoints are up)
    ↓
Classify task (1-5 complexity score)
    ↓
├─ Score 1-2 → Local Ollama (fast, cheap)
├─ Score 3-5 → Cloud Ollama (powerful)
└─ Specialist match → Dedicated model
    ↓
Verify model is available (fall back if not)
    ↓
Stream response
```
## Classification Scale

| Score | Complexity | Examples | Routed to |
|-------|------------|----------|-----------|
| 1 | Simple | "What is 2+2?", "Define entropy" | Local |
| 2 | Basic | "Write hello world in Python" | Local |
| 3 | Complex | "Debug this error", "Compare X vs Y" | Cloud |
| 4 | Deep | "Design a system", "Research topic" | Cloud |
| 5 | Expert | "Build from scratch", "Multi-file project" | Cloud |

## File Structure

```
smart-router/
├── SKILL.md                  # This file
├── __init__.py               # Python package interface
├── requirements.txt          # Dependencies
│
├── config/
│   ├── router.yaml           # Main configuration
│   └── system_profile.json   # Auto-generated system specs
│
├── scripts/
│   ├── classify.py           # Task complexity classifier
│   ├── execute.py            # Ollama API client
│   ├── route.py              # Main routing logic
│   ├── system_profiler.py    # Hardware detection
│   └── health_check.py       # Endpoint health verification
│
├── tests/
│   └── test_classifier.py    # Test suite
│
└── references/
    └── classifier-prompt.txt # LLM fallback prompt
```
## Configuration

Edit `config/router.yaml`:
```yaml
# Local Ollama (your machine)
local:
  model: "llama3.2"
  base_url: "http://localhost:11434"

# Cloud Ollama (remote server)
cloud:
  model: "qwen2.5:14b"
  base_url: "http://192.168.1.100:11434"

# Tasks scoring >= this go to cloud
threshold: 3

# Domain specialists (checked first)
specialists:
  code:
    model: "codellama:34b"
    base_url: "http://192.168.1.100:11434"
    triggers: ["code review", "refactor"]

# Performance settings
performance:
  timeout_seconds: 60
  stream_responses: true
  retry_attempts: 2

# Caching
cache:
  enabled: true
  db_path: "cache/router.db"
  ttl_seconds: 86400
```
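To illustrate how these settings drive the routing decision, here is a minimal sketch that loads the file and applies the threshold (assumes PyYAML is installed; key names follow the config above, and `pick_endpoint` is a hypothetical helper, not part of the skill's API):

```python
import yaml

# Load the router configuration (keys follow config/router.yaml above)
with open("config/router.yaml") as f:
    config = yaml.safe_load(f)

def pick_endpoint(score: int) -> dict:
    """Hypothetical helper: pick local or cloud settings from a complexity score."""
    tier = "cloud" if score >= config["threshold"] else "local"
    return config[tier]

endpoint = pick_endpoint(4)
print(endpoint["model"], endpoint["base_url"])  # qwen2.5:14b http://192.168.1.100:11434
```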
## Usage

### CLI

```bash
# Basic routing
python scripts/route.py "What is the capital of France?"

# With profiling (updates the system profile)
python scripts/route.py "Debug this error" --profile

# Custom config
python scripts/route.py "Design a system" --config config/my-router.yaml

# No streaming (wait for the full response)
python scripts/route.py "Summarize this" --no-stream

# Health-check all endpoints
python scripts/health_check.py

# Manual classification
python scripts/classify.py "Write a function"
# Output: "2:basic-task"
```
### Python API

```python
from smart_router import SmartRouter

# Initialize
router = SmartRouter()

# Route with streaming
for chunk in router.route("Explain quantum computing"):
    print(chunk, end='')

# Classify only
score, reason = router.classify("Debug this code")
print(f"Complexity: {score}/5, Reason: {reason}")

# Get configuration
config = router.get_config()
print(f"Local model: {config['local']['model']}")
```
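For reference, `scripts/execute.py` is described above as the Ollama API client. A minimal sketch of streaming a completion from an Ollama endpoint via its standard `/api/generate` route, using the `requests` library (an illustration of the protocol, not the skill's actual implementation):

```python
import json
import requests

def stream_generate(base_url: str, model: str, prompt: str, timeout: int = 60):
    """Yield response chunks from an Ollama /api/generate endpoint as they arrive."""
    resp = requests.post(
        f"{base_url}/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=timeout,
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)          # Ollama streams newline-delimited JSON objects
        yield chunk.get("response", "")   # each object carries a piece of the reply
        if chunk.get("done"):
            break

# Example: stream from the local endpoint defined in router.yaml
for piece in stream_generate("http://localhost:11434", "llama3.2", "Explain quantum computing"):
    print(piece, end="")
```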
## Workflow
### System Profiling

Run once (or whenever the hardware changes):

```bash
python scripts/system_profiler.py
```

This creates `config/system_profile.json` with the following fields (an example follows the list):

- Total/available RAM
- GPU detection (VRAM, name)
- CPU cores
- Compatible model list
- Recommended local model
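A hypothetical example of what the profile might look like (field names and values are illustrative; the actual schema is whatever `system_profiler.py` writes):

```json
{
  "total_ram_gb": 16,
  "available_ram_gb": 11.2,
  "gpu": { "name": "NVIDIA RTX 3060", "vram_gb": 12 },
  "cpu_cores": 8,
  "compatible_models": ["llama3.2", "qwen2.5:7b", "codellama:7b"],
  "recommended_local_model": "llama3.2"
}
```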
### Health Check

Verify the endpoints before use:

```bash
python scripts/health_check.py
```

Checks (see the sketch after this list):

- Ollama version
- Available models
- Response latency
- Connection status
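A minimal sketch of such a probe against one endpoint, using Ollama's standard `/api/version` and `/api/tags` routes (illustrative only; `check_endpoint` is a hypothetical helper, not the script's exact implementation):

```python
import time
import requests

def check_endpoint(name: str, base_url: str, timeout: int = 5) -> dict:
    """Probe an Ollama endpoint for its version, installed models, and latency."""
    start = time.monotonic()
    try:
        version = requests.get(f"{base_url}/api/version", timeout=timeout).json()["version"]
        tags = requests.get(f"{base_url}/api/tags", timeout=timeout).json()
        latency_ms = int((time.monotonic() - start) * 1000)
        return {"name": name, "status": "healthy", "version": version,
                "models": [m["name"] for m in tags["models"]], "latency_ms": latency_ms}
    except requests.RequestException as exc:
        return {"name": name, "status": "unreachable", "error": str(exc)}

print(check_endpoint("local", "http://localhost:11434"))
```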
### Routing

When you submit a task:

1. **Specialist check** — match against specialist triggers
2. **Classification** — pattern-based scoring (1-5)
3. **Model selection** — local (1-2) or cloud (3-5), as sketched after this list
4. **Availability check** — verify the model exists in Ollama
5. **Fallback** — use a compatible model if the preferred one is unavailable
6. **Execution** — stream the response from the selected model
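Roughly, the score-to-tier decision can be reproduced with the public API shown earlier (a hedged sketch: it assumes `get_config()` exposes the top-level `threshold` key from `router.yaml`, and it skips the specialist, availability, and fallback steps that `route()` handles internally):

```python
from smart_router import SmartRouter

router = SmartRouter()
config = router.get_config()

task = "Design a system for log aggregation"
score, reason = router.classify(task)

# Scores at or above the configured threshold go to the cloud endpoint
tier = "cloud" if score >= config["threshold"] else "local"
print(f"score={score} ({reason}) -> {tier}: {config[tier]['model']}")
```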
## Features

### Pattern-Based Classification

Uses regex patterns (not LLM calls) for speed:

- 30ms classification time
- Zero token cost
- Handles false positives ("zip code" ≠ code task)
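A minimal sketch of that idea, with a sample false-positive guard (the patterns and scores here are made up for illustration; the real rules live in `scripts/classify.py`):

```python
import re

# Illustrative rules only; the real classifier has its own pattern set
PATTERNS = [
    (re.compile(r"\b(design|architect|research)\b", re.I), 4),
    (re.compile(r"\b(debug|refactor|compare)\b", re.I), 3),
    (re.compile(r"\b(write|implement)\b.*\b(function|script)\b", re.I), 2),
]
FALSE_POSITIVES = [re.compile(r"\bzip code\b", re.I)]  # "zip code" is not a coding task

def classify(task: str) -> int:
    """Return a 1-5 complexity score from regex rules; default to 1 (simple)."""
    if any(p.search(task) for p in FALSE_POSITIVES):
        return 1
    return max((score for pattern, score in PATTERNS if pattern.search(task)), default=1)

print(classify("Write a function"))     # 2
print(classify("What is my zip code?")) # 1
```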
### System-Aware Model Selection

Automatically detects what your system can run (a rough sketch follows the list):

- No GPU → filters to CPU-compatible models
- 8GB RAM → excludes 70B models
- GPU available → prioritizes GPU-accelerated models
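As an illustration of that filtering step (field names mirror the hypothetical profile shown earlier, and the per-model memory requirements are invented for the example):

```python
import json

# Rough, made-up memory requirements in GB per model tag
MODEL_REQUIREMENTS = {"llama3.2": 4, "qwen2.5:7b": 6, "qwen2.5:14b": 12, "llama3:70b": 48}

def compatible_models(profile: dict) -> list[str]:
    """Keep models that fit in VRAM when a GPU exists, otherwise in available RAM."""
    gpu = profile.get("gpu") or {}
    budget_gb = gpu.get("vram_gb") or profile["available_ram_gb"]
    return [model for model, need_gb in MODEL_REQUIREMENTS.items() if need_gb <= budget_gb]

with open("config/system_profile.json") as f:
    profile = json.load(f)

print(compatible_models(profile))  # e.g. ['llama3.2', 'qwen2.5:7b', 'qwen2.5:14b']
```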
### Health Monitoring

Pre-flight checks prevent routing to dead endpoints:

```
✓ local | Status: healthy     | Latency: 45ms | Models: 5
✗ cloud | Status: unreachable | Error: Connection refused
```
### Automatic Fallbacks

- **Model fallback** — if the configured model is unavailable, a compatible alternative is picked
- **Endpoint fallback** — if the cloud fails, the request is retried locally
- **Error handling** — never crashes, always returns something

### Cost Tracking