NVIDIA LocateAnything-3B vision-language grounding model
v1.0.5
Covers the inference API (detect/ground/point/detect_text/ground_gui), data preparation (JSONL+Recipe 8 tasks), training/fine-tuning, and evaluation. Intended for object detection, visual grounding, GUI recognition, OCR, and similar tasks.
Runtime dependencies
No special dependencies
Install command
Official: npx clawhub@latest install locateanything
Mirror (accelerated): npx clawhub@latest install locateanything --registry https://cn.longxiaskill.com (mirror available)