Upstage OCR

v1.0.0

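Extract plain text with word-level bounding-box coordinates from images and scanned documents using the Upstage OCR API. Use when the user asks to OCR a document, for example...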


by @upstage-deployment (Upstage Deployment)


Runtime dependencies

No special dependencies

Install commands

Official: npx clawhub@latest install upstage-ocr

Mirror: npx clawhub@latest install upstage-ocr --registry https://cn.longxiaskill.com


Skill documentation

Upstage OCR extracts word-level text with bounding-box coordinates from images and scanned documents.

Quick start

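import os
import requests

# Send the file to the sync endpoint; the API key is read from the environment.
with open("scan.pdf", "rb") as f:
    response = requests.post(
        "https://api.upstage.ai/v1/document-digitization",
        headers={"Authorization": f"Bearer {os.environ['UPSTAGE_API_KEY']}"},
        files={"document": f},
        data={"model": "ocr"},
    )
response.raise_for_status()  # fail on HTTP errors instead of parsing an error body
result = response.json()
print(result["pages"][0]["text"])  # text of the first page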

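API key: always read it from os.environ["UPSTAGE_API_KEY"]. Get your key at console.upstage.ai.

Endpoints

Mode | Endpoint | Max pages | Max file size
-----|------|------|------
Sync | POST /v1/document-digitization | 100 | 50 MB
Async | POST /v1/document-digitization/async | 1000 | 50 MB

Request format: multipart/form-data
Sync: returns the result in the response body (5-minute timeout).
Async: returns a request_id; poll the status and download the result of each batch (10 pages per batch).

Parameters

Parameter | Type | Required | Description
-----|------|------|------
model | string | Yes | ocr (alias: ocr-250904)
document | file | Yes | The document file to process
schema | string | No | clova or google (for migration)

Limits

Item | Sync | Async
-----|------|------
Max pages | 100 | 1000
Max file size | 50 MB | 50 MB
Max pixels per page | 200,000,000 | 200,000,000

Choose sync for documents of ≤ 100 pages that process quickly (≤ 5 minutes). Choose async for documents of up to 1000 pages, when you are able to poll, or when sync times out.

Supported formats

JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX, HWP, HWPX

Supported languages

Fully supported: alphabetic scripts, Korean, Chinese characters
Partially supported: Katakana, Hiragana
In testing: Simplified Chinese

Response structure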

{
    "api": "2.0",
    "model": "ocr-250904",
    "pages": [
        {
            "id": 0,
            "text": "full extracted text",
            "words": [
                {
                    "id": 0,
                    "text": "word",
                    "bounding_box": {
                        "vertices": [
                            {"x": 0.12, "y": 0.05},
                            {"x": 0.25, "y": 0.05},
                            {"x": 0.25, "y": 0.08},
                            {"x": 0.12, "y": 0.08}
                        ]
                    },
                    "confidence": 0.98
                }
            ]
        }
    ],
    "usage": {"pages": 1}
}
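
Each word carries its box as four vertices. Many downstream steps (cropping, highlighting, layout grouping) are easier with an axis-aligned rectangle, so the following is a minimal sketch of that conversion. The helper name bbox_rect is hypothetical, and the sample word simply reuses the values from the response structure above; the sketch makes no assumption about whether coordinates are normalized or in pixels.

```python
def bbox_rect(word):
    """Return (x_min, y_min, x_max, y_max) for one entry of page["words"]."""
    vertices = word["bounding_box"]["vertices"]
    xs = [v["x"] for v in vertices]
    ys = [v["y"] for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


# Sample word with the same vertex values as the response structure above.
sample_word = {
    "id": 0,
    "text": "word",
    "bounding_box": {
        "vertices": [
            {"x": 0.12, "y": 0.05},
            {"x": 0.25, "y": 0.05},
            {"x": 0.25, "y": 0.08},
            {"x": 0.12, "y": 0.08},
        ]
    },
    "confidence": 0.98,
}

x_min, y_min, x_max, y_max = bbox_rect(sample_word)
print(f"{sample_word['text']}: ({x_min}, {y_min}) to ({x_max}, {y_max})")
```

Taking the min and max over all vertices also works for slightly rotated boxes, at the cost of returning the enclosing rectangle rather than the exact quadrilateral.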

Usage examples

Sync: basic OCR

curl -X POST "https://api.upstage.ai/v1/document-digitization" \
     -H "Authorization: Bearer $UPSTAGE_API_KEY" \
     -F "document=@/path/to/image.jpg" \
     -F "model=ocr"

Sync: Python (extract text with coordinates)

import os
import requests


def ocr_document(file_path):
    """Run sync OCR on one file and print each page's text and words."""
    with open(file_path, "rb") as f:
        response = requests.post(
            "https://api.upstage.ai/v1/document-digitization",
            headers={"Authorization": f"Bearer {os.environ['UPSTAGE_API_KEY']}"},
            files={"document": f},
            data={"model": "ocr"},
        )
    response.raise_for_status()  # fail on HTTP errors before parsing
    result = response.json()
    for page in result["pages"]:
        print(f"=== Page {page['id']} ===")
        print(page["text"])
        # One line per word: confidence, text, and bounding box
        for word in page["words"]:
            print(f" [{word['confidence']:.2f}] {word['text']} @ {word['bounding_box']}")
    return result
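
Because every word comes with a confidence value, a result can be screened for words that may need manual review. The following is a minimal sketch of that idea; the function name low_confidence_words and the 0.9 threshold are illustrative assumptions, not values recommended by the API, and the demo data is made up to match the response structure.

```python
def low_confidence_words(result, threshold=0.9):
    """Collect (page id, text, confidence) for words below the threshold.

    The default threshold is an arbitrary illustrative value.
    """
    flagged = []
    for page in result["pages"]:
        for word in page["words"]:
            if word["confidence"] < threshold:
                flagged.append((page["id"], word["text"], word["confidence"]))
    return flagged


# Made-up result shaped like the documented response, for demonstration only.
demo_result = {
    "pages": [
        {
            "id": 0,
            "text": "Total 42",
            "words": [
                {"id": 0, "text": "Total", "confidence": 0.98},
                {"id": 1, "text": "42", "confidence": 0.61},
            ],
        }
    ]
}

for page_id, text, confidence in low_confidence_words(demo_result):
    print(f"page {page_id}: {text!r} ({confidence:.2f})")
```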

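Async: submit, poll, download

Use the async endpoint to process documents of up to 1000 pages. Documents are processed in batches of 10 pages; results are stored for 30 days, and each download URL expires after 15 minutes.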
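
The choice between the two endpoints, and the number of batches to expect from an async job, follows directly from the documented limits. The sketch below encodes them in a hypothetical helper, plan_request. It assumes the page count is already known to the caller and that "50 MB" means 50 × 1024 × 1024 bytes, which the source does not specify.

```python
import math

SYNC_MAX_PAGES = 100
ASYNC_MAX_PAGES = 1000
ASYNC_BATCH_PAGES = 10
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: 50 MB interpreted as 50 MiB


def plan_request(page_count, file_bytes):
    """Pick sync or async from the documented limits (hypothetical helper)."""
    if file_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 50 MB limit")
    if page_count > ASYNC_MAX_PAGES:
        raise ValueError("document exceeds the 1000-page async limit")
    if page_count <= SYNC_MAX_PAGES:
        return {"mode": "sync", "path": "/v1/document-digitization", "batches": None}
    return {
        "mode": "async",
        "path": "/v1/document-digitization/async",
        # Async results arrive in batches of 10 pages each.
        "batches": math.ceil(page_count / ASYNC_BATCH_PAGES),
    }


print(plan_request(page_count=250, file_bytes=8_000_000))
```

This only covers the static limits; a sync request that times out is a separate reason to retry through the async endpoint.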

# 1. Submit
curl -X POST "https://api.upstage.ai/v1/document-digitization/async" \
     -H "Authorization: Bearer $UPSTAGE_API_KEY" \
     -F "document=@large.pdf" \
     -F "model=ocr"
# → {"request_id": "uuid-here"}
# 2. Poll status
curl "https://api.upstage.ai/v1/document-digitization/requests/{request_id}" \
     -H "Authorization: Bearer $UPSTAGE_API_KEY"

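Status values: submitted, started, completed, failed (check failure_message). The completed response contains a download_url for each batch; fetch each one and concatenate the pages to rebuild the full document.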

import os
import time
import requests

api_key = os.environ["UPSTAGE_API_KEY"]
base = "https://api.upstage.ai/v1/document-digitization"
headers = {"Authorization": f"Bearer {api_key}"}

# 1. Submit the document; the file is only needed for the upload itself.
with open("large.pdf", "rb") as f:
    r = requests.post(
        f"{base}/async",
        headers=headers,
        files={"document": f},
        data={"model": "ocr"},
    )
r.raise_for_status()
request_id = r.json()["request_id"]

# 2. Poll every 5 seconds until the request completes or fails.
while True:
    status = requests.get(f"{base}/requests/{request_id}", headers=headers).json()
    if status["status"] == "completed":
        break
    if status["status"] == "failed":
        raise RuntimeError(status.get("failure_message", "unknown failure"))
    time.sleep(5)

# 3. Download each batch and collect its pages.
pages = []
for batch in status.get("batches", []):
    data = requests.get(batch["download_url"]).json()
    pages.extend(data["pages"])
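
Once every batch has been downloaded, the pages still need to be stitched into one document. The sketch below shows one way to do that with a hypothetical helper, merge_batches. It assumes each batch payload has the same "pages" shape as the sync response and that page "id" values are document-wide, so sorting by id restores the original order even if batches are fetched out of order; neither assumption is confirmed by the source.

```python
def merge_batches(batch_results):
    """Combine downloaded batch payloads into one ordered document."""
    pages = []
    for data in batch_results:
        pages.extend(data["pages"])
    # Assumption: page ids are unique across the whole document.
    pages.sort(key=lambda page: page["id"])
    full_text = "\n".join(page["text"] for page in pages)
    return {"pages": pages, "text": full_text}


# Made-up payloads, deliberately out of order, for demonstration only.
demo_batches = [
    {"pages": [{"id": 2, "text": "third page", "words": []}]},
    {"pages": [
        {"id": 0, "text": "first page", "words": []},
        {"id": 1, "text": "second page", "words": []},
    ]},
]

document = merge_batches(demo_batches)
print(document["text"])
```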

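Output file

Default: write to <system temp dir>/<input filename>.ocr.json (for example /tmp/receipt.ocr.json). Use tempfile.gettempdir() for cross-platform code.
Override: if the user specifies an output path, use it.
Always print the resolved absolute path in the response so the user can find the file.

Tips

For documents over 100 pages, switch to the async endpoint (up to 1000 pages).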
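
The output-file convention described above can be sketched as follows. The helper names resolve_output_path and save_result are hypothetical, and the sketch assumes that "<input filename>" means the file name without its extension, as the /tmp/receipt.ocr.json example suggests.

```python
import json
import os
import tempfile


def resolve_output_path(input_path, output_path=None):
    """Return the absolute path the OCR result should be written to."""
    if output_path:
        # A user-specified path always wins.
        return os.path.abspath(output_path)
    # Assumption: drop the input extension, e.g. receipt.pdf -> receipt.ocr.json
    stem = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.abspath(os.path.join(tempfile.gettempdir(), f"{stem}.ocr.json"))


def save_result(result, input_path, output_path=None):
    """Write the result as JSON and print where it went."""
    target = resolve_output_path(input_path, output_path)
    with open(target, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    print(f"OCR result written to {target}")
    return target


print(resolve_output_path("scans/receipt.pdf"))
```

Using tempfile.gettempdir() rather than a hard-coded /tmp keeps the default working on Windows and macOS as well.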

Data source: ClawHub · Chinese localization: 龙虾技能库