Align task API and add FunCaptcha support

2026-03-12 19:32:59 +08:00
parent ef9518deeb
commit bc6776979e
33 changed files with 3446 additions and 672 deletions
--- a/README.md
+++ b/README.md
@@ -30,9 +30,15 @@

 | Type | Model | Description |
 |------|------|------|
-| slide | GapDetectorCNN | Slider gap detection (OpenCV first, CNN fallback) |
+| slide | GapDetectorCNN | Slider gap detection (always outputs the gap center x; OpenCV first, CNN fallback) |
 | rotate | RotationRegressor | Rotation angle regression (sin/cos encoding) |
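
The sin/cos encoding mentioned for `rotate` can be illustrated with a minimal sketch. Regressing the pair (sin θ, cos θ) instead of the raw angle avoids the jump between 359° and 0°. The function names below are hypothetical and are not taken from the repository; the actual `RotationRegressor` may normalize or scale its outputs differently.

```python
import math


def encode_angle(degrees: float) -> tuple[float, float]:
    """Map an angle in degrees to a (sin, cos) pair on the unit circle."""
    radians = math.radians(degrees)
    return math.sin(radians), math.cos(radians)


def decode_angle(sin_value: float, cos_value: float) -> float:
    """Recover a non-negative angle in degrees from a (sin, cos) pair.

    atan2 handles all four quadrants, and the modulo wraps negative
    results (for example -10) back to their positive equivalent (350).
    """
    return math.degrees(math.atan2(sin_value, cos_value)) % 360.0
```

Because `atan2` only depends on the direction of the vector, a predicted pair does not need to be exactly unit length to decode to a sensible angle.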

+### FunCaptcha-specific models
+
+| question | Model | Description |
+|------|------|------|
+| 4_3d_rollball_animals | FunCaptchaSiamese | Crops the full challenge image, scores reference/candidate pairs, and returns `objects` |
+
 ## Installation

 ```bash
@@ -49,81 +55,116 @@ uv sync --extra cv
 uv sync --extra dev
 ```

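+Notes:
+- The project currently constrains `onnxruntime` to `<1.24` in `pyproject.toml` to keep it installable with `uv` on Python 3.10.
+- On Linux `x86_64`, `uv sync` installs `torch==2.5.1` and `torchvision==0.20.1` from the official PyTorch `cu121` index. This version pair has been verified to run CUDA on a GTX 1050 Ti (`sm_61`).
+- The `torch 2.10 + cu128` combination that the repository previously resolved to automatically is incompatible with the GTX 1050 Ti; if you upgrade `torch` later, first re-verify that the GPU can actually execute CUDA tensor operations.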
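
One way to re-verify that the GPU actually executes CUDA tensor operations is a small smoke test like the sketch below. It is an illustration, not a script from the repository; the function name and the returned status strings are made up here. It relies on `torch.cuda.is_available()` alone not being sufficient, since a build without kernels for the card's architecture typically fails only when a kernel is launched.

```python
def cuda_smoke_test() -> str:
    """Return a short status string describing whether CUDA tensor math works."""
    try:
        import torch
    except ImportError:
        return "torch-not-installed"
    if not torch.cuda.is_available():
        return "cuda-unavailable"
    try:
        ones = torch.ones((8, 8), device="cuda")
        # The matmul launches a real kernel; .item() copies the result back
        # to the host, which forces the kernel to finish (or fail) here.
        total = (ones @ ones).sum().item()
    except RuntimeError as exc:
        return f"cuda-error: {exc}"
    # An 8x8 matrix of ones squared has 64 entries equal to 8, so the sum is 512.
    return "ok" if total == 512.0 else "cuda-wrong-result"


if __name__ == "__main__":
    print(cuda_smoke_test())
```

Running it through the project environment (for example with `uv run python`) checks the same `torch` build that training would use.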
+
 ## Quick start

 ### 1. Generate training data

 ```bash
-python cli.py generate --type normal --num 60000
-python cli.py generate --type math --num 60000
-python cli.py generate --type 3d_text --num 80000
-python cli.py generate --type 3d_rotate --num 60000
-python cli.py generate --type 3d_slider --num 60000
-python cli.py generate --type classifier --num 50000
+uv run captcha generate --type normal --num 60000
+uv run captcha generate --type math --num 60000
+uv run captcha generate --type 3d_text --num 80000
+uv run captcha generate --type 3d_rotate --num 60000
+uv run captcha generate --type 3d_slider --num 60000
+uv run captcha generate --type classifier --num 50000
 ```

 ### 2. Train models

 ```bash
 # Train models one at a time
-python -m training.train_normal
-python -m training.train_math
-python -m training.train_3d_text
-python -m training.train_3d_rotate
-python -m training.train_3d_slider
-python -m training.train_classifier
+uv run captcha train --model normal
+uv run captcha train --model math
+uv run captcha train --model 3d_text
+uv run captcha train --model 3d_rotate
+uv run captcha train --model 3d_slider
+uv run captcha train --model classifier

 # Or train everything with a single CLI command
-python cli.py train --all
+uv run captcha train --all
 ```

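-Training supports resuming from a checkpoint: if an existing checkpoint is detected, training automatically continues from where it was interrupted.
+OCR and regression training can resume from a checkpoint when the synthetic-data fingerprint matches the checkpoint; if the generation rules change, the data is refreshed automatically and training restarts from epoch 1. The classifier and the rotate solver are currently still trained as complete runs.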

 ### 3. Export to ONNX

 ```bash
-python cli.py export --all
+uv run captcha export --all
 # Or export a single model
-python cli.py export --model normal
+uv run captcha export --model normal
+uv run captcha export --model 4_3d_rollball_animals
 ```

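+The export also generates a `<model>.meta.json` sidecar that stores the OCR character set, the classifier class order, the regression label range, or the FunCaptcha challenge cropping metadata. Deployed inference reads this metadata preferentially.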
+
 ### 4. Inference

 ```bash
 # Single image (automatic classification + recognition)
-python cli.py predict image.png
+uv run captcha predict image.png

 # Specify the type to skip classification
-python cli.py predict image.png --type normal
+uv run captcha predict image.png --type normal

 # Batch recognition
-python cli.py predict-dir ./test_images/
+uv run captcha predict-dir ./test_images/
+
+# FunCaptcha-specific recognition
+uv run captcha predict-funcaptcha challenge.jpg --question 4_3d_rollball_animals
 ```

 ### 5. Interactive solvers

 ```bash
 # Generate solver training data
-python cli.py generate-solver slide --num 30000
-python cli.py generate-solver rotate --num 50000
+uv run captcha generate-solver slide --num 30000
+uv run captcha generate-solver rotate --num 50000

 # Train
-python cli.py train-solver slide
-python cli.py train-solver rotate
+uv run captcha train-solver slide
+uv run captcha train-solver rotate

 # Solve
-python cli.py solve slide --bg bg.png --tpl tpl.png
-python cli.py solve rotate --image img.png
+uv run captcha solve slide --bg bg.png --tpl tpl.png
+uv run captcha solve rotate --image img.png
+```
+
+### 6. FunCaptcha-specific training
+
+Place the annotated full challenge images in `data/real/funcaptcha/4_3d_rollball_animals/`. The filename prefix is the index of the correct candidate, for example `2_demo.jpg`.
+
+```bash
+uv run captcha train-funcaptcha --question 4_3d_rollball_animals
+uv run captcha export --model 4_3d_rollball_animals
+uv run captcha predict-funcaptcha challenge.jpg --question 4_3d_rollball_animals
 ```
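
The filename convention above (`2_demo.jpg` labels candidate 2) can be sketched as a small parser. This is an illustration only: the function name is hypothetical, and the source does not say whether indices start at 0 or 1, so the value is returned exactly as written in the filename.

```python
from pathlib import Path


def label_from_filename(path: str) -> int:
    """Extract the candidate index from a name such as `2_demo.jpg`."""
    prefix, separator, _rest = Path(path).name.partition("_")
    # Require an underscore and a purely ASCII-digit prefix before it.
    if not separator or not (prefix.isascii() and prefix.isdigit()):
        raise ValueError(f"expected '<index>_<name>', got {path!r}")
    return int(prefix)
```

Rejecting files without a numeric prefix early makes mislabeled images fail loudly instead of silently receiving a wrong label.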

 ## HTTP API

 ```bash
 uv sync --extra server
-python cli.py serve --port 8080
+uv run captcha serve --port 8080
 ```

-### POST /solve: base64 image recognition
+To align with `ohmycaptcha` / YesCaptcha-style clients, set `CLIENT_KEY` before starting the server:
+
+```bash
+CLIENT_KEY=local uv run captcha serve --port 8080
+```
+
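+To let the callback receiver verify the origin of a callback, additionally set `CALLBACK_SIGNING_SECRET`; the server then attaches an HMAC-SHA256 signature to the callback request headers: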
+
+```bash
+CLIENT_KEY=local CALLBACK_SIGNING_SECRET=shared-secret uv run captcha serve --port 8080
+```
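
On the receiving side, verification could look roughly like the sketch below. The README does not specify which bytes are signed or how the signature is encoded, so this sketch assumes a hex HMAC-SHA256 digest over `timestamp + "." + raw body`; check the server implementation before relying on it. The function names are hypothetical, and the timestamp and signature would come from the `X-CaptchaBreaker-Timestamp` and `X-CaptchaBreaker-Signature` headers.

```python
import hashlib
import hmac


def sign_callback(secret: str, timestamp: str, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 over `timestamp.body`.

    ASSUMPTION: the signed message layout is a guess, not taken from the server.
    """
    message = timestamp.encode("utf-8") + b"." + body
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_callback(secret: str, timestamp: str, body: bytes, signature: str) -> bool:
    """Return True if `signature` matches the expected value for this request."""
    expected = sign_callback(secret, timestamp, body)
    # compare_digest runs in constant time to avoid leaking how many
    # leading characters matched; bytes are used so any header value is accepted.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

A receiver would usually also reject timestamps that are too old, so that a captured callback cannot be replayed later.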
+
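+Both the synchronous and asynchronous endpoints are available at the root path and under the `/api/v1/*` compatibility alias; for example, `/solve` and `/api/v1/solve`, as well as `/createTask` and `/api/v1/createTask`, all work.
+
+### POST /solve: base64 image recognition (synchronous)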

 ```bash
 curl -X POST http://localhost:8080/solve \
@@ -142,6 +183,17 @@ curl -X POST http://localhost:8080/solve \

 `type` is optional; if omitted, the image is classified automatically. Allowed values: `normal` / `math` / `3d_text` / `3d_rotate` / `3d_slider`

+To use the FunCaptcha-specific route, also pass `question`, for example:
+
+```json
+{
+  "image": "<base64-encoded image>",
+  "question": "4_3d_rollball_animals"
+}
+```
+
+In this case the response additionally includes `objects`.
+
 Response:

 ```json
@@ -153,17 +205,118 @@ curl -X POST http://localhost:8080/solve \
 }
 ```

-### POST /solve/upload: file upload recognition
+### POST /solve/upload: file upload recognition (synchronous)

 ```bash
 curl -X POST "http://localhost:8080/solve/upload?type=normal" \
  -F "image=@captcha.png"
 ```

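-### GET /health: health check
+### POST /createTask: create an asynchronous recognition task
+
+The interface follows the `taskId` polling scheme of `ohmycaptcha` and suits integrators that need a uniform asynchronous protocol. Task results are persisted to `data/server_tasks/`, so they can still be queried after a server restart; they are retained for 10 minutes by default. If `CLIENT_KEY` is set, `clientKey` must match it. The `callbackUrl`, `softId`, and `languagePool` fields are accepted; `callbackUrl` receives one `application/x-www-form-urlencoded` POST callback after the task completes. A failed callback is retried 2 times by default; the timeout, retry count, and backoff interval can be adjusted through `SERVER_CONFIG`. If `CALLBACK_SIGNING_SECRET` is set, the callback also carries the `X-CaptchaBreaker-Timestamp`, `X-CaptchaBreaker-Signature-Alg`, and `X-CaptchaBreaker-Signature` headers. Regular OCR tasks are routed by `task.captchaType`, and FunCaptcha-specific tasks by `task.question`.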
+
+```bash
+curl -X POST http://localhost:8080/createTask \
+  -H "Content-Type: application/json" \
+  -d '{"clientKey":"local","task":{"type":"ImageToTextTask","body":"'"$(base64 -w0 captcha.png)"'","captchaType":"normal"}}'
+```
+
+FunCaptcha example:
+
+```bash
+curl -X POST http://localhost:8080/createTask \
+  -H "Content-Type: application/json" \
+  -d '{"clientKey":"local","task":{"type":"FunCaptcha","body":"'"$(base64 -w0 challenge.jpg)"'","question":"4_3d_rollball_animals"}}'
+```
+
+Response:

 ```json
-{"status": "ok", "models_loaded": true}
+{
+  "errorId": 0,
+  "taskId": "4ec6f1904da2446caa6c6313c0f7d2b0",
+  "status": "processing",
+  "createTime": 1710000000,
+  "expiresAt": 1710000600
+}
+```
+
+### POST /getTaskResult: query the result of an asynchronous task
+
+```bash
+curl -X POST http://localhost:8080/getTaskResult \
+  -H "Content-Type: application/json" \
+  -d '{"clientKey":"local","taskId":"4ec6f1904da2446caa6c6313c0f7d2b0"}'
+```
+
+While processing:
+
+```json
+{
+  "errorId": 0,
+  "taskId": "4ec6f1904da2446caa6c6313c0f7d2b0",
+  "status": "processing",
+  "createTime": 1710000000
+}
+```
+
+When complete:
+
+```json
+{
+  "errorId": 0,
+  "taskId": "4ec6f1904da2446caa6c6313c0f7d2b0",
+  "status": "ready",
+  "cost": "0.00000",
+  "ip": "127.0.0.1",
+  "createTime": 1710000000,
+  "endTime": 1710000001,
+  "expiresAt": 1710000600,
+  "solveCount": 1,
+  "task": {
+    "type": "ImageToTextTask",
+    "captchaType": "normal"
+  },
+  "callback": {
+    "configured": true,
+    "url": "https://example.com/callback",
+    "attempts": 1,
+    "delivered": true,
+    "deliveredAt": 1710000001,
+    "lastError": null
+  },
+  "solution": {
+    "text": "A3B8",
+    "answer": "A3B8",
+    "raw": "A3B8",
+    "captchaType": "normal",
+    "timeMs": 12.3
+  }
+}
+```
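
The create-then-poll flow of `/createTask` and `/getTaskResult` can be sketched with the standard library as follows. The function names, the polling interval, and the poll limit are illustrative choices, and treating any non-zero `errorId` as a failure is an assumption, since the error response shape is not shown here.

```python
import json
import time
import urllib.request
from typing import Callable


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def solve_async(
    base_url: str,
    client_key: str,
    task: dict,
    post: Callable[[str, dict], dict] = post_json,
    interval: float = 0.5,
    max_polls: int = 20,
) -> dict:
    """Create a task, then poll until its status is "ready" and return the solution."""
    created = post(f"{base_url}/createTask", {"clientKey": client_key, "task": task})
    if created.get("errorId") != 0:  # assumption: non-zero errorId means failure
        raise RuntimeError(f"createTask failed: {created}")
    query = {"clientKey": client_key, "taskId": created["taskId"]}
    for _ in range(max_polls):
        result = post(f"{base_url}/getTaskResult", query)
        if result.get("errorId") != 0:
            raise RuntimeError(f"getTaskResult failed: {result}")
        if result.get("status") == "ready":
            return result["solution"]
        time.sleep(interval)
    raise TimeoutError(f"task {created['taskId']} not ready after {max_polls} polls")
```

The `post` parameter exists so the HTTP transport can be swapped out, for example for a stub in tests. A call against a local server might look like `solve_async("http://localhost:8080", "local", {"type": "ImageToTextTask", "body": image_b64, "captchaType": "normal"})`, where `image_b64` is the base64-encoded image.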
+
+### POST /getBalance: local compatibility endpoint
+
+```json
+{"errorId": 0, "balance": 999999.0}
+```
+
+### GET /health or /api/v1/health: health check
+
+```json
+{
+  "status": "ok",
+  "models_loaded": true,
+  "client_key_required": false,
+  "async_tasks": {
+    "active": 0,
+    "processing": 0,
+    "ready": 0,
+    "failed": 0,
+    "ttl_seconds": 600
+  }
+}
 ```

 ## Project structure
@@ -188,11 +341,13 @@ curl -X POST "http://localhost:8080/solve/upload?type=normal" \
 │   ├── gap_detector.py       # Slider gap detection
 │   └── rotation_regressor.py # Rotation angle regression
 ├── training/                 # Training scripts
+│   ├── data_fingerprint.py   # Synthetic-data fingerprint / manifest
 │   ├── train_utils.py        # Shared CTC training logic
 │   ├── train_regression_utils.py  # Shared regression training logic
 │   ├── dataset.py            # Generic Dataset classes
 │   └── train_*.py            # Training entry point for each model
 ├── inference/                # Inference (depends only on onnxruntime)
+│   ├── model_metadata.py     # ONNX sidecar metadata
 │   ├── pipeline.py           # Core inference pipeline
 │   ├── export_onnx.py        # ONNX export
 │   └── math_eval.py          # Arithmetic expression evaluation
@@ -226,7 +381,7 @@ python -m pytest tests/ -v

 ## Tech stack

-- Python 3.10+
+- Python 3.10-3.12
 - PyTorch 2.x (training)
 - ONNX + ONNXRuntime (inference and deployment)
 - FastAPI + uvicorn (HTTP service)