Align task API and add FunCaptcha support

2026-03-12 19:32:59 +08:00
parent ef9518deeb
commit bc6776979e
33 changed files with 3446 additions and 672 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -6,13 +6,26 @@

 ## Tech stack

-- Python 3.10+
+- Python 3.10-3.12
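 - uv (package management; dependencies are defined in pyproject.toml)
 - PyTorch 2.x (training)
 - ONNX + ONNXRuntime (inference deployment)
 - Pillow (image processing)
+- OpenCV (optional; slider solving and related tests)
 - FastAPI (optional; serves HTTP recognition requests)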

+## Current command entry points
+
+- Prefer `uv run captcha ...` to invoke the CLI; `pyproject.toml` already maps `captcha` to `cli:main`
+- Install extras as needed:
+  - `uv sync --extra server`: HTTP service
+  - `uv sync --extra cv`: OpenCV / slider solver
+  - `uv sync --extra dev`: pytest
+- On Linux `x86_64`, `tool.uv.sources` / `tool.uv.index` in `pyproject.toml` pin `torch==2.5.1` and `torchvision==0.20.1` to the official `cu121` index
+- This combination has been verified to run CUDA on a GTX 1050 Ti (`sm_61`); the `torch 2.10.0 + cu128` build that the repository previously resolved to is incompatible with this card
+- On Python 3.10, keep `onnxruntime < 1.24` so that `uv` does not resolve to a version without a `cp310` wheel
+- Current CLI subcommands: `generate`, `train`, `export`, `predict`, `predict-dir`, `serve`, `generate-solver`, `train-solver`, `solve`, `train-funcaptcha`, `predict-funcaptcha`
+
 ## Project structure

 ```
@@ -32,7 +45,9 @@ captcha-breaker/
 │   │   ├── math/
 │   │   ├── 3d_text/
 │   │   ├── 3d_rotate/
-│   │   └── 3d_slider/
+│   │   ├── 3d_slider/
+│   │   └── funcaptcha/
+│   │       └── 4_3d_rollball_animals/
 │   ├── classifier/              # dispatch classifier training data (all types mixed)
 │   └── solver/                  # solver training data
 │       ├── slide/               # slider gap detection training data
@@ -54,7 +69,8 @@ captcha-breaker/
 │   ├── threed_cnn.py            # dedicated model for 3D text CAPTCHAs (deeper CNN)
 │   ├── regression_cnn.py        # regression CNN (3D rotate + slider, ~1MB)
 │   ├── gap_detector.py          # slider gap detection CNN (~1MB)
-│   └── rotation_regressor.py    # rotation angle regression, sin/cos (~2MB)
+│   ├── rotation_regressor.py    # rotation angle regression, sin/cos (~2MB)
+│   └── fun_captcha_siamese.py   # dedicated FunCaptcha Siamese model
 ├── training/
 │   ├── __init__.py
 │   ├── train_classifier.py      # train the dispatch model
@@ -65,13 +81,17 @@ captcha-breaker/
 │   ├── train_3d_slider.py       # train 3D slider regression
 │   ├── train_slide.py           # train slider gap detection
 │   ├── train_rotate_solver.py   # train rotation angle regression
+│   ├── train_funcaptcha_rollball.py # train 4_3d_rollball_animals
 │   ├── train_utils.py           # shared CTC training logic
 │   ├── train_regression_utils.py # shared regression training logic
+│   ├── data_fingerprint.py      # synthetic data fingerprint / manifest
 │   └── dataset.py               # generic Dataset classes
 ├── inference/
 │   ├── __init__.py
 │   ├── pipeline.py              # core inference pipeline (dispatch + recognition)
+│   ├── fun_captcha.py           # dedicated FunCaptcha inference
 │   ├── export_onnx.py           # PyTorch → ONNX export script
+│   ├── model_metadata.py        # ONNX sidecar metadata
 │   └── math_eval.py             # arithmetic expression evaluation module
 ├── solvers/                     # interactive CAPTCHA solvers
 │   ├── __init__.py
@@ -89,16 +109,19 @@ captcha-breaker/
 │   ├── threed_rotate.pth
 │   ├── threed_slider.pth
 │   ├── gap_detector.pth
-│   └── rotation_regressor.pth
+│   ├── rotation_regressor.pth
+│   └── funcaptcha_rollball_animals.pth
 ├── onnx_models/                 # exported ONNX models
 │   ├── classifier.onnx
+│   ├── classifier.meta.json
 │   ├── normal.onnx
 │   ├── math.onnx
 │   ├── threed_text.onnx
 │   ├── threed_rotate.onnx
 │   ├── threed_slider.onnx
 │   ├── gap_detector.onnx
-│   └── rotation_regressor.onnx
+│   ├── rotation_regressor.onnx
+│   └── funcaptcha_rollball_animals.onnx
 ├── server.py                    # FastAPI inference service (optional)
 ├── cli.py                       # command-line entry point
 └── tests/
@@ -123,6 +146,10 @@ captcha-breaker/
         "A3B8" "3+8=?"→11 "X9K2"  "135"      "87"
 ```

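+Additional rules:
+- ONNX export also writes `<model>.meta.json`, which stores the information required at deployment time, such as the OCR character set, the classifier class order, and regression label ranges.
+- `inference/pipeline.py` reads the sidecar metadata first and falls back to `config.py` only when it is missing, to stay compatible with previously exported artifacts.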
+
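The sidecar-first lookup described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: `load_model_metadata` and `FALLBACK_CONFIG` are hypothetical names, and the fallback dictionary merely stands in for whatever `config.py` provides.

```python
import json
from pathlib import Path

# Hypothetical stand-in for the values that would otherwise come from config.py.
FALLBACK_CONFIG = {"chars": "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"}


def load_model_metadata(onnx_path, fallback=FALLBACK_CONFIG):
    """Return the sidecar metadata if `<model>.meta.json` exists, else the fallback."""
    # classifier.onnx -> classifier.meta.json, next to the model file.
    meta_path = Path(onnx_path).with_suffix(".meta.json")
    if meta_path.is_file():
        with meta_path.open(encoding="utf-8") as fh:
            return json.load(fh)
    # Older exports have no sidecar, so fall back to the static configuration.
    return dict(fallback)
```

Keeping the metadata next to the model means a deployed `.onnx` file carries its own character set and label range, so a later change to `config.py` cannot silently mismatch an older export.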
 ### Dispatch classifier (classifier.py)

 - Task: image classification; determine which type a CAPTCHA belongs to
@@ -204,6 +231,17 @@ def eval_captcha_math(expr: str) -> str:
 - Label range: 10-200px
 - Model size target: ~1MB

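+### Dedicated FunCaptcha expert (fun_captcha_siamese.py)
+
+- Task: solve `task.question=4_3d_rollball_animals`
+- Routing: bypasses the dispatch classifier; the `question` field from HTTP/CLI routes directly to the dedicated model
+- Input: the `reference` and the 4 top-row candidates cropped from the full challenge screenshot
+- Preprocessing: RGB `3x48x48`
+- Architecture: shared-encoder Siamese; the `candidate/reference` features are concatenated and produce a single match logit
+- Training: each challenge expands into 4 pairs; the correct candidate is the positive sample and the rest are negatives
+- Inference output: score each of the 4 candidates, take the argmax, and return `objects=[index]`
+- Model size target: < 2MB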
+
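The pair expansion and argmax selection described above can be illustrated without the network itself. The sketch below assumes the four match logits have already been computed; `expand_pairs` and `pick_candidate` are hypothetical helper names, not functions from the repository.

```python
def expand_pairs(correct_index, num_candidates=4):
    """Expand one challenge into (candidate_index, label) training pairs."""
    # The correct candidate is the single positive; all others are negatives.
    return [(i, 1 if i == correct_index else 0) for i in range(num_candidates)]


def pick_candidate(logits):
    """Return the answer in the objects=[index] shape for the best-scoring candidate."""
    if len(logits) != 4:
        raise ValueError("expected one match logit per top-row candidate (4 total)")
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return {"objects": [best_index]}
```

For example, `pick_candidate([-1.2, 0.3, 2.5, 0.1])` selects index 2, and `expand_pairs(2)` yields one positive pair and three negative pairs.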
 ## Data generator specification

 ### Base class (base.py)
@@ -234,7 +272,7 @@ class BaseCaptchaGenerator:

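 - Generates arithmetic images of the form `A op B = ?`
 - A, B range: integers 1-30
-- op: +, -, ×  (division is generated only when it divides evenly)
+- op: +, -, ×, ÷  (division is generated only when it divides evenly)
 - Ensure the result is a non-negative integer
 - Label format: `3+8` (stores the expression itself, not the result)
 - Visual style: matches the target arithmetic CAPTCHAs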
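The label constraints above (operands in 1-30, exact division only, non-negative result) could be enforced with simple rejection sampling. This is only a sketch under that assumption: `random_expression` is a hypothetical helper, and the real generator may select operands differently.

```python
import random

OPS = ("+", "-", "×", "÷")


def random_expression(rng):
    """Return a label such as '3+8' whose result is a non-negative integer."""
    while True:
        a, b = rng.randint(1, 30), rng.randint(1, 30)
        op = rng.choice(OPS)
        if op == "-" and a < b:
            continue  # the result would be negative
        if op == "÷" and a % b != 0:
            continue  # keep only divisions that divide evenly
        return f"{a}{op}{b}"
```

Passing a seeded `random.Random` instance keeps the generated labels reproducible across runs.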
@@ -259,6 +297,11 @@ class BaseCaptchaGenerator:
 - Label = x-coordinate offset of the gap (integer string)
 - Filename format: `{offset}_{index:06d}.png`

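+### Slide solver target convention
+
+- `slide_gen.py` / `GapDetectorCNN` / `slide_solver.py` all use the gap center `x` as the output semantics
+- During training, the target is normalized to `[0, 1]` by the solver input width; at runtime it is mapped back to the original image width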
+
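The normalize-then-rescale convention above amounts to two small conversions. The sketch below uses hypothetical function names; clamping the prediction to `[0, 1]` and rounding to whole pixels are assumptions, not behavior confirmed by the source.

```python
def normalize_gap_center(center_x_px, solver_input_width):
    """Training target: gap center x as a fraction of the solver input width."""
    return center_x_px / solver_input_width


def denormalize_gap_center(normalized_x, original_width):
    """Runtime: map a [0, 1] prediction back to pixels in the original image."""
    # Assumption: clamp out-of-range predictions before rescaling.
    clamped = min(max(normalized_x, 0.0), 1.0)
    return round(clamped * original_width)
```

With a solver input 128 px wide, a gap centered at 64 px becomes a target of 0.5, which maps back to 160 px on a 320 px original.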
 ## Training specification

 ### Common training configuration
@@ -328,6 +371,11 @@ TRAIN_CONFIG = {
 7. Automatically export ONNX to onnx_models/ when training finishes
 8. All DataLoaders use `num_workers=0` to avoid multiprocessing compatibility issues

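+Additional rules:
+- The synthetic data directory gets a `.dataset_meta.json` whose fingerprint combines a hash of the generator source code with a configuration snapshot.
+- OCR / regression training resumes only when the `synthetic_data_spec_hash` in the checkpoint matches the current data fingerprint; after the synthetic data is refreshed, training must restart from epoch 1.
+- Legacy `normal` / `math` data may be adopted when the manifest is missing, but `math` must still cover `+ - × ÷`; if `÷` is missing, the data must be rebuilt before training.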
+
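One way to picture the fingerprint and the resume check is sketched below. The hashing scheme (SHA-256 over the source hash plus a key-sorted JSON snapshot) and the helper names are assumptions made for illustration; only the `synthetic_data_spec_hash` key comes from the text above.

```python
import hashlib
import json


def dataset_fingerprint(generator_source: str, config: dict) -> str:
    """Combine a hash of the generator source with a configuration snapshot."""
    source_hash = hashlib.sha256(generator_source.encode("utf-8")).hexdigest()
    # sort_keys makes the snapshot independent of dictionary insertion order.
    payload = json.dumps(
        {"source": source_hash, "config": config},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def can_resume(checkpoint: dict, current_fingerprint: str) -> bool:
    """Resume only if the checkpoint was trained on data with the same fingerprint."""
    return checkpoint.get("synthetic_data_spec_hash") == current_fingerprint
```

A checkpoint without the key, or with a stale hash, fails the check, which corresponds to restarting from epoch 1.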
 ### Data augmentation strategy

 ```python
@@ -423,16 +471,19 @@ uv run python cli.py train --model 3d_text
 uv run python cli.py train --model 3d_rotate
 uv run python cli.py train --model 3d_slider
 uv run python cli.py train --all    # train everything in dependency order
+uv run python cli.py train-funcaptcha --question 4_3d_rollball_animals

 # Export ONNX
 uv run python cli.py export --all
 uv run python cli.py export --model 3d_text     # "3d_text" is automatically mapped to "threed_text"
+uv run python cli.py export --model 4_3d_rollball_animals

 # Inference
 uv run python cli.py predict image.png                    # auto classify + recognize
 uv run python cli.py predict image.png --type normal       # skip classification and recognize directly
 uv run python cli.py predict image.png --type 3d_rotate    # force the rotate type
 uv run python cli.py predict-dir ./test_images/            # batch recognition
+uv run python cli.py predict-funcaptcha challenge.jpg --question 4_3d_rollball_animals

 # Start the HTTP service (requires the optional server extra)
 uv run python cli.py serve --port 8080
@@ -443,18 +494,42 @@ uv run python cli.py serve --port 8080
 Inference-only service: no dependency on torch or the training code; only onnxruntime + FastAPI are required.

 ```python
-# POST /solve         - JSON base64 image recognition
+# POST /solve         - JSON base64 image recognition (synchronous)
 #   Request: {"image": "<base64>", "type": "normal"}   (type is optional)
+#   Request may also be {"image":"<base64>", "question":"4_3d_rollball_animals"}
 #   Response: {"type": "normal", "result": "A3B8", "raw": "A3B8", "time_ms": 12.3}
+#   FunCaptcha response: {"type":"funcaptcha","question":"4_3d_rollball_animals","objects":[2],"result":"2","raw":"2","time_ms":12.3}
 #
-# POST /solve/upload  - multipart file upload recognition
-#   Request: multipart/form-data, field name image, optional query param type
+# POST /solve/upload  - multipart file upload recognition (synchronous)
+#   Request: multipart/form-data, field name image, optional query param type/question
 #   Response: same as above
 #
+# POST /createTask    - create an asynchronous recognition task
+#   Request: {"clientKey":"local","callbackUrl":"https://...","softId":1,"languagePool":"en","task":{"type":"ImageToTextTask","body":"<base64>","captchaType":"normal"}}
+#   Or: {"clientKey":"local","task":{"type":"FunCaptcha","body":"<base64>","question":"4_3d_rollball_animals"}}
+#   Response: {"errorId":0,"taskId":"<uuid>","status":"processing","createTime":1710000000,"expiresAt":1710000600}
+#
+# POST /getTaskResult - query the result of an asynchronous task
+#   Processing: {"errorId":0,"taskId":"<uuid>","status":"processing"}
+#   Ready: {"errorId":0,"taskId":"<uuid>","status":"ready","cost":"0.00000","ip":"127.0.0.1","solveCount":1,"task":{"type":"ImageToTextTask","captchaType":"normal"},"callback":{"configured":true,"attempts":1,"delivered":true},"solution":{"text":"A3B8","answer":"A3B8","raw":"A3B8","captchaType":"normal","timeMs":12.3}}
+#   FunCaptcha ready: {"errorId":0,"taskId":"<uuid>","status":"ready","task":{"type":"FunCaptcha","question":"4_3d_rollball_animals"},"solution":{"objects":[2],"answer":2,"raw":"2","text":"2","question":"4_3d_rollball_animals","timeMs":12.3}}
+#
+# POST /getBalance    - local compatibility endpoint
+#   Response: {"errorId":0,"balance":999999.0}
+#
 # GET  /health        - health check
-#   Response: {"status": "ok", "models_loaded": true}
+# GET  /api/v1/health - health check compatibility alias
+#   Response: {"status": "ok", "models_loaded": true, "client_key_required": false, "async_tasks": {...}}
 ```

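+- The asynchronous task API follows the `taskId` polling pattern of `ohmycaptcha`
+- Both root paths and `/api/v1/*` routes are supported; if the `CLIENT_KEY` environment variable is set, task endpoints require the `clientKey` in the request body to match
+- Regular OCR tasks are routed by `task.captchaType`; dedicated FunCaptcha tasks are routed by `task.question` and do not enter the `CaptchaPipeline` dispatch classifier
+- `callbackUrl` receives one `application/x-www-form-urlencoded` POST callback after the task completes, with fields `id/taskId` and `code`; by default a failed callback is retried 2 times with backoff intervals
+- If the `CALLBACK_SIGNING_SECRET` environment variable is set, callback requests carry the `X-CaptchaBreaker-Timestamp`, `X-CaptchaBreaker-Signature-Alg`, and `X-CaptchaBreaker-Signature` headers; the signature algorithm is `hmac-sha256`
+- Task results additionally expose `task` / `callback` metadata to help integrators debug asynchronous state
+- Task results are persisted under `data/server_tasks/` with a default TTL of 600 seconds; unexpired tasks can be recovered after a service restart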
+
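A receiver might verify the signed callback roughly as sketched below. The source only states that the algorithm is `hmac-sha256`; the signed message layout (`timestamp + "." + body`), the hex encoding, and the function names here are assumptions and would need to be checked against `server.py`.

```python
import hashlib
import hmac


def sign_callback(secret: str, timestamp: str, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 over an ASSUMED message layout: timestamp.body."""
    message = timestamp.encode("ascii") + b"." + body
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_callback(secret: str, timestamp: str, body: bytes, signature: str) -> bool:
    """Constant-time comparison against the X-CaptchaBreaker-Signature header value."""
    expected = sign_callback(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

Using `hmac.compare_digest` rather than `==` avoids leaking timing information about how many leading characters of the signature matched.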
 ## Key constraints and notes

 1. **All models are trained in float32, and ONNX export applies no quantization**; accuracy comes first
@@ -466,7 +541,7 @@ uv run python cli.py serve --port 8080
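 11. **Dataset character filtering**: when `CRNNDataset` loads labels, it emits a warning if a character is not in the character set, which helps track down annotation/character-set mismatches
 7. **Model save format**: CTC checkpoints contain model_state_dict, chars, best_acc, epoch; regression checkpoints contain model_state_dict, label_range, best_mae, best_tol_acc, epoch
 8. **No GPU-specific features**, so training and inference also work on CPU (just slower)
-9. **Type extension**: adding a new CAPTCHA type only requires (1) a generator, (2) an expert model, and (3) one more class in the dispatcher, followed by retraining
+9. **Type extension**: OCR/regression CAPTCHA types continue to follow the "generator + expert model + classifier class" path; dedicated challenges such as FunCaptcha should go through `task.question` routing rather than being forced into `CAPTCHA_TYPES`
 10. **Documentation sync**: whenever the project structure, configuration, architecture, etc. change, the corresponding content in CLAUDE.md must be updated so that the documentation stays consistent with the code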

 ## Target metrics
@@ -479,6 +554,7 @@ uv run python cli.py serve --port 8080
 | 3D text | > 85% | < 50ms | < 5MB |
 | 3D rotate (±5°) | > 85% | < 30ms | ~1MB |
 | 3D slider (±3px) | > 90% | < 30ms | ~1MB |
+| FunCaptcha rollball | > 90% challenge acc | < 30ms | < 2MB |
 | Full pipeline | - | < 80ms | < 12MB total |

 ## Development order
@@ -488,9 +564,10 @@ uv run python cli.py serve --port 8080
 3. Implement the generic dataset classes in training/dataset.py
 4. Train in order: normal → math → 3d_text → 3d_rotate → 3d_slider → classifier
 5. Implement inference/pipeline.py and export_onnx.py
-6. Implement the unified cli.py entry point
-7. Optional: server.py HTTP service
-8. Write tests/
+6. Implement the dedicated FunCaptcha inference/training branch (`fun_captcha.py`, `train_funcaptcha_rollball.py`)
+7. Implement the unified cli.py entry point
+8. Optional: server.py HTTP service
+9. Write tests/

 ## Interactive solver extension