# CaptchaBreaker

A local CAPTCHA recognition system built on a two-level architecture: **a dispatch model plus multiple expert models**. The dispatch model classifies the CAPTCHA type, and the expert models handle the actual recognition. All models are designed to be lightweight and are exported to ONNX for deployment.

## Architecture

```
Input image → Preprocessing → Dispatch classifier → Route to expert model → Post-processing → Output
                                       │
               ┌───────────┬───────────┼───────────┬───────────┐
               ▼           ▼           ▼           ▼           ▼
            normal       math       3d_text    3d_rotate   3d_slider
            (CRNN)      (CRNN)       (CNN)     (RegCNN)    (RegCNN)
               │           │           │           │           │
               ▼           ▼           ▼           ▼           ▼
            "A3B8"    "3+8=?"→11    "X9K2"      "135°"      "87px"
```

### Supported CAPTCHA types

| Type | Model | Description |
|------|------|------|
| normal | LiteCRNN + CTC | Plain character CAPTCHA (digits + letters) |
| math | LiteCRNN + CTC | Arithmetic CAPTCHA (e.g. `3+8=?` → `11`) |
| 3d_text | ThreeDCNN + CTC | 3D text CAPTCHA |
| 3d_rotate | RegressionCNN | 3D rotation angle regression (0-359°) |
| 3d_slider | RegressionCNN | 3D slider offset regression (10-200 px) |

### Interactive solvers

| Type | Model | Description |
|------|------|------|
| slide | GapDetectorCNN | Slider gap detection (always outputs the gap center x; OpenCV first, CNN as fallback) |
| rotate | RotationRegressor | Rotation angle regression (sin/cos encoding) |

### FunCaptcha-specific

| question | Model | Description |
|------|------|------|
| 4_3d_rollball_animals | FunCaptchaSiamese | Crops the full challenge image, scores reference/candidate pairs, and returns `objects` |

## Installation

```bash
# Core dependencies
uv sync

# With the HTTP server
uv sync --extra server

# With OpenCV (slider solving)
uv sync --extra cv

# With test dependencies
uv sync --extra dev
```

Notes:

- The project currently constrains `onnxruntime` to `<1.24` in `pyproject.toml` so that it stays installable with `uv` on Python 3.10.
- On Linux `x86_64`, `uv sync` installs `torch==2.5.1` and `torchvision==0.20.1` from the official PyTorch `cu121` index. This version pair has been verified to run CUDA on a GTX 1050 Ti (`sm_61`).
- The `torch 2.10 + cu128` combination that the repository previously resolved to automatically is incompatible with the GTX 1050 Ti. If you upgrade `torch` later, first re-verify that the GPU can actually execute CUDA tensor operations.

## Quick start

### 1. Generate training data

```bash
uv run captcha generate --type normal --num 60000
uv run captcha generate --type math --num 60000
uv run captcha generate --type 3d_text --num 80000
uv run captcha generate --type 3d_rotate --num 60000
uv run captcha generate --type 3d_slider --num 60000
uv run captcha generate --type classifier --num 50000
```

### 2. Train the models

```bash
# Train one model at a time
uv run captcha train --model normal
uv run captcha train --model math
uv run captcha train --model 3d_text
uv run captcha train --model 3d_rotate
uv run captcha train --model 3d_slider
uv run captcha train --model classifier

# Or train everything in one go via the CLI
uv run captcha train --all
```

OCR and regression training can resume from a checkpoint when the synthetic-data fingerprint matches the checkpoint. If the generation rules change, the data is refreshed automatically and training restarts from epoch 1. The classifier and the rotate solver are currently still trained as a full run each time.

### 3. Export to ONNX

```bash
uv run captcha export --all

# Or export a single model
uv run captcha export --model normal
uv run captcha export --model 4_3d_rollball_animals
```

Export also writes a `.meta.json` sidecar that stores the OCR charset, the classifier's class order, the regression label range, or the FunCaptcha challenge cropping metadata. Deployed inference reads this metadata first.

### 4. Inference

```bash
# Single image (automatic classification + recognition)
uv run captcha predict image.png

# Specify the type to skip classification
uv run captcha predict image.png --type normal

# Batch recognition
uv run captcha predict-dir ./test_images/

# FunCaptcha-specific recognition
uv run captcha predict-funcaptcha challenge.jpg --question 4_3d_rollball_animals
```

### 5. Interactive solvers

```bash
# Generate solver training data
uv run captcha generate-solver slide --num 30000
uv run captcha generate-solver rotate --num 50000

# Train
uv run captcha train-solver slide
uv run captcha train-solver rotate

# Solve
uv run captcha solve slide --bg bg.png --tpl tpl.png
uv run captcha solve rotate --image img.png
```

### 6. FunCaptcha-specific training

Place full, labeled challenge images in `data/real/funcaptcha/4_3d_rollball_animals/`. The file name prefix is the index of the correct candidate, for example `2_demo.jpg`.

```bash
uv run captcha train-funcaptcha --question 4_3d_rollball_animals
uv run captcha export --model 4_3d_rollball_animals
uv run captcha predict-funcaptcha challenge.jpg --question 4_3d_rollball_animals
```

If you do not have training data yet, you can reuse an external ONNX model directly:

```bash
FUNCAPTCHA_ROLLBALL_MODEL_PATH=/path/to/4_3d_rollball_animals.onnx \
uv run captcha predict-funcaptcha challenge.jpg --question 4_3d_rollball_animals
```

Inference looks for the model in this order:

- `onnx_models/funcaptcha_rollball_animals.onnx`
- The `FUNCAPTCHA_ROLLBALL_MODEL_PATH` environment variable
- The default fallback `/mnt/data/code/python/funcaptcha-server/model/4_3d_rollball_animals.onnx`

Do not put ONNX files in `models/`. That directory holds the Python model definition source code; runtime model artifacts belong in `onnx_models/`.

## HTTP API

```bash
uv sync --extra server
uv run captcha serve --port 8080
```

To align with `ohmycaptcha` / YesCaptcha-style clients, set `CLIENT_KEY` before starting the service:

```bash
CLIENT_KEY=local uv run captcha serve --port 8080
```

To let the callback receiver verify the origin, additionally set `CALLBACK_SIGNING_SECRET`. The service then attaches an HMAC-SHA256 signature to the callback request headers:

```bash
CLIENT_KEY=local CALLBACK_SIGNING_SECRET=shared-secret uv run captcha serve --port 8080
```

Both the synchronous and the asynchronous endpoints are available at the root path and under the `/api/v1/*` compatibility alias. For example, `/solve` and `/api/v1/solve`, as well as `/createTask` and `/api/v1/createTask`, all work.

### POST /solve: base64 image recognition (synchronous)

```bash
curl -X POST http://localhost:8080/solve \
  -H "Content-Type: application/json" \
  -d '{"image": "'$(base64 -w0 captcha.png)'", "type": "normal"}'
```

Request body:

```json
{
  "image": "",
  "type": "normal"
}
```

`image` carries the base64-encoded image. `type` is optional; if it is omitted, the type is classified automatically. Allowed values: `normal` / `math` / `3d_text` / `3d_rotate` / `3d_slider`

For dedicated FunCaptcha routing, additionally pass `question`, for example:

```json
{
  "image": "",
  "question": "4_3d_rollball_animals"
}
```

In that case the response also includes `objects`.

Response:

```json
{
  "type": "normal",
  "result": "A3B8",
  "raw": "A3B8",
  "time_ms": 12.3
}
```

### POST /solve/upload: file upload recognition (synchronous)

```bash
curl -X POST "http://localhost:8080/solve/upload?type=normal" \
  -F "image=@captcha.png"
```

### POST /createTask: create an asynchronous recognition task

The interface style follows the `taskId` polling scheme of `ohmycaptcha` and suits integrators that need a uniform asynchronous protocol.

- Task results are persisted to `data/server_tasks/`, remain queryable after a service restart, and are kept for 10 minutes by default.
- If `CLIENT_KEY` is set, `clientKey` must match it.
- The `callbackUrl`, `softId`, and `languagePool` fields are accepted. `callbackUrl` receives one `application/x-www-form-urlencoded` POST callback after the task completes. A failed callback is retried 2 times by default; the timeout, retry count, and backoff interval can be adjusted through `SERVER_CONFIG`.
- If `CALLBACK_SIGNING_SECRET` is set, the callback also carries `X-CaptchaBreaker-Timestamp`, `X-CaptchaBreaker-Signature-Alg`, and `X-CaptchaBreaker-Signature`.
- Regular OCR tasks are routed by `task.captchaType`; FunCaptcha tasks are routed by `task.question`.

```bash
curl -X POST http://localhost:8080/createTask \
  -H "Content-Type: application/json" \
  -d '{"clientKey":"local","task":{"type":"ImageToTextTask","body":"'"$(base64 -w0 captcha.png)"'","captchaType":"normal"}}'
```

FunCaptcha example:

```bash
curl -X POST http://localhost:8080/createTask \
  -H "Content-Type: application/json" \
  -d '{"clientKey":"local","task":{"type":"FunCaptcha","body":"'"$(base64 -w0 challenge.jpg)"'","question":"4_3d_rollball_animals"}}'
```

Response:

```json
{
  "errorId": 0,
  "taskId": "4ec6f1904da2446caa6c6313c0f7d2b0",
  "status": "processing",
  "createTime": 1710000000,
  "expiresAt": 1710000600
}
```

### POST /getTaskResult: query an asynchronous task result

```bash
curl -X POST http://localhost:8080/getTaskResult \
  -H "Content-Type: application/json" \
  -d '{"clientKey":"local","taskId":"4ec6f1904da2446caa6c6313c0f7d2b0"}'
```

While processing:

```json
{
  "errorId": 0,
  "taskId": "4ec6f1904da2446caa6c6313c0f7d2b0",
  "status": "processing",
  "createTime": 1710000000
}
```

When finished:

```json
{
  "errorId": 0,
  "taskId": "4ec6f1904da2446caa6c6313c0f7d2b0",
  "status": "ready",
  "cost": "0.00000",
  "ip": "127.0.0.1",
  "createTime": 1710000000,
  "endTime": 1710000001,
  "expiresAt": 1710000600,
  "solveCount": 1,
  "task": {
    "type": "ImageToTextTask",
    "captchaType": "normal"
  },
  "callback": {
    "configured": true,
    "url": "https://example.com/callback",
    "attempts": 1,
    "delivered": true,
    "deliveredAt": 1710000001,
    "lastError": null
  },
  "solution": {
    "text": "A3B8",
    "answer": "A3B8",
    "raw": "A3B8",
    "captchaType": "normal",
    "timeMs": 12.3
  }
}
```

### POST /getBalance: local compatibility endpoint

```json
{"errorId": 0, "balance": 999999.0}
```

### GET /health or /api/v1/health: health check

```json
{
  "status": "ok",
  "models_loaded": true,
  "client_key_required": false,
  "async_tasks": {
    "active": 0,
    "processing": 0,
    "ready": 0,
    "failed": 0,
    "ttl_seconds": 600
  }
}
```

## Project structure

```
├── config.py                      # Global config (charsets, sizes, training hyperparameters)
├── cli.py                         # Command-line entry point
├── server.py                      # FastAPI HTTP service (inference only, no torch dependency)
├── generators/                    # CAPTCHA data generators
│   ├── normal_gen.py              # Plain characters
│   ├── math_gen.py                # Arithmetic
│   ├── threed_gen.py              # 3D text
│   ├── threed_rotate_gen.py       # 3D rotation
│   ├── threed_slider_gen.py       # 3D slider
│   ├── slide_gen.py               # Slider-gap training data
│   └── rotate_solver_gen.py       # Rotate-solver training data
├── models/                        # Model definitions
│   ├── classifier.py              # Dispatch classifier
│   ├── lite_crnn.py               # Lightweight CRNN (normal/math)
│   ├── threed_cnn.py              # 3D text CNN
│   ├── regression_cnn.py          # Regression CNN (3d_rotate/3d_slider)
│   ├── gap_detector.py            # Slider gap detection
│   └── rotation_regressor.py      # Rotation angle regression
├── training/                      # Training scripts
│   ├── data_fingerprint.py        # Synthetic-data fingerprint / manifest
│   ├── train_utils.py             # Shared CTC training logic
│   ├── train_regression_utils.py  # Shared regression training logic
│   ├── dataset.py                 # Generic Dataset classes
│   └── train_*.py                 # Per-model training entry points
├── inference/                     # Inference (depends only on onnxruntime)
│   ├── model_metadata.py          # ONNX sidecar metadata
│   ├── pipeline.py                # Core inference pipeline
│   ├── export_onnx.py             # ONNX export
│   └── math_eval.py               # Arithmetic evaluation
├── solvers/                       # Interactive CAPTCHA solvers
│   ├── slide_solver.py            # Slider solver
│   └── rotate_solver.py           # Rotation solver
├── utils/
│   └── slide_utils.py             # Slider trajectory generation
└── tests/                         # Tests (57 tests)
```

## Target metrics

| Model | Accuracy target | Inference latency | Model size |
|------|-----------|---------|---------|
| Dispatch classifier | > 99% | < 5 ms | < 500 KB |
| Plain characters | > 95% | < 30 ms | < 2 MB |
| Arithmetic | > 93% | < 30 ms | < 2 MB |
| 3D text | > 85% | < 50 ms | < 5 MB |
| 3D rotation (±5°) | > 85% | < 30 ms | ~1 MB |
| 3D slider (±3 px) | > 90% | < 30 ms | ~1 MB |
| Slider CNN (±5 px) | > 85% | < 30 ms | ~1 MB |
| Rotation regression (±5°) | > 85% | < 30 ms | ~2 MB |

## Testing

```bash
uv sync --extra dev
python -m pytest tests/ -v
```

## Tech stack

- Python 3.10-3.12
- PyTorch 2.x (training)
- ONNX + ONNXRuntime (inference and deployment)
- FastAPI + uvicorn (HTTP service)
- Pillow (image processing)
- OpenCV (optional, slider solving)
- uv (package management)
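
As a client-side illustration of the asynchronous flow documented under `POST /createTask` and `POST /getTaskResult`, the following is a minimal Python sketch using only the standard library. It is not part of the repository: the helper names (`post_json`, `build_create_task`, `solve_async`), the polling interval, and the poll limit are all hypothetical. Treating a non-zero `errorId` or a `"failed"` status as an error is also an assumption, since this README only shows the success shapes.

```python
import base64
import json
import time
import urllib.request
from typing import Callable


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response (stdlib only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def build_create_task(image_bytes: bytes, client_key: str,
                      captcha_type: str = "normal") -> dict:
    """Build the /createTask payload for a regular OCR task."""
    return {
        "clientKey": client_key,
        "task": {
            "type": "ImageToTextTask",
            "body": base64.b64encode(image_bytes).decode("ascii"),
            "captchaType": captcha_type,
        },
    }


def solve_async(image_bytes: bytes,
                base_url: str = "http://localhost:8080",
                client_key: str = "local",
                captcha_type: str = "normal",
                poll_interval: float = 0.5,   # hypothetical default
                max_polls: int = 20,          # hypothetical default
                post: Callable[[str, dict], dict] = post_json,
                sleep: Callable[[float], None] = time.sleep) -> dict:
    """Create a task, then poll /getTaskResult until status is "ready"."""
    created = post(f"{base_url}/createTask",
                   build_create_task(image_bytes, client_key, captcha_type))
    # Assumption: any non-zero errorId signals a failure.
    if created.get("errorId") != 0:
        raise RuntimeError(f"createTask failed: {created}")

    query = {"clientKey": client_key, "taskId": created["taskId"]}
    for _ in range(max_polls):
        result = post(f"{base_url}/getTaskResult", query)
        # Assumption: "failed" is a terminal status (it appears only as a
        # counter in the /health response).
        if result.get("errorId") != 0 or result.get("status") == "failed":
            raise RuntimeError(f"task failed: {result}")
        if result.get("status") == "ready":
            return result["solution"]
        sleep(poll_interval)
    raise TimeoutError(f"task {created['taskId']} not ready after {max_polls} polls")
```

With the service running locally and started with `CLIENT_KEY=local`, something like `solve_async(open("captcha.png", "rb").read())["text"]` would return the recognized text. The `post` and `sleep` parameters are injectable so the polling logic can be exercised without a live server.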