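# CaptchaBreaker

A local CAPTCHA recognition system built on a two-stage architecture: a **dispatch model plus multiple expert models**. The dispatch model classifies the CAPTCHA type, and the matching expert model performs the actual recognition. All models are designed to be lightweight and are exported to ONNX for deployment.

## Architecture
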
```
Input image → Preprocess → Dispatch classifier → Route to expert model → Postprocess → Output
                                    │
              ┌──────────┬──────────┼───────────┬───────────┐
              ▼          ▼          ▼           ▼           ▼
           normal      math      3d_text    3d_rotate   3d_slider
           (CRNN)     (CRNN)      (CNN)     (RegCNN)    (RegCNN)
              │          │          │           │           │
              ▼          ▼          ▼           ▼           ▼
           "A3B8"   "3+8=?"→11   "X9K2"      "135°"      "87px"
```
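
The routing step in the diagram can be sketched in a few lines of Python. This is only an illustration of the dispatch idea: `make_pipeline`, `classify`, and the stub experts are hypothetical names, not the project's actual `inference/pipeline.py` API.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for an ONNX-backed expert: image bytes in, result string out.
Expert = Callable[[bytes], str]


def make_pipeline(classify: Callable[[bytes], str], experts: Dict[str, Expert]):
    """Return a solve(image, captcha_type=None) function that routes by type."""

    def solve(image: bytes, captcha_type: Optional[str] = None) -> Tuple[str, str]:
        # Skip the dispatch classifier when the caller already knows the type.
        kind = captcha_type or classify(image)
        if kind not in experts:
            raise ValueError(f"no expert registered for type {kind!r}")
        return kind, experts[kind](image)

    return solve


# Toy usage with stub models (not the real classifier or experts).
solve = make_pipeline(
    classify=lambda image: "math",
    experts={"normal": lambda image: "A3B8", "math": lambda image: "11"},
)
print(solve(b"fake-bytes"))            # ('math', '11')
print(solve(b"fake-bytes", "normal"))  # ('normal', 'A3B8')
```

Keeping the experts in a plain dictionary means a new CAPTCHA type only needs a new entry plus a new classifier class.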

### Supported CAPTCHA types

| Type | Model | Description |
|------|------|------|
| normal | LiteCRNN + CTC | Plain character CAPTCHA (digits + letters) |
| math | LiteCRNN + CTC | Arithmetic CAPTCHA (e.g. `3+8=?` → `11`) |
| 3d_text | ThreeDCNN + CTC | 3D text CAPTCHA |
| 3d_rotate | RegressionCNN | 3D rotation angle regression (0-359°) |
| 3d_slider | RegressionCNN | 3D slider offset regression (10-200px) |
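
For the `math` type, the recognized text still has to be turned into an answer. A minimal sketch of that post-processing step follows. The expression shape (two non-negative integers and one operator) and the operator set are assumptions, and `eval_math_captcha` is a hypothetical name rather than the contents of `inference/math_eval.py`.

```python
import re

# Assumed shape: "<int><op><int>" optionally followed by "=" or "=?".
# The operator set ("+", "-", "*", "×", "x") is an assumption.
_EXPR = re.compile(r"^\s*(\d+)\s*([+\-*×x])\s*(\d+)\s*(?:=\s*\??)?\s*$")


def eval_math_captcha(raw: str) -> str:
    """Evaluate a recognized expression such as '3+8=?' and return the answer."""
    match = _EXPR.match(raw)
    if match is None:
        raise ValueError(f"unrecognized expression: {raw!r}")
    left, op, right = int(match.group(1)), match.group(2), int(match.group(3))
    if op == "+":
        value = left + right
    elif op == "-":
        value = left - right
    else:  # "*", "×" or "x"
        value = left * right
    return str(value)


print(eval_math_captcha("3+8=?"))  # 11
```

Parsing with a strict pattern instead of calling `eval` keeps a misrecognized string from executing arbitrary code.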

### Interactive solvers

| Type | Model | Description |
|------|------|------|
| slide | GapDetectorCNN | Slider gap detection (always outputs the gap center x; OpenCV first, CNN as fallback) |
| rotate | RotationRegressor | Rotation angle regression (sin/cos encoding) |
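
The sin/cos encoding mentioned for the rotate solver avoids the discontinuity between 359° and 0°. The sketch below shows the general technique. That the project decodes with `atan2` is an assumption, and both function names are illustrative.

```python
import math


def encode_angle(degrees: float) -> tuple[float, float]:
    """Map an angle to a point on the unit circle, so 359° and 0° are neighbors."""
    radians = math.radians(degrees)
    return math.sin(radians), math.cos(radians)


def decode_angle(sin_value: float, cos_value: float) -> float:
    """Recover the angle in [0, 360) from a (possibly unnormalized) sin/cos pair."""
    return math.degrees(math.atan2(sin_value, cos_value)) % 360.0


s, c = encode_angle(135.0)
print(round(decode_angle(s, c), 6))  # 135.0
```

Because `atan2` only depends on the direction of the vector, the decoder still works when a network's two outputs are not exactly unit length.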

### FunCaptcha-specific models

| question | Model | Description |
|------|------|------|
| 4_3d_rollball_animals | FunCaptchaSiamese | Crops the full challenge image, scores reference/candidate pairs, and returns `objects` |

## Installation

```bash
# Core dependencies
uv sync

# With the HTTP server
uv sync --extra server

# With OpenCV (slider solving)
uv sync --extra cv

# With test dependencies
uv sync --extra dev
```
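
Notes:
- The project currently pins `onnxruntime` to `<1.24` in `pyproject.toml` so that it stays installable with `uv` on Python 3.10.
- On Linux `x86_64`, `uv sync` installs `torch==2.5.1` and `torchvision==0.20.1` from the official PyTorch `cu121` index. This version pair has been verified to run CUDA on a GTX 1050 Ti (`sm_61`).
- The `torch 2.10 + cu128` combination that the repository previously resolved to automatically is incompatible with the GTX 1050 Ti. If you upgrade `torch` later, first re-verify that the GPU can actually execute CUDA tensor operations.

## Quick start

### 1. Generate training data
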
```bash
uv run captcha generate --type normal --num 60000
uv run captcha generate --type math --num 60000
uv run captcha generate --type 3d_text --num 80000
uv run captcha generate --type 3d_rotate --num 60000
uv run captcha generate --type 3d_slider --num 60000
uv run captcha generate --type classifier --num 50000
```

### 2. Train the models

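```bash
# Train one model at a time
uv run captcha train --model normal
uv run captcha train --model math
uv run captcha train --model 3d_text
uv run captcha train --model 3d_rotate
uv run captcha train --model 3d_slider
uv run captcha train --model classifier

# Or train everything with a single CLI command
uv run captcha train --all
```

OCR and regression training can resume from a checkpoint when the synthetic data fingerprint matches the one recorded with the checkpoint. If the generation rules change, the data is refreshed automatically and training restarts from epoch 1. The classifier and the rotate solver do not support resuming yet and are always trained as a full run.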
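
The resume rule above can be illustrated with a small fingerprint check. This is a sketch under assumptions: the configuration fields, the `data_fingerprint` checkpoint key, and the choice of SHA-256 over canonical JSON are all hypothetical, and the real `training/data_fingerprint.py` may hash different inputs.

```python
import hashlib
import json


def data_fingerprint(generation_config: dict) -> str:
    """Hash a generator configuration into a stable hex digest."""
    # sort_keys makes the digest independent of dict insertion order.
    canonical = json.dumps(generation_config, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_resume(checkpoint: dict, generation_config: dict) -> bool:
    """Resume only if the checkpoint was trained on data with the same fingerprint."""
    return checkpoint.get("data_fingerprint") == data_fingerprint(generation_config)


# Hypothetical config and checkpoint, for illustration only.
config = {"type": "normal", "charset": "0123456789ABCDEF", "num": 60000}
checkpoint = {"epoch": 12, "data_fingerprint": data_fingerprint(config)}

print(can_resume(checkpoint, config))                          # True
print(can_resume(checkpoint, {**config, "charset": "01234"}))  # False
```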

### 3. Export to ONNX

```bash
uv run captcha export --all
# Or export a single model
uv run captcha export --model normal
uv run captcha export --model 4_3d_rollball_animals
```

Export also writes a `<model>.meta.json` sidecar that stores the OCR character set, the classifier class order, the regression label range, or the FunCaptcha challenge crop metadata. Deployed inference prefers this metadata when it is available.
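
A loader for such a sidecar might look like the sketch below. It assumes that `normal.onnx` is paired with `normal.meta.json` in the same directory and that the character set is stored under a `charset` key. Both are guesses for illustration, and the real `inference/model_metadata.py` may differ.

```python
import json
from pathlib import Path
from typing import Optional


def load_sidecar(onnx_path: str) -> Optional[dict]:
    """Return the parsed sidecar next to an ONNX file, or None if it is absent."""
    # Assumption: "normal.onnx" pairs with "normal.meta.json" in the same directory.
    meta_path = Path(onnx_path).with_suffix(".meta.json")
    if not meta_path.is_file():
        return None
    return json.loads(meta_path.read_text(encoding="utf-8"))


def resolve_charset(onnx_path: str, default_charset: str) -> str:
    """Prefer the sidecar's character set, falling back to a built-in default."""
    meta = load_sidecar(onnx_path) or {}
    return meta.get("charset", default_charset)  # "charset" is a hypothetical key
```

Reading the character set from the exported artifact rather than from `config.py` keeps a deployed model decodable even after the training configuration changes.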

### 4. Inference

```bash
# Recognize a single image (automatic classification + recognition)
uv run captcha predict image.png

# Specify the type to skip classification
uv run captcha predict image.png --type normal

# Batch recognition
uv run captcha predict-dir ./test_images/

# FunCaptcha-specific recognition
uv run captcha predict-funcaptcha challenge.jpg --question 4_3d_rollball_animals
```

### 5. Interactive solvers

```bash
# Generate solver training data
uv run captcha generate-solver slide --num 30000
uv run captcha generate-solver rotate --num 50000

# Train
uv run captcha train-solver slide
uv run captcha train-solver rotate

# Solve
uv run captcha solve slide --bg bg.png --tpl tpl.png
uv run captcha solve rotate --image img.png
```

### 6. FunCaptcha-specific training

Place the labeled full challenge images in `data/real/funcaptcha/4_3d_rollball_animals/`. The filename prefix is the index of the correct candidate, for example `2_demo.jpg`.

```bash
uv run captcha train-funcaptcha --question 4_3d_rollball_animals
uv run captcha export --model 4_3d_rollball_animals
uv run captcha predict-funcaptcha challenge.jpg --question 4_3d_rollball_animals
```
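
The filename convention used for the labeled challenge images can be parsed as sketched below. The underscore separator is inferred from the `2_demo.jpg` example, and `label_from_filename` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path


def label_from_filename(path: str) -> int:
    """Extract the correct-candidate index from a name such as '2_demo.jpg'."""
    prefix, separator, _rest = Path(path).name.partition("_")
    # Reject names without an underscore or with a non-numeric prefix.
    if not separator or not prefix.isdecimal():
        raise ValueError(f"expected '<index>_<name>' but got {path!r}")
    return int(prefix)


print(label_from_filename("data/real/funcaptcha/4_3d_rollball_animals/2_demo.jpg"))  # 2
```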

If you do not have training data yet, you can reuse an external ONNX model directly:

```bash
FUNCAPTCHA_ROLLBALL_MODEL_PATH=/path/to/4_3d_rollball_animals.onnx \
  uv run captcha predict-funcaptcha challenge.jpg --question 4_3d_rollball_animals
```

Inference looks for the model in this order:
- `onnx_models/funcaptcha_rollball_animals.onnx`
- the `FUNCAPTCHA_ROLLBALL_MODEL_PATH` environment variable
- the default fallback `/mnt/data/code/python/funcaptcha-server/model/4_3d_rollball_animals.onnx`

Do not put ONNX files in `models/`. That directory holds the Python model definition source code; runtime model artifacts belong in `onnx_models/`.
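
The lookup order listed above amounts to a "first candidate that exists wins" search. A minimal sketch, assuming each step is an existence check on disk (the actual resolution logic is not shown in this README, and `resolve_rollball_model` is a hypothetical name):

```python
import os
from pathlib import Path
from typing import Mapping, Optional

LOCAL_MODEL = "onnx_models/funcaptcha_rollball_animals.onnx"
ENV_VAR = "FUNCAPTCHA_ROLLBALL_MODEL_PATH"
FALLBACK_MODEL = "/mnt/data/code/python/funcaptcha-server/model/4_3d_rollball_animals.onnx"


def resolve_rollball_model(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first candidate that exists on disk, or None if none does."""
    env = os.environ if env is None else env
    candidates = [LOCAL_MODEL, env.get(ENV_VAR), FALLBACK_MODEL]
    for candidate in candidates:
        # Skip an unset or empty environment variable.
        if candidate and Path(candidate).is_file():
            return Path(candidate)
    return None


print(resolve_rollball_model())
```

Passing `env` explicitly makes the function easy to exercise in tests without touching the real process environment.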

## HTTP API

```bash
uv sync --extra server
uv run captcha serve --port 8080
```

To align with `ohmycaptcha` / YesCaptcha-style clients, set `CLIENT_KEY` before starting the server:

```bash
CLIENT_KEY=local uv run captcha serve --port 8080
```

To let the callback receiver verify where a request came from, also set `CALLBACK_SIGNING_SECRET`. The server then attaches an HMAC-SHA256 signature to the callback request headers:

```bash
CLIENT_KEY=local CALLBACK_SIGNING_SECRET=shared-secret uv run captcha serve --port 8080
```
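
On the receiving side, checking an HMAC-SHA256 signature could look like the sketch below. The signed message layout (timestamp, a dot, then the raw request body) and the hex encoding are assumptions made for illustration. This README does not specify them, so check the server source before relying on this. The callback body in the example is also made up.

```python
import hashlib
import hmac


def sign(secret: str, timestamp: str, body: bytes) -> str:
    """Hex HMAC-SHA256 over an ASSUMED message layout: b"<timestamp>." + body."""
    message = timestamp.encode("ascii") + b"." + body
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify(secret: str, timestamp: str, body: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


# Made-up form-encoded callback body, for illustration only.
body = b"taskId=4ec6f1904da2446caa6c6313c0f7d2b0&status=ready"
good = sign("shared-secret", "1710000001", body)
print(verify("shared-secret", "1710000001", body, good))         # True
print(verify("shared-secret", "1710000001", body + b"x", good))  # False
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading characters of a forged signature were correct.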

Both the synchronous and the asynchronous endpoints are available at the root path and under the `/api/v1/*` compatibility alias. For example, `/solve` and `/api/v1/solve` both work, as do `/createTask` and `/api/v1/createTask`.

### POST /solve: recognize a base64 image (synchronous)

```bash
curl -X POST http://localhost:8080/solve \
  -H "Content-Type: application/json" \
  -d '{"image": "'"$(base64 -w0 captcha.png)"'", "type": "normal"}'
```

Request body:

```json
{
  "image": "<base64-encoded image>",
  "type": "normal"
}
```

`type` is optional; if it is omitted, the type is classified automatically. Allowed values: `normal` / `math` / `3d_text` / `3d_rotate` / `3d_slider`

To route the request to a FunCaptcha-specific model, also pass `question`, for example:

```json
{
  "image": "<base64-encoded image>",
  "question": "4_3d_rollball_animals"
}
```

In that case the response additionally contains `objects`.

Response:

```json
{
  "type": "normal",
  "result": "A3B8",
  "raw": "A3B8",
  "time_ms": 12.3
}
```
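
The same request can be sent from Python using only the standard library. This is a client-side sketch, not part of the project: `build_solve_payload` and `solve` are illustrative names, and the base URL assumes the server was started with `--port 8080` as shown earlier.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_solve_payload(image_bytes: bytes, captcha_type: Optional[str] = None) -> dict:
    """Build the JSON body for POST /solve; omit "type" to let the server classify."""
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    if captcha_type is not None:
        payload["type"] = captcha_type
    return payload


def solve(image_path: str, captcha_type: Optional[str] = None,
          base_url: str = "http://localhost:8080") -> dict:
    """Send an image file to a running server and return the parsed JSON response."""
    with open(image_path, "rb") as handle:
        payload = build_solve_payload(handle.read(), captcha_type)
    request = urllib.request.Request(
        f"{base_url}/solve",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# With a server running and a local captcha.png:
#   solve("captcha.png", "normal")["result"]
```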

### POST /solve/upload: recognize an uploaded file (synchronous)

```bash
curl -X POST "http://localhost:8080/solve/upload?type=normal" \
  -F "image=@captcha.png"
```

### POST /createTask: create an asynchronous recognition task

The interface follows the `taskId` polling scheme of `ohmycaptcha`, which suits integrators that need a uniform asynchronous protocol.

- Task results are persisted to `data/server_tasks/`, so they remain queryable after a server restart. They are kept for 10 minutes by default.
- If `CLIENT_KEY` is set, `clientKey` must match it.
- The `callbackUrl`, `softId`, and `languagePool` fields are accepted. When the task completes, `callbackUrl` receives one `application/x-www-form-urlencoded` POST callback. A failed callback is retried 2 times by default; the timeout, retry count, and backoff interval can be adjusted through `SERVER_CONFIG`.
- If `CALLBACK_SIGNING_SECRET` is set, the callback also carries the `X-CaptchaBreaker-Timestamp`, `X-CaptchaBreaker-Signature-Alg`, and `X-CaptchaBreaker-Signature` headers.
- Regular OCR tasks are routed by `task.captchaType`; FunCaptcha-specific tasks are routed by `task.question`.

```bash
curl -X POST http://localhost:8080/createTask \
  -H "Content-Type: application/json" \
  -d '{"clientKey":"local","task":{"type":"ImageToTextTask","body":"'"$(base64 -w0 captcha.png)"'","captchaType":"normal"}}'
```

FunCaptcha example:

```bash
curl -X POST http://localhost:8080/createTask \
  -H "Content-Type: application/json" \
  -d '{"clientKey":"local","task":{"type":"FunCaptcha","body":"'"$(base64 -w0 challenge.jpg)"'","question":"4_3d_rollball_animals"}}'
```

Response:

```json
{
  "errorId": 0,
  "taskId": "4ec6f1904da2446caa6c6313c0f7d2b0",
  "status": "processing",
  "createTime": 1710000000,
  "expiresAt": 1710000600
}
```

### POST /getTaskResult: query the result of an asynchronous task

```bash
curl -X POST http://localhost:8080/getTaskResult \
  -H "Content-Type: application/json" \
  -d '{"clientKey":"local","taskId":"4ec6f1904da2446caa6c6313c0f7d2b0"}'
```

Still processing:

```json
{
  "errorId": 0,
  "taskId": "4ec6f1904da2446caa6c6313c0f7d2b0",
  "status": "processing",
  "createTime": 1710000000
}
```

Completed:

```json
{
  "errorId": 0,
  "taskId": "4ec6f1904da2446caa6c6313c0f7d2b0",
  "status": "ready",
  "cost": "0.00000",
  "ip": "127.0.0.1",
  "createTime": 1710000000,
  "endTime": 1710000001,
  "expiresAt": 1710000600,
  "solveCount": 1,
  "task": {
    "type": "ImageToTextTask",
    "captchaType": "normal"
  },
  "callback": {
    "configured": true,
    "url": "https://example.com/callback",
    "attempts": 1,
    "delivered": true,
    "deliveredAt": 1710000001,
    "lastError": null
  },
  "solution": {
    "text": "A3B8",
    "answer": "A3B8",
    "raw": "A3B8",
    "captchaType": "normal",
    "timeMs": 12.3
  }
}
```
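
A client typically polls `/getTaskResult` until `status` becomes `ready`. The sketch below shows one way to structure that loop. `wait_for_task` and the injected `post` callable are hypothetical, and treating a non-zero `errorId` as failure is an assumption, since only `errorId: 0` responses are shown here.

```python
import time
from typing import Callable


def wait_for_task(post: Callable[[str, dict], dict], task_id: str, client_key: str,
                  interval: float = 0.5, timeout: float = 30.0) -> dict:
    """Poll /getTaskResult until the task is ready, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        reply = post("/getTaskResult", {"clientKey": client_key, "taskId": task_id})
        if reply.get("errorId", 0) != 0:  # assumption: non-zero means failure
            raise RuntimeError(f"task {task_id} failed: {reply}")
        if reply.get("status") == "ready":
            return reply["solution"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still processing after {timeout}s")
        time.sleep(interval)


# Stub transport that answers "processing" twice and then "ready".
replies = iter([
    {"errorId": 0, "status": "processing"},
    {"errorId": 0, "status": "processing"},
    {"errorId": 0, "status": "ready", "solution": {"text": "A3B8"}},
])
print(wait_for_task(lambda path, body: next(replies), "demo-task", "local", interval=0.01))
# {'text': 'A3B8'}
```

Injecting the transport keeps the loop independent of any HTTP library; in real use, `post` would send the JSON body to the server and return the parsed response.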

### POST /getBalance: local compatibility endpoint

```json
{"errorId": 0, "balance": 999999.0}
```

### GET /health or /api/v1/health: health check

```json
{
  "status": "ok",
  "models_loaded": true,
  "client_key_required": false,
  "async_tasks": {
    "active": 0,
    "processing": 0,
    "ready": 0,
    "failed": 0,
    "ttl_seconds": 600
  }
}
```

## Project structure

```
├── config.py                        # Global configuration (charsets, sizes, training hyperparameters)
├── cli.py                           # Command-line entry point
├── server.py                        # FastAPI HTTP service (inference only, no torch dependency)
├── generators/                      # CAPTCHA data generators
│   ├── normal_gen.py                # Plain characters
│   ├── math_gen.py                  # Arithmetic expressions
│   ├── threed_gen.py                # 3D text
│   ├── threed_rotate_gen.py         # 3D rotation
│   ├── threed_slider_gen.py         # 3D slider
│   ├── slide_gen.py                 # Slider gap training data
│   └── rotate_solver_gen.py         # Rotation solver training data
├── models/                          # Model definitions
│   ├── classifier.py                # Dispatch classifier
│   ├── lite_crnn.py                 # Lightweight CRNN (normal/math)
│   ├── threed_cnn.py                # 3D text CNN
│   ├── regression_cnn.py            # Regression CNN (3d_rotate/3d_slider)
│   ├── gap_detector.py              # Slider gap detection
│   └── rotation_regressor.py        # Rotation angle regression
├── training/                        # Training scripts
│   ├── data_fingerprint.py          # Synthetic data fingerprint / manifest
│   ├── train_utils.py               # Shared CTC training logic
│   ├── train_regression_utils.py    # Shared regression training logic
│   ├── dataset.py                   # Generic Dataset classes
│   └── train_*.py                   # Per-model training entry points
├── inference/                       # Inference (depends only on onnxruntime)
│   ├── model_metadata.py            # ONNX sidecar metadata
│   ├── pipeline.py                  # Core inference pipeline
│   ├── export_onnx.py               # ONNX export
│   └── math_eval.py                 # Arithmetic evaluation
├── solvers/                         # Interactive CAPTCHA solvers
│   ├── slide_solver.py              # Slider solving
│   └── rotate_solver.py             # Rotation solving
├── utils/
│   └── slide_utils.py               # Slider trajectory generation
└── tests/                           # Tests (57 tests)
```

## Target metrics

| Model | Accuracy target | Inference latency | Model size |
|------|-----------|---------|---------|
| Dispatch classifier | > 99% | < 5ms | < 500KB |
| Plain characters | > 95% | < 30ms | < 2MB |
| Arithmetic recognition | > 93% | < 30ms | < 2MB |
| 3D text | > 85% | < 50ms | < 5MB |
| 3D rotation (±5°) | > 85% | < 30ms | ~1MB |
| 3D slider (±3px) | > 90% | < 30ms | ~1MB |
| Slider CNN (±5px) | > 85% | < 30ms | ~1MB |
| Rotation regression (±5°) | > 85% | < 30ms | ~2MB |
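
For the regression rows, "accuracy" means the share of predictions that fall within the stated tolerance. A sketch of how such a metric could be computed follows. Counting a prediction exactly at the tolerance as correct and using wrap-around error for angles are both assumptions, and the function names are illustrative.

```python
def angle_error(predicted: float, actual: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(predicted - actual) % 360.0
    return min(diff, 360.0 - diff)


def accuracy_within(predictions, targets, tolerance, error=lambda p, t: abs(p - t)):
    """Fraction of predictions whose error is at most `tolerance`."""
    pairs = list(zip(predictions, targets))
    hits = sum(1 for p, t in pairs if error(p, t) <= tolerance)
    return hits / len(pairs)


# Rotation: 358° vs 2° is only 4° apart once wrap-around is considered.
print(accuracy_within([358.0, 90.0, 10.0], [2.0, 97.0, 12.0], 5.0, angle_error))
# Slider offsets in pixels use the plain absolute error.
print(accuracy_within([87.0, 120.0], [85.0, 126.0], 3.0))  # 0.5
```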

## Tests

```bash
uv sync --extra dev
python -m pytest tests/ -v
```

## Tech stack

- Python 3.10-3.12
- PyTorch 2.x (training)
- ONNX + ONNXRuntime (inference deployment)
- FastAPI + uvicorn (HTTP service)
- Pillow (image processing)
- OpenCV (optional, slider solving)
- uv (package management)