
Hua ef9518deeb Add README.md with project documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-11 19:30:22 +08:00

6.5 KiB


CaptchaBreaker

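A local CAPTCHA recognition system built on a two-tier architecture: one dispatcher model plus multiple expert models. The dispatcher classifies the CAPTCHA type, and the matching expert model performs the actual recognition. All models are designed to be lightweight and are exported to ONNX for deployment.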

Architecture

Input image → Preprocess → Dispatcher → Route to expert model → Postprocess → Output
                               │
         ┌──────────┬──────────┼──────────┬──────────┐
         ▼          ▼          ▼          ▼          ▼
      normal      math      3d_text   3d_rotate  3d_slider
      (CRNN)     (CRNN)      (CNN)    (RegCNN)   (RegCNN)
         │          │          │          │          │
         ▼          ▼          ▼          ▼          ▼
      "A3B8"   "3+8=?"→11   "X9K2"     "135°"     "87px"
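
The routing step in the diagram can be sketched in a few lines of Python. This is only an illustration of the dispatch pattern, not the project's `inference/pipeline.py`: `make_pipeline`, the `classify` callable, and the stub experts are hypothetical stand-ins for the real ONNX models.

```python
from typing import Callable, Dict, Optional

# Hypothetical expert signature: image bytes in, decoded answer out.
Expert = Callable[[bytes], str]


def make_pipeline(classify: Callable[[bytes], str],
                  experts: Dict[str, Expert]) -> Callable[..., dict]:
    """Build a solver that routes each image to the expert for its type."""

    def solve(image: bytes, captcha_type: Optional[str] = None) -> dict:
        # Skip the dispatcher when the caller already knows the type.
        kind = captcha_type or classify(image)
        if kind not in experts:
            raise ValueError(f"no expert registered for type {kind!r}")
        return {"type": kind, "result": experts[kind](image)}

    return solve


# Stub models standing in for the real dispatcher and experts.
solve = make_pipeline(
    classify=lambda image: "math" if b"+" in image else "normal",
    experts={
        "normal": lambda image: image.decode().upper(),
        "math": lambda image: str(sum(int(x) for x in image.decode().split("+"))),
    },
)

print(solve(b"a3b8"))           # routed by the stub classifier
print(solve(b"3+8"))            # routed to the math expert
print(solve(b"3+8", "normal"))  # explicit type bypasses the classifier
```

Keeping the experts in a plain dictionary means a new CAPTCHA type only needs a new entry and a new class in the dispatcher.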

Supported CAPTCHA types

Type	Model	Description
normal	LiteCRNN + CTC	Plain character CAPTCHA (digits + letters)
math	LiteCRNN + CTC	Arithmetic CAPTCHA (e.g. `3+8=?` → `11`)
3d_text	ThreeDCNN + CTC	3D text CAPTCHA
3d_rotate	RegressionCNN	3D rotation angle regression (0-359°)
3d_slider	RegressionCNN	3D slider offset regression (10-200 px)
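
The three text models use CTC, so their raw output is a per-time-step score matrix that still has to be decoded. Below is a minimal sketch of greedy (best-path) CTC decoding in pure Python. The blank index `0` and the `CHARSET` string are assumptions made for the example; the real values live in the project's `config.py`.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol
# Assumed character set; class index i + 1 maps to CHARSET[i].
CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def ctc_greedy_decode(scores: Sequence[Sequence[float]]) -> str:
    """Decode a (time_steps x classes) score matrix with best-path CTC."""
    # 1. Pick the highest-scoring class at every time step.
    best_path: List[int] = [
        max(range(len(step)), key=step.__getitem__) for step in scores
    ]
    # 2. Collapse consecutive repeats and 3. drop blanks.
    chars = []
    previous = BLANK
    for index in best_path:
        if index != previous and index != BLANK:
            chars.append(CHARSET[index - 1])
        previous = index
    return "".join(chars)


def one_hot(path: Sequence[int], num_classes: int = len(CHARSET) + 1):
    """Helper for the demo: turn a class path into a fake score matrix."""
    return [[1.0 if c == i else 0.0 for c in range(num_classes)] for i in path]


# Path "A A _ 3 3 _ _ B 8 8" with A=11, 3=4, B=12, 8=9 and _ = blank.
print(ctc_greedy_decode(one_hot([11, 11, 0, 4, 4, 0, 0, 12, 9, 9])))
```

A blank between two identical classes keeps both characters, which is how CTC represents doubled letters such as "AA".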

Interactive solvers

Type	Model	Description
slide	GapDetectorCNN	Slider gap detection (OpenCV first, CNN as fallback)
rotate	RotationRegressor	Rotation angle regression (sin/cos encoding)
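
The sin/cos encoding mentioned for the rotate solver avoids the discontinuity between 359° and 0° that a raw angle target would have. A minimal sketch of the encode and decode steps follows; the function names are hypothetical and the project's `RotationRegressor` may normalize its outputs differently.

```python
import math
from typing import Tuple


def encode_angle(degrees: float) -> Tuple[float, float]:
    """Map an angle to a point on the unit circle (regression target)."""
    radians = math.radians(degrees)
    return math.sin(radians), math.cos(radians)


def decode_angle(sin_value: float, cos_value: float) -> float:
    """Recover an angle in [0, 360) from a (sin, cos) prediction."""
    # atan2 also works when the prediction is not exactly unit length.
    return math.degrees(math.atan2(sin_value, cos_value)) % 360.0


print(decode_angle(*encode_angle(135.0)))

# 359° and 1° are far apart as raw numbers but close as (sin, cos) pairs.
print(math.dist(encode_angle(359.0), encode_angle(1.0)))
```

Because the targets for neighbouring angles stay close across the wrap-around point, a small prediction error near 0° does not turn into a loss of nearly 360°.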

Installation

# Core dependencies
uv sync

# With the HTTP service
uv sync --extra server

# With OpenCV (slider solving)
uv sync --extra cv

# With test dependencies
uv sync --extra dev

Quick start

1. Generate training data

python cli.py generate --type normal --num 60000
python cli.py generate --type math --num 60000
python cli.py generate --type 3d_text --num 80000
python cli.py generate --type 3d_rotate --num 60000
python cli.py generate --type 3d_slider --num 60000
python cli.py generate --type classifier --num 50000

2. Train the models

# Train one model at a time
python -m training.train_normal
python -m training.train_math
python -m training.train_3d_text
python -m training.train_3d_rotate
python -m training.train_3d_slider
python -m training.train_classifier

# Or train everything with a single CLI command
python cli.py train --all

Training supports resuming: if an existing checkpoint is detected, training automatically continues from where it was interrupted.
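
The resume behaviour can be illustrated with a framework-free sketch. Everything here is an assumption made for the example: the JSON checkpoint format, the file name, and the single `epoch` field. A PyTorch trainer would typically store model and optimizer state with `torch.save` instead.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def train(checkpoint: Path, total_epochs: int,
          stop_after: Optional[int] = None) -> List[int]:
    """Run a dummy training loop that resumes from `checkpoint` if present."""
    state = {"epoch": 0}
    if checkpoint.exists():
        # A checkpoint was found: continue from the recorded epoch.
        state = json.loads(checkpoint.read_text())
    executed = []
    for epoch in range(state["epoch"], total_epochs):
        if stop_after is not None and len(executed) == stop_after:
            break  # simulate an interruption
        executed.append(epoch)  # placeholder for one real training epoch
        state["epoch"] = epoch + 1
        checkpoint.write_text(json.dumps(state))  # save after every epoch
    return executed


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "normal.json"  # hypothetical checkpoint file
    first_run = train(ckpt, total_epochs=5, stop_after=2)
    second_run = train(ckpt, total_epochs=5)
    print(first_run, second_run)  # [0, 1] [2, 3, 4]
```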

3. Export to ONNX

python cli.py export --all
# Or export a single model
python cli.py export --model normal

4. Inference

# Recognize a single image (automatic classification + recognition)
python cli.py predict image.png

# Specify the type to skip classification
python cli.py predict image.png --type normal

# Batch recognition
python cli.py predict-dir ./test_images/

5. Interactive solvers

# Generate solver training data
python cli.py generate-solver slide --num 30000
python cli.py generate-solver rotate --num 50000

# Train
python cli.py train-solver slide
python cli.py train-solver rotate

# Solve
python cli.py solve slide --bg bg.png --tpl tpl.png
python cli.py solve rotate --image img.png

HTTP API

uv sync --extra server
python cli.py serve --port 8080

POST /solve: recognize a base64-encoded image

curl -X POST http://localhost:8080/solve \
  -H "Content-Type: application/json" \
  -d '{"image": "'$(base64 -w0 captcha.png)'", "type": "normal"}'

Request body:

{
  "image": "<base64-encoded image>",
  "type": "normal"
}

type is optional; if it is omitted, the CAPTCHA type is classified automatically. Allowed values: normal / math / 3d_text / 3d_rotate / 3d_slider

Response:

{
  "type": "normal",
  "result": "A3B8",
  "raw": "A3B8",
  "time_ms": 12.3
}
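
For callers that prefer Python over curl, the same request can be built with the standard library. This is a sketch under the assumption that the server runs on `localhost:8080` as in the example above; `build_solve_request` and `solve_remote` are hypothetical helper names, not part of the project.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_solve_request(image_bytes: bytes,
                        captcha_type: Optional[str] = None,
                        base_url: str = "http://localhost:8080",
                        ) -> urllib.request.Request:
    """Build the POST /solve request documented above."""
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    if captcha_type is not None:
        payload["type"] = captcha_type  # omit to let the server classify
    return urllib.request.Request(
        f"{base_url}/solve",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def solve_remote(image_bytes: bytes, captcha_type: Optional[str] = None) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_solve_request(image_bytes, captcha_type)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Usage (requires a running server):
#   with open("captcha.png", "rb") as f:
#       print(solve_remote(f.read(), "normal")["result"])
```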

POST /solve/upload: recognize an uploaded file

curl -X POST "http://localhost:8080/solve/upload?type=normal" \
  -F "image=@captcha.png"

GET /health: health check

{"status": "ok", "models_loaded": true}

Project structure

├── config.py                 # Global configuration (charsets, image sizes, training hyperparameters)
├── cli.py                    # Command-line entry point
├── server.py                 # FastAPI HTTP service (inference only, no torch dependency)
├── generators/               # CAPTCHA data generators
│   ├── normal_gen.py         # Plain characters
│   ├── math_gen.py           # Arithmetic expressions
│   ├── threed_gen.py         # 3D text
│   ├── threed_rotate_gen.py  # 3D rotation
│   ├── threed_slider_gen.py  # 3D slider
│   ├── slide_gen.py          # Slider-gap training data
│   └── rotate_solver_gen.py  # Rotation-solver training data
├── models/                   # Model definitions
│   ├── classifier.py         # Dispatcher classifier
│   ├── lite_crnn.py          # Lightweight CRNN (normal/math)
│   ├── threed_cnn.py         # 3D text CNN
│   ├── regression_cnn.py     # Regression CNN (3d_rotate/3d_slider)
│   ├── gap_detector.py       # Slider gap detection
│   └── rotation_regressor.py # Rotation angle regression
├── training/                 # Training scripts
│   ├── train_utils.py        # Shared CTC training logic
│   ├── train_regression_utils.py  # Shared regression training logic
│   ├── dataset.py            # Generic Dataset classes
│   └── train_*.py            # Per-model training entry points
├── inference/                # Inference (depends only on onnxruntime)
│   ├── pipeline.py           # Core inference pipeline
│   ├── export_onnx.py        # ONNX export
│   └── math_eval.py          # Arithmetic expression evaluation
├── solvers/                  # Interactive CAPTCHA solvers
│   ├── slide_solver.py       # Slider solver
│   └── rotate_solver.py      # Rotation solver
├── utils/
│   └── slide_utils.py        # Slider trajectory generation
└── tests/                    # Tests (57 tests)
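
The tree lists `inference/math_eval.py` as the step that turns a recognized expression such as `3+8=?` into `11`. One way to do this without calling `eval` is a small regex-based evaluator. The sketch below is an assumption about how such a step could look, not the project's code; the supported operators (`+`, `-`, `*`, `×`) and the function name are chosen for the example.

```python
import operator
import re

# Assumed operator set; the real generator may use other symbols.
_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "×": operator.mul,
}
# <int> <op> <int>, optionally followed by "=" or "=?".
_PATTERN = re.compile(r"^\s*(\d+)\s*([+\-*×])\s*(\d+)\s*(?:=\s*\??)?\s*$")


def eval_captcha_expression(text: str) -> int:
    """Evaluate a recognized expression such as '3+8=?' without eval()."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unsupported expression: {text!r}")
    left, op, right = match.groups()
    return _OPS[op](int(left), int(right))


print(eval_captcha_expression("3+8=?"))
print(eval_captcha_expression("6×7=?"))
```

Restricting the input to a fixed pattern means a misrecognized string raises `ValueError` instead of being executed.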

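Target metrics

Model	Accuracy target	Inference latency	Model size
Dispatcher classifier	> 99%	< 5 ms	< 500 KB
Plain characters	> 95%	< 30 ms	< 2 MB
Arithmetic expressions	> 93%	< 30 ms	< 2 MB
3D text	> 85%	< 50 ms	< 5 MB
3D rotation (±5°)	> 85%	< 30 ms	~1 MB
3D slider (±3 px)	> 90%	< 30 ms	~1 MB
Slider CNN (±5 px)	> 85%	< 30 ms	~1 MB
Rotation regression (±5°)	> 85%	< 30 ms	~2 MB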
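
For the regression rows, accuracy means the share of predictions that fall within the stated tolerance. A minimal sketch of that metric follows. Measuring angle error around the circle (so that 358° versus 2° counts as 4°) is an assumption; the project's evaluation code may handle the wrap-around differently.

```python
from typing import Optional, Sequence


def circular_error(pred: float, target: float, period: float = 360.0) -> float:
    """Smallest absolute difference between two values on a circle."""
    diff = abs(pred - target) % period
    return min(diff, period - diff)


def accuracy_within(preds: Sequence[float], targets: Sequence[float],
                    tolerance: float, period: Optional[float] = None) -> float:
    """Fraction of predictions whose error is at most `tolerance`."""
    hits = 0
    for pred, target in zip(preds, targets):
        if period is not None:
            error = circular_error(pred, target, period)
        else:
            error = abs(pred - target)
        hits += error <= tolerance
    return hits / len(preds)


# Angles with a ±5° tolerance: errors are 4°, 10° and 1°.
print(accuracy_within([358, 10, 181], [2, 20, 180], tolerance=5, period=360.0))
# Pixel offsets with a ±3 px tolerance: errors are 2 px and 6 px.
print(accuracy_within([87, 120], [85, 126], tolerance=3))
```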

Testing

uv sync --extra dev
python -m pytest tests/ -v

Tech stack

Python 3.10+
PyTorch 2.x (training)
ONNX + ONNX Runtime (inference and deployment)
FastAPI + uvicorn (HTTP service)
Pillow (image processing)
OpenCV (optional, slider solving)
uv (package management)
