CaptchBreaker/models/gap_detector.py
Hua 9b5f29083e Add slide and rotate interactive captcha solvers
New solver subsystem with independent models:
- GapDetectorCNN (1x128x256 grayscale → sigmoid) for slide gap detection
- RotationRegressor (3x128x128 RGB → sin/cos via tanh) for rotation angle prediction
- SlideSolver with 3-tier strategy: template match → edge detect → CNN fallback
- RotateSolver with ONNX sin/cos → atan2 inference
- Generators, training scripts, CLI commands, and slide track utility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:07:06 +08:00

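"""
Slider gap detection CNN (GapDetectorCNN)

Detects the x coordinate of the gap in a slider captcha.
The sigmoid output is normalized to [0, 1]; at inference time it is scaled
by the image width to recover the pixel coordinate.

Architecture:
    Conv(1→32) + BN + ReLU + Pool
    Conv(32→64) + BN + ReLU + Pool
    Conv(64→128) + BN + ReLU + Pool
    Conv(128→128) + BN + ReLU + Pool
    AdaptiveAvgPool2d(1) → FC(128→64) → ReLU → Dropout(0.2) → FC(64→1) → Sigmoid

About 250K parameters, ~1 MB as float32 weights.
"""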
import torch
import torch.nn as nn
class GapDetectorCNN(nn.Module):
    """
    Slider gap detection CNN; outputs the gap's x coordinate as a
    normalized fraction in [0, 1].

    Same architecture as RegressionCNN, but semantically dedicated to
    slider gap detection. Default input size is 1x128x256 (grayscale).
    """

    def __init__(self, img_h: int = 128, img_w: int = 256):
        super().__init__()
        self.img_h = img_h
        self.img_w = img_w
        self.features = nn.Sequential(
            # block 1: 1 → 32, H/2, W/2
            nn.Conv2d(1, 32, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # block 2: 32 → 64, H/4, W/4
            nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # block 3: 64 → 128, H/8, W/8
            nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # block 4: 128 → 128, H/16, W/16
            nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
        )
        # Global average pooling makes the head independent of input size.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.regressor = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(inplace=True),
            nn.Dropout(0.2),
            nn.Linear(64, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Args:
            x: (batch, 1, H, W) grayscale image
        Returns:
            output: (batch, 1) sigmoid output in [0, 1], the gap's x
                coordinate as a fraction of the image width
        """
        feat = self.features(x)
        feat = self.pool(feat)      # (B, 128, 1, 1)
        feat = feat.flatten(1)      # (B, 128)
        out = self.regressor(feat)  # (B, 1)
        return out
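
The docstring's size claims and the "scale back by image width" step can be checked with plain arithmetic. The sketch below needs no PyTorch: it copies the layer sizes from `GapDetectorCNN` above, counts learnable parameters, computes the feature-map size after the four pooling layers, and converts a normalized output into a pixel coordinate. The names `count_parameters`, `feature_map_size`, and `ratio_to_pixel_x` are hypothetical helpers introduced here for illustration and are not part of the repository; the clamping and rounding in `ratio_to_pixel_x` are assumptions, since the file only states that the output is scaled by the image width.

```python
# Dependency-free arithmetic check; layer sizes are copied from GapDetectorCNN above.
CONV_CHANNELS = [(1, 32), (32, 64), (64, 128), (128, 128)]  # (in, out) per conv block
FC_SIZES = [(128, 64), (64, 1)]                             # (in, out) per Linear layer


def count_parameters() -> int:
    """Count learnable parameters: 3x3 convs without bias, BatchNorm, and Linear layers."""
    total = 0
    for c_in, c_out in CONV_CHANNELS:
        total += c_in * c_out * 3 * 3  # conv kernel weights (bias=False)
        total += 2 * c_out             # BatchNorm2d weight and bias
    for f_in, f_out in FC_SIZES:
        total += f_in * f_out + f_out  # Linear weight and bias
    return total


def feature_map_size(img_h: int = 128, img_w: int = 256) -> tuple:
    """Spatial size after the four MaxPool2d(2, 2) layers (each halves H and W)."""
    for _ in CONV_CHANNELS:
        img_h, img_w = img_h // 2, img_w // 2
    return img_h, img_w


def ratio_to_pixel_x(ratio: float, image_width: int) -> int:
    """Hypothetical helper: map a sigmoid output in [0, 1] to a pixel x coordinate."""
    ratio = min(max(ratio, 0.0), 1.0)  # assumption: clamp defensively before scaling
    return round(ratio * image_width)


if __name__ == "__main__":
    n_params = count_parameters()
    print(f"{n_params} parameters, {n_params * 4 / 1e6:.2f} MB as float32")
    print("feature map before global pooling:", feature_map_size())
    print("gap x for ratio 0.25 on a 320 px wide image:", ratio_to_pixel_x(0.25, 320))
```

The count comes to 248,929 learnable parameters, which matches the "about 250K, ~1 MB" figure when weights are stored as 32-bit floats. BatchNorm running statistics are buffers rather than parameters, so a saved checkpoint will be slightly larger.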