Hua f5be7671bc Expand 3D captcha into three subtypes: 3d_text, 3d_rotate, 3d_slider

Split the single "3d" captcha type into three independent expert models:
- 3d_text: 3D perspective text OCR (renamed from old "3d", CTC-based ThreeDCNN)
- 3d_rotate: rotation angle regression (new RegressionCNN, circular loss)
- 3d_slider: slider offset regression (new RegressionCNN, SmoothL1 loss)

CAPTCHA_TYPES expanded from 3 to 5 classes. Classifier samples updated
to 50000 (10000 per class). New generators, model, dataset, training
utilities, and full pipeline/export/CLI support for all subtypes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-11 13:55:53 +08:00

Repository Guidelines

Project Structure & Module Organization

Use cli.py as the main entrypoint and keep shared settings in config.py. generators/ builds synthetic captchas for five types (normal, math, 3d_text, 3d_rotate, 3d_slider). models/ contains the classifier, the CTC expert models, and the regression models. training/ owns the datasets and training scripts. inference/ contains the ONNX pipeline, export code, and math post-processing. Runtime artifacts live in data/, checkpoints/, and onnx_models/.

Build, Test, and Development Commands

Use uv for environment and dependency management.

uv sync installs the base runtime dependencies from pyproject.toml.
uv sync --extra server installs HTTP service dependencies.
uv run captcha generate --type normal --num 1000 generates synthetic training data. Types: normal, math, 3d_text, 3d_rotate, 3d_slider, classifier.
uv run captcha train --model normal trains one model; uv run captcha train --all trains every model in this order: normal -> math -> 3d_text -> 3d_rotate -> 3d_slider -> classifier.
uv run captcha export --all exports all trained models to ONNX.
uv run captcha export --model 3d_text exports a single model; 3d_text is automatically mapped to threed_text.
uv run captcha predict image.png runs auto-routing inference; add --type normal to skip classification.
uv run captcha predict-dir ./test_images runs batch inference on a directory.
uv run captcha serve --port 8080 starts the optional HTTP API; this requires server.py to be implemented.
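
The mapping from 3d_* type ids to threed_* artifact names mentioned for the export command can be sketched as follows. This is a minimal illustration only: artifact_stem is a hypothetical helper name, and the repository may implement the mapping differently (for example, as a lookup table in config.py).

```python
def artifact_stem(captcha_type: str) -> str:
    """Map a CLI captcha-type id to the stem used in checkpoint/ONNX file names.

    Hypothetical helper for illustration; not taken from the repository.
    """
    prefix = "3d_"
    if captcha_type.startswith(prefix):
        # 3d_text -> threed_text, 3d_rotate -> threed_rotate, 3d_slider -> threed_slider
        return "threed_" + captcha_type[len(prefix):]
    # Assumption: normal, math, and classifier keep their ids unchanged.
    return captcha_type


for type_id in ("normal", "3d_text", "3d_rotate", "3d_slider"):
    print(type_id, "->", artifact_stem(type_id))
```

Keeping the mapping in one function means the CLI can accept the user-facing ids while file names stay free of a leading digit.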

Coding Style & Naming Conventions

Target Python 3.10+ and follow the existing style: 4-space indentation, snake_case for functions and modules, PascalCase for classes, and short docstrings on public entrypoints. Keep the captcha-type ids exactly as normal, math, 3d_text, 3d_rotate, 3d_slider, and classifier. Checkpoint and ONNX file names use threed_text, threed_rotate, and threed_slider (underscores, no hyphens). Preserve the design rules from CLAUDE.md: float32 training and export, CPU-safe ops, and greedy CTC decoding for OCR models. Regression models (3d_rotate, 3d_slider) output a sigmoid value in [0, 1] that is scaled by REGRESSION_RANGE. The normal type uses the locally configured charset, which currently includes easily confused characters; math captchas must be recognized as strings and then evaluated in inference/math_eval.py.
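
For readers unfamiliar with greedy CTC decoding, here is a minimal sketch that operates on a plain list of per-timestep argmax indices. It assumes the blank class is index 0 and that class i maps to charset[i - 1]; the project's actual blank index, charset, and decoder function are not shown in this file and may differ.

```python
from typing import Sequence

BLANK = 0  # Assumption: the CTC blank symbol is class 0.


def greedy_ctc_decode(indices: Sequence[int], charset: str) -> str:
    """Collapse repeated indices, then drop blanks (greedy CTC decoding).

    `indices` holds the argmax class per timestep. Class i (i >= 1) is
    assumed to map to charset[i - 1].
    """
    chars = []
    previous = BLANK
    for index in indices:
        # Emit a character only when it is not blank and not a repeat
        # of the immediately preceding timestep.
        if index != BLANK and index != previous:
            chars.append(charset[index - 1])
        previous = index
    return "".join(chars)


# A blank between two identical classes keeps both characters.
print(greedy_ctc_decode([0, 1, 1, 0, 1, 2, 2, 0], "ab"))  # prints "aab"
```

Greedy decoding needs only an argmax per timestep followed by this collapse step.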

Training & Data Rules

All training scripts must set the global random seed (random, numpy, torch) via config.RANDOM_SEED before training begins.
All DataLoaders use num_workers=0 for cross-platform consistency.
Generator parameters (rotation, noise, shadow, etc.) must come from config.GENERATE_CONFIG, not hardcoded values.
CRNNDataset calls warnings.warn when a label contains characters outside the configured charset, rather than silently dropping them.
RegressionDataset parses numeric labels from filenames and normalizes them to [0, 1] via label_range.
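
The normalization rule for regression labels, and the inverse scaling applied to a sigmoid output, can be sketched as below. This assumes label_range is a (low, high) pair; the function names and the (0, 360) example range are hypothetical, and the real values come from config.py.

```python
def normalize_label(value: float, label_range: tuple[float, float]) -> float:
    """Map a raw label (angle or offset) into [0, 1]."""
    low, high = label_range
    return (value - low) / (high - low)


def denormalize_output(output: float, label_range: tuple[float, float]) -> float:
    """Map a sigmoid output in [0, 1] back to the raw label scale."""
    low, high = label_range
    return low + output * (high - low)


# Hypothetical range for a rotation angle in degrees.
ROTATE_RANGE = (0.0, 360.0)

target = normalize_label(90.0, ROTATE_RANGE)
print(target)                                    # prints 0.25
print(denormalize_output(target, ROTATE_RANGE))  # prints 90.0
```

Training against targets in [0, 1] matches the sigmoid output head, so one model class can serve both the angle and the offset task with different ranges.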

Data & Testing Guidelines

Synthetic generator output should use {label}_{index:06d}.png; real labeled samples should use {label}_{anything}.png. For regression types, label is the numeric value (angle or offset). Sample targets are defined in config.py. Save best checkpoints to checkpoints/ and export matching ONNX files to onnx_models/. Use pytest, place tests under tests/ as test_<feature>.py, and run them with uv run pytest. For model, data, or routing changes, add a fast smoke test for shapes, decoding, CLI behavior, or pipeline routing.
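
The filename conventions above can be illustrated with two small helpers. Both names are hypothetical, and the parser assumes the label itself contains no underscore, so everything before the first underscore is the label.

```python
from pathlib import Path


def synthetic_name(label: str, index: int) -> str:
    """Build a generator output name following {label}_{index:06d}.png."""
    return f"{label}_{index:06d}.png"


def label_from_filename(path: str) -> str:
    """Extract the label from {label}_{anything}.png.

    Assumption: the label contains no underscore.
    """
    return Path(path).stem.split("_", 1)[0]


print(synthetic_name("a7kq", 42))                            # prints a7kq_000042.png
print(label_from_filename("data/3d_rotate/135_000007.png"))  # prints 135
```

For regression types the returned string would then be converted to a number before normalization.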

Commit & Pull Request Guidelines

Git history is not available in this workspace snapshot, so use short imperative commit subjects such as Add classifier export smoke test. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction behavior changes.

Documentation Sync

Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update AGENTS.md and CLAUDE.md in the same patch.