Split the single "3d" captcha type into three independent expert models:
- 3d_text: 3D perspective text OCR (renamed from old "3d", CTC-based ThreeDCNN)
- 3d_rotate: rotation angle regression (new RegressionCNN, circular loss)
- 3d_slider: slider offset regression (new RegressionCNN, SmoothL1 loss)

CAPTCHA_TYPES expanded from 3 to 5 classes. Classifier samples updated to 50000 (10000 per class). New generators, model, dataset, training utilities, and full pipeline/export/CLI support for all subtypes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Repository Guidelines
Project Structure & Module Organization
Use cli.py as the main entrypoint and keep shared settings in config.py. generators/ builds synthetic captchas (5 types: normal, math, 3d_text, 3d_rotate, 3d_slider), models/ contains the classifier, CTC expert models, and regression models, training/ owns datasets and training scripts, and inference/ contains the ONNX pipeline, export code, and math post-processing. Runtime artifacts live in data/, checkpoints/, and onnx_models/.
Build, Test, and Development Commands
Use uv for environment and dependency management.
- `uv sync` installs the base runtime dependencies from `pyproject.toml`.
- `uv sync --extra server` installs HTTP service dependencies.
- `uv run captcha generate --type normal --num 1000` generates synthetic training data. Types: `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, `classifier`.
- `uv run captcha train --model normal` trains one model; `uv run captcha train --all` runs the full order: `normal -> math -> 3d_text -> 3d_rotate -> 3d_slider -> classifier`.
- `uv run captcha export --all` exports all trained models to ONNX.
- `uv run captcha export --model 3d_text` exports a single model; `3d_text` is automatically mapped to `threed_text`.
- `uv run captcha predict image.png` runs auto-routing inference; add `--type normal` to skip classification.
- `uv run captcha predict-dir ./test_images` runs batch inference on a directory.
- `uv run captcha serve --port 8080` starts the optional HTTP API when `server.py` is implemented.
Coding Style & Naming Conventions
Target Python 3.10+ and follow existing style: 4-space indentation, snake_case for functions/modules, PascalCase for classes, and short docstrings on public entrypoints. Keep captcha-type ids exactly normal, math, 3d_text, 3d_rotate, 3d_slider, and classifier. Checkpoint/ONNX file names use threed_text, threed_rotate, threed_slider (underscored, no hyphens). Preserve the design rules from CLAUDE.md: float32 training/export, CPU-safe ops, and greedy CTC decoding for OCR models. Regression models (3d_rotate, 3d_slider) output a sigmoid value in [0,1] that is scaled by REGRESSION_RANGE. The normal type uses the locally configured charset, which currently includes easily confused characters; math captchas must be recognized as strings and then evaluated in inference/math_eval.py.
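The naming, decoding, and scaling rules above can be sketched in a few lines of Python. This is an illustration only, not the repository's code: the helper names `artifact_stem`, `greedy_ctc_decode`, and `denormalize` are hypothetical, the blank class is assumed to sit at index 0, and each `REGRESSION_RANGE` entry is assumed to be a `(low, high)` pair.

```python
# Hypothetical helpers illustrating the conventions; not the repo's actual API.

BLANK = 0  # assumption: CTC blank is class 0, charset characters start at class 1


def artifact_stem(captcha_type: str) -> str:
    """Map a captcha-type id to a checkpoint/ONNX file stem (3d_text -> threed_text)."""
    prefix = "3d_"
    if captcha_type.startswith(prefix):
        return "threed_" + captcha_type[len(prefix):]
    return captcha_type


def greedy_ctc_decode(frame_ids: list[int], charset: str) -> str:
    """Greedy CTC decoding: collapse consecutive repeats, then drop blanks."""
    chars = []
    previous = BLANK
    for idx in frame_ids:
        if idx != BLANK and idx != previous:
            chars.append(charset[idx - 1])  # shift by one because 0 is the blank
        previous = idx
    return "".join(chars)


def denormalize(sigmoid_output: float, value_range: tuple[float, float]) -> float:
    """Scale a [0, 1] regression output into the assumed (low, high) range."""
    low, high = value_range
    return low + sigmoid_output * (high - low)
```

`frame_ids` stands for the per-timestep argmax of the OCR model's output; a blank between two identical ids is what lets a repeated character survive the collapse step.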
Training & Data Rules
- All training scripts must set the global random seed (`random`, `numpy`, `torch`) via `config.RANDOM_SEED` before training begins.
- All DataLoaders use `num_workers=0` for cross-platform consistency.
- Generator parameters (rotation, noise, shadow, etc.) must come from `config.GENERATE_CONFIG`, not hardcoded values.
- `CRNNDataset` emits a `warnings.warn` when a label contains characters outside the configured charset, rather than silently dropping them.
- `RegressionDataset` parses numeric labels from filenames and normalizes to [0,1] via `label_range`.
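The two dataset rules above might look roughly like the following. It is a minimal sketch under stated assumptions, not the actual `CRNNDataset` or `RegressionDataset` code: `encode_label` and `normalize_label` are hypothetical names, class 0 is assumed to be the CTC blank, `label_range` is assumed to be a `(low, high)` pair, and skipping unknown characters after the warning is a guess about the intended behavior.

```python
import warnings


def encode_label(label: str, charset: str) -> list[int]:
    """Encode a label as class ids (0 reserved for blank), warning on unknown characters."""
    unknown = sorted(set(label) - set(charset))
    if unknown:
        # Make the problem visible instead of dropping characters silently.
        warnings.warn(f"label {label!r} has characters outside the charset: {unknown}")
    # Assumption: unknown characters are skipped once the warning has been raised.
    return [charset.index(ch) + 1 for ch in label if ch in charset]


def normalize_label(value: float, label_range: tuple[float, float]) -> float:
    """Map a raw angle or offset into [0, 1] using the assumed (low, high) range."""
    low, high = label_range
    return (value - low) / (high - low)
```

For example, `normalize_label(90.0, (0.0, 360.0))` gives `0.25`, the kind of target a sigmoid-output regression model can fit directly.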
Data & Testing Guidelines
Synthetic generator output should use {label}_{index:06d}.png; real labeled samples should use {label}_{anything}.png. For regression types, label is the numeric value (angle or offset). Sample targets are defined in config.py. Save best checkpoints to checkpoints/ and export matching ONNX files to onnx_models/. Use pytest, place tests under tests/ as test_<feature>.py, and run them with uv run pytest. For model, data, or routing changes, add a fast smoke test for shapes, decoding, CLI behavior, or pipeline routing.
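As a rough illustration of the filename convention, the sketch below builds a synthetic sample name and recovers the label from either a synthetic or a real sample path. The function names are hypothetical, and the parser assumes the label itself contains no underscore, so everything before the first `_` in the file stem is treated as the label.

```python
from pathlib import Path


def synthetic_name(label: str, index: int) -> str:
    """Build a generator output name of the form {label}_{index:06d}.png."""
    return f"{label}_{index:06d}.png"


def label_from_path(path: str) -> str:
    """Return the text before the first underscore of the file stem."""
    # Assumption: labels never contain "_", so the first one ends the label.
    return Path(path).stem.split("_", 1)[0]


def regression_label_from_path(path: str) -> float:
    """For regression types the label is the numeric angle or offset."""
    return float(label_from_path(path))
```

Helpers like these are small enough to cover with a fast pytest smoke test under `tests/`, for instance by checking that `label_from_path(synthetic_name("ab3k", 7))` returns `"ab3k"`.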
Commit & Pull Request Guidelines
Git history is not available in this workspace snapshot, so use short imperative commit subjects such as Add classifier export smoke test. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction behavior changes.
Documentation Sync
Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update AGENTS.md and CLAUDE.md in the same patch.