Expand 3D captcha into three subtypes: 3d_text, 3d_rotate, 3d_slider
Split the single "3d" captcha type into three independent expert models:

- 3d_text: 3D perspective text OCR (renamed from old "3d", CTC-based ThreeDCNN)
- 3d_rotate: rotation angle regression (new RegressionCNN, circular loss)
- 3d_slider: slider offset regression (new RegressionCNN, SmoothL1 loss)

CAPTCHA_TYPES expanded from 3 to 5 classes. Classifier samples updated to 50000 (10000 per class). New generators, model, dataset, and training utilities, plus full pipeline/export/CLI support for all subtypes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
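The "circular loss" named for `3d_rotate` is not shown in this commit; the idea is that angle regression must treat 0° and 360° as the same point, so the loss wraps around instead of penalizing a 359°-vs-1° prediction as a huge error. A minimal scalar sketch, assuming angles are normalized to [0, 1) turns (the function name and scalar form are illustrative, not the repo's actual API, which would use tensor ops):

```python
def circular_loss(pred: float, target: float) -> float:
    """Squared distance between two angles given in normalized [0, 1) turns.

    Wraps around the circle, so a prediction of 0.99 turns against a
    target of 0.01 turns is a gap of 0.02, not 0.98.
    """
    diff = abs(pred - target) % 1.0      # raw gap in turns
    wrapped = min(diff, 1.0 - diff)      # take the shorter way around
    return wrapped ** 2                  # squared-error on the wrapped gap
```

In a real training loop the same wrapping would be applied elementwise to prediction/target tensors before reduction.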
18
AGENTS.md
@@ -1,25 +1,33 @@
 # Repository Guidelines
 
 ## Project Structure & Module Organization
 
-Use `cli.py` as the main entrypoint and keep shared settings in `config.py`. `generators/` builds synthetic captchas, `models/` contains the classifier and expert OCR models, `training/` owns datasets and training scripts, and `inference/` contains the ONNX pipeline, export code, and math post-processing. Runtime artifacts live in `data/`, `checkpoints/`, and `onnx_models/`.
+Use `cli.py` as the main entrypoint and keep shared settings in `config.py`. `generators/` builds synthetic captchas (5 types: normal, math, 3d_text, 3d_rotate, 3d_slider), `models/` contains the classifier, CTC expert models, and regression models, `training/` owns datasets and training scripts, and `inference/` contains the ONNX pipeline, export code, and math post-processing. Runtime artifacts live in `data/`, `checkpoints/`, and `onnx_models/`.
 
 ## Build, Test, and Development Commands
 
 Use `uv` for environment and dependency management.
 
 - `uv sync` installs the base runtime dependencies from `pyproject.toml`.
 - `uv sync --extra server` installs HTTP service dependencies.
-- `uv run captcha generate --type normal --num 1000` generates synthetic training data.
-- `uv run captcha train --model normal` trains one model; `uv run captcha train --all` runs the full order: `normal -> math -> 3d -> classifier`.
+- `uv run captcha generate --type normal --num 1000` generates synthetic training data. Types: `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, `classifier`.
+- `uv run captcha train --model normal` trains one model; `uv run captcha train --all` runs the full order: `normal -> math -> 3d_text -> 3d_rotate -> 3d_slider -> classifier`.
 - `uv run captcha export --all` exports all trained models to ONNX.
+- `uv run captcha export --model 3d_text` exports a single model; `3d_text` is automatically mapped to `threed_text`.
 - `uv run captcha predict image.png` runs auto-routing inference; add `--type normal` to skip classification.
 - `uv run captcha predict-dir ./test_images` runs batch inference on a directory.
 - `uv run captcha serve --port 8080` starts the optional HTTP API when `server.py` is implemented.
 
 ## Coding Style & Naming Conventions
 
-Target Python 3.10+ and follow existing style: 4-space indentation, snake_case for functions/modules, PascalCase for classes, and short docstrings on public entrypoints. Keep captcha-type ids exactly `normal`, `math`, `3d`, and `classifier`. Preserve the design rules from `CLAUDE.md`: float32 training/export, CPU-safe ops, and greedy CTC decoding unless the pipeline is intentionally redesigned. `normal` uses the local configured charset and currently includes confusing characters; math captchas must be recognized as strings and then evaluated in `inference/math_eval.py`.
+Target Python 3.10+ and follow existing style: 4-space indentation, snake_case for functions/modules, PascalCase for classes, and short docstrings on public entrypoints. Keep captcha-type ids exactly `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`. Checkpoint/ONNX file names use `threed_text`, `threed_rotate`, `threed_slider` (underscored, no hyphens). Preserve the design rules from `CLAUDE.md`: float32 training/export, CPU-safe ops, and greedy CTC decoding for OCR models. Regression models (3d_rotate, 3d_slider) output sigmoid [0,1] scaled by `REGRESSION_RANGE`. `normal` uses the local configured charset and currently includes confusing characters; math captchas must be recognized as strings and then evaluated in `inference/math_eval.py`.
 
+## Training & Data Rules
+
+- All training scripts must set the global random seed (`random`, `numpy`, `torch`) via `config.RANDOM_SEED` before training begins.
+- All DataLoaders use `num_workers=0` for cross-platform consistency.
+- Generator parameters (rotation, noise, shadow, etc.) must come from `config.GENERATE_CONFIG`, not hardcoded values.
+- `CRNNDataset` emits a `warnings.warn` when a label contains characters outside the configured charset, rather than silently dropping them.
+- `RegressionDataset` parses numeric labels from filenames and normalizes to [0,1] via `label_range`.
+
 ## Data & Testing Guidelines
 
-Synthetic generator output should use `{label}_{index:06d}.png`; real labeled samples should use `{label}_{anything}.png`. Save best checkpoints to `checkpoints/` and export matching ONNX files to `onnx_models/`. Use `pytest`, place tests under `tests/` as `test_<feature>.py`, and run them with `uv run pytest`. For model, data, or routing changes, add a fast smoke test for shapes, decoding, CLI behavior, or pipeline routing.
+Synthetic generator output should use `{label}_{index:06d}.png`; real labeled samples should use `{label}_{anything}.png`. For regression types, label is the numeric value (angle or offset). Sample targets are defined in `config.py`. Save best checkpoints to `checkpoints/` and export matching ONNX files to `onnx_models/`. Use `pytest`, place tests under `tests/` as `test_<feature>.py`, and run them with `uv run pytest`. For model, data, or routing changes, add a fast smoke test for shapes, decoding, CLI behavior, or pipeline routing.
 
 ## Commit & Pull Request Guidelines
 
 Git history is not available in this workspace snapshot, so use short imperative commit subjects such as `Add classifier export smoke test`. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction behavior changes.
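As a worked illustration of the label convention in the diff above (`{label}_{index:06d}.png` filenames, with regression labels normalized to [0,1] via `label_range`), here is a hypothetical parsing helper; the repo's actual `RegressionDataset` implementation is not shown in this commit and may differ:

```python
from pathlib import Path

def parse_regression_label(path: str, label_range: float) -> float:
    """Parse the numeric label from a `{label}_{index}.png` filename
    and normalize it to [0, 1] by dividing by `label_range`.

    e.g. "273.5_000012.png" with label_range=360 -> roughly 0.76.
    """
    stem = Path(path).stem              # "273.5_000012"
    raw = float(stem.split("_")[0])     # leading numeric label
    return raw / label_range
```

For a rotation captcha, `label_range` would be the full angle range (e.g. 360); for a slider, the maximum pixel offset.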
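The guidelines require greedy CTC decoding for the OCR models. For reference, a minimal sketch of that decode step, assuming the conventional layout of blank at index 0 and charset characters at indices 1..N (this commit does not show the repo's actual decoder, so treat the names and the blank index as assumptions):

```python
def ctc_greedy_decode(timestep_argmax, charset, blank=0):
    """Greedy CTC decode: collapse consecutive repeats, then drop blanks.

    `timestep_argmax` is the per-timestep argmax over the model's output
    logits; `charset` maps indices 1..N back to characters.
    """
    out = []
    prev = None
    for idx in timestep_argmax:
        if idx != prev and idx != blank:
            out.append(charset[idx - 1])
        prev = idx
    return "".join(out)
```

Greedy decoding takes only the best symbol per timestep, which is why it is cheap and CPU-safe, at the cost of ignoring alternative alignments a beam search would consider.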