# Repository Guidelines ## Project Structure & Module Organization Use `cli.py` as the main entrypoint and keep shared settings in `config.py`. `generators/` builds synthetic captchas (5 types: normal, math, 3d_text, 3d_rotate, 3d_slider), `models/` contains the classifier, CTC expert models, and regression models, `training/` owns datasets and training scripts, and `inference/` contains the ONNX pipeline, export code, and math post-processing. Runtime artifacts live in `data/`, `checkpoints/`, and `onnx_models/`. ## Build, Test, and Development Commands Use `uv` for environment and dependency management. - `uv sync` installs the base runtime dependencies from `pyproject.toml`. - `uv sync --extra server` installs HTTP service dependencies. - `uv run captcha generate --type normal --num 1000` generates synthetic training data. Types: `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, `classifier`. - `uv run captcha train --model normal` trains one model; `uv run captcha train --all` runs the full order: `normal -> math -> 3d_text -> 3d_rotate -> 3d_slider -> classifier`. - `uv run captcha export --all` exports all trained models to ONNX. - `uv run captcha export --model 3d_text` exports a single model; `3d_text` is automatically mapped to `threed_text`. - `uv run captcha predict image.png` runs auto-routing inference; add `--type normal` to skip classification. - `uv run captcha predict-dir ./test_images` runs batch inference on a directory. - `uv run captcha serve --port 8080` starts the optional HTTP API when `server.py` is implemented. ## Coding Style & Naming Conventions Target Python 3.10+ and follow existing style: 4-space indentation, snake_case for functions/modules, PascalCase for classes, and short docstrings on public entrypoints. Keep captcha-type ids exactly `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`. Checkpoint/ONNX file names use `threed_text`, `threed_rotate`, `threed_slider` (underscored, no hyphens). Preserve the design rules from `CLAUDE.md`: float32 training/export, CPU-safe ops, and greedy CTC decoding for OCR models. Regression models (3d_rotate, 3d_slider) output sigmoid [0,1] scaled by `REGRESSION_RANGE`. `normal` uses the local configured charset and currently includes confusing characters; math captchas must be recognized as strings and then evaluated in `inference/math_eval.py`. ## Training & Data Rules - All training scripts must set the global random seed (`random`, `numpy`, `torch`) via `config.RANDOM_SEED` before training begins. - All DataLoaders use `num_workers=0` for cross-platform consistency. - Generator parameters (rotation, noise, shadow, etc.) must come from `config.GENERATE_CONFIG`, not hardcoded values. - `CRNNDataset` emits a `warnings.warn` when a label contains characters outside the configured charset, rather than silently dropping them. - `RegressionDataset` parses numeric labels from filenames and normalizes to [0,1] via `label_range`. ## Data & Testing Guidelines Synthetic generator output should use `{label}_{index:06d}.png`; real labeled samples should use `{label}_{anything}.png`. For regression types, label is the numeric value (angle or offset). Sample targets are defined in `config.py`. Save best checkpoints to `checkpoints/` and export matching ONNX files to `onnx_models/`. Use `pytest`, place tests under `tests/` as `test_.py`, and run them with `uv run pytest`. For model, data, or routing changes, add a fast smoke test for shapes, decoding, CLI behavior, or pipeline routing. ## Commit & Pull Request Guidelines Git history is not available in this workspace snapshot, so use short imperative commit subjects such as `Add classifier export smoke test`. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction behavior changes. ## Documentation Sync Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update `AGENTS.md` and `CLAUDE.md` in the same patch.