# Repository Guidelines

## Project Structure & Module Organization

Use `cli.py` as the main entrypoint and keep shared settings in `config.py`. `generators/` builds synthetic captchas, `models/` contains the classifier and expert OCR models, `training/` owns datasets and training scripts, and `inference/` contains the ONNX pipeline, export code, and math post-processing. Runtime artifacts live in `data/`, `checkpoints/`, and `onnx_models/`.

## Build, Test, and Development Commands

Use `uv` for environment and dependency management.

- `uv sync` installs the base runtime dependencies from `pyproject.toml`.
- `uv sync --extra server` installs HTTP service dependencies.
- `uv run captcha generate --type normal --num 1000` generates synthetic training data.
- `uv run captcha train --model normal` trains one model; `uv run captcha train --all` runs the full order: `normal -> math -> 3d -> classifier`.
- `uv run captcha export --all` exports all trained models to ONNX.
- `uv run captcha predict image.png` runs auto-routing inference; add `--type normal` to skip classification.
- `uv run captcha predict-dir ./test_images` runs batch inference on a directory.
- `uv run captcha serve --port 8080` starts the optional HTTP API when `server.py` is implemented.

## Coding Style & Naming Conventions

Target Python 3.10+ and follow existing style: 4-space indentation, snake_case for functions/modules, PascalCase for classes, and short docstrings on public entrypoints. Keep captcha-type ids exactly `normal`, `math`, `3d`, and `classifier`. Preserve the design rules from `CLAUDE.md`: float32 training/export, CPU-safe ops, and greedy CTC decoding unless the pipeline is intentionally redesigned. `normal` uses the local configured charset and currently includes confusing characters; math captchas must be recognized as strings and then evaluated in `inference/math_eval.py`.

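
To illustrate the "recognize as a string, then evaluate" rule for math captchas, here is a minimal sketch. It is not the contents of `inference/math_eval.py`: the function name `evaluate_math_label`, the accepted operators, and the assumed `<int><op><int>=?` shape are all hypothetical choices made for the example.

```python
import re
from typing import Optional

# Assumed grammar (hypothetical): two non-negative integers joined by one
# operator, optionally followed by "=" and "?". The real grammar may differ.
_EXPR = re.compile(r"^\s*(\d+)\s*([+\-*/x×÷])\s*(\d+)\s*(?:=\s*\??)?\s*$")


def evaluate_math_label(text: str) -> Optional[int]:
    """Evaluate an OCR'd expression string; return None if it cannot be parsed."""
    match = _EXPR.match(text)
    if match is None:
        return None
    left, op, right = int(match.group(1)), match.group(2), int(match.group(3))
    if op == "+":
        return left + right
    if op == "-":
        return left - right
    if op in ("*", "x", "×"):
        return left * right
    # Remaining operators are "/" and "÷": accept only exact division.
    if right == 0 or left % right != 0:
        return None
    return left // right
```

Keeping recognition and evaluation separate means the OCR model only has to emit characters, and parsing failures surface as `None` rather than as a wrong number.
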
## Data & Testing Guidelines

Synthetic generator output should use `{label}_{index:06d}.png`; real labeled samples should use `{label}_{anything}.png`. Save best checkpoints to `checkpoints/` and export matching ONNX files to `onnx_models/`. Use `pytest`, place tests under `tests/` as `test_*.py`, and run them with `uv run pytest`. For model, data, or routing changes, add a fast smoke test for shapes, decoding, CLI behavior, or pipeline routing.

## Commit & Pull Request Guidelines

Git history is not available in this workspace snapshot, so use short imperative commit subjects such as `Add classifier export smoke test`. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction behavior changes.

## Documentation Sync

Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update `AGENTS.md` and `CLAUDE.md` in the same patch.
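
As a sketch of the filename conventions described under Data & Testing Guidelines, the helpers below build a synthetic sample name and recover the label from either naming form. Both function names are hypothetical, and the parser assumes the label itself contains no underscore, which the guidelines do not state.

```python
from pathlib import Path


def synthetic_filename(label: str, index: int) -> str:
    """Build a name in the `{label}_{index:06d}.png` form."""
    return f"{label}_{index:06d}.png"


def label_from_filename(path: str) -> str:
    """Return the label from `{label}_{anything}.png`.

    Assumption: the label has no underscore, so the first underscore
    in the file stem separates the label from the rest.
    """
    stem = Path(path).stem
    label, sep, _rest = stem.partition("_")
    if not sep or not label:
        raise ValueError(f"filename does not match '{{label}}_...': {path}")
    return label
```

For example, `label_from_filename(synthetic_filename("aB3d", 42))` returns `"aB3d"`, which makes a convenient round-trip check for a fast smoke test.
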