# Repository Guidelines
## Project Structure & Module Organization
Use `cli.py` as the main entrypoint and keep shared settings in `config.py`. `generators/` builds synthetic captchas, `models/` contains the classifier and expert OCR models, `training/` owns datasets and training scripts, and `inference/` contains the ONNX pipeline, export code, and math post-processing. Runtime artifacts live in `data/`, `checkpoints/`, and `onnx_models/`.
## Build, Test, and Development Commands
Use `uv` for environment and dependency management.
- `uv sync` installs the base runtime dependencies from `pyproject.toml`.
- `uv sync --extra server` installs HTTP service dependencies.
- `uv run captcha generate --type normal --num 1000` generates synthetic training data.
- `uv run captcha train --model normal` trains one model; `uv run captcha train --all` runs the full order: `normal -> math -> 3d -> classifier`.
- `uv run captcha export --all` exports all trained models to ONNX.
- `uv run captcha predict image.png` runs auto-routing inference; add `--type normal` to skip classification.
- `uv run captcha predict-dir ./test_images` runs batch inference on a directory.
- `uv run captcha serve --port 8080` starts the optional HTTP API; it is available only once `server.py` is implemented.
## Coding Style & Naming Conventions
Target Python 3.10+ and follow the existing style: 4-space indentation, snake_case for functions and modules, PascalCase for classes, and short docstrings on public entrypoints. Keep the captcha-type ids exactly `normal`, `math`, `3d`, and `classifier`. Preserve the design rules from `CLAUDE.md`: float32 training and export, CPU-safe ops, and greedy CTC decoding unless the pipeline is intentionally redesigned. The `normal` type uses the locally configured charset, which currently includes visually confusable characters. Math captchas must first be recognized as strings and only then evaluated in `inference/math_eval.py`.
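
The two decoding rules above (greedy CTC decoding, then string-based math evaluation) can be illustrated with a minimal sketch. This is not the code in `inference/` or `inference/math_eval.py`; the function names, the blank id of `0`, the id-to-character mapping, and the `=?` suffix on math labels are all assumptions made for illustration.

```python
import ast
import operator
from itertools import groupby
from typing import Sequence

BLANK_ID = 0  # assumption: class 0 is the CTC blank, class k maps to charset[k - 1]


def argmax_ids(logits: Sequence[Sequence[float]]) -> list[int]:
    """Pick the highest-scoring class id for every time step."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in logits]


def greedy_ctc_decode(frame_ids: Sequence[int], charset: str) -> str:
    """Greedy CTC: collapse consecutive repeats, then drop blanks."""
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    return "".join(charset[idx - 1] for idx in collapsed if idx != BLANK_ID)


# Only integer literals and these operators are accepted (hypothetical whitelist).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _eval_node(node: ast.AST) -> int:
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    raise ValueError("unsupported expression")


def evaluate_math_label(text: str) -> int:
    """Evaluate a recognized string such as '3+5=?' without calling eval()."""
    expr = text.rstrip("=? ")  # assumption: labels may end in '=?'
    return _eval_node(ast.parse(expr, mode="eval").body)


if __name__ == "__main__":
    print(greedy_ctc_decode([0, 3, 3, 0, 3, 1, 1, 0], "abc"))
    print(evaluate_math_label("3+5=?"))
```

Keeping recognition and evaluation separate means the OCR model only ever predicts characters, and arithmetic mistakes can be traced to one of the two stages.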
## Data & Testing Guidelines
Synthetic generator output should use `{label}_{index:06d}.png`; real labeled samples should use `{label}_{anything}.png`. Save best checkpoints to `checkpoints/` and export matching ONNX files to `onnx_models/`. Use `pytest`, place tests under `tests/` as `test_<feature>.py`, and run them with `uv run pytest`. For model, data, or routing changes, add a fast smoke test for shapes, decoding, CLI behavior, or pipeline routing.
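
As a rough illustration of the naming convention and the kind of fast smoke test described above, the following sketch could live in a hypothetical `tests/test_naming.py`. The helper names are invented for this example, and the parser assumes that labels never contain an underscore, which the guidelines do not state.

```python
from pathlib import Path


def synthetic_filename(label: str, index: int) -> str:
    """Build a generator output name following {label}_{index:06d}.png."""
    return f"{label}_{index:06d}.png"


def label_from_filename(path: str) -> str:
    """Recover the label from {label}_{anything}.png.

    Assumption: the label itself contains no underscore, so everything
    before the first underscore in the file stem is the label.
    """
    label, sep, _rest = Path(path).stem.partition("_")
    if not sep or not label:
        raise ValueError(f"not a labeled sample name: {path!r}")
    return label


def test_synthetic_name_round_trips():
    name = synthetic_filename("A7kQ", 42)
    assert name == "A7kQ_000042.png"
    assert label_from_filename(name) == "A7kQ"


def test_real_sample_name_keeps_only_label():
    assert label_from_filename("data/real/x9Bz_phone_photo_3.png") == "x9Bz"
```

With the file in place, `uv run pytest` would collect both tests.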
## Commit & Pull Request Guidelines
Git history is not available in this workspace snapshot, so use short imperative commit subjects such as `Add classifier export smoke test`. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction behavior changes.
## Documentation Sync
Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update `AGENTS.md` and `CLAUDE.md` in the same patch.