Repository Guidelines
Project Structure & Module Organization
Use cli.py as the main entrypoint and keep shared settings in config.py. generators/ builds synthetic captchas, models/ contains the classifier and expert OCR models, training/ owns datasets and training scripts, and inference/ contains the ONNX pipeline, export code, and math post-processing. Runtime artifacts live in data/, checkpoints/, and onnx_models/.
Build, Test, and Development Commands
Use uv for environment and dependency management.
- uv sync installs the base runtime dependencies from pyproject.toml.
- uv sync --extra server installs HTTP service dependencies.
- uv run captcha generate --type normal --num 1000 generates synthetic training data.
- uv run captcha train --model normal trains one model; uv run captcha train --all runs the full order: normal -> math -> 3d -> classifier.
- uv run captcha export --all exports all trained models to ONNX.
- uv run captcha predict image.png runs auto-routing inference; add --type normal to skip classification.
- uv run captcha predict-dir ./test_images runs batch inference on a directory.
- uv run captcha serve --port 8080 starts the optional HTTP API when server.py is implemented.
Coding Style & Naming Conventions
Target Python 3.10+ and follow the existing style: 4-space indentation, snake_case for functions and modules, PascalCase for classes, and short docstrings on public entrypoints. Keep the captcha-type ids exactly normal, math, 3d, and classifier. Preserve the design rules from CLAUDE.md: float32 training and export, CPU-safe ops, and greedy CTC decoding unless the pipeline is intentionally redesigned. The normal type uses the locally configured charset, which currently includes easily confused characters; math captchas must be recognized as strings and then evaluated in inference/math_eval.py.
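For readers unfamiliar with the greedy CTC decoding rule mentioned above, here is a minimal pure-Python sketch. It is not the project's decoder: the function name greedy_ctc_decode, the blank index of 0, and the mapping of class i to charset[i - 1] are all assumptions made for illustration.

```python
from typing import Sequence


def greedy_ctc_decode(
    logits: Sequence[Sequence[float]],
    charset: str,
    blank: int = 0,  # assumption: class 0 is the CTC blank
) -> str:
    """Greedy CTC decode: per-step argmax, collapse repeats, drop blanks."""
    decoded = []
    prev = None
    for step in logits:
        # Argmax over the class scores for this time step.
        best = max(range(len(step)), key=step.__getitem__)
        # Emit only when the class changes and is not the blank.
        if best != blank and best != prev:
            # Assumption: class i (i >= 1) maps to charset[i - 1].
            decoded.append(charset[best - 1])
        prev = best
    return "".join(decoded)


if __name__ == "__main__":
    def one_hot(index: int, size: int = 4) -> list:
        return [1.0 if j == index else 0.0 for j in range(size)]

    # Time steps: blank, a, a, blank, a, b, b
    steps = [one_hot(i) for i in [0, 1, 1, 0, 1, 2, 2]]
    print(greedy_ctc_decode(steps, "abc"))  # prints "aab"
```

The blank between the two runs of "a" is what lets a doubled character survive the repeat-collapsing step.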
Data & Testing Guidelines
Synthetic generator output should use {label}_{index:06d}.png; real labeled samples should use {label}_{anything}.png. Save best checkpoints to checkpoints/ and export matching ONNX files to onnx_models/. Use pytest, place tests under tests/ as test_<feature>.py, and run them with uv run pytest. For model, data, or routing changes, add a fast smoke test for shapes, decoding, CLI behavior, or pipeline routing.
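The two filename patterns above can be sketched as a pair of helpers. This is an illustrative sketch rather than code from the repository: synthetic_filename and label_from_filename are hypothetical names, and splitting at the first underscore assumes labels never contain "_".

```python
from pathlib import Path


def synthetic_filename(label: str, index: int) -> str:
    """Build a generator output name following {label}_{index:06d}.png."""
    return f"{label}_{index:06d}.png"


def label_from_filename(path: str) -> str:
    """Recover the label from {label}_{anything}.png.

    Assumption: the label itself never contains an underscore, so the
    first "_" in the file stem separates the label from the suffix.
    """
    stem = Path(path).stem
    label, sep, _suffix = stem.partition("_")
    if not sep or not label:
        raise ValueError(f"filename does not match {{label}}_{{anything}}.png: {path}")
    return label


if __name__ == "__main__":
    print(synthetic_filename("a7Kx", 42))                         # prints "a7Kx_000042.png"
    print(label_from_filename("data/real/a7Kx_phone_photo.png"))  # prints "a7Kx"
```

A round-trip check of this kind (build a name, parse it back, compare labels) would fit the fast smoke tests described above, for example in a hypothetical tests/test_filenames.py run with uv run pytest.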
Commit & Pull Request Guidelines
Git history is not available in this workspace snapshot, so use short imperative commit subjects such as "Add classifier export smoke test". Keep pull requests focused, describe the affected modules, list the commands you ran, and attach sample outputs when prediction behavior changes.
Documentation Sync
Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update AGENTS.md and CLAUDE.md in the same patch.