CaptchBreaker/AGENTS.md
2026-03-10 18:47:29 +08:00

Repository Guidelines

Project Structure & Module Organization

Use cli.py as the main entrypoint and keep shared settings in config.py. generators/ builds synthetic captchas, models/ contains the classifier and expert OCR models, training/ owns datasets and training scripts, and inference/ contains the ONNX pipeline, export code, and math post-processing. Runtime artifacts live in data/, checkpoints/, and onnx_models/.

Build, Test, and Development Commands

Use uv for environment and dependency management.

  • uv sync installs the base runtime dependencies from pyproject.toml.
  • uv sync --extra server installs HTTP service dependencies.
  • uv run captcha generate --type normal --num 1000 generates synthetic training data.
  • uv run captcha train --model normal trains one model; uv run captcha train --all trains every model in the fixed order normal -> math -> 3d -> classifier.
  • uv run captcha export --all exports all trained models to ONNX.
  • uv run captcha predict image.png runs auto-routing inference; add --type normal to skip classification.
  • uv run captcha predict-dir ./test_images runs batch inference on a directory.
  • uv run captcha serve --port 8080 starts the optional HTTP API; this works only once server.py is implemented.
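
The auto-routing behavior of captcha predict can be pictured with a minimal sketch. It is not the code in inference/: the predict function, its parameters, and the callable signatures are hypothetical, and the models are stand-in callables rather than ONNX sessions. It only illustrates that the classifier picks an expert model unless a type is forced, as --type normal does.

```python
from typing import Callable, Dict, Optional

# Hypothetical signatures: each callable takes raw image bytes and returns text.
Classifier = Callable[[bytes], str]
Expert = Callable[[bytes], str]


def predict(
    image: bytes,
    classifier: Classifier,
    experts: Dict[str, Expert],
    captcha_type: Optional[str] = None,
) -> Dict[str, str]:
    """Route an image to one expert model, classifying first unless a type is forced."""
    # Passing captcha_type mirrors `--type normal`: the classifier is never called.
    chosen = captcha_type if captcha_type is not None else classifier(image)
    if chosen not in experts:
        raise ValueError(f"unsupported captcha type: {chosen!r}")
    return {"type": chosen, "text": experts[chosen](image)}


if __name__ == "__main__":
    # Stub models standing in for the real expert OCR models.
    stub_experts = {
        "normal": lambda img: "AB12",
        "math": lambda img: "3+5=?",
        "3d": lambda img: "XY9",
    }
    print(predict(b"", classifier=lambda img: "math", experts=stub_experts))
    print(predict(b"", classifier=lambda img: "math", experts=stub_experts,
                  captcha_type="normal"))
```

Keeping the forced type as an optional argument means the classifier stays out of the call path entirely when the caller already knows the captcha type.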

Coding Style & Naming Conventions

Target Python 3.10+ and follow the existing style: 4-space indentation, snake_case for functions and modules, PascalCase for classes, and short docstrings on public entrypoints. Keep the captcha-type ids exactly normal, math, 3d, and classifier. Preserve the design rules from CLAUDE.md: float32 training and export, CPU-safe ops, and greedy CTC decoding, unless the pipeline is intentionally redesigned. The normal type uses the locally configured charset, which currently includes easily confused characters. Math captchas must first be recognized as strings and then evaluated in inference/math_eval.py.
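
The two decoding rules above (greedy CTC decoding, then string evaluation for math captchas) can be sketched as follows. This is an illustration under stated assumptions, not the contents of inference/math_eval.py: the blank index 0, the function names, the example charset, and the "a op b=?" expression format are all hypothetical, and only +, - and multiplication are handled.

```python
import re
from typing import Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank


def greedy_ctc_decode(logits: Sequence[Sequence[float]], charset: str) -> str:
    """Take the argmax per time step, merge consecutive repeats, drop blanks."""
    best = [max(range(len(step)), key=step.__getitem__) for step in logits]
    chars = []
    prev = BLANK
    for idx in best:
        if idx != prev and idx != BLANK:
            chars.append(charset[idx - 1])  # index 0 is blank, so shift by one
        prev = idx
    return "".join(chars)


# Assumed expression shape: "<int> <op> <int>" with an optional "=" or "=?" tail.
_EXPR = re.compile(r"^\s*(\d+)\s*([+\-*x×])\s*(\d+)\s*(?:=\s*\??)?\s*$")


def evaluate_math(text: str) -> int:
    """Evaluate a recognized math string without using eval()."""
    match = _EXPR.match(text)
    if match is None:
        raise ValueError(f"not a supported math expression: {text!r}")
    left, op, right = int(match[1]), match[2], int(match[3])
    if op == "+":
        return left + right
    if op == "-":
        return left - right
    return left * right  # remaining operators are "*", "x" and "×"


if __name__ == "__main__":
    charset = "0123456789+-*=?"  # hypothetical math charset
    steps = [4, 4, 0, 11, 6, 6, 0, 14, 15]  # class ids per time step, 0 = blank
    logits = [[1.0 if j == s else 0.0 for j in range(16)] for s in steps]
    text = greedy_ctc_decode(logits, charset)
    print(text, evaluate_math(text))
```

Recognizing the expression as a string first and evaluating it in a separate step keeps the OCR model a plain sequence recognizer and leaves the arithmetic to deterministic code.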

Data & Testing Guidelines

Synthetic generator output should use {label}_{index:06d}.png; real labeled samples should use {label}_{anything}.png. Save best checkpoints to checkpoints/ and export matching ONNX files to onnx_models/. Use pytest, place tests under tests/ as test_<feature>.py, and run them with uv run pytest. For model, data, or routing changes, add a fast smoke test for shapes, decoding, CLI behavior, or pipeline routing.
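
As an illustration of the naming convention and of what a fast smoke test could look like, here is a small sketch in the tests/test_<feature>.py style. The helper names synthetic_name and label_from_path are hypothetical and not taken from the repository, and the parser assumes labels never contain an underscore.

```python
from pathlib import Path


def synthetic_name(label: str, index: int) -> str:
    """Build a generator filename following {label}_{index:06d}.png."""
    return f"{label}_{index:06d}.png"


def label_from_path(path: str) -> str:
    """Recover the label from {label}_{anything}.png.

    Assumption: the label itself contains no underscore, so everything
    before the first underscore in the file stem is the label.
    """
    stem = Path(path).stem
    label, sep, _ = stem.partition("_")
    if not sep or not label:
        raise ValueError(f"filename does not match {{label}}_*.png: {path!r}")
    return label


# pytest collects these when the file is saved under tests/ as test_<feature>.py.
def test_synthetic_name_round_trip():
    name = synthetic_name("AB12", 7)
    assert name == "AB12_000007.png"
    assert label_from_path(name) == "AB12"


def test_real_sample_label():
    assert label_from_path("data/real/AB12_scan_01.png") == "AB12"
```

Tests like these need no model weights or image data, so they stay fast enough to run on every change with uv run pytest.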

Commit & Pull Request Guidelines

Git history is not available in this workspace snapshot, so use short imperative commit subjects such as Add classifier export smoke test. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction behavior changes.

Documentation Sync

Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update AGENTS.md and CLAUDE.md in the same patch.