# Repository Guidelines
## Project Structure & Module Organization
Use `cli.py` as the main command entrypoint, exposed as the `captcha` script from `pyproject.toml`, and keep shared constants in `config.py`.
- `generators/` contains seven generators: the five captcha generators (`normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`) plus solver data generators in `slide_gen.py` and `rotate_solver_gen.py`.
- `models/` contains the classifier, OCR/CTC models, regression models, the two solver models (`gap_detector.py`, `rotation_regressor.py`), and the FunCaptcha Siamese matcher in `fun_captcha_siamese.py`.
- `training/` owns datasets, shared training utilities, per-model entrypoints, dataset fingerprint helpers in `data_fingerprint.py`, and the FunCaptcha trainer in `train_funcaptcha_rollball.py`.
- `inference/` contains the ONNX export path, the runtime pipeline, the dedicated FunCaptcha ONNX runner in `fun_captcha.py`, math post-processing, and ONNX sidecar metadata helpers in `model_metadata.py`.
- `solvers/` implements interactive slide/rotate solving, and `utils/slide_utils.py` generates slider tracks.
- Runtime artifacts live under `data/synthetic/`, `data/real/`, `data/real/funcaptcha/`, `data/classifier/`, `data/solver/`, `data/server_tasks/`, `checkpoints/`, and `onnx_models/`.
## Build, Test, and Development Commands
Use `uv` for environment and dependency management.
- `uv sync` installs the base runtime dependencies.
- `uv sync --extra server` installs FastAPI service dependencies.
- `uv sync --extra cv` installs OpenCV for slide solver workflows.
- `uv sync --extra dev` installs pytest.
- On Linux `x86_64`, `uv sync` resolves `torch` and `torchvision` from the official PyTorch `cu121` index and pins them to `2.5.1` / `0.20.1`. This pinned pair has been validated on a GTX 1050 Ti (`sm_61`).
- Keep `onnxruntime` compatible with Python 3.10 when editing dependencies; the current constraint stays below `1.24`.
- `uv run captcha generate --type normal --num 1000` generates captcha training data. Valid types are `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`.
- `uv run captcha generate-solver slide --num 30000` and `uv run captcha generate-solver rotate --num 50000` generate solver datasets under `data/solver/`.
- `uv run captcha train --model normal` trains one captcha model. `uv run captcha train --all` trains `normal -> math -> 3d_text -> 3d_rotate -> 3d_slider -> classifier`.
- `uv run captcha train-solver slide` trains `GapDetectorCNN`; `uv run captcha train-solver rotate` trains `RotationRegressor`.
- `uv run captcha train-funcaptcha --question 4_3d_rollball_animals` trains the dedicated FunCaptcha Siamese matcher from full challenge screenshots under `data/real/funcaptcha/4_3d_rollball_animals/`.
- `uv run captcha export --all` exports all available ONNX models, including `gap_detector` and `rotation_regressor`, and writes matching `<model>.meta.json` sidecars.
- `uv run captcha export --model 3d_text` maps to `threed_text`. The export loader also accepts internal artifact names such as `threed_rotate`, `gap_detector`, `rotation_regressor`, and `funcaptcha_rollball_animals`; `4_3d_rollball_animals` is accepted as an alias for that FunCaptcha artifact.
- `uv run captcha predict image.png` runs auto-routing inference. Add `--type normal` to skip classification.
- `uv run captcha predict-dir ./test_images` runs batch inference for `.png` and `.jpg` files.
- `uv run captcha predict-funcaptcha image.jpg --question 4_3d_rollball_animals` runs the dedicated FunCaptcha matcher and returns `objects`. It resolves the ONNX in this order: `onnx_models/funcaptcha_rollball_animals.onnx` -> env `FUNCAPTCHA_ROLLBALL_MODEL_PATH` -> configured fallback path such as the sibling `funcaptcha-server/model/4_3d_rollball_animals.onnx`.
- `uv run captcha solve slide --bg bg.png [--tpl tpl.png]` runs the slide solver. It uses template matching first when `--tpl` is provided, then OpenCV edge detection, then CNN fallback.
- `uv run captcha solve rotate --image img.png` runs the rotate solver.
- `uv run captcha serve --host 0.0.0.0 --port 8080` starts the implemented FastAPI service in `server.py`. It supports synchronous `/solve` and `/solve/upload`, plus async task endpoints `/createTask`, `/getTaskResult`, and `/getBalance`, with `/api/v1/*` compatibility aliases. If `CLIENT_KEY` is set in the environment, task endpoints require a matching `clientKey`. `createTask` accepts `callbackUrl`, `softId`, `languagePool`, and optional `task.question`; `task.question=4_3d_rollball_animals` routes to the dedicated FunCaptcha matcher and returns `solution.objects`. `callbackUrl` receives a form-encoded completion callback with configurable retry/backoff in `SERVER_CONFIG`. If `CALLBACK_SIGNING_SECRET` is set, callback requests include HMAC-SHA256 signature headers. Task responses also expose extra `task` / `callback` metadata for async debugging, and task state is persisted under `data/server_tasks/`.
- `uv run pytest` runs the test suite.
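
The ONNX lookup order described for `predict-funcaptcha` can be sketched as below. This is a minimal illustration, not the repository's implementation: the helper name `resolve_funcaptcha_onnx` and its parameters are hypothetical, and only the candidate order (the repo's `onnx_models/` file, then `FUNCAPTCHA_ROLLBALL_MODEL_PATH`, then a configured fallback) comes from the command description above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_funcaptcha_onnx(
    repo_root: Path,
    fallback: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Return the first existing ONNX candidate, or None if none exists.

    Hypothetical helper: name and signature are assumptions for illustration.
    """
    env = os.environ if env is None else env

    # 1. The artifact exported into the repo's own onnx_models/ directory.
    candidates = [repo_root / "onnx_models" / "funcaptcha_rollball_animals.onnx"]

    # 2. An explicit override from the environment, if set and non-empty.
    env_path = env.get("FUNCAPTCHA_ROLLBALL_MODEL_PATH")
    if env_path:
        candidates.append(Path(env_path))

    # 3. A configured fallback path, such as a sibling checkout's model file.
    if fallback is not None:
        candidates.append(fallback)

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Passing `env` explicitly keeps the function easy to test without mutating the real process environment.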
## Coding Style & Naming Conventions
Target Python 3.10-3.12 and follow the existing style: 4-space indentation, snake_case for functions/modules, PascalCase for classes, and short docstrings on public entrypoints. Keep public captcha type ids exactly `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`. Internal checkpoint/ONNX artifact names use `threed_text`, `threed_rotate`, `threed_slider`, and `funcaptcha_rollball_animals`; solver artifacts are `gap_detector` and `rotation_regressor`. Preserve the design rules from `CLAUDE.md`: float32 training/export, CPU-safe ONNX ops, and greedy CTC decoding for OCR models. `normal` uses `NORMAL_CHARS`, `math` uses `MATH_CHARS` and must be post-processed through `inference/math_eval.py`, and `3d_text` uses `THREED_CHARS`. `3d_rotate` and `3d_slider` output sigmoid values in `[0, 1]` and scale them with `REGRESSION_RANGE`; the rotate solver model outputs `(sin, cos)` on RGB input. The FunCaptcha matcher is a dual-input RGB Siamese model keyed by `task.question`, not by `captchaType`. Runtime ONNX artifacts belong under `onnx_models/`, not `models/`; external FunCaptcha ONNX files may omit metadata, in which case inference must preserve the external preprocessing contract instead of assuming the repo's centered RGB normalization.
- Do not casually upgrade `torch` or `torchvision`: newer CUDA 12.8 wheels in this repo's previous environment dropped `sm_61` kernels and failed on GTX 1050 Ti. Re-verify GPU execution before changing the pinned pair.
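
The naming rule in this section (public type ids versus internal artifact names) can be illustrated with a small lookup. This is a sketch under stated assumptions: the function `artifact_name` and both tables are hypothetical, and treating `normal`, `math`, and `classifier` as unchanged artifact names is an assumption rather than something these guidelines state.

```python
# Renames documented in the guidelines: public id -> internal artifact name.
PUBLIC_TO_ARTIFACT = {
    "3d_text": "threed_text",
    "3d_rotate": "threed_rotate",
    "3d_slider": "threed_slider",
    # FunCaptcha question id accepted as an alias for its artifact.
    "4_3d_rollball_animals": "funcaptcha_rollball_animals",
}

# Names assumed to pass through unchanged. The solver artifacts are documented;
# the identity mapping for normal/math/classifier is an assumption.
PASSTHROUGH = {"normal", "math", "classifier", "gap_detector", "rotation_regressor"}


def artifact_name(name: str) -> str:
    """Map a public id or internal name to the internal artifact name."""
    if name in PUBLIC_TO_ARTIFACT:
        return PUBLIC_TO_ARTIFACT[name]
    if name in PASSTHROUGH or name in PUBLIC_TO_ARTIFACT.values():
        return name
    raise ValueError(f"unknown model name: {name!r}")


if __name__ == "__main__":
    for public_id in ("3d_text", "threed_rotate", "4_3d_rollball_animals"):
        print(public_id, "->", artifact_name(public_id))
```

Keeping the table in one place makes it harder for a new command to accept a public id that the export path does not recognize.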
## Training & Data Rules
- Set the global random seed (`random`, `numpy`, `torch`) from `config.RANDOM_SEED` before training.
- Keep `num_workers=0` for all DataLoaders.
- Pull generator parameters from `config.GENERATE_CONFIG`, `config.SOLVER_CONFIG`, and related config constants instead of hardcoding them.
- Training entrypoints auto-generate missing synthetic data, mix in real data when present, save the best checkpoint to `checkpoints/`, and export a matching ONNX file plus `<model>.meta.json` sidecar to `onnx_models/` at the end.
- Synthetic datasets store a `.dataset_meta.json` fingerprint manifest. If generator source or config snapshot changes, training refreshes the synthetic dataset before continuing.
- `train_utils.py` and `train_regression_utils.py` only resume checkpoints when the current synthetic dataset fingerprint matches the checkpoint hash. Legacy checkpoints without a stored hash may resume with a warning; refreshed datasets force a restart from epoch 1.
- Legacy `normal` and `math` datasets may be adopted into the fingerprint system when no manifest exists, but `math` still validates operator coverage so stale datasets without `÷` samples are regenerated.
- `train_classifier.py` prepares a balanced classifier dataset in `data/classifier/<type>/` by symlinking or copying from the current synthetic datasets and rebuilds the derived classifier directories from source data each run.
- `CRNNDataset` warns when labels contain characters outside the configured charset instead of silently dropping samples.
- `RegressionDataset` parses numeric filename labels and normalizes them to `[0, 1]` using `label_range`.
- `RotateSolverDataset` parses angle labels and converts them to `(sin, cos)` targets.
- `FunCaptchaChallengeDataset` reads full challenge screenshots from `data/real/funcaptcha/4_3d_rollball_animals/`, crops one reference tile plus `num_candidates` top-row candidates, and trains against the answer index from the filename prefix.
- Slide solver training labels are the gap center `x` coordinate, normalized against `SOLVER_CONFIG["slide"]["cnn_input_size"][1]`. All slide solver branches should return the same center-point `gap_x` contract.
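
The label conventions listed above (numeric filename labels, `[0, 1]` normalization, `(sin, cos)` targets, and the normalized gap center) can be sketched with plain Python. The function names are hypothetical, and several details are assumptions: angles are taken to be in degrees, `label_range` is taken to be a `(low, high)` pair, and index `1` of `cnn_input_size` is taken to be the input width.

```python
import math
from pathlib import Path
from typing import Tuple


def parse_numeric_label(filename: str) -> float:
    """Read the numeric label from a `{label}_{anything}.png` style name."""
    label, _, _ = Path(filename).stem.partition("_")
    return float(label)


def normalize_label(value: float, label_range: Tuple[float, float]) -> float:
    """Scale a raw label into [0, 1]; assumes label_range is (low, high)."""
    low, high = label_range
    return (value - low) / (high - low)


def angle_to_sincos(angle_deg: float) -> Tuple[float, float]:
    """Encode an angle (assumed degrees) as a (sin, cos) regression target."""
    rad = math.radians(angle_deg)
    return math.sin(rad), math.cos(rad)


def sincos_to_angle(sin_v: float, cos_v: float) -> float:
    """Decode a (sin, cos) pair back to degrees in [0, 360)."""
    return math.degrees(math.atan2(sin_v, cos_v)) % 360.0


def normalize_gap_x(gap_center_x: float, input_width: int) -> float:
    """Normalize the gap center x against the CNN input width."""
    return gap_center_x / input_width


if __name__ == "__main__":
    raw = parse_numeric_label("135_000042.png")
    print(raw, normalize_label(raw, (0.0, 360.0)))
    print(angle_to_sincos(raw))
```

Encoding angles as `(sin, cos)` avoids the discontinuity at the 0/360 boundary that a single scalar target would have, which is one common reason to prefer it for rotation regression.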
## Data & Testing Guidelines
- Synthetic generator output should use `{label}_{index:06d}.png`. OCR real samples should keep `{label}_{anything}.png`.
- Regression labels are numeric values in filenames. Captcha regression real data lives under `data/real/3d_rotate/` and `data/real/3d_slider/`; solver real data lives under `data/solver/slide/real/` and `data/solver/rotate/real/`.
- FunCaptcha real samples use `{answer_index}_{anything}.png|jpg|jpeg` under `data/real/funcaptcha/4_3d_rollball_animals/`. Each file is the full challenge screenshot, not pre-cropped tiles.
- `data/classifier/` is a derived dataset built from per-type captcha samples; do not hand-edit it unless the training flow changes.
- ONNX inference should prefer sidecar metadata from `<model>.meta.json` for OCR charset decoding, classifier class order, and regression label ranges, with `config.py` only as a fallback for older exports.
- Tests live under `tests/` as `test_<feature>.py`. Current coverage focuses on generators, model output shapes, math evaluation, CTC decoding, slide solving, and slide track generation.
- OpenCV-dependent slide solver tests skip automatically when `opencv-python` is not installed. For solver work, prefer `uv sync --extra cv --extra dev`.
- FastAPI/httpx-dependent server tests skip automatically when the `server` extra is not installed. For HTTP API work, prefer `uv sync --extra server --extra dev`.
- For model, routing, solver, export, or CLI changes, add a fast smoke test that covers shape contracts, decoding behavior, routing, solver fallback, or command behavior.
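
The sidecar-first rule above (prefer `<model>.meta.json`, fall back to `config.py` values for older exports) could look roughly like the sketch below. The helper name `load_model_metadata` and the `chars` key in the usage example are hypothetical; the actual sidecar schema is defined by `inference/model_metadata.py` and is not reproduced here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_model_metadata(onnx_path: Path, fallback: Dict[str, Any]) -> Dict[str, Any]:
    """Return metadata for an ONNX model, preferring its sidecar file.

    Hypothetical helper: `fallback` stands in for values taken from config.py.
    """
    # `<model>.onnx` -> `<model>.meta.json` in the same directory.
    sidecar = onnx_path.with_name(onnx_path.stem + ".meta.json")
    if sidecar.is_file():
        with sidecar.open(encoding="utf-8") as fh:
            meta = json.load(fh)
        # Sidecar values win; fallback only fills keys the sidecar lacks.
        return {**fallback, **meta}
    # Older exports without a sidecar use the config-derived values as-is.
    return dict(fallback)


if __name__ == "__main__":
    # `chars` is an illustrative key name, not the documented schema.
    defaults = {"chars": "0123456789"}
    print(load_model_metadata(Path("onnx_models/normal.onnx"), defaults))
```

Reading the charset or class order from the sidecar keeps decoding consistent with the export that produced the model, even if `config.py` has changed since.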
## Commit & Pull Request Guidelines
Git history is not available in this workspace snapshot, so use short imperative commit subjects such as `Add slide solver export note`. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction or solver behavior changes.
## Documentation Sync
Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update `AGENTS.md` and `CLAUDE.md` in the same patch. Update `README.md` as well when user-facing commands, solver behavior, or HTTP API behavior changes.