Align task API and add FunCaptcha support

Hua
2026-03-12 19:32:59 +08:00
parent ef9518deeb
commit bc6776979e
33 changed files with 3446 additions and 672 deletions

@@ -1,36 +1,64 @@
# Repository Guidelines
## Project Structure & Module Organization
Use `cli.py` as the main entrypoint and keep shared settings in `config.py`. `generators/` builds synthetic captchas (5 types: normal, math, 3d_text, 3d_rotate, 3d_slider), `models/` contains the classifier, CTC expert models, and regression models, `training/` owns datasets and training scripts, and `inference/` contains the ONNX pipeline, export code, and math post-processing. Runtime artifacts live in `data/`, `checkpoints/`, and `onnx_models/`.
Use `cli.py` as the main command entrypoint, exposed as the `captcha` script from `pyproject.toml`, and keep shared constants in `config.py`. `generators/` contains seven generators: the five captcha generators (`normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`) plus solver data generators in `slide_gen.py` and `rotate_solver_gen.py`. `models/` contains the classifier, OCR/CTC models, regression models, the two solver models (`gap_detector.py`, `rotation_regressor.py`), and the FunCaptcha Siamese matcher in `fun_captcha_siamese.py`. `training/` owns datasets, shared training utilities, per-model entrypoints, dataset fingerprint helpers in `data_fingerprint.py`, and the FunCaptcha trainer in `train_funcaptcha_rollball.py`. `inference/` contains the ONNX export path, the runtime pipeline, the dedicated FunCaptcha ONNX runner in `fun_captcha.py`, math post-processing, and ONNX sidecar metadata helpers in `model_metadata.py`. `solvers/` implements interactive slide/rotate solving, and `utils/slide_utils.py` generates slider tracks. Runtime artifacts live under `data/synthetic/`, `data/real/`, `data/real/funcaptcha/`, `data/classifier/`, `data/solver/`, `data/server_tasks/`, `checkpoints/`, and `onnx_models/`.
## Build, Test, and Development Commands
Use `uv` for environment and dependency management.
- `uv sync` installs the base runtime dependencies from `pyproject.toml`.
- `uv sync --extra server` installs HTTP service dependencies.
- `uv run captcha generate --type normal --num 1000` generates synthetic training data. Types: `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, `classifier`.
- `uv run captcha train --model normal` trains one model; `uv run captcha train --all` runs the full order: `normal -> math -> 3d_text -> 3d_rotate -> 3d_slider -> classifier`.
- `uv run captcha export --all` exports all trained models to ONNX.
- `uv run captcha export --model 3d_text` exports a single model; `3d_text` is automatically mapped to `threed_text`.
- `uv run captcha predict image.png` runs auto-routing inference; add `--type normal` to skip classification.
- `uv run captcha predict-dir ./test_images` runs batch inference on a directory.
- `uv run captcha serve --port 8080` starts the optional HTTP API when `server.py` is implemented.
- `uv sync` installs the base runtime dependencies.
- `uv sync --extra server` installs FastAPI service dependencies.
- `uv sync --extra cv` installs OpenCV for slide solver workflows.
- `uv sync --extra dev` installs pytest.
- On Linux `x86_64`, `uv sync` resolves `torch` and `torchvision` from the official PyTorch `cu121` index and pins them to `2.5.1` / `0.20.1`, which has been validated on GTX 1050 Ti (`sm_61`).
- Keep `onnxruntime` compatible with Python 3.10 when editing dependencies; the current constraint stays below `1.24`.
- `uv run captcha generate --type normal --num 1000` generates captcha training data. Valid types are `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`.
- `uv run captcha generate-solver slide --num 30000` and `uv run captcha generate-solver rotate --num 50000` generate solver datasets under `data/solver/`.
- `uv run captcha train --model normal` trains one captcha model. `uv run captcha train --all` trains `normal -> math -> 3d_text -> 3d_rotate -> 3d_slider -> classifier`.
- `uv run captcha train-solver slide` trains `GapDetectorCNN`; `uv run captcha train-solver rotate` trains `RotationRegressor`.
- `uv run captcha train-funcaptcha --question 4_3d_rollball_animals` trains the dedicated FunCaptcha Siamese matcher from full challenge screenshots under `data/real/funcaptcha/4_3d_rollball_animals/`.
- `uv run captcha export --all` exports all available ONNX models, including `gap_detector` and `rotation_regressor`, and writes matching `<model>.meta.json` sidecars.
- `uv run captcha export --model 3d_text` maps to `threed_text`. The export loader also accepts internal artifact names such as `threed_rotate`, `gap_detector`, `rotation_regressor`, and `funcaptcha_rollball_animals`; `4_3d_rollball_animals` is accepted as an alias for that FunCaptcha artifact.
- `uv run captcha predict image.png` runs auto-routing inference. Add `--type normal` to skip classification.
- `uv run captcha predict-dir ./test_images` runs batch inference for `.png` and `.jpg` files.
- `uv run captcha predict-funcaptcha image.jpg --question 4_3d_rollball_animals` runs the dedicated FunCaptcha matcher and returns `objects`.
- `uv run captcha solve slide --bg bg.png [--tpl tpl.png]` runs the slide solver. It uses template matching first when `--tpl` is provided, then OpenCV edge detection, then CNN fallback.
- `uv run captcha solve rotate --image img.png` runs the rotate solver.
- `uv run captcha serve --host 0.0.0.0 --port 8080` starts the implemented FastAPI service in `server.py`. It supports synchronous `/solve` and `/solve/upload`, plus async task endpoints `/createTask`, `/getTaskResult`, and `/getBalance`, with `/api/v1/*` compatibility aliases. If `CLIENT_KEY` is set in the environment, task endpoints require a matching `clientKey`. `createTask` accepts `callbackUrl`, `softId`, `languagePool`, and optional `task.question`; `task.question=4_3d_rollball_animals` routes to the dedicated FunCaptcha matcher and returns `solution.objects`. `callbackUrl` receives a form-encoded completion callback with configurable retry/backoff in `SERVER_CONFIG`. If `CALLBACK_SIGNING_SECRET` is set, callback requests include HMAC-SHA256 signature headers. Task responses also expose extra `task` / `callback` metadata for async debugging, and task state is persisted under `data/server_tasks/`.
- `uv run pytest` runs the test suite.
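The slide solver order described above (template matching when `--tpl` is given, then OpenCV edge detection, then the CNN) can be sketched as a simple fallback chain. This is only an illustration of the pattern, not the code in `solvers/`: `solve_with_fallback`, `build_strategies`, the strategy names, and the stubbed return values are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A strategy is a (name, callable) pair; the callable returns a gap x
# coordinate, or None when it cannot produce a confident answer.
Strategy = Tuple[str, Callable[[], Optional[int]]]


def solve_with_fallback(strategies: Sequence[Strategy]) -> Tuple[str, int]:
    """Return (strategy_name, gap_x) from the first strategy that succeeds."""
    for name, run in strategies:
        gap_x = run()
        if gap_x is not None:
            return name, gap_x
    raise RuntimeError("no strategy produced a gap position")


def build_strategies(has_template: bool) -> List[Strategy]:
    """Order the strategies the way the guideline describes (stubbed bodies)."""
    strategies: List[Strategy] = []
    if has_template:
        # Hypothetical stub: pretend template matching found nothing.
        strategies.append(("template_match", lambda: None))
    # Hypothetical stub: pretend edge detection found nothing either.
    strategies.append(("opencv_edges", lambda: None))
    # Hypothetical stub: the CNN fallback returns a fixed center x.
    strategies.append(("cnn", lambda: 142))
    return strategies


name, gap_x = solve_with_fallback(build_strategies(has_template=True))
print(name, gap_x)  # cnn 142
```

Because every branch returns the same kind of value, callers do not need to know which strategy produced the answer.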
## Coding Style & Naming Conventions
Target Python 3.10+ and follow existing style: 4-space indentation, snake_case for functions/modules, PascalCase for classes, and short docstrings on public entrypoints. Keep captcha-type ids exactly `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`. Checkpoint/ONNX file names use `threed_text`, `threed_rotate`, `threed_slider` (underscored, no hyphens). Preserve the design rules from `CLAUDE.md`: float32 training/export, CPU-safe ops, and greedy CTC decoding for OCR models. Regression models (3d_rotate, 3d_slider) output sigmoid [0,1] scaled by `REGRESSION_RANGE`. `normal` uses the local configured charset and currently includes confusing characters; math captchas must be recognized as strings and then evaluated in `inference/math_eval.py`.
Target Python 3.10-3.12 and follow the existing style: 4-space indentation, snake_case for functions/modules, PascalCase for classes, and short docstrings on public entrypoints. Keep public captcha type ids exactly `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`. Internal checkpoint/ONNX artifact names use `threed_text`, `threed_rotate`, `threed_slider`, and `funcaptcha_rollball_animals`; solver artifacts are `gap_detector` and `rotation_regressor`. Preserve the design rules from `CLAUDE.md`: float32 training/export, CPU-safe ONNX ops, and greedy CTC decoding for OCR models. `normal` uses `NORMAL_CHARS`, `math` uses `MATH_CHARS` and must be post-processed through `inference/math_eval.py`, and `3d_text` uses `THREED_CHARS`. `3d_rotate` and `3d_slider` output sigmoid values in `[0, 1]`, which are then scaled by `REGRESSION_RANGE`; the rotate solver model takes RGB input and outputs `(sin, cos)`. The FunCaptcha matcher is a dual-input RGB Siamese model keyed by `task.question`, not by `captchaType`.
- Do not casually upgrade `torch` or `torchvision`: newer CUDA 12.8 wheels in this repo's previous environment dropped `sm_61` kernels and failed on GTX 1050 Ti. Re-verify GPU execution before changing the pinned pair.
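As a rough sketch of the two output conventions mentioned above, the snippet below scales a sigmoid output in `[0, 1]` back to a label range and converts a `(sin, cos)` pair into an angle. The shape of `REGRESSION_RANGE` (a `(low, high)` tuple per type) and the numeric ranges are assumptions made for illustration; the real values live in `config.py`.

```python
import math

# Assumed layout and values, for illustration only; see config.py for the real ones.
REGRESSION_RANGE = {"3d_rotate": (0.0, 360.0), "3d_slider": (0.0, 200.0)}


def scale_regression(value01: float, captcha_type: str) -> float:
    """Map a sigmoid output in [0, 1] onto the label range for captcha_type."""
    low, high = REGRESSION_RANGE[captcha_type]
    clamped = min(max(value01, 0.0), 1.0)  # guard against slight overshoot
    return low + clamped * (high - low)


def angle_from_sin_cos(sin_v: float, cos_v: float) -> float:
    """Recover an angle in degrees within [0, 360) from a (sin, cos) pair."""
    return math.degrees(math.atan2(sin_v, cos_v)) % 360.0


print(scale_regression(0.5, "3d_rotate"))  # 180.0
print(angle_from_sin_cos(-1.0, 0.0))       # 270.0
```

Predicting `(sin, cos)` avoids the discontinuity a single scalar has where 359 degrees wraps around to 0 degrees, which is one common reason to prefer that encoding for rotation.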
## Training & Data Rules
- All training scripts must set the global random seed (`random`, `numpy`, `torch`) via `config.RANDOM_SEED` before training begins.
- All DataLoaders use `num_workers=0` for cross-platform consistency.
- Generator parameters (rotation, noise, shadow, etc.) must come from `config.GENERATE_CONFIG`, not hardcoded values.
- `CRNNDataset` emits a `warnings.warn` when a label contains characters outside the configured charset, rather than silently dropping them.
- `RegressionDataset` parses numeric labels from filenames and normalizes to [0,1] via `label_range`.
- Set the global random seed (`random`, `numpy`, `torch`) from `config.RANDOM_SEED` before training.
- Keep `num_workers=0` for all DataLoaders.
- Pull generator parameters from `config.GENERATE_CONFIG`, `config.SOLVER_CONFIG`, and related config constants instead of hardcoding them.
- Training entrypoints auto-generate missing synthetic data, mix in real data when present, save the best checkpoint to `checkpoints/`, and export a matching ONNX file plus `<model>.meta.json` sidecar to `onnx_models/` at the end.
- Synthetic datasets store a `.dataset_meta.json` fingerprint manifest. If generator source or config snapshot changes, training refreshes the synthetic dataset before continuing.
- `train_utils.py` and `train_regression_utils.py` only resume checkpoints when the current synthetic dataset fingerprint matches the checkpoint hash. Legacy checkpoints without a stored hash may resume with a warning; refreshed datasets force a restart from epoch 1.
- Legacy `normal` and `math` datasets may be adopted into the fingerprint system when no manifest exists, but `math` still validates operator coverage so stale datasets without `÷` samples are regenerated.
- `train_classifier.py` prepares a balanced classifier dataset in `data/classifier/<type>/` by symlinking or copying from the current synthetic datasets and rebuilds the derived classifier directories from source data each run.
- `CRNNDataset` warns when labels contain characters outside the configured charset instead of silently dropping samples.
- `RegressionDataset` parses numeric filename labels and normalizes them to `[0, 1]` using `label_range`.
- `RotateSolverDataset` parses angle labels and converts them to `(sin, cos)` targets.
- `FunCaptchaChallengeDataset` reads full challenge screenshots from `data/real/funcaptcha/4_3d_rollball_animals/`, crops one reference tile plus `num_candidates` top-row candidates, and trains against the answer index from the filename prefix.
- Slide solver training labels are the gap center `x` coordinate, normalized against `SOLVER_CONFIG["slide"]["cnn_input_size"][1]`. All slide solver branches should return the same center-point `gap_x` contract.
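The filename-based label rules above can be illustrated with a few small helpers. This is a minimal sketch under the `{label}_{anything}` naming convention, not the actual dataset classes: `label_from_filename`, `normalize_label`, and `sin_cos_target` are hypothetical names, and the `(0, 360)` range is an assumed example.

```python
import math
from pathlib import Path
from typing import Tuple


def label_from_filename(path: str) -> str:
    """Return the text before the first underscore of the file stem."""
    return Path(path).stem.split("_", 1)[0]


def normalize_label(value: float, label_range: Tuple[float, float]) -> float:
    """Normalize a numeric label into [0, 1] given a (low, high) range."""
    low, high = label_range
    return (value - low) / (high - low)


def sin_cos_target(angle_deg: float) -> Tuple[float, float]:
    """Convert an angle label in degrees into a (sin, cos) training target."""
    rad = math.radians(angle_deg)
    return math.sin(rad), math.cos(rad)


# Regression sample: numeric label taken from the filename, then normalized.
label = float(label_from_filename("data/real/3d_rotate/135_000042.png"))
print(normalize_label(label, (0.0, 360.0)))  # 0.375

# FunCaptcha sample: the filename prefix is the answer index.
print(int(label_from_filename("data/real/funcaptcha/4_3d_rollball_animals/2_shot.jpg")))  # 2
```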
## Data & Testing Guidelines
Synthetic generator output should use `{label}_{index:06d}.png`; real labeled samples should use `{label}_{anything}.png`. For regression types, label is the numeric value (angle or offset). Sample targets are defined in `config.py`. Save best checkpoints to `checkpoints/` and export matching ONNX files to `onnx_models/`. Use `pytest`, place tests under `tests/` as `test_<feature>.py`, and run them with `uv run pytest`. For model, data, or routing changes, add a fast smoke test for shapes, decoding, CLI behavior, or pipeline routing.
- Synthetic generator output should use `{label}_{index:06d}.png`. OCR real samples should keep `{label}_{anything}.png`.
- Regression labels are numeric values in filenames. Captcha regression real data lives under `data/real/3d_rotate/` and `data/real/3d_slider/`; solver real data lives under `data/solver/slide/real/` and `data/solver/rotate/real/`.
- FunCaptcha real samples use `{answer_index}_{anything}.png|jpg|jpeg` under `data/real/funcaptcha/4_3d_rollball_animals/`. Each file is the full challenge screenshot, not pre-cropped tiles.
- `data/classifier/` is a derived dataset built from per-type captcha samples; do not hand-edit it unless the training flow changes.
- ONNX inference should prefer sidecar metadata from `<model>.meta.json` for OCR charset decoding, classifier class order, and regression label ranges, with `config.py` only as a fallback for older exports.
- Tests live under `tests/` as `test_<feature>.py`. Current coverage focuses on generators, model output shapes, math evaluation, CTC decoding, slide solving, and slide track generation.
- OpenCV-dependent slide solver tests skip automatically when `opencv-python` is not installed. For solver work, prefer `uv sync --extra cv --extra dev`.
- FastAPI/httpx-dependent server tests skip automatically when the `server` extra is not installed. For HTTP API work, prefer `uv sync --extra server --extra dev`.
- For model, routing, solver, export, or CLI changes, add a fast smoke test that covers shape contracts, decoding behavior, routing, solver fallback, or command behavior.
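The "prefer sidecar metadata, fall back to `config.py`" rule above could look roughly like the following. It is a sketch only: `sidecar_path`, `load_metadata`, and the `charset` key are hypothetical and do not reflect the real schema in `inference/model_metadata.py`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def sidecar_path(onnx_path: Path) -> Path:
    """Return the `<model>.meta.json` path that sits next to `<model>.onnx`."""
    return onnx_path.with_name(onnx_path.stem + ".meta.json")


def load_metadata(onnx_path: Path, fallback: Dict[str, Any]) -> Dict[str, Any]:
    """Prefer values from the sidecar; use fallback values for older exports."""
    path = sidecar_path(onnx_path)
    if not path.is_file():
        return dict(fallback)
    with path.open("r", encoding="utf-8") as fh:
        meta = json.load(fh)
    # Sidecar keys win; fallback fills in anything the sidecar omits.
    return {**fallback, **meta}


config_defaults = {"charset": "0123456789"}  # stand-in for config.py constants

with tempfile.TemporaryDirectory() as tmp:
    onnx_file = Path(tmp) / "normal.onnx"
    # No sidecar yet: behaves like an older export.
    legacy = load_metadata(onnx_file, config_defaults)
    # Write a sidecar and load again.
    sidecar_path(onnx_file).write_text(json.dumps({"charset": "abc123"}), encoding="utf-8")
    current = load_metadata(onnx_file, config_defaults)

print(legacy["charset"], current["charset"])  # 0123456789 abc123
```

Reading the charset from the sidecar keeps decoding tied to what the model was exported with, even if `config.py` changes later.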
## Commit & Pull Request Guidelines
Git history is not available in this workspace snapshot, so use short imperative commit subjects such as `Add classifier export smoke test`. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction behavior changes.
Git history is not available in this workspace snapshot, so use short imperative commit subjects such as `Add slide solver export note`. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction or solver behavior changes.
## Documentation Sync
Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update `AGENTS.md` and `CLAUDE.md` in the same patch.
Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update `AGENTS.md` and `CLAUDE.md` in the same patch. Update `README.md` as well when user-facing commands, solver behavior, or HTTP API behavior changes.