Align task API and add FunCaptcha support

2026-03-12 19:32:59 +08:00
parent ef9518deeb
commit bc6776979e
33 changed files with 3446 additions and 672 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,36 +1,64 @@
 # Repository Guidelines

 ## Project Structure & Module Organization
-Use `cli.py` as the main entrypoint and keep shared settings in `config.py`. `generators/` builds synthetic captchas (5 types: normal, math, 3d_text, 3d_rotate, 3d_slider), `models/` contains the classifier, CTC expert models, and regression models, `training/` owns datasets and training scripts, and `inference/` contains the ONNX pipeline, export code, and math post-processing. Runtime artifacts live in `data/`, `checkpoints/`, and `onnx_models/`.
+Use `cli.py` as the main command entrypoint, exposed as the `captcha` script from `pyproject.toml`, and keep shared constants in `config.py`. `generators/` contains seven generators: the five captcha generators (`normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`) plus solver data generators in `slide_gen.py` and `rotate_solver_gen.py`. `models/` contains the classifier, OCR/CTC models, regression models, the two solver models (`gap_detector.py`, `rotation_regressor.py`), and the FunCaptcha Siamese matcher in `fun_captcha_siamese.py`. `training/` owns datasets, shared training utilities, per-model entrypoints, dataset fingerprint helpers in `data_fingerprint.py`, and the FunCaptcha trainer in `train_funcaptcha_rollball.py`. `inference/` contains the ONNX export path, the runtime pipeline, the dedicated FunCaptcha ONNX runner in `fun_captcha.py`, math post-processing, and ONNX sidecar metadata helpers in `model_metadata.py`. `solvers/` implements interactive slide/rotate solving, and `utils/slide_utils.py` generates slider tracks. Runtime artifacts live under `data/synthetic/`, `data/real/`, `data/real/funcaptcha/`, `data/classifier/`, `data/solver/`, `data/server_tasks/`, `checkpoints/`, and `onnx_models/`.

 ## Build, Test, and Development Commands
 Use `uv` for environment and dependency management.

- `uv sync` installs the base runtime dependencies from `pyproject.toml`.
- `uv sync --extra server` installs HTTP service dependencies.
- `uv run captcha generate --type normal --num 1000` generates synthetic training data. Types: `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, `classifier`.
- `uv run captcha train --model normal` trains one model; `uv run captcha train --all` runs the full order: `normal -> math -> 3d_text -> 3d_rotate -> 3d_slider -> classifier`.
- `uv run captcha export --all` exports all trained models to ONNX.
- `uv run captcha export --model 3d_text` exports a single model; `3d_text` is automatically mapped to `threed_text`.
- `uv run captcha predict image.png` runs auto-routing inference; add `--type normal` to skip classification.
- `uv run captcha predict-dir ./test_images` runs batch inference on a directory.
- `uv run captcha serve --port 8080` starts the optional HTTP API when `server.py` is implemented.
+- `uv sync` installs the base runtime dependencies.
+- `uv sync --extra server` installs FastAPI service dependencies.
+- `uv sync --extra cv` installs OpenCV for slide solver workflows.
+- `uv sync --extra dev` installs pytest.
+- On Linux `x86_64`, `uv sync` resolves `torch` and `torchvision` from the official PyTorch `cu121` index and pins them to `2.5.1` / `0.20.1`, a pair that has been validated on a GTX 1050 Ti (`sm_61`).
+- Keep `onnxruntime` compatible with Python 3.10 when editing dependencies; the current constraint stays below `1.24`.
+- `uv run captcha generate --type normal --num 1000` generates captcha training data. Valid types are `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`.
+- `uv run captcha generate-solver slide --num 30000` and `uv run captcha generate-solver rotate --num 50000` generate solver datasets under `data/solver/`.
+- `uv run captcha train --model normal` trains one captcha model. `uv run captcha train --all` trains `normal -> math -> 3d_text -> 3d_rotate -> 3d_slider -> classifier`.
+- `uv run captcha train-solver slide` trains `GapDetectorCNN`; `uv run captcha train-solver rotate` trains `RotationRegressor`.
+- `uv run captcha train-funcaptcha --question 4_3d_rollball_animals` trains the dedicated FunCaptcha Siamese matcher from full challenge screenshots under `data/real/funcaptcha/4_3d_rollball_animals/`.
+- `uv run captcha export --all` exports all available ONNX models, including `gap_detector` and `rotation_regressor`, and writes matching `<model>.meta.json` sidecars.
+- `uv run captcha export --model 3d_text` maps to `threed_text`. The export loader also accepts internal artifact names such as `threed_rotate`, `gap_detector`, `rotation_regressor`, and `funcaptcha_rollball_animals`; `4_3d_rollball_animals` is accepted as an alias for that FunCaptcha artifact.
+- `uv run captcha predict image.png` runs auto-routing inference. Add `--type normal` to skip classification.
+- `uv run captcha predict-dir ./test_images` runs batch inference for `.png` and `.jpg` files.
+- `uv run captcha predict-funcaptcha image.jpg --question 4_3d_rollball_animals` runs the dedicated FunCaptcha matcher and returns `objects`.
+- `uv run captcha solve slide --bg bg.png [--tpl tpl.png]` runs the slide solver. It uses template matching first when `--tpl` is provided, then OpenCV edge detection, then CNN fallback.
+- `uv run captcha solve rotate --image img.png` runs the rotate solver.
+- `uv run captcha serve --host 0.0.0.0 --port 8080` starts the implemented FastAPI service in `server.py`. It supports synchronous `/solve` and `/solve/upload`, plus async task endpoints `/createTask`, `/getTaskResult`, and `/getBalance`, with `/api/v1/*` compatibility aliases. If `CLIENT_KEY` is set in the environment, task endpoints require a matching `clientKey`. `createTask` accepts `callbackUrl`, `softId`, `languagePool`, and optional `task.question`; `task.question=4_3d_rollball_animals` routes to the dedicated FunCaptcha matcher and returns `solution.objects`. `callbackUrl` receives a form-encoded completion callback with configurable retry/backoff in `SERVER_CONFIG`. If `CALLBACK_SIGNING_SECRET` is set, callback requests include HMAC-SHA256 signature headers. Task responses also expose extra `task` / `callback` metadata for async debugging, and task state is persisted under `data/server_tasks/`.
+- `uv run pytest` runs the test suite.
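The `serve` entry above mentions form-encoded callbacks that carry HMAC-SHA256 signature headers when `CALLBACK_SIGNING_SECRET` is set. The following is a minimal sketch of that idea, not the code in `server.py`: the header names, the `timestamp.body` message layout, and both function names are assumptions made for illustration.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical header names; the real ones are defined by server.py.
TIMESTAMP_HEADER = "X-Callback-Timestamp"
SIGNATURE_HEADER = "X-Callback-Signature"


def sign_callback(secret: str, fields: dict[str, str], timestamp: int) -> tuple[bytes, dict[str, str]]:
    """Form-encode the payload and sign it with HMAC-SHA256."""
    # Sort the fields so the encoded body is deterministic.
    body = urlencode(sorted(fields.items())).encode("utf-8")
    # Assumed message layout: "<timestamp>.<body>".
    message = str(timestamp).encode("ascii") + b"." + body
    digest = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        TIMESTAMP_HEADER: str(timestamp),
        SIGNATURE_HEADER: digest,
    }
    return body, headers


def verify_callback(secret: str, body: bytes, headers: dict[str, str]) -> bool:
    """Recompute the signature on the receiving side and compare in constant time."""
    message = headers[TIMESTAMP_HEADER].encode("ascii") + b"." + body
    expected = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, headers[SIGNATURE_HEADER])
```

A receiver would call `verify_callback` with the raw request body before parsing it, so that any re-encoding of the form fields cannot change the signed bytes.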

 ## Coding Style & Naming Conventions
-Target Python 3.10+ and follow existing style: 4-space indentation, snake_case for functions/modules, PascalCase for classes, and short docstrings on public entrypoints. Keep captcha-type ids exactly `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`. Checkpoint/ONNX file names use `threed_text`, `threed_rotate`, `threed_slider` (underscored, no hyphens). Preserve the design rules from `CLAUDE.md`: float32 training/export, CPU-safe ops, and greedy CTC decoding for OCR models. Regression models (3d_rotate, 3d_slider) output sigmoid [0,1] scaled by `REGRESSION_RANGE`. `normal` uses the local configured charset and currently includes confusing characters; math captchas must be recognized as strings and then evaluated in `inference/math_eval.py`.
+Target Python 3.10-3.12 and follow the existing style: 4-space indentation, snake_case for functions/modules, PascalCase for classes, and short docstrings on public entrypoints. Keep public captcha type ids exactly `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`. Internal checkpoint/ONNX artifact names use `threed_text`, `threed_rotate`, `threed_slider`, and `funcaptcha_rollball_animals`; solver artifacts are `gap_detector` and `rotation_regressor`. Preserve the design rules from `CLAUDE.md`: float32 training/export, CPU-safe ONNX ops, and greedy CTC decoding for OCR models. `normal` uses `NORMAL_CHARS`, `math` uses `MATH_CHARS` and must be post-processed through `inference/math_eval.py`, and `3d_text` uses `THREED_CHARS`. `3d_rotate` and `3d_slider` output sigmoid values in `[0, 1]` and scale them with `REGRESSION_RANGE`; the rotate solver model outputs `(sin, cos)` on RGB input. The FunCaptcha matcher is a dual-input RGB Siamese model keyed by `task.question`, not by `captchaType`.
+- Do not casually upgrade `torch` or `torchvision`: the newer CUDA 12.8 wheels used in this repo's previous environment dropped `sm_61` kernels and failed on the GTX 1050 Ti. Re-verify GPU execution before changing the pinned pair.
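The two output contracts described above (sigmoid values scaled with `REGRESSION_RANGE`, and `(sin, cos)` pairs from the rotate solver) can be decoded roughly as follows. This is a sketch under assumptions: the example ranges, the linear `low + value * (high - low)` scaling, the use of degrees, and the helper names are illustrative and not taken from `config.py`.

```python
import math

# Hypothetical ranges for illustration only; the real values live in config.REGRESSION_RANGE.
EXAMPLE_REGRESSION_RANGE = {
    "3d_rotate": (0.0, 360.0),
    "3d_slider": (0.0, 200.0),
}


def scale_sigmoid(value: float, label_range: tuple[float, float]) -> float:
    """Map a sigmoid output in [0, 1] back onto the label range (assumed linear)."""
    low, high = label_range
    return low + value * (high - low)


def angle_from_sin_cos(sin_value: float, cos_value: float) -> float:
    """Recover an angle in degrees, wrapped into [0, 360), from a (sin, cos) pair."""
    return math.degrees(math.atan2(sin_value, cos_value)) % 360.0
```

Predicting `(sin, cos)` instead of a raw angle avoids the discontinuity at the 0/360 boundary, and `atan2` tolerates outputs that are not exactly unit length.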

 ## Training & Data Rules
- All training scripts must set the global random seed (`random`, `numpy`, `torch`) via `config.RANDOM_SEED` before training begins.
- All DataLoaders use `num_workers=0` for cross-platform consistency.
- Generator parameters (rotation, noise, shadow, etc.) must come from `config.GENERATE_CONFIG`, not hardcoded values.
- `CRNNDataset` emits a `warnings.warn` when a label contains characters outside the configured charset, rather than silently dropping them.
- `RegressionDataset` parses numeric labels from filenames and normalizes to [0,1] via `label_range`.
+- Set the global random seed (`random`, `numpy`, `torch`) from `config.RANDOM_SEED` before training.
+- Keep `num_workers=0` for all DataLoaders.
+- Pull generator parameters from `config.GENERATE_CONFIG`, `config.SOLVER_CONFIG`, and related config constants instead of hardcoding them.
+- Training entrypoints auto-generate missing synthetic data, mix in real data when present, save the best checkpoint to `checkpoints/`, and export a matching ONNX file plus `<model>.meta.json` sidecar to `onnx_models/` at the end.
+- Synthetic datasets store a `.dataset_meta.json` fingerprint manifest. If generator source or config snapshot changes, training refreshes the synthetic dataset before continuing.
+- `train_utils.py` and `train_regression_utils.py` only resume checkpoints when the current synthetic dataset fingerprint matches the checkpoint hash. Legacy checkpoints without a stored hash may resume with a warning; refreshed datasets force a restart from epoch 1.
+- Legacy `normal` and `math` datasets may be adopted into the fingerprint system when no manifest exists, but `math` still validates operator coverage so stale datasets without `÷` samples are regenerated.
+- `train_classifier.py` prepares a balanced classifier dataset in `data/classifier/<type>/` by symlinking or copying from the current synthetic datasets and rebuilds the derived classifier directories from source data each run.
+- `CRNNDataset` warns when labels contain characters outside the configured charset instead of silently dropping samples.
+- `RegressionDataset` parses numeric filename labels and normalizes them to `[0, 1]` using `label_range`.
+- `RotateSolverDataset` parses angle labels and converts them to `(sin, cos)` targets.
+- `FunCaptchaChallengeDataset` reads full challenge screenshots from `data/real/funcaptcha/4_3d_rollball_animals/`, crops one reference tile plus `num_candidates` top-row candidates, and trains against the answer index from the filename prefix.
+- Slide solver training labels are the gap center `x` coordinate, normalized against `SOLVER_CONFIG["slide"]["cnn_input_size"][1]`. All slide solver branches should return the same center-point `gap_x` contract.

 ## Data & Testing Guidelines
-Synthetic generator output should use `{label}_{index:06d}.png`; real labeled samples should use `{label}_{anything}.png`. For regression types, label is the numeric value (angle or offset). Sample targets are defined in `config.py`. Save best checkpoints to `checkpoints/` and export matching ONNX files to `onnx_models/`. Use `pytest`, place tests under `tests/` as `test_<feature>.py`, and run them with `uv run pytest`. For model, data, or routing changes, add a fast smoke test for shapes, decoding, CLI behavior, or pipeline routing.
+- Synthetic generator output should use `{label}_{index:06d}.png`. OCR real samples should keep `{label}_{anything}.png`.
+- Regression labels are numeric values in filenames. Captcha regression real data lives under `data/real/3d_rotate/` and `data/real/3d_slider/`; solver real data lives under `data/solver/slide/real/` and `data/solver/rotate/real/`.
+- FunCaptcha real samples use `{answer_index}_{anything}.png|jpg|jpeg` under `data/real/funcaptcha/4_3d_rollball_animals/`. Each file is the full challenge screenshot, not pre-cropped tiles.
+- `data/classifier/` is a derived dataset built from per-type captcha samples; do not hand-edit it unless the training flow changes.
+- ONNX inference should prefer sidecar metadata from `<model>.meta.json` for OCR charset decoding, classifier class order, and regression label ranges, with `config.py` only as a fallback for older exports.
+- Tests live under `tests/` as `test_<feature>.py`. Current coverage focuses on generators, model output shapes, math evaluation, CTC decoding, slide solving, and slide track generation.
+- OpenCV-dependent slide solver tests skip automatically when `opencv-python` is not installed. For solver work, prefer `uv sync --extra cv --extra dev`.
+- FastAPI/httpx-dependent server tests skip automatically when the `server` extra is not installed. For HTTP API work, prefer `uv sync --extra server --extra dev`.
+- For model, routing, solver, export, or CLI changes, add a fast smoke test that covers shape contracts, decoding behavior, routing, solver fallback, or command behavior.
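The filename conventions above (`{label}_{index:06d}.png`, `{label}_{anything}.png`, and `{answer_index}_{anything}.png|jpg|jpeg`) can be parsed with a few lines. The helpers below are hypothetical and assume that labels never contain an underscore, so the label is everything before the first `_`; the repository's dataset classes may parse differently.

```python
from pathlib import Path

FUNCAPTCHA_SUFFIXES = {".png", ".jpg", ".jpeg"}


def parse_label(filename: str) -> str:
    """Return the text before the first underscore of the file stem."""
    stem = Path(filename).stem
    label, separator, _ = stem.partition("_")
    if not separator or not label:
        raise ValueError(f"expected '<label>_<suffix>' in {filename!r}")
    return label


def parse_regression_label(filename: str) -> float:
    """Regression samples carry a numeric label (angle or offset)."""
    return float(parse_label(filename))


def parse_funcaptcha_answer(filename: str) -> int:
    """FunCaptcha screenshots carry the answer index as the filename prefix."""
    if Path(filename).suffix.lower() not in FUNCAPTCHA_SUFFIXES:
        raise ValueError(f"unsupported extension in {filename!r}")
    return int(parse_label(filename))
```

A fast smoke test for a dataset change can assert these parsing contracts without loading any images.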

 ## Commit & Pull Request Guidelines
-Git history is not available in this workspace snapshot, so use short imperative commit subjects such as `Add classifier export smoke test`. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction behavior changes.
+Git history is not available in this workspace snapshot, so use short imperative commit subjects such as `Add slide solver export note`. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction or solver behavior changes.

 ## Documentation Sync
-Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update `AGENTS.md` and `CLAUDE.md` in the same patch.
+Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update `AGENTS.md` and `CLAUDE.md` in the same patch. Update `README.md` as well when user-facing commands, solver behavior, or HTTP API behavior changes.