# Repository Guidelines

## Project Structure & Module Organization

Use `cli.py` as the main command entrypoint, exposed as the `captcha` script from `pyproject.toml`, and keep shared constants in `config.py`. `generators/` contains seven generators: the five captcha generators (`normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`) plus solver data generators in `slide_gen.py` and `rotate_solver_gen.py`. `models/` contains the classifier, OCR/CTC models, regression models, the two solver models (`gap_detector.py`, `rotation_regressor.py`), and the FunCaptcha Siamese matcher in `fun_captcha_siamese.py`. `training/` owns datasets, shared training utilities, per-model entrypoints, dataset fingerprint helpers in `data_fingerprint.py`, and the FunCaptcha trainer in `train_funcaptcha_rollball.py`. `inference/` contains the ONNX export path, the runtime pipeline, the dedicated FunCaptcha ONNX runner in `fun_captcha.py`, math post-processing, and ONNX sidecar metadata helpers in `model_metadata.py`. `solvers/` implements interactive slide/rotate solving, and `utils/slide_utils.py` generates slider tracks. Runtime artifacts live under `data/synthetic/`, `data/real/`, `data/real/funcaptcha/`, `data/classifier/`, `data/solver/`, `data/server_tasks/`, `checkpoints/`, and `onnx_models/`.

## Build, Test, and Development Commands

Use `uv` for environment and dependency management.

- `uv sync` installs the base runtime dependencies.
- `uv sync --extra server` installs FastAPI service dependencies.
- `uv sync --extra cv` installs OpenCV for slide solver workflows.
- `uv sync --extra dev` installs pytest.
- On Linux `x86_64`, `uv sync` resolves `torch` and `torchvision` from the official PyTorch `cu121` index and pins them to `2.5.1` / `0.20.1`, which has been validated on GTX 1050 Ti (`sm_61`).
- Keep `onnxruntime` compatible with Python 3.10 when editing dependencies; the current constraint stays below `1.24`.
- `uv run captcha generate --type normal --num 1000` generates captcha training data. Valid types are `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`.
- `uv run captcha generate-solver slide --num 30000` and `uv run captcha generate-solver rotate --num 50000` generate solver datasets under `data/solver/`.
- `uv run captcha train --model normal` trains one captcha model. `uv run captcha train --all` trains `normal -> math -> 3d_text -> 3d_rotate -> 3d_slider -> classifier`.
- `uv run captcha train-solver slide` trains `GapDetectorCNN`; `uv run captcha train-solver rotate` trains `RotationRegressor`.
- `uv run captcha train-funcaptcha --question 4_3d_rollball_animals` trains the dedicated FunCaptcha Siamese matcher from full challenge screenshots under `data/real/funcaptcha/4_3d_rollball_animals/`.
- `uv run captcha export --all` exports all available ONNX models, including `gap_detector` and `rotation_regressor`, and writes matching `.meta.json` sidecars.
- `uv run captcha export --model 3d_text` maps to `threed_text`. The export loader also accepts internal artifact names such as `threed_rotate`, `gap_detector`, `rotation_regressor`, and `funcaptcha_rollball_animals`; `4_3d_rollball_animals` is accepted as an alias for that FunCaptcha artifact.
- `uv run captcha predict image.png` runs auto-routing inference. Add `--type normal` to skip classification.
- `uv run captcha predict-dir ./test_images` runs batch inference for `.png` and `.jpg` files.
- `uv run captcha predict-funcaptcha image.jpg --question 4_3d_rollball_animals` runs the dedicated FunCaptcha matcher and returns `objects`.
- `uv run captcha solve slide --bg bg.png [--tpl tpl.png]` runs the slide solver. It uses template matching first when `--tpl` is provided, then OpenCV edge detection, then CNN fallback.
- `uv run captcha solve rotate --image img.png` runs the rotate solver.
- `uv run captcha serve --host 0.0.0.0 --port 8080` starts the implemented FastAPI service in `server.py`. It supports synchronous `/solve` and `/solve/upload`, plus async task endpoints `/createTask`, `/getTaskResult`, and `/getBalance`, with `/api/v1/*` compatibility aliases. If `CLIENT_KEY` is set in the environment, task endpoints require a matching `clientKey`. `createTask` accepts `callbackUrl`, `softId`, `languagePool`, and optional `task.question`; `task.question=4_3d_rollball_animals` routes to the dedicated FunCaptcha matcher and returns `solution.objects`. `callbackUrl` receives a form-encoded completion callback with configurable retry/backoff in `SERVER_CONFIG`. If `CALLBACK_SIGNING_SECRET` is set, callback requests include HMAC-SHA256 signature headers. Task responses also expose extra `task` / `callback` metadata for async debugging, and task state is persisted under `data/server_tasks/`.
- `uv run pytest` runs the test suite.

## Coding Style & Naming Conventions

Target Python 3.10-3.12 and follow the existing style: 4-space indentation, snake_case for functions/modules, PascalCase for classes, and short docstrings on public entrypoints. Keep public captcha type ids exactly `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`. Internal checkpoint/ONNX artifact names use `threed_text`, `threed_rotate`, `threed_slider`, and `funcaptcha_rollball_animals`; solver artifacts are `gap_detector` and `rotation_regressor`. Preserve the design rules from `CLAUDE.md`: float32 training/export, CPU-safe ONNX ops, and greedy CTC decoding for OCR models. `normal` uses `NORMAL_CHARS`, `math` uses `MATH_CHARS` and must be post-processed through `inference/math_eval.py`, and `3d_text` uses `THREED_CHARS`. `3d_rotate` and `3d_slider` output sigmoid values in `[0, 1]` and scale them with `REGRESSION_RANGE`; the rotate solver model outputs `(sin, cos)` on RGB input.
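
The two regression output contracts above can be illustrated with a minimal sketch. It is not the repository's code: `ASSUMED_RANGE`, `scale_sigmoid`, and `angle_from_sin_cos` are hypothetical names, the linear mapping is an assumption about how `REGRESSION_RANGE` is applied, and `atan2` is simply a common way to decode a `(sin, cos)` pair.

```python
import math

# Hypothetical range for illustration only; the real values come from
# REGRESSION_RANGE in config.py (or the .meta.json sidecar).
ASSUMED_RANGE = (0.0, 360.0)


def scale_sigmoid(value, label_range=ASSUMED_RANGE):
    """Map a sigmoid output in [0, 1] linearly onto [low, high]."""
    low, high = label_range
    return low + value * (high - low)


def angle_from_sin_cos(sin_value, cos_value):
    """Decode a (sin, cos) pair into degrees, wrapped into the 0-360 range."""
    # atan2 returns radians in (-pi, pi]; the modulo folds negatives upward.
    return math.degrees(math.atan2(sin_value, cos_value)) % 360.0


if __name__ == "__main__":
    print(scale_sigmoid(0.25))            # a quarter of the assumed range
    print(angle_from_sin_cos(-1.0, 0.0))  # pair pointing at three quarters of a turn
```

Predicting `(sin, cos)` rather than a raw angle avoids the discontinuity at the 0/360 wrap-around, which is one plausible reason the rotate solver uses that target.
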
The FunCaptcha matcher is a dual-input RGB Siamese model keyed by `task.question`, not by `captchaType`.

- Do not casually upgrade `torch` or `torchvision`: newer CUDA 12.8 wheels in this repo's previous environment dropped `sm_61` kernels and failed on GTX 1050 Ti. Re-verify GPU execution before changing the pinned pair.

## Training & Data Rules

- Set the global random seed (`random`, `numpy`, `torch`) from `config.RANDOM_SEED` before training.
- Keep `num_workers=0` for all DataLoaders.
- Pull generator parameters from `config.GENERATE_CONFIG`, `config.SOLVER_CONFIG`, and related config constants instead of hardcoding them.
- Training entrypoints auto-generate missing synthetic data, mix in real data when present, save the best checkpoint to `checkpoints/`, and export a matching ONNX file plus `.meta.json` sidecar to `onnx_models/` at the end.
- Synthetic datasets store a `.dataset_meta.json` fingerprint manifest. If generator source or config snapshot changes, training refreshes the synthetic dataset before continuing.
- `train_utils.py` and `train_regression_utils.py` only resume checkpoints when the current synthetic dataset fingerprint matches the checkpoint hash. Legacy checkpoints without a stored hash may resume with a warning; refreshed datasets force a restart from epoch 1.
- Legacy `normal` and `math` datasets may be adopted into the fingerprint system when no manifest exists, but `math` still validates operator coverage so stale datasets without `÷` samples are regenerated.
- `train_classifier.py` prepares a balanced classifier dataset in `data/classifier//` by symlinking or copying from the current synthetic datasets and rebuilds the derived classifier directories from source data each run.
- `CRNNDataset` warns when labels contain characters outside the configured charset instead of silently dropping samples.
- `RegressionDataset` parses numeric filename labels and normalizes them to `[0, 1]` using `label_range`.
- `RotateSolverDataset` parses angle labels and converts them to `(sin, cos)` targets.
- `FunCaptchaChallengeDataset` reads full challenge screenshots from `data/real/funcaptcha/4_3d_rollball_animals/`, crops one reference tile plus `num_candidates` top-row candidates, and trains against the answer index from the filename prefix.
- Slide solver training labels are the gap center `x` coordinate, normalized against `SOLVER_CONFIG["slide"]["cnn_input_size"][1]`. All slide solver branches should return the same center-point `gap_x` contract.

## Data & Testing Guidelines

- Synthetic generator output should use `{label}_{index:06d}.png`. OCR real samples should keep `{label}_{anything}.png`.
- Regression labels are numeric values in filenames. Captcha regression real data lives under `data/real/3d_rotate/` and `data/real/3d_slider/`; solver real data lives under `data/solver/slide/real/` and `data/solver/rotate/real/`.
- FunCaptcha real samples use `{answer_index}_{anything}.png|jpg|jpeg` under `data/real/funcaptcha/4_3d_rollball_animals/`. Each file is the full challenge screenshot, not pre-cropped tiles.
- `data/classifier/` is a derived dataset built from per-type captcha samples; do not hand-edit it unless the training flow changes.
- ONNX inference should prefer sidecar metadata from `.meta.json` for OCR charset decoding, classifier class order, and regression label ranges, with `config.py` only as a fallback for older exports.
- Tests live under `tests/` as `test_.py`. Current coverage focuses on generators, model output shapes, math evaluation, CTC decoding, slide solving, and slide track generation.
- OpenCV-dependent slide solver tests skip automatically when `opencv-python` is not installed. For solver work, prefer `uv sync --extra cv --extra dev`.
- FastAPI/httpx-dependent server tests skip automatically when the `server` extra is not installed. For HTTP API work, prefer `uv sync --extra server --extra dev`.
- For model, routing, solver, export, or CLI changes, add a fast smoke test that covers shape contracts, decoding behavior, routing, solver fallback, or command behavior.

## Commit & Pull Request Guidelines

Git history is not available in this workspace snapshot, so use short imperative commit subjects such as `Add slide solver export note`. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction or solver behavior changes.

## Documentation Sync

Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update `AGENTS.md` and `CLAUDE.md` in the same patch. Update `README.md` as well when user-facing commands, solver behavior, or HTTP API behavior changes.