CaptchBreaker/AGENTS.md

Repository Guidelines

Project Structure & Module Organization

Use cli.py as the main command entrypoint, exposed as the captcha script from pyproject.toml, and keep shared constants in config.py.

  • generators/ contains seven generators: the five captcha generators (normal, math, 3d_text, 3d_rotate, 3d_slider) plus solver data generators in slide_gen.py and rotate_solver_gen.py.
  • models/ contains the classifier, OCR/CTC models, regression models, the two solver models (gap_detector.py, rotation_regressor.py), and the FunCaptcha Siamese matcher in fun_captcha_siamese.py.
  • training/ owns datasets, shared training utilities, per-model entrypoints, dataset fingerprint helpers in data_fingerprint.py, and the FunCaptcha trainer in train_funcaptcha_rollball.py.
  • inference/ contains the ONNX export path, the runtime pipeline, the dedicated FunCaptcha ONNX runner in fun_captcha.py, math post-processing, and ONNX sidecar metadata helpers in model_metadata.py.
  • solvers/ implements interactive slide/rotate solving, and utils/slide_utils.py generates slider tracks.
  • Runtime artifacts live under data/synthetic/, data/real/, data/real/funcaptcha/, data/classifier/, data/solver/, data/server_tasks/, checkpoints/, and onnx_models/.

Build, Test, and Development Commands

Use uv for environment and dependency management.

  • uv sync installs the base runtime dependencies.
  • uv sync --extra server installs FastAPI service dependencies.
  • uv sync --extra cv installs OpenCV for slide solver workflows.
  • uv sync --extra dev installs pytest.
  • On Linux x86_64, uv sync resolves torch and torchvision from the official PyTorch cu121 index and pins them to 2.5.1 / 0.20.1, a pair that has been validated on GTX 1050 Ti (sm_61).
  • Keep onnxruntime compatible with Python 3.10 when editing dependencies; the current constraint stays below 1.24.
  • uv run captcha generate --type normal --num 1000 generates captcha training data. Valid types are normal, math, 3d_text, 3d_rotate, 3d_slider, and classifier.
  • uv run captcha generate-solver slide --num 30000 and uv run captcha generate-solver rotate --num 50000 generate solver datasets under data/solver/.
  • uv run captcha train --model normal trains one captcha model. uv run captcha train --all trains normal -> math -> 3d_text -> 3d_rotate -> 3d_slider -> classifier.
  • uv run captcha train-solver slide trains GapDetectorCNN; uv run captcha train-solver rotate trains RotationRegressor.
  • uv run captcha train-funcaptcha --question 4_3d_rollball_animals trains the dedicated FunCaptcha Siamese matcher from full challenge screenshots under data/real/funcaptcha/4_3d_rollball_animals/.
  • uv run captcha export --all exports all available ONNX models, including gap_detector and rotation_regressor, and writes matching <model>.meta.json sidecars.
  • uv run captcha export --model 3d_text maps to threed_text. The export loader also accepts internal artifact names such as threed_rotate, gap_detector, rotation_regressor, and funcaptcha_rollball_animals; 4_3d_rollball_animals is accepted as an alias for that FunCaptcha artifact.
  • uv run captcha predict image.png runs auto-routing inference. Add --type normal to skip classification.
  • uv run captcha predict-dir ./test_images runs batch inference for .png and .jpg files.
  • uv run captcha predict-funcaptcha image.jpg --question 4_3d_rollball_animals runs the dedicated FunCaptcha matcher and returns objects. It resolves the ONNX in this order: onnx_models/funcaptcha_rollball_animals.onnx -> env FUNCAPTCHA_ROLLBALL_MODEL_PATH -> configured fallback path such as the sibling funcaptcha-server/model/4_3d_rollball_animals.onnx.
  • uv run captcha solve slide --bg bg.png [--tpl tpl.png] runs the slide solver. It uses template matching first when --tpl is provided, then OpenCV edge detection, then CNN fallback.
  • uv run captcha solve rotate --image img.png runs the rotate solver.
  • uv run captcha serve --host 0.0.0.0 --port 8080 starts the implemented FastAPI service in server.py. It supports synchronous /solve and /solve/upload, plus async task endpoints /createTask, /getTaskResult, and /getBalance, with /api/v1/* compatibility aliases. If CLIENT_KEY is set in the environment, task endpoints require a matching clientKey. createTask accepts callbackUrl, softId, languagePool, and optional task.question; task.question=4_3d_rollball_animals routes to the dedicated FunCaptcha matcher and returns solution.objects. callbackUrl receives a form-encoded completion callback with configurable retry/backoff in SERVER_CONFIG. If CALLBACK_SIGNING_SECRET is set, callback requests include HMAC-SHA256 signature headers. Task responses also expose extra task / callback metadata for async debugging, and task state is persisted under data/server_tasks/.
  • uv run pytest runs the test suite.
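
The ONNX lookup order described for predict-funcaptcha can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function name resolve_funcaptcha_model, its parameters, and the assumption that each candidate is accepted only if the file exists are all hypothetical. Only the three-step order and the FUNCAPTCHA_ROLLBALL_MODEL_PATH variable come from the text above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Path and variable name taken from the guideline text; everything else is illustrative.
LOCAL_MODEL = Path("onnx_models/funcaptcha_rollball_animals.onnx")
ENV_VAR = "FUNCAPTCHA_ROLLBALL_MODEL_PATH"


def resolve_funcaptcha_model(
    local_model: Path = LOCAL_MODEL,
    fallback: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Return the first existing candidate, or None when nothing is found.

    Order: repo-local ONNX -> path from the environment -> configured fallback.
    """
    env = os.environ if env is None else env

    candidates = [local_model]
    env_path = env.get(ENV_VAR)
    if env_path:
        candidates.append(Path(env_path))
    if fallback is not None:
        candidates.append(fallback)

    for candidate in candidates:
        # Assumption: a candidate only counts when the file is actually present.
        if candidate.is_file():
            return candidate
    return None
```

Passing env explicitly keeps the lookup easy to exercise in a test without mutating the process environment.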

Coding Style & Naming Conventions

Target Python 3.10-3.12 and follow the existing style: 4-space indentation, snake_case for functions/modules, PascalCase for classes, and short docstrings on public entrypoints.

  • Keep public captcha type ids exactly normal, math, 3d_text, 3d_rotate, 3d_slider, and classifier.
  • Internal checkpoint/ONNX artifact names use threed_text, threed_rotate, threed_slider, and funcaptcha_rollball_animals; solver artifacts are gap_detector and rotation_regressor.
  • Preserve the design rules from CLAUDE.md: float32 training/export, CPU-safe ONNX ops, and greedy CTC decoding for OCR models.
  • normal uses NORMAL_CHARS, math uses MATH_CHARS and must be post-processed through inference/math_eval.py, and 3d_text uses THREED_CHARS.
  • 3d_rotate and 3d_slider output sigmoid values in [0, 1] and scale them with REGRESSION_RANGE; the rotate solver model outputs (sin, cos) on RGB input.
  • The FunCaptcha matcher is a dual-input RGB Siamese model keyed by task.question, not by captchaType.
  • Runtime ONNX artifacts belong under onnx_models/, not models/; external FunCaptcha ONNX files may omit metadata, in which case inference must preserve the external preprocessing contract instead of assuming the repo's centered RGB normalization.

  • Do not casually upgrade torch or torchvision: newer CUDA 12.8 wheels in this repo's previous environment dropped sm_61 kernels and failed on GTX 1050 Ti. Re-verify GPU execution before changing the pinned pair.
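
The greedy CTC decoding rule named in this section can be sketched in plain Python. This is a minimal illustration rather than the repo's decoder: it assumes the blank class sits at index 0 and that class i > 0 maps to charset[i - 1], neither of which is stated in this file. The function name greedy_ctc_decode is hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank class


def greedy_ctc_decode(
    logits: Sequence[Sequence[float]], charset: str, blank: int = BLANK
) -> str:
    """Decode one sample from a T x C score matrix.

    Steps: take the argmax per time step, collapse consecutive repeats,
    then drop blanks. Class i > 0 is assumed to map to charset[i - 1].
    """
    best = [max(range(len(step)), key=lambda i: step[i]) for step in logits]

    chars: List[str] = []
    prev = blank
    for idx in best:
        # Emit only when the class changes and is not the blank.
        if idx != blank and idx != prev:
            chars.append(charset[idx - 1])
        prev = idx
    return "".join(chars)
```

A blank between two identical classes is what lets a doubled character survive: the index sequence 1, 1, 0, 1 decodes to two copies of the first charset symbol.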

Training & Data Rules

  • Set the global random seed (random, numpy, torch) from config.RANDOM_SEED before training.
  • Keep num_workers=0 for all DataLoaders.
  • Pull generator parameters from config.GENERATE_CONFIG, config.SOLVER_CONFIG, and related config constants instead of hardcoding them.
  • Training entrypoints auto-generate missing synthetic data, mix in real data when present, save the best checkpoint to checkpoints/, and export a matching ONNX file plus <model>.meta.json sidecar to onnx_models/ at the end.
  • Synthetic datasets store a .dataset_meta.json fingerprint manifest. If generator source or config snapshot changes, training refreshes the synthetic dataset before continuing.
  • train_utils.py and train_regression_utils.py only resume checkpoints when the current synthetic dataset fingerprint matches the checkpoint hash. Legacy checkpoints without a stored hash may resume with a warning; refreshed datasets force a restart from epoch 1.
  • Legacy normal and math datasets may be adopted into the fingerprint system when no manifest exists, but math still validates operator coverage so stale datasets without ÷ samples are regenerated.
  • train_classifier.py prepares a balanced classifier dataset in data/classifier/<type>/ by symlinking or copying from the current synthetic datasets and rebuilds the derived classifier directories from source data each run.
  • CRNNDataset warns when labels contain characters outside the configured charset instead of silently dropping samples.
  • RegressionDataset parses numeric filename labels and normalizes them to [0, 1] using label_range.
  • RotateSolverDataset parses angle labels and converts them to (sin, cos) targets.
  • FunCaptchaChallengeDataset reads full challenge screenshots from data/real/funcaptcha/4_3d_rollball_animals/, crops one reference tile plus num_candidates top-row candidates, and trains against the answer index from the filename prefix.
  • Slide solver training labels are the gap center x coordinate, normalized against SOLVER_CONFIG["slide"]["cnn_input_size"][1]. All slide solver branches should return the same center-point gap_x contract.
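
The label transforms described for RegressionDataset and RotateSolverDataset can be sketched as below. The sketch assumes that label_range is a (min, max) pair and that angle labels are in degrees; the function names are hypothetical and do not come from the repository.

```python
import math
from typing import Tuple


def normalize_label(value: float, label_range: Tuple[float, float]) -> float:
    """Map a raw numeric label into [0, 1]; label_range is assumed to be (min, max)."""
    lo, hi = label_range
    return (value - lo) / (hi - lo)


def denormalize_label(unit: float, label_range: Tuple[float, float]) -> float:
    """Inverse of normalize_label, as used when scaling a sigmoid output back."""
    lo, hi = label_range
    return lo + unit * (hi - lo)


def angle_to_sin_cos(angle_deg: float) -> Tuple[float, float]:
    """Encode an angle (assumed degrees) as a (sin, cos) regression target."""
    rad = math.radians(angle_deg)
    return math.sin(rad), math.cos(rad)


def sin_cos_to_angle(sin_v: float, cos_v: float) -> float:
    """Recover the angle in [0, 360) from a (sin, cos) pair via atan2."""
    return math.degrees(math.atan2(sin_v, cos_v)) % 360.0
```

The (sin, cos) encoding avoids the discontinuity at the 0/360 boundary that a single scalar target would have, which is the usual reason for choosing it in rotation regression.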

Data & Testing Guidelines

  • Synthetic generator output should use {label}_{index:06d}.png. OCR real samples should keep {label}_{anything}.png.
  • Regression labels are numeric values in filenames. Captcha regression real data lives under data/real/3d_rotate/ and data/real/3d_slider/; solver real data lives under data/solver/slide/real/ and data/solver/rotate/real/.
  • FunCaptcha real samples use {answer_index}_{anything}.png|jpg|jpeg under data/real/funcaptcha/4_3d_rollball_animals/. Each file is the full challenge screenshot, not pre-cropped tiles.
  • data/classifier/ is a derived dataset built from per-type captcha samples; do not hand-edit it unless the training flow changes.
  • ONNX inference should prefer sidecar metadata from <model>.meta.json for OCR charset decoding, classifier class order, and regression label ranges, with config.py only as a fallback for older exports.
  • Tests live under tests/ as test_<feature>.py. Current coverage focuses on generators, model output shapes, math evaluation, CTC decoding, slide solving, and slide track generation.
  • OpenCV-dependent slide solver tests skip automatically when opencv-python is not installed. For solver work, prefer uv sync --extra cv --extra dev.
  • FastAPI/httpx-dependent server tests skip automatically when the server extra is not installed. For HTTP API work, prefer uv sync --extra server --extra dev.
  • For model, routing, solver, export, or CLI changes, add a fast smoke test that covers shape contracts, decoding behavior, routing, solver fallback, or command behavior.
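
The filename conventions above can be sketched as a few small helpers. This is an illustration under stated assumptions: labels are taken to be everything before the first underscore (so a label itself cannot contain an underscore), and all function names are hypothetical rather than taken from the repository.

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def synthetic_name(label: str, index: int) -> str:
    """Build a synthetic sample name following {label}_{index:06d}.png."""
    return f"{label}_{index:06d}.png"


def label_from_filename(path: PathLike) -> str:
    """Return the text before the first underscore of the file stem."""
    stem = Path(path).stem
    label, sep, _rest = stem.partition("_")
    if not sep or not label:
        raise ValueError(f"expected '<label>_<anything>' in {path!r}")
    return label


def regression_label(path: PathLike) -> float:
    """Parse the numeric label used by regression and solver samples."""
    return float(label_from_filename(path))


def answer_index(path: PathLike) -> int:
    """Parse the FunCaptcha answer index from {answer_index}_{anything}."""
    return int(label_from_filename(path))
```

Splitting on the first underscore keeps the {anything} suffix free-form, so real samples can carry extra underscores after the label.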

Commit & Pull Request Guidelines

Git history is not available in this workspace snapshot, so use short imperative commit subjects such as "Add slide solver export note". Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction or solver behavior changes.

Documentation Sync

Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update AGENTS.md and CLAUDE.md in the same patch. Update README.md as well when user-facing commands, solver behavior, or HTTP API behavior changes.