# Repository Guidelines

## Project Structure & Module Organization

Use `cli.py` as the main command entrypoint. It is exposed as the `captcha` script in `pyproject.toml`. Keep shared constants in `config.py`.

- `generators/` contains seven generators: the five captcha generators (`normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`) plus the solver data generators in `slide_gen.py` and `rotate_solver_gen.py`.
- `models/` contains:
  - the classifier, the OCR/CTC models and the regression models;
  - the two solver models (`gap_detector.py`, `rotation_regressor.py`);
  - the FunCaptcha Siamese matcher in `fun_captcha_siamese.py`.
- `training/` owns:
  - the datasets, the shared training utilities and the per-model entrypoints;
  - the dataset fingerprint helpers in `data_fingerprint.py`;
  - the FunCaptcha trainer in `train_funcaptcha_rollball.py`.
- `inference/` contains:
  - the ONNX export path and the runtime pipeline;
  - the dedicated FunCaptcha ONNX runner in `fun_captcha.py`;
  - math post-processing;
  - the ONNX sidecar metadata helpers in `model_metadata.py`.
- `solvers/` implements interactive slide/rotate solving. `utils/slide_utils.py` generates slider tracks.
- Runtime artifacts live under the following paths:
  - `data/synthetic/`, `data/real/`, `data/real/funcaptcha/`, `data/classifier/`, `data/solver/` and `data/server_tasks/`;
  - `checkpoints/` and `onnx_models/`.
## Build, Test, and Development Commands

Use `uv` for environment and dependency management.

- `uv sync` installs the base runtime dependencies.
- `uv sync --extra server` installs the FastAPI service dependencies.
- `uv sync --extra cv` installs OpenCV for slide solver workflows.
- `uv sync --extra dev` installs pytest.
- On Linux x86_64, `uv sync` resolves `torch` and `torchvision` from the official PyTorch `cu121` index and pins them to `2.5.1` / `0.20.1`. This pair has been validated on a GTX 1050 Ti (`sm_61`).
- Keep `onnxruntime` compatible with Python 3.10 when editing dependencies. The current constraint stays below `1.24`.
- `uv run captcha generate --type normal --num 1000` generates captcha training data. Valid types are `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`.
- `uv run captcha generate-solver slide --num 30000` and `uv run captcha generate-solver rotate --num 50000` generate solver datasets under `data/solver/`.
- `uv run captcha train --model normal` trains one captcha model.
- `uv run captcha train --all` trains every captcha model in this order: `normal -> math -> 3d_text -> 3d_rotate -> 3d_slider -> classifier`.
- `uv run captcha train-solver slide` trains `GapDetectorCNN`.
- `uv run captcha train-solver rotate` trains `RotationRegressor`.
- `uv run captcha train-funcaptcha --question 4_3d_rollball_animals` trains the dedicated FunCaptcha Siamese matcher. It learns from full challenge screenshots under `data/real/funcaptcha/4_3d_rollball_animals/`.
- `uv run captcha export --all` exports all available ONNX models, including `gap_detector` and `rotation_regressor`. It writes a matching `<model>.meta.json` sidecar for each one.
- `uv run captcha export --model 3d_text` maps to `threed_text`.
  - The export loader also accepts internal artifact names such as `threed_rotate`, `gap_detector`, `rotation_regressor`, and `funcaptcha_rollball_animals`.
  - `4_3d_rollball_animals` is accepted as an alias for that FunCaptcha artifact.
- `uv run captcha predict image.png` runs auto-routing inference. Add `--type normal` to skip classification.
- `uv run captcha predict-dir ./test_images` runs batch inference on `.png` and `.jpg` files.
- `uv run captcha predict-funcaptcha image.jpg --question 4_3d_rollball_animals` runs the dedicated FunCaptcha matcher and returns `objects`. It looks for the ONNX model in this order:
  1. `onnx_models/funcaptcha_rollball_animals.onnx`
  2. the path in the environment variable `FUNCAPTCHA_ROLLBALL_MODEL_PATH`
  3. a configured fallback path, such as the sibling `funcaptcha-server/model/4_3d_rollball_animals.onnx`
- `uv run captcha solve slide --bg bg.png [--tpl tpl.png]` runs the slide solver. It tries template matching first when `--tpl` is provided, then OpenCV edge detection, then the CNN fallback.
- `uv run captcha solve rotate --image img.png` runs the rotate solver.
- `uv run captcha serve --host 0.0.0.0 --port 8080` starts the FastAPI service in `server.py`.
  - Synchronous endpoints: `/solve` and `/solve/upload`.
  - Async task endpoints: `/createTask`, `/getTaskResult`, and `/getBalance`. `/api/v1/*` compatibility aliases are also available.
  - If `CLIENT_KEY` is set in the environment, the task endpoints require a matching `clientKey`.
  - `createTask` accepts `callbackUrl`, `softId`, `languagePool`, and an optional `task.question`.
  - A task with `task.question=4_3d_rollball_animals` is routed to the dedicated FunCaptcha matcher and returns `solution.objects`.
  - `callbackUrl` receives a form-encoded completion callback. Retry and backoff are configurable in `SERVER_CONFIG`.
  - If `CALLBACK_SIGNING_SECRET` is set, callback requests include HMAC-SHA256 signature headers.
  - Task responses also expose extra `task`/`callback` metadata for async debugging.
  - Task state is persisted under `data/server_tasks/`.
- `uv run pytest` runs the test suite.
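The FunCaptcha model lookup order described for `predict-funcaptcha` can be sketched as a first-existing-file search.

This is a minimal sketch with several assumptions:

- `resolve_funcaptcha_onnx` and its `fallback_paths` and `environ` parameters are hypothetical names.
- The real runner in `inference/fun_captcha.py` may structure the lookup differently.
- A missing file at the environment-variable path is assumed to fall through to the configured fallbacks rather than fail immediately.

```python
"""Hypothetical sketch of the FunCaptcha ONNX lookup order (not the repo's code)."""
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def resolve_funcaptcha_onnx(
    repo_root: Path,
    fallback_paths: Sequence[Path] = (),
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return the first existing ONNX file in the documented priority order."""
    env = os.environ if environ is None else environ
    # 1. The repo-local export under onnx_models/.
    candidates = [repo_root / "onnx_models" / "funcaptcha_rollball_animals.onnx"]
    # 2. An explicit override via environment variable (assumed to fall
    #    through if the file it names does not exist).
    env_path = env.get("FUNCAPTCHA_ROLLBALL_MODEL_PATH")
    if env_path:
        candidates.append(Path(env_path))
    # 3. Configured fallbacks, e.g. a sibling funcaptcha-server checkout.
    candidates.extend(Path(p) for p in fallback_paths)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"No FunCaptcha ONNX model found; searched: {searched}")
```

Passing `environ` explicitly keeps the helper testable without mutating `os.environ`. Raising `FileNotFoundError` with the full search list makes a misconfigured deployment easy to diagnose.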
## Coding Style & Naming Conventions
Target Python 3.10-3.12 and follow the existing style:

- 4-space indentation;
- `snake_case` for functions and modules;
- `PascalCase` for classes;
- short docstrings on public entrypoints.

Naming rules:

- Keep the public captcha type ids exactly `normal`, `math`, `3d_text`, `3d_rotate`, `3d_slider`, and `classifier`.
- Internal checkpoint/ONNX artifact names use `threed_text`, `threed_rotate`, `threed_slider`, and `funcaptcha_rollball_animals`.
- Solver artifacts are named `gap_detector` and `rotation_regressor`.

Preserve the design rules from `CLAUDE.md`: float32 training and export, CPU-safe ONNX ops, and greedy CTC decoding for OCR models. Each model family also has a fixed contract:

- `normal` uses `NORMAL_CHARS`.
- `math` uses `MATH_CHARS` and must be post-processed through `inference/math_eval.py`.
- `3d_text` uses `THREED_CHARS`.
- `3d_rotate` and `3d_slider` output sigmoid values in `[0, 1]` and scale them with `REGRESSION_RANGE`.
- The rotate solver model takes RGB input and outputs `(sin, cos)`.
- The FunCaptcha matcher is a dual-input RGB Siamese model keyed by `task.question`, not by `captchaType`.

Runtime ONNX artifacts belong under `onnx_models/`, not `models/`. External FunCaptcha ONNX files may omit metadata. In that case inference must preserve the external preprocessing contract instead of assuming the repo's centered RGB normalization.
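The naming rules above imply a small normalization step from public ids and aliases to internal artifact stems.

This is a minimal sketch with the following assumptions:

- `PUBLIC_TO_ARTIFACT`, `ARTIFACT_NAMES` and `to_artifact_name` are hypothetical names.
- `normal`, `math` and `classifier` are assumed to use their public id as the artifact name, which the text above does not state.
- The real export loader may resolve names differently.

```python
"""Hypothetical mapping from public ids/aliases to internal artifact names."""

# Public ids (and the FunCaptcha question alias) that differ from artifact names.
PUBLIC_TO_ARTIFACT = {
    "3d_text": "threed_text",
    "3d_rotate": "threed_rotate",
    "3d_slider": "threed_slider",
    "4_3d_rollball_animals": "funcaptcha_rollball_animals",
}

# Names already in internal form; passed through unchanged.
# Assumption: normal, math and classifier share their public id.
ARTIFACT_NAMES = frozenset({
    "normal", "math", "classifier",
    "threed_text", "threed_rotate", "threed_slider",
    "gap_detector", "rotation_regressor", "funcaptcha_rollball_animals",
})


def to_artifact_name(name: str) -> str:
    """Normalize a CLI/model name to its checkpoint/ONNX artifact stem."""
    if name in PUBLIC_TO_ARTIFACT:
        return PUBLIC_TO_ARTIFACT[name]
    if name in ARTIFACT_NAMES:
        return name
    raise ValueError(f"Unknown model name: {name!r}")
```

Rejecting unknown names, instead of passing them through, keeps a typo such as `3d_txt` from silently producing a missing-file error further down the export path.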
- Do not casually upgrade `torch` or `torchvision`. In this repo's previous environment, newer CUDA 12.8 wheels dropped `sm_61` kernels and failed on the GTX 1050 Ti. Re-verify GPU execution before changing the pinned pair.
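One way to re-verify GPU execution is to check the compiled architecture list and then launch a real kernel.

This is a sketch, not a repo script, with these assumptions:

- `arch_is_compiled` and `verify_gpu` are hypothetical helpers.
- The arch check treats an `sm_XY` cubin as usable on any device with the same major version and an equal or higher minor version (for example `sm_60` on an `sm_61` card).
- It ignores PTX (`compute_*`) JIT entries, so it is conservative.

```python
"""Hypothetical GPU re-verification helpers for the pinned torch pair."""
from typing import Sequence, Tuple


def arch_is_compiled(arch_list: Sequence[str], capability: Tuple[int, int]) -> bool:
    """True if some sm_XY cubin in arch_list is binary-compatible with capability."""
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries such as "compute_90"
        digits = "".join(ch for ch in arch[3:] if ch.isdigit())  # "90a" -> "90"
        if len(digits) < 2:
            continue
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if arch_major == major and arch_minor <= minor:
            return True
    return False


def verify_gpu(device_index: int = 0) -> None:
    """Raise if the installed torch cannot run kernels on the local GPU."""
    import torch  # lazy import so arch_is_compiled has no torch dependency

    if not torch.cuda.is_available():
        raise RuntimeError("CUDA is not available to torch")
    capability = torch.cuda.get_device_capability(device_index)
    arch_list = torch.cuda.get_arch_list()
    if not arch_is_compiled(arch_list, capability):
        raise RuntimeError(
            f"torch {torch.__version__} has no kernels for sm_{capability[0]}{capability[1]}: {arch_list}"
        )
    # Launch real kernels; a missing arch typically surfaces here as a CUDA error.
    x = torch.randn(64, 64, device=f"cuda:{device_index}")
    y = x @ x
    torch.cuda.synchronize(device_index)
    if not torch.isfinite(y).all().item():
        raise RuntimeError("GPU matmul produced non-finite values")
    print(f"torch {torch.__version__} OK on {torch.cuda.get_device_name(device_index)}")
```

Usage:

- Run `uv run python -c "from check_gpu import verify_gpu; verify_gpu()"` after a dependency change.
- `check_gpu` is a hypothetical module name for wherever the snippet is saved.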
## Training & Data Rules

Reproducibility and configuration:

- Set the global random seed (`random`, `numpy`, `torch`) from `config.RANDOM_SEED` before training.
- Keep `num_workers=0` for all DataLoaders.
- Pull generator parameters from `config.GENERATE_CONFIG`, `config.SOLVER_CONFIG`, and related config constants instead of hardcoding them.

Training flow:

- Training entrypoints auto-generate missing synthetic data and mix in real data when present.
- They save the best checkpoint to `checkpoints/`.
- At the end they export a matching ONNX file plus a `<model>.meta.json` sidecar to `onnx_models/`.

Dataset fingerprints and resuming:

- Synthetic datasets store a `.dataset_meta.json` fingerprint manifest. If the generator source or config snapshot changes, training refreshes the synthetic dataset before continuing.
- `train_utils.py` and `train_regression_utils.py` resume a checkpoint only when the current synthetic dataset fingerprint matches the checkpoint hash.
  - Legacy checkpoints without a stored hash may resume with a warning.
  - Refreshed datasets force a restart from epoch 1.
- Legacy `normal` and `math` datasets may be adopted into the fingerprint system when no manifest exists. `math` still validates operator coverage, so stale datasets without `÷` samples are regenerated.
- `train_classifier.py` prepares a balanced classifier dataset in `data/classifier/<type>/`. It symlinks or copies samples from the current synthetic datasets and rebuilds the derived classifier directories from source data on every run.

Dataset classes and labels:

- `CRNNDataset` warns when labels contain characters outside the configured charset instead of silently dropping samples.
- `RegressionDataset` parses numeric filename labels and normalizes them to `[0, 1]` using `label_range`.
- `RotateSolverDataset` parses angle labels and converts them to `(sin, cos)` targets.
- `FunCaptchaChallengeDataset` reads full challenge screenshots from `data/real/funcaptcha/4_3d_rollball_animals/`. It crops one reference tile plus `num_candidates` top-row candidates and trains against the answer index from the filename prefix.
- Slide solver training labels are the gap center `x` coordinate, normalized against `SOLVER_CONFIG["slide"]["cnn_input_size"][1]`. All slide solver branches should return the same center-point `gap_x` contract.
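The fingerprint and resume rules above can be illustrated with a small hash-and-compare sketch.

Assumptions:

- The fingerprint is taken as a SHA-256 over the generator source text plus a key-sorted JSON dump of the config snapshot.
- `dataset_fingerprint` and `resume_decision` are hypothetical names.
- The real helpers in `training/data_fingerprint.py` may hash different inputs or store them differently in `.dataset_meta.json`.

```python
"""Hypothetical sketch of dataset fingerprinting and the checkpoint resume rule."""
import hashlib
import json
from typing import Any, Mapping, Optional


def dataset_fingerprint(generator_source: str, config_snapshot: Mapping[str, Any]) -> str:
    """Hash generator code and config so that any change yields a new fingerprint."""
    h = hashlib.sha256()
    h.update(generator_source.encode("utf-8"))
    # sort_keys makes the hash independent of dict insertion order.
    h.update(json.dumps(config_snapshot, sort_keys=True, default=str).encode("utf-8"))
    return h.hexdigest()


def resume_decision(
    checkpoint_hash: Optional[str], current_hash: str, dataset_refreshed: bool
) -> str:
    """Return 'resume', 'resume_with_warning', or 'restart' (epoch 1)."""
    if dataset_refreshed:
        return "restart"  # a refreshed dataset always forces a fresh start
    if checkpoint_hash is None:
        return "resume_with_warning"  # legacy checkpoint without a stored hash
    if checkpoint_hash == current_hash:
        return "resume"
    return "restart"  # checkpoint was trained on a different dataset
```

Checking `dataset_refreshed` first mirrors the rule that refreshed datasets force a restart even for legacy checkpoints.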
## Data & Testing Guidelines

File naming and data locations:

- Synthetic generator output should use `{label}_{index:06d}.png`. OCR real samples should keep `{label}_{anything}.png`.
- Regression labels are numeric values in filenames.
  - Captcha regression real data lives under `data/real/3d_rotate/` and `data/real/3d_slider/`.
  - Solver real data lives under `data/solver/slide/real/` and `data/solver/rotate/real/`.
- FunCaptcha real samples use `{answer_index}_{anything}.png|jpg|jpeg` under `data/real/funcaptcha/4_3d_rollball_animals/`. Each file is the full challenge screenshot, not pre-cropped tiles.
- `data/classifier/` is a derived dataset built from per-type captcha samples. Do not hand-edit it unless the training flow changes.

Inference metadata:

- ONNX inference should prefer sidecar metadata from `<model>.meta.json` for OCR charset decoding, classifier class order, and regression label ranges.
- Use `config.py` only as a fallback for older exports.

Tests:

- Tests live under `tests/` as `test_<feature>.py`. Current coverage focuses on generators, model output shapes, math evaluation, CTC decoding, slide solving, and slide track generation.
- OpenCV-dependent slide solver tests skip automatically when `opencv-python` is not installed. For solver work, prefer `uv sync --extra cv --extra dev`.
- FastAPI/httpx-dependent server tests skip automatically when the `server` extra is not installed. For HTTP API work, prefer `uv sync --extra server --extra dev`.
- For model, routing, solver, export, or CLI changes, add a fast smoke test. It should cover the relevant shape contracts, decoding behavior, routing, solver fallback, or command behavior.
## Commit & Pull Request Guidelines
Git history is not available in this workspace snapshot, so use short imperative commit subjects such as Add slide solver export note. Keep pull requests focused, describe affected modules, list the commands you ran, and attach sample outputs when prediction or solver behavior changes.
## Documentation Sync
Do not commit large generated datasets unless explicitly required. When a change affects project structure, commands, config, architecture, artifact paths, supported captcha types, or workflow rules, update AGENTS.md and CLAUDE.md in the same patch. Update README.md as well when user-facing commands, solver behavior, or HTTP API behavior changes.