math_trainer / README.md
NorthernTribe-Research's picture
Switch Space trainer defaults to math_conjecture_sota profile and remove DeepSeek references
9a4f619 verified

A newer version of the Gradio SDK is available: 6.11.0

Upgrade
metadata
title: Math Conjecture Trainer
sdk: gradio
sdk_version: 6.6.0
python_version: '3.10'
app_file: app.py
pinned: false
emoji: 🧮

Math Conjecture Command Center

An autonomous training app for Lean-aware and math-reasoning base models that runs multi-stage curriculum fine-tuning on Space GPU, executes post-training quality evaluation, and publishes only qualified adapters, checkpoints, and run reports to your Hugging Face model repository.

The UI now uses a dark graphite command-center aesthetic with tactical green/cyan telemetry accents, higher-contrast cards, and mobile-friendly stacking while preserving the existing training controls.

This Space is the training app for maths-conjuncture-solutions and is wired to:

  • dataset: NorthernTribe-Research/math-conjecture-training-corpus
  • model repo: NorthernTribe-Research/math-conjecture-model

End-to-end flow

  1. Download released parquet splits (train/validation/test).
  2. Build runtime config from configs/math_conjecture_sota.yaml (default answered-conjecture solver profile).
  3. Run 4-stage curriculum LoRA fine-tuning with scripts/train_sota.py.
  4. Run post-train evaluation (pass@1, pass@k, exact/boxed, family metrics).
  5. Apply quality gate thresholds before hub push.
  6. Emit training_summary.json + post_eval_report.json and stream live telemetry in UI.

Access posture

Credentials and publish permissions are handled by deployment runtime settings.

Operations controls

  • Autonomous Mode: enabled by default; applies full-stage training/eval/gate/publish profile automatically.
  • Continuous Auto-Restart: enabled by default; after each cycle ends, the next training cycle starts automatically until cancelled.
  • Run Evaluation After Training: toggles post-train eval in runtime config.
  • Enforce Quality Gate: enables/disables promotion gate checks.
  • Gate Min pass@1, Gate Min pass@k, Gate Min Rows: runtime gate thresholds.
  • Telemetry Deck: real-time stage progression, runtime posture, training-loss graph (sparkline), artifact index, and recent-run outcomes.
  • Run Log (Live Stream + Summary JSON): unified panel for line-by-line runtime stream, heartbeats, and structured run summary.
  • Validation Mode (No Training): validates pipeline with --dry-run.
  • Force Dataset Redownload: bypasses cached parquet files.
  • Abort Active Run: cancels active subprocess tree.

Artifacts

  • runtime config: workspace/runtime/math_conjecture_sota.runtime.yaml
  • run output root: workspace/runs/math-conjecture-sota
  • final adapter: workspace/runs/math-conjecture-sota/final_adapter
  • training summary: workspace/runs/math-conjecture-sota/training_summary.json
  • post-eval report: workspace/runs/math-conjecture-sota/post_eval_report.json
  • run history index: workspace/runtime/run_history.json
  • per-run records: workspace/runtime/run_records/<run_label>.json

Notes

  • Full training runs on GPU when available and automatically falls back to CPU mode when CUDA is unavailable.
  • Default SOTA profile uses Qwen/Qwen2.5-Math-7B-Instruct for answered-conjecture solving and Lean/formal-proof alignment.
  • Alternate SOTA profile configs/qwen25_math_sota.yaml targets Qwen/Qwen2.5-Math-7B-Instruct.
  • App handles Gradio copy-button compatibility across versions automatically.
  • The interface is optimized to stay legible on mobile, with telemetry cards collapsing into a single-column stack.

Production Hardening

Before deployment, run:

python scripts/preflight_check.py
python -m unittest discover -s tests -v

Optional deeper validation (runs stage-1 dry-run training check):

python scripts/preflight_check.py --run-training-dry-run

Continuous mode now includes two production fail-safes:

  • restart cooldown between cycles (default 15s)
  • circuit breaker that stops after consecutive non-success cycles (default 3)

Environment overrides:

  • CONTINUOUS_RESTART_DELAY_SECONDS (integer, >=0)
  • CONTINUOUS_MAX_CONSECUTIVE_FAILURES (integer, >=1)
  • APP_LOG_MAX_CHARS (integer, >=20000; default 200000)
  • RUN_HISTORY_LIMIT (integer, >=5; default 80)

Recommended runtime secrets posture:

  • set HF_TOKEN / HUGGINGFACE_HUB_TOKEN from Space secrets
  • avoid storing long-lived API tokens in repository files

Detailed deployment/rollback steps are documented in PRODUCTION.md.

Validation Record

  • Latest verification and hardening run details are recorded in VALIDATION_LOG.md.