Commit History

Improve UX: Move evaluation settings to top of page
62d78bf

Zen0 committed on

Add persistent leaderboard feature - solves GPU timeout issue
2338c46

Zen0 committed on

Add GPU timeout warnings and best practices for multi-model evaluation
f8a48f3

Zen0 committed on

Fix ZeroGPU timeout: reduce to 60s and lower default sample size
886558c

Zen0 committed on

Fix GPU timeout: extend duration from 60s to 180s
b53b535

Zen0 committed on

Fix dependency issues and remove gated models
83123ec

Zen0 Claude committed on

Remove flag emoji from chart title to fix font warnings
629509e

Zen0 Claude committed on

Simplify to 32 tested, stable models for reliability
ddb1d4e

Zen0 Claude committed on

Major update: 51 models including 2025 releases & enhanced visuals
5129b9d

Zen0 Claude committed on

Fix model compatibility issues and update model count
cf9c9bb

Zen0 Claude committed on

Reduce default max_tokens to speed up evaluation
74d9bb4

Zen0 committed on

Fix model generation: disable KV cache to avoid DynamicCache error
31cbca7

Zen0 committed on

Add comprehensive debugging for 0% accuracy issue
f47aa21

Zen0 committed on

Add debug logging and improve answer extraction
ac9d3f4

Zen0 committed on

Fix ranking medals: handle fewer than 3 models
b124e72

Zen0 committed on

Fix schema mismatch: load files individually and strip metadata
fc15652

Zen0 committed on

Fix dataset schema mismatch: load JSON files directly
35bed4c

Zen0 committed on

Fix dataset loading: use correct config syntax
d9c0e48

Zen0 committed on

Fix generator wrapper: use 'yield from' instead of lambda
60927c5

Zen0 committed on

Fix GPU decorator: move from generator to actual GPU function
5e0c53a

Zen0 committed on

Add @spaces.GPU decorator for HuggingFace GPU support
0414451

Zen0 committed on

Initial deployment of AusCyberBench Evaluation Dashboard
7d0c82c

Zen0 committed on