Zen0 committed on
Commit 35bed4c · 1 parent: d9c0e48

Fix dataset schema mismatch: load JSON files directly

The dataset's configured loader exposes a restricted schema (9 columns), but the data files contain 35 columns.
Bypass the loader entirely and read the JSONL files directly with the generic `json` builder.
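
One way to see the mismatch described above is to compare the column set a loader declares with the keys actually present in the JSONL records. The snippet is a stdlib-only illustration under stated assumptions: `DECLARED_COLUMNS`, `SAMPLE_JSONL`, and `find_schema_mismatch` are hypothetical stand-ins, not the dataset's real 9-column configuration or its 35 fields, and the snippet does not reproduce how `datasets` itself reports the error.

```python
import json

# Hypothetical stand-in for the loader's declared schema (the real one has 9 columns).
DECLARED_COLUMNS = {"id", "question", "answer"}

# Hypothetical JSONL lines; the real data files carry 35 fields per record.
SAMPLE_JSONL = """\
{"id": 1, "question": "Q1", "answer": "A", "category": "essential_eight"}
{"id": 2, "question": "Q2", "answer": "B", "difficulty": "hard"}
"""


def find_schema_mismatch(jsonl_text: str, declared: set[str]) -> tuple[set[str], set[str]]:
    """Return (columns present in the data but not declared,
    declared columns that never appear in the data)."""
    seen: set[str] = set()
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            seen.update(json.loads(line).keys())
    return seen - declared, declared - seen


extra, missing = find_schema_mismatch(SAMPLE_JSONL, DECLARED_COLUMNS)
print(sorted(extra))    # ['category', 'difficulty']
print(sorted(missing))  # []
```

Any non-empty `extra` set means the declared schema cannot hold every field in the files, which is the situation this commit works around.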

Changed:
- load_dataset('Zen0/AusCyberBench', name=subset, split='train')
+ load_dataset('json', data_files=f'hf://datasets/Zen0/AusCyberBench/data/{subset}/*.jsonl', split='train')

This allows the full schema to be inferred from the actual data files.
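
As a rough picture of what "inferring the schema from the data files" means, the sketch below walks every file matching `data/{subset}/*.jsonl` under a local directory and records, for each column, the JSON value types it sees. This is a simplification for illustration only: `infer_schema` and the local mirror layout are hypothetical, and it is not how the `datasets` JSON builder performs its own inference.

```python
import glob
import json
import os
import tempfile


def infer_schema(root: str, subset: str) -> dict[str, set[str]]:
    """Map each column name to the set of Python type names seen across all
    data/{subset}/*.jsonl files under `root` (a hypothetical local mirror)."""
    schema: dict[str, set[str]] = {}
    pattern = os.path.join(root, "data", subset, "*.jsonl")
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue  # tolerate blank lines between records
                for key, value in json.loads(line).items():
                    schema.setdefault(key, set()).add(type(value).__name__)
    return schema


# Usage: two shard files whose records do not share every key.
with tempfile.TemporaryDirectory() as root:
    shard_dir = os.path.join(root, "data", "australian")
    os.makedirs(shard_dir)
    with open(os.path.join(shard_dir, "part-0.jsonl"), "w", encoding="utf-8") as fh:
        fh.write('{"question": "Q1", "answer": "A"}\n')
    with open(os.path.join(shard_dir, "part-1.jsonl"), "w", encoding="utf-8") as fh:
        fh.write('{"question": "Q2", "answer": "B", "year": 2024}\n')
    schema = infer_schema(root, "australian")

print(sorted(schema))  # ['answer', 'question', 'year']
```

Because the column set comes from the records themselves, a field such as `year` appears without being declared anywhere in advance.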

Files changed (1)
  1. app.py +6 -1
app.py CHANGED
@@ -77,7 +77,12 @@ def load_benchmark_dataset(subset="australian", num_samples=200):
     global dataset_cache
 
     if dataset_cache is None:
-        dataset_cache = load_dataset("Zen0/AusCyberBench", name=subset, split="train")
+        # Load data files directly as JSON to avoid schema mismatch issues
+        dataset_cache = load_dataset(
+            "json",
+            data_files=f"hf://datasets/Zen0/AusCyberBench/data/{subset}/*.jsonl",
+            split="train"
+        )
 
     # Proportional sampling
     import random
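
In this hunk the cache is a single global `dataset_cache` that is filled only while it is `None`, so whichever `subset` is loaded first is reused for every later call. If the app ever needs more than one subset, a dictionary keyed by subset is one possible alternative. The sketch below is hypothetical: `_dataset_cache`, `data_files_for`, `get_subset`, and the `"other"` subset name are illustrative, and `loader` is injected so the example runs offline instead of calling `load_dataset`.

```python
from typing import Callable

# Hypothetical per-subset cache; the commit itself uses one global `dataset_cache`.
_dataset_cache: dict[str, list[dict]] = {}


def data_files_for(subset: str) -> str:
    """Build the same hf:// glob the commit passes as `data_files`."""
    return f"hf://datasets/Zen0/AusCyberBench/data/{subset}/*.jsonl"


def get_subset(subset: str, loader: Callable[[str], list[dict]]) -> list[dict]:
    """Load each subset at most once; `loader` receives the data_files pattern."""
    if subset not in _dataset_cache:
        _dataset_cache[subset] = loader(data_files_for(subset))
    return _dataset_cache[subset]


# Fake loader standing in for:
#   lambda pattern: load_dataset("json", data_files=pattern, split="train")
calls: list[str] = []


def fake_loader(pattern: str) -> list[dict]:
    calls.append(pattern)
    return [{"pattern": pattern}]


get_subset("australian", fake_loader)
get_subset("australian", fake_loader)  # served from the cache, no second load
get_subset("other", fake_loader)       # hypothetical second subset name
print(len(calls))  # 2
```

With the real `load_dataset` call the cached values would be `Dataset` objects rather than lists of dicts, so the type hints would change accordingly.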