Zen0 committed 35bed4c
Parent(s): d9c0e48
Fix dataset schema mismatch: load JSON files directly
The dataset loader has a restricted schema (9 columns), but the data has 35 columns.
Bypass the loader entirely and load the JSONL files directly with the generic json builder.
Changed:
- load_dataset('Zen0/AusCyberBench', name=subset, split='train')
+ load_dataset('json', data_files='hf://datasets/Zen0/AusCyberBench/data/{subset}/*.jsonl')
This allows the full schema to be inferred from the actual data files.
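The reasoning behind the switch can be illustrated without the Hub. A schema-inferring reader takes its columns from the records themselves, so every key present in the files appears. A declared schema only knows the columns it lists. The stdlib-only sketch below reads JSONL files matching a glob (mirroring the data/{subset}/*.jsonl layout) and reports which columns a declared schema is missing. It does not reproduce the datasets library's json builder. The helper names (read_jsonl, inferred_columns, schema_mismatch) and the column names in the demo are hypothetical.

```python
import glob
import json
import os
import tempfile


def read_jsonl(pattern):
    """Read every JSON Lines file matching a glob pattern into a list of dicts."""
    records = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # tolerate blank lines between records
                    records.append(json.loads(line))
    return records


def inferred_columns(records):
    """Columns as a schema-inferring reader sees them: union of keys, first-seen order."""
    seen = {}
    for rec in records:
        for key in rec:
            seen.setdefault(key, None)
    return list(seen)


def schema_mismatch(declared, records):
    """Compare a declared column list with the columns actually present in the data."""
    actual = inferred_columns(records)
    missing = [c for c in declared if c not in actual]  # declared but never seen
    extra = [c for c in actual if c not in declared]    # in the data, absent from the schema
    return missing, extra


if __name__ == "__main__":
    # Hypothetical layout mirroring data/{subset}/*.jsonl, written to a temp dir
    with tempfile.TemporaryDirectory() as root:
        subset_dir = os.path.join(root, "data", "australian")
        os.makedirs(subset_dir)
        rows = [
            {"id": 1, "question": "q1", "answer": "a1", "category": "policy"},
            {"id": 2, "question": "q2", "answer": "a2", "difficulty": "hard"},
        ]
        for i, row in enumerate(rows):
            path = os.path.join(subset_dir, f"part-{i}.jsonl")
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(json.dumps(row) + "\n")

        records = read_jsonl(os.path.join(subset_dir, "*.jsonl"))
        print(inferred_columns(records))
        # ['id', 'question', 'answer', 'category', 'difficulty']
        print(schema_mismatch(["id", "question", "answer"], records))
        # ([], ['category', 'difficulty'])
```

The design point matches the one the commit relies on: the column set comes from the records, so nothing has to be kept in sync with a separate schema declaration.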
app.py (changed)
@@ -77,7 +77,12 @@ def load_benchmark_dataset(subset="australian", num_samples=200):
     global dataset_cache
 
     if dataset_cache is None:
-        dataset_cache = load_dataset('Zen0/AusCyberBench', name=subset, split='train')
+        # Load data files directly as JSON to avoid schema mismatch issues
+        dataset_cache = load_dataset(
+            "json",
+            data_files=f"hf://datasets/Zen0/AusCyberBench/data/{subset}/*.jsonl",
+            split="train"
+        )
 
     # Proportional sampling
     import random
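The hunk stops at the start of the function's proportional-sampling step, and that step's body is not part of this diff. Purely as an illustration, the sketch below shows one common way to draw num_samples records proportionally across groups. It rounds the fractional quotas with the largest-remainder method so they add up to exactly num_samples. The grouping field "category", the helper name proportional_sample, and the seed parameter are assumptions, not the app's actual code.

```python
import random
from collections import defaultdict


def proportional_sample(records, num_samples, key="category", seed=None):
    """Sample records so each group keeps roughly its share of the full dataset.

    `key` is a hypothetical grouping field. Fractional quotas are rounded with the
    largest-remainder method so they add up to exactly num_samples.
    """
    if not records:
        return []
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    total = len(records)
    num_samples = min(num_samples, total)  # cannot draw more records than exist
    exact = {g: num_samples * len(items) / total for g, items in groups.items()}
    quotas = {g: int(q) for g, q in exact.items()}  # floor of each exact quota

    # Give the slots lost to flooring to the groups with the largest remainders;
    # the sort is stable, so ties go to the group that appeared first
    leftover = num_samples - sum(quotas.values())
    by_remainder = sorted(groups, key=lambda g: exact[g] - quotas[g], reverse=True)
    for g in by_remainder[:leftover]:
        quotas[g] += 1

    sample = []
    for g, items in groups.items():
        sample.extend(rng.sample(items, quotas[g]))  # without replacement within a group
    rng.shuffle(sample)  # avoid output ordered by group
    return sample


if __name__ == "__main__":
    # Hypothetical records: 6 + 3 + 1 across three categories
    records = [{"id": i, "category": c} for i, c in enumerate("aaaaaabbbc")]
    picked = proportional_sample(records, num_samples=5, seed=0)
    counts = defaultdict(int)
    for rec in picked:
        counts[rec["category"]] += 1
    print(sorted(counts.items()))
    # [('a', 3), ('b', 2)]
```

In the demo the exact quotas are 3.0, 1.5 and 0.5. Flooring leaves one slot, which goes to "b" because it ties with "c" on remainder and appears first.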