Number of Pretraining Tokens per Qwen 2.5 Model?
1
#9 opened 4 months ago by RylanSchaeffer
evaluation pipeline
#8 opened 11 months ago by SantoshHF
Hello, is this 1.5B model trained from scratch, or is it distilled like LLaMA 3.2?
1
#7 opened about 1 year ago by adol01
recommended context length for SFT?
#6 opened over 1 year ago by brando
Why is there no model.safetensors.index.json file?
1
#5 opened over 1 year ago by Infernaught
[AUTOMATED] Model Memory Requirements
#3 opened over 1 year ago by model-sizer-bot
lm_eval results are weird
5
#2 opened over 1 year ago by xianf
Upload ONNX weights
#1 opened over 1 year ago by Xenova