https://github.com/csabakecskemeti/ministral-3_dequantizer_fp8-bf16
(The instruct model weights are in FP8)
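The conversion itself is simple in principle: each quantized tensor ships with per-block scale factors, and dequantizing means upcasting and multiplying by the matching scale. A minimal pure-Python sketch of the blockwise idea (function name and block size are illustrative, not the repo's actual code):

```python
def dequantize_blockwise(q_values, scales, block_size):
    """Dequantize a flat list of quantized values using per-block scales.

    q_values: quantized weight values (plain floats here, standing in for FP8)
    scales:   one scale factor per block of `block_size` values
    Returns the dequantized (full-precision) values.
    """
    out = []
    for i, q in enumerate(q_values):
        scale = scales[i // block_size]  # pick the scale for this value's block
        out.append(q * scale)            # upcast and rescale
    return out

# Two blocks of 4 values, each block with its own scale.
q = [1.0, 2.0, 3.0, 4.0, 1.0, 1.0, 2.0, 2.0]
s = [0.5, 2.0]
print(dequantize_blockwise(q, s, block_size=4))
# [0.5, 1.0, 1.5, 2.0, 2.0, 2.0, 4.0, 4.0]
```

The real work in a converter is mostly tensor-name bookkeeping and dtype handling; the arithmetic is this rescale.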
I've used this:
https://huggingface.co/meituan/DeepSeek-R1-Channel-INT8/tree/main/inference
I hoped I could make it work on my CPU... :P
@ubergarm you might have the resources!? 😀
Follow-up
With the smaller context length dataset the training has succeeded.
No success so far; the training data contains some longer contexts and it fails just before completing the first epoch.
(dataset: DevQuasar/brainstorm-v3.1_vicnua_1k)
Does anyone have further suggestions for the bnb config (with ROCm on an MI100)?
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)
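For context, this is roughly how such a config gets passed at load time for QLoRA-style training (the model id is a placeholder and this sketch is not verified on ROCm):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Same kind of config as above, repeated so the snippet is self-contained.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# "some-org/some-model" is a placeholder; the quantization config is applied
# while the checkpoint is loaded, so the full-precision weights never
# materialize in GPU memory.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",
    quantization_config=bnb_config,
    device_map="auto",
)
```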
Now testing with my other, smaller dataset; it seems to need less memory:
DevQuasar/brainstorm_vicuna_1k
It had failed by the morning; I need to find more ways to reduce the memory usage.
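The pattern here (the shorter-context dataset fits, the longer contexts OOM) is consistent with naive attention's activation memory growing quadratically with sequence length. A rough back-of-envelope sketch, with all dimensions hypothetical rather than taken from the actual model:

```python
def attn_scores_bytes(batch, heads, seq_len, bytes_per_el=2):
    """Memory for one layer's raw attention score matrix of shape
    (batch, heads, seq_len, seq_len), assuming naive (non-flash) attention."""
    return batch * heads * seq_len * seq_len * bytes_per_el

# Hypothetical dims: batch 1, 32 heads, bf16 (2 bytes/element).
short = attn_scores_bytes(1, 32, 1024)  # 64 MiB per layer
long_ = attn_scores_bytes(1, 32, 4096)  # 1 GiB per layer
print(short / 2**20, long_ / 2**30)
# 64.0 1.0
```

A 4x longer sample costs 16x the score memory per layer, which would explain a run that trains fine on short samples and dies when a long one comes through. Usual knobs to try: cap `max_seq_length`, enable gradient checkpointing, and use a length-sorted or filtered dataset so the longest samples are dropped rather than hit mid-epoch.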