diffuse-cpp: C++ inference engine for Dream on CPU (GGUF format, Q4_K_M quantization)
#4
by Carmenest - opened
Hi Dream team! We have built CPU inference support for Dream-v0-Instruct-7B using diffuse-cpp, a C++ inference engine for diffusion language models built on GGML.
Pre-quantized GGUF models
Available at diffuse-cpp/Dream-v0-Instruct-7B-GGUF:
| File | Type | Size |
|---|---|---|
| dream-7b-f16.gguf | F16 | 15.2 GB |
| dream-7b-q8_0.gguf | Q8_0 | 8.6 GB |
| dream-7b-q4km.gguf | Q4_K_M | 5.3 GB |
Performance (Q4_K_M, entropy_exit + inter-step cache, 12 threads)
| Prompt | tok/s | Steps | vs llama.cpp |
|---|---|---|---|
| Capital of France? | 21.6 | 2 | 2.5x |
| 15 x 23? | 21.6 | 2 | 2.5x |
| Translate to French | 14.3 | 6 | 1.7x |
| Python is_prime() | 8.2 | 7 | 1.0x |
| Average | 11.6 | – | 1.4x |
Dream excels at math and code prompts, correctly solving 15 × 23 = 345 in just 2 denoising steps at 21.6 tok/s.
Key features
- entropy_exit: adaptive scheduler that exits early when the model is confident (2-7 steps for easy prompts vs 16 for hard ones)
- Inter-step KV cache: reuses K,V tensors between denoising steps (1.6x average speedup)
- Full GQA support: 28 query / 4 KV heads handled natively
- QKV biases: preserved at F32 in all quantizations
Comparison with LLaDA-8B
We also support LLaDA-8B. The two models are complementary:
- Dream excels at math and code (21.6 tok/s)
- LLaDA excels at translation (27.7 tok/s)
Links
- Engine: github.com/iafiscal1212/diffuse-cpp
- Paper: doi.org/10.5281/zenodo.19119813
- GGUF models: huggingface.co/diffuse-cpp/Dream-v0-Instruct-7B-GGUF
Thank you for creating Dream — the GQA architecture and autoregressive logit shift are elegant design choices that translate well to CPU inference!
That's so cool, thanks for your efforts on building this!
jiacheng-ye, thank you so much!
More improvements coming soon!