# deepseek-v3.2_512M
A DeepSeek-style language model with ~512M active parameters (~1.34B total), trained from scratch and exported to GGUF format.
## Model Details
- Architecture: DeepSeek V3.2 style (MLA attention, MoE, MTP); a simplified sketch of the MLA idea follows this list
- Parameters: ~1.34B total, ~512M active (MoE)
- Training: 9,900 steps on Modal (8x A100-40GB)
- Final Loss: 10.385
- Format: GGUF (Q8_0 quantization)
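
The MLA (Multi-head Latent Attention) component mentioned above caches a compressed per-token latent instead of full keys and values. The PyTorch sketch below illustrates only that core idea; it omits DeepSeek's decoupled RoPE branch and other details, and `d_latent=256` and the class name are illustrative assumptions, not this model's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedMLA(nn.Module):
    """Simplified Multi-head Latent Attention: KV are reconstructed from a small latent."""

    def __init__(self, d_model=2048, n_heads=32, d_latent=256):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # Down-project hidden states to a compact KV latent (the only KV state cached at inference).
        self.kv_down = nn.Linear(d_model, d_latent, bias=False)
        # Up-project the latent back to full-width keys and values.
        self.k_up = nn.Linear(d_latent, d_model, bias=False)
        self.v_up = nn.Linear(d_latent, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):                       # x: (batch, seq, d_model)
        b, t, _ = x.shape
        q = self.q_proj(x)
        c_kv = self.kv_down(x)                  # (batch, seq, d_latent) -- cached latent
        k = self.k_up(c_kv)
        v = self.v_up(c_kv)
        # Reshape into heads and run standard causal attention.
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2) for z in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(out.transpose(1, 2).reshape(b, t, -1))
```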
## Architecture Configuration
| Parameter | Value |
|---|---|
| d_model | 2048 |
| n_heads | 32 |
| n_layers | 24 |
| vocab_size | 32000 |
| d_ff | 8192 |
| max_seq_len | 1024 |
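
As a sanity check, these hyperparameters can be read back from the GGUF header. A minimal sketch, assuming the `gguf` Python package that accompanies llama.cpp (`pip install gguf`) and the file name used elsewhere in this card:

```python
from gguf import GGUFReader  # reader utility shipped with llama.cpp's gguf-py package

reader = GGUFReader("deepseek-512M-q8_0.gguf")

# Metadata keys (embedding width, layer count, head count, etc.) live in the
# GGUF header under architecture-specific names.
for name in reader.fields:
    print(name)

# Tensor names and shapes can be cross-checked against d_model, n_layers, and vocab_size.
for tensor in reader.tensors:
    print(tensor.name, tensor.shape)
```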
## Usage
### With llama.cpp
```bash
./main -m deepseek-512M-q8_0.gguf -p "Once upon a time" -n 128
```
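
Note that newer llama.cpp builds name this binary `llama-cli`; the flags are the same. The same file can also be used from Python via the llama-cpp-python bindings. A minimal sketch; the model path and sampling settings mirror the examples in this card:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(model_path="deepseek-512M-q8_0.gguf", n_ctx=1024)

# Plain text completion; max_seq_len is 1024, so keep prompt + generation within that budget.
out = llm("Once upon a time", max_tokens=128, temperature=0.7, top_p=0.9)
print(out["choices"][0]["text"])
```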
### With Ollama
Create a Modelfile:
```
FROM ./deepseek-512M-q8_0.gguf
TEMPLATE "{{.Prompt}}"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
```
Then run:
```bash
ollama create deepseek-512m -f Modelfile
ollama run deepseek-512m
```
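
Once the model has been created, the local Ollama server can also be queried over its HTTP API. A minimal sketch using Python's `requests`; the default port 11434 is assumed:

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-512m", "prompt": "Once upon a time", "stream": False},
)
print(resp.json()["response"])
```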
### With LM Studio
- Download the GGUF file
- Import into LM Studio
- Start chatting!
## Training Details
This model was trained as part of the DeepSeek-From-Scratch project, which implements the DeepSeek V3 architecture from scratch.
### Training Infrastructure
- Platform: Modal Cloud
- GPUs: 8x NVIDIA A100-40GB
- Parallelism: TP=2, PP=2, DP=2 across the 8 GPUs (tensor, pipeline, and data parallelism); see the rank-layout sketch after this list
- Precision: BF16
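
For reference, the sketch below shows one way the 8 GPU ranks could decompose into the TP=2 × PP=2 × DP=2 grid listed above. The axis ordering is an assumption for illustration, not taken from the project's training code.

```python
TP, PP, DP = 2, 2, 2  # tensor-, pipeline-, and data-parallel degrees (2 * 2 * 2 = 8 GPUs)

for rank in range(TP * PP * DP):
    tp = rank % TP               # fastest-varying axis (assumed): tensor-parallel shard
    pp = (rank // TP) % PP       # middle axis: pipeline stage
    dp = rank // (TP * PP)       # slowest axis: data-parallel replica
    print(f"rank {rank}: tp={tp}, pp={pp}, dp={dp}")
```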
## Limitations
This is a research/educational model trained on synthetic data. It is not intended for production use and may generate nonsensical or harmful content.
## License
MIT License - See the repository for details.