deepseek-v3.2_512M

A DeepSeek-style language model with ~512M active parameters (~1.34B total), trained from scratch and exported to GGUF format.

Model Details

  • Architecture: DeepSeek V3.2 style (Multi-head Latent Attention, Mixture-of-Experts, Multi-Token Prediction)
  • Parameters: ~1.34B total (~512M active per token)
  • Training: 9,900 steps on Modal (8x A100-40GB)
  • Final Loss: 10.385
  • Format: GGUF (Q8_0 quantization)
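
As a quick sanity check on the GGUF export above, the file's metadata can be read with the gguf Python package that ships alongside llama.cpp. A minimal sketch, assuming the package is installed (pip install gguf):

# List the GGUF metadata keys and the number of weight tensors in the file.
from gguf import GGUFReader

reader = GGUFReader("deepseek-512M-q8_0.gguf")

# Metadata keys (architecture, context length, quantization info, etc.)
for key in reader.fields:
    print(key)

# Total number of weight tensors packed into the file
print("tensors:", len(reader.tensors))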

Architecture Configuration

Parameter     Value
d_model       2048
n_heads       32
n_layers      24
vocab_size    32000
d_ff          8192
max_seq_len   1024
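
For reference, the table above expressed as a small Python config object. This is an illustrative sketch only: the field names are hypothetical (not taken from the training code), and the MoE/MLA-specific hyperparameters (expert count, latent dimensions, etc.) are omitted because the card does not list them.

from dataclasses import dataclass

# Hypothetical config mirroring the table above; not the project's actual config class.
@dataclass
class ModelConfig:
    d_model: int = 2048
    n_heads: int = 32
    n_layers: int = 24
    vocab_size: int = 32000
    d_ff: int = 8192
    max_seq_len: int = 1024

print(ModelConfig())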

Usage

With llama.cpp

./main -m deepseek-512M-q8_0.gguf -p "Once upon a time" -n 128
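
Newer llama.cpp builds name this binary llama-cli rather than main, so adjust the command to your build. The model can also be called from Python; a minimal sketch, assuming the llama-cpp-python bindings are installed (pip install llama-cpp-python):

# Load the GGUF file and run a short completion with the sampling settings used elsewhere in this card.
from llama_cpp import Llama

llm = Llama(model_path="deepseek-512M-q8_0.gguf", n_ctx=1024)  # max_seq_len is 1024

out = llm("Once upon a time", max_tokens=128, temperature=0.7, top_p=0.9)
print(out["choices"][0]["text"])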

With Ollama

Create a Modelfile:

FROM ./deepseek-512M-q8_0.gguf

TEMPLATE "{{.Prompt}}"

PARAMETER temperature 0.7
PARAMETER top_p 0.9

Then run:

ollama create deepseek-512m -f Modelfile
ollama run deepseek-512m
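
Once created, the model can also be queried over Ollama's local REST API. A minimal sketch, assuming the Ollama server is running on its default port (11434) and the requests package is installed:

# Send a single non-streaming generation request to the local Ollama server.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-512m", "prompt": "Once upon a time", "stream": False},
)
print(resp.json()["response"])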

With LM Studio

  1. Download the GGUF file
  2. Import into LM Studio
  3. Start chatting!

Training Details

This model was trained as part of the DeepSeek-From-Scratch project, which implements the DeepSeek V3 architecture from scratch.

Training Infrastructure

  • Platform: Modal Cloud
  • GPUs: 8x NVIDIA A100-40GB
  • Parallelism: TP=2, PP=2, DP=2 (8-way in total; see the sketch after this list)
  • Precision: BF16
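
As a sanity check on the layout above, the three parallelism degrees multiply out to the GPU count used for training:

# Tensor x pipeline x data parallel degrees cover all 8 GPUs.
tp, pp, dp = 2, 2, 2
world_size = tp * pp * dp
assert world_size == 8  # matches the 8x NVIDIA A100-40GB setup
print(world_size)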

Limitations

This is a research/educational model trained on synthetic data. It is not intended for production use and may generate nonsensical or harmful content.

License

MIT License - See the repository for details.
