add details on IQ2_KS

Files changed:
- .gitattributes (+6, -0)
- README.md (+72, -1)
.gitattributes CHANGED

```diff
@@ -33,6 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+<<<<<<< Updated upstream
 imatrix-DeepSeek-TNG-R1T2-Chimera-Q8_0.dat filter=lfs diff=lfs merge=lfs -text
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00001-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00002-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
@@ -41,3 +42,8 @@ IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00004-of-00007.gguf filter=lfs diff=lfs
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00005-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00006-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00007-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
+=======
+*.gguf filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+imatrix-*.dat filter=lfs diff=lfs merge=lfs -text
+>>>>>>> Stashed changes
```

Note that the six added lines include literal `<<<<<<< Updated upstream` / `=======` / `>>>>>>> Stashed changes` merge-conflict markers, committed verbatim alongside the new wildcard patterns.
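The new wildcard patterns (`*.gguf`, `*.png`, `imatrix-*.dat`) route matching files through Git LFS. A quick way to confirm what a pattern actually catches is `git check-attr` in a throwaway repo; a minimal sketch with hypothetical file names:

```shell
# Illustrative only: verify which paths the new .gitattributes patterns
# route through the LFS filter, using a scratch repo and git check-attr.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
printf '%s\n' \
  '*.gguf filter=lfs diff=lfs merge=lfs -text' \
  'imatrix-*.dat filter=lfs diff=lfs merge=lfs -text' > .gitattributes
git check-attr filter -- model.gguf imatrix-foo.dat README.md
# model.gguf: filter: lfs
# imatrix-foo.dat: filter: lfs
# README.md: filter: unspecified
```

`git check-attr` only inspects attribute resolution; it does not require the files to exist, which makes it handy for testing patterns before pushing large artifacts.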
README.md CHANGED

```diff
@@ -24,8 +24,10 @@ Shout out to Wendell and the **Level1Techs** crew, the community [Forums](https:
 Also thanks to all the folks in the quanting and inferencing community on [BeaverAI Club Discord](https://discord.com/channels/1238219753324281886/1238239819017097246/1238676202357784650) and on [r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/) for tips and tricks helping each other run, test, and benchmark all the fun new models!
 
 ## Quants
+For some larger non-imatrix ik quant options check out [Kebob/DeepSeek-TNG-R1T2-Chimera-IK_GGUF](https://huggingface.co/Kebob/DeepSeek-TNG-R1T2-Chimera-IK_GGUF)
+
 #### * `IQ3_KS` 281.463 GiB (3.598 BPW)
-Special mix with all new `IQ3_KS` `ffn_(gate|up)_exps` and `IQ4_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn and shared expert. `iq5_k` `token_embd` and `iq6_k` output "head".
+Special mix with all new `IQ3_KS` `ffn_(gate|up)_exps` and `IQ4_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn and shared expert. `iq5_k` `token_embd` and `iq6_k` `output` "head".
 
 Final estimate: PPL = 3.3167 +/- 0.01789
 
```
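As a quick cross-check (not part of the original README), the reported GiB sizes and bits-per-weight figures are mutually consistent if one assumes a total parameter count of roughly 672e9 for this DeepSeek-class model; the count itself is an assumption here, not stated in the diff:

```shell
# Sanity-check the stated bits-per-weight against the stated file size.
# ASSUMPTION: ~672e9 total parameters (not stated in the README).
awk 'BEGIN {
  gib    = 281.463               # IQ3_KS size reported above
  params = 672e9                 # assumed total parameter count
  bpw    = gib * 1024^3 * 8 / params
  printf "%.3f BPW\n", bpw
}'
# prints: 3.598 BPW
```

The same arithmetic with 203.553 GiB lands near the 2.602 BPW quoted for the IQ2_KS mix below.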
````diff
@@ -94,6 +96,75 @@ custom=$(
 
 </details>
 
+#### * `IQ2_KS` 203.553 GiB (2.602 BPW)
+Special mix with `IQ2_KS` `ffn_(gate|up)_exps` and new `IQ3_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn and shared expert. `iq5_k` `token_embd` and `iq6_k` `output` "head".
+
+Final estimate: PPL = 3.6254 +/- 0.02001
+
+<details>
+
+<summary>👈 Secret Recipe</summary>
+
+```bash
+#!/usr/bin/env bash
+
+custom="
+# First 3 dense layers (0-3) (GPU)
+# Except blk.*.attn_k_b.weight is not divisible by 256 so only supports qN_0
+blk\.[0-2]\.attn_k_b.*=q5_0
+blk\.[0-2]\.attn_.*=iq5_ks
+blk\.[0-2]\.ffn_down.*=iq5_ks
+blk\.[0-2]\.ffn_(gate|up).*=iq4_ks
+blk\.[0-2]\..*=iq5_ks
+
+# All attention, norm weights, and bias tensors for MoE layers (3-60) (GPU)
+# Except blk.*.attn_k_b.weight is not divisible by 256 so only supports qN_0
+blk\.[3-9]\.attn_k_b.*=q5_0
+blk\.[1-5][0-9]\.attn_k_b.*=q5_0
+blk\.60\.attn_k_b.*=q5_0
+
+blk\.[3-9]\.attn_.*=iq5_ks
+blk\.[1-5][0-9]\.attn_.*=iq5_ks
+blk\.60\.attn_.*=iq5_ks
+
+# Shared Expert (3-60) (GPU)
+blk\.[3-9]\.ffn_down_shexp\.weight=iq5_ks
+blk\.[1-5][0-9]\.ffn_down_shexp\.weight=iq5_ks
+blk\.60\.ffn_down_shexp\.weight=iq5_ks
+
+blk\.[3-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
+blk\.[1-5][0-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
+blk\.60\.ffn_(gate|up)_shexp\.weight=iq4_ks
+
+# Routed Experts (3-60) (CPU)
+blk\.[3-9]\.ffn_down_exps\.weight=iq3_ks
+blk\.[1-5][0-9]\.ffn_down_exps\.weight=iq3_ks
+blk\.60\.ffn_down_exps\.weight=iq3_ks
+
+blk\.[3-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
+blk\.[1-5][0-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
+blk\.60\.ffn_(gate|up)_exps\.weight=iq2_ks
+
+# Token embedding and output tensors (GPU)
+token_embd\.weight=iq5_k
+output\.weight=iq6_k
+"
+
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/raid/models/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/imatrix-DeepSeek-TNG-R1T2-Chimera-Q8_0.dat \
+    /mnt/raid/models/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/DeepSeek-TNG-R1T2-Chimera-256x21B-BF16-00001-of-00030.gguf \
+    /mnt/raid/models/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/DeepSeek-TNG-R1T2-Chimera-IQ2_KS.gguf \
+    IQ2_KS \
+    24
+```
+
+
 ## Quick Start
 ```
 ## clone latest ik_llama.cpp
````