add details on IQ2_KS

Files changed:
- .gitattributes (+6, -0)
- README.md (+72, -1)
.gitattributes CHANGED

```diff
@@ -33,6 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+<<<<<<< Updated upstream
 imatrix-DeepSeek-TNG-R1T2-Chimera-Q8_0.dat filter=lfs diff=lfs merge=lfs -text
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00001-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00002-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
@@ -41,3 +42,8 @@ IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00004-of-00007.gguf filter=lfs diff=lfs
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00005-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00006-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00007-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
+=======
+*.gguf filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+imatrix-*.dat filter=lfs diff=lfs merge=lfs -text
+>>>>>>> Stashed changes
```

Note that the six added lines include literal `<<<<<<< Updated upstream` / `=======` / `>>>>>>> Stashed changes` merge-conflict markers, committed verbatim alongside the new wildcard patterns.
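The new wildcard patterns (`*.gguf`, `*.png`, `imatrix-*.dat`) route matching files through Git LFS. A quick way to confirm what a pattern actually catches is `git check-attr` in a throwaway repo; a minimal sketch with hypothetical file names:

```shell
# Illustrative only: verify which paths the new .gitattributes patterns
# route through the LFS filter, using a scratch repo and git check-attr.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
printf '%s\n' \
  '*.gguf filter=lfs diff=lfs merge=lfs -text' \
  'imatrix-*.dat filter=lfs diff=lfs merge=lfs -text' > .gitattributes
git check-attr filter -- model.gguf imatrix-foo.dat README.md
# model.gguf: filter: lfs
# imatrix-foo.dat: filter: lfs
# README.md: filter: unspecified
```

`git check-attr` only inspects attribute resolution; it does not require the files to exist, which makes it handy for testing patterns before pushing large artifacts.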
README.md CHANGED

```diff
@@ -24,8 +24,10 @@ Shout out to Wendell and the **Level1Techs** crew, the community [Forums](https:
 Also thanks to all the folks in the quanting and inferencing community on [BeaverAI Club Discord](https://discord.com/channels/1238219753324281886/1238239819017097246/1238676202357784650) and on [r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/) for tips and tricks helping each other run, test, and benchmark all the fun new models!
 
 ## Quants
+For some larger non-imatrix ik quant options check out [Kebob/DeepSeek-TNG-R1T2-Chimera-IK_GGUF](https://huggingface.co/Kebob/DeepSeek-TNG-R1T2-Chimera-IK_GGUF)
+
 #### * `IQ3_KS` 281.463 GiB (3.598 BPW)
-Special mix with all new `IQ3_KS` `ffn_(gate|up)_exps` and `IQ4_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn and shared expert. `iq5_k` `token_embd` and `iq6_k` output "head".
+Special mix with all new `IQ3_KS` `ffn_(gate|up)_exps` and `IQ4_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn and shared expert. `iq5_k` `token_embd` and `iq6_k` `output` "head".
 
 Final estimate: PPL = 3.3167 +/- 0.01789
 
```
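As a quick cross-check (not part of the original README), the reported GiB sizes and bits-per-weight figures are mutually consistent if one assumes a total parameter count of roughly 672e9 for this DeepSeek-class model; the count itself is an assumption here, not stated in the diff:

```shell
# Sanity-check the stated bits-per-weight against the stated file size.
# ASSUMPTION: ~672e9 total parameters (not stated in the README).
awk 'BEGIN {
  gib    = 281.463               # IQ3_KS size reported above
  params = 672e9                 # assumed total parameter count
  bpw    = gib * 1024^3 * 8 / params
  printf "%.3f BPW\n", bpw
}'
# prints: 3.598 BPW
```

The same arithmetic with 203.553 GiB lands near the 2.602 BPW quoted for the IQ2_KS mix below.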
````diff
@@ -94,6 +96,75 @@ custom=$(
 
 </details>
 
+#### * `IQ2_KS` 203.553 GiB (2.602 BPW)
+Special mix with `IQ2_KS` `ffn_(gate|up)_exps` and new `IQ3_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn and shared expert. `iq5_k` `token_embd` and `iq6_k` `output` "head".
+
+Final estimate: PPL = 3.6254 +/- 0.02001
+
+<details>
+
+<summary>👈 Secret Recipe</summary>
+
+```bash
+#!/usr/bin/env bash
+
+custom="
+# First 3 dense layers (0-3) (GPU)
+# Except blk.*.attn_k_b.weight is not divisible by 256 so only supports qN_0
+blk\.[0-2]\.attn_k_b.*=q5_0
+blk\.[0-2]\.attn_.*=iq5_ks
+blk\.[0-2]\.ffn_down.*=iq5_ks
+blk\.[0-2]\.ffn_(gate|up).*=iq4_ks
+blk\.[0-2]\..*=iq5_ks
+
+# All attention, norm weights, and bias tensors for MoE layers (3-60) (GPU)
+# Except blk.*.attn_k_b.weight is not divisible by 256 so only supports qN_0
+blk\.[3-9]\.attn_k_b.*=q5_0
+blk\.[1-5][0-9]\.attn_k_b.*=q5_0
+blk\.60\.attn_k_b.*=q5_0
+
+blk\.[3-9]\.attn_.*=iq5_ks
+blk\.[1-5][0-9]\.attn_.*=iq5_ks
+blk\.60\.attn_.*=iq5_ks
+
+# Shared Expert (3-60) (GPU)
+blk\.[3-9]\.ffn_down_shexp\.weight=iq5_ks
+blk\.[1-5][0-9]\.ffn_down_shexp\.weight=iq5_ks
+blk\.60\.ffn_down_shexp\.weight=iq5_ks
+
+blk\.[3-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
+blk\.[1-5][0-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
+blk\.60\.ffn_(gate|up)_shexp\.weight=iq4_ks
+
+# Routed Experts (3-60) (CPU)
+blk\.[3-9]\.ffn_down_exps\.weight=iq3_ks
+blk\.[1-5][0-9]\.ffn_down_exps\.weight=iq3_ks
+blk\.60\.ffn_down_exps\.weight=iq3_ks
+
+blk\.[3-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
+blk\.[1-5][0-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
+blk\.60\.ffn_(gate|up)_exps\.weight=iq2_ks
+
+# Token embedding and output tensors (GPU)
+token_embd\.weight=iq5_k
+output\.weight=iq6_k
+"
+
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/raid/models/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/imatrix-DeepSeek-TNG-R1T2-Chimera-Q8_0.dat \
+    /mnt/raid/models/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/DeepSeek-TNG-R1T2-Chimera-256x21B-BF16-00001-of-00030.gguf \
+    /mnt/raid/models/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/DeepSeek-TNG-R1T2-Chimera-IQ2_KS.gguf \
+    IQ2_KS \
+    24
+```
+
+
 ## Quick Start
 ```
 ## clone latest ik_llama.cpp
````