ubergarm committed on
Commit 2f2b5a7 · 1 Parent(s): cce15f7

add details on IQ2_KS

Files changed (2)
  1. .gitattributes +6 -0
  2. README.md +72 -1
.gitattributes CHANGED
@@ -33,6 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+<<<<<<< Updated upstream
 imatrix-DeepSeek-TNG-R1T2-Chimera-Q8_0.dat filter=lfs diff=lfs merge=lfs -text
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00001-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00002-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
@@ -41,3 +42,8 @@ IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00004-of-00007.gguf filter=lfs diff=lfs
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00005-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00006-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
 IQ3_KS/DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00007-of-00007.gguf filter=lfs diff=lfs merge=lfs -text
+=======
+*.gguf filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+imatrix-*.dat filter=lfs diff=lfs merge=lfs -text
+>>>>>>> Stashed changes
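
The second side of the conflict block above adds three wildcard rules (`*.gguf`, `*.png`, `imatrix-*.dat`). As a quick sanity check, which filenames those globs cover can be approximated with Python's `fnmatch` (a sketch only; real `.gitattributes` matching follows gitignore-style rules, which are richer than `fnmatch`, and `tracked_by_lfs` is a hypothetical helper, not part of this repo):

```python
from fnmatch import fnmatch

# The three wildcard rules added on the "Stashed changes" side of the diff
patterns = ["*.gguf", "*.png", "imatrix-*.dat"]

def tracked_by_lfs(name: str) -> bool:
    """Return True if any of the new .gitattributes globs matches `name`."""
    return any(fnmatch(name, p) for p in patterns)

print(tracked_by_lfs("DeepSeek-TNG-R1T2-Chimera-IQ2_KS.gguf"))        # True
print(tracked_by_lfs("imatrix-DeepSeek-TNG-R1T2-Chimera-Q8_0.dat"))   # True
print(tracked_by_lfs("README.md"))                                    # False
```

Note that `fnmatch` does not honor `/`-anchoring or negation, so this only approximates top-level patterns like the ones added here.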
README.md CHANGED
@@ -24,8 +24,10 @@ Shout out to Wendell and the **Level1Techs** crew, the community [Forums](https:
 Also thanks to all the folks in the quanting and inferencing community on [BeaverAI Club Discord](https://discord.com/channels/1238219753324281886/1238239819017097246/1238676202357784650) and on [r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/) for tips and tricks helping each other run, test, and benchmark all the fun new models!
 
 ## Quants
+For some larger non-imatrix ik quant options check out [Kebob/DeepSeek-TNG-R1T2-Chimera-IK_GGUF](https://huggingface.co/Kebob/DeepSeek-TNG-R1T2-Chimera-IK_GGUF)
+
 #### * `IQ3_KS` 281.463 GiB (3.598 BPW)
-Special mix with all new `IQ3_KS` `ffn_(gate|up)_exps` and `IQ4_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn and shared expert. `iq5_k` `token_embd` and `iq6_k` output "head".
+Special mix with all new `IQ3_KS` `ffn_(gate|up)_exps` and `IQ4_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn and shared expert. `iq5_k` `token_embd` and `iq6_k` `output` "head".
 
 Final estimate: PPL = 3.3167 +/- 0.01789
 
@@ -94,6 +96,75 @@ custom=$(
 
 </details>
 
+#### * `IQ2_KS` 203.553 GiB (2.602 BPW)
+Special mix with `IQ2_KS` `ffn_(gate|up)_exps` and new `IQ3_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn and shared expert. `iq5_k` `token_embd` and `iq6_k` `output` "head".
+
+Final estimate: PPL = 3.6254 +/- 0.02001
+
+<details>
+
+<summary>👈 Secret Recipe</summary>
+
+```bash
+#!/usr/bin/env bash
+
+custom="
+# First 3 dense layers (0-3) (GPU)
+# Except blk.*.attn_k_b.weight is not divisible by 256 so only supports qN_0
+blk\.[0-2]\.attn_k_b.*=q5_0
+blk\.[0-2]\.attn_.*=iq5_ks
+blk\.[0-2]\.ffn_down.*=iq5_ks
+blk\.[0-2]\.ffn_(gate|up).*=iq4_ks
+blk\.[0-2]\..*=iq5_ks
+
+# All attention, norm weights, and bias tensors for MoE layers (3-60) (GPU)
+# Except blk.*.attn_k_b.weight is not divisible by 256 so only supports qN_0
+blk\.[3-9]\.attn_k_b.*=q5_0
+blk\.[1-5][0-9]\.attn_k_b.*=q5_0
+blk\.60\.attn_k_b.*=q5_0
+
+blk\.[3-9]\.attn_.*=iq5_ks
+blk\.[1-5][0-9]\.attn_.*=iq5_ks
+blk\.60\.attn_.*=iq5_ks
+
+# Shared Expert (3-60) (GPU)
+blk\.[3-9]\.ffn_down_shexp\.weight=iq5_ks
+blk\.[1-5][0-9]\.ffn_down_shexp\.weight=iq5_ks
+blk\.60\.ffn_down_shexp\.weight=iq5_ks
+
+blk\.[3-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
+blk\.[1-5][0-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
+blk\.60\.ffn_(gate|up)_shexp\.weight=iq4_ks
+
+# Routed Experts (3-60) (CPU)
+blk\.[3-9]\.ffn_down_exps\.weight=iq3_ks
+blk\.[1-5][0-9]\.ffn_down_exps\.weight=iq3_ks
+blk\.60\.ffn_down_exps\.weight=iq3_ks
+
+blk\.[3-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
+blk\.[1-5][0-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
+blk\.60\.ffn_(gate|up)_exps\.weight=iq2_ks
+
+# Token embedding and output tensors (GPU)
+token_embd\.weight=iq5_k
+output\.weight=iq6_k
+"
+
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/raid/models/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/imatrix-DeepSeek-TNG-R1T2-Chimera-Q8_0.dat \
+    /mnt/raid/models/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/DeepSeek-TNG-R1T2-Chimera-256x21B-BF16-00001-of-00030.gguf \
+    /mnt/raid/models/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/DeepSeek-TNG-R1T2-Chimera-IQ2_KS.gguf \
+    IQ2_KS \
+    24
+```
+
+
 ## Quick Start
 ```
 ## clone latest ik_llama.cpp
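
The `--custom-q` recipe added for `IQ2_KS` is a list of `regex=qtype` rules matched against tensor names; the ordering matters, since the narrower `attn_k_b` rules are listed before the general `attn_.*` rules. How such a lookup behaves can be sketched in Python (an assumption-laden illustration, not ik_llama.cpp's actual implementation; `quant_for` is a hypothetical helper, and first-match-wins semantics are inferred from the rule ordering in the recipe):

```python
import re

# A few of the IQ2_KS recipe's rules, in recipe order (subset for illustration)
rules = [
    (r"blk\.[3-9]\.attn_k_b.*", "q5_0"),                  # k_b must be qN_0
    (r"blk\.[3-9]\.attn_.*", "iq5_ks"),                   # remaining attention
    (r"blk\.[3-9]\.ffn_down_exps\.weight", "iq3_ks"),     # routed down experts
    (r"blk\.[3-9]\.ffn_(gate|up)_exps\.weight", "iq2_ks"),# routed gate/up experts
    (r"token_embd\.weight", "iq5_k"),
    (r"output\.weight", "iq6_k"),
]

def quant_for(tensor: str):
    """Return the quant type of the first rule whose regex matches the full name."""
    for pattern, qtype in rules:
        if re.fullmatch(pattern, tensor):
            return qtype
    return None  # would fall through to the base quant type

print(quant_for("blk.4.attn_k_b.weight"))       # q5_0 (k_b rule wins over attn_.*)
print(quant_for("blk.4.ffn_gate_exps.weight"))  # iq2_ks
print(quant_for("output.weight"))               # iq6_k
```

This also shows why the recipe pipes `$custom` through `grep -v '^#'` and `sed`: the tool expects the rules as one comma-separated string, so comment lines are stripped and newlines collapsed to commas before the list is passed to `--custom-q`.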