Is this done?
Currently I am downloading this quant, hope it's done.
No, it's not done yet AFAIK, let me go poke the upload with a stick.
Looks like only one file remains, I'll make sure it finishes up and check my script! About time because the newer R1T2 whatever chimera is already a thing lmao...
Looks like about 8 hours left on the final file, 35 out of 35!
lol 8hr for a 10GB upload!! Anyhow, I am so excited to try it. I have dual Xeon E5-2680s, 256GB RAM, and an RTX 3060 12GB, and will add another 12GB for ktransformers. I was getting 6 t/sec with Hunyuan-A13B-Instruct GGUF q8 on your ik_llama fork.
Someone needs to create a TG account for ik_llama so that newbies can easily look for support, I think.
> Someone needs to create a TG account for ik_llama so that newbies can easily look for support, I think.
What is a TG account? Telegram? Is that kinda like Twitter? I'm so old lmao
Yes, it's Telegram, or Discord, and possibly a Twitter account too for ik_llama. BTW, you're doing great work on ik_llama, really appreciated.
Agreed, not everyone enjoys digging through closed PRs and merging multiple un-released PRs with git to learn this stuff haha... Maybe Wendell at level1techs will make some more newbie-friendly videos and walk-through tutorials for this stuff (he's been helping me out with hardware support).
Cheers!
> Looks like only one file remains, I'll make sure it finishes up and check my script! About time because the newer R1T2 whatever chimera is already a thing lmao...
I can't wait for your r1t2!
I still can't wait for the last file :p
lol I was gone, and when I got back this morning it looks like it crapped out with like 1 minute to go :skull:
```
raise RuntimeError(f"Error while uploading '{operation.path_in_repo}' to the Hub.") from exc
RuntimeError: Error while uploading 'DeepSeek-R1T-Chimera-IQ4_KS/DeepSeek-R1T-Chimera-IQ4_KS-00035-of-00035.gguf' to the Hub.
DeepSeek-R1T-Chimera-IQ4_KS-00035-of-00035.gguf: 100%|█████████████████████████████████████████████████| 11.0G/11.0G [24:10:18<01:15, 126kB/s]
```
I just kicked it and it is uploading again...
omg, can't wait for this thing to finish too haha... jeeez, it is saying another 24 hours... I thought it would have cached the partials like it did before, ugh...
Well, I'll update this when it's finally done lol. Thanks for the patience, and hopefully yes, R1T2 will be faster xD
```
DeepSeek-R1T-Chimera-IQ4_KS-00035-of-00035.gguf:  82%|█████████████████████████████████████████▏        | 9.02G/11.0G [19:31:05<4:22:46, 124kB/s]
```
what a marathon lmao...
It is FINALLY done!
Haha amazing! Thanks @BernardH for all your support and patience with this one lol
Just updated the README to reflect that it is ready to go! :ship:
Finally, I will download it again :p thanks sir
Just finished uploading the newer version over at https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF
Started out with the IQ3_KS 281.463 GiB (3.598 BPW)
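(For anyone curious how the size and BPW figures relate, a quick back-of-the-envelope check, assuming BPW here means average bits per weight across the full model:

$$\frac{281.463 \times 2^{30} \times 8\ \text{bits}}{3.598\ \text{bits/weight}} \approx 6.72 \times 10^{11}\ \text{weights}$$

which lines up with DeepSeek's ~671B total parameters.)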
Thanks sir, will I be able to run it with 256GiB system RAM and 16GiB VRAM? The 281.4 GiB one above?
And what will the command look like for ik_llama?
> Thanks sir, will I be able to run it with 256GiB system RAM and 16GiB VRAM? The 281.4 GiB one above?
"Well yes, but no" as the meme goes. Technically you probably could if you just it mmap() off of a fast NVMe drive despite not fitting into RAM and assuming you're using Linux and the Kernel Page Cache kswapd0 will spin to 100% or so.
But no, I'd recommend against that, as it is very slow compared to fitting entirely into RAM+VRAM. So in a few minutes I will have completed uploading the IQ2_KS 203.553 GiB (2.602 BPW), and I'll fix the model card README, which has example commands.
Give it 30 minutes to appear, and feel free to open a discussion over there if you have any questions about how to run that new one! Thanks!
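For reference, a hybrid CPU+GPU command might look roughly like the sketch below. This is only an illustration, not the model card's exact invocation: the filename, context size, layer count, and thread count are assumptions you'd tune to your own hardware, so do check the README for the real thing.

```bash
# Hypothetical example only -- see the model card README for the
# actual recommended flags and filenames for this quant.
# -fa enables flash attention, -fmoe enables ik_llama.cpp's fused MoE
# path, and --override-tensor exps=CPU keeps the big routed-expert
# tensors in system RAM while everything else is offloaded to VRAM.
./build/bin/llama-server \
    --model DeepSeek-TNG-R1T2-Chimera-IQ2_KS-00001-of-00005.gguf \
    --ctx-size 32768 \
    -fa -fmoe \
    --n-gpu-layers 99 \
    --override-tensor exps=CPU \
    --threads 16 \
    --host 127.0.0.1 --port 8080
```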
Have you DONE this ever? I'm sorry, I had to ask, lol. That sure sounds like an exciting way to benchmark your page replacement policy. What kinda t/s do you get? Makes me wonder... I think NetBSD still officially supports VAX, which means building llama.cpp would be as easy as on Linux. Some of them have enough storage to run a quantized ~1B model easy, maybe bigger! Could be a first!
Last I checked (many years ago), a native build of the OS took about a week on a reasonably capable machine (probably a VAXstation). For reference, on a fairly capable Intel machine, you could probably cross-compile the VAX port in about an hour.
> Have you DONE this ever? I'm sorry, I had to ask, lol.
Are you asking me if I've ever run DeepSeek with half the model hanging out of RAM paging off of a fast NVMe disk via mmap()?
If so, then the answer is: yes.
I call this the "troll rig" technique and have a YT video here doing it with ktransformers, plus a full deep-dive technical discussion thread on how using a 4x RAID0 array does not help, as the random IOPS load is not suited to take advantage of the theoretical max sequential read bandwidth of modern drives.
I was getting maybe ~4.5 tok/sec or so. But currently I have access to a large enough remote rig that I don't need to do this.
I've designed my various IQ1_S and IQ1_S_R4 quants to avoid having to do this on gaming rigs, which can now run 2x64GB DDR5 with a single 16-24GB VRAM GPU that just barely fits the entire model.
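For the curious, the underlying mechanism is nothing exotic: llama.cpp mmap()s the GGUF file by default, so a model larger than RAM can still be "loaded" and the kernel pages weights in from disk on demand. Here is a minimal standalone C sketch of that idea (hypothetical illustration, not llama.cpp's actual loader code):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    /* Hypothetical path; any file larger than physical RAM works. */
    int fd = open("model.gguf", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    /* Map the whole file read-only. Nothing is read yet; pages are
     * faulted in from disk on first touch, and under memory pressure
     * the kernel simply evicts cold pages (kswapd0 doing the 100% CPU
     * dance mentioned above). */
    void *w = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (w == MAP_FAILED) { perror("mmap"); return 1; }

    /* Expert weights get touched in an effectively random order, so
     * hint the kernel not to waste bandwidth on sequential readahead --
     * the same reason RAID0's sequential bandwidth doesn't help here. */
    madvise(w, st.st_size, MADV_RANDOM);

    /* ... inference would read weights through `w` here ... */

    munmap(w, st.st_size);
    close(fd);
    return 0;
}
```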
> I think NetBSD still officially supports VAX, which means building llama.cpp would be as easy as on Linux. Some of them have enough storage to run a quantized ~1B model easy, maybe bigger! Could be a first!
I don't have enough context to understand what you're suggesting here? I'm aware IBM has some very modern mainframes running Linux, but how do these things relate? Do they have fast storage? Thanks.