Is this done?
Currently I am downloading this quant, hope it's done.
No, it's not done yet AFAIK, let me go poke the upload with a stick.
Looks like only one file remains, I'll make sure it finishes up and check my script! About time because the newer R1T2 whatever chimera is already a thing lmao...
Looks like about 8 hours left on the final file, 35 out of 35!
lol 8hr for a 10GB upload!! Anyhow, I am so excited to try it. I have dual Xeon E5-2680s, 256GB RAM, and an RTX 3060 12GB, and will add another 12GB for ktransformers. I was getting 6 t/sec with Hunyuan-A13B-Instruct GGUF q8 on your ik_llama fork.
Someone needs to create a TG account for ik_llama so that newbies can easily look for support, I think.
> Someone needs to create a TG account for ik_llama so that newbies can easily look for support, I think.
What is a TG account? Telegram? Is that kinda like Twitter? I'm so old lmao
Yes, it's Telegram, or Discord, and possibly a Twitter account too for ik_llama. BTW, you're doing great work on ik_llama, really appreciated.
Agreed, not everyone enjoys digging through closed PRs and merging multiple un-released PRs with git to learn this stuff haha... Maybe Wendell at level1techs will make some more newbie-friendly videos and walk-through tutorials for this stuff (he's been helping me out with hardware support).
Cheers!
> Looks like only one file remains, I'll make sure it finishes up and check my script! About time because the newer R1T2 whatever chimera is already a thing lmao...
I can't wait for your r1t2!
I still can't wait for the last file :p
lol I was gone, and when I got back this morning it looks like it crapped out with like 1 minute to go :skull:
```
raise RuntimeError(f"Error while uploading '{operation.path_in_repo}' to the Hub.") from exc
RuntimeError: Error while uploading 'DeepSeek-R1T-Chimera-IQ4_KS/DeepSeek-R1T-Chimera-IQ4_KS-00035-of-00035.gguf' to the Hub.
DeepSeek-R1T-Chimera-IQ4_KS-00035-of-00035.gguf: 100%|█████████████████████████████████████████████████| 11.0G/11.0G [24:10:18<01:15, 126kB/s]
```
I just kicked it and it is uploading again...
omg, can't wait for this thing to finish too haha... jeeez, it is saying another 24 hours... I thought it would have cached the partials like it did before, ugh...
Well, I'll update this when it's finally done lol. Thanks for the patience, and hopefully yes, R1T2 will be faster xD
```
DeepSeek-R1T-Chimera-IQ4_KS-00035-of-00035.gguf:  82%|█████████████████████████████████████████▏        | 9.02G/11.0G [19:31:05<4:22:46, 124kB/s]
```
what a marathon lmao...
It is FINALLY done!
Haha amazing! Thanks @BernardH for all your support and patience with this one lol
Just updated the README to reflect that it is ready to go! :ship:
Finally, I will download it again :p thanks sir
Just finished uploading the newer version over at https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF
Started out with the IQ3_KS 281.463 GiB (3.598 BPW)
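(For anyone curious how the size and BPW figures relate, a quick back-of-the-envelope check, assuming BPW here means average bits per weight across the full model:

$$\frac{281.463 \times 2^{30} \times 8\ \text{bits}}{3.598\ \text{bits/weight}} \approx 6.72 \times 10^{11}\ \text{weights}$$

which lines up with DeepSeek's ~671B total parameters.)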
Thanks sir, will I be able to run it with 256GiB system RAM and 16GiB VRAM? The 281.4 GiB one above?
And what will the command look like for ik_llama?
> Thanks sir, will I be able to run it with 256GiB system RAM and 16GiB VRAM? The 281.4 GiB one above?
"Well yes, but no" as the meme goes. Technically you probably could if you just it mmap() off of a fast NVMe drive despite not fitting into RAM and assuming you're using Linux and the Kernel Page Cache kswapd0 will spin to 100% or so.
But no, I'd recommend against that, as it is very slow compared to fitting entirely into RAM+VRAM. So in a few minutes I will have completed uploading the IQ2_KS 203.553 GiB (2.602 BPW), and I'll fix the model card README, which has example commands.
Give it 30 minutes to appear, and feel free to open a discussion over there if you have any questions about how to run that new one! Thanks!
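For reference, a hybrid CPU+GPU command might look roughly like the sketch below. This is only an illustration, not the model card's exact invocation: the filename, context size, layer count, and thread count are assumptions you'd tune to your own hardware, so do check the README for the real thing.

```bash
# Hypothetical example only -- see the model card README for the
# actual recommended flags and filenames for this quant.
# -fa enables flash attention, -fmoe enables ik_llama.cpp's fused MoE
# path, and --override-tensor exps=CPU keeps the big routed-expert
# tensors in system RAM while everything else is offloaded to VRAM.
./build/bin/llama-server \
    --model DeepSeek-TNG-R1T2-Chimera-IQ2_KS-00001-of-00005.gguf \
    --ctx-size 32768 \
    -fa -fmoe \
    --n-gpu-layers 99 \
    --override-tensor exps=CPU \
    --threads 16 \
    --host 127.0.0.1 --port 8080
```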
Have you DONE this ever? I'm sorry, I had to ask, lol. That sure sounds like an exciting way to benchmark your page replacement policy. What kinda t/s do you get? Makes me wonder... I think NetBSD still officially supports VAX, which means building llama.cpp would be as easy as on Linux. Some of them have enough storage to run a quantized ~1B model easy, maybe bigger! Could be a first!
Last I checked (many years ago), a native build of the OS took about a week on a reasonably capable machine (probably a VAXstation). For reference, on a fairly capable Intel machine, you could probably cross-compile the VAX port in about an hour.
> Have you DONE this ever? I'm sorry, I had to ask, lol.
Are you asking me if I've ever run DeepSeek with half the model hanging out of RAM paging off of a fast NVMe disk via mmap()?
If so, then the answer is: yes.
I call this the "troll rig" technique and have a YT video here doing it with ktransformers, plus a full deep-dive technical discussion thread on how using a 4x RAID0 array does not help, as the random IOPS load is not suited to take advantage of the theoretical max sequential read bandwidth of modern drives.
I was getting maybe ~4.5 tok/sec or so. But currently I have access to a large enough remote rig that I don't need to do this.
I've designed my various IQ1_S and IQ1_S_R4 quants to avoid having to do this on gaming rigs, which can now run 2x64GB DDR5 with a single 16-24GB VRAM GPU that just barely fits the entire model.
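For the curious, the underlying mechanism is nothing exotic: llama.cpp mmap()s the GGUF file by default, so a model larger than RAM can still be "loaded" and the kernel pages weights in from disk on demand. Here is a minimal standalone C sketch of that idea (hypothetical illustration, not llama.cpp's actual loader code):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    /* Hypothetical path; any file larger than physical RAM works. */
    int fd = open("model.gguf", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    /* Map the whole file read-only. Nothing is read yet; pages are
     * faulted in from disk on first touch, and under memory pressure
     * the kernel simply evicts cold pages (kswapd0 doing the 100% CPU
     * dance mentioned above). */
    void *w = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (w == MAP_FAILED) { perror("mmap"); return 1; }

    /* Expert weights get touched in an effectively random order, so
     * hint the kernel not to waste bandwidth on sequential readahead --
     * the same reason RAID0's sequential bandwidth doesn't help here. */
    madvise(w, st.st_size, MADV_RANDOM);

    /* ... inference would read weights through `w` here ... */

    munmap(w, st.st_size);
    close(fd);
    return 0;
}
```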
> I think NetBSD still officially supports VAX, which means building llama.cpp would be as easy as on Linux. Some of them have enough storage to run a quantized ~1B model easy, maybe bigger! Could be a first!
I don't have enough context to understand what you're suggesting here? I'm aware IBM has some very modern mainframes running Linux, but how do these things relate? Do they have fast storage? Thanks.