llama.cpp Support Now Available!
I'm excited to announce that IQuest-Loop-Instruct models are now fully supported in llama.cpp!
This is the world's first implementation of loop attention in the GGUF ecosystem.
What's New:
- Full loop attention support: dual attention with learned per-head gates
- GGUF conversion: convert PyTorch models to GGUF format (see the sketch below)
- Quantization support: Q4_K_M, Q5_K_M, and Q8_0 quantization available
- Production ready: tested and working with text generation
Quick Start:
```bash
# Run inference
./llama-cli --model IQuest-Coder-V1-40B-Loop-Instruct-q4_k_m.gguf \
    --prompt "Write a function to reverse a linked list" \
    --n-predict 256
```
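If you'd rather convert and quantize yourself, the standard llama.cpp tooling applies. A minimal sketch, assuming you've built the PR branch (so the converter recognizes the loop architecture); the local checkpoint path is a placeholder:
```bash
# Convert the Hugging Face checkpoint to an F16 GGUF
python convert_hf_to_gguf.py ./IQuest-Coder-V1-40B-Loop-Instruct \
    --outfile IQuest-Coder-V1-40B-Loop-Instruct-f16.gguf \
    --outtype f16

# Quantize the F16 GGUF down to Q4_K_M
./llama-quantize IQuest-Coder-V1-40B-Loop-Instruct-f16.gguf \
    IQuest-Coder-V1-40B-Loop-Instruct-q4_k_m.gguf Q4_K_M
```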
GGUF Models Available:
Pre-converted GGUF models: https://huggingface.co/Avarok/IQuest-Coder-V1-40B-Loop-Instruct-GGUF
Sizes:
- F16: 75GB
- Q8_0: 40GB
- Q5_K_M: 27GB
- Q4_K_M: 23GB
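If you only want one quantization rather than the whole repo, something like this should work with the standard Hugging Face CLI (the exact filename inside the repo is an assumption, taken from the Quick Start above):
```bash
# Download a single GGUF from the repo
huggingface-cli download Avarok/IQuest-Coder-V1-40B-Loop-Instruct-GGUF \
    IQuest-Coder-V1-40B-Loop-Instruct-q4_k_m.gguf --local-dir .
```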
Technical Details:
The implementation includes:
- Loop iteration wrapper (`loop_num=2`)
- Global K/V caching from Loop 0
- Dual attention (local + global) with gate mixing
- Full backward compatibility with standard llama models
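For readers wondering what "gate mixing" means concretely: my reading of the bullets above (an assumption, not a quote from the PR) is a learned per-head sigmoid gate blending the two attention outputs, roughly `out_h = g_h * attn_local_h + (1 - g_h) * attn_global_h` with `g_h = sigmoid(w_h)` for each head `h`. Check the PR for the exact formulation.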
PR to llama.cpp: https://github.com/ggml-org/llama.cpp/pull/18680
Performance:
Tested on IQuest-Coder-V1-40B-Loop-Instruct:
- Prompt processing: ~3.4 t/s
- Text generation: ~0.8 t/s
- Memory overhead: ~512MB for global K/V cache
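As a sanity check on the ~512MB figure, the usual F16 K/V cache formula is bytes = 2 (K and V) * n_layers * n_kv_heads * head_dim * n_ctx * 2 (bytes per F16 element). The dimensions below are purely illustrative placeholders (read the real ones from the GGUF metadata), chosen only to show how a number in that ballpark falls out:
```bash
# 2 (K and V) * n_layers * n_kv_heads * head_dim * n_ctx * 2 (F16 bytes), in MiB
# Illustrative dims only, not the actual model config:
echo $(( 2 * 32 * 8 * 128 * 4096 * 2 / 1024 / 1024 ))   # -> 512
```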
Big thanks to the llama.cpp community and @ggerganov for the amazing ecosystem!
Related:
Rejected as AI-generated slop violating their contributor guidelines.
Did you check the commit qpqpqpqpqpqp? Did you check the author (hint: they do not look like a noob vibecoder)? Do you know Claude's capabilities? Sure, the maintainers of the main project rejected the code under their guidelines, but nobody called it "slop", since the code actually runs the loop model. I just cloned the feature branch and am now building llama.cpp myself so I can test the loop model with Ollama.
People complained about compilers in the 60s because they generated slop: code that might not do exactly what you would have written in assembly, and that made inferences and optimizations leading to unintended behavior. Over the years, compilers got better, and I'm sure we can all agree on that.
In general, we have the input language, the translator, and the output language. From highest to lowest level, we have:
| Input Language | Translator | Output Language |
|---|---|---|
| English | AI | Code (C, C++) |
| Code | Compiler | Machine code |
| Machine code | Hardware circuits | Electrical signals |
We jumped up a level of abstraction with AI. At the end of the day, we're working with language. Embrace it. There's no reason to hold AI in contempt. The next question is: what is the next level of abstraction above English?