# Codette Orchestrator GGUF - Llama 3.1 8B
Quantized GGUF model for the Codette Multi-Perspective Reasoning System.
This is a Llama 3.1 8B Instruct model with the orchestrator LoRA merged in and quantized to Q4_K_M format for efficient local inference via llama.cpp.
## Model Details
| Property | Value |
|---|---|
| Base Model | meta-llama/Llama-3.1-8B-Instruct |
| Merged Adapter | Orchestrator (query routing + debate coordination) |
| Quantization | Q4_K_M (4-bit, ~4.6 GB) |
| Context Length | 4096 tokens |
| Format | GGUF (llama.cpp compatible) |
## What is Codette?
Codette is a multi-perspective AI reasoning system that approaches problems through 9 specialized cognitive lenses:
| Adapter | Perspective |
|---|---|
| Newton | Analytical physics and systematic reasoning |
| DaVinci | Creative invention and cross-domain thinking |
| Empathy | Emotional intelligence and human understanding |
| Philosophy | Conceptual analysis and ethical reasoning |
| Quantum | Probabilistic thinking and uncertainty |
| Consciousness | Recursive cognition (RC+xi framework) |
| Multi-Perspective | Cross-lens synthesis |
| Systems Architecture | Modularity, scalability, engineering |
| Orchestrator | Query routing, debate coordination, coherence monitoring |
## Architecture (Phase 6+)
- Semantic Tension Engine: Measures epistemic tension (xi) between perspectives
- Coherence Field (Gamma): Real-time monitoring for reasoning collapse
- Quantum Spiderweb: Belief propagation across adapter network
- AEGIS Governance: 6-framework ethical validation
- Executive Controller: Routes queries by complexity (SIMPLE/MEDIUM/COMPLEX)
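The routing step can be illustrated with a toy heuristic. Everything below is hypothetical: the keyword cues and length thresholds are made up for illustration and are not Codette's actual classifier.

```python
# Toy sketch of complexity-based query routing. The cue list and the
# 20-word threshold are invented for this example; the real Executive
# Controller's classification logic is not published in this card.

def classify_complexity(query: str) -> str:
    """Bucket a query as SIMPLE, MEDIUM, or COMPLEX from rough surface cues."""
    words = query.split()
    multi_lens_cues = {"perspectives", "ethical", "consciousness", "tradeoffs"}
    if any(w.lower().strip("?.,") in multi_lens_cues for w in words):
        return "COMPLEX"   # likely needs several adapters plus debate
    if len(words) > 20:
        return "MEDIUM"    # single adapter with coherence monitoring
    return "SIMPLE"        # direct answer from the orchestrator alone

print(classify_complexity("What is 2 + 2?"))                                    # SIMPLE
print(classify_complexity("Explain consciousness from multiple perspectives"))  # COMPLEX
```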
## Usage

### With llama.cpp

```bash
./llama-server -m codette-orchestrator-Q4_K_M.gguf -c 4096 -ngl 35
```
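Recent llama.cpp server builds expose an OpenAI-compatible chat endpoint. A minimal client sketch is below; the port and `/v1/chat/completions` path follow `llama-server` defaults, and the actual request is left commented out since it requires a running server:

```python
import json
from urllib import request

# Payload in the OpenAI chat-completions shape accepted by llama-server's
# /v1/chat/completions endpoint (default port 8080).
payload = {
    "messages": [{"role": "user", "content": "Explain entropy simply."}],
    "max_tokens": 256,
    "temperature": 0.7,
}

req = request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment with a running llama-server:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```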
### With Codette Web UI

```bash
git clone https://github.com/Raiff1982/codette
cd codette
codette_web.bat
```
The GGUF model serves as the base, with 9 LoRA adapters hot-swapped at inference time for perspective-specific reasoning.
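One way to sketch the hot-swap step: map each perspective to its adapter file and pass the chosen path when constructing the model. The adapter file names below are hypothetical placeholders (the real files ship in Raiff1982/codette-lora-adapters), and the `lora_path` usage is shown commented out since it needs the model weights on disk:

```python
# Hypothetical perspective-to-adapter mapping; file names are placeholders,
# not the actual names in the codette-lora-adapters repo.
ADAPTERS = {
    "newton": "adapters/newton.gguf",
    "davinci": "adapters/davinci.gguf",
    "empathy": "adapters/empathy.gguf",
}

def adapter_for(perspective: str) -> str:
    """Resolve a perspective name to its adapter path, defaulting to Newton."""
    return ADAPTERS.get(perspective.lower(), ADAPTERS["newton"])

print(adapter_for("Empathy"))  # adapters/empathy.gguf

# The chosen path would then be loaded via llama-cpp-python, e.g.:
# llm = Llama(model_path="codette-orchestrator-Q4_K_M.gguf",
#             lora_path=adapter_for("empathy"), n_ctx=4096)
```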
### With llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama(
    model_path="codette-orchestrator-Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=35,
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain consciousness from multiple perspectives"}],
    max_tokens=512,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```
## Related Repos
- Raiff1982/codette-lora-adapters - 9 LoRA adapters for hot-swap
- Raiff1982/codette-llama-3.1-8b-merged - Full-precision merged model
- Raiff1982/Codette-Reasoning - Training datasets
## Training

Trained with QLoRA on a Hugging Face A10G GPU:
- LoRA rank: 16, alpha: 32, dropout: 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj
- 4-bit quantization (NF4 + double quantization)
- ~2000-4000 examples per adapter
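The hyperparameters above map onto a QLoRA setup roughly as follows. This is a sketch of the configuration values only, not the original training script; with Hugging Face `peft`/`transformers` these dicts would populate `LoraConfig` and `BitsAndBytesConfig`:

```python
# QLoRA hyperparameters from this card, expressed as plain config dicts
# (sketch only; the original training script is not published here).
lora_config = {
    "r": 16,            # LoRA rank
    "lora_alpha": 32,   # scaling factor
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}
quant_config = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",       # NF4 quantization
    "bnb_4bit_use_double_quant": True,  # double quantization
}

# Effective LoRA scaling applied to the low-rank update: alpha / r
print(lora_config["lora_alpha"] / lora_config["r"])  # 2.0
```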
## License
Subject to the Llama 3.1 Community License.