
Codette Orchestrator GGUF - Llama 3.1 8B

Quantized GGUF model for the Codette Multi-Perspective Reasoning System.

This is a Llama 3.1 8B Instruct model with the orchestrator LoRA merged in and quantized to Q4_K_M format for efficient local inference via llama.cpp.
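A merge-then-quantize pipeline like this is typically run with llama.cpp's own tooling. The commands below are a hedged sketch, not the author's actual build steps; the paths and filenames are placeholders.

```shell
# Convert the merged HF checkpoint to a full-precision GGUF,
# then quantize it to Q4_K_M (llama.cpp's convert/quantize tools).
python convert_hf_to_gguf.py ./merged-model --outfile codette-orchestrator-f16.gguf
./llama-quantize codette-orchestrator-f16.gguf codette-orchestrator-Q4_K_M.gguf Q4_K_M
```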

Model Details

| Property | Value |
| --- | --- |
| Base Model | meta-llama/Llama-3.1-8B-Instruct |
| Merged Adapter | Orchestrator (query routing + debate coordination) |
| Quantization | Q4_K_M (4-bit, ~4.6 GB) |
| Context Length | 4096 tokens |
| Format | GGUF (llama.cpp compatible) |

What is Codette?

Codette is a multi-perspective AI reasoning system that approaches problems through 9 specialized cognitive lenses:

| Adapter | Perspective |
| --- | --- |
| Newton | Analytical physics and systematic reasoning |
| DaVinci | Creative invention and cross-domain thinking |
| Empathy | Emotional intelligence and human understanding |
| Philosophy | Conceptual analysis and ethical reasoning |
| Quantum | Probabilistic thinking and uncertainty |
| Consciousness | Recursive cognition (RC+xi framework) |
| Multi-Perspective | Cross-lens synthesis |
| Systems Architecture | Modularity, scalability, engineering |
| Orchestrator | Query routing, debate coordination, coherence monitoring |

Architecture (Phase 6+)

  • Semantic Tension Engine: Measures epistemic tension (xi) between perspectives
  • Coherence Field (Gamma): Real-time monitoring for reasoning collapse
  • Quantum Spiderweb: Belief propagation across adapter network
  • AEGIS Governance: 6-framework ethical validation
  • Executive Controller: Routes queries by complexity (SIMPLE/MEDIUM/COMPLEX)
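The Executive Controller's complexity tiers can be sketched as a simple classifier. This is an illustrative assumption about how SIMPLE/MEDIUM/COMPLEX routing might look; the keyword cues and word-count threshold below are invented for the example and are not Codette's actual logic.

```python
from enum import Enum

class Complexity(Enum):
    SIMPLE = "simple"    # base model answers directly
    MEDIUM = "medium"    # a single specialist adapter is engaged
    COMPLEX = "complex"  # full multi-perspective debate

# Hypothetical cues that would trigger a debate; the real controller
# presumably uses richer signals than keyword matching.
DEBATE_CUES = ("ethics", "consciousness", "trade-off", "perspectives")

def route(query: str) -> Complexity:
    text = query.lower()
    if any(cue in text for cue in DEBATE_CUES):
        return Complexity.COMPLEX
    if len(text.split()) > 12:
        return Complexity.MEDIUM
    return Complexity.SIMPLE
```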

Usage

With llama.cpp

```shell
./llama-server -m codette-orchestrator-Q4_K_M.gguf -c 4096 -ngl 35
```
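Once the server is up (default port 8080), it exposes an OpenAI-compatible chat endpoint. A minimal request, with the prompt and token limit chosen arbitrarily for illustration:

```shell
# Query the running llama-server via its /v1/chat/completions endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Explain consciousness from multiple perspectives"}],
        "max_tokens": 256
      }'
```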

With Codette Web UI

```shell
git clone https://github.com/Raiff1982/codette
cd codette
codette_web.bat
```

The GGUF model serves as the base, with 9 LoRA adapters hot-swapped at inference time for perspective-specific reasoning.
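Hot-swapping of this kind can be approximated with llama.cpp's server, which can load multiple LoRA files at startup and rescale them per request through its `/lora-adapters` endpoint. This is a hedged sketch assuming a llama.cpp build with LoRA support; the adapter filenames are hypothetical, not files shipped with this model.

```shell
# Start the server with two (hypothetical) perspective adapters loaded.
./llama-server -m codette-orchestrator-Q4_K_M.gguf -c 4096 -ngl 35 \
  --lora newton.gguf --lora davinci.gguf

# Deactivate adapter 0 and activate adapter 1 without restarting.
curl http://localhost:8080/lora-adapters \
  -H "Content-Type: application/json" \
  -d '[{"id": 0, "scale": 0.0}, {"id": 1, "scale": 1.0}]'
```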

With llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama(
    model_path="codette-orchestrator-Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=35,
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain consciousness from multiple perspectives"}],
    max_tokens=512,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```

Training

Trained with QLoRA on a Hugging Face-hosted A10G GPU:

  • LoRA rank: 16, alpha: 32, dropout: 0.05
  • Target modules: q_proj, k_proj, v_proj, o_proj
  • 4-bit quantization (NF4 + double quantization)
  • ~2000-4000 examples per adapter

License

Subject to the Llama 3.1 Community License.
