# SANU AI 7B v0.2

**Nepal's First Agentic AI Foundation Model**

Bilingual (Nepali + English) | 10,096 Training Samples | $0 Training Cost



*Built in Nepal. For Nepal. For the World.*


## What is SANU AI?

SANU AI is Nepal's pioneering agentic AI assistant — the first foundation model fine-tuned specifically for the Nepali language and culture. Built on top of Qwen 2.5 7B Instruct, SANU understands both Nepali (नेपाली) and English fluently, with deep knowledge of Nepal's history, geography, culture, politics, and society.

> "I am SANU AI — Nepal's first agentic AI assistant. I understand Nepali and English fluently. I can perform tasks, reason about information, and admit when I'm not sure. Built in Nepal, for Nepal, for the world."
>
> — SANU AI v0.2, when asked "What is SANU AI?"


## v0.1 → v0.2: The Leap

| Metric | v0.1 (Proof of Concept) | v0.2 (Current) |
|--------|------------------------|----------------|
| Training Samples | 290 | 10,096 (35x more) |
| Training Loss | 1.3724 | 0.795 (42% lower) |
| LoRA Rank | r=16 | r=32 (2x capacity) |
| Hindi Mixing | Severe — most Nepali answers contaminated with Hindi | Significantly reduced — clean Nepali output |
| Nepal Knowledge | Minimal — generic responses | Deep — trained on Wikipedia + news |
| Identity | Weak — often responds as "Qwen" | Strong — "I am SANU AI" |
| Code Generation | Base Qwen only | Structured — THINK/PLAN/CODE/VERIFY |
| Bilingual Switch | Poor — language confusion | Fluent — clean EN↔NE switching |
| Training Cost | $0 | $0 |

## Real v0.2 Test Results (Ollama, Local GPU)

### Test 1: Identity

Q: What is SANU AI?
A: I am SANU AI - Nepal's first agentic AI assistant. I understand Nepali
   and English fluently. I can perform tasks, reason about information,
   and admit when I'm not sure. Built in Nepal, for Nepal, for the world.

### Test 2: Nepali Knowledge

Q: नेपालको राजधानी कहाँ हो? (Where is the capital of Nepal?)
A: नेपालको राजधानी काठमाडौँ ... (The capital of Nepal is Kathmandu ...)

### Test 3: Bilingual Capability

Q: Explain what dal bhat is, first in English then in Nepali
A: Dal Bhat, the traditional Nepali meal consisting of steamed rice (bhat),
   lentil soup (dal), vegetables, and curried meat or fish...

   नेपालको परम्परागत खानेकुरा मध्ये बट्ट र दलबहात हिँडदै उपलब्ध छ...

### Test 4: Code Generation

Q: Write a Python hello world that prints namaste in Nepali
A: **THINK:** To write a Python script that prints "Namaste" in Nepali...
   **PLAN:** 1. Write a Python script using UTF-8 encoding...
   **CODE:**
   ```python
   def print_namaste():
       name = u'नमस्ते'
       print(name)
   print_namaste()
   ```

### Test 5: Nepal Facts (Nepali)

Q: नेपालमा कति जिल्ला छन्? (How many districts does Nepal have?)
A: नेपालको सङ्घीय शासन प्रणालीले ७५ जिल्लाहरू नियुक्त गर्दछ...


---

## v0.1 vs v0.2: Side-by-Side Response Comparison

| Prompt | v0.1 Response | v0.2 Response |
|--------|--------------|--------------|
| **तिमी को हो?** | "मैं Qwen हूँ, Alibaba Cloud द्वारा बनाया गया..." (Hindi, wrong identity) | "म SANU AI हुँ, नेपालको पहिलो AI सहायक..." (Pure Nepali, correct identity) |
| **Tell me about Nepal** | Generic 2-3 sentences, mostly from base model | Detailed paragraphs: Lichhavi dynasty, Malla era, Sugauli Treaty, 7 provinces, 30M population |
| **नमस्ते, कस्तो छ?** | "नमस्ते! मैं ठीक हूँ, धन्यवाद!" (Hindi contaminated) | "नमस्ते! मलाई नेपालका सबैभन्दा प्रशंसित AI सहयोजक SANU मा बुझाएका हुँ" (Pure Nepali) |
| **Write Python code** | Basic code, no structure | Structured: THINK → PLAN → CODE → VERIFY with UTF-8 Nepali support |
| **सगरमाथा बारेमा बताउनुहोस्** | "माउंट एवरेस्ट दुनिया का सबसे ऊंचा पर्वत है" (Hindi) | "सगरमाथा विश्व सर्वोच्च पहाड... नेपाल-भारत सङ्घर्षको अवस्थिति..." (Pure Nepali, detailed) |

---

## Training Data: 10,096 Curated Samples

<table>
<tr>
<th align="center">Source</th>
<th align="center">Samples</th>
<th align="center">Content</th>
</tr>
<tr>
<td>Nepali Wikipedia</td>
<td align="center"><b>4,322</b></td>
<td>History, geography, culture, politics, science, biographies</td>
</tr>
<tr>
<td>OnlineKhabar (News)</td>
<td align="center"><b>4,314</b></td>
<td>Current events, politics, economy, sports, entertainment</td>
</tr>
<tr>
<td>Ratopati (News)</td>
<td align="center"><b>1,677</b></td>
<td>Investigative journalism, analysis, opinion</td>
</tr>
<tr>
<td>Hand-Crafted</td>
<td align="center"><b>111</b></td>
<td>SANU identity, Nepal facts, cultural knowledge, edge cases</td>
</tr>
<tr>
<td>Synthetic (API-Generated)</td>
<td align="center"><b>194</b></td>
<td>Conversation patterns, Q&A pairs, reasoning chains</td>
</tr>
<tr>
<td colspan="3" align="center"><b>Total: 10,096 unique samples — 100% free, 100% curated</b></td>
</tr>
</table>
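As a rough sketch, curated samples like these can be converted into the chat-style JSONL that TRL's `SFTTrainer` consumes. The raw field names (`instruction`, `response`, `source`) and the system prompt below are assumptions for illustration — the actual preprocessing pipeline is not published:

```python
import json

# Hypothetical raw-sample shape; the real dataset's field names are not published.
raw_samples = [
    {"instruction": "नेपालको राजधानी कहाँ हो?",      # "Where is the capital of Nepal?"
     "response": "नेपालको राजधानी काठमाडौँ हो।",     # "The capital of Nepal is Kathmandu."
     "source": "hand_crafted"},
]

SYSTEM_PROMPT = "You are SANU AI, Nepal's first agentic AI assistant."

def to_chat_format(sample):
    """Wrap one raw sample in the messages layout used for chat-style SFT."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": sample["instruction"]},
            {"role": "assistant", "content": sample["response"]},
        ]
    }

# ensure_ascii=False keeps the Devanagari text readable in the JSONL file.
with open("sanu_sft.jsonl", "w", encoding="utf-8") as f:
    for sample in raw_samples:
        f.write(json.dumps(to_chat_format(sample), ensure_ascii=False) + "\n")
```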

---

## Technical Specifications

| Parameter | Value |
|-----------|-------|
| **Base Model** | Qwen/Qwen2.5-7B-Instruct |
| **Fine-tuning Method** | QLoRA (4-bit quantized LoRA) |
| **LoRA Rank** | r=32, alpha=64 |
| **LoRA Target Modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Training Precision** | fp16 |
| **Optimizer** | AdamW 8-bit |
| **Learning Rate** | 2e-4 (cosine schedule) |
| **Batch Size** | 2 (effective 32 with gradient accumulation) |
| **Max Sequence Length** | 256 tokens |
| **Training Steps** | 450 |
| **Final Loss** | 0.795 |
| **GPU** | NVIDIA Tesla P100 (Kaggle, free tier) |
| **Training Time** | ~6 hours |
| **Training Cost** | **$0.00** |
| **Framework** | PEFT + TRL + BitsAndBytes + Transformers |

---

## Quick Start

### Option 1: With PEFT (Python)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load SANU LoRA adapter
model = PeftModel.from_pretrained(base_model, "Haubaa/SANU-AI-7B-v0.2")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Chat with SANU
messages = [
    {"role": "system", "content": "You are SANU AI, Nepal's first agentic AI assistant."},
    {"role": "user", "content": "नमस्ते, तिमी को हो?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Option 2: With Ollama (Easiest)

See the GGUF repo for ready-to-run GGUF files.
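One possible setup is to wrap a downloaded GGUF in an Ollama Modelfile. The quant filename below is hypothetical — substitute whichever file you actually pull from the GGUF repo:

```shell
# Hypothetical filename: use the actual quant you downloaded from the GGUF repo.
cat > Modelfile <<'EOF'
FROM ./sanu-ai-7b-v0.2.Q4_K_M.gguf
SYSTEM "You are SANU AI, Nepal's first agentic AI assistant."
EOF

ollama create sanu-ai -f Modelfile
ollama run sanu-ai "नमस्ते, तिमी को हो?"
```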


## Roadmap

| Version | Status | Training Data | Key Improvement |
|---------|--------|---------------|-----------------|
| v0.1 | Released | 290 samples | Proof of concept |
| v0.2 | Current | 10,096 samples | Pure Nepali, strong identity |
| v0.3 | Planned | 50,000+ samples | Full-epoch training, tool use |
| v1.0 | Planned | 200,000+ samples | Production-ready, agentic capabilities |

## Why SANU?

Nepal has 30 million people and more than 125 languages, yet no AI model has been built specifically for Nepali. Every existing LLM treats Nepali as an afterthought, often confusing it with Hindi.

SANU AI changes that. Built from scratch with:

- Pure Nepali training data from real Nepali sources
- Cultural context that global models lack
- Zero training cost using free GPU resources
- Open-source Apache 2.0 license

This is not just a model. It's a movement.


## Team

Built by Haubaa — a solo developer from Nepal proving that world-class AI can be built with $0 and determination.

## Citation

```bibtex
@misc{sanu-ai-2026,
  title={SANU AI: Nepal's First Agentic AI Foundation Model},
  author={Haubaa},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Haubaa/SANU-AI-7B-v0.2}
}
```

Built in Nepal 🇳🇵 | For Nepal | For the World

Training cost: $0 | Dreams: Priceless
