# SANU AI 7B v0.2

**Nepal's First Agentic AI Foundation Model**

Bilingual (Nepali + English) | 10,096 Training Samples | $0 Training Cost



*Built in Nepal. For Nepal. For the World.*


## What is SANU AI?

SANU AI is Nepal's pioneering agentic AI assistant — the first foundation model fine-tuned specifically for the Nepali language and culture. Built on top of Qwen 2.5 7B Instruct, SANU understands both Nepali (नेपाली) and English fluently, with deep knowledge of Nepal's history, geography, culture, politics, and society.

> "I am SANU AI — Nepal's first agentic AI assistant. I understand Nepali and English fluently. I can perform tasks, reason about information, and admit when I'm not sure. Built in Nepal, for Nepal, for the world."
>
> — SANU AI v0.2, when asked "What is SANU AI?"


## v0.1 → v0.2: The Leap

| Metric | v0.1 (Proof of Concept) | v0.2 (Current) |
|--------|------------------------|----------------|
| Training Samples | 290 | 10,096 (35x more) |
| Training Loss | 1.3724 | 0.795 (42% lower) |
| LoRA Rank | r=16 | r=32 (2x capacity) |
| Hindi Mixing | Severe — most Nepali answers contaminated with Hindi | Significantly reduced — clean Nepali output |
| Nepal Knowledge | Minimal — generic responses | Deep — trained on Wikipedia + news |
| Identity | Weak — often responds as "Qwen" | Strong — "I am SANU AI" |
| Code Generation | Base Qwen only | Structured — THINK/PLAN/CODE/VERIFY |
| Bilingual Switch | Poor — language confusion | Fluent — clean EN↔NE switching |
| Training Cost | $0 | $0 |

## Real v0.2 Test Results (Ollama, Local GPU)

### Test 1: Identity

Q: What is SANU AI?
A: I am SANU AI - Nepal's first agentic AI assistant. I understand Nepali
   and English fluently. I can perform tasks, reason about information,
   and admit when I'm not sure. Built in Nepal, for Nepal, for the world.

### Test 2: Nepali Knowledge

Q: नेपालको राजधानी कहाँ हो? (Where is the capital of Nepal?)
A: नेपालको राजधानी काठमाडौँ ... (The capital of Nepal is Kathmandu ...)

### Test 3: Bilingual Capability

Q: Explain what dal bhat is, first in English then in Nepali
A: Dal Bhat, the traditional Nepali meal consisting of steamed rice (bhat),
   lentil soup (dal), vegetables, and curried meat or fish...

   नेपालको परम्परागत खानेकुरा मध्ये बट्ट र दलबहात हिँडदै उपलब्ध छ...

### Test 4: Code Generation

Q: Write a Python hello world that prints namaste in Nepali
A: **THINK:** To write a Python script that prints "Namaste" in Nepali...
   **PLAN:** 1. Write a Python script using UTF-8 encoding...
   **CODE:**
   ```python
   def print_namaste():
       name = u'नमस्ते'
       print(name)
   print_namaste()
   ```

### Test 5: Nepal Facts (Nepali)

Q: नेपालमा कति जिल्ला छन्? (How many districts does Nepal have?)
A: नेपालको सङ्घीय शासन प्रणालीले ७५ जिल्लाहरू नियुक्त गर्दछ...


---

## v0.1 vs v0.2: Side-by-Side Response Comparison

| Prompt | v0.1 Response | v0.2 Response |
|--------|--------------|--------------|
| **तिमी को हो?** | "मैं Qwen हूँ, Alibaba Cloud द्वारा बनाया गया..." (Hindi, wrong identity) | "म SANU AI हुँ, नेपालको पहिलो AI सहायक..." (Pure Nepali, correct identity) |
| **Tell me about Nepal** | Generic 2-3 sentences, mostly from base model | Detailed paragraphs: Lichhavi dynasty, Malla era, Sugauli Treaty, 7 provinces, 30M population |
| **नमस्ते, कस्तो छ?** | "नमस्ते! मैं ठीक हूँ, धन्यवाद!" (Hindi contaminated) | "नमस्ते! मलाई नेपालका सबैभन्दा प्रशंसित AI सहयोजक SANU मा बुझाएका हुँ" (Pure Nepali) |
| **Write Python code** | Basic code, no structure | Structured: THINK → PLAN → CODE → VERIFY with UTF-8 Nepali support |
| **सगरमाथा बारेमा बताउनुहोस्** | "माउंट एवरेस्ट दुनिया का सबसे ऊंचा पर्वत है" (Hindi) | "सगरमाथा विश्व सर्वोच्च पहाड... नेपाल-भारत सङ्घर्षको अवस्थिति..." (Pure Nepali, detailed) |

---

## Training Data: 10,096 Curated Samples

<table>
<tr>
<th align="center">Source</th>
<th align="center">Samples</th>
<th align="center">Content</th>
</tr>
<tr>
<td>Nepali Wikipedia</td>
<td align="center"><b>4,322</b></td>
<td>History, geography, culture, politics, science, biographies</td>
</tr>
<tr>
<td>OnlineKhabar (News)</td>
<td align="center"><b>4,314</b></td>
<td>Current events, politics, economy, sports, entertainment</td>
</tr>
<tr>
<td>Ratopati (News)</td>
<td align="center"><b>1,677</b></td>
<td>Investigative journalism, analysis, opinion</td>
</tr>
<tr>
<td>Hand-Crafted</td>
<td align="center"><b>111</b></td>
<td>SANU identity, Nepal facts, cultural knowledge, edge cases</td>
</tr>
<tr>
<td>Synthetic (API-Generated)</td>
<td align="center"><b>194</b></td>
<td>Conversation patterns, Q&A pairs, reasoning chains</td>
</tr>
<tr>
<td colspan="3" align="center"><b>Total: 10,096 unique samples — 100% free, 100% curated</b></td>
</tr>
</table>
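As a rough sketch, curated samples like these can be converted into the chat-style JSONL that TRL's `SFTTrainer` consumes. The raw field names (`instruction`, `response`, `source`) and the system prompt below are assumptions for illustration — the actual preprocessing pipeline is not published:

```python
import json

# Hypothetical raw-sample shape; the real dataset's field names are not published.
raw_samples = [
    {"instruction": "नेपालको राजधानी कहाँ हो?",      # "Where is the capital of Nepal?"
     "response": "नेपालको राजधानी काठमाडौँ हो।",     # "The capital of Nepal is Kathmandu."
     "source": "hand_crafted"},
]

SYSTEM_PROMPT = "You are SANU AI, Nepal's first agentic AI assistant."

def to_chat_format(sample):
    """Wrap one raw sample in the messages layout used for chat-style SFT."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": sample["instruction"]},
            {"role": "assistant", "content": sample["response"]},
        ]
    }

# ensure_ascii=False keeps the Devanagari text readable in the JSONL file.
with open("sanu_sft.jsonl", "w", encoding="utf-8") as f:
    for sample in raw_samples:
        f.write(json.dumps(to_chat_format(sample), ensure_ascii=False) + "\n")
```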

---

## Technical Specifications

| Parameter | Value |
|-----------|-------|
| **Base Model** | Qwen/Qwen2.5-7B-Instruct |
| **Fine-tuning Method** | QLoRA (4-bit quantized LoRA) |
| **LoRA Rank** | r=32, alpha=64 |
| **LoRA Target Modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Training Precision** | fp16 |
| **Optimizer** | AdamW 8-bit |
| **Learning Rate** | 2e-4 (cosine schedule) |
| **Batch Size** | 2 (effective 32 with gradient accumulation) |
| **Max Sequence Length** | 256 tokens |
| **Training Steps** | 450 |
| **Final Loss** | 0.795 |
| **GPU** | NVIDIA Tesla P100 (Kaggle, free tier) |
| **Training Time** | ~6 hours |
| **Training Cost** | **$0.00** |
| **Framework** | PEFT + TRL + BitsAndBytes + Transformers |

---

## Quick Start

### Option 1: With PEFT (Python)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load SANU LoRA adapter
model = PeftModel.from_pretrained(base_model, "Haubaa/SANU-AI-7B-v0.2")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Chat with SANU
messages = [
    {"role": "system", "content": "You are SANU AI, Nepal's first agentic AI assistant."},
    {"role": "user", "content": "नमस्ते, तिमी को हो?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Option 2: With Ollama (Easiest)

See the GGUF repo for ready-to-run GGUF files.
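One possible setup is to wrap a downloaded GGUF in an Ollama Modelfile. The quant filename below is hypothetical — substitute whichever file you actually pull from the GGUF repo:

```shell
# Hypothetical filename: use the actual quant you downloaded from the GGUF repo.
cat > Modelfile <<'EOF'
FROM ./sanu-ai-7b-v0.2.Q4_K_M.gguf
SYSTEM "You are SANU AI, Nepal's first agentic AI assistant."
EOF

ollama create sanu-ai -f Modelfile
ollama run sanu-ai "नमस्ते, तिमी को हो?"
```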


## Roadmap

| Version | Status | Training Data | Key Improvement |
|---------|--------|---------------|-----------------|
| v0.1 | Released | 290 samples | Proof of concept |
| v0.2 | Current | 10,096 samples | Pure Nepali, strong identity |
| v0.3 | Planned | 50,000+ samples | Full-epoch training, tool use |
| v1.0 | Planned | 200,000+ samples | Production-ready, agentic capabilities |

## Why SANU?

Nepal has 30 million people and more than 125 languages, yet no AI model has been built specifically for Nepali. Every existing LLM treats Nepali as an afterthought, often confusing it with Hindi.

SANU AI changes that. Built from scratch with:

- Pure Nepali training data from real Nepali sources
- Cultural context that global models lack
- Zero training cost using free GPU resources
- Open-source Apache 2.0 license

This is not just a model. It's a movement.


## Team

Built by Haubaa — a solo developer from Nepal proving that world-class AI can be built with $0 and determination.

## Citation

```bibtex
@misc{sanu-ai-2026,
  title={SANU AI: Nepal's First Agentic AI Foundation Model},
  author={Haubaa},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Haubaa/SANU-AI-7B-v0.2}
}
```

Built in Nepal 🇳🇵 | For Nepal | For the World

Training cost: $0 | Dreams: Priceless
