SANU AI 7B v0.2
Nepal's First Agentic AI Foundation Model
Bilingual (Nepali + English) | 10,096 Training Samples | $0 Training Cost
Built in Nepal. For Nepal. For the World.
What is SANU AI?
SANU AI is Nepal's pioneering agentic AI assistant — the first foundation model fine-tuned specifically for the Nepali language and culture. Built on top of Qwen 2.5 7B Instruct, SANU understands both Nepali (नेपाली) and English fluently, with deep knowledge of Nepal's history, geography, culture, politics, and society.
"I am SANU AI — Nepal's first agentic AI assistant. I understand Nepali and English fluently. I can perform tasks, reason about information, and admit when I'm not sure. Built in Nepal, for Nepal, for the world." — SANU AI v0.2, when asked "What is SANU AI?"
v0.1 → v0.2: The Leap
| Metric | v0.1 (Proof of Concept) | v0.2 (Current) |
|---|---|---|
| Training Samples | 290 | 10,096 (35x more) |
| Training Loss | 1.3724 | 0.795 (42% lower) |
| LoRA Rank | r=16 | r=32 (2x capacity) |
| Hindi Mixing | Severe — most Nepali answers contaminated with Hindi | Significantly Reduced — clean Nepali output |
| Nepal Knowledge | Minimal — generic responses | Deep — Wikipedia + news trained |
| Identity | Weak — often responds as "Qwen" | Strong — "I am SANU AI" |
| Code Generation | Base Qwen only | Structured — THINK/PLAN/CODE/VERIFY |
| Bilingual Switch | Poor — language confusion | Fluent — clean EN↔NE switching |
| Training Cost | $0 | $0 |
Real v0.2 Test Results (Ollama, Local GPU)
Test 1: Identity
Q: What is SANU AI?
A: I am SANU AI - Nepal's first agentic AI assistant. I understand Nepali
and English fluently. I can perform tasks, reason about information,
and admit when I'm not sure. Built in Nepal, for Nepal, for the world.
Test 2: Nepali Knowledge
Q: नेपालको राजधानी कहाँ हो?
A: नेपालको राजधानी काठमाडौँ ...
Test 3: Bilingual Capability
Q: Explain what dal bhat is, first in English then in Nepali
A: Dal Bhat, the traditional Nepali meal consisting of steamed rice (bhat),
lentil soup (dal), vegetables, and curried meat or fish...
नेपालको परम्परागत खानेकुरा मध्ये बट्ट र दलबहात हिँडदै उपलब्ध छ...
Test 4: Code Generation
Q: Write a Python hello world that prints namaste in Nepali
A: **THINK:** To write a Python script that prints "Namaste" in Nepali...
**PLAN:** 1. Write a Python script using UTF-8 encoding...
**CODE:**
```python
def print_namaste():
name = u'नमस्ते'
print(name)
print_namaste()
### Test 5: Nepal Facts (Nepali)
Q: नेपालमा कति जिल्ला छन्? A: नेपालको सङ्घीय शासन प्रणालीले ७५ जिल्लाहरू नियुक्त गर्दछ...
---
## v0.1 vs v0.2: Side-by-Side Response Comparison
| Prompt | v0.1 Response | v0.2 Response |
|--------|--------------|--------------|
| **तिमी को हो?** | "मैं Qwen हूँ, Alibaba Cloud द्वारा बनाया गया..." (Hindi, wrong identity) | "म SANU AI हुँ, नेपालको पहिलो AI सहायक..." (Pure Nepali, correct identity) |
| **Tell me about Nepal** | Generic 2-3 sentences, mostly from base model | Detailed paragraphs: Lichhavi dynasty, Malla era, Sugauli Treaty, 7 provinces, 30M population |
| **नमस्ते, कस्तो छ?** | "नमस्ते! मैं ठीक हूँ, धन्यवाद!" (Hindi contaminated) | "नमस्ते! मलाई नेपालका सबैभन्दा प्रशंसित AI सहयोजक SANU मा बुझाएका हुँ" (Pure Nepali) |
| **Write Python code** | Basic code, no structure | Structured: THINK → PLAN → CODE → VERIFY with UTF-8 Nepali support |
| **सगरमाथा बारेमा बताउनुहोस्** | "माउंट एवरेस्ट दुनिया का सबसे ऊंचा पर्वत है" (Hindi) | "सगरमाथा विश्व सर्वोच्च पहाड... नेपाल-भारत सङ्घर्षको अवस्थिति..." (Pure Nepali, detailed) |
---
## Training Data: 10,096 Curated Samples
<table>
<tr>
<th align="center">Source</th>
<th align="center">Samples</th>
<th align="center">Content</th>
</tr>
<tr>
<td>Nepali Wikipedia</td>
<td align="center"><b>4,322</b></td>
<td>History, geography, culture, politics, science, biographies</td>
</tr>
<tr>
<td>OnlineKhabar (News)</td>
<td align="center"><b>4,314</b></td>
<td>Current events, politics, economy, sports, entertainment</td>
</tr>
<tr>
<td>Ratopati (News)</td>
<td align="center"><b>1,677</b></td>
<td>Investigative journalism, analysis, opinion</td>
</tr>
<tr>
<td>Hand-Crafted</td>
<td align="center"><b>111</b></td>
<td>SANU identity, Nepal facts, cultural knowledge, edge cases</td>
</tr>
<tr>
<td>Synthetic (API-Generated)</td>
<td align="center"><b>194</b></td>
<td>Conversation patterns, Q&A pairs, reasoning chains</td>
</tr>
<tr>
<td colspan="3" align="center"><b>Total: 10,096 unique samples — 100% free, 100% curated</b></td>
</tr>
</table>
---
## Technical Specifications
| Parameter | Value |
|-----------|-------|
| **Base Model** | Qwen/Qwen2.5-7B-Instruct |
| **Fine-tuning Method** | QLoRA (4-bit quantized LoRA) |
| **LoRA Rank** | r=32, alpha=64 |
| **LoRA Target Modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Training Precision** | fp16 |
| **Optimizer** | AdamW 8-bit |
| **Learning Rate** | 2e-4 (cosine schedule) |
| **Batch Size** | 2 (effective 32 with gradient accumulation) |
| **Max Sequence Length** | 256 tokens |
| **Training Steps** | 450 |
| **Final Loss** | 0.795 |
| **GPU** | NVIDIA Tesla P100 (Kaggle, free tier) |
| **Training Time** | ~6 hours |
| **Training Cost** | **$0.00** |
| **Framework** | PEFT + TRL + BitsAndBytes + Transformers |
---
## Quick Start
### Option 1: With PEFT (Python)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen2.5-7B-Instruct",
torch_dtype=torch.float16,
device_map="auto"
)
# Load SANU LoRA adapter
model = PeftModel.from_pretrained(base_model, "Haubaa/SANU-AI-7B-v0.2")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
# Chat with SANU
messages = [
{"role": "system", "content": "You are SANU AI, Nepal's first agentic AI assistant."},
{"role": "user", "content": "नमस्ते, तिमी को हो?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Option 2: With Ollama (Easiest)
See the GGUF repo for ready-to-run GGUF files.
Roadmap
| Version | Status | Training Data | Key Improvement |
|---|---|---|---|
| v0.1 | Released | 290 samples | Proof of concept |
| v0.2 | Current | 10,096 samples | Pure Nepali, strong identity |
| v0.3 | Planned | 50,000+ samples | Full epoch training, tool use |
| v1.0 | Planned | 200,000+ samples | Production-ready, agentic capabilities |
Why SANU?
Nepal has 30 million people and 125+ languages, yet zero AI models built specifically for Nepali. Every existing LLM treats Nepali as an afterthought, often confusing it with Hindi.
SANU AI changes that. Built from scratch with:
- Pure Nepali training data from real Nepali sources
- Cultural context that global models lack
- Zero training cost using free GPU resources
- Open-source Apache 2.0 license
This is not just a model. It's a movement.
Team
Built by Haubaa — a solo developer from Nepal proving that world-class AI can be built with $0 and determination.
Citation
@misc{sanu-ai-2026,
title={SANU AI: Nepal's First Agentic AI Foundation Model},
author={Haubaa},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/Haubaa/SANU-AI-7B-v0.2}
}
Built in Nepal 🇳🇵 | For Nepal | For the World
Training cost: $0 | Dreams: Priceless
- Downloads last month
- 33
Model tree for Haubaa/SANU-AI-7B-v0.2
Evaluation results
- Training Lossself-reported0.795