🇮🇳 Hindi to Kurukh (Oraon) Translator

This is a neural machine translation model, fine-tuned from Google's mT5-small, that translates Hindi into the tribal language Kurukh (Oraon). It reached a final cross-entropy loss of 1.41 and is intended for everyday conversational sentences.

📊 Model Details

Model Architecture: Google mT5-small (Multilingual T5)
Task: Machine Translation (Hindi → Kurukh)
Script: Devanagari (For both Input and Output)
Final Training Loss: 1.41
Developer: Ankit Lakra
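
To put the reported loss in perspective, a mean per-token cross-entropy can be converted to perplexity by exponentiation. The sketch below assumes the 1.41 figure is a per-token average measured with the natural logarithm, which the card does not state explicitly; the variable names are illustrative.

```python
import math

# Reported final loss from the card. Assumption: this is a mean
# per-token cross-entropy in nats (natural log).
final_loss = 1.41

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(final_loss)

print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the perplexity comes out at roughly 4.1, meaning the model is on average about as uncertain as if it were choosing among four equally likely tokens at each step.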

🚀 Live Demo

You can try this model directly using the interactive demo:

👉 Kurukh Translator (Live Space)

📚 Training Data

The dataset consists of ~10,000 parallel sentence pairs sourced from:

Literature & Prose: High-quality alignments from classic Kurukh literary works and stories.
Government Lexicons: Terminology from the SCERT and Bharatavani portals.
Manual Curation: Common conversational phrases cleaned and verified by community members.
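
For a sense of scale, the dataset size can be combined with the batch size (16) and epoch count (20) reported in the training procedure to estimate the number of optimizer steps. This is a back-of-the-envelope sketch: it assumes exactly 10,000 training pairs, no held-out split, and no gradient accumulation, none of which the card confirms.

```python
import math

num_pairs = 10_000  # approximate figure from the card ("~10,000")
batch_size = 16     # from the training procedure
epochs = 20         # from the training procedure

# Number of batches needed to see every pair once.
steps_per_epoch = math.ceil(num_pairs / batch_size)

# Total optimizer updates over the whole run (assuming one update per batch).
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)
```

With these inputs the estimate is 625 steps per epoch and 12,500 steps in total.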

⚙️ Training Procedure

The model was fine-tuned on a Google Colab T4 GPU instance using the Adafactor optimizer with a fixed learning rate (relative step and parameter scaling disabled), a setup commonly used when fine-tuning T5-family models such as mT5.

Optimizer: Adafactor (Relative step disabled, Scale parameter disabled)
Batch Size: 16
Learning Rate: 3e-4
Epochs: 20
Loss Function: Cross Entropy Loss (Final: 1.41)
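
The optimizer settings above map onto the `Adafactor` class shipped with the `transformers` library roughly as follows. This is a sketch, not the author's training script: `ADAFACTOR_KWARGS` and `build_optimizer` are names invented here, and `warmup_init=False` is an assumption that follows from disabling the relative step.

```python
# Keyword arguments mirroring the settings listed in the card.
ADAFACTOR_KWARGS = {
    "lr": 3e-4,                # fixed learning rate from the card
    "relative_step": False,    # "Relative step disabled"
    "scale_parameter": False,  # "Scale parameter disabled"
    "warmup_init": False,      # assumption: only valid with relative_step=True
}


def build_optimizer(model):
    """Create an Adafactor optimizer for `model` (hypothetical helper)."""
    # Imported lazily so the settings above can be inspected without
    # PyTorch or transformers installed.
    from transformers.optimization import Adafactor

    return Adafactor(model.parameters(), **ADAFACTOR_KWARGS)
```

With `relative_step=False`, Adafactor requires an explicit `lr`, which is why the learning rate of 3e-4 is passed directly rather than derived from the step count.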

⚠️ Limitations

Dialects: Kurukh has multiple regional dialects (Jharkhand, Chhattisgarh, Odisha). This model primarily follows the standard Jharkhand dialect found in published literature.
Complex Vocabulary: The model handles everyday sentences well, but highly technical or scientific Hindi terms may be translated poorly, since many have no direct Kurukh equivalent in the training data.

📜 License

This model is released under the Apache 2.0 License.

💻 How to use

from transformers import pipeline

# 1. Load the Model
translator = pipeline("text2text-generation", model="ankitklakra/hindi-to-kurukh")

# 2. Translate a Sentence
# Input: "मैं खाना खा रहा हूँ।" (I am eating food)
text = "मैं खाना खा रहा हूँ।"
result = translator(text, max_length=128)

# 3. Print Result
print(result[0]['generated_text'])
# Expected Output: "एन मंडी ओना लगदन।"
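
Because the model expects Devanagari input, it can help to reject text in other scripts before calling the translator. The helper below is a hypothetical pre-check, not part of the model or the `transformers` API. It treats a string as Devanagari if it contains at least one letter and every letter falls in the Unicode Devanagari block (U+0900 to U+097F); spaces, digits, punctuation, and combining vowel signs are ignored.

```python
def is_devanagari(text: str) -> bool:
    """Return True if `text` has at least one letter and all letters are Devanagari."""
    # str.isalpha() is False for spaces, digits, punctuation (including the
    # danda "।"), and combining marks, so only base letters are checked.
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all("\u0900" <= ch <= "\u097F" for ch in letters)


hindi_text = "मैं खाना खा रहा हूँ।"
english_text = "I am eating food"

print(is_devanagari(hindi_text))
print(is_devanagari(english_text))
```

A check like this only looks at the script, not the language, so other languages written in Devanagari (Marathi or Nepali, for example) would also pass.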

Model Size: 0.3B params (Safetensors, F32)
