🇮🇳 Hindi to Kurukh (Oraon) Translator
This is a neural machine translation model that translates Hindi into Kurukh (Oraon), a tribal language. It is fine-tuned from mT5-small, reaches a final cross-entropy loss of 1.41, and is intended for everyday conversational sentences.
📊 Model Details
- Model Architecture: Google mT5-small (Multilingual T5)
- Task: Machine Translation (Hindi → Kurukh)
- Script: Devanagari (for both input and output)
- Final Training Loss: 1.41
- Developer: Ankit Lakra
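To put the reported loss of 1.41 in more familiar terms, it can be converted to a per-token perplexity. The sketch below assumes the figure is a mean per-token cross-entropy measured in nats (natural log), which is the PyTorch default but is not stated in this card; `final_loss` and `perplexity` are illustrative names.

```python
import math

# Assumption: 1.41 is the mean per-token cross-entropy in nats.
final_loss = 1.41

# Perplexity is the exponential of the cross-entropy.
perplexity = math.exp(final_loss)

print(f"perplexity ~ {perplexity:.2f}")  # prints "perplexity ~ 4.10"
```

A perplexity near 4 means the model is, on average, about as uncertain as a choice among four equally likely tokens at each step. It is a measure of fit to the data, not a direct measure of translation quality.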
🚀 Live Demo
You can try this model directly using the interactive demo:
👉 Kurukh Translator (Live Space)
📚 Training Data
The dataset consists of ~10,000 parallel sentence pairs sourced from:
- Literature & Prose: High-quality alignments from classic Kurukh literary works and stories.
- Government Lexicons: Terminology from the SCERT and Bharatavani portals.
- Manual Curation: Common conversational phrases cleaned and verified by community members.
⚙️ Training Procedure
The model was fine-tuned on a Google Colab T4 GPU instance using the Adafactor optimizer, which is commonly used for T5-family models such as mT5.
- Optimizer: Adafactor with a fixed learning rate (relative step disabled, parameter scaling disabled)
- Batch Size: 16
- Learning Rate: 3e-4
- Epochs: 20
- Loss Function: Cross Entropy Loss (Final: 1.41)
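The optimizer settings listed above can be sketched with the `Adafactor` class from `transformers`. This is a minimal sketch, not the author's training script: a tiny `torch.nn.Linear` layer stands in for the mT5 model, the ~10,000 pair count is taken as exactly 10,000 with no held-out split, and the names `stand_in_model`, `steps_per_epoch`, and `total_steps` are illustrative.

```python
import math

import torch
from transformers.optimization import Adafactor

# Stand-in for the fine-tuned model (assumption: used only to keep the sketch small).
stand_in_model = torch.nn.Linear(4, 2)

# Fixed learning rate with relative step and parameter scaling disabled,
# matching the settings listed in this card.
optimizer = Adafactor(
    stand_in_model.parameters(),
    lr=3e-4,
    relative_step=False,
    scale_parameter=False,
    warmup_init=False,
)

# One illustrative update on random data.
loss = stand_in_model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Rough size of the run, assuming exactly 10,000 pairs and no validation split.
steps_per_epoch = math.ceil(10_000 / 16)  # batch size 16
total_steps = steps_per_epoch * 20        # 20 epochs
print(steps_per_epoch, total_steps)
```

Under those assumptions the run comes to 625 optimizer steps per epoch and 12,500 steps in total; the real figures depend on the exact dataset size and any validation split.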
⚠️ Limitations
- Dialects: Kurukh has multiple regional dialects (Jharkhand, Chhattisgarh, Odisha). This model primarily follows the standard Jharkhand dialect found in published literature.
- Complex Vocabulary: The model handles everyday sentences well, but highly technical or scientific Hindi words may have no direct Kurukh equivalent in the training data, so their translations can be unreliable.
📜 License
This model is released under the Apache 2.0 License.
💻 How to use
```python
from transformers import pipeline

# 1. Load the model
translator = pipeline("text2text-generation", model="ankitklakra/hindi-to-kurukh")

# 2. Translate a sentence
# Input: "मैं खाना खा रहा हूँ।" (I am eating food)
text = "मैं खाना खा रहा हूँ।"
result = translator(text, max_length=128)

# 3. Print the result
print(result[0]["generated_text"])
# Expected output: "एन मंडी ओना लगदन।"
```
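For translating several sentences, the same call can be wrapped in a small helper. The sketch below assumes `translator` behaves like the pipeline object in the usage example, returning a list whose first item has a `generated_text` key. `translate_all` is a hypothetical helper name, and `stub_translator` exists only so the example runs without downloading the model.

```python
from typing import Callable, List


def translate_all(translator: Callable, sentences: List[str], max_length: int = 128) -> List[str]:
    """Translate each sentence in turn, skipping blank entries."""
    outputs = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        result = translator(sentence, max_length=max_length)
        outputs.append(result[0]["generated_text"])
    return outputs


# Stub with the same call shape as the pipeline, for illustration only.
def stub_translator(text, max_length=128):
    return [{"generated_text": f"<{text}>"}]


translations = translate_all(stub_translator, ["मैं खाना खा रहा हूँ।", "  ", "नमस्ते"])
print(translations)
```

With the real model, pass the `translator` pipeline in place of `stub_translator`.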