🇮🇳 Hindi to Kurukh (Oraon) Translator

This is a neural machine translation model, fine-tuned from mT5-small, that translates Hindi into Kurukh (Oraon), a tribal language. It reaches a final cross-entropy loss of 1.41 and is aimed at everyday conversational sentences.

📊 Model Details

  • Model Architecture: Google mT5-small (Multilingual T5)
  • Task: Machine Translation (Hindi → Kurukh)
  • Script: Devanagari (for both input and output)
  • Final Training Loss: 1.41
  • Developer: Ankit Lakra
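
For a rough sense of scale, a cross-entropy loss can be converted to a per-token perplexity by exponentiating it. The sketch below assumes the reported 1.41 is a mean per-token loss in nats (the convention of PyTorch's cross-entropy), which this card does not state explicitly.

```python
import math


def perplexity(cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(cross_entropy_nats)


# Assumption: 1.41 is a mean per-token loss measured in nats.
final_loss = 1.41
print(f"perplexity ~ {perplexity(final_loss):.2f}")  # prints "perplexity ~ 4.10"
```

A perplexity near 4 describes how uncertain the model is about the next subword token. It does not measure translation quality directly, which is what metrics such as BLEU or chrF are for.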

🚀 Live Demo

You can try this model directly using the interactive demo:

👉 Kurukh Translator (Live Space)

📚 Training Data

The dataset consists of ~10,000 parallel sentence pairs sourced from:

  1. Literature & Prose: High-quality alignments from classic Kurukh literary works and stories.
  2. Government Lexicons: Terminology from the SCERT and Bharatavani portals.
  3. Manual Curation: Common conversational phrases cleaned and verified by community members.
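
As an illustration of what cleaning a parallel corpus like this can involve, here is a minimal sketch. It is not the original pipeline: the function name `clean_pairs` and the specific steps (whitespace cleanup, Unicode NFC normalization, dropping empty and duplicate pairs) are assumptions chosen for the example.

```python
import unicodedata


def clean_pairs(pairs):
    """Normalize and deduplicate (hindi, kurukh) sentence pairs.

    Hypothetical cleaning step; not taken from the original pipeline.
    """
    seen = set()
    cleaned = []
    for hindi, kurukh in pairs:
        # Collapse runs of whitespace, then map canonically equivalent
        # Devanagari sequences to a single form with NFC.
        hindi = unicodedata.normalize("NFC", " ".join(hindi.split()))
        kurukh = unicodedata.normalize("NFC", " ".join(kurukh.split()))
        if not hindi or not kurukh:
            continue  # drop pairs with an empty side
        if (hindi, kurukh) in seen:
            continue  # drop exact duplicates
        seen.add((hindi, kurukh))
        cleaned.append((hindi, kurukh))
    return cleaned


pairs = [
    ("मैं खाना खा रहा हूँ।", "एन मंडी ओना लगदन।"),
    ("  मैं खाना खा रहा हूँ। ", "एन मंडी ओना लगदन।"),  # duplicate after cleanup
    ("", "एन मंडी ओना लगदन।"),  # empty source side
]
print(len(clean_pairs(pairs)))  # prints 1
```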

⚙️ Training Procedure

The model was fine-tuned on a Google Colab T4 GPU instance using the Adafactor optimizer with a fixed learning rate, a common setup for fine-tuning T5-family models.

  • Optimizer: Adafactor (Relative step disabled, Scale parameter disabled)
  • Batch Size: 16
  • Learning Rate: 3e-4
  • Epochs: 20
  • Loss Function: Cross Entropy Loss (Final: 1.41)
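
The settings above can be sketched in code as follows. This is an approximation under stated assumptions, not the original training script: it assumes the `Adafactor` class from Hugging Face `transformers` was used, that all of the roughly 10,000 pairs went into the training split, and that there was no gradient accumulation. The helper names are hypothetical.

```python
import math

# Hyperparameters as listed on this card.
NUM_PAIRS = 10_000  # approximate; the card says "~10,000"
BATCH_SIZE = 16
EPOCHS = 20
LEARNING_RATE = 3e-4


def total_optimizer_steps(num_pairs, batch_size, epochs):
    """Optimizer steps for a full run, assuming no gradient accumulation."""
    steps_per_epoch = math.ceil(num_pairs / batch_size)
    return steps_per_epoch * epochs


def build_optimizer(model):
    """Adafactor with a fixed learning rate (assumed implementation)."""
    # Imported lazily so the arithmetic above runs without transformers installed.
    from transformers.optimization import Adafactor

    return Adafactor(
        model.parameters(),
        lr=LEARNING_RATE,       # an explicit lr is required when relative_step=False
        relative_step=False,    # "Relative step disabled"
        scale_parameter=False,  # "Scale parameter disabled"
        warmup_init=False,
    )


print(total_optimizer_steps(NUM_PAIRS, BATCH_SIZE, EPOCHS))  # prints 12500
```

Under these assumptions the run comes to 625 steps per epoch and 12,500 steps in total.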

⚠️ Limitations

  • Dialects: Kurukh has multiple regional dialects (Jharkhand, Chhattisgarh, Odisha). This model primarily follows the standard Jharkhand dialect found in published literature.
  • Complex Vocabulary: While it handles daily sentences well, highly technical or scientific Hindi words may not have direct Kurukh equivalents in the model's vocabulary.

📜 License

This model is released under the Apache 2.0 License.


💻 How to use

from transformers import pipeline

# 1. Load the Model
translator = pipeline("text2text-generation", model="ankitklakra/hindi-to-kurukh")

# 2. Translate a Sentence
# Input: "मैं खाना खा रहा हूँ।" (I am eating food)
text = "मैं खाना खा रहा हूँ।"
result = translator(text, max_length=128)

# 3. Print Result
print(result[0]['generated_text'])
# Expected Output: "एन मंडी ओना लगदन।"
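
The training data consists of sentence pairs, so longer inputs are likely to translate better when split into sentences first. The sketch below shows one way to do that. The helper names `split_sentences` and `translate_paragraph` are hypothetical, and `translate_fn` stands in for any callable that maps one Hindi sentence to its translation.

```python
import re


def split_sentences(text):
    """Split Hindi text into sentences, keeping the danda (।), ? or ! terminator."""
    parts = re.findall(r"[^।?!]+[।?!]?", text)
    return [p.strip() for p in parts if p.strip()]


def translate_paragraph(text, translate_fn):
    """Translate sentence by sentence and join the results with spaces."""
    return " ".join(translate_fn(s) for s in split_sentences(text))


print(split_sentences("मैं खाना खा रहा हूँ। तुम कहाँ हो?"))
# prints ['मैं खाना खा रहा हूँ।', 'तुम कहाँ हो?']

# With the pipeline loaded as shown above (this downloads the model):
# translate_fn = lambda s: translator(s, max_length=128)[0]["generated_text"]
# print(translate_paragraph("मैं खाना खा रहा हूँ। तुम कहाँ हो?", translate_fn))
```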