+---
+language:
+- ar
+- es
+- de
+- fr
+- it
+- ja
+- nl
+- pl
+- pt
+- ru
+- tr
+- bg
+- bn
+- cs
+- da
+- el
+- fa
+- fi
+- hi
+- hu
+- id
+- ko
+- no
+- ro
+- sk
+- sv
+- th
+- uk
+- vi
+- am
+- az
+- bo
+- he
+- hr
+- hy
+- is
+- jv
+- ka
+- kk
+- km
+- ky
+- lo
+- mn
+- mr
+- ms
+- my
+- ne
+- ps
+- si
+- sw
+- ta
+- te
+- tg
+- tl
+- ug
+- ur
+- uz
+- yue
+metrics:
+- bleu
+- comet
+datasets:
+- NiuTrans/LMT-60-sft-data
+base_model:
+- NiuTrans/LMT-60-8B-Base
+license: apache-2.0
+pipeline_tag: translation
+---
+## LMT
+- Paper: [Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs](https://arxiv.org/abs/2511.07003)
+- GitHub: [LMT](https://github.com/NiuTrans/LMT)
+**LMT-60** is a suite of **Chinese-English-centric** multilingual machine translation (MMT) models trained on **90B** mixed monolingual and bilingual tokens, covering **60 languages across 234 translation directions** and achieving **SOTA performance** among models with similar language coverage.
+We release both the continued pre-training (CPT) and supervised fine-tuning (SFT) versions of LMT-60 in four sizes (0.6B/1.7B/4B/8B). All checkpoints are available:
+| Models | Model Link |
+|:------------|:------------|
+| LMT-60-0.6B-Base | [NiuTrans/LMT-60-0.6B-Base](https://huggingface.co/NiuTrans/LMT-60-0.6B-Base) |
+| LMT-60-0.6B | [NiuTrans/LMT-60-0.6B](https://huggingface.co/NiuTrans/LMT-60-0.6B) |
+| LMT-60-1.7B-Base | [NiuTrans/LMT-60-1.7B-Base](https://huggingface.co/NiuTrans/LMT-60-1.7B-Base) |
+| LMT-60-1.7B | [NiuTrans/LMT-60-1.7B](https://huggingface.co/NiuTrans/LMT-60-1.7B) |
+| LMT-60-4B-Base | [NiuTrans/LMT-60-4B-Base](https://huggingface.co/NiuTrans/LMT-60-4B-Base) |
+| LMT-60-4B | [NiuTrans/LMT-60-4B](https://huggingface.co/NiuTrans/LMT-60-4B) |
+| LMT-60-8B-Base | [NiuTrans/LMT-60-8B-Base](https://huggingface.co/NiuTrans/LMT-60-8B-Base) |
+| LMT-60-8B | [NiuTrans/LMT-60-8B](https://huggingface.co/NiuTrans/LMT-60-8B) |
+Our supervised fine-tuning (SFT) data are released at [NiuTrans/LMT-60-sft-data](https://huggingface.co/datasets/NiuTrans/LMT-60-sft-data).
+## Quickstart
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "NiuTrans/LMT-60-8B"
+# Left padding keeps prompts aligned at the end, as decoder-only generation expects.
+tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='left')
+model = AutoModelForCausalLM.from_pretrained(model_name)
+# Prompt layout: instruction line, labeled source text, then an empty target-language label.
+prompt = "Translate the following text from English into Chinese.\nEnglish: The concept came from China where plum blossoms were the flower of choice.\nChinese: "
+messages = [{"role": "user", "content": prompt}]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+# Deterministic beam search: 5 beams, no sampling.
+generated_ids = model.generate(**model_inputs, max_new_tokens=512, num_beams=5, do_sample=False)
+# Drop the prompt tokens so only the newly generated tokens remain.
+output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
+# decode() turns this single sequence of token IDs into one string;
+# batch_decode() would treat each ID as a separate sequence.
+outputs = tokenizer.decode(output_ids, skip_special_tokens=True)
+print("response:", outputs)
+```
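The Quickstart prompt follows a fixed pattern: an instruction naming the source and target languages, the source text labeled with the source language, and an empty target-language label. A minimal sketch of a helper that reproduces this pattern for other pairs is given below. `build_prompt` is a hypothetical name, and using the same English language names and layout for every pair is an assumption extrapolated from the single example above, not a documented requirement.

```python
def build_prompt(src_lang: str, tgt_lang: str, text: str) -> str:
    """Format a translation request in the style of the Quickstart prompt.

    Assumption: language names are written in English, as in the
    English -> Chinese example from the Quickstart.
    """
    return (
        f"Translate the following text from {src_lang} into {tgt_lang}.\n"
        f"{src_lang}: {text}\n"
        f"{tgt_lang}: "
    )


# Rebuilds the exact prompt string used in the Quickstart.
prompt = build_prompt(
    "English",
    "Chinese",
    "The concept came from China where plum blossoms were the flower of choice.",
)
print(prompt)
```

The resulting string can be passed as the `content` of the user message in the Quickstart code.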
+## Supported Languages
+| Resource Tier | Languages |
+| :---- | :---- |
+| High-resource Languages (13) | Arabic(ar), English(en), Spanish(es), German(de), French(fr), Italian(it), Japanese(ja), Dutch(nl), Polish(pl), Portuguese(pt), Russian(ru), Turkish(tr), Chinese(zh) |
+| Medium-resource Languages (18) | Bulgarian(bg), Bengali(bn), Czech(cs), Danish(da), Modern Greek(el), Persian(fa), Finnish(fi), Hindi(hi), Hungarian(hu), Indonesian(id), Korean(ko), Norwegian(no), Romanian(ro), Slovak(sk), Swedish(sv), Thai(th), Ukrainian(uk), Vietnamese(vi) |
+| Low-resource Languages (29) | Amharic(am), Azerbaijani(az), Tibetan(bo), Modern Hebrew(he), Croatian(hr), Armenian(hy), Icelandic(is), Javanese(jv), Georgian(ka), Kazakh(kk), Central Khmer(km), Kirghiz(ky), Lao(lo), Chinese Mongolian(mn_cn), Marathi(mr), Malay(ms), Burmese(my), Nepali(ne), Pashto(ps), Sinhala(si), Swahili(sw), Tamil(ta), Telugu(te), Tajik(tg), Tagalog(tl), Uighur(ug), Urdu(ur), Uzbek(uz), Yue Chinese(yue) |
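The figure of 234 translation directions is consistent with counting every ordered language pair in which at least one side is English or Chinese. That reading of "Chinese-English-centric" is an assumption here, not something the card states. The sketch below enumerates those pairs from the language codes in the table above; the names `PIVOTS` and `directions` are illustrative.

```python
from itertools import permutations

# Language codes copied from the table above (13 + 18 + 29 = 60).
HIGH = "ar en es de fr it ja nl pl pt ru tr zh".split()
MEDIUM = "bg bn cs da el fa fi hi hu id ko no ro sk sv th uk vi".split()
LOW = (
    "am az bo he hr hy is jv ka kk km ky lo mn_cn mr ms my ne ps "
    "si sw ta te tg tl ug ur uz yue"
).split()
LANGS = HIGH + MEDIUM + LOW

# Assumption: a direction is covered when English or Chinese is on either side.
PIVOTS = {"en", "zh"}

directions = [
    (src, tgt)
    for src, tgt in permutations(LANGS, 2)
    if src in PIVOTS or tgt in PIVOTS
]

print(len(LANGS), len(directions))  # prints: 60 234
```

Equivalently, each pivot pairs with 59 other languages in both directions (2 × 59 × 2 = 236), and the en-zh and zh-en directions are counted twice, giving 236 - 2 = 234.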
+## Citation
+If you find our work useful for your research, please cite our paper:
+```bibtex
+@misc{luoyf2025lmt,
+      title={Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs},
+      author={Yingfeng Luo and Ziqiang Xu and Yuxuan Ouyang and Murun Yang and Dingyang Lin and Kaiyan Chang and Tong Zheng and Bei Li and Peinan Feng and Quan Du and Tong Xiao and Jingbo Zhu},
+      year={2025},
+      eprint={2511.07003},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2511.07003},
+}
+```