radinplaid committed (verified)
Commit 9dfc21f · 1 Parent(s): 95dbaa6

Update README.md

Files changed (1):
  1. README.md +22 -36
README.md CHANGED
@@ -1,39 +1,39 @@
  ---
  language:
  - en
- - ru
+ - pt
  tags:
  - translation
  license: cc-by-4.0
  datasets:
- - quickmt/quickmt-train.ru-en
+ - quickmt/quickmt-train.pt-en
  model-index:
- - name: quickmt-en-ru
+ - name: quickmt-en-pt
    results:
    - task:
-       name: Translation eng-rus
+       name: Translation eng-por
        type: translation
-       args: eng-rus
+       args: eng-por
      dataset:
        name: flores101-devtest
        type: flores_101
-       args: eng_Latn rus_Cyrl devtest
+       args: eng_Latn por_Latn devtest
      metrics:
      - name: BLEU
        type: bleu
-       value: 32.29
+       value: 50.62
      - name: CHRF
        type: chrf
-       value: 59.12
+       value: 71.79
      - name: COMET
        type: comet
-       value: 87.77
+       value: 89.27
  ---


- # `quickmt-en-ru` Neural Machine Translation Model
+ # `quickmt-en-pt` Neural Machine Translation Model

- `quickmt-en-ru` is a reasonably fast and reasonably accurate neural machine translation model for translation from `en` into `ru`.
+ `quickmt-en-pt` is a reasonably fast and reasonably accurate neural machine translation model for translation from `en` into `pt`.


  ## Model Information
@@ -42,7 +42,7 @@ model-index:
  * 185M parameter transformer 'big' with 8 encoder layers and 2 decoder layers
  * 20k sentencepiece vocabularies
  * Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format
- * Training data: https://huggingface.co/datasets/quickmt/quickmt-train.ru-en/tree/main
+ * Training data: https://huggingface.co/datasets/quickmt/quickmt-train.pt-en/tree/main

  See the `eole` model configuration in this repository for further details and the `eole-model` for the raw `eole` (pytorch) model.

@@ -57,7 +57,7 @@ Next, install the `quickmt` python library and download the model:
  git clone https://github.com/quickmt/quickmt.git
  pip install ./quickmt/

- quickmt-model-download quickmt/quickmt-en-ru ./quickmt-en-ru
+ quickmt-model-download quickmt/quickmt-en-pt ./quickmt-en-pt
  ```

  Finally use the model in python:
@@ -73,29 +73,15 @@ sample_text = 'Dr. Ehud Ur, professor of medicine at Dalhousie University in Hal
  t(sample_text, beam_size=5)
  ```

- > 'Доктор Эхуд Ур, профессор медицины в Университете Далхаузи в Галифаксе, Новая Шотландия, и председатель клинического и научного отдела Канадской диабетической ассоциации предупредил, что исследование все еще находится на ранних этапах.'
-
- ```python
- # Get alternative translations by sampling
- # You can pass any cTranslate2 `translate_batch` arguments
- t([sample_text], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
- ```
-
- > 'Доктор Ehud Ур (Ehud Ur), профессор медицины в университете Далхаузи в Галифаксе, Новая Шотландия, а также профессор кафедры клинической и научной литературы Канадской диабетической ассоциации предупреждает, что исследование еще проводится в ранние годы работы.'
-
-
- The model is in `ctranslate2` format, and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible to get this model to work with e.g. [LibreTranslate](https://libretranslate.com/) which also uses `ctranslate2` and `sentencepiece`.
-
-
  ## Metrics

- `bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores) ("eng_Latn"->"rus_Cyrl"). `comet22` is calculated with the [`comet`](https://github.com/Unbabel/COMET) library and the [default model](https://huggingface.co/Unbabel/wmt22-comet-da). "Time (s)" is the time in seconds to translate (using `ctranslate2`) the flores-devtest dataset (1012 sentences) on an RTX 4070s GPU with batch size 32 (faster translation is possible with a larger batch size).
+ `bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores) ("eng_Latn"->"por_Latn"). `comet22` is calculated with the [`comet`](https://github.com/Unbabel/COMET) library and the [default model](https://huggingface.co/Unbabel/wmt22-comet-da). "Time (s)" is the time in seconds to translate (using `ctranslate2`) the flores-devtest dataset (1012 sentences) on an RTX 4070s GPU with batch size 32 (faster translation is possible with a larger batch size).
+
+ |                                  |   bleu |   chrf2 |   comet22 |   Time (s) |
+ |:---------------------------------|-------:|--------:|----------:|-----------:|
+ | quickmt-en-pt                    |  50.62 |   71.79 |     89.27 |       0.97 |
+ | facebook/nllb-200-distilled-600M |  47.68 |   70.28 |     89.05 |      23.75 |
+ | facebook/nllb-200-distilled-1.3B |  48.92 |   70.96 |     89.77 |      41.13 |
+ | facebook/m2m100_418M             |  41.14 |   65.85 |     85.49 |      19.08 |
+ | facebook/m2m100_1.2B             |  46.56 |   69.41 |     88.53 |      37.42 |

- |                                  |   bleu |   chrf2 |   comet22 |   Time (s) |
- |:---------------------------------|-------:|--------:|----------:|-----------:|
- | quickmt/quickmt-en-ru            |  32.29 |   59.12 |     87.77 |       1.43 |
- | Helsinki-NLP/opus-mt-en-ru       |  26.59 |   54.91 |     85.26 |       4.37 |
- | facebook/nllb-200-distilled-600M |  28.79 |   56.58 |     87.58 |      26.71 |
- | facebook/nllb-200-distilled-1.3B |  31.5  |   58.63 |     89.26 |      46.57 |
- | facebook/m2m100_418M             |  23.16 |   51.73 |     82.12 |      20.51 |
- | facebook/m2m100_1.2B             |  28.88 |   56.61 |     87    |      41.15 |
 
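The README's full Python example survives here only as context lines in the last hunk (`t(sample_text, beam_size=5)`). A minimal sketch of the complete flow it abbreviates, assuming the `Translator` class and constructor arguments shown in the `quickmt` documentation; the sample sentence is a placeholder:

```python
# Sketch of the quickmt usage flow. The Translator constructor arguments
# are assumptions based on the quickmt documentation; the call
# t(sample_text, beam_size=5) is the one shown in the README.
from quickmt import Translator

# Load the model directory fetched by `quickmt-model-download`
t = Translator("./quickmt-en-pt/", device="auto")

sample_text = "The research is still in its early days."  # placeholder input

print(t(sample_text, beam_size=5))
```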
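Since the Model Information section says the model is exported to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format with SentencePiece tokenizers, it can also be driven without `quickmt`. A sketch under that assumption; the tokenizer file names are guesses about the repository layout, so check the downloaded directory:

```python
# Direct ctranslate2 + sentencepiece inference, bypassing quickmt.
# src.spm.model / tgt.spm.model are assumed file names.
import ctranslate2
import sentencepiece as spm

translator = ctranslate2.Translator("./quickmt-en-pt/", device="auto")
sp_src = spm.SentencePieceProcessor(model_file="./quickmt-en-pt/src.spm.model")
sp_tgt = spm.SentencePieceProcessor(model_file="./quickmt-en-pt/tgt.spm.model")

text = "The weather in Lisbon is pleasant this time of year."
tokens = sp_src.encode(text, out_type=str)            # text -> subword pieces
result = translator.translate_batch([tokens], beam_size=5)
print(sp_tgt.decode(result[0].hypotheses[0]))         # pieces -> text
```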
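The Metrics paragraph names the exact tools, so the scores in the table can in principle be reproduced along these lines. The file names below are placeholders for one-segment-per-line flores-devtest files; sacrebleu's default chrF configuration (beta=2) is what the table reports as `chrf2`:

```python
# Sketch of the metric computation described in the Metrics section.
import sacrebleu
from comet import download_model, load_from_checkpoint

srcs = open("sources.txt").read().splitlines()      # eng_Latn inputs (placeholder file)
hyps = open("hypotheses.txt").read().splitlines()   # model output (placeholder file)
refs = open("references.txt").read().splitlines()   # por_Latn references (placeholder file)

print(sacrebleu.corpus_bleu(hyps, [refs]).score)    # bleu
print(sacrebleu.corpus_chrf(hyps, [refs]).score)    # chrf2 (beta=2 is the default)

# comet22: the default wmt22-comet-da model linked in the README
model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)]
print(model.predict(data, batch_size=32).system_score)
```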