lbourdois committed
Commit ee05fcd · verified · 1 Parent(s): 259e5b8

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
  1. README.md +175 -161
README.md CHANGED
@@ -1,162 +1,176 @@
- ---
- library_name: transformers
- license: apache-2.0
- base_model: Qwen/Qwen2.5-7B
- datasets:
- - allenai/tulu-3-sft-mixture
- ---
-
- # Teleut 7b
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/634262af8d8089ebaefd410e/UqIi8eztdptvt52Mak_1K.png)
-
- A replication attempt of Tulu 3 on the Qwen 2.5 base models.
-
- ## Evals (so far)
- | | Teleut 7B (measured) | Tülu 3 SFT 8B (reported) | Qwen 2.5 7B Instruct (reported) | Ministral 8B (reported) | Mistral 7B v0.3 (reported)
- |-------------------------|----------------------|--------------------------|---------------------------------|-------------------------|---------------------------
- |BBH (3 shot, CoT) |*64.4%* |**67.9%** |21.7% |56.2% |47.0%<sup>NLL</sup>
- |GSM8K (8 shot, CoT) |78.5% |76.2% |**83.8%** |*80.0%* |xx.x%
- |IFEval (prompt loose) |66.3% |*72.8%* |**74.7%** |56.4% |53.0%
- |MMLU (0 shot, CoT) |*73.2%* |65.9% |**76.6%** |68.5% |30.7%<sup>5-shot</sup>
- |MMLU Pro (0 shot, CoT) |*48.3%* |44.3% |**56.3%**<sup>Unknown</sup> |32.9%<sup>5-shot</sup> |30.7%<sup>5-shot</sup>
- |PopQA (15 shot) |18.9% |**29.3%** |18.1% |*20.2%* |xx.x%
- |TruthfulQA |47.2% |46.8% |**63.1%** |*55.5%* |xx.x%
-
- ## Credits
- Big thanks to Retis Labs for providing my 8xH100 polycule used to train and test this model!
- Another big thanks to AllenAI for publishing the Tülu 3 data and model series (as well as the paper and details on training), and to Alibaba for training the original Qwen 2.5 base model series!
-
- ```
- @article{lambert2024tulu3,
- title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
- author = {
- Nathan Lambert and
- Jacob Morrison and
- Valentina Pyatkin and
- Shengyi Huang and
- Hamish Ivison and
- Faeze Brahman and
- Lester James V. Miranda and
- Alisa Liu and
- Nouha Dziri and
- Shane Lyu and
- Yuling Gu and
- Saumya Malik and
- Victoria Graf and
- Jena D. Hwang and
- Jiangjiang Yang and
- Ronan Le Bras and
- Oyvind Tafjord and
- Chris Wilhelm and
- Luca Soldaini and
- Noah A. Smith and
- Yizhong Wang and
- Pradeep Dasigi and
- Hannaneh Hajishirzi
- },
- year = {2024},
- email = {tulu@allenai.org}
- }
- ```
-
- ## Training procedure
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 3.5e-06
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 128 (8 micro-batch x 2 accumulation steps x 8 GPUs)
- - total_eval_batch_size: 64
- - optimizer: paged_ademamix_8bit (no additional optimizer arguments)
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 370
- - num_epochs: 1
-
- ### Framework versions
-
- - Transformers 4.46.3
- - Pytorch 2.5.1+cu124
- - Datasets 3.1.0
- - Tokenizers 0.20.3
-
- ### Configuration
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.5.2`
- ```yaml
- base_model: Qwen/Qwen2.5-7B
-
- plugins:
-   - axolotl.integrations.liger.LigerPlugin
- liger_rope: true
- liger_rms_norm: true
- liger_glu_activation: true
- liger_fused_linear_cross_entropy: true
-
- strict: false
-
- chat_template: chatml
- datasets:
-   - path: allenai/tulu-3-sft-mixture
-     type: chat_template
-     split: train
-     field_messages: messages
-
- dataset_prepared_path: last_run_prepared
- #val_set_size: 0.02
- output_dir: ./ckpts
-
- sequence_len: 8192
- #sample_packing: true
- pad_to_sequence_len: true
-
- wandb_project: qwen-2.5-7b-sft
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
-
- gradient_accumulation_steps: 2
- micro_batch_size: 8
- num_epochs: 1
- optimizer: paged_ademamix_8bit
- lr_scheduler: cosine
- learning_rate: 3.5e-6
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: true
- gradient_checkpointing_kwargs:
-   use_reentrant: false
- early_stopping_patience:
- resume_from_checkpoint:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- deepspeed: deepspeed_configs/zero3_bf16.json
-
- warmup_steps: 370
- #evals_per_epoch: 4
- eval_table_size:
- saves_per_epoch: 2
- debug:
- weight_decay: 0.0
-
- ```
-
  </details><br>
 
+ ---
+ library_name: transformers
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-7B
+ datasets:
+ - allenai/tulu-3-sft-mixture
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+
+ # Teleut 7b
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/634262af8d8089ebaefd410e/UqIi8eztdptvt52Mak_1K.png)
+
+ A replication attempt of Tulu 3 on the Qwen 2.5 base models.
+
+ ## Evals (so far)
+ | | Teleut 7B (measured) | Tülu 3 SFT 8B (reported) | Qwen 2.5 7B Instruct (reported) | Ministral 8B (reported) | Mistral 7B v0.3 (reported)
+ |-------------------------|----------------------|--------------------------|---------------------------------|-------------------------|---------------------------
+ |BBH (3 shot, CoT) |*64.4%* |**67.9%** |21.7% |56.2% |47.0%<sup>NLL</sup>
+ |GSM8K (8 shot, CoT) |78.5% |76.2% |**83.8%** |*80.0%* |xx.x%
+ |IFEval (prompt loose) |66.3% |*72.8%* |**74.7%** |56.4% |53.0%
+ |MMLU (0 shot, CoT) |*73.2%* |65.9% |**76.6%** |68.5% |30.7%<sup>5-shot</sup>
+ |MMLU Pro (0 shot, CoT) |*48.3%* |44.3% |**56.3%**<sup>Unknown</sup> |32.9%<sup>5-shot</sup> |30.7%<sup>5-shot</sup>
+ |PopQA (15 shot) |18.9% |**29.3%** |18.1% |*20.2%* |xx.x%
+ |TruthfulQA |47.2% |46.8% |**63.1%** |*55.5%* |xx.x%
+
+ ## Credits
+ Big thanks to Retis Labs for providing my 8xH100 polycule used to train and test this model!
+ Another big thanks to AllenAI for publishing the Tülu 3 data and model series (as well as the paper and details on training), and to Alibaba for training the original Qwen 2.5 base model series!
+
+ ```
+ @article{lambert2024tulu3,
+ title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
+ author = {
+ Nathan Lambert and
+ Jacob Morrison and
+ Valentina Pyatkin and
+ Shengyi Huang and
+ Hamish Ivison and
+ Faeze Brahman and
+ Lester James V. Miranda and
+ Alisa Liu and
+ Nouha Dziri and
+ Shane Lyu and
+ Yuling Gu and
+ Saumya Malik and
+ Victoria Graf and
+ Jena D. Hwang and
+ Jiangjiang Yang and
+ Ronan Le Bras and
+ Oyvind Tafjord and
+ Chris Wilhelm and
+ Luca Soldaini and
+ Noah A. Smith and
+ Yizhong Wang and
+ Pradeep Dasigi and
+ Hannaneh Hajishirzi
+ },
+ year = {2024},
+ email = {tulu@allenai.org}
+ }
+ ```
+
+ ## Training procedure
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 3.5e-06
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 128 (8 micro-batch x 2 accumulation steps x 8 GPUs)
+ - total_eval_batch_size: 64
+ - optimizer: paged_ademamix_8bit (no additional optimizer arguments)
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 370
+ - num_epochs: 1
+
+ ### Framework versions
+
+ - Transformers 4.46.3
+ - Pytorch 2.5.1+cu124
+ - Datasets 3.1.0
+ - Tokenizers 0.20.3
+
+ ### Configuration
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.5.2`
+ ```yaml
+ base_model: Qwen/Qwen2.5-7B
+
+ plugins:
+   - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_glu_activation: true
+ liger_fused_linear_cross_entropy: true
+
+ strict: false
+
+ chat_template: chatml
+ datasets:
+   - path: allenai/tulu-3-sft-mixture
+     type: chat_template
+     split: train
+     field_messages: messages
+
+ dataset_prepared_path: last_run_prepared
+ #val_set_size: 0.02
+ output_dir: ./ckpts
+
+ sequence_len: 8192
+ #sample_packing: true
+ pad_to_sequence_len: true
+
+ wandb_project: qwen-2.5-7b-sft
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 2
+ micro_batch_size: 8
+ num_epochs: 1
+ optimizer: paged_ademamix_8bit
+ lr_scheduler: cosine
+ learning_rate: 3.5e-6
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: false
+ early_stopping_patience:
+ resume_from_checkpoint:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ deepspeed: deepspeed_configs/zero3_bf16.json
+
+ warmup_steps: 370
+ #evals_per_epoch: 4
+ eval_table_size:
+ saves_per_epoch: 2
+ debug:
+ weight_decay: 0.0
+
+ ```
+
  </details><br>
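The axolotl config in this diff sets `chat_template: chatml` for the Tulu SFT messages. As a rough illustration of what that template produces (the `to_chatml` helper below is hypothetical, not part of axolotl or this repo; it assumes the standard `<|im_start|>`/`<|im_end|>` ChatML markers):

```python
# Minimal sketch of ChatML formatting. Assumption: standard ChatML markers;
# this helper is illustrative only and does not come from the Teleut code.
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

In practice, transformers applies the model's own template via `tokenizer.apply_chat_template`; this sketch only shows the wire format the config selects.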