| | --- |
| | license: apache-2.0 |
| | language: |
| | - zh |
| | - en |
| | pipeline_tag: text-generation |
| | --- |
| | <div align="left"> |
| | <h1> |
| | Mengzi3-13B-Base |
| | </h1> |
| | </div> |
| |
|
| | <p align="center"> |
| | <img src="https://raw.githubusercontent.com/Langboat/Mengzi3/main/assets/mengzi_logo.png" width="200"/> |
| | <p> |
| | |
| | <p align="center"> |
| | 🤗 <a href="https://huggingface.co/Langboat">Hugging Face</a> | 🤖 <a href="https://modelscope.cn/organization/Langboat">ModelScope</a> | <a href="https://gitee.com/mindspore/mindformers/blob/r1.0/research/mengzi3/mengzi3.md"><img src="https://www.mindspore.cn/_static/logo-zh-light.99fc9222.svg" width="50" style="white-space: nowrap;display: inline-block;overflow: hidden;max-width: 100%;"/></a> | <a href="https://wisemodel.cn/organization/Langboat">Wisemodel</a> | 💬 <a href="https://github.com/Langboat/Mengzi3/blob/main/assets/wechat.png">WeChat</a> | <a href="https://www.langboat.com/document/mengzi/mengzi-gpt/call">API</a> | <a href="https://www.langboat.com/portal/mengzi-gpt"><img src="https://raw.githubusercontent.com/Langboat/Mengzi3/main/assets/mengzi_logo.png" width="16" style="white-space: nowrap;display: inline-block;overflow: hidden;max-width: 100%;"/> 孟子GPT</a> |
| | </p> |
| | |
| | # 模型介绍/Introduction |
| |
|
| | 本次开源Mengzi3 13B系列模型,模型的地址如下: |
| |
|
| | | | Mengzi3-13B-Base | Mengzi3-13B-Chat | |
| | | :-: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------: | |
| | | 13B | [🤗](https://huggingface.co/Langboat/Mengzi3-13B-Base) / [🤖](https://modelscope.cn/Langboat/Mengzi3-13B-Base) / [MindSpore](https://gitee.com/mindspore/mindformers/blob/r1.0/research/mengzi3/mengzi3.md) / [Wisemodel](https://wisemodel.cn/models/Langboat/Mengzi3-13B-Base) | 敬请期待 | |
| |
|
| | Mengzi3-13B模型基于Llama架构,语料精选自网页、百科、社交、媒体、新闻,以及高质量的开源数据集。通过在万亿tokens上进行多语言语料的继续训练,模型的中文能力突出并且兼顾多语言能力。 |
| |
|
| | Mengzi3-13B is based on the Llama architecture, and the corpus is selected from web pages, encyclopedias, social networking, media, news, and high-quality open source data sets. By continuing to train multilingual corpus on trillions of tokens, the model has outstanding Chinese capabilities and takes into account multilingual capabilities. |
| |
|
| | # 快速开始/Quickstart |
| |
|
| | ```python |
| | import torch |
| | from transformers import AutoModelForCausalLM, AutoTokenizer |
| | |
| | tokenizer = AutoTokenizer.from_pretrained("Langboat/Mengzi3-13B-Base", use_fast=False, trust_remote_code=True) |
| | model = AutoModelForCausalLM.from_pretrained("Langboat/Mengzi3-13B-Base", device_map="auto", trust_remote_code=True) |
| | inputs = tokenizer('指令:回答以下问题。输入:介绍一下孟子。输出:', return_tensors='pt') |
| | if torch.cuda.is_available(): |
| | inputs = inputs.to('cuda') |
| | pred = model.generate(**inputs, max_new_tokens=512, repetition_penalty=1.01, eos_token_id=tokenizer.eos_token_id) |
| | print(tokenizer.decode(pred[0], skip_special_tokens=True)) |
| | ``` |
| |
|
| | 详细的模型推理和微调代码见[Github](https://github.com/Langboat/Mengzi3) |
| |
|
| | Detailed code of model reasoning and finetune see [Github](https://github.com/Langboat) |
| |
|
| | # 性能评测/Evaluation |
| |
|
| | Mengzi3-13B-Base在各项基准测试中与同等参数量大语言模型相比,语言能力成绩领先,数学和编程能力位于前列。 |
| |
|
| | Mengzi3-13B-Base leads in language proficiency and is at the forefront in math and programming proficiency compared to the equivalent large language model in various benchmark tests. |
| |
|
| | | | MMLU | CMMLU | OCNLI | GSM8K | HumanEval | |
| | | :------------------------: | :---------------------: | :---------------------: | :---------------------: | :---: | :-------: | |
| | | Baichuan2-13B-Base | 0.530 | 0.489 | 0.433 | 0.528 | 0.171 | |
| | | Qwen-14B | 0.589 | 0.539 | 0.550 | 0.613 | 0.323 | |
| | | ChatGLM3-6B-base | 0.551 | 0.495 | 0.754 | 0.723 | - | |
| | | InternLM2-20B | 0.610 | 0.538 | 0.650 | 0.761 | 0.488 | |
| | | Skywork-13B-base | 0.557 | 0.524 | 0.426 | 0.558 | - | |
| | | LingoWhale-8B | 0.541 | 0.495 | 0.352 | 0.550 | 0.329 | |
| | | DeepSeek-7B | 0.436 | 0.424 | 0.356 | 0.174 | 0.262 | |
| | | DeepSeek-MoE-16B-base | 0.423 | 0.388 | 0.342 | 0.188 | 0.268 | |
| | | MindSource-7B | 0.498 | 0.425 | 0.528 | - | - | |
| | | **Mengzi3-13B-Base** | **0.651 (+6.7%)** | **0.588 (+9.1%)** | **0.776 (+2.9%)** | 0.631 | 0.287 | |
| |
|
| | > 以上结果基于5-shot,MMLU/CMMLU/OCNLI结果来自[FlagEval](https://flageval.baai.ac.cn/) |
| | > |
| | > The above results are based on 5-shot,MMLU/CMMLU/OCNLI results from [FlagEval](https://flageval.baai.ac.cn/) |
| |
|
| | # 协议/License Agreement |
| |
|
| | Mengzi3-13B-Base依照Apache 2.0协议开源,对学术研究完全开放,同时支持免费商用。如需申请商业许可证,请[联系我们](https://www.langboat.com/form?p=3),其他商务合作请联系[bd@langboat.com](mailto:bd@langboat.com)。 |
| |
|
| | Mengzi3-13B-Base is open source under the Apache 2.0 protocol, fully open for academic research, and free for commercial use. If you need to apply for business license, please [contact us](https://www.langboat.com/en/form?p=3), other business cooperation, please contact [bd@langboat.com](mailto:bd@langboat.com). |
| |
|