Qwen-7B | Qwen-7B-Chat | Demo | Report
Qwen-7B is the 7B-parameter model of the Qwen (abbr. of Tongyi Qianwen) large language model series developed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model pretrained on very large-scale data of diverse and broad coverage, including web text, professional books, code, and more. On top of the pretrained Qwen-7B, we also use alignment techniques to build Qwen-7B-Chat, a large-model-based AI assistant. This repository is the one for Qwen-7B.

For more details about the open-source Qwen-7B model and its main features, please refer to the GitHub code repository.
To run Qwen-7B, please make sure that your PyTorch version is not lower than 1.12, and then execute the following pip commands to install the dependencies:
pip install modelscope -U
pip install transformers_stream_generator
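As a quick sanity check (not part of the official instructions), you can confirm that the installed PyTorch build satisfies the 1.12 requirement:

import torch

# Strip any build suffix such as "+cu118" before comparing the version.
version = torch.__version__.split("+")[0]
major, minor = (int(x) for x in version.split(".")[:2])
assert (major, minor) >= (1, 12), f"torch {torch.__version__} is older than the required 1.12"
print(f"torch {torch.__version__} OK")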
In addition, we recommend installing the flash-attention library for higher efficiency and lower GPU memory usage:
git clone -b v1.0.8 https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
pip install csrc/layer_norm
pip install csrc/rotary
You can easily call the model with the following code:
from modelscope import AutoModelForCausalLM, AutoTokenizer
from modelscope import GenerationConfig

# Note: the default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen-7B", revision='v.1.0.4', trust_remote_code=True)

# We recommend checking BF16 support first. Run the commands below:
# import torch
# torch.cuda.is_bf16_supported()

# use bf16
model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B", device_map="auto", revision='v.1.0.4', trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B", device_map="auto", revision='v.1.0.4', trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B", device_map="cpu", revision='v.1.0.4', trust_remote_code=True).eval()
# use fp32
# model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B", device_map="auto", revision='v.1.0.4', trust_remote_code=True).eval()

# Generation hyperparameters such as the maximum length and top_p can be specified here.
model.generation_config = GenerationConfig.from_pretrained("qwen/Qwen-7B", revision='v.1.0.4', trust_remote_code=True)

inputs = tokenizer('蒙古國的首都是烏蘭巴托(Ulaanbaatar)\n冰島的首都是雷克雅未克(Reykjavik)\n埃塞俄比亞的首都是', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
# Expected output: 蒙古國的首都是烏蘭巴托(Ulaanbaatar)\n冰島的首都是雷克雅未克(Reykjavik)\n埃塞俄比亞的首都是亞的斯亞貝巴(Addis Ababa)...
Code link: https://github.com/modelscope/swift/tree/main/examples/pytorch/llm

The script below performs QLoRA SFT on qwen-7b (requires 16 GB of GPU memory):
git clone https://github.com/modelscope/swift.git
cd swift/examples/pytorch/llm
# 16GB VRAM
CUDA_VISIBLE_DEVICES=0 \
python src/llm_sft.py \
--model_type qwen-7b \
--sft_type lora \
--dtype bf16 \
--output_dir runs \
--dataset alpaca-en,alpaca-zh \
--dataset_sample -1 \
--num_train_epochs 1 \
--max_length 1024 \
--quantization_bit 4 \
--lora_rank 64 \
--lora_alpha 32 \
--lora_dropout_p 0.05 \
--lora_target_modules ALL \
--batch_size 1 \
--weight_decay 0. \
--learning_rate 1e-4 \
--gradient_accumulation_steps 16 \
--max_grad_norm 0.5 \
--warmup_ratio 0.03 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 10 \
--use_flash_attn false \
--push_to_hub false \
--hub_model_id qwen-7b-qlora \
--hub_private_repo true \
--hub_token 'your-sdk-token'
For more usage instructions, please refer to our GitHub repo.
Qwen-7B模型規(guī)?;厩闆r如下所示:
The details of the model architecture of Qwen-7B are listed as follows:
Hyperparameter | Value |
---|---|
n_layers | 32 |
n_heads | 32 |
d_model | 4096 |
vocab size | 151851 |
sequence length | 2048 |
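If you have already loaded the model as in the example above, you can cross-check these values against the loaded configuration. The sketch below assumes common Hugging Face-style attribute names, which may differ in the released config:

# Assumes `model` was loaded with AutoModelForCausalLM.from_pretrained(...) as shown earlier.
cfg = model.config
for name in ("num_hidden_layers", "num_attention_heads", "hidden_size", "vocab_size", "seq_length"):
    # The attribute names above are assumptions; print(cfg) shows the full set of fields.
    print(name, getattr(cfg, name, None))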
For positional encoding, the FFN activation function, and normalization, we adopt the currently prevalent practices: RoPE relative positional encoding, SwiGLU activation, and RMSNorm (with optional flash-attention for acceleration).

For tokenization, whereas most mainstream open-source models use vocabularies centered on Chinese and English, Qwen-7B uses a vocabulary of more than 150K tokens. It is built on the cl100k_base BPE vocabulary used by GPT-4 and optimized for Chinese and multiple other languages. While encoding and decoding Chinese, English, and code efficiently, it is also friendlier to several other languages, so users can strengthen those languages without extending the vocabulary. The vocabulary splits numbers into individual digits, and tokenization is done with the efficient tiktoken library (see the short tokenizer check after this subsection).

We randomly sampled 1 million documents per language to compare the encoding compression ratio of different models (with XLM-R, which supports 100 languages, as the baseline value of 1; lower is better); the results are shown in the figure. While keeping efficient decoding of Chinese, English, and code, Qwen-7B also achieves a high compression ratio for many widely used languages (such as Thai th, Hebrew he, Arabic ar, Korean ko, Vietnamese vi, Japanese ja, Turkish tr, Indonesian id, Polish pl, Russian ru, Dutch nl, Portuguese pt, Italian it, German de, Spanish es, and French fr), giving the model strong scalability as well as high training and inference efficiency in these languages.

For pretraining data, Qwen-7B uses part of the open-source generic corpora on the one hand, and a massive amount of accumulated web corpora and high-quality text on the other. After deduplication and filtering, the corpus exceeds 2.2T tokens and covers web text, encyclopedias, books, code, mathematics, and various vertical domains.
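The snippet below is a minimal sketch of the tokenizer check referenced above, using the tokenizer loaded in the quickstart example; the exact token boundaries are illustrative rather than guaranteed:

# Assumes `tokenizer` was loaded with AutoTokenizer.from_pretrained(...) as shown earlier.
ids = tokenizer("圓周率約等于3.14159")["input_ids"]
print(len(ids))                              # total number of tokens
print([tokenizer.decode([i]) for i in ids])  # each digit of 3.14159 is expected to be its own token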
C-Eval is a common benchmark for evaluating the Chinese knowledge capability of pretrained models. It covers 52 subjects across four broad areas: humanities, social sciences, STEM, and other specialties. Following standard practice, we use the development set samples as the few-shot source and report the 5-shot accuracy of the pretrained Qwen-7B model on the validation and test sets.
The accuracy of Qwen-7B compared with other models on the C-Eval validation set is as follows:
Model | Avg. |
---|---|
Alpaca-7B | 28.9 |
Vicuna-7B | 31.2 |
ChatGLM-6B | 37.1 |
Baichuan-7B | 42.7 |
ChatGLM2-6B | 50.9 |
InternLM-7B | 53.4 |
ChatGPT | 53.5 |
Claude-v1.3 | 55.5 |
Qwen-7B | 60.8 |
The comparison of the pretrained Qwen-7B model with other models on the C-Eval test set is shown in the following table:
Model | Avg. | Avg. (Hard) | STEM | Social Sciences | Humanities | Others |
---|---|---|---|---|---|---|
ChatGLM-6B | 38.9 | 29.2 | 33.3 | 48.3 | 41.3 | 38.0 |
Chinese-Alpaca-Plus-13B | 41.5 | 30.5 | 36.6 | 49.7 | 43.1 | 41.2 |
Baichuan-7B | 42.8 | 31.5 | 38.2 | 52.0 | 46.2 | 39.3 |
WestlakeLM-19B | 44.6 | 34.9 | 41.6 | 51.0 | 44.3 | 44.5 |
AndesLM-13B | 46.0 | 29.7 | 38.1 | 61.0 | 51.0 | 41.9 |
BatGPT-15B-sirius | 47.0 | 31.9 | 42.7 | 57.5 | 48.6 | 43.6 |
ChatGLM2-6B | 51.7 | 37.1 | 48.6 | 60.5 | 51.3 | 49.8 |
InternLM-7B | 52.8 | 37.1 | 48.0 | 67.4 | 55.4 | 45.8 |
Baichuan-13B | 53.6 | 36.7 | 47.0 | 66.8 | 57.3 | 49.8 |
Claude-v1.3 | 54.2 | 39.0 | 51.9 | 61.7 | 52.1 | 53.7 |
ChatGPT | 54.4 | 41.4 | 52.9 | 61.8 | 50.9 | 53.6 |
Qwen-7B | 59.6 | 41.0 | 52.8 | 74.1 | 63.1 | 55.2 |
As can be seen, Qwen-7B achieves the highest score among existing models of comparable size and remains competitive even against larger-scale models.
MMLU is currently one of the most authoritative benchmarks for evaluating English comprehension, covering 57 subtasks across different academic fields and difficulty levels. The 5-shot MMLU accuracy of Qwen-7B is shown in the following table:
Model | Avg. | STEM | Social Sciences | Humanities | Others |
---|---|---|---|---|---|
LLaMA-7B | 35.1 | 30.5 | 38.3 | 34.0 | 38.1 |
Baichuan-7B | 42.3 | 35.6 | 48.9 | 38.4 | 48.1 |
LLaMA2-7B | 45.3 | 36.4 | 51.2 | 42.9 | 52.2 |
LLaMA-13B | 46.9 | 35.8 | 53.8 | 45.0 | 53.3 |
ChatGLM2-6B | 47.9 | 41.2 | 54.4 | 43.7 | 54.5 |
InternLM-7B | 51.0 | - | - | - | - |
Baichuan-13B | 51.6 | 41.6 | 60.9 | 47.4 | 58.5 |
LLaMA2-13B | 54.8 | 44.1 | 62.6 | 52.8 | 61.1 |
ChatGLM2-12B | 56.2 | 48.2 | 65.1 | 52.6 | 60.9 |
Qwen-7B | 56.7 | 47.6 | 65.9 | 51.5 | 64.7 |
On English tasks, Qwen-7B likewise outperforms other comparable open-source pretrained models and remains competitive against larger-scale versions of other models.
We compared the code capability of pretrained models on HumanEval (0-shot); the results are as follows:
Model | Pass@1 |
---|---|
Baichuan-7B | 9.2 |
ChatGLM2-6B | 9.2 |
InternLM-7B | 10.4 |
LLaMA-7B | 10.5 |
LLaMA2-7B | 12.8 |
Baichuan-13B | 12.8 |
LLaMA-13B | 15.8 |
MPT-7B | 18.3 |
LLaMA2-13B | 18.3 |
Qwen-7B | 24.4 |
We evaluated mathematical capability on the commonly used GSM8K dataset (8-shot); the results are as follows:
Model | Acc. |
---|---|
MPT-7B | 6.8 |
Falcon-7B | 6.8 |
Baichuan-7B | 9.7 |
LLaMA-7B | 11.0 |
LLaMA2-7B | 14.6 |
LLaMA-13B | 17.8 |
Baichuan-13B | 26.6 |
LLaMA2-13B | 28.7 |
InternLM-7B | 31.2 |
ChatGLM2-6B | 32.4 |
ChatGLM2-12B | 40.9 |
Qwen-7B | 51.6 |
We evaluated translation capability on the WMT22 zh-en and en-zh datasets (5-shot BLEU); the results are as follows:
Model | Avg. | zh-en | en-zh |
---|---|---|---|
InternLM-7B | 11.8 | 9.0 | 14.5 |
LLaMA-7B | 12.7 | 16.7 | 8.7 |
LLaMA-13B | 15.8 | 19.5 | 12.0 |
LLaMA2-7B | 19.9 | 21.9 | 17.9 |
Bloom-7B | 20.3 | 19.1 | 21.4 |
LLaMA2-13B | 23.3 | 22.4 | 24.2 |
PolyLM-13B | 23.6 | 20.2 | 27.0 |
Baichuan-7B | 24.6 | 22.6 | 26.6 |
Qwen-7B | 27.5 | 24.3 | 30.6 |
We introduce techniques such as NTK-aware interpolation, LogN attention scaling, and window attention to extend the context length to over 8K tokens. We test Qwen-7B at different sequence lengths on arXiv data using perplexity (PPL); the results are shown below. (To enable NTK interpolation and LogN attention scaling, set use_dynamic_ntk and use_logn_attn to true in config.json.)
Model / Sequence Length | 1024 | 2048 | 4096 | 8192 | 16384 |
---|---|---|---|---|---|
Qwen-7B | 4.23 | 3.78 | 39.35 | 469.81 | 2645.09 |
+ dynamic_ntk | 4.23 | 3.78 | 3.59 | 3.66 | 5.71 |
+ dynamic_ntk + logn | 4.23 | 3.78 | 3.58 | 3.56 | 4.62 |
+ dynamic_ntk + logn + window_attn | 4.23 | 3.78 | 3.58 | 3.49 | 4.32 |
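If you want to enable these settings, the snippet below is a minimal sketch of how to flip the two flags in a locally downloaded copy of config.json before loading the model; the path is a placeholder you need to replace:

import json

# Replace with the actual local path of the downloaded model snapshot.
config_path = "/path/to/Qwen-7B/config.json"

with open(config_path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

# Enable NTK-aware interpolation and LogN attention scaling for long inputs.
cfg["use_dynamic_ntk"] = True
cfg["use_logn_attn"] = True

with open(config_path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, ensure_ascii=False, indent=2)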
To load the model in lower precision, e.g., 4-bit or 8-bit, we provide a simple example showing how to load it with a quantization configuration:
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from transformers import BitsAndBytesConfig
import torch

# Choose one quantization configuration; the int4 (NF4) setting is active below.
# int8
# quantization_config = BitsAndBytesConfig(load_in_8bit=True)
# int4
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16)

pipeline_ins = pipeline(
    task=Tasks.text_generation, model='qwen/Qwen-7B', device_map='auto', quantization_config=quantization_config)
text = '蒙古國的首都是烏蘭巴托(Ulaanbaatar)\n冰島的首都是雷克雅未克(Reykjavik)\n埃塞俄比亞的首都是'
result = pipeline_ins(text)
print(result['text'])
With this method, you can load Qwen-7B in NF4 or Int8 precision, which saves GPU memory. We provide the related performance statistics below: although quantization degrades the model's quality somewhat, it greatly reduces memory usage.
Precision | MMLU | Memory |
---|---|---|
BF16 | 56.7 | 16.2G |
Int8 | 52.8 | 10.1G |
NF4 | 48.9 | 7.4G |
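To get a rough memory reading on your own hardware, a minimal sketch is to reset and read PyTorch's peak-memory counter around a generation call; the exact figures will differ from the table depending on drivers, generation length, and other settings:

import torch

# Assumes `pipeline_ins` was built with a quantization_config as in the example above.
torch.cuda.reset_peak_memory_stats()
_ = pipeline_ins('蒙古國的首都是烏蘭巴托(Ulaanbaatar)\n冰島的首都是雷克雅未克(Reykjavik)\n埃塞俄比亞的首都是')
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GiB")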
We provide evaluation scripts so that you can reproduce the model's results; see the link for details. Note: small fluctuations in reproduced results are normal due to rounding differences across hardware and frameworks.
Our code and model weights are fully open for academic research and also support commercial use. Please check the LICENSE for details of the license.
If you would like to leave a message for our research or product team, feel free to contact us by email at qianwen_opensource@alibabacloud.com.