Qwen-7B | Qwen-7B-Chat | Demo | Report
通義千問-7B(Qwen-7B) 是阿里云研發(fā)的通義千問大模型系列的70億參數(shù)規(guī)模的模型。Qwen-7B是基于Transformer的大語言模型, 在超大規(guī)模的預(yù)訓(xùn)練數(shù)據(jù)上進(jìn)行訓(xùn)練得到。預(yù)訓(xùn)練數(shù)據(jù)類型多樣,覆蓋廣泛,包括大量網(wǎng)絡(luò)文本、專業(yè)書籍、代碼等。同時(shí),在Qwen-7B的基礎(chǔ)上,我們使用對(duì)齊機(jī)制打造了基于大語言模型的AI助手Qwen-7B-Chat。本倉(cāng)庫(kù)為Qwen-7B-Chat的倉(cāng)庫(kù)。
如果您想了解更多關(guān)于通義千問-7B開源模型的細(xì)節(jié),我們建議您參閱Github代碼庫(kù)。
Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, code, etc. Additionally, based on the pretrained Qwen-7B, we release Qwen-7B-Chat, a large-model-based AI assistant trained with alignment techniques. This repository is the one for Qwen-7B-Chat.
For more details about the open-source Qwen-7B model, please refer to the GitHub code repository.
運(yùn)行Qwen-7B-Chat,請(qǐng)確保機(jī)器環(huán)境pytorch版本不低于1.12,再執(zhí)行以下pip命令安裝依賴庫(kù)
To run Qwen-7B-Chat, please make sure that your PyTorch version is not lower than 1.12, and then execute the following pip commands to install the required dependencies.
pip install modelscope
pip install transformers_stream_generator
另外,推薦安裝flash-attention
庫(kù),以實(shí)現(xiàn)更高的效率和更低的顯存占用。
In addition, it is recommended to install the flash-attention
library for higher efficiency and lower GPU memory usage.
git clone -b v1.0.8 https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
pip install csrc/layer_norm
pip install csrc/rotary
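If you want to confirm that the optional kernels actually built, a minimal sketch is to try importing them. The module names below are what the csrc/layer_norm and csrc/rotary extensions are expected to install and should be treated as assumptions; Qwen-7B-Chat still runs without flash-attention, just with less of the speed/memory benefit.

```python
# Sketch: verify the optional flash-attention kernels are importable.
import importlib

for mod in ("flash_attn", "dropout_layer_norm", "rotary_emb"):
    try:
        importlib.import_module(mod)
        print(f"{mod}: OK")
    except ImportError as err:
        print(f"{mod}: not available ({err})")
```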
下面我們展示了一個(gè)使用Qwen-7B-Chat模型,進(jìn)行多輪對(duì)話交互的樣例(非流式):
We show an example of multi-turn interaction with Qwen-7B-Chat in the following code (non-streaming):
from modelscope import AutoModelForCausalLM, AutoTokenizer
from modelscope import GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen-7B-Chat", revision='v1.0.5', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B-Chat", revision='v1.0.5', device_map="auto", trust_remote_code=True, fp16=True).eval()
model.generation_config = GenerationConfig.from_pretrained("qwen/Qwen-7B-Chat", revision='v1.0.5', trust_remote_code=True)  # you can specify different generation lengths, top_p and other related hyperparameters
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
response, history = model.chat(tokenizer, "浙江的省會(huì)在哪里?", history=history)
print(response)
response, history = model.chat(tokenizer, "它有什么好玩的景點(diǎn)", history=history)
print(response)
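As the comment above suggests, decoding hyperparameters can be adjusted on `model.generation_config` before calling `chat()`. A minimal sketch follows; the attribute names are standard transformers GenerationConfig fields, and whether the custom `chat()` honours every one of them is an assumption.

```python
# Sketch: tweak decoding hyperparameters, then chat as usual.
model.generation_config.top_p = 0.8            # nucleus sampling threshold
model.generation_config.max_new_tokens = 512   # cap on generated length

response, _ = model.chat(tokenizer, "Introduce Hangzhou in one sentence.", history=None)
print(response)
```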
下面我們展示了一個(gè)使用Qwen-7B-Chat模型,進(jìn)行多輪對(duì)話交互的樣例(流式):
We show an example of multi-turn, streaming interaction with Qwen-7B-Chat in the following code:
import os
import platform
from modelscope import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
model_id = 'qwen/Qwen-7B-Chat'
revision = 'v1.0.5'
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision, trust_remote_code=True)
# use fp16
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", revision=revision,
trust_remote_code=True, fp16=True).eval()
model.generation_config = GenerationConfig.from_pretrained(model_id, trust_remote_code=True)  # you can specify different generation lengths, top_p and other related hyperparameters
stop_stream = False
def clear_screen():
    if platform.system() == "Windows":
        os.system("cls")
    else:
        os.system("clear")


def print_history(history):
    for pair in history:
        print(f"\nUser:{pair[0]}\nQwen-7B:{pair[1]}")


def main():
    history, response = [], ''
    global stop_stream
    clear_screen()
    print("歡迎使用 Qwen-7B 模型,輸入內(nèi)容即可進(jìn)行對(duì)話,clear 清空對(duì)話歷史,stop 終止程序")
    while True:
        query = input("\nUser:")
        if query.strip() == "stop":
            break
        if query.strip() == "clear":
            history = []
            clear_screen()
            print("歡迎使用 Qwen-7B 模型,輸入內(nèi)容即可進(jìn)行對(duì)話,clear 清空對(duì)話歷史,stop 終止程序")
            continue
        for response in model.chat_stream(tokenizer, query, history=history):
            if stop_stream:
                stop_stream = False
                break
            else:
                clear_screen()
                print_history(history)
                print(f"\nUser: {query}")
                print("\nQwen-7B:", end="")
                print(response)
        history.append((query, response))


if __name__ == "__main__":
    main()
關(guān)于更多的使用說明,請(qǐng)參考我們的Github repo獲取更多信息。
For more usage instructions, please refer to our GitHub repo.
與Qwen-7B預(yù)訓(xùn)練模型相同,Qwen-7B-Chat模型規(guī)?;厩闆r如下所示
Qwen-7B-Chat uses the same model size settings as the pretrained Qwen-7B. The details of the model architecture are listed as follows:
Hyperparameter | Value |
---|---|
n_layers | 32 |
n_heads | 32 |
d_model | 4096 |
vocab size | 151851 |
sequence length | 2048 |
在位置編碼、FFN激活函數(shù)和normalization的實(shí)現(xiàn)方式上,我們也采用了目前最流行的做法,
即RoPE相對(duì)位置編碼、SwiGLU激活函數(shù)、RMSNorm(可選安裝flash-attention加速)。
在分詞器方面,相比目前主流開源模型以中英詞表為主,Qwen-7B-Chat使用了約15萬token大小的詞表。
該詞表在GPT-4使用的BPE詞表cl100k_base
基礎(chǔ)上,對(duì)中文、多語言進(jìn)行了優(yōu)化,在對(duì)中、英、代碼數(shù)據(jù)的高效編解碼的基礎(chǔ)上,對(duì)部分多語言更加友好,方便用戶在不擴(kuò)展詞表的情況下對(duì)部分語種進(jìn)行能力增強(qiáng)。
詞表對(duì)數(shù)字按單個(gè)數(shù)字位切分。調(diào)用較為高效的tiktoken分詞庫(kù)進(jìn)行分詞。
For position encoding, FFN activation function, and normalization calculation methods, we adopt the prevalent practices, i.e., RoPE relative position encoding, SwiGLU for activation function, and RMSNorm for normalization (optional installation of flash-attention for acceleration).
For tokenization, compared with the current mainstream open-source models that mainly use Chinese/English vocabularies, Qwen-7B-Chat uses a vocabulary of about 150K tokens. Built on the cl100k_base BPE vocabulary used by GPT-4, it is optimized for Chinese and multiple other languages: on top of efficient encoding of Chinese, English, and code data, it is also friendlier to some additional languages, enabling users to enhance the capability for those languages without expanding the vocabulary.
It splits numbers into single digits and uses the efficient tiktoken library for tokenization.
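A small sketch of the behaviour described above, reusing the tokenizer id and revision from the earlier examples: encode a mixed Chinese/English/code string and decode each id back to see that the number is split into single-digit tokens.

```python
from modelscope import AutoTokenizer

# Sketch: inspect how the Qwen tokenizer segments mixed text and digits.
tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen-7B-Chat", revision='v1.0.5', trust_remote_code=True)

text = "通義千問支持中英文和代碼,例如 print(12345)"
ids = tokenizer.encode(text)
print(len(ids), ids)
# Decoding token by token makes the single-digit split of "12345" visible.
print([tokenizer.decode([i]) for i in ids])
```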
對(duì)于Qwen-7B-Chat模型,我們同樣評(píng)測(cè)了常規(guī)的中文理解(C-Eval)、英文理解(MMLU)、代碼(HumanEval)和數(shù)學(xué)(GSM8K)等權(quán)威任務(wù),同時(shí)包含了長(zhǎng)序列任務(wù)的評(píng)測(cè)結(jié)果。由于Qwen-7B-Chat模型經(jīng)過對(duì)齊后,激發(fā)了較強(qiáng)的外部系統(tǒng)調(diào)用能力,我們還進(jìn)行了工具使用能力方面的評(píng)測(cè)。
提示:由于硬件和框架造成的舍入誤差,復(fù)現(xiàn)結(jié)果如有波動(dòng)屬于正?,F(xiàn)象。
For Qwen-7B-Chat, we also evaluate the model on standard benchmarks covering Chinese understanding (C-Eval), English understanding (MMLU), code (HumanEval), and mathematics (GSM8K), as well as long-context understanding. Since alignment equips Qwen-7B-Chat with a strong ability to call external systems, we also evaluate its tool-usage capability.
Note: Due to rounding errors caused by hardware and frameworks, small differences in reproduced results are normal.
在C-Eval驗(yàn)證集上,我們?cè)u(píng)價(jià)了Qwen-7B-Chat模型的zero-shot準(zhǔn)確率
We demonstrate the zero-shot accuracy of Qwen-7B-Chat on the C-Eval validation set:
Model | Avg. Acc. |
---|---|
LLaMA2-7B-Chat | 31.9 |
LLaMA2-13B-Chat | 40.6 |
Chinese-Alpaca-2-7B | 41.3 |
Chinese-Alpaca-Plus-13B | 43.3 |
Baichuan-13B-Chat | 50.4 |
ChatGLM2-6B-Chat | 50.7 |
InternLM-7B-Chat | 53.2 |
Qwen-7B-Chat | 54.2 |
C-Eval測(cè)試集上,Qwen-7B-Chat模型的zero-shot準(zhǔn)確率結(jié)果如下:
The zero-shot accuracy of Qwen-7B-Chat on the C-Eval test set is provided below:
Model | Avg. | STEM | Social Sciences | Humanities | Others |
---|---|---|---|---|---|
Chinese-Alpaca-Plus-13B | 41.5 | 36.6 | 49.7 | 43.1 | 41.2 |
Chinese-Alpaca-2-7B | 40.3 | - | - | - | - |
ChatGLM2-6B-Chat | 50.1 | 46.4 | 60.4 | 50.6 | 46.9 |
Baichuan-13B-Chat | 51.5 | 43.7 | 64.6 | 56.2 | 49.2 |
Qwen-7B-Chat | 54.6 | 47.8 | 67.6 | 59.3 | 50.6 |
在7B規(guī)模模型上,經(jīng)過人類指令對(duì)齊的Qwen-7B-Chat模型,準(zhǔn)確率在同類相近規(guī)模模型中仍然處于前列。
Compared with other aligned models of comparable size, the human-aligned Qwen-7B-Chat is still among the top performers in C-Eval accuracy.
MMLU評(píng)測(cè)集上,Qwen-7B-Chat模型的zero-shot準(zhǔn)確率如下,效果同樣在同類對(duì)齊模型中同樣表現(xiàn)較優(yōu)。
The zero-shot accuracy of Qwen-7B-Chat on MMLU is provided below.
The performance of Qwen-7B-Chat remains among the best of human-aligned models of comparable size.
Model | Avg. Acc. |
---|---|
ChatGLM2-6B-Chat | 45.5 |
LLaMA2-7B-Chat | 47.0 |
InternLM-7B-Chat | 50.8 |
Baichuan-13B-Chat | 52.1 |
ChatGLM2-12B-Chat | 52.1 |
Qwen-7B-Chat | 53.9 |
Qwen-7B-Chat在HumanEval的zero-shot Pass@1效果如下
The zero-shot Pass@1 of Qwen-7B-Chat on HumanEval is demonstrated below
Model | Pass@1 |
---|---|
LLaMA2-7B-Chat | 12.2 |
InternLM-7B-Chat | 14.0 |
Baichuan-13B-Chat | 16.5 |
LLaMA2-13B-Chat | 18.9 |
Qwen-7B-Chat | 21.3 |
在評(píng)測(cè)數(shù)學(xué)能力的GSM8K上,Qwen-7B-Chat的準(zhǔn)確率結(jié)果如下
The accuracy of Qwen-7B-Chat on GSM8K is shown below
Model | Zero-shot Acc. | 4-shot Acc. |
---|---|---|
ChatGLM2-6B-Chat | - | 28.0 |
LLaMA2-7B-Chat | 20.4 | 28.2 |
LLaMA2-13B-Chat | 29.4 | 36.7 |
InternLM-7B-Chat | 32.6 | 34.5 |
Baichuan-13B-Chat | - | 36.3 |
ChatGLM2-12B-Chat | - | 38.1 |
Qwen-7B-Chat | 41.1 | 43.5 |
通過NTK插值,LogN注意力縮放可以擴(kuò)展Qwen-7B-Chat的上下文長(zhǎng)度。在長(zhǎng)文本摘要數(shù)據(jù)集VCSUM上(文本平均長(zhǎng)度在15K左右),Qwen-7B-Chat的Rouge-L結(jié)果如下:
(若要啟用這些技巧,請(qǐng)將config.json里的use_dynamic_ntk和use_logn_attn設(shè)置為true)
We use NTK-aware interpolation and LogN attention scaling to extend the context length of Qwen-7B-Chat. The Rouge-L results of Qwen-7B-Chat on the long-text summarization dataset VCSUM (the average text length of this dataset is around 15K) are shown below:
(To use these tricks, please set use_dynamic_ntk and use_logn_attn to true in config.json; a short sketch follows the table below.)
Model | VCSUM (zh) |
---|---|
GPT-3.5-Turbo-16k | 16.0 |
LLaMA2-7B-Chat | 0.2 |
InternLM-7B-Chat | 13.0 |
ChatGLM2-6B-Chat | 16.3 |
Qwen-7B-Chat | 16.6 |
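As referenced above, here is a minimal sketch of flipping the two flags in the checkpoint's config.json. The local path below is hypothetical; point it at wherever ModelScope downloaded the model.

```python
import json

# Hypothetical local path to the downloaded checkpoint directory.
config_path = "Qwen-7B-Chat/config.json"

with open(config_path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

cfg["use_dynamic_ntk"] = True  # NTK-aware interpolation for longer contexts
cfg["use_logn_attn"] = True    # LogN attention scaling

with open(config_path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
```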
千問支持通過 ReAct Prompting 調(diào)用插件/工具/API。ReAct 也是 LangChain 框架采用的主要方式之一。在即將開源的、用于評(píng)估工具使用能力的自建評(píng)測(cè)基準(zhǔn)上,千問的表現(xiàn)如下:
Qwen-7B-Chat supports calling plugins/tools/APIs through ReAct Prompting. ReAct is also one of the main approaches used by the LangChain framework. On our self-built benchmark for evaluating tool-usage capabilities, which will be open-sourced soon, Qwen-7B-Chat performs as follows:
Model | Tool Selection (Acc.↑) | Tool Input (Rouge-L↑) | False Positive Error↓ |
---|---|---|---|
GPT-4 | 95% | 0.90 | 15% |
GPT-3.5 | 85% | 0.88 | 75% |
Qwen-7B-Chat | 99% | 0.89 | 8.5% |
評(píng)測(cè)基準(zhǔn)中出現(xiàn)的插件均沒有出現(xiàn)在千問的訓(xùn)練集中。該基準(zhǔn)評(píng)估了模型在多個(gè)候選插件中選擇正確插件的準(zhǔn)確率、傳入插件的參數(shù)的合理性、以及假陽率。假陽率(False Positive)定義:在處理不該調(diào)用插件的請(qǐng)求時(shí),錯(cuò)誤地調(diào)用了插件。
The plugins that appear in the evaluation set do not appear in the training set of Qwen-7B-Chat. This benchmark evaluates the accuracy of the model in selecting the correct plugin from multiple candidate plugins, the rationality of the parameters passed into the plugin, and the false positive rate. False positive: the model incorrectly invokes a plugin when responding to a query that should not require one.
關(guān)于 ReAct Prompting 的 prompt 怎么寫、怎么使用,請(qǐng)參考 ReAct 樣例說明。使用工具能使模型更好地完成任務(wù)?;谇柕墓ぞ呤褂媚芰?,我們能實(shí)現(xiàn)下圖所展示的效果:
For how to write and use prompts for ReAct Prompting, please refer to the ReAct examples. The use of tools can enable the model to better perform tasks, as shown in the following figures:
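For orientation only, here is a generic ReAct-style scaffold; the exact prompt format Qwen-7B-Chat was tuned on is documented in the ReAct examples of the GitHub repo, and the `web_search` tool below is hypothetical. It assumes the `model` and `tokenizer` loaded earlier.

```python
# Sketch: a generic ReAct-style prompt (Thought / Action / Action Input / Observation loop).
TOOL_DESC = 'web_search: Call this tool to search the web. Parameters: {"query": "the search query"}'

prompt = f"""Answer the following question as best you can. You have access to the following tools:

{TOOL_DESC}

Use the following format:

Question: the input question
Thought: you should always think about what to do
Action: the action to take, should be one of [web_search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the final answer to the original question

Question: What is the weather like in Hangzhou today?"""

response, _ = model.chat(tokenizer, prompt, history=None)
print(response)
# Expect a Thought/Action/Action Input block; parse it, call the tool yourself,
# append "Observation: ..." to the prompt, and call chat() again to continue the loop.
```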
如希望使用更低精度的量化模型,如4比特和8比特的模型,我們提供了簡(jiǎn)單的示例來說明如何快速使用量化模型:
To load the model in lower precision, e.g., 4-bit or 8-bit, we provide an example showing how to add a quantization configuration (this relies on the bitsandbytes package):
from modelscope import AutoModelForCausalLM, AutoTokenizer
from transformers import BitsAndBytesConfig
import torch
from modelscope import GenerationConfig
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type='nf4',
bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen-7B-Chat", revision='v1.0.1', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B-Chat", revision='v1.0.1', quantization_config=quantization_config, device_map="auto", trust_remote_code=True).eval()
model.generation_config = GenerationConfig.from_pretrained("qwen/Qwen-7B-Chat", revision='v1.0.1', trust_remote_code=True)  # you can specify different generation lengths, top_p and other related hyperparameters
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
response, history = model.chat(tokenizer, "浙江的省會(huì)在哪里?", history=history)
print(response)
response, history = model.chat(tokenizer, "它有什么好玩的景點(diǎn)", history=history)
print(response)
上述方法可以讓我們將模型量化成NF4和Int8精度的模型進(jìn)行讀取,幫助我們節(jié)省顯存開銷。我們也提供了相關(guān)性能數(shù)據(jù)。我們發(fā)現(xiàn)盡管模型在效果上存在損失,但模型的顯存開銷大幅降低。
With this method, Qwen-7B-Chat can be loaded in NF4 or Int8 precision, which saves GPU memory. We provide the related model performance statistics below. We find that quantization slightly degrades effectiveness but greatly reduces memory costs.
Precision | MMLU | Memory |
---|---|---|
BF16 | 56.7 | 16.2G |
Int8 | 52.8 | 10.1G |
NF4 | 48.9 | 7.4G |
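For the Int8 row above, a minimal variant of the earlier quantization snippet (same model and revision; only the quantization config changes):

```python
from modelscope import AutoModelForCausalLM
from transformers import BitsAndBytesConfig

# Sketch: 8-bit loading via bitsandbytes; everything else matches the NF4 example above.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "qwen/Qwen-7B-Chat", revision='v1.0.1',
    quantization_config=quantization_config,
    device_map="auto", trust_remote_code=True).eval()
```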
我們的代碼和模型權(quán)重對(duì)學(xué)術(shù)研究完全開放,并支持商用。請(qǐng)查看LICENSE了解具體的開源協(xié)議細(xì)節(jié)。
Our code and model weights are fully open for academic research, and commercial use is also allowed. Please check the LICENSE for details about the license.
如果你想給我們的研發(fā)團(tuán)隊(duì)和產(chǎn)品團(tuán)隊(duì)留言,請(qǐng)通過郵件(qianwen_opensource@alibabacloud.com)聯(lián)系我們。
If you would like to leave a message for our research or product team, feel free to send an email to qianwen_opensource@alibabacloud.com.