Baichuan-13B-Chat為Baichuan-13B系列模型中對齊后的版本,預(yù)訓(xùn)練模型可見Baichuan-13B-Base。
Baichuan-13B 是由百川智能繼 Baichuan-7B 之后開發(fā)的包含 130 億參數(shù)的開源可商用的大規(guī)模語言模型,在權(quán)威的中文和英文 benchmark 上均取得同尺寸最好的效果。本次發(fā)布包含有預(yù)訓(xùn)練 (Baichuan-13B-Base) 和對齊 (Baichuan-13B-Chat) 兩個版本。Baichuan-13B 有如下幾個特點:
Baichuan-13B-Chat is the aligned version in the Baichuan-13B series of models, and the pre-trained model can be found at Baichuan-13B-Base.
Baichuan-13B is an open-source, commercially usable large-scale language model developed by Baichuan Intelligence, following Baichuan-7B. With 13 billion parameters, it achieves the best performance in standard Chinese and English benchmarks among models of its size. This release includes two versions: pre-training (Baichuan-13B-Base) and alignment (Baichuan-13B-Chat). Baichuan-13B has the following features:
Baichuan-13B 支持 int8 和 int4 量化,用戶只需在推理代碼中簡單修改兩行即可實現(xiàn)。請注意,如果是為了節(jié)省顯存而進行量化,應(yīng)加載原始精度模型到 CPU 后再開始量化;避免在 from_pretrained 時添加 device_map='auto' 或者其它會導(dǎo)致把原始精度模型直接加載到 GPU 的參數(shù)。
Baichuan-13B supports int8 and int4 quantization; users only need to change two lines in the inference code to enable it. Please note that if quantization is done to save GPU memory, the original-precision model should be loaded onto the CPU before quantization begins: avoid passing device_map='auto' (or any other argument that causes the original-precision model to be loaded directly onto the GPU) to from_pretrained.
使用 int8 量化 (To use int8 quantization):
import torch
from modelscope import snapshot_download, Model
model_dir = snapshot_download("baichuan-inc/Baichuan-13B-Chat", revision='v1.0.4')
# Load the original-precision model without device_map, so it stays on the CPU before quantization
model = Model.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.float16)
# Quantize the weights to int8, then move the quantized model to the GPU
model = model.quantize(8).cuda()
同樣的,如需使用 int4 量化 (Similarly, to use int4 quantization):
import torch
from modelscope import snapshot_download, Model
model_dir = snapshot_download("baichuan-inc/Baichuan-13B-Chat", revision='v1.0.4')
# Load the original-precision model without device_map, so it stays on the CPU before quantization
model = Model.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.float16)
# Quantize the weights to int4, then move the quantized model to the GPU
model = model.quantize(4).cuda()
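量化的主要目的是節(jié)省顯存 (The main purpose of quantization here is to save GPU memory). To verify the savings after the quantized model has been moved to the GPU, standard PyTorch memory utilities can be used. This is a minimal illustrative sketch (assuming a single CUDA device) and is not part of the original usage instructions:
import torch
# Rough check of how much GPU memory the quantized model occupies
print(f"allocated: {torch.cuda.memory_allocated() / 1024 ** 3:.1f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024 ** 3:.1f} GiB")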
商業(yè)用途(For commercial use): 請通過上述Email聯(lián)系申請書面授權(quán)。(Contact us via Email above to apply for written authorization.)
整體模型基于Baichuan-7B,為了獲得更好的推理性能,Baichuan-13B 使用了 ALiBi 線性偏置技術(shù),相對于 Rotary Embedding 計算量更小,對推理性能有顯著提升;與標準的 LLaMA-13B 相比,生成 2000 個 tokens 的平均推理速度 (tokens/s),實測提升 31.6%:
Model | tokens/s |
---|---|
LLaMA-13B | 19.4 |
Baichuan-13B | 25.4 |
具體參數(shù)詳見下表:
模型名稱 | 隱含層維度 | 層數(shù) | 頭數(shù) | 詞表大小 | 總參數(shù)量 | 訓(xùn)練數(shù)據(jù)(tokens) | 位置編碼 | 最大長度 |
---|---|---|---|---|---|---|---|---|
Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2萬億 | RoPE | 4,096 |
Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4萬億 | ALiBi | 4,096 |
The overall model is based on Baichuan-7B. To achieve better inference performance, Baichuan-13B uses ALiBi linear biases, which require less computation than Rotary Embedding and noticeably improve inference performance. Compared with the standard LLaMA-13B, the measured average inference speed (tokens/s) when generating 2,000 tokens is 31.6% higher (a short illustrative sketch of the ALiBi bias follows the table below):
Model | tokens/s |
---|---|
LLaMA-13B | 19.4 |
Baichuan-13B | 25.4 |
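To illustrate how ALiBi differs from Rotary Embedding: rather than rotating query/key vectors by position, ALiBi adds a fixed, head-specific linear penalty to the attention scores based on the distance between query and key positions, which is why it adds very little compute at inference time. The following is a minimal, illustrative PyTorch sketch of such a bias; it is not Baichuan-13B's actual implementation, and the slope schedule shown is the simple power-of-two-head-count variant from the ALiBi paper (Baichuan-13B itself uses 40 heads):
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes: the simple 2^(-8*h/num_heads) schedule from the ALiBi paper,
    # which strictly assumes a power-of-two head count (Baichuan-13B has 40 heads,
    # so its real slope schedule is derived slightly differently).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    positions = torch.arange(seq_len)
    # distance[i, j] = i - j: how far key position j lies in the past of query position i
    distance = positions.view(-1, 1) - positions.view(1, -1)
    # Linear penalty on distant past keys; future positions are excluded by the causal mask
    bias = -slopes.view(num_heads, 1, 1) * distance.clamp(min=0)
    # Shape (num_heads, seq_len, seq_len); added to q @ k^T / sqrt(head_dim) before softmax
    return bias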
The specific parameters are as follows:
Model Name | Hidden Size | Num Layers | Num Attention Heads | Vocab Size | Total Params | Training Data (tokens) | Position Embedding | Max Length |
---|---|---|---|---|---|---|---|---|
Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | RoPE | 4,096 |
Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | ALiBi | 4,096 |
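推理與多輪對話示例 (Inference and multi-turn chat example):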
import torch
from modelscope import snapshot_download, Model
model_dir = snapshot_download("baichuan-inc/Baichuan-13B-Chat", revision='v1.0.4')
model = Model.from_pretrained(model_dir, device_map="balanced", trust_remote_code=True, torch_dtype=torch.float16)
# First turn
messages = []
messages.append({"role": "user", "content": "世界上第二高的山峰是哪一座?"})
response = model(messages)
print(response)
# Second turn: continue the conversation from the returned history
messages = response['history'].copy()
messages.append({"role": "user", "content": "世界上第一高的山峰是哪一座?"})
response = model(messages)
print(response)
我們同時開源出了和本模型配套的訓(xùn)練代碼,允許進行高效的Finetune用于下游任務(wù),具體參見Baichuan-13B。
We have also open-sourced the training code that accompanies this model, allowing for efficient finetuning for downstream tasks. For more details, please refer to Baichuan-13B.
我們在此聲明,我們的開發(fā)團隊并未基于 Baichuan-13B 模型開發(fā)任何應(yīng)用,無論是在 iOS、Android、網(wǎng)頁或任何其他平臺。我們強烈呼吁所有使用者,不要利用 Baichuan-13B 模型進行任何危害國家社會安全或違法的活動。另外,我們也要求使用者不要將 Baichuan-13B 模型用于未經(jīng)適當(dāng)安全審查和備案的互聯(lián)網(wǎng)服務(wù)。我們希望所有的使用者都能遵守這個原則,確??萍嫉陌l(fā)展能在規(guī)范和合法的環(huán)境下進行。
我們已經(jīng)盡我們所能,來確保模型訓(xùn)練過程中使用的數(shù)據(jù)的合規(guī)性。然而,盡管我們已經(jīng)做出了巨大的努力,但由于模型和數(shù)據(jù)的復(fù)雜性,仍有可能存在一些無法預(yù)見的問題。因此,如果由于使用 Baichuan-13B 開源模型而導(dǎo)致的任何問題,包括但不限于數(shù)據(jù)安全問題、公共輿論風(fēng)險,或模型被誤導(dǎo)、濫用、傳播或不當(dāng)利用所帶來的任何風(fēng)險和問題,我們將不承擔(dān)任何責(zé)任。
We hereby declare that our development team has not developed any applications based on the Baichuan-13B model, whether on iOS, Android, the web, or any other platform. We strongly urge all users not to use the Baichuan-13B model for any activities that endanger national or societal security or that are illegal. In addition, we ask users not to use the Baichuan-13B model for internet services that have not undergone appropriate security review and filing. We hope that all users will adhere to this principle to ensure that technological development takes place in a regulated and legal environment.
We have done our utmost to ensure the compliance of the data used in the model training process. However, despite our great efforts, due to the complexity of the model and data, there may still be some unforeseen issues. Therefore, we will not take any responsibility for any issues arising from the use of the Baichuan-13B open-source model, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the model being misled, misused, disseminated, or improperly exploited.
訓(xùn)練具體設(shè)置參見Baichuan-13B。
For specific training settings, please refer to Baichuan-13B.
Model 5-shot | STEM | Social Sciences | Humanities | Others | Average |
---|---|---|---|---|---|
Baichuan-7B | 38.2 | 52.0 | 46.2 | 39.3 | 42.8 |
Chinese-Alpaca-Plus-13B | 35.2 | 45.6 | 40.0 | 38.2 | 38.8 |
Vicuna-13B | 30.5 | 38.2 | 32.5 | 32.5 | 32.8 |
Chinese-LLaMA-Plus-13B | 30.3 | 38.0 | 32.9 | 29.1 | 32.1 |
Ziya-LLaMA-13B-Pretrain | 27.6 | 34.4 | 32.0 | 28.6 | 30.0 |
LLaMA-13B | 27.0 | 33.6 | 27.7 | 27.6 | 28.5 |
moss-moon-003-base (16B) | 27.0 | 29.1 | 27.2 | 26.9 | 27.4 |
Baichuan-13B-Base | 45.9 | 63.5 | 57.2 | 49.3 | 52.4 |
Baichuan-13B-Chat | 43.7 | 64.6 | 56.2 | 49.2 | 51.5 |
Model 5-shot | STEM | Social Sciences | Humanities | Others | Average |
---|---|---|---|---|---|
Vicuna-13B | 40.4 | 60.5 | 49.5 | 58.4 | 52.0 |
LLaMA-13B | 36.1 | 53.0 | 44.0 | 52.8 | 46.3 |
Chinese-Alpaca-Plus-13B | 36.9 | 48.9 | 40.5 | 50.5 | 43.9 |
Ziya-LLaMA-13B-Pretrain | 35.6 | 47.6 | 40.1 | 49.4 | 42.9 |
Baichuan-7B | 35.6 | 48.9 | 38.4 | 48.1 | 42.3 |
Chinese-LLaMA-Plus-13B | 33.1 | 42.8 | 37.0 | 44.6 | 39.2 |
moss-moon-003-base (16B) | 22.4 | 22.8 | 24.2 | 24.4 | 23.6 |
Baichuan-13B-Base | 41.6 | 60.9 | 47.4 | 58.5 | 51.6 |
Baichuan-13B-Chat | 40.9 | 60.9 | 48.8 | 59.0 | 52.1 |
說明:我們采用了 MMLU 官方的評測方案。
Note: We adopted the official MMLU evaluation scheme.
Model 5-shot | STEM | Humanities | Social Sciences | Others | China Specific | Average |
---|---|---|---|---|---|---|
Baichuan-7B | 34.4 | 47.5 | 47.6 | 46.6 | 44.3 | 44.0 |
Vicuna-13B | 31.8 | 36.2 | 37.6 | 39.5 | 34.3 | 36.3 |
Chinese-Alpaca-Plus-13B | 29.8 | 33.4 | 33.2 | 37.9 | 32.1 | 33.4 |
Chinese-LLaMA-Plus-13B | 28.1 | 33.1 | 35.4 | 35.1 | 33.5 | 33.0 |
Ziya-LLaMA-13B-Pretrain | 29.0 | 30.7 | 33.8 | 34.4 | 31.9 | 32.1 |
LLaMA-13B | 29.2 | 30.8 | 31.6 | 33.0 | 30.5 | 31.2 |
moss-moon-003-base (16B) | 27.2 | 30.4 | 28.8 | 32.6 | 28.7 | 29.6 |
Baichuan-13B-Base | 41.7 | 61.1 | 59.8 | 59.0 | 56.4 | 55.3 |
Baichuan-13B-Chat | 42.8 | 62.6 | 59.7 | 59.0 | 56.1 | 55.8 |
說明:CMMLU 是一個綜合性的中文評估基準,專門用于評估語言模型在中文語境下的知識和推理能力。我們采用了其官方的評測方案。
Note: CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess the knowledge and reasoning abilities of language models in a Chinese context. We adopted its official evaluation scheme.