Baichuan-13B-Base is the pre-trained version of the Baichuan-13B series; the aligned model is available as Baichuan-13B-Chat.
Baichuan-13B is an open-source, commercially usable large-scale language model developed by Baichuan Intelligence as the successor to Baichuan-7B. With 13 billion parameters, it achieves the best results of any model of its size on authoritative Chinese and English benchmarks. This release includes two versions: a pre-trained base model (Baichuan-13B-Base) and an aligned model (Baichuan-13B-Chat). Baichuan-13B has the following features:
For commercial use: please contact us via the email above to apply for written authorization.
The overall model is based on Baichuan-7B. To achieve better inference performance, Baichuan-13B uses ALiBi linear biases, which require less computation than Rotary Embedding and noticeably improve inference speed. Compared with the standard LLaMA-13B, the measured average speed (tokens/s) when generating 2,000 tokens is 31.6% higher:
Model | tokens/s |
---|---|
LLaMA-13B | 19.4 |
Baichuan-13B | 25.4 |
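For intuition on why ALiBi is cheaper than rotary embeddings at inference time: it leaves the query/key vectors untouched and simply adds a fixed, head-specific linear penalty to the attention scores based on query-key distance. A minimal sketch of that bias (illustrative only, not Baichuan-13B's actual implementation; the slope formula assumes a power-of-two number of heads):

import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Additive (num_heads, seq_len, seq_len) ALiBi bias for causal attention."""
    # Head-specific slopes form a geometric sequence: 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # distance[i, j] = how many positions key j lies behind query i (0 on the diagonal)
    pos = torch.arange(seq_len)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0).float()
    return -slopes[:, None, None] * distance  # added to q·k/sqrt(d) scores before softmax

# Usage: scores has shape (batch, num_heads, seq_len, seq_len)
# scores = scores + alibi_bias(num_heads, seq_len)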
The specific parameters are as follows:
Model Name | Hidden Size | Num Layers | Num Attention Heads | Vocab Size | Total Params | Training Data (tokens) | Position Embedding | Max Length |
---|---|---|---|---|---|---|---|---|
Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | RoPE | 4,096 |
Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | ALiBi | 4,096 |
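As a sanity check, the Total Params figure for Baichuan-13B can be reproduced from the table's hyperparameters assuming a LLaMA-style layout (untied input embedding and LM head, fused QKV plus output projection in attention, SwiGLU MLP, RMSNorm); the MLP intermediate size of 13,696 is an assumption not listed above:

hidden, layers, vocab = 5120, 40, 64000
intermediate = 13696                      # assumed SwiGLU intermediate size (not in the table)

embeddings = 2 * vocab * hidden           # input embedding + untied LM head
attention  = 4 * hidden * hidden          # fused QKV (3*h*h) + output projection (h*h)
mlp        = 3 * hidden * intermediate    # gate, up and down projections
norms      = 2 * hidden                   # two RMSNorm weight vectors per layer

total = embeddings + layers * (attention + mlp + norms) + hidden  # + final RMSNorm
print(f"{total:,}")                       # 13,264,901,120 — matches the Total Params column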
import torch
from modelscope import snapshot_download, Model
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Download the checkpoint and load it in float16, sharded across available GPUs
model_dir = snapshot_download("baichuan-inc/Baichuan-13B-Base", revision='v1.0.1')
model = Model.from_pretrained(model_dir, device_map="balanced", trust_remote_code=True, torch_dtype=torch.float16)

# Wrap the loaded model in a text-generation pipeline
text_generation_zh = pipeline(task=Tasks.text_generation, model=model)
text_generation_zh._model_prepare = True  # model is already placed and cast above, so skip the pipeline's own preparation

# Beam-search continuation (num_beams=3, do_sample=False) of the Chinese prompt "今天天氣是真的" ("The weather today is really")
result_zh = text_generation_zh('今天天氣是真的', min_length=10, max_length=512, num_beams=3, temperature=0.8, do_sample=False, early_stopping=True, top_k=50, top_p=0.8, repetition_penalty=1.2, length_penalty=1.2, no_repeat_ngram_size=6)
print(result_zh)
Code link: https://github.com/modelscope/swift/tree/main/examples/pytorch/llm
Script for supervised fine-tuning (SFT) of baichuan-13b with QLoRA (requires about 11 GB of GPU memory):
git clone https://github.com/modelscope/swift.git
cd swift/examples/pytorch/llm
CUDA_VISIBLE_DEVICES=0 \
python llm_sft.py \
--model_type baichuan-13b \
--sft_type lora \
--output_dir runs \
--dataset alpaca-en,alpaca-zh \
--dataset_sample 20000 \
--max_length 1024 \
--quantization_bit 4 \
--lora_rank 8 \
--lora_alpha 32 \
--lora_dropout_p 0.1 \
--batch_size 1 \
--learning_rate 1e-4 \
--gradient_accumulation_steps 16 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 2 \
    --logging_steps 10
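For reference, the key flags above (--quantization_bit 4, --lora_rank 8, --lora_alpha 32, --lora_dropout_p 0.1) correspond conceptually to the following bitsandbytes + peft setup. This is only a sketch of what QLoRA fine-tuning does, not the actual code path of swift's llm_sft.py, and the W_pack target module name is an assumption about Baichuan's fused QKV projection:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the frozen base model with 4-bit NF4 quantization (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-13B-Base",
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)

# Attach small trainable LoRA adapters; only these weights are updated during SFT
lora_config = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.1,
    target_modules=["W_pack"],          # assumed name of Baichuan's fused QKV projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()      # only a small fraction of the 13B parameters is trainable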
We have also open-sourced the training code that accompanies this model, allowing for efficient finetuning for downstream tasks. For more details, please refer to Baichuan-13B.
我們?cè)诖寺暶?,我們的開發(fā)團(tuán)隊(duì)并未基于 Baichuan-13B 模型開發(fā)任何應(yīng)用,無論是在 iOS、Android、網(wǎng)頁(yè)或任何其他平臺(tái)。我們強(qiáng)烈呼吁所有使用者,不要利用 Baichuan-13B 模型進(jìn)行任何危害國(guó)家社會(huì)安全或違法的活動(dòng)。另外,我們也要求使用者不要將 Baichuan-13B 模型用于未經(jīng)適當(dāng)安全審查和備案的互聯(lián)網(wǎng)服務(wù)。我們希望所有的使用者都能遵守這個(gè)原則,確??萍嫉陌l(fā)展能在規(guī)范和合法的環(huán)境下進(jìn)行。
我們已經(jīng)盡我們所能,來確保模型訓(xùn)練過程中使用的數(shù)據(jù)的合規(guī)性。然而,盡管我們已經(jīng)做出了巨大的努力,但由于模型和數(shù)據(jù)的復(fù)雜性,仍有可能存在一些無法預(yù)見的問題。因此,如果由于使用 Baichuan-13B 開源模型而導(dǎo)致的任何問題,包括但不限于數(shù)據(jù)安全問題、公共輿論風(fēng)險(xiǎn),或模型被誤導(dǎo)、濫用、傳播或不當(dāng)利用所帶來的任何風(fēng)險(xiǎn)和問題,我們將不承擔(dān)任何責(zé)任。
We hereby declare that our development team has not developed any applications based on the Baichuan-13B model, whether on iOS, Android, the web, or any other platform. We strongly urge all users not to use the Baichuan-13B model for any activities that endanger national or societal security or that are illegal. We also ask users not to use the Baichuan-13B model for internet services that have not undergone appropriate security review and filing. We hope that all users will adhere to these principles, so that technological development proceeds in a regulated and legal environment.
We have done our utmost to ensure the compliance of the data used in the model training process. However, despite our great efforts, due to the complexity of the model and data, there may still be some unforeseen issues. Therefore, we will not take any responsibility for any issues arising from the use of the Baichuan-13B open-source model, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the model being misled, misused, disseminated, or improperly exploited.
For specific training settings, please refer to Baichuan-13B.
Model (5-shot) | STEM | Social Sciences | Humanities | Others | Average |
---|---|---|---|---|---|
Baichuan-7B | 38.2 | 52.0 | 46.2 | 39.3 | 42.8 |
Chinese-Alpaca-Plus-13B | 35.2 | 45.6 | 40.0 | 38.2 | 38.8 |
Vicuna-13B | 30.5 | 38.2 | 32.5 | 32.5 | 32.8 |
Chinese-LLaMA-Plus-13B | 30.3 | 38.0 | 32.9 | 29.1 | 32.1 |
Ziya-LLaMA-13B-Pretrain | 27.6 | 34.4 | 32.0 | 28.6 | 30.0 |
LLaMA-13B | 27.0 | 33.6 | 27.7 | 27.6 | 28.5 |
moss-moon-003-base (16B) | 27.0 | 29.1 | 27.2 | 26.9 | 27.4 |
Baichuan-13B-Base | 45.9 | 63.5 | 57.2 | 49.3 | 52.4 |
Baichuan-13B-Chat | 43.7 | 64.6 | 56.2 | 49.2 | 51.5 |
Model (5-shot) | STEM | Social Sciences | Humanities | Others | Average |
---|---|---|---|---|---|
Vicuna-13B | 40.4 | 60.5 | 49.5 | 58.4 | 52.0 |
LLaMA-13B | 36.1 | 53.0 | 44.0 | 52.8 | 46.3 |
Chinese-Alpaca-Plus-13B | 36.9 | 48.9 | 40.5 | 50.5 | 43.9 |
Ziya-LLaMA-13B-Pretrain | 35.6 | 47.6 | 40.1 | 49.4 | 42.9 |
Baichuan-7B | 35.6 | 48.9 | 38.4 | 48.1 | 42.3 |
Chinese-LLaMA-Plus-13B | 33.1 | 42.8 | 37.0 | 44.6 | 39.2 |
moss-moon-003-base (16B) | 22.4 | 22.8 | 24.2 | 24.4 | 23.6 |
Baichuan-13B-Base | 41.6 | 60.9 | 47.4 | 58.5 | 51.6 |
Baichuan-13B-Chat | 40.9 | 60.9 | 48.8 | 59.0 | 52.1 |
Note: We used the official MMLU evaluation scheme.
Model (5-shot) | STEM | Humanities | Social Sciences | Others | China Specific | Average |
---|---|---|---|---|---|---|
Baichuan-7B | 34.4 | 47.5 | 47.6 | 46.6 | 44.3 | 44.0 |
Vicuna-13B | 31.8 | 36.2 | 37.6 | 39.5 | 34.3 | 36.3 |
Chinese-Alpaca-Plus-13B | 29.8 | 33.4 | 33.2 | 37.9 | 32.1 | 33.4 |
Chinese-LLaMA-Plus-13B | 28.1 | 33.1 | 35.4 | 35.1 | 33.5 | 33.0 |
Ziya-LLaMA-13B-Pretrain | 29.0 | 30.7 | 33.8 | 34.4 | 31.9 | 32.1 |
LLaMA-13B | 29.2 | 30.8 | 31.6 | 33.0 | 30.5 | 31.2 |
moss-moon-003-base (16B) | 27.2 | 30.4 | 28.8 | 32.6 | 28.7 | 29.6 |
Baichuan-13B-Base | 41.7 | 61.1 | 59.8 | 59.0 | 56.4 | 55.3 |
Baichuan-13B-Chat | 42.8 | 62.6 | 59.7 | 59.0 | 56.1 | 55.8 |
Note: CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess the knowledge and reasoning abilities of language models in a Chinese context. We used its official evaluation scheme.
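For readers unfamiliar with how 5-shot scores like those above are typically obtained: five solved examples are concatenated with the target question, and the model's next-token logits over the option letters decide its answer. A minimal sketch of this common setup (illustrative only, not the exact official MMLU/CMMLU harness; format_example and the letter-scoring details are assumptions):

import torch

def format_example(question, choices, answer=None):
    """Render one multiple-choice item; append the answer letter for few-shot demos."""
    text = question + "\n" + "\n".join(f"{letter}. {choice}" for letter, choice in zip("ABCD", choices))
    text += "\nAnswer:"
    if answer is not None:
        text += f" {answer}"
    return text

def predict_5shot(model, tokenizer, few_shot, question, choices):
    """few_shot: list of 5 (question, choices, answer_letter) tuples."""
    prompt = "\n\n".join(format_example(q, c, a) for q, c, a in few_shot)
    prompt += "\n\n" + format_example(question, choices)          # ends with "Answer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    # Pick the option letter the model assigns the highest next-token logit to
    letter_ids = [tokenizer(" " + letter, add_special_tokens=False).input_ids[-1] for letter in "ABCD"]
    letter_logits = next_token_logits[letter_ids]                 # logits of " A", " B", " C", " D"
    return "ABCD"[int(letter_logits.argmax())]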