baichuan-7B is an open-source large-scale pre-trained model developed by Baichuan Intelligent Technology. Based on the Transformer architecture, it has 7 billion parameters and was trained on approximately 1.2 trillion tokens. It supports both Chinese and English and has a context window of 4,096 tokens. It achieves the best results among models of its size on standard, authoritative Chinese and English benchmarks (C-Eval and MMLU).
If you wish to use baichuan-7B (for inference, finetuning, etc.), we recommend using the accompanying code repository baichuan-7B.
Among models of the same size, baichuan-7B achieves the current state-of-the-art (SOTA) results, as shown by the MMLU scores reported below.
baichuan-7B is trained on Baichuan's own bilingual Chinese-English corpus, is optimized for Chinese, and achieves SOTA performance on C-Eval.
Unlike LLaMA, which prohibits commercial use entirely, baichuan-7B is released under a more permissive open-source license that allows commercial use.
The overall model is based on the standard Transformer architecture, and we adopt the same model design as LLaMA.
The specific hyperparameters are listed in the table below:
Hyperparameter | Value |
---|---|
n_parameters | 7000559616 |
n_layers | 32 |
n_heads | 32 |
d_model | 4096 |
vocab size | 64000 |
sequence length | 4096 |
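For illustration only, the hyperparameters above can be expressed as a small LLaMA-style decoder configuration. This is a minimal sketch with assumed field names, not the model's actual configuration class:

```python
from dataclasses import dataclass

@dataclass
class Baichuan7BConfigSketch:
    """Illustrative LLaMA-style decoder configuration matching the table above.
    Field names are assumptions for readability, not the official config class."""
    hidden_size: int = 4096               # d_model
    num_hidden_layers: int = 32           # n_layers
    num_attention_heads: int = 32         # n_heads
    vocab_size: int = 64000               # vocab size
    max_position_embeddings: int = 4096   # sequence length / context window

    @property
    def head_dim(self) -> int:
        # Each attention head works over hidden_size / n_heads = 128 dimensions.
        return self.hidden_size // self.num_attention_heads
```

The following example shows how to run inference with the ModelScope pipeline: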
from modelscope.utils.constant import Tasks
from modelscope.pipelines import pipeline

# Build a text-generation pipeline for baichuan-7B; device_map='auto' places the weights across available devices.
text_generation_zh = pipeline(task=Tasks.text_generation, model='baichuan-inc/baichuan-7B', device_map='auto', model_revision='v1.0.5')
text_generation_zh._model_prepare = True  # skip the pipeline's default model-preparation step

# Beam-search continuation of the Chinese prompt "今天天氣是真的" ("The weather today is really ...").
result_zh = text_generation_zh('今天天氣是真的', min_length=10, max_length=512, num_beams=3, temperature=0.8, do_sample=False,
                               early_stopping=True, top_k=50, top_p=0.8, repetition_penalty=1.2, length_penalty=1.2, no_repeat_ngram_size=6)
print(result_zh)
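If you prefer Hugging Face transformers over the ModelScope pipeline, loading the weights along the following lines should also work. This is a hedged sketch: the repository id is assumed to match the one used above, and the model ships custom code, so trust_remote_code=True is required:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'baichuan-inc/baichuan-7B'  # assumed to match the id used above; it may differ on other hubs
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', trust_remote_code=True)

# Continue the same Chinese prompt as in the ModelScope example.
inputs = tokenizer('今天天氣是真的', return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```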
We have also open-sourced the training code that accompanies this model, which supports efficient finetuning for downstream tasks. For details, please refer to baichuan-7B.
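As a rough illustration of what a downstream finetune can look like (this is not the official training code in the baichuan-7B repository; the dataset file and hyperparameters below are placeholders), a minimal causal-LM finetuning run with transformers might be:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = 'baichuan-inc/baichuan-7B'
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # a padding token is needed for batching
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Placeholder corpus: a plain-text file with one training example per line.
dataset = load_dataset('text', data_files={'train': 'train.txt'})['train']
tokenized = dataset.map(lambda batch: tokenizer(batch['text'], truncation=True, max_length=1024),
                        batched=True, remove_columns=['text'])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM objective

args = TrainingArguments(output_dir='baichuan-7b-finetune', per_device_train_batch_size=1,
                         gradient_accumulation_steps=16, num_train_epochs=1,
                         learning_rate=2e-5, bf16=True, logging_steps=10)
Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```

In practice a 7B model usually also needs memory-saving measures (for example DeepSpeed or parameter-efficient methods); see the baichuan-7B repository for the supported setup.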
Production use without adequate risk assessment and mitigation, and any use case that may be considered irresponsible or harmful.
baichuan-7B can produce factually incorrect output and should not be relied on to produce factually accurate information. baichuan-7B was trained on various public datasets. Although great effort has gone into cleaning the pretraining data, the model may still generate lewd, biased, or otherwise offensive outputs.
For the specific training settings, please refer to baichuan-7B.
C-Eval is a comprehensive Chinese evaluation dataset for foundation models, covering 52 subjects at four difficulty levels. We use the dev split as the source of few-shot examples and run a 5-shot evaluation on the test split.
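To make the 5-shot protocol concrete, the prompt for each test question is formed by prepending answered examples taken from the dev split. A minimal sketch, assuming a simple dict layout with 'question', 'A'..'D', and 'answer' fields (the field names are assumptions, not the official loader API):

```python
def build_five_shot_prompt(dev_examples, test_item):
    """Concatenate 5 answered dev examples followed by the unanswered test question."""
    def render(ex, with_answer):
        text = (f"{ex['question']}\n"
                f"A. {ex['A']}\nB. {ex['B']}\nC. {ex['C']}\nD. {ex['D']}\n答案:")
        return text + (f"{ex['answer']}\n\n" if with_answer else "")

    shots = "".join(render(ex, with_answer=True) for ex in dev_examples[:5])
    return shots + render(test_item, with_answer=False)
```

The 5-shot results on C-Eval are as follows: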
Model 5-shot | Average | Avg(Hard) | STEM | Social Sciences | Humanities | Others |
---|---|---|---|---|---|---|
GPT-4 | 68.7 | 54.9 | 67.1 | 77.6 | 64.5 | 67.8 |
ChatGPT | 54.4 | 41.4 | 52.9 | 61.8 | 50.9 | 53.6 |
Claude-v1.3 | 54.2 | 39.0 | 51.9 | 61.7 | 52.1 | 53.7 |
Claude-instant-v1.0 | 45.9 | 35.5 | 43.1 | 53.8 | 44.2 | 45.4 |
moss-moon-003-base (16B) | 27.4 | 24.5 | 27.0 | 29.1 | 27.2 | 26.9 |
Ziya-LLaMA-13B-pretrain | 30.2 | 22.7 | 27.7 | 34.4 | 32.0 | 28.9 |
LLaMA-7B-hf | 27.1 | 25.9 | 27.1 | 26.8 | 27.9 | 26.3 |
ChatGLM-6B | 34.5 | 23.1 | 30.4 | 39.6 | 37.4 | 34.5 |
Falcon-7B | 25.8 | 24.3 | 25.8 | 26.0 | 25.8 | 25.6 |
Open-LLaMA-v2-pretrain (7B) | 24.0 | 22.5 | 23.1 | 25.3 | 25.2 | 23.2 |
TigerBot-7B-base | 25.7 | 27.0 | 27.3 | 24.7 | 23.4 | 26.1 |
Aquila-7B* | 25.5 | 25.2 | 25.6 | 24.6 | 25.2 | 26.6 |
BLOOM-7B | 22.8 | 20.2 | 21.8 | 23.3 | 23.9 | 23.3 |
BLOOMZ-7B | 35.7 | 25.8 | 31.3 | 43.5 | 36.6 | 35.6 |
baichuan-7B | 42.8 | 31.5 | 38.2 | 52.0 | 46.2 | 39.3 |
Gaokao is a dataset built from Chinese college entrance examination (Gaokao) questions, used to evaluate a large language model's language ability and logical reasoning.
We keep only the single-answer multiple-choice questions and run a unified 5-shot evaluation on all models.
The results are as follows.
Model | Average |
---|---|
Open-LLaMA-v2-pretrain | 21.41 |
Ziya-LLaMA-13B-pretrain | 23.17 |
Falcon-7B | 23.98 |
TigerBot-7B-base | 25.94 |
LLaMA-7B | 27.81 |
ChatGLM-6B | 21.41 |
BLOOM-7B | 26.96 |
BLOOMZ-7B | 28.72 |
Aquila-7B* | 24.39 |
baichuan-7B | 36.24 |
AGIEval is designed to evaluate a model's general abilities on cognition- and problem-solving-related tasks.
We keep only the four-option single-answer multiple-choice questions, randomly split the data, and run a unified 5-shot evaluation on all models.
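A common way to score such single-answer multiple-choice questions is to compare the model's next-token score for each option letter after the prompt and pick the highest. This is a generic sketch of that technique, not necessarily the exact protocol behind the numbers below:

```python
import torch

@torch.no_grad()
def pick_option(model, tokenizer, prompt, options=("A", "B", "C", "D")):
    """Return the option whose first token the model scores highest after the prompt."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    next_token_logits = model(input_ids).logits[0, -1]  # logits for the token following the prompt
    scores = {opt: next_token_logits[tokenizer(opt, add_special_tokens=False).input_ids[0]].item()
              for opt in options}
    return max(scores, key=scores.get)
```

The 5-shot results are as follows: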
Model | Average |
---|---|
Open-LLaMA-v2-pretrain | 23.49 |
Ziya-LLaMA-13B-pretrain | 27.64 |
Falcon-7B | 27.18 |
TigerBot-7B-base | 25.19 |
LLaMA-7B | 28.17 |
ChatGLM-6B | 23.49 |
BLOOM-7B | 26.55 |
BLOOMZ-7B | 30.27 |
Aquila-7B* | 25.58 |
baichuan-7B | 34.44 |
*The Aquila results are taken from the official BAAI website and are listed for reference only.
In addition to Chinese, we also tested the model’s performance in English.
MMLU is an English evaluation dataset that includes 57 multiple-choice tasks, covering elementary mathematics, American history, computer science, law, etc. The difficulty ranges from high school level to expert level, making it a mainstream LLM evaluation dataset.
We adopted the open-source evaluation scheme and report 5-shot results; a sketch of how per-subject accuracies can be aggregated into the category columns is shown below, followed by the final scores.
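As a small illustration (whether the reported numbers are macro- or micro-averages over subjects is an assumption here), category scores can be computed from per-subject accuracies like this:

```python
from collections import defaultdict

def category_averages(subject_scores, subject_to_category):
    """Macro-average per-subject accuracies into Humanities/Social Sciences/STEM/Other scores."""
    buckets = defaultdict(list)
    for subject, acc in subject_scores.items():
        buckets[subject_to_category[subject]].append(acc)
    averages = {cat: sum(v) / len(v) for cat, v in buckets.items()}
    averages['Average'] = sum(subject_scores.values()) / len(subject_scores)
    return averages
```

The final 5-shot results are as follows: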
Model | Humanities | Social Sciences | STEM | Other | Average |
---|---|---|---|---|---|
LLaMA-7B² | 34.0 | 38.3 | 30.5 | 38.1 | 35.1 |
Falcon-7B¹ | - | - | - | - | 35.0 |
mpt-7B¹ | - | - | - | - | 35.6 |
ChatGLM-6B⁰ | 35.4 | 41.0 | 31.3 | 40.5 | 36.9 |
BLOOM 7B⁰ | 25.0 | 24.4 | 26.5 | 26.4 | 25.5 |
BLOOMZ 7B⁰ | 31.3 | 42.1 | 34.4 | 39.0 | 36.1 |
moss-moon-003-base (16B)⁰ | 24.2 | 22.8 | 22.4 | 24.4 | 23.6 |
moss-moon-003-sft (16B)⁰ | 30.5 | 33.8 | 29.3 | 34.4 | 31.9 |
baichuan-7B⁰ | 38.4 | 48.9 | 35.6 | 48.1 | 42.3 |
The superscript in the Model column indicates the source of the results.
0: reimplemented
1: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
2: https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu