

Ziya-LLaMA-13B-v1

姜子牙系列模型 Ziya Model Series

簡介 Brief Introduction

姜子牙通用大模型V1是基于LLaMa的130億參數(shù)的大規(guī)模預(yù)訓(xùn)練模型,具備翻譯,編程,文本分類,信息抽取,摘要,文案生成,常識問答和數(shù)學(xué)計(jì)算等能力。目前姜子牙通用大模型已完成大規(guī)模預(yù)訓(xùn)練、多任務(wù)有監(jiān)督微調(diào)和人類反饋學(xué)習(xí)三階段的訓(xùn)練過程。

The Ziya-LLaMA-13B-v1 is a large-scale pre-trained model based on LLaMA with 13 billion parameters. It has the ability to perform tasks such as translation, programming, text classification, information extraction, summarization, copywriting, common sense Q&A, and mathematical calculation. The Ziya-LLaMA-13B-v1 has undergone three stages of training: large-scale continual pre-training (PT), multi-task supervised fine-tuning (SFT), and human feedback learning (RM, PPO).

模型分類 Model Taxonomy

| 需求 Demand | 任務(wù) Task | 系列 Series | 模型 Model | 參數(shù) Parameter | 額外 Extra |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 通用 General | AGI模型 AGI Model | 姜子牙 Ziya | LLaMA | 13B | English&Chinese |

模型信息 Model Information

繼續(xù)預(yù)訓(xùn)練 Continual pretraining

原始數(shù)據(jù)包含英文和中文,其中英文數(shù)據(jù)來自openwebtext、Books、Wikipedia和Code,中文數(shù)據(jù)來自清洗后的悟道數(shù)據(jù)集、自建的中文數(shù)據(jù)集。在對原始數(shù)據(jù)進(jìn)行去重、模型打分、數(shù)據(jù)分桶、規(guī)則過濾、敏感主題過濾和數(shù)據(jù)評估后,最終得到125B tokens的有效數(shù)據(jù)。

為了解決LLaMA原生分詞對中文編解碼效率低下的問題,我們在LLaMA詞表的基礎(chǔ)上增加了7k+個常見中文字,通過和LLaMA原生的詞表去重,最終得到一個39410大小的詞表,并通過復(fù)用Transformers里LlamaTokenizer來實(shí)現(xiàn)了這一效果。

在增量訓(xùn)練過程中,我們使用了160張40GB的A100,采用2.6M tokens的訓(xùn)練集樣本數(shù)量和FP16的混合精度,吞吐量達(dá)到118 TFLOPS per GPU。因此我們能夠在8天的時(shí)間里在原生的LLaMA-13B模型基礎(chǔ)上,增量訓(xùn)練110B tokens的數(shù)據(jù)。

訓(xùn)練期間,雖然遇到了機(jī)器宕機(jī)、底層框架bug、loss spike等各種問題,但我們通過快速調(diào)整,保證了增量訓(xùn)練的穩(wěn)定性。我們也放出訓(xùn)練過程的loss曲線,讓大家了解可能出現(xiàn)的問題。

The original data contains both English and Chinese, with the English data drawn from openwebtext, Books, Wikipedia, and Code, and the Chinese data drawn from the cleaned Wudao dataset and a self-built Chinese corpus. After deduplication, model scoring, data bucketing, rule filtering, sensitive-topic filtering, and data evaluation, we obtained 125 billion tokens of valid data.
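
For illustration, the cleaning steps listed above can be thought of as a sequence of filtering passes over raw documents. The following is a minimal Python sketch of that kind of pipeline; the quality scorer, score threshold, length rule, and blocklist are placeholder assumptions, not the actual components used for Ziya.

import hashlib
import re

# Hypothetical quality scorer; in practice this would be a small LM or
# classifier that assigns each document a quality score.
def quality_score(doc: str) -> float:
    return min(len(doc) / 1000.0, 1.0)  # placeholder heuristic

BLOCKLIST = re.compile(r"(example-sensitive-term)")  # placeholder rule list

def clean_corpus(docs, min_score=0.3):
    seen = set()
    kept = []
    for doc in docs:
        # 1. Exact deduplication via content hashing.
        h = hashlib.md5(doc.encode("utf-8")).hexdigest()
        if h in seen:
            continue
        seen.add(h)
        # 2. Model scoring / bucketing: drop the lowest-quality bucket.
        if quality_score(doc) < min_score:
            continue
        # 3. Rule-based filtering (e.g. minimum length).
        if len(doc) < 32:
            continue
        # 4. Sensitive-topic filtering.
        if BLOCKLIST.search(doc):
            continue
        kept.append(doc)
    return kept

if __name__ == "__main__":
    print(len(clean_corpus(["short", "a" * 500, "a" * 500])))  # -> 1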

To address the low efficiency of the native LLaMA tokenizer when encoding and decoding Chinese, we added over 7,000 commonly used Chinese characters to the LLaMA vocabulary. After deduplicating against the original LLaMA vocabulary, we obtained a final vocabulary of 39,410 tokens, which we implemented by reusing the LlamaTokenizer in Transformers.
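
As a rough sketch of how such a vocabulary extension can be done (assuming the standard sentencepiece-based LLaMA tokenizer; the file paths and character list below are placeholders, and this is not necessarily the exact script used for Ziya), new character pieces can be appended to the tokenizer's underlying sentencepiece model and then loaded back with the stock LlamaTokenizer:

from transformers import LlamaTokenizer
from sentencepiece import sentencepiece_model_pb2 as sp_pb2_model

# Placeholder paths: the base LLaMA tokenizer and a list of common Chinese characters.
base_tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-13b")
new_chars = [line.strip() for line in open("common_chinese_chars.txt", encoding="utf-8")]

# Parse the underlying sentencepiece model and append pieces that are
# not already in the LLaMA vocabulary (the deduplication step).
sp_model = sp_pb2_model.ModelProto()
sp_model.ParseFromString(base_tokenizer.sp_model.serialized_model_proto())
existing = {p.piece for p in sp_model.pieces}

for ch in new_chars:
    if ch and ch not in existing:
        piece = sp_pb2_model.ModelProto.SentencePiece()
        piece.piece = ch
        piece.score = 0.0
        sp_model.pieces.append(piece)

with open("ziya_tokenizer.model", "wb") as f:
    f.write(sp_model.SerializeToString())

# The merged model can then be loaded with the stock LlamaTokenizer.
merged = LlamaTokenizer(vocab_file="ziya_tokenizer.model")
print(len(merged))  # e.g. 39410 after adding ~7k new characters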

During the incremental training process, we used 160 A100 GPUs with 40GB of memory each, a global batch size of 2.6 million tokens, and FP16 mixed precision, reaching a throughput of 118 TFLOPS per GPU. As a result, we were able to incrementally train 110 billion tokens of data on top of the native LLaMA-13B model in just 8 days.
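
As a back-of-envelope sanity check (using the common 6·N·D approximation for training FLOPs, which ignores optimizer and recomputation overheads; this is our own rough estimate, not the team's accounting), the reported throughput is broadly consistent with finishing 110B tokens in about a week:

# Rough estimate: training FLOPs ≈ 6 * parameters * tokens.
params = 13e9            # 13B parameters
tokens = 110e9           # 110B incremental tokens
gpus = 160               # A100 40GB GPUs
flops_per_gpu = 118e12   # reported sustained throughput (118 TFLOPS)

total_flops = 6 * params * tokens
seconds = total_flops / (gpus * flops_per_gpu)
print(f"~{seconds / 86400:.1f} days")  # ~5.3 days; with real-world overheads
                                       # this is consistent with the reported 8 days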

Throughout the training process, we encountered various issues such as machine crashes, underlying framework bugs, and loss spikes. However, we ensured the stability of the incremental training by making rapid adjustments. We have also released the loss curve during the training process to help everyone understand the potential issues that may arise.

多任務(wù)有監(jiān)督微調(diào) Supervised finetuning

在多任務(wù)有監(jiān)督微調(diào)階段,采用了課程學(xué)習(xí)(curriculum learning)和增量訓(xùn)練(continual learning)的策略,用大模型輔助劃分已有的數(shù)據(jù)難度,然后通過“Easy To Hard”的方式,分多個階段進(jìn)行SFT訓(xùn)練。

SFT訓(xùn)練數(shù)據(jù)包含多個高質(zhì)量的數(shù)據(jù)集,均經(jīng)過人工篩選和校驗(yàn):

  • Self-Instruct構(gòu)造的數(shù)據(jù)(約2M):BELLE、Alpaca、Alpaca-GPT4等多個數(shù)據(jù)集
  • 內(nèi)部收集Code數(shù)據(jù)(300K):包含leetcode、多種Code任務(wù)形式
  • 內(nèi)部收集推理/邏輯相關(guān)數(shù)據(jù)(500K):推理、申論、數(shù)學(xué)應(yīng)用題、數(shù)值計(jì)算等
  • 中英平行語料(2M):中英互譯語料、COT類型翻譯語料、古文翻譯語料等
  • 多輪對話語料(500K):Self-Instruct生成、任務(wù)型多輪對話、Role-Playing型多輪對話等

During the multi-task supervised fine-tuning (SFT) phase, we adopted a strategy of curriculum learning and incremental (continual) training. A large model was used to help partition the existing data by difficulty, and SFT was then carried out in multiple stages following an "easy to hard" schedule.

The SFT training data consists of multiple high-quality datasets, all manually selected and verified:

  • Self-Instruct-constructed data (~2M samples): BELLE, Alpaca, Alpaca-GPT4, and other datasets
  • Internally collected code data (300K samples): LeetCode and various other code task formats
  • Internally collected reasoning/logic data (500K samples): reasoning, argumentative essays, mathematical word problems, numerical calculation, etc.
  • Chinese-English parallel corpora (2M samples): Chinese-English translation, CoT-style translation, classical Chinese translation, etc.
  • Multi-turn dialogue corpora (500K samples): Self-Instruct generation, task-oriented multi-turn dialogue, role-playing multi-turn dialogue, etc.
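
To make the "easy to hard" staging concrete, the sketch below shows one way to split SFT samples into stages by a difficulty score and train them in order. The difficulty function and stage boundaries are illustrative assumptions; in the actual pipeline the scoring was assisted by a large model, as described above.

from typing import Callable, Dict, List

def split_by_difficulty(samples: List[Dict],
                        difficulty_fn: Callable[[Dict], float],
                        boundaries=(0.33, 0.66)) -> List[List[Dict]]:
    """Partition SFT samples into easy / medium / hard stages."""
    scored = sorted(samples, key=difficulty_fn)
    n = len(scored)
    cuts = [int(b * n) for b in boundaries]
    return [scored[:cuts[0]], scored[cuts[0]:cuts[1]], scored[cuts[1]:]]

# Placeholder difficulty score; in practice this could be a large model's
# loss on the sample or an LLM-judge rating.
difficulty = lambda s: len(s["output"])

stages = split_by_difficulty(
    [{"instruction": "1+1=?", "output": "2"},
     {"instruction": "Prove ...", "output": "A long derivation ..."}],
    difficulty,
)
for i, stage in enumerate(stages):
    # Each stage would be a separate SFT run resuming from the previous
    # stage's checkpoint (the continual-training part of the strategy).
    print(f"stage {i}: {len(stage)} samples")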

人類反饋學(xué)習(xí) Human-Feedback training

為了進(jìn)一步提升模型的綜合表現(xiàn),使其能夠充分理解人類意圖、減少“幻覺”和不安全的輸出,基于指令微調(diào)后的模型,進(jìn)行了人類反饋訓(xùn)練(Human-Feedback Training,HFT)。在訓(xùn)練中,我們采用了以人類反饋強(qiáng)化學(xué)習(xí)(RM、PPO)為主,結(jié)合多種其他手段聯(lián)合訓(xùn)練的方法,手段包括人類反饋微調(diào)(Human-Feedback Fine-tuning,HFFT)、后見鏈微調(diào)(Chain-of-Hindsight Fine-tuning,COHFT)、AI反饋(AI Feedback)和基于規(guī)則的獎勵系統(tǒng)(Rule-based Reward System,RBRS)等,用來彌補(bǔ)PPO方法的短板,加速訓(xùn)練。

我們在內(nèi)部自研的框架上實(shí)現(xiàn)了HFT的訓(xùn)練流程,該框架可以利用最少8張40G的A100顯卡完成Ziya-LLaMA-13B-v1的全參數(shù)訓(xùn)練。在PPO訓(xùn)練中,我們沒有限制生成樣本的長度,以確保長文本任務(wù)的獎勵準(zhǔn)確性。每次訓(xùn)練的總經(jīng)驗(yàn)池尺寸超過100k樣本,確保了訓(xùn)練的充分性。

To further improve the model's overall performance, enabling it to fully understand human intentions and reduce "hallucinations" and unsafe outputs, we conducted Human-Feedback Training (HFT) on top of the instruction-tuned model. The training primarily relied on reinforcement learning from human feedback (RM, PPO), combined with several other techniques, including Human-Feedback Fine-tuning (HFFT), Chain-of-Hindsight Fine-tuning (COHFT), AI Feedback, and a Rule-based Reward System (RBRS), to compensate for the shortcomings of the PPO method and accelerate training.

We implemented the HFT training pipeline on an internally developed framework, which can complete full-parameter training of Ziya-LLaMA-13B-v1 with as few as eight 40GB A100 GPUs. During PPO training, we did not limit the length of generated samples, so that rewards for long-text tasks remain accurate. The total experience pool for each training run exceeded 100k samples, ensuring sufficient training.
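
For reference, the reward-model (RM) stage of this kind of RLHF setup is typically trained on human preference pairs with a pairwise ranking loss. The snippet below is a generic PyTorch sketch of that objective, not code from the internal Fengshenbang framework:

import torch
import torch.nn.functional as F

def pairwise_rm_loss(chosen_rewards: torch.Tensor,
                     rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Standard preference-ranking loss: -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy example: scalar rewards produced by the RM head for a batch of
# (chosen, rejected) response pairs.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.5, 1.1])
print(pairwise_rm_loss(chosen, rejected))  # smaller when chosen > rejected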

效果評估 Performance

示例代碼 Example Code

# Build a ModelScope text-generation pipeline for Ziya-LLaMA-13B-v1.
from modelscope.utils.constant import Tasks
from modelscope.pipelines import pipeline
pipe = pipeline(task=Tasks.text_generation, model='Fengshenbang/Ziya-LLaMA-13B-v1', model_revision='v1.0.7', device_map='auto')
# Ziya expects prompts in the '<human>: ... \n<bot>:' format.
query = "幫我寫一份去西安的旅游計(jì)劃"  # "Write me a travel plan for Xi'an"
inputs = '<human>:' + query.strip() + '\n<bot>:'
result = pipe(inputs, max_new_tokens=1024, do_sample=True, top_p=0.85, temperature=1.0, repetition_penalty=1.0, eos_token_id=2, bos_token_id=1, pad_token_id=0)
print(result['text'])
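
The example above uses Ziya's single-turn '<human>: ... \n<bot>:' prompt format. A small helper like the one below (our own convenience sketch that reuses the `pipe` object defined above; whether the model was trained on exactly this multi-turn concatenation is an assumption) can extend the same format to multi-turn conversations:

def build_prompt(history, query):
    """Format a multi-turn conversation in Ziya's <human>/<bot> template.

    history: list of (user_turn, bot_turn) tuples from earlier exchanges.
    """
    prompt = ""
    for user_turn, bot_turn in history:
        prompt += f"<human>:{user_turn.strip()}\n<bot>:{bot_turn.strip()}\n"
    prompt += f"<human>:{query.strip()}\n<bot>:"
    return prompt

# Example: follow up on the travel-plan request with an extra constraint.
inputs = build_prompt([("幫我寫一份去西安的旅游計(jì)劃", "好的,以下是一份三天的西安旅游計(jì)劃……")],
                      "請把行程壓縮到兩天")
result = pipe(inputs, max_new_tokens=1024, do_sample=True, top_p=0.85,
              temperature=1.0, repetition_penalty=1.0,
              eos_token_id=2, bos_token_id=1, pad_token_id=0)
print(result['text'])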

引用 Citation

如果您在您的工作中使用了我們的模型,可以引用我們的論文

If you use this resource in your work, please cite our paper:

@article{fengshenbang,
  author    = {Jiaxing Zhang and Ruyi Gan and Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen},
  title     = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
  journal   = {CoRR},
  volume    = {abs/2209.02970},
  year      = {2022}
}

歡迎引用我們的網(wǎng)站:

You can also cite our website:

@misc{Fengshenbang-LM,
  title={Fengshenbang-LM},
  author={IDEA-CCNL},
  year={2021},
  howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}