22 Large Language Models, from BERT to GPT-4
Large language models (LLMs) are a product of applying machine learning techniques from artificial intelligence to natural language processing (NLP). They typically consist of a neural network with a very large number of parameters, trained on massive amounts of text data. These models have begun to demonstrate the ability to perform a wide range of language tasks, including comprehension, writing, reasoning, and dialogue, and may extend beyond language to become a powerful foundation for many other AI tools. [1]
This article collects the 22 large language models published to date and lists their basic information, key technical parameters, and reference resources in the table below.
| No. | Name | Release date | Developer | Parameters (billions) | Training dataset size | License type | References |
|---|---|---|---|---|---|---|---|
| 1 | BERT | Oct 2018 | Google | 0.34 | 3.3 billion words | A-1 | [5]-[8] |
| 2 | GPT-2 | Feb 2019 | OpenAI | 1.5 | 40 GB (about 10 billion tokens) | A-2 | [9][10] |
| 3 | GPT-3 | May 2020 | OpenAI | 175 | 499 billion tokens | B-1 | [11][12] |
| 4 | GPT-Neo | Mar 2021 | EleutherAI | 2.7 | 825 GB | A-2 | [15][16] |
| 5 | GPT-J | Jun 2021 | EleutherAI | 6 | 402 billion tokens | A-1 | [17][18] |
| 6 | Megatron-Turing NLG | Oct 2021 | Microsoft and Nvidia | 530 | 338.6 billion tokens | C-1 | [19][20] |
| 7 | Ernie 3.0 Titan | Dec 2021 | Baidu | 260 | 4 TB | A-1 | [21][46] |
| 8 | Claude | Dec 2021 | Anthropic | 52 | 400 billion tokens | C-2 | [22][23] |
| 9 | GLaM | Dec 2021 | Google | 1200 | 1.6 trillion tokens | C-2 | [24][25] |
| 10 | Gopher | Dec 2021 | DeepMind | 280 | 300 billion tokens | C-2 | [26][27] |
| 11 | LaMDA | Jan 2022 | Google | 137 | 1.56 trillion words (2.81 trillion tokens) | C-2 | [28][29] |
| 12 | GPT-NeoX | Feb 2022 | EleutherAI | 20 | 825 GB | A-1 | [30] |
| 13 | Chinchilla | Mar 2022 | DeepMind | 70 | 1.4 trillion tokens | C-2 | [31][32] |
| 14 | PaLM | Apr 2022 | Google | 540 | 768 billion tokens | C-2 | [32][33] |
| 15 | OPT | May 2022 | Meta | 175 | 180 billion tokens | A-4 | [34][35] |
| 16 | YaLM 100B | Jun 2022 | Yandex | 100 | 1.7 TB | A-1 | [36] |
| 17 | Minerva | Jun 2022 | Google | 540 | 38.5 billion tokens | C-2 | [37][38] |
| 18 | BLOOM | Jul 2022 | Hugging Face | 175 | 350 billion tokens (1.6 TB) | A-3 | [39] |
| 19 | AlexaTM | Nov 2022 | Amazon | 20 | 1.3 trillion tokens | B-1 | [40]-[42] |
| 20 | ChatGPT | Nov 2022 | OpenAI | <175 | Unknown | B-1 | [12]-[14] |
| 21 | LLaMA | Feb 2023 | Meta | 65 | 1.4 trillion tokens | A-4 | [43][44] |
| 22 | GPT-4 | Mar 2023 | OpenAI | Unknown | Unknown | B-1 | [45] |

Table 1. The 22 large language models
Notes on the parameters
Parameter count is the number of parameters in the model that are updated during training; it reflects the model's learning capacity. Comparing models by parameter count presupposes that they share a similar architecture and are built from the same basic building block, the Transformer. A model name sometimes covers a family of sub-models of different sizes; the table above always lists the parameter count of the largest sub-model.
Dataset size is the size of the dataset used to train the model; these datasets consist of uncompressed text. Size is measured in three different ways: number of tokens, number of words, and storage space. A token is the basic unit of model input produced when the text is preprocessed. The three measures can be roughly converted as follows:
1 token ≈ 0.75 words ≈ 4 bytes
Dataset size represents the breadth of the model's learning.
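To make the conversion rule concrete, here is a minimal Python sketch (not part of the original article; the constants simply encode the approximation above) that estimates token counts from a word count or from an uncompressed text size. Actual figures depend on the tokenizer and on the language of the corpus.

```python
# Rough conversion rule from the text above: 1 token ≈ 0.75 words ≈ 4 bytes.
TOKENS_PER_WORD = 1 / 0.75   # about 1.33 tokens per word
BYTES_PER_TOKEN = 4          # rough average for uncompressed English text

def tokens_from_words(words: float) -> float:
    """Estimate the number of tokens corresponding to a word count."""
    return words * TOKENS_PER_WORD

def tokens_from_bytes(num_bytes: float) -> float:
    """Estimate the number of tokens in an uncompressed text of a given size."""
    return num_bytes / BYTES_PER_TOKEN

if __name__ == "__main__":
    # BERT's 3.3 billion words -> roughly 4.4 billion tokens
    print(f"{tokens_from_words(3.3e9):.2e} tokens")
    # GPT-2's 40 GB of text -> roughly 10 billion tokens
    print(f"{tokens_from_bytes(40e9):.2e} tokens")
```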
Another important consideration is the depth of the model's learning, but there is no single comparable metric for it; readers can form their own estimates and comparisons from the benchmark results in the reference resources.
The license types fall into the following categories:
A-1: open source, Apache 2.0 license [2];
A-2: open source, MIT license [3];
A-3: open source, Responsible AI License [4];
A-4: open source, limited to non-commercial research, access by application;
B-1: proprietary, with an open web-based API;
C-1: proprietary, with restricted web-based access;
C-2: proprietary, no open interface.
The reference resources are mainly papers, followed by officially published technical documentation. For open-source models we also give a link to the corresponding GitHub repository or model card.
References
[1] Wikipedia. "Large language model". Retrieved 2023-03-28.
[2] Apache. "Apache License, Version 2.0". Retrieved 2023-03-28.
[3] Open Source Initiative. "The MIT License". Retrieved 2023-03-28.
[4] BigScience. "The BigScience RAIL License". Retrieved 2023-03-28.
[5] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2.
[6] "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. Retrieved 2019-11-27.
[7] Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What We Know About How BERT Works". Transactions of the Association for Computational Linguistics. 8: 842–866. arXiv:2002.12327.
[8] Hugging Face. Model Card for the BERT base model.
[9] Radford, Alec; Wu, Jeff; Child, Rewon; Luan, David; Amodei, Dario; Sutskever, Ilya (2019). "Language Models are Unsupervised Multitask Learners".
[10] OpenAI. GPT-2 Model Card.
[11] Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (May 28, 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165.
[12] OpenAI. "API Reference". Retrieved 2023-03-28.
[13] OpenAI (November 30, 2022). "ChatGPT: Optimizing Language Models for Dialogue". Retrieved December 5, 2022.
[14] Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe (Mar 2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155.
[15] Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor (31 December 2020). "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". arXiv:2101.00027.
[16] EleutherAI. GitHub repository for GPT-Neo.
[17] Forefront Team. "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". Retrieved 2023-02-28.
[18] EleutherAI. GPT-J 6B Model Card.
[19] Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". Microsoft Research.
[20] Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia (2022-02-04). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". arXiv:2201.11990.
[21] Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu, Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng (December 23, 2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731.
[22] Askell, Amanda; Bai, Yuntao; Chen, Anna; et al. (9 December 2021). "A General Language Assistant as a Laboratory for Alignment". arXiv:2112.00861.
[23] Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. (15 December 2022). "Constitutional AI: Harmlessness from AI Feedback". arXiv:2212.08073.
[24] Dai, Andrew M; Du, Nan (December 9, 2021). "More Efficient In-Context Learning with GLaM". Retrieved 2023-03-09.
[25] Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathleen Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu, Zhifeng Chen, Claire Cui. "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". arXiv:2112.06905.
[26] Jack Rae, Geoffrey Irving, Laura Weidinger. "Language modelling at scale: Gopher, ethical considerations, and retrieval". Retrieved 20 March 2023.
[27] Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. (29 March 2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556.
[28] Cheng, Heng-Tze; Thoppilan, Romal (January 21, 2022). "LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything". Retrieved 2023-03-09.
[29] Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, YaGuang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yanping Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Vincent Zhao, Yanqi Zhou, Chung-Ching Chang, Igor Krivokon, Will Rusch, Marc Pickett, Pranesh Srinivasan, Laichee Man, Kathleen Meier-Hellstern, Meredith Ringel Morris, Tulsee Doshi, Renelito Delos Santos, Toju Duke, Johnny Soraker, Ben Zevenbergen, Vinodkumar Prabhakaran, Mark Diaz, Ben Hutchinson, Kristen Olson, Alejandra Molina, Erin Hoffman-John, Josh Lee, Lora Aroyo, Ravi Rajakumar, Alena Butryna, Matthew Lamm, Viktoriya Kuzmina, Joe Fenton, Aaron Cohen, Rachel Bernstein, Ray Kurzweil, Blaise Aguera-Arcas, Claire Cui, Marian Croak, Ed Chi, Quoc Le. "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239.
[30] EleutherAI. GitHub repository for GPT-NeoX.
[31] Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. (29 March 2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556.
[32] Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent (12 April 2022). "An empirical analysis of compute-optimal large language model training". DeepMind Blog.
[33] Narang, Sharan; Chowdhery, Aakanksha (April 4, 2022). "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance". Retrieved 2023-03-09.
[34] Susan Zhang, Mona Diab, Luke Zettlemoyer. "Democratizing access to large-scale language models with OPT-175B". Retrieved 2023-03-28.
[35] Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke (21 June 2022). "OPT: Open Pre-trained Transformer Language Models". arXiv:2205.01068.
[36] Khrushchev, Mikhail; Vasilev, Ruslan; Petrov, Alexey; Zinov, Nikolay (2022-06-22). GitHub repository for YaLM 100B. Retrieved 2023-03-18.
[37] Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (30 June 2022). "Solving Quantitative Reasoning Problems with Language Models". arXiv:2206.14858.
[38] "Minerva: Solving Quantitative Reasoning Problems with Language Models". Retrieved 20 March 2023.
[39] "bigscience/bloom · Hugging Face". Retrieved 2023-03-28.
[40] "20B-parameter Alexa model sets new marks in few-shot learning". Amazon Science. 2 August 2022. Retrieved 2023-03-28.
[41] Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. (3 August 2022). "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". arXiv:2208.01448.
[42] "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". 17 November 2022. Retrieved 13 March 2023.
[43] "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023.
[44] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. "LLaMA: Open and Efficient Foundation Language Models". arXiv:2302.13971.
[45] "GPT-4 Technical Report". OpenAI. 2023. Retrieved March 14, 2023.
[46] Baidu. ERNIE 3.0. Retrieved 2023-03-28.