License: apache-2.0
Languages:
- French
- Italian
- German
- Spanish
- English
Tags:
- Mixture of Experts (MoE)
Model Card for Mixtral-8x22B
Mistral AI has finally released the weights to the official Mistral AI organization, with both a base model and an instruct-tuned version:
mistralai/Mixtral-8x22B-v0.1 and mistralai/Mixtral-8x22B-Instruct-v0.1
The HuggingFace team has cloned this repository into the new official repository mistral-community/Mixtral-8x22B-v0.1, which you can also download from.
Thanks to the HF team for crediting my contribution!
Also, here's a really cute piece of music! owo...
Converted to the HuggingFace Transformers format using this script.
The Mixtral-8x22B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts model.
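To give a rough intuition for what "sparse mixture of experts" means, here is a minimal, self-contained PyTorch sketch of top-2 expert routing. It is purely illustrative: the layer sizes, module names, and routing details are assumptions made for the example, not the actual Mixtral implementation.

# A minimal sketch of top-2 sparse MoE routing (illustrative only; sizes, names,
# and routing details are assumptions, not the Mixtral implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySparseMoE(nn.Module):
    def __init__(self, hidden_size=64, ffn_size=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_size, ffn_size), nn.SiLU(), nn.Linear(ffn_size, hidden_size))
            for _ in range(num_experts)
        ])

    def forward(self, x):                               # x: (num_tokens, hidden_size)
        router_logits = self.gate(x)                    # (num_tokens, num_experts)
        weights, expert_idx = torch.topk(router_logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)            # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                     # each token only runs its top-k experts
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

moe = TinySparseMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])

Only the selected experts run for each token, which is why the total parameter count can be much larger than the compute used per token.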
Run the model
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
text = "你好,我的名字是"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
By default, transformers loads the model in full precision. You may therefore want to further reduce the memory requirements for running the model through the optimizations offered in the HF ecosystem:
Half precision
Note: float16 precision only works on GPU devices.
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to(0)
text = "你好,我的名字是"
+ inputs = tokenizer(text, return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
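If your GPU supports it (Ampere or newer), bfloat16 is another half-precision option that is usually more numerically robust than float16. A minimal sketch, assuming such a GPU is available; the only change from the snippet above is the dtype:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Same as above, but bfloat16 instead of float16 (needs a bfloat16-capable GPU)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(0)

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))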
Lower precision (8-bit and 4-bit) using bitsandbytes
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
text = "你好,我的名字是"
+ inputs = tokenizer(text, return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
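Newer transformers releases prefer passing a BitsAndBytesConfig instead of the bare load_in_4bit flag. A minimal sketch of the same 4-bit setup, assuming bitsandbytes and accelerate are installed; the NF4 quantization type and float16 compute dtype are illustrative choices:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit NF4 quantization with float16 compute (illustrative choices)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quant_config)

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))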
Load the model with Flash Attention 2
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, use_flash_attention_2=True)
text = "你好,我的名字是"
+ inputs = tokenizer(text, return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
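A model of this size usually does not fit on a single GPU, so it is common to let accelerate shard the weights across all visible devices with device_map="auto". A minimal sketch, assuming accelerate is installed; it can be combined with the half-precision or quantization options above:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Let accelerate place layers across every visible GPU (spilling to CPU if necessary)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))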
Notice
Mixtral-8x22B-v0.1 is a pretrained base model and therefore does not have any moderation mechanisms.
The Mistral AI Team
Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall.