XGLM-4.5B: open-source multilingual model with 4.5 billion parameters, trained on 134 languages

XGLM-4.5B

Developed by Facebook

XGLM-4.5B is a multilingual autoregressive language model with 4.5 billion parameters, trained on a balanced corpus covering 134 languages.

Large language model

Transformers

Multilingual | License: MIT | #multilingual-generation #zero-shot-inference #4.5B-parameters

Downloads: 78

Released: 3/2/2022

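Model overview

XGLM-4.5B is a multilingual autoregressive language model. It handles natural language processing tasks across many languages and is aimed in particular at few-shot settings.

Model features

Multilingual support

Trained on text in 134 languages, so it can generate and interpret text in each of them.

Few-shot learning

Designed for few-shot use: a task is specified through a handful of in-context examples rather than through fine-tuning.

Large parameter count

Has 4.5 billion parameters, giving it substantial capacity for language understanding and generation.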
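
The parameter count translates directly into a lower bound on memory. The sketch below estimates the size of the weights alone, assuming exactly 4.5 billion parameters and the usual byte widths per data type; activations and framework overhead are not counted, and the helper name `weight_gib` is illustrative.

```python
# Rough size of the weights alone, assuming exactly 4.5e9 parameters.
# Activations, cached keys/values, and framework overhead are not counted.
N_PARAMS = 4_500_000_000

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_gib(n_params: int, dtype: str) -> float:
    """Return the weight footprint in GiB for the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gib(N_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the weights take roughly 16.8 GiB in float32 and 8.4 GiB in float16, which is a useful first check before loading the model on a given device.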

Model capabilities

Text generation

Language understanding

Few-shot learning

Multilingual processing
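
Few-shot use means the task is demonstrated inside the prompt itself. A minimal sketch of assembling such a prompt for COPA-style items follows. The `Premise/Question/Answer` template and both function names are illustrative assumptions, not the format used by the model's authors; the label convention (`"0"` selects `choice1`, `"1"` selects `choice2`) follows the usage example in this document.

```python
from typing import Dict, List, Optional


def format_example(ex: Dict[str, str], answer: Optional[str] = None) -> str:
    """Render one item; the answer slot is left open when `answer` is None."""
    lines = [
        f"Premise: {ex['premise']}",
        f"Question: what was the {ex['question']}?",
        f"Answer: {answer}" if answer is not None else "Answer:",
    ]
    return "\n".join(lines)


def build_few_shot_prompt(demos: List[Dict[str, str]], query: Dict[str, str]) -> str:
    """Prepend solved demonstrations to an unsolved query."""
    blocks = []
    for demo in demos:
        gold = demo["choice1"] if demo["label"] == "0" else demo["choice2"]
        blocks.append(format_example(demo, gold))
    blocks.append(format_example(query))
    return "\n\n".join(blocks)


demo = {
    "premise": "The flame on the candle went out.",
    "choice1": "I blew on the wick.",
    "choice2": "I put a match to the wick.",
    "question": "cause",
    "label": "0",
}
query = {
    "premise": "I wanted to conserve energy.",
    "choice1": "I swept the floor in the unoccupied room.",
    "choice2": "I shut off the light in the unoccupied room.",
    "question": "effect",
    "label": "1",
}
prompt = build_few_shot_prompt([demo], query)
print(prompt)
```

The resulting string ends with an open `Answer:` slot, so each candidate alternative can be appended and scored by the model.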

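Use cases

Natural language processing

COPA task

Evaluates the model on the Choice of Plausible Alternatives task.

Predicts the labelled alternative on the English, Chinese, and Haitian Creole samples in the usage example.

Education

Multilingual learning aid

Helps students study the grammar and vocabulary of multiple languages.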

🚀 XGLM-4.5B

XGLM-4.5B is a multilingual autoregressive language model (with 4.5 billion parameters) trained on a balanced corpus of 134 different languages. It was introduced in the paper Few-shot Learning with Multilingual Language Models by Xi Victoria Lin*, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li* (* equal contribution). The original implementation was released in this repository.

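✨ Key features

Multilingual support: covers 134 languages, including English, Russian, Chinese, German, and Spanish.
Large parameter count: 4.5 billion parameters, giving it substantial capacity for language understanding and generation.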

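💻 Usage example

Basic usage

The following code evaluates the model on the Choice of Plausible Alternatives (COPA) task in a GPT-3 style, zero-shot setting, using examples in English, Chinese, and Haitian Creole.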

import torch
import torch.nn.functional as F

from transformers import XGLMTokenizer, XGLMForCausalLM

tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-4.5B")
model = XGLMForCausalLM.from_pretrained("facebook/xglm-4.5B")

data_samples = {
    'en': [
        {
            "premise": "I wanted to conserve energy.",
            "choice1": "I swept the floor in the unoccupied room.",
            "choice2": "I shut off the light in the unoccupied room.",
            "question": "effect",
            "label": "1"
        },
        {
            "premise": "The flame on the candle went out.",
            "choice1": "I blew on the wick.",
            "choice2": "I put a match to the wick.",
            "question": "cause",
            "label": "0"
        }
    ],
    'zh': [
        {
            "premise": "我想节约能源。",
            "choice1": "我在空着的房间里扫了地板。",
            "choice2": "我把空房间里的灯关了。",
            "question": "effect",
            "label": "1"
        },
        {
            "premise": "蜡烛上的火焰熄灭了。",
            "choice1": "我吹灭了灯芯。",
            "choice2": "我把一根火柴放在灯芯上。",
            "question": "cause",
            "label": "0"
        }
    ],
    'ht': [  # Haitian Creole
        {
            "premise": "M te vle konsève enèji.",
            "choice1": "Mwen te fin baleye chanm lib la.",
            "choice2": "Mwen te femen limyè nan chanm lib la.",
            "question": "effect",
            "label": "1"
        },
        {
            "premise": "Flam bouji a te etenn.",
            "choice1": "Mwen te soufle bouji a.",
            "choice2": "Mwen te limen mèch bouji a.",
            "question": "cause",
            "label": "0"
        }
    ]
}

model.eval()  # disable dropout so scores are deterministic

def get_logprobs(prompt):
    """Return the log-probability of each token in `prompt` given the tokens before it."""
    inputs = tokenizer(prompt, return_tensors="pt")
    # The logits at position t predict token t + 1, so the targets are the inputs shifted by one.
    input_ids, output_ids = inputs["input_ids"], inputs["input_ids"][:, 1:]
    with torch.no_grad():  # scoring only, no gradients needed
        outputs = model(**inputs, labels=input_ids)
    logits = outputs.logits
    logprobs = torch.gather(F.log_softmax(logits, dim=2), 2, output_ids.unsqueeze(2))
    return logprobs

# Zero-shot evaluation for the Choice of Plausible Alternatives (COPA) task.
# A return value of 0 indicates that the first alternative is more plausible,
# while 1 indicates that the second alternative is more plausible.
def COPA_eval(prompt, alternative1, alternative2):
    lprob1 = get_logprobs(prompt + "\n" + alternative1).sum()
    lprob2 = get_logprobs(prompt + "\n" + alternative2).sum()
    return 0 if lprob1 > lprob2 else 1

# Print "<lang>-<index> <prediction> <gold label>" for every sample.
for lang in data_samples:
    for idx, example in enumerate(data_samples[lang]):
        predict = COPA_eval(example["premise"], example["choice1"], example["choice2"])
        print(f'{lang}-{idx}', predict, example['label'])

# en-0 1 1
# en-1 0 0
# zh-0 1 1
# zh-1 0 0
# ht-0 1 1
# ht-1 0 0
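
The evaluation above picks the alternative with the higher summed log-probability. Because a sum penalizes alternatives with more tokens, a common variant divides each score by the token count. The sketch below contrasts the two rules on made-up per-token log-probabilities, so it runs without loading the model; `pick_alternative` and the numbers are illustrative and not part of the original evaluation code.

```python
from typing import Sequence


def pick_alternative(
    logprobs1: Sequence[float],
    logprobs2: Sequence[float],
    normalize: bool = False,
) -> int:
    """Return 0 if the first alternative scores higher, otherwise 1.

    Each argument holds the per-token log-probabilities of one alternative.
    With normalize=True the score is the mean instead of the sum.
    """
    score1, score2 = sum(logprobs1), sum(logprobs2)
    if normalize:
        score1 /= len(logprobs1)
        score2 /= len(logprobs2)
    return 0 if score1 > score2 else 1


# Made-up values: a long alternative with likely tokens
# versus a short alternative with less likely tokens.
longer = [-1.0, -1.0, -1.0, -1.0]   # sum -4.0, mean -1.0
shorter = [-1.5, -1.5]              # sum -3.0, mean -1.5

print(pick_alternative(longer, shorter))                  # summed score
print(pick_alternative(longer, shorter, normalize=True))  # mean score
```

On these toy numbers the summed rule prefers the shorter alternative while the normalized rule prefers the longer one, which shows why the choice of scoring rule can change predictions when the alternatives differ in length.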