DistilBERT Open-Source Question Answering Model: Free to Deploy, Fewer Parameters, Faster Inference, Strong Performance

Distilbert Base Uncased Distilled Squad

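Developed by distilbert

DistilBERT is a lightweight, distilled version of BERT with 40% fewer parameters and 60% faster inference, while retaining over 95% of BERT's performance on the GLUE benchmark. This model is fine-tuned specifically for question answering.

Question Answering

Transformers

Language: English  License: Apache-2.0  #question-answering #lightweight-BERT #knowledge-distillation

Downloads: 154.39k

Released: 3/2/2022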

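Model overview

A fine-tuned checkpoint of DistilBERT-base-uncased, trained with knowledge distillation on the SQuAD v1.1 dataset, for English question answering.

Model features

Efficient and lightweight

40% fewer parameters than the original BERT model and 60% faster inference

High performance

Retains over 95% of BERT's performance on the GLUE benchmark

Optimized for question answering

Fine-tuned specifically for the SQuAD question answering task; reaches an F1 score of 86.9 on SQuAD v1.1

Model capabilities

Extractive question answering

Text comprehension

Answer localization

Use cases

Question answering systems

Document-based question answering

Extract the answer to a question from a given text

Reaches an F1 score of 86.9 on the SQuAD v1.1 dataset

Knowledge retrieval

Find relevant information in a knowledge base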

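🚀 DistilBERT base uncased distilled SQuAD

DistilBERT base uncased distilled SQuAD is a DistilBERT checkpoint fine-tuned on the SQuAD v1.1 dataset using knowledge distillation. It keeps most of BERT's accuracy with fewer parameters and faster inference, and is intended for English question answering.

🚀 Quick start

Use the code below to get started with the model:

Basic usage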

>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-uncased-distilled-squad')

>>> context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
... """

>>> result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
... )

Answer: 'SQuAD dataset', score: 0.4704, start: 147, end: 160
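
The `start` and `end` values in the output are character offsets into `context`, so slicing the context recovers the answer string. The following is a minimal check of that relationship. It does not load the model: the offsets are copied from the output above, and it assumes the context uses single spaces between words (the leading newline of the triple-quoted string counts as one character).

```python
# Context copied from the quick-start example; the string starts with "\n".
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
"""

# Offsets as reported in the example output.
start, end = 147, 160

# `end` is exclusive, so a plain slice returns the answer text.
answer_span = context[start:end]
print(answer_span)  # SQuAD dataset
```

Extra whitespace in the context shifts these offsets, so they are only meaningful against the exact string passed to the pipeline.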

Advanced usage

PyTorch

from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering
import torch

# Load the tokenizer and the fine-tuned question answering model.
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad')
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad')

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

# Encode the question and the context together as one input pair.
inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the most likely start and end token positions.
answer_start_index = torch.argmax(outputs.start_logits)
answer_end_index = torch.argmax(outputs.end_logits)

# Decode the tokens between the two positions (inclusive) back into text.
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
print(tokenizer.decode(predict_answer_tokens))

TensorFlow

from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
import tensorflow as tf

# Load the tokenizer and the fine-tuned question answering model.
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased-distilled-squad")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

# Encode the question and the context together as one input pair.
inputs = tokenizer(question, text, return_tensors="tf")
outputs = model(**inputs)

# Pick the most likely start and end token positions.
answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

# Decode the tokens between the two positions (inclusive) back into text.
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
print(tokenizer.decode(predict_answer_tokens))
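
Both snippets above take the argmax of the start logits and the end logits independently, which can yield an end position before the start position. One common remedy is to search for the highest-scoring span with `start <= end` and a bounded length. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `best_span`, the `max_answer_len` parameter, and the toy logits are illustrative assumptions.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) maximizing start_logit + end_logit
    subject to start <= end < start + max_answer_len."""
    best = (0, 0, float("-inf"))
    for s, start_score in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap.
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = start_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Toy logits (made up): independent argmax picks start=4 and end=2, an invalid span.
start_logits = [0.1, 0.2, 1.5, 0.3, 2.0]
end_logits = [0.0, 0.1, 2.5, 1.0, 0.4]

start, end, score = best_span(start_logits, end_logits)
print(start, end, score)  # 2 2 4.0
```

With real model outputs, the logits would first be converted to plain lists (for example with `.tolist()` on the first row) and positions belonging to the question would typically be excluded.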

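✨ Key features

Lightweight and efficient: DistilBERT has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.
Fine-tuned for question answering: this model is a fine-tuned checkpoint of DistilBERT-base-uncased, fine-tuned using knowledge distillation on SQuAD v1.1.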
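
As a quick sanity check on the "40% fewer parameters" figure, the sketch below computes the relative reduction from the approximate parameter counts reported in Sanh et al. (2019): about 110M for BERT base and about 66M for DistilBERT. The helper name `relative_reduction` is an illustrative assumption.

```python
def relative_reduction(baseline, reduced):
    """Fraction by which `reduced` is smaller than `baseline`."""
    return (baseline - reduced) / baseline


# Approximate parameter counts as reported in the DistilBERT paper.
bert_base_params = 110e6
distilbert_params = 66e6

reduction = relative_reduction(bert_base_params, distilbert_params)
print(f"{reduction:.0%}")  # 40%
```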

📚 Documentation

Model details

The DistilBERT model was proposed in the blog post "Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT" and the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter". It is a small, fast, cheap, and light Transformer model trained by distilling BERT base.
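
Distillation trains the smaller student model to reproduce the teacher's output distribution. The following is a minimal sketch of a generic temperature-softened distillation loss (cross-entropy of the student against the teacher's soft targets). It is not the authors' training code: the function names, the temperature value, and the toy logits are illustrative assumptions. The paper combines a loss of this kind with a masked language modeling loss and a cosine embedding loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's softened distribution against the teacher's."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# Toy logits (made up): one student agrees with the teacher, the other does not.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]
far_student = [0.5, 4.0, 1.0]

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
print(loss_close < loss_far)  # True
```

A higher temperature flattens both distributions, so the student also learns from the relative probabilities the teacher assigns to non-top classes.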

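Property	Details
Developed by	Hugging Face
Model type	Transformer-based language model
Language	English
License	Apache 2.0
Related model	DistilBERT-base-uncased
Resources for more information	For more on Distil* (a class of compressed models that includes this one), see this repository. For more on knowledge distillation and the training procedure, see Sanh et al. (2019).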

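Uses

This model can be used for question answering.

Misuse and out-of-scope use

The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

Risks, limitations and biases

⚠️ Important

Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues in language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model can include disturbing and harmful stereotypes across protected classes, identity characteristics, and sensitive, social, and occupational groups.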

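Training

Training data

The training data of the distilbert-base-uncased model is described as follows:

DistilBERT was pretrained on the same data as BERT: BookCorpus, a dataset of 11,038 unpublished books, and English Wikipedia (excluding lists, tables, and headers).

For more information about the SQuAD v1.1 dataset, see the SQuAD v1.1 data card.

Training procedure

Preprocessing

See the distilbert-base-uncased model card for further details.

Pretraining

See the distilbert-base-uncased model card for further details.

Evaluation

As discussed in the model repository:

This model reaches an F1 score of 86.9 on the SQuAD v1.1 dev set (for comparison, the bert-base-uncased version of BERT reaches an F1 score of 88.5).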
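
The F1 score here is a token-overlap measure between a predicted answer and a reference answer. The sketch below follows the normalization steps commonly used for SQuAD v1.1 scoring (lowercasing, stripping punctuation and English articles, collapsing whitespace); it is an illustrative re-implementation rather than the official evaluation script, and the function names are assumptions. A full evaluation would also take the maximum over all reference answers for each question and average over the dataset.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction, ground_truth):
    """Token-level F1 between one prediction and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


exact = f1_score("the SQuAD dataset", "SQuAD dataset")  # identical after normalization
partial = f1_score("SQuAD", "SQuAD dataset")            # precision 1.0, recall 0.5
print(exact, round(partial, 4))  # 1.0 0.6667
```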

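Environmental impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware type and hours used below are taken from the associated paper. Note that these details cover only the training of DistilBERT and do not include the fine-tuning on SQuAD.

Property	Details
Hardware type	8 V100 GPUs (16 GB each)
Hours used	90 hours
Cloud provider	Unknown
Compute region	Unknown
Carbon emitted	Unknown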
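
The hardware figures above translate directly into GPU-hours, which is the main input to an emissions estimate. The sketch below uses a simplified form of such an estimate (energy equals GPU-hours times average power draw; emissions equal energy times grid carbon intensity). It is not the calculator's actual formula, and the function and parameter names are illustrative assumptions. The power draw and carbon intensity are not given in the source, so they are left as inputs the reader must supply.

```python
def gpu_hours(num_gpus, hours):
    """Total accelerator time across all GPUs."""
    return num_gpus * hours


def estimated_emissions_kg(num_gpus, hours, gpu_power_watts, kg_co2_per_kwh):
    """Rough estimate: energy (kWh) times grid carbon intensity (kg CO2 per kWh).

    gpu_power_watts and kg_co2_per_kwh are assumptions supplied by the caller;
    neither value is reported for this model.
    """
    energy_kwh = gpu_hours(num_gpus, hours) * gpu_power_watts / 1000.0
    return energy_kwh * kg_co2_per_kwh


# Figures from the table above: 8 GPUs for 90 hours.
total_gpu_hours = gpu_hours(8, 90)
print(total_gpu_hours)  # 720
```

Because the cloud provider and compute region are unknown, any emissions figure produced this way depends entirely on the assumed power draw and carbon intensity.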

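Technical specifications

See the associated paper for details on the modeling architecture, objective, compute infrastructure, and training details.

Citation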

@inproceedings{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  booktitle={NeurIPS EMC^2 Workshop},
  year={2019}
}

APA:

Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.