base_model: microsoft/deberta-v3-small
datasets:
- nyu-mll/glue
- aps/super_glue
- facebook/anli
- tasksource/babi_nli
- sick
- snli
- scitail
- hans
- alisawuffles/WANLI
- tasksource/recast
- sileod/probability_words_nli
- joey234/nan-nli
- pietrolesci/nli_fever
- pietrolesci/breaking_nli
- pietrolesci/conj_nli
- pietrolesci/fracas
- pietrolesci/dialogue_nli
- pietrolesci/mpe
- pietrolesci/dnc
- pietrolesci/recast_white
- pietrolesci/joci
- pietrolesci/robust_nli
- pietrolesci/robust_nli_is_sd
- pietrolesci/robust_nli_li_ts
- pietrolesci/gen_debiased_nli
- pietrolesci/add_one_rte
- tasksource/imppres
- hlgd
- paws
- medical_questions_pairs
- Anthropic/model-written-evals
- truthful_qa
- nightingal3/fig-qa
- tasksource/bigbench
- blimp
- cos_e
- cosmos_qa
- dream
- openbookqa
- qasc
- quartz
- quail
- head_qa
- sciq
- social_i_qa
- wiki_hop
- wiqa
- piqa
- hellaswag
- pkavumba/balanced-copa
- 12ml/e-CARE
- art
- winogrande
- codah
- ai2_arc
- definite_pronoun_resolution
- swag
- math_qa
- metaeval/utilitarianism
- mteb/amazon_counterfactual
- SetFit/insincere-questions
- SetFit/toxic_conversations
- turingbench/TuringBench
- trec
- tals/vitaminc
- hope_edi
- strombergnlp/rumoureval_2019
- ethos
- tweet_eval
- discovery
- pragmeval
- silicone
- lex_glue
- papluca/language-identification
- imdb
- rotten_tomatoes
- ag_news
- yelp_review_full
- financial_phrasebank
- poem_sentiment
- dbpedia_14
- amazon_polarity
- app_reviews
- hate_speech18
- sms_spam
- humicroedit
- snips_built_in_intents
- hate_speech_offensive
- yahoo_answers_topics
- pacovaldez/stackoverflow-questions
- zapsdcn/hyperpartisan_news
- zapsdcn/sciie
- zapsdcn/citation_intent
- go_emotions
- allenai/scicite
- liar
- relbert/lexical_relation_classification
- tasksource/linguisticprobing
- tasksource/crowdflower
- metaeval/ethics
- emo
- google_wellformed_query
- tweets_hate_speech_detection
- has_part
- blog_authorship_corpus
- launch/open_question_type
- health_fact
- commonsense_qa
- mc_taco
- ade_corpus_v2
- prajjwal1/discosense
- circa
- PiC/phrase_similarity
- copenlu/scientific-exaggeration-detection
- quarel
- mwong/fever-evidence-related
- numer_sense
- dynabench/dynasent
- raquiba/Sarcasm_News_Headline
- sem_eval_2010_task_8
- demo-org/auditor_review
- medmcqa
- RuyuanWan/Dynasent_Disagreement
- RuyuanWan/Politeness_Disagreement
- RuyuanWan/SBIC_Disagreement
- RuyuanWan/SChem_Disagreement
- RuyuanWan/Dilemmas_Disagreement
- lucasmccabe/logiqa
- wiki_qa
- tasksource/cycic_classification
- tasksource/cycic_multiplechoice
- tasksource/sts-companion
- tasksource/commonsense_qa_2.0
- tasksource/lingnli
- tasksource/monotonicity-entailment
- tasksource/arct
- tasksource/scinli
- tasksource/naturallogic
- onestop_qa
- demelin/moral_stories
- corypaik/prost
- aps/dynahate
- metaeval/syntactic-augmentation-nli
- tasksource/autotnli
- lasha-nlp/CONDAQA
- openai/webgpt_comparisons
- Dahoas/synthetic-instruct-gptj-pairwise
- metaeval/scruples
- metaeval/wouldyourather
- metaeval/defeasible-nli
- tasksource/help-nli
- metaeval/nli-veridicality-transitivity
- tasksource/lonli
- tasksource/dadc-limit-nli
- ColumbiaNLP/FLUTE
- tasksource/strategy-qa
- openai/summarize_from_feedback
- tasksource/folio
- yale-nlp/FOLIO
- tasksource/tomi-nli
- tasksource/avicenna
- stanfordnlp/SHP
- GBaker/MedQA-USMLE-4-options-hf
- sileod/wikimedqa
- declare-lab/cicero
- amydeng2000/CREAK
- tasksource/mutual
- inverse-scaling/NeQA
- inverse-scaling/quote-repetition
- inverse-scaling/redefine-math
- tasksource/puzzte
- tasksource/implicatures
- race
- tasksource/race-c
- tasksource/spartqa-yn
- tasksource/spartqa-mchoice
- tasksource/temporal-nli
- riddle_sense
- tasksource/clcd-english
- maximedb/twentyquestions
- metaeval/reclor
- tasksource/counterfactually-augmented-imdb
- tasksource/counterfactually-augmented-snli
- metaeval/cnli
- tasksource/boolq-natural-perturbations
- metaeval/acceptability-prediction
- metaeval/equate
- tasksource/ScienceQA_text_only
- Jiangjie/ekar_english
- tasksource/implicit-hate-stg1
- metaeval/chaos-mnli-ambiguity
- IlyaGusev/headline_cause
- tasksource/logiqa-2.0-nli
- tasksource/oasst2_dense_flat
- sileod/mindgames
- metaeval/ambient
- metaeval/path-naturalness-prediction
- civil_comments
- AndyChiang/cloth
- AndyChiang/dgen
- tasksource/I2D2
- webis/args_me
- webis/Touche23-ValueEval
- tasksource/starcon
- PolyAI/banking77
- tasksource/ConTRoL-nli
- tasksource/tracie
- tasksource/sherliic
- tasksource/sen-making
- tasksource/winowhy
- tasksource/robustLR
- CLUTRR/v1
- tasksource/logical-fallacy
- tasksource/parade
- tasksource/cladder
- tasksource/subjectivity
- tasksource/MOH
- tasksource/VUAC
- tasksource/TroFi
- sharc_modified
- tasksource/conceptrules_v2
- metaeval/disrpt
- tasksource/zero-shot-label-nli
- tasksource/com2sense
- tasksource/scone
- tasksource/winodict
- tasksource/fool-me-twice
- tasksource/monli
- tasksource/corr2cause
- lighteval/lsat_qa
- tasksource/apt
- zeroshot/twitter-financial-news-sentiment
- tasksource/icl-symbol-tuning-instruct
- tasksource/SpaceNLI
- sihaochen/propsegment
- HannahRoseKirk/HatemojiBuild
- tasksource/regset
- tasksource/esci
- lmsys/chatbot_arena_conversations
- neurae/dnd_style_intents
- hitachi-nlp/FLD.v2
- tasksource/SDOH-NLI
- allenai/scifact_entailment
- tasksource/feasibilityQA
- tasksource/simple_pair
- tasksource/AdjectiveScaleProbe-nli
- tasksource/resnli
- tasksource/SpaRTUN
- tasksource/ReSQ
- tasksource/semantic_fragments_nli
- MoritzLaurer/dataset_train_nli
- tasksource/stepgame
- tasksource/nlgraph
- tasksource/oasst2_pairwise_rlhf_reward
- tasksource/hh-rlhf
- tasksource/ruletaker
- qbao775/PARARULE-Plus
- tasksource/proofwriter
- tasksource/logical-entailment
- tasksource/nope
- tasksource/LogicNLI
- kiddothe2b/contract-nli
- AshtonIsNotHere/nli4ct_semeval2024
- tasksource/lsat-ar
- tasksource/lsat-rc
- AshtonIsNotHere/biosift-nli
- tasksource/brainteasers
- Anthropic/persuasion
- erbacher/AmbigNQ-clarifying-question
- tasksource/SIGA-nli
- unigram/FOL-nli
- tasksource/goal-step-wikihow
- GGLab/PARADISE
- tasksource/doc-nli
- tasksource/mctest-nli
- tasksource/patent-phrase-similarity
- tasksource/natural-language-satisfiability
- tasksource/idioms-nli
- tasksource/lifecycle-entailment
- nvidia/HelpSteer
- nvidia/HelpSteer2
- sadat2307/MSciNLI
- pushpdeep/UltraFeedback-paired
- tasksource/AES2-essay-scoring
- tasksource/english-grading
- tasksource/wice
- Dzeniks/hover
- sileod/missing-item-prediction
- tasksource/tasksource_dpo_pairs
language: en
library_name: transformers
license: apache-2.0
metrics:
- accuracy
pipeline_tag: zero-shot-classification
tags:
- deberta-v3-small
- deberta-v3
- deberta
- text-classification
- nli
- natural-language-inference
- multitask
- multi-task
- pipeline
- extreme-multi-task
- extreme-mtl
- tasksource
- zero-shot
- rlhf
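Model Card for DeBERTa-v3-small-tasksource-nli
DeBERTa-v3-small with a context length of 1680 tokens, fine-tuned on tasksource for 250k steps. Long-context NLI tasks (ConTRoL, doc-nli) were oversampled. The training data includes HelpSteer v1/v2, logical reasoning tasks (FOLIO, FOL-nli, LogicNLI, ...), OASST, hh/rlhf, linguistics-oriented NLI tasks, tasksource-dpo, and fact verification tasks.
This model is suitable for long-context NLI, or as a backbone for fine-tuning reward models or classifiers.
This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% accuracy on WNLI), and can be used for:
- Zero-shot entailment-based classification for arbitrary labels [ZS].
- Natural language inference [NLI].
- Further fine-tuning on a new task or a tasksource task (classification, token classification, or multiple-choice) [FT].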
| test_name | accuracy (%) |
|---|---|
| anli/a1 | 57.2 |
| anli/a2 | 46.1 |
| anli/a3 | 47.2 |
| nli_fever | 71.7 |
| FOLIO | 47.1 |
| ConTRoL-nli | 52.2 |
| cladder | 52.8 |
| zero-shot-label-nli | 70.0 |
| chatbot_arena_conversations | 67.8 |
| oasst2_pairwise_rlhf_reward | 75.6 |
| doc-nli | 75.0 |
For reference, zero-shot GPT-4 scores 61% on FOLIO (logical reasoning), 62% on cladder (probabilistic reasoning), and 56.4% on ConTRoL (long-context NLI).
[ZS] Zero-shot classification pipeline
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="tasksource/deberta-small-long-nli")
text = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(text, candidate_labels)
The NLI training data of this model includes label-nli, an NLI dataset specially constructed to improve this kind of zero-shot classification.
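To make the entailment-based mechanism concrete, here is a minimal sketch of what happens around the NLI model: each candidate label is turned into a hypothesis, and the per-label entailment scores are normalized into a ranking. The template string matches the default of the transformers zero-shot pipeline; the function names and the logits are illustrative assumptions, not part of this model or of tasksource.

```python
import math

def build_nli_pairs(text, candidate_labels, hypothesis_template="This example is {}."):
    """Turn one text and N labels into N (premise, hypothesis) pairs."""
    return [(text, hypothesis_template.format(label)) for label in candidate_labels]

def rank_labels(entailment_logits, candidate_labels):
    """Softmax over per-label entailment logits, highest probability first."""
    m = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(candidate_labels, probs), key=lambda p: p[1], reverse=True)

labels = ["travel", "cooking", "dancing"]
pairs = build_nli_pairs("one day I will see the world", labels)

# Made-up logits standing in for the NLI model's entailment scores.
ranking = rank_labels([3.0, -1.0, -0.5], labels)
```

In the real pipeline the logits come from running the model on each pair; a custom `hypothesis_template` can be passed to the classifier call when the default wording does not fit the labels.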
[NLI] Natural language inference pipeline
from transformers import pipeline
pipe = pipeline("text-classification", model="tasksource/deberta-small-long-nli")
pipe([dict(text='there is a cat',
  text_pair='there is a black cat')]) # list of (premise, hypothesis)
# [{'label': 'neutral', 'score': 0.9952911138534546}]
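When scoring many pairs at once, it can help to build the input dicts programmatically and filter the predictions afterwards. The sketch below assumes the output format shown above and assumes the entailment class is named `entailment`; the helper names and the example predictions are hypothetical.

```python
def to_pipeline_inputs(pairs):
    """Convert (premise, hypothesis) tuples to the dict format used above."""
    return [dict(text=premise, text_pair=hypothesis) for premise, hypothesis in pairs]

def keep_entailed(pairs, predictions, min_score=0.5):
    """Return the pairs predicted as 'entailment' with at least min_score."""
    return [
        pair
        for pair, pred in zip(pairs, predictions)
        if pred["label"] == "entailment" and pred["score"] >= min_score
    ]

pairs = [
    ("there is a black cat", "there is a cat"),
    ("there is a cat", "there is a black cat"),
]
inputs = to_pipeline_inputs(pairs)

# predictions = pipe(inputs)  # real call, requires the model to be downloaded
# Made-up predictions in the pipeline's output format, for illustration only.
predictions = [
    {"label": "entailment", "score": 0.98},
    {"label": "neutral", "score": 0.99},
]
entailed = keep_entailed(pairs, predictions)
```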
[FT] Tasknet: 3 lines fine-tuning
# !pip install tasknet
import tasknet as tn
hparams=dict(model_name='tasksource/deberta-small-long-nli', learning_rate=2e-5)
model, trainer = tn.Model_Trainer([tn.AutoTask("glue/rte")], hparams)
trainer.train()
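Software and training details
The model was trained on 600 tasks for 250k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 14 days on an Nvidia A30 24GB GPU. This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which was dropped 10% of the time so that the model can also be used without it. All multiple-choice tasks used the same classification layers. For classification tasks, models shared weights if their labels matched.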
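The head-sharing and embedding-dropout rules described above can be sketched in plain Python. This is not the tasknet implementation: the function names are hypothetical, the label lists are illustrative, and it assumes that "labels matched" means identical ordered label lists.

```python
import random

def head_key(task_type, label_names):
    """Key that decides which tasks share a classification head."""
    if task_type == "multiple_choice":
        return ("multiple_choice",)  # one head for every multiple-choice task
    # Assumption: heads are shared only when the ordered label lists are identical.
    return ("classification", tuple(label_names))

def assign_heads(tasks):
    """Map each task name to a head id; tasks with equal keys share an id."""
    head_ids = {}
    assignment = {}
    for name, (task_type, label_names) in tasks.items():
        key = head_key(task_type, label_names)
        head_ids.setdefault(key, len(head_ids))
        assignment[name] = head_ids[key]
    return assignment

def use_task_embedding(rng, drop_prob=0.1):
    """Return False when the task-specific CLS embedding is dropped for a step."""
    return rng.random() >= drop_prob

# Illustrative task definitions, not the actual training configuration.
tasks = {
    "glue/mnli": ("classification", ["entailment", "neutral", "contradiction"]),
    "anli/a1": ("classification", ["entailment", "neutral", "contradiction"]),
    "imdb": ("classification", ["neg", "pos"]),
    "hellaswag": ("multiple_choice", None),
    "piqa": ("multiple_choice", None),
}
assignment = assign_heads(tasks)

rng = random.Random(0)
kept = sum(use_task_embedding(rng) for _ in range(10_000))  # about 90% of steps
```

Dropping the task embedding during part of training is what lets the released checkpoint be used through the standard pipelines, which know nothing about task-specific embeddings.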
https://github.com/sileod/tasksource/
https://github.com/sileod/tasknet/
Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing
Citation
More details in this article:
@inproceedings{sileo-2024-tasksource,
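title = "tasksource: A Large Collection of {NLP} tasks with a Structured Dataset Preprocessing Framework",
author = "Sileo, Damien",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
month = may,
year = "2024",
address = "Torino, Italia",
publisher = "ELRA and ICCL",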
url = "https://aclanthology.org/2024.lrec-main.1361",
pages = "15655--15684",
}
Model Card Contact
damien.sileo@inria.fr


