language:
- he
pipeline_tag: zero-shot-classification
datasets:
- HeTree/MevakerConcTree
license: apache-2.0
widget:
- text: "Last week I upgraded my phone's OS version."
  candidate_labels: "calling device, bill payment, football mania"
  multi_class: false
  example_title: "Device upgrade"
Hebrew Cross-Encoder Model
Usage
from sentence_transformers import CrossEncoder
model = CrossEncoder('HeTree/HeCross')
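scores = model.predict([('How many people live in Berlin?', 'Berlin has 3,520,031 registered inhabitants in an area of 891.82 square kilometers.'),
                        ('How many people live in Berlin?', 'New York City is famous for the Metropolitan Museum of Art.')])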
print(scores)
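A common next step is to rank several candidate passages for one query by their cross-encoder scores. The sketch below is a minimal illustration, not part of the HeCross API: `rank_passages` and `stub_predict` are hypothetical names, and the stub returns made-up scores so the example runs without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple


def rank_passages(
    query: str,
    passages: Sequence[str],
    predict: Callable[[List[Tuple[str, str]]], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return the passages best-first."""
    pairs = [(query, passage) for passage in passages]
    scores = [float(s) for s in predict(pairs)]
    return sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)


def stub_predict(pairs: List[Tuple[str, str]]) -> List[float]:
    # Hypothetical stand-in for model.predict; these scores are invented.
    return [0.9 if "inhabitants" in passage else 0.1 for _, passage in pairs]


ranked = rank_passages(
    "How many people live in Berlin?",
    [
        "New York City is famous for the Metropolitan Museum of Art.",
        "Berlin has 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    ],
    stub_predict,
)
for passage, score in ranked:
    print(f"{score:.2f}  {passage}")
```

With the real model loaded, passing `model.predict` in place of `stub_predict` should work the same way, since it accepts a list of text pairs and returns one score per pair.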
Usage with Transformers AutoModel
You can also use the model directly with the Transformers library (without SentenceTransformers):
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
model = AutoModelForSequenceClassification.from_pretrained('HeTree/HeCross')
tokenizer = AutoTokenizer.from_pretrained('HeTree/HeCross')
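features = tokenizer(['How many people live in Berlin?', 'How many people live in Berlin?'],
                     ['Berlin has 3,520,031 registered inhabitants in an area of 891.82 square kilometers.', 'New York City is famous for the Metropolitan Museum of Art.'],
                     padding=True, truncation=True, return_tensors="pt")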
model.eval()
with torch.no_grad():
    scores = sigmoid(model(**features).logits)
print(scores)
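The sigmoid in the snippet above maps each raw logit to a score between 0 and 1. As a small worked illustration in plain Python (the function name `stable_sigmoid` and the sample logits are assumptions for this sketch, not values produced by HeCross), the version below also avoids overflow for logits of large magnitude:

```python
import math


def stable_sigmoid(x: float) -> float:
    """Map a raw logit to (0, 1) without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) stays small, so this form cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


# Made-up logits for illustration only.
for logit in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(logit, stable_sigmoid(logit))
```

A logit of 0 corresponds to a score of exactly 0.5, and logits that are equal in magnitude but opposite in sign give scores that sum to 1.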
Zero-Shot Classification
This model can also be used for zero-shot classification:
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model='HeTree/HeCross')
sent = "Last week I upgraded my phone's OS version."
candidate_labels = ["calling device", "website", "bill payment", "bank account access"]
res = classifier(sent, candidate_labels)
print(res)
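The zero-shot pipeline returns a dictionary with the keys `sequence`, `labels`, and `scores`, where `labels` and `scores` are aligned lists. The sketch below shows one way to pick the top label and fall back to no label when the model is not confident. The helper `top_label`, the 0.5 threshold, and the scores in `example_result` are all assumptions for illustration, not output from HeCross.

```python
from typing import Any, Dict, Optional


def top_label(result: Dict[str, Any], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if it scores below min_score."""
    labels, scores = result["labels"], result["scores"]
    if not labels:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best] if scores[best] >= min_score else None


# Hand-written result in the pipeline's output shape; the scores are invented.
example_result = {
    "sequence": "Last week I upgraded my phone's OS version.",
    "labels": ["calling device", "website", "bill payment", "bank account access"],
    "scores": [0.70, 0.15, 0.10, 0.05],
}

print(top_label(example_result))
print(top_label(example_result, min_score=0.8))
```

With the real pipeline, `top_label(res)` can be applied to the `res` dictionary from the snippet above.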
Citation
If you use HeCross in your research, please cite Mevaker: Conclusion Extraction and Allocation Resources for the Hebrew Language.
@article{shalumov2024mevaker,
title={Mevaker: Conclusion Extraction and Allocation Resources for the Hebrew Language},
author={Vitaly Shalumov and Harel Haskey and Yuval Solaz},
year={2024},
eprint={2403.09719},
archivePrefix={arXiv},
primaryClass={cs.CL}
}