---
license: mit
datasets:
  - hotchpotch/JQaRA
  - shunk031/JGLUE
  - miracl/miracl
  - castorini/mr-tydi
  - unicamp-dl/mmarco
language:
  - ja
library_name: sentence-transformers
pipeline_tag: text-ranking
new_version: hotchpotch/japanese-reranker-xsmall-v2
---
# hotchpotch/japanese-reranker-cross-encoder-xsmall-v1

This is a series of rerankers (cross-encoders) trained on Japanese data.

For the technical report and evaluation of these rerankers, please refer to the following:

## Usage

### SentenceTransformers
```python
from sentence_transformers import CrossEncoder
import torch

MODEL_NAME = "hotchpotch/japanese-reranker-cross-encoder-xsmall-v1"

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CrossEncoder(MODEL_NAME, max_length=512, device=device)
if device == "cuda":
    # Use fp16 on GPU for faster inference
    model.model.half()

query = "About emotionally moving movies"
passages = [
    "A masterpiece that moves the audience despite its heavy theme. The characters' inner lives are portrayed superbly, and the ending brings tears to your eyes.",
    "The message it conveys is important, but the story is so dark that it leaves you feeling down. A few brighter moments would have helped.",
    "The plot somehow lacks realism; I would have liked to see a human drama with more depth.",
    "The action scenes are spectacular and a thrill to watch. The plot is simple, but that actually works in its favor.",
]

# Score each (query, passage) pair with the cross-encoder
scores = model.predict([(query, passage) for passage in passages])
```
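For illustration, here is a minimal sketch of how the returned scores might be used to reorder the passages; it assumes the `query`, `passages`, and `scores` variables from the snippet above:

```python
# Pair each passage with its score and sort by relevance, highest first
ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
for passage, score in ranked:
    print(f"{score:.4f}\t{passage}")
```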
### HuggingFace Transformers
```python
import torch
from torch.nn import Sigmoid
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "hotchpotch/japanese-reranker-cross-encoder-xsmall-v1"

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.to(device)
model.eval()
if device == "cuda":
    # Use fp16 on GPU for faster inference
    model.half()

query = "About emotionally moving movies"
passages = [
    "A masterpiece that moves the audience despite its heavy theme. The characters' inner lives are portrayed superbly, and the ending brings tears to your eyes.",
    "The message it conveys is important, but the story is so dark that it leaves you feeling down. A few brighter moments would have helped.",
    "The plot somehow lacks realism; I would have liked to see a human drama with more depth.",
    "The action scenes are spectacular and a thrill to watch. The plot is simple, but that actually works in its favor.",
]

# Tokenize the (query, passage) pairs as cross-encoder inputs
inputs = tokenizer(
    [(query, passage) for passage in passages],
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Run the model and map logits to [0, 1] relevance scores
with torch.no_grad():
    logits = model(**inputs).logits
activation = Sigmoid()
scores = activation(logits).squeeze().tolist()
```
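As a usage sketch, the scoring logic above can be wrapped in a small helper; the `rerank` function below is illustrative and not part of the model's API (it assumes the `tokenizer`, `model`, and `device` objects defined above):

```python
def rerank(query: str, passages: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Score (query, passage) pairs and return the top_k passages, best first."""
    inputs = tokenizer(
        [(query, passage) for passage in passages],
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        scores = Sigmoid()(model(**inputs).logits).squeeze(-1).tolist()
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)[:top_k]

print(rerank(query, passages))
```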
## Evaluation results
## License

MIT License