---
base_model: allenai/longformer-base-4096
datasets:
- sentence-transformers/all-nli
- sentence-transformers/stsb
- sentence-transformers/quora-duplicates
- sentence-transformers/natural-questions
language:
- en
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:65749
- loss:MultipleNegativesRankingLoss
- loss:SoftmaxLoss
- loss:CoSENTLoss
widget:
- source_sentence: A construction worker stands on a crane, installing a huge arm on top of a statue under construction.
  sentences:
- source_sentence: A boy in red slides down an inflatable slide.
  sentences:
  - A man holding an electric drill stands next to a girl holding a vacuum cleaner hose
  - A boy is playing on an inflatable ride
  - A boy pierced the inflatable ride with a knife
- source_sentence: An animal is chewing on something.
  sentences:
  - A dog on a red leash chases a tennis ball across the grass
  - A man is eating something
  - An animal is chewing on a keychain
- source_sentence: What are some good books or references for getting started with machine learning?
  sentences:
  - What caused the decline of the British Empire?
  - How should I go about learning machine learning?
  - As the universe expands, can an infinite amount of dark energy, vacuum energy, or gravitational energy be created?
- source_sentence: How do I attract girls?
  sentences:
  - How can I attract girls?
  - Why won't my iPhone 5 charge?
  - What would the world look like in 2016 if Hitler's Germany had won the war?
---

# SentenceTransformer based on allenai/longformer-base-4096

This is a sentence-transformers model finetuned from allenai/longformer-base-4096 on the all-nli-pair, all-nli-pair-class, all-nli-pair-score, all-nli-triplet, stsb, quora, and natural-questions datasets. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description

### Model Sources

### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 4098, 'do_lower_case': False}) with Transformer model: LongformerModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
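The Pooling block above uses mean pooling (`pooling_mode_mean_tokens`). As a rough, unofficial sketch of what that stage computes, here is the equivalent mean pooling written against the plain transformers API; this assumes the checkpoint loads with `AutoModel` and is not the supported usage path (see Usage below):

```python
import torch
from transformers import AutoTokenizer, AutoModel

name = "Leo1212/longformer-base-4096-sentence-transformers-all-nli-stsb-quora-nq"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

encoded = tokenizer(["An animal is chewing on something."],
                    padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling: average the token vectors, masking out padding positions.
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```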
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import SentenceTransformer

# Download from the Hugging Face Hub
model = SentenceTransformer("Leo1212/longformer-base-4096-sentence-transformers-all-nli-stsb-quora-nq")

# Run inference
sentences = [
    "How do I attract girls?",
    "How can I attract girls?",
    "Why won't my iPhone 5 charge?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

# Get the similarity scores between all pairs of sentences
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # torch.Size([3, 3])
```
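The description above also lists semantic search as a use case. Below is a small sketch built on the library's `util.semantic_search` helper; the corpus and query strings are made up for illustration:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Leo1212/longformer-base-4096-sentence-transformers-all-nli-stsb-quora-nq")

# Illustrative corpus and query; any collection of documents works the same way.
corpus = [
    "A boy is playing on an inflatable ride.",
    "A man is eating something.",
    "An animal is chewing on a keychain.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("An animal is chewing on something.", convert_to_tensor=True)

# Returns, for each query, the top_k corpus entries ranked by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], hit["score"])
```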
## Training Details

### Training Datasets

#### all-nli-pair

#### all-nli-pair-class

#### all-nli-pair-score

#### all-nli-triplet

#### stsb

- Dataset: stsb
- Size: 5,749 training samples
- Columns: `sentence1`, `sentence2`, and `score`
- Loss: `CoSENTLoss`

#### quora

#### natural-questions
### Evaluation Datasets

#### all-nli-triplet

#### stsb

- Dataset: stsb
- Size: 1,500 evaluation samples
- Columns: `sentence1`, `sentence2`, and `score`
- Loss: `CoSENTLoss`

#### quora

#### natural-questions
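Per-dataset details beyond stsb were trimmed above, but the loss tags in the metadata (MultipleNegativesRankingLoss, SoftmaxLoss, CoSENTLoss) indicate a multi-dataset setup in which each dataset is paired with its own loss. The sketch below shows that pattern with `SentenceTransformerTrainer` for two of the datasets; apart from stsb → CoSENTLoss, which the card states, the dataset-to-loss mapping and the `quora` subset name are assumptions:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CoSENTLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("allenai/longformer-base-4096")

# Each entry in the dataset dict is trained with the loss registered under the same key.
train_datasets = {
    "stsb": load_dataset("sentence-transformers/stsb", split="train"),
    "quora": load_dataset("sentence-transformers/quora-duplicates", "pair", split="train"),  # assumed subset
}
losses = {
    "stsb": CoSENTLoss(model),                     # stated by the card
    "quora": MultipleNegativesRankingLoss(model),  # assumed mapping
}

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_datasets,
    loss=losses,
)
trainer.train()
```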
### Training Hyperparameters

#### Non-Default Hyperparameters

- `overwrite_output_dir`: True
- `eval_strategy`: steps
- `num_train_epochs`: 5
- `load_best_model_at_end`: True
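A hedged sketch of how these values would be passed through `SentenceTransformerTrainingArguments`; the output path and the eval/save cadence (inferred from the 1000-step evaluation rhythm in the logs below) are assumptions:

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/longformer-all-nli-stsb-quora-nq",  # illustrative path
    overwrite_output_dir=True,
    eval_strategy="steps",
    num_train_epochs=5,
    load_best_model_at_end=True,
    eval_steps=1000,  # assumption inferred from the training-log cadence
    save_steps=1000,  # load_best_model_at_end needs save/eval cadences to line up
)
```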
### Training Logs

| Epoch  | Step | Training Loss | all-nli-triplet loss | stsb loss | natural-questions loss | quora loss |
|:------:|:----:|:-------------:|:--------------------:|:---------:|:----------------------:|:----------:|
| 0.0487 | 200  | 2.0928        | -                    | -         | -                      | -          |
| 0.0973 | 400  | 2.2013        | -                    | -         | -                      | -          |
| 0.1460 | 600  | 1.7404        | -                    | -         | -                      | -          |
| 0.1946 | 800  | 1.9134        | -                    | -         | -                      | -          |
| 0.2433 | 1000 | 2.043         | 0.5161               | 6.2815    | 0.1172                 | 0.0192     |
| 0.2920 | 1200 | 1.8817        | -                    | -         | -                      | -          |
| 0.3406 | 1400 | 1.7734        | -                    | -         | -                      | -          |
| 0.3893 | 1600 | 1.5935        | -                    | -         | -                      | -          |
| 0.4380 | 1800 | 1.6762        | -                    | -         | -                      | -          |
| 0.4866 | 2000 | 1.7031        | 0.4555               | 6.3907    | 0.0726                 | 0.0198     |
| 0.5353 | 2200 | 1.8561        | -                    | -         | -                      | -          |
| 0.5839 | 2400 | 1.6742        | -                    | -         | -                      | -          |
| 0.6326 | 2600 | 1.456         | -                    | -         | -                      | -          |
| 0.6813 | 2800 | 1.6122        | -                    | -         | -                      | -          |
| 0.7299 | 3000 | 1.8851        | 0.4975               | 6.1758    | 0.0841                 | 0.0208     |
| 0.7786 | 3200 | 1.5684        | -                    | -         | -                      | -          |
| 0.8273 | 3400 | 1.6535        | -                    | -         | -                      | -          |
| 0.8759 | 3600 | 1.5043        | -                    | -         | -                      | -          |
| 0.9246 | 3800 | 1.4768        | -                    | -         | -                      | -          |
| 0.9732 | 4000 | 1.686         | 0.4912               | 6.1600    | 0.0795                 | 0.0170     |
### Framework Versions
- Python: 3.11.9
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.3.1+cu121
- Accelerate: 1.0.0
- Datasets: 3.0.1
- Tokenizers: 0.20.0
## Citation

### BibTeX

#### Sentence Transformers and SoftmaxLoss

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

#### CoSENTLoss

```bibtex
@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
```