🚀 ReasonIR-8B Model
ReasonIR-8B is the first retriever trained specifically for general reasoning tasks, achieving state-of-the-art retrieval performance on BRIGHT (reasoning-intensive retrieval). When used for retrieval-augmented generation (RAG), ReasonIR-8B also brings significant gains on MMLU and GPQA.
🚀 Quick Start
Before using the model, make sure `transformers>=4.47.0` is installed.
✨ Key Features
- ReasonIR-8B is the first retriever trained specifically for general reasoning tasks, achieving state-of-the-art retrieval performance on the reasoning-intensive BRIGHT benchmark.
- In retrieval-augmented generation (RAG), it brings significant performance gains on MMLU and GPQA.
📦 Installation
Make sure `transformers>=4.47.0` is installed before use.
💻 Usage Examples
Basic Usage
Using the Transformers library:
```python
from transformers import AutoModel

model = AutoModel.from_pretrained("reasonir/ReasonIR-8B", torch_dtype="auto", trust_remote_code=True)
model = model.to("cuda")
model.eval()

query = "The quick brown fox jumps over the lazy dog."
document = "The quick brown fox jumps over the lazy dog."
query_instruction = ""
doc_instruction = ""
query_emb = model.encode(query, instruction=query_instruction)
doc_emb = model.encode(document, instruction=doc_instruction)
sim = query_emb @ doc_emb.T
```
When using `AutoModel`, note:
- Include `trust_remote_code=True` so that the custom bidirectional encoder architecture is used.
- Use `torch_dtype="auto"` to activate bf16 (by default torch uses fp32).
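The final line of the example above, `query_emb @ doc_emb.T`, is just a matrix product of the embeddings, which (assuming they are L2-normalized, as is typical for retriever outputs used this way) yields cosine similarity scores. A toy numpy sketch of how such scores rank documents, with stand-in vectors instead of real ReasonIR embeddings:

```python
import numpy as np

# Stand-ins for L2-normalized query/document embeddings (an assumption;
# real embeddings would come from model.encode).
query_emb = np.array([[0.6, 0.8]])            # shape (1, dim)
doc_embs = np.array([[0.6, 0.8],              # identical direction
                     [0.8, 0.6],              # similar
                     [-0.6, -0.8]])           # opposite direction

sims = query_emb @ doc_embs.T                 # same operation as `query_emb @ doc_emb.T`
ranking = np.argsort(-sims[0])                # highest-scoring document first

# sims[0] == [1.0, 0.96, -1.0]; ranking == [0, 1, 2]
```

Higher scores mean closer embedding directions, so sorting by score descending gives the retrieval ranking.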
Using the Sentence Transformers library:
```python
from sentence_transformers import SentenceTransformer

model_kwargs = {"torch_dtype": "auto"}
model = SentenceTransformer("reasonir/ReasonIR-8B", trust_remote_code=True, model_kwargs=model_kwargs)

query = "The quick brown fox jumps over the lazy dog."
document = "The quick brown fox jumps over the lazy dog."
query_instruction = ""
doc_instruction = ""
query_emb = model.encode(query, prompt=query_instruction)
doc_emb = model.encode(document, prompt=doc_instruction)
sim = model.similarity(query_emb, doc_emb)
```
As above, remember to include `trust_remote_code=True` and `torch_dtype="auto"`.
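`model.similarity` in Sentence Transformers defaults to cosine similarity between the two sets of embeddings. A minimal numpy sketch of that computation, using toy vectors rather than real ReasonIR embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # a: (n, d), b: (m, d) -> (n, m) matrix of cosine scores
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

q = np.array([[1.0, 2.0, 3.0]])                   # toy "query embedding"
d = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])  # toy "document embeddings"
scores = cosine_similarity(q, d)
# the identical vector scores exactly 1.0; the permuted one scores 10/14
```

Because both inputs are normalized first, the scores are invariant to embedding magnitude and depend only on direction.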
Advanced Usage
When invoking the model via `SentenceTransformer`, there may be some very subtle floating-point differences due to the way the model is cast to `bfloat16`, but they generally do not affect results.
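To see why casting to a lower-precision dtype can shift values slightly, here is a minimal numpy sketch using `float16` as a stand-in (numpy has no native bfloat16; the rounding effect is analogous):

```python
import numpy as np

x = 0.1
x16 = np.float16(x)           # cast down to half precision
diff = abs(float(x16) - x)    # tiny rounding error introduced by the cast

print(float(x16))             # 0.0999755859375
print(0 < diff < 1e-3)        # True: the discrepancy is tiny
```

Each cast rounds to the nearest representable value, so two code paths that cast at different points can produce embeddings that differ in the last few bits.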
📚 Documentation
- Paper: https://arxiv.org/abs/2504.20595
- Code: https://github.com/facebookresearch/ReasonIR
- Data: https://huggingface.co/datasets/reasonir/reasonir-data
📄 License
This model is released under the CC BY-NC 4.0 license.
🔧 Technical Details

| Attribute | Details |
| --- | --- |
| Base model | meta-llama/Llama-3.1-8B |
| Language | English |
| License | CC BY-NC 4.0 |
| Task type | Feature extraction |
| Library | transformers |
| Tags | sentence-transformers |
📚 Citation
```bibtex
@article{shao2025reasonir,
  title={ReasonIR: Training Retrievers for Reasoning Tasks},
  author={Rulin Shao and Rui Qiao and Varsha Kishore and Niklas Muennighoff and Xi Victoria Lin and Daniela Rus and Bryan Kian Hsiang Low and Sewon Min and Wen-tau Yih and Pang Wei Koh and Luke Zettlemoyer},
  year={2025},
  journal={arXiv preprint arXiv:2504.20595},
  url={https://arxiv.org/abs/2504.20595},
}
```