Qwen2-7B-Instruct-embed-base open-source model - free deployment for generating high-quality text embeddings


Qwen2 7B Instruct Embed Base

Developed by ssmits

A 7B-parameter embedding model based on the Qwen2 series, designed to generate high-quality text embeddings

Text embedding

Safetensors

Language: English · License: Apache-2.0 · Tags: #LLM-embeddings #high-dimensional-semantic-encoding #multilingual-support

Downloads: 2,895

Release date: 6/7/2024

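Model overview

A pretrained language model based on the Transformer architecture. With its lm_head layer removed, it is dedicated to generating text embedding vectors and is suited to downstream tasks that require semantic representations.

Model features

Improved tokenizer

Adapts to many natural languages as well as code

Advanced attention mechanism

Uses grouped-query attention (GQA) to improve efficiency

Dedicated embedding model

The lm_head layer is removed to streamline embedding generation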
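Grouped-query attention lets several query heads share a single key/value head, so fewer key and value projections have to be computed and stored. The NumPy sketch below is a toy illustration of that idea, not Qwen2's implementation: it omits the causal mask, rotary position embeddings and the QKV bias, and the head counts and dimensions are arbitrary small values chosen for the example.

```python
import numpy as np

def grouped_query_attention(x, w_q, w_k, w_v, n_q_heads, n_kv_heads):
    """Toy grouped-query attention over one sequence.

    x: (seq, d_model); w_q: (d_model, n_q_heads * d_head);
    w_k, w_v: (d_model, n_kv_heads * d_head). No causal mask, RoPE, or bias.
    """
    seq = x.shape[0]
    d_head = w_q.shape[1] // n_q_heads
    group = n_q_heads // n_kv_heads  # number of query heads sharing one K/V head

    # Project and split into heads: (heads, seq, d_head)
    q = (x @ w_q).reshape(seq, n_q_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq, n_kv_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq, n_kv_heads, d_head).transpose(1, 0, 2)

    # Repeat each K/V head so query heads h*group .. h*group+group-1 all use K/V head h
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)     # (n_q_heads, seq, seq)
    scores = scores - scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    out = weights @ v                                        # (n_q_heads, seq, d_head)
    return out.transpose(1, 0, 2).reshape(seq, n_q_heads * d_head)

rng = np.random.default_rng(0)
d_model, d_head, n_q, n_kv, seq = 16, 4, 4, 2, 5
x = rng.standard_normal((seq, d_model))
w_q = rng.standard_normal((d_model, n_q * d_head))   # 4 query heads
w_k = rng.standard_normal((d_model, n_kv * d_head))  # only 2 key heads
w_v = rng.standard_normal((d_model, n_kv * d_head))  # only 2 value heads
out = grouped_query_attention(x, w_q, w_k, w_v, n_q, n_kv)
print(out.shape)  # (5, 16)
```

Here four query heads share two key/value heads, so the key and value projections are half the size they would be in standard multi-head attention.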

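Model capabilities

Text embedding generation

Semantic similarity computation

Contextual understanding

Use cases

Semantic search

Document retrieval

Precise document matching based on embedding similarity

Text classification

Sentiment analysis

Uses embedding vectors as input features for a classifier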
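Both use cases come down to comparing embedding vectors. A minimal NumPy sketch, assuming the embeddings have already been computed (for example with model.encode): retrieval ranks documents by cosine similarity to a query, and classification uses a nearest-centroid classifier, chosen here only because it needs nothing beyond NumPy (any classifier, such as logistic regression, could take its place). The tiny vectors are hypothetical stand-ins for the model's real 3584-dimensional outputs.

```python
import numpy as np

def normalize(rows):
    """L2-normalize each row so that dot products become cosine similarities."""
    return rows / np.linalg.norm(rows, axis=-1, keepdims=True)

# Document retrieval: rank documents by cosine similarity to a query
def rank_documents(query_emb, doc_embs, top_k=3):
    scores = normalize(doc_embs) @ normalize(query_emb)
    order = np.argsort(-scores)[:top_k]  # most similar first
    return [(int(i), float(scores[i])) for i in order]

# Classification: nearest-centroid classifier on embedding features
def fit_centroids(embs, labels):
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    centroids = np.stack([normalize(embs[labels == c]).mean(axis=0) for c in classes])
    return classes, centroids

def predict(embs, classes, centroids):
    best = np.argmax(normalize(embs) @ normalize(centroids).T, axis=1)
    return [classes[i] for i in best]

# Hypothetical toy vectors; real ones would come from model.encode(...)
docs = np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])
query = np.array([1.0, 0.0, 0.0])
print(rank_documents(query, docs, top_k=2))  # document 0 first, then document 2

train = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]])
train_labels = ["positive", "positive", "negative", "negative"]
classes, centroids = fit_centroids(train, train_labels)
print(predict(np.array([[0.8, 0.3], [0.0, 1.0]]), classes, centroids))
# ['positive', 'negative']
```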

🚀 Qwen2-7B-Instruct-embed-base

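Qwen2-7B-Instruct-embed-base is a pretrained model for generating text embeddings, which can then be used in downstream tasks such as text classification. It is based on the Transformer architecture and can be applied to a range of natural language processing tasks that need semantic vector representations.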

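🚀 Quick start

This section shows how to quickly run inference with the Qwen2-7B-Instruct-embed-base model.

✨ Key features

Qwen2 is a family of language models that includes decoder language models of several sizes. For each size, both a base language model and an aligned chat model are released.
Built on the Transformer architecture, using SwiGLU activation, attention QKV bias, grouped-query attention, and other techniques.
Has an improved tokenizer that adapts to many natural languages and code.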
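SwiGLU, one of the techniques listed above, replaces the plain two-layer feed-forward network with a gated one: the input goes through two parallel projections, one of which is passed through the SiLU (Swish) activation and multiplies the other elementwise, before a final down projection. A minimal NumPy sketch with toy dimensions and no biases; the weight names are illustrative, not the model's actual parameter names.

```python
import numpy as np

def silu(x):
    """SiLU / Swish activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def swiglu_mlp(x, w_gate, w_up, w_down):
    """Gated feed-forward block: (silu(x @ w_gate) * (x @ w_up)) @ w_down."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
x = rng.standard_normal((3, d_model))  # 3 token vectors
w_gate = rng.standard_normal((d_model, d_ff))
w_up = rng.standard_normal((d_model, d_ff))
w_down = rng.standard_normal((d_ff, d_model))
print(swiglu_mlp(x, w_gate, w_up, w_down).shape)  # (3, 8)
```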

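📦 Installation

The Qwen2 code has been merged into recent versions of the Hugging Face transformers library. We recommend installing transformers>=4.37.0; otherwise you may encounter the following error: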

KeyError: 'qwen2'
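To confirm the installed version before loading the model, here is a standard-library sketch. The helpers numeric_version and supports_qwen2 are hypothetical names for this example, and they compare only the leading major.minor.patch numbers, so pre-release tags such as rc1 or dev0 are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "4.37.0"

def numeric_version(v: str) -> tuple:
    """'4.41.2' -> (4, 41, 2); '4.37' -> (4, 37, 0); trailing tags are dropped."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(g or 0) for g in m.groups())

def supports_qwen2(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """True if the installed transformers version is at least the minimum."""
    return numeric_version(installed) >= numeric_version(minimum)

try:
    installed = version("transformers")
    status = "OK" if supports_qwen2(installed) else f"too old, need >= {MIN_TRANSFORMERS}"
    print(installed, status)
except PackageNotFoundError:
    print("transformers is not installed")
```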

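💻 Usage examples

Basic usage

The lm_head layer of this model has been removed, which means it can be used to generate embeddings. However, because the model has not been further fine-tuned for embedding tasks, its performance may not be optimal; see intfloat/e5-mistral-7b-instruct for an example of such fine-tuning.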
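The further fine-tuning mentioned above is commonly contrastive: a query embedding is pulled toward the embedding of its matching passage and pushed away from the others. As an assumption-labeled illustration (not necessarily the exact objective behind intfloat/e5-mistral-7b-instruct), here is an InfoNCE loss with in-batch negatives in NumPy; the temperature of 0.05 is an arbitrary example value and the vectors are random stand-ins.

```python
import numpy as np

def info_nce_loss(query_embs, doc_embs, temperature=0.05):
    """Contrastive loss: row i of doc_embs is the positive for query i,
    and every other row in the batch serves as a negative."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    logits = (q @ d.T) / temperature                 # (batch, batch) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))       # cross-entropy with targets on the diagonal

rng = np.random.default_rng(0)
queries = rng.standard_normal((4, 8))
aligned = queries + 0.01 * rng.standard_normal((4, 8))  # positives close to their queries
shuffled = aligned[::-1]                                 # positives paired with the wrong queries
print(info_nce_loss(queries, aligned) < info_nce_loss(queries, shuffled))  # True
```

Minimizing this loss over real (query, passage) pairs is what pushes matching texts together in embedding space; the in-batch design means no explicit negatives need to be mined for a first pass.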

from sentence_transformers import SentenceTransformer
import torch

# 1. Load a pretrained Sentence Transformer model
model = SentenceTransformer("ssmits/Qwen2-7B-Instruct-embed-base")  # pass device="cpu" if you have <= 24 GB of VRAM

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# 2. Calculate embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 3584)

# 3. Calculate the embedding similarities
# Assuming embeddings is a numpy array, convert it to a torch tensor
embeddings_tensor = torch.tensor(embeddings)

# Using torch to compute cosine similarity matrix
similarities = torch.nn.functional.cosine_similarity(embeddings_tensor.unsqueeze(0), embeddings_tensor.unsqueeze(1), dim=2)

print(similarities)
# tensor([[1.0000, 0.8608, 0.6609],
#         [0.8608, 1.0000, 0.7046],
#         [0.6609, 0.7046, 1.0000]])
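The unsqueeze(0)/unsqueeze(1) call above relies on broadcasting to compare every pair of embeddings. An equivalent formulation is to L2-normalize the rows once and take a single matrix product; the NumPy sketch below checks that the two agree on random stand-in vectors (not real model outputs). Recent sentence-transformers releases (v3 and later) also offer model.similarity(embeddings, embeddings) for the same purpose.

```python
import numpy as np

def cosine_similarity_matrix(embs):
    """Pairwise cosine similarities: normalize each row, then take all dot products."""
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    return normed @ normed.T

def cosine_similarity_broadcast(embs):
    """Same result via broadcasting, mirroring the unsqueeze(0)/unsqueeze(1) pattern."""
    a, b = embs[None, :, :], embs[:, None, :]  # shapes (1, n, d) and (n, 1, d)
    dots = (a * b).sum(axis=2)                 # (n, n)
    norms = np.linalg.norm(a, axis=2) * np.linalg.norm(b, axis=2)
    return dots / norms

rng = np.random.default_rng(0)
embs = rng.standard_normal((3, 3584))  # random stand-ins for model.encode(...) output
sim = cosine_similarity_matrix(embs)
print(np.allclose(sim, cosine_similarity_broadcast(embs)))  # True
```

The normalize-then-matmul form avoids materializing the (n, n, d) intermediate that broadcasting creates, which matters once n grows.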

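⚠️ Important note

In testing, inference with this model used more than 24 GB of VRAM (the capacity of an RTX 4090), so an A100 or A6000 is recommended for inference.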
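A back-of-the-envelope estimate is consistent with this observation. Assuming roughly 7 billion parameters and float32 weights (many transformers and sentence-transformers versions load weights in float32 unless a dtype is requested), the weights alone exceed 24 GB before any activations are counted. Loading in half precision, for example by passing torch_dtype=torch.bfloat16 to from_pretrained, would roughly halve the weight memory, though the results above were not reported for that setting.

```python
# Memory for the raw weights only; activations, CUDA context and framework
# overhead come on top of this.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

NUM_PARAMS = 7e9  # assumption: roughly 7 billion parameters after removing lm_head
for dtype in ("float32", "bfloat16"):
    print(dtype, weight_memory_gb(NUM_PARAMS, dtype), "GB")
# float32 28.0 GB
# bfloat16 14.0 GB
```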

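Advanced usage

Without the sentence-transformers library, you can use the model as follows: first pass your input through the Transformer model, then apply the appropriate pooling operation on top of the contextualized token embeddings.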

from transformers import AutoTokenizer, AutoModel
import torch

# Mean pooling: take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from the Hugging Face Hub. It loads on CPU by default; with more than 24 GB of VRAM,
# you can call model.to("cuda") and move encoded_input to the same device.
tokenizer = AutoTokenizer.from_pretrained('ssmits/Qwen2-7B-Instruct-embed-base')
model = AutoModel.from_pretrained('ssmits/Qwen2-7B-Instruct-embed-base')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
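To make the masked averaging in mean_pooling concrete: positions where the attention mask is 0 (padding) must not contribute to the average. Below is a small NumPy re-expression of the same arithmetic on hand-made numbers, so it runs without downloading the model; mean_pooling_np is a hypothetical name for this illustration.

```python
import numpy as np

def mean_pooling_np(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring positions where the mask is 0.
    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len)."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

# One sequence with 3 real tokens, one with 2 real tokens plus 1 padding token
tokens = np.array([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 0.0], [4.0, 2.0], [100.0, 100.0]],  # last row is padding
])
mask = np.array([[1, 1, 1], [1, 1, 0]])
print(mean_pooling_np(tokens, mask))
# [[3. 4.]
#  [3. 1.]]
```

Without the mask, the padded row's [100, 100] would dominate the second sequence's embedding even though it carries no content.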

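Enabling multiple GPUs

To spread the model across several GPUs, one option is to let transformers place the layers automatically with device_map="auto" (this requires the accelerate package to be installed):

from transformers import AutoModel

# Distribute the layers over all visible GPUs (spilling over to CPU if needed)
model = AutoModel.from_pretrained(
    "ssmits/Qwen2-7B-Instruct-embed-base",
    device_map="auto",
)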