Drug_Ollama_v3-2: an open-source large language model for drug-domain text generation, free to deploy

Drug Ollama V3 2

Developed by Ketak-ZoomRx

This large language model was trained from open_llama_3b with H2O LLM Studio and focuses on text generation for drug-related topics.

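Large Language Model

Transformers

English #Medical-QA #Low-parameter-count #Precision-medicine

Downloads: 99

Released: 11/17/2023

Model Overview

This is a large language model based on the Llama architecture. It was trained specifically for the drug domain and generates drug-related text.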

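Model Features

Drug-domain optimization

Trained and optimized specifically for drug-related topics.

Efficient inference

Supports quantized loading (4-bit/8-bit) and sharded inference across multiple GPUs.

Controllable generation

Exposes generation parameters such as temperature and repetition_penalty to control the output.

Model Capabilities

Drug-related text generation

Question answering

Text completion

Use Cases

Healthcare

Drug information Q&A

Answers specialist questions about drug effects, side effects, and similar topics.

Medical report generation

Assists in drafting medical report text.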

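🚀 Model Card

This model was trained with H2O LLM Studio by fine-tuning a pretrained base model. It can be used for natural language processing tasks and generates text in response to an input prompt.

🚀 Quick Start

The model can be used with the transformers library. The preparation steps and usage examples follow.

📦 Installation

To use the model with the transformers library on a machine with a GPU, first install transformers, einops, accelerate, and torch. The following commands install the pinned versions: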

pip install transformers==4.29.2
pip install einops==0.6.1
pip install accelerate==0.19.0
pip install torch==2.0.0

💻 Usage Examples

Basic usage

The following code calls the model through a pipeline to generate text:

import torch
from transformers import pipeline

generate_text = pipeline(
    model="Ketak-ZoomRx/Drug_Ollama_v3-2",
    torch_dtype="auto",
    trust_remote_code=True,
    use_fast=True,
    device_map={"": "cuda:0"},
)

res = generate_text(
    "Why is drinking water so healthy?",
    min_new_tokens=2,
    max_new_tokens=256,
    do_sample=False,
    num_beams=1,
    temperature=float(0.0),
    repetition_penalty=float(1.2),
    renormalize_logits=True
)
print(res[0]["generated_text"])

You can print a sample prompt after the preprocessing step to see how it is fed to the tokenizer:

print(generate_text.preprocess("Why is drinking water so healthy?")["prompt_text"])

The output is:

<|prompt|>Why is drinking water so healthy?</s><|answer|>
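
The printed prompt suggests a simple template: the question is wrapped between the <|prompt|> and <|answer|> markers, with an end-of-sequence token </s> after the question. The sketch below reproduces that string by hand, which can be useful when calling the model without the pipeline. The template is inferred only from the single example above, and the names PROMPT_TEMPLATE and build_prompt are hypothetical helpers, not part of the model repository.

```python
# Template inferred from the example output above (an assumption, not an
# official specification of the training format).
PROMPT_TEMPLATE = "<|prompt|>{question}</s><|answer|>"


def build_prompt(question: str) -> str:
    """Wrap a raw question in the prompt/answer markers the model expects."""
    return PROMPT_TEMPLATE.format(question=question)


if __name__ == "__main__":
    print(build_prompt("Why is drinking water so healthy?"))
```

If the model was trained with a different template, the experiment logs or cfg.yaml in the repository are the place to confirm it.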

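Advanced usage

Alternatively, you can download h2oai_pipeline.py, place it in the same directory as your notebook, and build the pipeline yourself from the loaded model and tokenizer. If the model and tokenizer are fully supported in the transformers package, you can set trust_remote_code to False: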

import torch
from h2oai_pipeline import H2OTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Ketak-ZoomRx/Drug_Ollama_v3-2",
    use_fast=True,
    padding_side="left",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "Ketak-ZoomRx/Drug_Ollama_v3-2",
    torch_dtype="auto",
    device_map={"": "cuda:0"},
    trust_remote_code=True,
)
generate_text = H2OTextGenerationPipeline(model=model, tokenizer=tokenizer)

res = generate_text(
    "Why is drinking water so healthy?",
    min_new_tokens=2,
    max_new_tokens=256,
    do_sample=False,
    num_beams=1,
    temperature=float(0.0),
    repetition_penalty=float(1.2),
    renormalize_logits=True
)
print(res[0]["generated_text"])

You can also call the loaded model and tokenizer directly and handle the preprocessing steps yourself:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Ketak-ZoomRx/Drug_Ollama_v3-2"  # either local folder or huggingface model name
# Important: The prompt needs to be in the same format the model was trained with.
# You can find an example prompt in the experiment logs.
prompt = "<|prompt|>How are you?</s><|answer|>"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    use_fast=True,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map={"": "cuda:0"},
    trust_remote_code=True,
)
model.eval()  # device_map above already places the whole model on cuda:0
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")

# generate configuration can be modified to your needs
tokens = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    min_new_tokens=2,
    max_new_tokens=256,
    do_sample=False,
    num_beams=1,
    temperature=float(0.0),
    repetition_penalty=float(1.2),
    renormalize_logits=True
)[0]

tokens = tokens[inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(tokens, skip_special_tokens=True)
print(answer)
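
All of the generation calls above pass repetition_penalty=1.2. As a rough illustration of what that setting does, the sketch below applies the commonly used rule to a plain list of logits: for every token id that has already appeared, a positive logit is divided by the penalty and a negative logit is multiplied by it, so the token becomes less likely either way. This is a simplified standalone version written for this page; the function name apply_repetition_penalty is hypothetical, and the actual logic inside transformers operates on tensors and may differ in detail.

```python
from typing import List, Sequence


def apply_repetition_penalty(
    logits: Sequence[float],
    seen_token_ids: Sequence[int],
    penalty: float = 1.2,
) -> List[float]:
    """Return a copy of `logits` with already-seen tokens made less likely."""
    if penalty <= 0:
        raise ValueError("penalty must be strictly positive")
    adjusted = list(logits)
    # Use a set so a token that appeared several times is penalized once.
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative score by a
        # penalty > 1 both push the score down.
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


if __name__ == "__main__":
    # Tokens 0 and 1 were already generated; token 2 is left untouched.
    print(apply_repetition_penalty([2.4, -1.0, 0.5], seen_token_ids=[0, 1]))
```

A penalty of 1.0 leaves the logits unchanged, and larger values discourage repetition more strongly.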

🔧 Technical Details

Quantization and sharding

You can load the model quantized by passing load_in_8bit=True or load_in_4bit=True. You can also shard it across multiple GPUs by setting device_map="auto".
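
As a minimal sketch of how these options combine, the helper below assembles the keyword arguments for AutoModelForCausalLM.from_pretrained. The function names build_load_kwargs and load_model are hypothetical. Two assumptions to check against your environment: quantized loading needs the bitsandbytes package installed, and to my knowledge load_in_4bit requires a newer transformers release than the 4.29.2 pinned in the installation section. Recent transformers versions also prefer passing a BitsAndBytesConfig instead of these flags.

```python
from typing import Any, Dict, Optional

MODEL_NAME = "Ketak-ZoomRx/Drug_Ollama_v3-2"


def build_load_kwargs(
    quantization: Optional[str] = None, shard: bool = False
) -> Dict[str, Any]:
    """Build from_pretrained keyword arguments for the chosen loading mode."""
    if quantization not in (None, "8bit", "4bit"):
        raise ValueError("quantization must be None, '8bit', or '4bit'")
    kwargs: Dict[str, Any] = {
        "torch_dtype": "auto",
        "trust_remote_code": True,
        # "auto" lets accelerate spread layers over all visible GPUs;
        # otherwise keep the whole model on the first GPU.
        "device_map": "auto" if shard else {"": "cuda:0"},
    }
    if quantization == "8bit":
        kwargs["load_in_8bit"] = True
    elif quantization == "4bit":
        kwargs["load_in_4bit"] = True
    return kwargs


def load_model(quantization: Optional[str] = None, shard: bool = False):
    """Load the model; transformers is imported lazily so the helper above
    can be inspected on a machine without a GPU stack installed."""
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, **build_load_kwargs(quantization, shard)
    )


if __name__ == "__main__":
    print(build_load_kwargs(quantization="8bit", shard=True))
```

Calling load_model("8bit", shard=True) would then download the weights and load them in 8-bit across the available GPUs.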

Model architecture

The model architecture is as follows:

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 3200, padding_idx=0)
    (layers): ModuleList(
      (0-25): 26 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=3200, out_features=3200, bias=False)
          (k_proj): Linear(in_features=3200, out_features=3200, bias=False)
          (v_proj): Linear(in_features=3200, out_features=3200, bias=False)
          (o_proj): Linear(in_features=3200, out_features=3200, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=3200, out_features=8640, bias=False)
          (down_proj): Linear(in_features=8640, out_features=3200, bias=False)
          (up_proj): Linear(in_features=3200, out_features=8640, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=3200, out_features=32000, bias=False)
)
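
The layer shapes printed above are enough to work out the parameter count by hand, which helps when estimating memory needs. The sketch below adds up the weights, assuming each LlamaRMSNorm holds a single weight vector of the hidden size, the rotary embedding has no trainable weights, and the input embedding and lm_head are separate (untied) matrices. The memory figures cover weights only and ignore activations, the KV cache, and quantization overhead, so treat them as lower bounds.

```python
# Dimensions read off the printed LlamaForCausalLM structure.
VOCAB_SIZE = 32000
HIDDEN_SIZE = 3200
INTERMEDIATE_SIZE = 8640
NUM_LAYERS = 26


def count_parameters() -> int:
    """Sum the weights of every module shown in the architecture printout."""
    embed = VOCAB_SIZE * HIDDEN_SIZE
    attention = 4 * HIDDEN_SIZE * HIDDEN_SIZE  # q_proj, k_proj, v_proj, o_proj
    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE  # gate_proj, up_proj, down_proj
    layer_norms = 2 * HIDDEN_SIZE  # assumption: one weight vector per RMSNorm
    per_layer = attention + mlp + layer_norms
    final_norm = HIDDEN_SIZE
    lm_head = HIDDEN_SIZE * VOCAB_SIZE  # assumption: not tied to embed_tokens
    return embed + NUM_LAYERS * per_layer + final_norm + lm_head


def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    total = count_parameters()
    print(f"{total:,} parameters")  # 3,426,473,600 parameters
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: about {weight_memory_gib(total, bits):.2f} GiB")
```

The total of roughly 3.4 billion parameters is consistent with the open_llama_3b base model named in the description.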

Model configuration

The model was trained with H2O LLM Studio; the exact configuration is in cfg.yaml. Visit H2O LLM Studio to learn how to train your own large language models.

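📚 Detailed Documentation

Disclaimer

Please read this disclaimer carefully before using the large language model provided in this repository. Your use of the model signifies your agreement to the following terms and conditions:

Biases and offensiveness: The large language model is trained on a diverse range of internet text data, which may contain biased, racist, offensive, or otherwise inappropriate content. By using this model, you acknowledge and accept that the generated content may sometimes exhibit biases or be offensive or inappropriate. The developers of this repository do not endorse, support, or promote any such content or viewpoints.
Limitations: The large language model is an AI-based tool, not a human. It may produce incorrect, nonsensical, or irrelevant responses. It is the user's responsibility to critically evaluate the generated content and use it at their own discretion.
Use at your own risk: Users of this large language model must assume full responsibility for any consequences that may arise from their use of the tool. The developers and contributors of this repository shall not be held liable for any damages, losses, or harm resulting from the use or misuse of the provided model.
Ethical considerations: Users are encouraged to use the large language model responsibly and ethically. By using this model, you agree not to use it for purposes that promote hate speech, discrimination, harassment, or any form of illegal or harmful activity.
Reporting issues: If you encounter any biased, offensive, or otherwise inappropriate content generated by the large language model, please report it to the repository maintainers through the provided channels. Your feedback will help improve the model and mitigate potential issues.
Changes to this disclaimer: The developers of this repository reserve the right to modify or update this disclaimer at any time without prior notice. It is the user's responsibility to review the disclaimer periodically to stay informed about any changes.

By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions outlined in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.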