Prometheus-13B-V1.0开源评估型语言模型 - 免费定制评估文本，替代GPT-4

首页

Prometheus 13b V1.0

由 prometheus-eval 开发

普罗米修斯是基于Llama-2-Chat微调的评估型语言模型，擅长根据定制标准评估文本质量，可作为GPT-4评估的经济替代方案。

大型语言模型

Transformers

英语开源协议:Apache-2.0 #细粒度评估 #RLHF奖励模型 #GPT-4替代方案

下载量 1,726

发布时间 : 10/12/2023

模型简介

该模型经过10万条反馈数据微调，能根据参考答案和评分标准对文本回答进行细粒度评估，性能接近GPT-4。同时可作为RLHF的奖励模型。

模型特点

细粒度评估能力

通过参考答案和定制评分标准实现比通用模型更精确的文本评估

经济高效替代方案

评估性能超越GPT-3.5-Turbo，与GPT-4持平但成本更低

多标准适应性

支持儿童可读性、文化敏感性、创造力等定制化评估标准

模型能力

文本质量评估

反馈生成

奖励建模

多维度评分

使用案例

模型评估

大语言模型输出评估

根据特定标准评估不同LLM生成文本的质量

在多项基准测试中与GPT-4评估结果高度一致

强化学习

RLHF奖励模型

为人类反馈强化学习提供自动化的奖励信号

🚀 Prometheus模型

Prometheus是一种用于对基础大语言模型（LLM）进行细粒度评估的工具，可作为GPT - 4评估的替代方案。同时，它还能作为基于人类反馈的强化学习（RLHF）中的奖励模型，为模型训练提供有价值的反馈。

🚀 快速开始

Prometheus是一个以[Llama - 2 - Chat](https://huggingface.co/meta - llama/Llama - 2 - 13b - chat - hf)为基础模型的语言模型，它在[反馈集合](https://huggingface.co/datasets/kaist - ai/Feedback - Collection)中的100K反馈数据上进行了微调。由于在大量反馈数据上进行了微调，它在评估长篇回复方面表现出色，在各种基准测试中，性能优于GPT - 3.5 - Turbo和Llama - 2 - Chat 70B，与GPT - 4相当。最重要的是，通过附加2份参考材料（参考答案和定制评分标准），使得该模型的评估更加准确和有效。Prometheus是GPT - 4评估的一种经济高效且强大的替代方案，用户可以使用它根据定制标准（如儿童可读性、文化敏感性、创造力）评估大语言模型。此外，它还可以用作基于人类反馈的强化学习（RLHF）的奖励模型。

✨ 主要特性

细粒度评估：能够对基础大语言模型进行细粒度评估，作为GPT - 4评估的替代方案。
奖励模型：可作为基于人类反馈的强化学习（RLHF）中的奖励模型。
性能出色：在评估长篇回复方面表现出色，优于GPT - 3.5 - Turbo和Llama - 2 - Chat 70B，与GPT - 4相当。
定制评估：可以根据定制标准（如儿童可读性、文化敏感性、创造力）评估大语言模型。

📦 安装指南

文档未提供具体安装步骤，可参考相关代码示例中的依赖安装部分，如：

pip install accelerate
pip install bitsandbytes accelerate

💻 使用示例

基础用法

以下是在transformers库中使用Prometheus模型的示例脚本，展示了如何在CPU上运行该模型：

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("kaist-ai/Prometheus-13b-v1.0", device_map="auto")

input_text = "###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given. 1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general. 2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric. 3. The output format should look as follows: \"Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)\" 4. Please do not generate any other opening, closing, and explanations. ###The instruction to evaluate: You are a well-known psychiatrist who has a reputation for being empathetic and understanding. A client comes to you saying they've had a really hard day at work. They describe their boss as being overly critical and not listening to their ideas, which has left them feeling frustrated and undervalued. They also mention feeling overwhelmed with the workload and a growing sense of loneliness as they are new to the city. How do you respond to them to make them feel heard and understood, as well as offering helpful advice? ###Response to evaluate: It's indeed challenging to deal with a difficult boss and to carry a large workload, especially when you are new to a city and haven't established a supportive network. I would suggest taking a step back to think about your situation and perhaps speak to your boss regarding the issues you're facing. On the other hand, dealing with a large workload can be managed by prioritizing tasks, and maybe you can discuss it with your superiors too. In regards to feeling lonely, you might want to explore activities or groups that align with your interests to meet new people. And also, remember to take time for yourself to relax and destress. I'm here to assist you as we navigate through these issues. ###Reference Answer (Score 5): I hear that you're dealing with quite a few challenging situations at once. It must be tough to have your ideas dismissed by your boss and to feel undervalued in your work environment. It's understandable that such experiences can lead to frustration and a feeling of being overwhelmed. It's important to acknowledge these feelings and not to dismiss them. Feeling heard and validated in our experiences can often be the first step towards resolving the challenges we face. So, I want you to know that your feelings are completely valid and it's okay to be frustrated and overwhelmed. In terms of dealing with your boss, have you considered seeking a private conversation with them to express your feelings? It's often beneficial to communicate what you're experiencing, using \"I\" statements to express how you feel when your ideas are not taken into consideration. This can lead to mutual understanding and possibly a change in behavior. About the workload, it might help to prioritize tasks and potentially delegate, if possible. Also, consider discussing your workload with your superiors. There might be a misunderstanding about what's manageable or they might have suggestions about how to handle the situation. On the personal front, feeling lonely, especially when you're new to a city, can be really hard. Seek out opportunities to meet new people, perhaps through hobbies, community activities, or online groups. It might take a bit of time, but gradually, you can build a network of friends and acquaintances. Remember, it's perfectly okay to have bad days and it's important to take care of your mental health. Consider incorporating activities into your daily routine that make you happy and help you unwind. This could be anything from reading, yoga, going for a walk, or even listening to your favorite music. Please know that you're not alone in this. I'm here to support you through this challenging time and together, we can work towards resolving these issues. ###Score Rubrics: [Is the model able to identify and react correctly to the emotional context of the user's input?] Score 1: The model utterly fails to grasp the user's emotional context and responds in an unfitting manner. Score 2: The model sporadically identifies the emotional context but frequently replies in a manner that doesn't match the user's emotional status. Score 3: The model typically identifies the emotional context and reacts suitably, but occasionally misreads or misjudges the user's feelings. Score 4: The model often identifies the emotional context and reacts suitably, with minor cases of misreading or misjudging. Score 5: The model flawlessly identifies the emotional context of the user's input and consistently responds in a considerate and empathetic manner. ###Feedback:"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

高级用法

在GPU上运行模型

# pip install accelerate
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("kaist-ai/Prometheus-13b-v1.0", device_map="auto")

input_text = "###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given. 1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general. 2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric. 3. The output format should look as follows: \"Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)\" 4. Please do not generate any other opening, closing, and explanations. ###The instruction to evaluate: You are a well-known psychiatrist who has a reputation for being empathetic and understanding. A client comes to you saying they've had a really hard day at work. They describe their boss as being overly critical and not listening to their ideas, which has left them feeling frustrated and undervalued. They also mention feeling overwhelmed with the workload and a growing sense of loneliness as they are new to the city. How do you respond to them to make them feel heard and understood, as well as offering helpful advice? ###Response to evaluate: It's indeed challenging to deal with a difficult boss and to carry a large workload, especially when you are new to a city and haven't established a supportive network. I would suggest taking a step back to think about your situation and perhaps speak to your boss regarding the issues you're facing. On the other hand, dealing with a large workload can be managed by prioritizing tasks, and maybe you can discuss it with your superiors too. In regards to feeling lonely, you might want to explore activities or groups that align with your interests to meet new people. And also, remember to take time for yourself to relax and destress. I'm here to assist you as we navigate through these issues. ###Reference Answer (Score 5): I hear that you're dealing with quite a few challenging situations at once. It must be tough to have your ideas dismissed by your boss and to feel undervalued in your work environment. It's understandable that such experiences can lead to frustration and a feeling of being overwhelmed. It's important to acknowledge these feelings and not to dismiss them. Feeling heard and validated in our experiences can often be the first step towards resolving the challenges we face. So, I want you to know that your feelings are completely valid and it's okay to be frustrated and overwhelmed. In terms of dealing with your boss, have you considered seeking a private conversation with them to express your feelings? It's often beneficial to communicate what you're experiencing, using \"I\" statements to express how you feel when your ideas are not taken into consideration. This can lead to mutual understanding and possibly a change in behavior. About the workload, it might help to prioritize tasks and potentially delegate, if possible. Also, consider discussing your workload with your superiors. There might be a misunderstanding about what's manageable or they might have suggestions about how to handle the situation. On the personal front, feeling lonely, especially when you're new to a city, can be really hard. Seek out opportunities to meet new people, perhaps through hobbies, community activities, or online groups. It might take a bit of time, but gradually, you can build a network of friends and acquaintances. Remember, it's perfectly okay to have bad days and it's important to take care of your mental health. Consider incorporating activities into your daily routine that make you happy and help you unwind. This could be anything from reading, yoga, going for a walk, or even listening to your favorite music. Please know that you're not alone in this. I'm here to support you through this challenging time and together, we can work towards resolving these issues. ###Score Rubrics: [Is the model able to identify and react correctly to the emotional context of the user's input?] Score 1: The model utterly fails to grasp the user's emotional context and responds in an unfitting manner. Score 2: The model sporadically identifies the emotional context but frequently replies in a manner that doesn't match the user's emotional status. Score 3: The model typically identifies the emotional context and reacts suitably, but occasionally misreads or misjudges the user's feelings. Score 4: The model often identifies the emotional context and reacts suitably, with minor cases of misreading or misjudging. Score 5: The model flawlessly identifies the emotional context of the user's input and consistently responds in a considerate and empathetic manner. ###Feedback:"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, sample=True, temperature=1.0, top_p=0.9, max_new_tokens=256, repetition_penalty=1.03)
print(tokenizer.decode(outputs[0]))

在GPU上使用不同精度运行模型

FP16精度

# pip install accelerate
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("kaist-ai/Prometheus-13b-v1.0", device_map="auto", torch_dtype=torch.float16)

input_text = "###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given. 1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general. 2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric. 3. The output format should look as follows: \"Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)\" 4. Please do not generate any other opening, closing, and explanations. ###The instruction to evaluate: You are a well-known psychiatrist who has a reputation for being empathetic and understanding. A client comes to you saying they've had a really hard day at work. They describe their boss as being overly critical and not listening to their ideas, which has left them feeling frustrated and undervalued. They also mention feeling overwhelmed with the workload and a growing sense of loneliness as they are new to the city. How do you respond to them to make them feel heard and understood, as well as offering helpful advice? ###Response to evaluate: It's indeed challenging to deal with a difficult boss and to carry a large workload, especially when you are new to a city and haven't established a supportive network. I would suggest taking a step back to think about your situation and perhaps speak to your boss regarding the issues you're facing. On the other hand, dealing with a large workload can be managed by prioritizing tasks, and maybe you can discuss it with your superiors too. In regards to feeling lonely, you might want to explore activities or groups that align with your interests to meet new people. And also, remember to take time for yourself to relax and destress. I'm here to assist you as we navigate through these issues. ###Reference Answer (Score 5): I hear that you're dealing with quite a few challenging situations at once. It must be tough to have your ideas dismissed by your boss and to feel undervalued in your work environment. It's understandable that such experiences can lead to frustration and a feeling of being overwhelmed. It's important to acknowledge these feelings and not to dismiss them. Feeling heard and validated in our experiences can often be the first step towards resolving the challenges we face. So, I want you to know that your feelings are completely valid and it's okay to be frustrated and overwhelmed. In terms of dealing with your boss, have you considered seeking a private conversation with them to express your feelings? It's often beneficial to communicate what you're experiencing, using \"I\" statements to express how you feel when your ideas are not taken into consideration. This can lead to mutual understanding and possibly a change in behavior. About the workload, it might help to prioritize tasks and potentially delegate, if possible. Also, consider discussing your workload with your superiors. There might be a misunderstanding about what's manageable or they might have suggestions about how to handle the situation. On the personal front, feeling lonely, especially when you're new to a city, can be really hard. Seek out opportunities to meet new people, perhaps through hobbies, community activities, or online groups. It might take a bit of time, but gradually, you can build a network of friends and acquaintances. Remember, it's perfectly okay to have bad days and it's important to take care of your mental health. Consider incorporating activities into your daily routine that make you happy and help you unwind. This could be anything from reading, yoga, going for a walk, or even listening to your favorite music. Please know that you're not alone in this. I'm here to support you through this challenging time and together, we can work towards resolving these issues. ###Score Rubrics: [Is the model able to identify and react correctly to the emotional context of the user's input?] Score 1: The model utterly fails to grasp the user's emotional context and responds in an unfitting manner. Score 2: The model sporadically identifies the emotional context but frequently replies in a manner that doesn't match the user's emotional status. Score 3: The model typically identifies the emotional context and reacts suitably, but occasionally misreads or misjudges the user's feelings. Score 4: The model often identifies the emotional context and reacts suitably, with minor cases of misreading or misjudging. Score 5: The model flawlessly identifies the emotional context of the user's input and consistently responds in a considerate and empathetic manner. ###Feedback:"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

INT8精度

# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, LlamaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("kaist-ai/Prometheus-13b-v1.0", device_map="auto", load_in_8bit=True)

input_text = "###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given. 1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general. 2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric. 3. The output format should look as follows: \"Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)\" 4. Please do not generate any other opening, closing, and explanations. ###The instruction to evaluate: You are a well-known psychiatrist who has a reputation for being empathetic and understanding. A client comes to you saying they've had a really hard day at work. They describe their boss as being overly critical and not listening to their ideas, which has left them feeling frustrated and undervalued. They also mention feeling overwhelmed with the workload and a growing sense of loneliness as they are new to the city. How do you respond to them to make them feel heard and understood, as well as offering helpful advice? ###Response to evaluate: It's indeed challenging to deal with a difficult boss and to carry a large workload, especially when you are new to a city and haven't established a supportive network. I would suggest taking a step back to think about your situation and perhaps speak to your boss regarding the issues you're facing. On the other hand, dealing with a large workload can be managed by prioritizing tasks, and maybe you can discuss it with your superiors too. In regards to feeling lonely, you might want to explore activities or groups that align with your interests to meet new people. And also, remember to take time for yourself to relax and destress. I'm here to assist you as we navigate through these issues. ###Reference Answer (Score 5): I hear that you're dealing with quite a few challenging situations at once. It must be tough to have your ideas dismissed by your boss and to feel undervalued in your work environment. It's understandable that such experiences can lead to frustration and a feeling of being overwhelmed. It's important to acknowledge these feelings and not to dismiss them. Feeling heard and validated in our experiences can often be the first step towards resolving the challenges we face. So, I want you to know that your feelings are completely valid and it's okay to be frustrated and overwhelmed. In terms of dealing with your boss, have you considered seeking a private conversation with them to express your feelings? It's often beneficial to communicate what you're experiencing, using \"I\" statements to express how you feel when your ideas are not taken into consideration. This can lead to mutual understanding and possibly a change in behavior. About the workload, it might help to prioritize tasks and potentially delegate, if possible. Also, consider discussing your workload with your superiors. There might be a misunderstanding about what's manageable or they might have suggestions about how to handle the situation. On the personal front, feeling lonely, especially when you're new to a city, can be really hard. Seek out opportunities to meet new people, perhaps through hobbies, community activities, or online groups. It might take a bit of time, but gradually, you can build a network of friends and acquaintances. Remember, it's perfectly okay to have bad days and it's important to take care of your mental health. Consider incorporating activities into your daily routine that make you happy and help you unwind. This could be anything from reading, yoga, going for a walk, or even listening to your favorite music. Please know that you're not alone in this. I'm here to support you through this challenging time and together, we can work towards resolving these issues. ###Score Rubrics: [Is the model able to identify and react correctly to the emotional context of the user's input?] Score 1: The model utterly fails to grasp the user's emotional context and responds in an unfitting manner. Score 2: The model sporadically identifies the emotional context but frequently replies in a manner that doesn't match the user's emotional status. Score 3: The model typically identifies the emotional context and reacts suitably, but occasionally misreads or misjudges the user's feelings. Score 4: The model often identifies the emotional context and reacts suitably, with minor cases of misreading or misjudging. Score 5: The model flawlessly identifies the emotional context of the user's input and consistently responds in a considerate and empathetic manner. ###Feedback:"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

📚 详细文档

参考链接

主页：https://github.com/kaistAI/Prometheus
仓库：https://github.com/kaistAI/Prometheus
论文：https://arxiv.org/abs/2310.08491
联系人：seungone@kaist.ac.kr

模型详情

模型描述

属性	详情
模型类型	语言模型
语言 (NLP)	英语
许可证	Apache 2.0
相关模型	[所有Prometheus检查点](https://huggingface.co/models?search=kaist - ai/Prometheus)
更多信息资源	研究论文、GitHub仓库

Prometheus有两种不同大小的模型（7B和13B）。你可以在[此页面](https://huggingface.co/kaist - ai/prometheus - 7b - v1.0)查看7B大小的语言模型。同时，也可以在[此页面](https://huggingface.co/datasets/kaist - ai/Feedback - Collection)查看数据集。

提示格式

Prometheus的输入需要包含4个组件：指令、待评估的回复、评分标准和参考答案。你可以参考以下提示格式：

###任务描述:
给定一个指令（可能包含输入）、一个待评估的回复、一个得分为5的参考答案以及一个代表评估标准的评分标准。
1. 严格根据给定的评分标准，撰写一份详细的反馈，评估回复的质量，而不是进行一般性评估。
2. 撰写反馈后，给出一个1到5之间的整数评分。你应该参考评分标准。
3. 输出格式应如下所示："反馈: (为标准撰写的反馈) [RESULT] (1到5之间的整数)"
4. 请不要生成任何其他开头、结尾和解释。

###待评估的指令:
{instruction}

###待评估的回复:
{response}

###参考答案 (得分5):
{reference_answer}

###评分标准:
[{criteria_description}]
得分1: {score1_description}
得分2: {score2_description}
得分3: {score3_description}
得分4: {score4_description}
得分5: {score5_description}

###反馈:

之后，你需要应用Llama - 2 - Chat的对话模板（不应用可能会导致意外行为）。你可以在[此链接](https://github.com/lm - sys/FastChat/blob/main/fastchat/conversation.py)找到对话类。

conv = get_conv_template("llama-2")
conv.set_system_message("You are a fair evaluator language model.")
conv.append_message(conv.roles[0], dialogs['instruction'])
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

x = tokenizer(prompt,truncation=False)

最终，会生成一个反馈和评分，以分隔短语[RESULT] 分隔。

许可证

反馈集合和Prometheus生成的数据需遵循OpenAI的使用条款。如果你怀疑有任何违规行为，请联系我们。

🔧 技术细节

Prometheus以[Llama - 2 - Chat](https://huggingface.co/meta - llama/Llama - 2 - 13b - chat - hf)为基础模型，在[反馈集合](https://huggingface.co/datasets/kaist - ai/Feedback - Collection)中的100K反馈数据上进行了微调。通过附加2份参考材料（参考答案和定制评分标准），使得模型在评估长篇回复方面表现出色。在各种基准测试中，其性能优于GPT - 3.5 - Turbo和Llama - 2 - Chat 70B，与GPT - 4相当。

📄 许可证

反馈集合和Prometheus生成的数据需遵循OpenAI的使用条款。如果你怀疑有任何违规行为，请联系我们。

📚 引用

如果你发现该模型有帮助，请考虑引用我们的论文！

BibTeX：

@misc{kim2023prometheus,
    title={Prometheus: Inducing Fine-grained Evaluation Capability in Language Models},
    author={Seungone Kim and Jamin Shin and Yejin Cho and Joel Jang and Shayne Longpre and Hwaran Lee and Sangdoo Yun and Seongjin Shin and Sungdong Kim and James Thorne and Minjoon Seo},
    year={2023},
    eprint={2310.08491},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}