Qwen3-8B-grpo-medmcqa: Open-Source Medical AI Model for Answering Medical Multiple-Choice Questions

Qwen3 8B Grpo Medmcqa

Developed by mlxha

A version of Qwen/Qwen3-8B fine-tuned on the medmcqa-grpo dataset, focused on answering medical multiple-choice questions.

Large language model

Transformers

#MedicalQAReasoning #GRPOOptimization #TRLFineTuning

Downloads: 84

Released: 5/8/2025

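Model overview

This model is a version of Qwen/Qwen3-8B fine-tuned on the medmcqa-grpo dataset using TRL and the GRPO method. It is intended mainly for answering multiple-choice questions in the medical domain.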

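Model features

GRPO training method

Trained with GRPO (Group Relative Policy Optimization), a method first published in the DeepSeekMath paper.

Medical-domain optimization

Fine-tuned on the medmcqa-grpo dataset of medical multiple-choice questions, with the aim of improving performance on medical questions.

TRL framework training

Trained with the TRL (Transformer Reinforcement Learning) framework.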

Model capabilities

Answering medical multiple-choice questions

Text generation

Medical knowledge reasoning

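Use cases

Medical education

Medical exam preparation

Helps medical students prepare for the multiple-choice sections of medical exams.

Medical knowledge Q&A

Answers medical multiple-choice questions and provides an explanation and reasoning process.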

🚀 Qwen3-8B-grpo-medmcqa

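This project fine-tunes the pretrained Qwen3-8B model on the medical question-answering dataset medmcqa, using the GRPO method and the TRL library. It can be used for question-answering tasks in the medical domain.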

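🚀 Quick start

from transformers import pipeline

# Generic placeholder prompt; substitute a medical multiple-choice question.
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Load the fine-tuned model on a CUDA GPU.
generator = pipeline("text-generation", model="mlxha/Qwen3-8B-grpo-medmcqa", device="cuda")
# Send a single user turn in chat format and return only the newly generated text.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])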
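The quick-start prompt is a generic placeholder. Because the model is tuned for medical multiple-choice questions, a prompt with lettered options is probably a better fit. The sketch below builds such a chat message. The helper name `format_mcq`, the option layout, the closing instruction, and the sample question are all illustrative assumptions; the prompt format used during fine-tuning is not documented in the source.

```python
from string import ascii_uppercase


def format_mcq(question, options):
    """Build a single-turn chat message for a multiple-choice question.

    Hypothetical helper: the layout below is an assumption, not the
    format documented for this model.
    """
    if not 2 <= len(options) <= len(ascii_uppercase):
        raise ValueError("expected between 2 and 26 options")
    lines = [question.strip(), ""]
    # Label the options A, B, C, ... in the order given.
    for letter, option in zip(ascii_uppercase, options):
        lines.append(f"{letter}. {option.strip()}")
    lines += ["", "Answer with the letter of the correct option."]
    return [{"role": "user", "content": "\n".join(lines)}]


messages = format_mcq(
    "Deficiency of which vitamin causes scurvy?",
    ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
)
print(messages[0]["content"])
```

The resulting `messages` list can be passed to the `generator` pipeline from the quick start in place of `[{"role": "user", "content": question}]`. Qwen3 models may emit a reasoning trace before the final answer, so a `max_new_tokens` value larger than 128 may be needed.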

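✨ Key features

Fine-tuned from Qwen3-8B on the medical question-answering dataset medmcqa; suited to question answering in the medical domain.
Trained with the GRPO method and the TRL library.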
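GRPO needs a scalar reward for every sampled completion. The reward used to train this model is not described in the source. The sketch below shows one plausible choice for multiple-choice data, a 0/1 correctness reward. It follows the calling convention TRL uses for custom reward functions in `GRPOTrainer` (a `completions` argument, with dataset columns passed as keyword arguments). The column name `answer`, the four-option `A` to `D` range, the `<think>` stripping, and the extraction patterns are all assumptions.

```python
import re
from typing import Optional

# Qwen3 may wrap its reasoning in <think>...</think>; drop it before parsing.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# Matches phrases such as "Answer: B" or "the answer is (C)".
ANSWER_PATTERN = re.compile(r"[Aa]nswer\s*(?:is|:)?\s*\(?([A-D])\b")
# Matches a reply that consists of nothing but the letter, e.g. "C" or "(C).".
BARE_LETTER = re.compile(r"\(?([A-D])\)?\.?")


def extract_choice(text: str) -> Optional[str]:
    """Return the chosen option letter, or None if none can be found."""
    visible = THINK_BLOCK.sub("", text).strip()
    match = ANSWER_PATTERN.search(visible) or BARE_LETTER.fullmatch(visible)
    return match.group(1) if match else None


def correctness_reward(completions, answer, **kwargs):
    """Score 1.0 when the extracted letter equals the reference, else 0.0.

    `completions` is assumed to be in conversational format (one list of
    message dicts per sample). `answer` is a hypothetical dataset column
    holding the reference letter for each sample.
    """
    rewards = []
    for completion, reference in zip(completions, answer):
        choice = extract_choice(completion[0]["content"])
        rewards.append(1.0 if choice == reference else 0.0)
    return rewards
```

A function of this shape could be supplied through the `reward_funcs` argument of `GRPOTrainer`. Real training setups often add a second reward for output formatting.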

📚 Documentation

Training procedure

This model was trained with GRPO, a method introduced in the paper DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
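The core idea of GRPO in the DeepSeekMath paper is to score each sampled completion relative to the other completions drawn for the same prompt, rather than relying on a learned value function. A minimal sketch of that group-relative normalization follows. The reward values are hypothetical, and the small `eps` term and the use of the population standard deviation are assumptions; implementations differ on both.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so
    completions that beat the group average get a positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical 0/1 correctness rewards for four completions of one question.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.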

Framework versions

TRL: 0.18.0.dev0
Transformers: 4.52.0.dev0
PyTorch: 2.6.0
Datasets: 3.6.0
Tokenizers: 0.21.1
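
To compare a local environment with the versions listed above, the installed package versions can be read with the standard library. A small sketch follows; the distribution names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names, and the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported for the training run of this model.
TRAINING_VERSIONS = {
    "trl": "0.18.0.dev0",
    "transformers": "4.52.0.dev0",
    "torch": "2.6.0",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, local in installed_versions(TRAINING_VERSIONS).items():
    print(f"{name}: installed={local}, used for training={TRAINING_VERSIONS[name]}")
```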

📄 License

This project is released under the license specified in its license file; see that file for details.

📚 Citations

GRPO citation

@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

TRL citation

@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}