🚀 JPharmatron-7B-base
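JPharmatron-7B-base is a 7-billion-parameter large language model designed for pharmaceutical applications and research.
🚀 Quick Start
This model has not undergone any post-training, including instruction fine-tuning. Using it directly for downstream tasks is therefore not recommended. In addition, the model has not been validated for medical use or for any other risk-sensitive use.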
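Because this is a base model, the most natural way to try it is plain text continuation rather than chat. The following is a minimal sketch, not an official usage recipe: it assumes the weights are published on the Hugging Face Hub under the repository id `EQUES/JPharmatron-7B-base`, that `transformers`, `torch`, and `accelerate` are installed, and that a simple question/answer framing (the hypothetical `build_prompt` helper) is a reasonable prompt. None of these details come from the model card itself.

```python
# Assumed Hub repository id; verify it before use.
MODEL_ID = "EQUES/JPharmatron-7B-base"


def build_prompt(question: str) -> str:
    """Frame a question as text to be continued.

    A base model has no chat template, so the prompt is written as the
    beginning of a document. This format is an illustrative assumption.
    """
    return f"質問: {question}\n回答:"


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and return only the newly generated continuation."""
    # Imported lazily so the helper above can be used without heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # halves memory relative to float32
        device_map="auto",           # requires the `accelerate` package
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding for repeatable output
    )

    # Drop the prompt tokens and decode only the continuation.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `complete(build_prompt("アスピリンの主な作用機序は何ですか?"))` downloads the full 7B checkpoint on first use. Treat the output as raw continuation from an unaligned model, consistent with the caveats above.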
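✨ Key Features
- Domain focus: designed specifically for pharmaceutical applications and research.
- Multilingual support: Japanese and English.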
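📚 Documentation
🔍 Model Details
Model Description
JPharmatron-7B-base is based on Qwen2.5-7B and was continually pretrained on 2 billion tokens from Japanese datasets.
- Developed by: EQUES Inc.
- Funded by: GENIAC Project
- Model type: Autoregressive decoder
- Supported languages: Japanese, English
- License: CC-BY-SA-4.0
Model Sources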
📖 Citation
BibTeX:
@misc{sukeda_japanese_2025,
title = {A {Japanese} {Language} {Model} and {Three} {New} {Evaluation} {Benchmarks} for {Pharmaceutical} {NLP}},
url = {http://arxiv.org/abs/2505.16661},
doi = {10.48550/arXiv.2505.16661},
abstract = {We present a Japanese domain-specific language model for the pharmaceutical field, developed through continual pretraining on 2 billion Japanese pharmaceutical tokens and 8 billion English biomedical tokens. To enable rigorous evaluation, we introduce three new benchmarks: YakugakuQA, based on national pharmacist licensing exams; NayoseQA, which tests cross-lingual synonym and terminology normalization; and SogoCheck, a novel task designed to assess consistency reasoning between paired statements. We evaluate our model against both open-source medical LLMs and commercial models, including GPT-4o. Results show that our domain-specific model outperforms existing open models and achieves competitive performance with commercial ones, particularly on terminology-heavy and knowledge-based tasks. Interestingly, even GPT-4o performs poorly on SogoCheck, suggesting that cross-sentence consistency reasoning remains an open challenge. Our benchmark suite offers a broader diagnostic lens for pharmaceutical NLP, covering factual recall, lexical variation, and logical consistency. This work demonstrates the feasibility of building practical, secure, and cost-effective language models for Japanese domain-specific applications, and provides reusable evaluation resources for future research in pharmaceutical and healthcare NLP. Our model, codes, and datasets are released at https://github.com/EQUES-Inc/pharma-LLM-eval.},
urldate = {2025-05-30},
publisher = {arXiv},
author = {Sukeda, Issey and Fujii, Takuro and Buma, Kosei and Sasaki, Shunsuke and Ono, Shinnosuke},
month = may,
year = {2025},
note = {arXiv:2505.16661 [cs]},
annote = {Comment: 15 pages, 9 tables, 5 figures}
}
ℹ️ More Information
See our preprint: A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP.
👥 Model Card Authors
@shinnosukeono
📄 License
This model is released under the CC-BY-SA-4.0 license.
📋 Information Table