stanford-deidentifier-only-radiology-reports: an open-source system that automatically de-identifies radiology reports to protect patient privacy


Stanford Deidentifier Only Radiology Reports

Developed by StanfordAIMI

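An automated de-identification system for radiology reports that combines transformer and rule-based methods to detect PHI entities and replace them with realistic surrogate values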

Sequence labeling

Transformers

Language: English | License: MIT | #radiology-report-deidentification #PHI-entity-detection #biomedical-NLP

Downloads: 26

Release date: 6/9/2022

Model overview

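This model de-identifies medical radiology reports. It combines a PubMedBERT transformer model with rule-based methods to automatically detect protected health information (PHI) and replace it with safe surrogates, with the aim of meeting HIPAA privacy standards.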

Model features

Hybrid architecture

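Combines a PubMedBERT-based transformer model that detects PHI with a rule-based "hide in plain sight" method that replaces the detected PHI with realistic surrogates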

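Multi-institutional data

Built on the associated study's multi-institutional, cross-domain dataset of 6,193 documents, including chest X-ray reports, CT reports, and medical notes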

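Production-grade accuracy

The best PHI detection model in the associated paper reaches 97.9 F1 on radiology reports from a known institution and 99.6 F1 on a test set from a new institution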
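For reference, F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below computes all three from sets of predicted and gold PHI spans. It assumes exact matching on (start, end, label); that matching criterion and the label names (`HCW`, `PHONE`, `DATE`, `ID`) are illustrative assumptions, and the paper's own scoring (which also reports recall on the core of each PHI span) may count matches differently.

```python
from typing import Set, Tuple

# A span is (start_char, end_char, label). Exact-match scoring is an assumption
# made for illustration, not necessarily the paper's evaluation protocol.
Span = Tuple[int, int, str]


def prf1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for exact span-and-label matches."""
    tp = len(predicted & gold)   # spans found with correct boundaries and label
    fp = len(predicted - gold)   # spurious or mis-bounded predictions
    fn = len(gold - predicted)   # gold PHI spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {(0, 9, "HCW"), (25, 37, "PHONE"), (50, 60, "DATE")}
    pred = {(0, 9, "HCW"), (25, 37, "PHONE"), (70, 75, "ID")}
    p, r, f = prf1(pred, gold)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # 2 TP, 1 FP, 1 FN
```

Because recall counts missed PHI, it is usually the metric that matters most for de-identification: a false negative leaks real patient data, while a false positive only over-redacts.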

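Model capabilities

Medical entity recognition

Protected health information detection

Realistic surrogate replacement

Radiology report processing

Cross-institution generalization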

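Use cases

Medical data privacy protection

Radiology report de-identification

Automatically detects and replaces PHI such as patient names, physician names, and contact details

Outperforms human labelers on the i2b2 2014 dataset

Multi-center research data sharing

De-identifies medical documents from multiple institutions so they can be shared in line with privacy regulations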

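Handles reports that reference data transfers, such as the MedClinical data transfer system mentioned in the sample report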
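The replacement step in these use cases follows the "hide in plain sight" idea: each detected PHI span is swapped for a realistic fake value rather than a placeholder, so any PHI the detector misses is hard to tell apart from the surrogates around it. The sketch below is a hypothetical, much-simplified illustration, not the paper's synthetic PHI generation algorithm. It takes already-detected spans and draws surrogates from small hard-coded pools; the label names and pool contents are assumptions. Repeated mentions of the same value reuse the same surrogate so the rewritten report stays internally consistent.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical label names and surrogate pools, for illustration only;
# the real system uses its own label set and surrogate generator.
SURROGATES: Dict[str, List[str]] = {
    "HCW": ["Alvarez", "Chen", "Okafor"],        # clinician surnames
    "PATIENT": ["Jordan Smith", "Maria Lopez"],
    "PHONE": ["555-201-7788", "555-640-1123"],
    "ID": ["4417902", "8830215"],
}


def replace_with_surrogates(text: str,
                            spans: List[Tuple[int, int, str]],
                            seed: int = 0) -> str:
    """Swap each (start, end, label) span in text for a realistic-looking surrogate.

    The same original value under the same label always maps to the same
    surrogate within one call.
    """
    rng = random.Random(seed)                  # seeded for reproducible output
    mapping: Dict[Tuple[str, str], str] = {}   # (label, original) -> surrogate
    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):    # assumes non-overlapping spans
        key = (label, text[start:end])
        if key not in mapping:
            pool = SURROGATES.get(label, [f"[{label}]"])  # unknown label: plain tag
            mapping[key] = rng.choice(pool)
        pieces.append(text[cursor:start])      # untouched text before the span
        pieces.append(mapping[key])
        cursor = end
    pieces.append(text[cursor:])               # untouched tail
    return "".join(pieces)


if __name__ == "__main__":
    report = "Seen by Dr. Perez. Call Dr. Perez at 567-493-1234."
    spans = [(12, 17, "HCW"), (28, 33, "HCW"), (37, 49, "PHONE")]
    print(replace_with_surrogates(report, spans))
```

A production system would need far larger and format-aware surrogate generators (for example, keeping date formats and name shapes plausible), but the consistent-mapping structure shown here is the core of the approach.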

🚀 Stanford De-Identifier

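The Stanford de-identifier was trained on a variety of radiology and biomedical documents to automate the de-identification process while reaching accuracy that is satisfactory for production use. The associated paper was published in the Journal of the American Medical Informatics Association in 2022.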

🚀 Quick start

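The tool de-identifies radiology and biomedical documents: a trained model automatically detects protected health information (PHI) entities, which are then replaced.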
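A minimal quick-start sketch using the Hugging Face `transformers` token-classification pipeline. It assumes the model is published on the Hub as `StanfordAIMI/stanford-deidentifier-only-radiology-reports` (inferred from the model name and developer shown on this page) and that `transformers` with a PyTorch backend is installed; the entity labels returned come from the model's own config. `detect_phi` and `mask_phi` are hypothetical helper names, and `mask_phi` does plain `[LABEL]` masking, not the paper's realistic-surrogate replacement.

```python
from typing import Dict, List

# Assumed Hugging Face Hub ID, inferred from the model name on this page.
MODEL_ID = "StanfordAIMI/stanford-deidentifier-only-radiology-reports"


def detect_phi(text: str) -> List[Dict]:
    """Run the token-classification pipeline and return grouped entity spans.

    With aggregation enabled, each result is a dict with the keys
    'entity_group', 'score', 'word', 'start' and 'end'.
    """
    # Imported lazily so mask_phi can be used without transformers installed.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",  # merge sub-word pieces into whole entities
    )
    return ner(text)


def mask_phi(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [LABEL] tag (masking, no surrogates)."""
    # Edit from the end of the string so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text
```

For example, `mask_phi(report, detect_phi(report))` returns the report with every detected span replaced by its label tag. `detect_phi` rebuilds the pipeline on each call for brevity; for batch processing you would create the pipeline once and reuse it.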

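✨ Key features

Multi-domain coverage: trained on a variety of radiology and biomedical documents, so it applies to different types of medical reports.
High accuracy: the best model in the associated paper reports F1 scores between 97.9 and 99.6 on its four test sets (radiology reports from a known and a new institution, i2b2 2006, and i2b2 2014), detecting PHI entities accurately.
Automation: automates the de-identification process, improving processing efficiency.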

📚 Documentation

Tags

The project is tagged with:

token-classification
sequence-tagger-model
pytorch
transformers
pubmedbert
uncased
radiology
biomedical
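The `token-classification` and `sequence-tagger-model` tags mean the model assigns a label to every token, and consecutive labeled tokens are then grouped into PHI spans. The sketch below illustrates that grouping step under the assumption of a BIO scheme (`B-` begins an entity, `I-` continues it, `O` is outside) over word-level tokens; the model's actual label set may differ, and the `transformers` pipeline's `aggregation_strategy` option performs a comparable grouping in practice. `bio_to_spans` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word-level BIO tags into (label, text) entity spans."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None        # label of the span currently open, if any
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        # B- always opens a span; an I- whose label differs from the open span
        # (or with no span open) is treated as the start of a new one.
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag.startswith("I-") and not starts_new:
            words.append(token)        # continue the open span
            continue
        if label is not None:          # close the open span before moving on
            spans.append((label, " ".join(words)))
        if starts_new:
            label, words = tag[2:], [token]
        else:                          # "O": token is outside any entity
            label, words = None, []
    if label is not None:              # flush a span that runs to the end
        spans.append((label, " ".join(words)))
    return spans
```

Treating a stray `I-` tag as the start of a new entity is a common lenient choice; a stricter decoder could drop such tokens instead, which trades recall for precision.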

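Example reports

The following are sample report texts:

PROCEDURE: Chest X-ray. COMPARISON: Last examination on January 1, 2020; there is also a record dated March 1, 2019. FINDINGS: Patchy airspace opacities. IMPRESSION: The results of the January 1, 2020 chest X-ray are the most concerning. The patient was transferred to another service at UH Medical Center under the care of Dr. Perez. We sent the data on February 1, 2020 using the MedClinical data transfer system, under ID 5874233. We received confirmation from Dr. Perez. He can be reached at 567-493-1234.
Dr. Curt Langlotz chose to schedule a meeting on June 23.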

Related repository

GitHub repository: https://github.com/MIDRC/Stanford_Penn_Deidentifier

📄 License

This project is released under the MIT License.

📚 Citation

If you use this project, please cite the following paper:

@article{10.1093/jamia/ocac219,
    author = {Chambon, Pierre J and Wu, Christopher and Steinkamp, Jackson M and Adleberg, Jason and Cook, Tessa S and Langlotz, Curtis P},
    title = "{Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods}",
    journal = {Journal of the American Medical Informatics Association},
    year = {2022},
    month = {11},
    abstract = "{To develop an automated deidentification pipeline for radiology reports that detect protected health information (PHI) entities and replaces them with realistic surrogates “hiding in plain sight.” In this retrospective study, 999 chest X-ray and CT reports collected between November 2019 and November 2020 were annotated for PHI at the token level and combined with 3001 X-rays and 2193 medical notes previously labeled, forming a large multi-institutional and cross-domain dataset of 6193 documents. Two radiology test sets, from a known and a new institution, as well as i2b2 2006 and 2014 test sets, served as an evaluation set to estimate model performance and to compare it with previously released deidentification tools. Several PHI detection models were developed based on different training datasets, fine-tuning approaches and data augmentation techniques, and a synthetic PHI generation algorithm. These models were compared using metrics such as precision, recall and F1 score, as well as paired samples Wilcoxon tests. Our best PHI detection model achieves 97.9 F1 score on radiology reports from a known institution, 99.6 from a new institution, 99.5 on i2b2 2006, and 98.9 on i2b2 2014. On reports from a known institution, it achieves 99.1 recall of detecting the core of each PHI span. Our model outperforms all deidentifiers it was compared to on all test sets as well as human labelers on i2b2 2014 data. It enables accurate and automatic deidentification of radiology reports. A transformer-based deidentification pipeline can achieve state-of-the-art performance for deidentifying radiology reports and other medical documents.}",
    issn = {1527-974X},
    doi = {10.1093/jamia/ocac219},
    url = {https://doi.org/10.1093/jamia/ocac219},
    note = {ocac219},
    eprint = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocac219/47220191/ocac219.pdf},
}