stanford-deidentifier-only-i2b2: open-source model that automatically removes sensitive information from radiology reports

Stanford Deidentifier Only I2b2

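Developed by StanfordAIMI

A transformer-based system for automated deidentification of radiology reports. It is combined with rule-based methods to detect PHI and replace it.

Sequence labeling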

Transformers

Language: English

License: MIT

Tags: #radiology-report-deidentification #high-precision-PHI-detection #biomedical-text-processing

Downloads: 98

Released: 6/9/2022

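Model overview

Built for deidentifying biomedical radiology reports. It automatically detects protected health information (PHI) and replaces it with realistic surrogates, with the aim of helping meet HIPAA privacy requirements.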

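Model features

High-precision PHI detection

The best model reported in the paper reaches an F1 score of 97.9 on radiology reports from a known institution and 99.6 on reports from a new institution. On i2b2 2014 data it outperforms human labelers.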

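Cross-institution generalization

Evaluated on two radiology test sets (a known and a new institution) and on the i2b2 2006 and 2014 test sets, where it scores 99.5 and 98.9 F1 respectively.

Hybrid design

Combines a PubMedBERT-based transformer model for detection with "hide in plain sight" rule-based replacement, so that detected PHI is swapped for realistic surrogates.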

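Large-scale training data

The paper assembles a multi-institutional, cross-domain dataset of 6193 documents: 999 newly annotated chest X-ray and CT reports, 3001 previously labeled X-rays, and 2193 previously labeled medical notes.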
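The F1 scores quoted above combine precision and recall into a single number. The page does not spell out the exact evaluation protocol (for example token level versus entity level), so the following is only a generic token-level sketch. The function name `phi_scores` and the boolean encoding of labels are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def phi_scores(gold: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for per-token PHI / non-PHI labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g and p)      # PHI found
    fp = sum(1 for g, p in zip(gold, pred) if p and not g)  # false alarm
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)  # PHI missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

The function returns fractions between 0 and 1, whereas the paper reports scores on a 0 to 100 scale. For deidentification, recall matters most, because a missed PHI token (a false negative) is a privacy leak while a false alarm only removes harmless text.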

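Model capabilities

PHI entity recognition in radiology reports

Automatic replacement of protected health information

Detection of multiple PHI types (dates, physician names, institutions, and others)

Cross-institution document processing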

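Use cases

Medical privacy protection

Radiology report deidentification

Automatically handles sensitive information in chest X-ray and CT reports

Reaches 99.1 recall for detecting the core of each PHI span on reports from a known institution

Research data sharing

Supplies deidentified data for medical research, in line with HIPAA privacy goals

Outperforms human labelers on i2b2 2014 data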

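Medical information systems

Electronic health record processing

Can be integrated into medical information systems to automate the deidentification workflow

The sample report in the example data mentions a "MedClinical data transmitter"; that name is part of the example text, not a documented integration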

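🚀 Stanford Deidentifier

The Stanford Deidentifier was trained on a variety of radiology and biomedical documents. Its goal is to automate deidentification while reaching an accuracy high enough for production use. The associated paper appeared in the Journal of the American Medical Informatics Association (2022); see the citation at the end of this page.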

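🚀 Quick start

The Stanford Deidentifier can be used to deidentify radiology and biomedical documents in order to protect patients' health information.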
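The page gives no usage snippet, so here is a minimal sketch of one way to run the model with the Hugging Face `transformers` token-classification pipeline. The Hub ID `StanfordAIMI/stanford-deidentifier-only-i2b2` is an assumption inferred from the page title and developer name, and the helper names (`load_tagger`, `redact`, `deidentify`) are hypothetical. The sketch only masks detected spans with their label and does not perform the surrogate replacement described in the paper.

```python
from typing import Callable, Dict, List

# Assumed Hub ID, inferred from the page title and developer name.
MODEL_ID = "StanfordAIMI/stanford-deidentifier-only-i2b2"

Tagger = Callable[[str], List[Dict]]


def load_tagger() -> Tagger:
    """Build a token-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # lazy import: redact() works without it

    return pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",  # merge word pieces into entity spans
    )


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


def deidentify(text: str, tagger: Tagger) -> str:
    """Detect PHI spans with the tagger, then mask them."""
    return redact(text, tagger(text))


# Usage (requires `transformers`, a backend such as PyTorch, and network access):
# tagger = load_tagger()
# print(deidentify("Dr. Curt Langlotz chose to schedule a meeting on 06/23.", tagger))
```

With `aggregation_strategy="simple"`, each pipeline result is a dict carrying `entity_group`, `start`, and `end`, which is all `redact` relies on. The actual label names come from the model's own configuration and are not listed on this page.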

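✨ Key features

Multiple document types: handles a range of radiology and biomedical documents.
Automated deidentification: automatically detects protected health information (PHI) entities and replaces them with plausible surrogates.
High accuracy: achieves strong F1 scores on several test sets and identifies PHI reliably.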
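The replacement step listed above follows the "hide in plain sight" idea: detected PHI is swapped for realistic surrogates, so any PHI the detector missed blends in with the fake values. The sketch below illustrates that idea only. It is not the rule set from the paper or the associated repository, and the surrogate pools, label names, and function name are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical surrogate pools; a real system would use far larger,
# format-aware generators.
SURROGATES: Dict[str, List[str]] = {
    "NAME": ["Alex Morgan", "Jamie Lee", "Sam Rivera"],
    "DATE": ["03/14", "11/02", "07/29"],
}

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def hide_in_plain_sight(text: str, spans: List[Span], seed: int = 0) -> str:
    """Replace each PHI span with a random surrogate of the same type."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    # Process spans right to left so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        pool = SURROGATES.get(label)
        # Fall back to a plain placeholder for labels without a pool.
        replacement = rng.choice(pool) if pool else f"[{label}]"
        text = text[:start] + replacement + text[end:]
    return text
```

Given the spans for "Curt Langlotz" and "06/23" in the second example sentence from the example data, the output keeps the sentence structure but carries a different name and date, which reads more naturally than bracketed placeholders.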

📚 Documentation

Example data

widget:
- text: "PROCEDURE: Chest xray. COMPARISON: last seen on 1/1/2020 and also record dated of March 1st, 2019. FINDINGS: patchy airspace opacities. IMPRESSION: The results of the chest xray of January 1 2020 are the most concerning ones. The patient was transmitted to another service of UH Medical Center under the responsability of Dr. Perez. We used the system MedClinical data transmitter and sent the data on 2/1/2020, under the ID 5874233. We received the confirmation of Dr Perez. He is reachable at 567-493-1234."
- text: "Dr. Curt Langlotz chose to schedule a meeting on 06/23."
tags:
- token-classification
- sequence-tagger-model
- pytorch
- transformers
- pubmedbert
- uncased
- radiology
- biomedical
datasets:
- radreports
language:
  - en
license: mit

Related repository

Associated GitHub repository: https://github.com/MIDRC/Stanford_Penn_Deidentifier

Citation

@article{10.1093/jamia/ocac219,
    author = {Chambon, Pierre J and Wu, Christopher and Steinkamp, Jackson M and Adleberg, Jason and Cook, Tessa S and Langlotz, Curtis P},
    title = "{Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods}",
    journal = {Journal of the American Medical Informatics Association},
    year = {2022},
    month = {11},
    abstract = "{To develop an automated deidentification pipeline for radiology reports that detect protected health information (PHI) entities and replaces them with realistic surrogates “hiding in plain sight.” In this retrospective study, 999 chest X-ray and CT reports collected between November 2019 and November 2020 were annotated for PHI at the token level and combined with 3001 X-rays and 2193 medical notes previously labeled, forming a large multi-institutional and cross-domain dataset of 6193 documents. Two radiology test sets, from a known and a new institution, as well as i2b2 2006 and 2014 test sets, served as an evaluation set to estimate model performance and to compare it with previously released deidentification tools. Several PHI detection models were developed based on different training datasets, fine-tuning approaches and data augmentation techniques, and a synthetic PHI generation algorithm. These models were compared using metrics such as precision, recall and F1 score, as well as paired samples Wilcoxon tests. Our best PHI detection model achieves 97.9 F1 score on radiology reports from a known institution, 99.6 from a new institution, 99.5 on i2b2 2006, and 98.9 on i2b2 2014. On reports from a known institution, it achieves 99.1 recall of detecting the core of each PHI span. Our model outperforms all deidentifiers it was compared to on all test sets as well as human labelers on i2b2 2014 data. It enables accurate and automatic deidentification of radiology reports. A transformer-based deidentification pipeline can achieve state-of-the-art performance for deidentifying radiology reports and other medical documents.}",
    issn = {1527-974X},
    doi = {10.1093/jamia/ocac219},
    url = {https://doi.org/10.1093/jamia/ocac219},
    note = {ocac219},
    eprint = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocac219/47220191/ocac219.pdf},
}