pyannote-segmentation: free, open-source speaker segmentation model for voice activity detection, overlapped speech detection, and more


Pyannote Segmentation

By philschmid

This is an end-to-end speaker segmentation model that supports voice activity detection, overlapped speech detection, and resegmentation.

Speaker processing

PyTorch

License: MIT #overlapped-speech-detection #speaker-segmentation #end-to-end-resegmentation

Downloads: 427

Release date: 11/8/2022

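Model overview

The model targets speaker segmentation in audio processing: it detects speech activity, identifies regions of overlapped speech, and can refine (resegment) the output of a baseline diarization system.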

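Model features

End-to-end speaker segmentation

Uses an end-to-end architecture that performs speaker segmentation directly, simplifying the processing pipeline

Overlapped speech detection

Identifies regions of the audio where several speakers talk at the same time

Resegmentation

Refines baseline diarization output to improve segmentation accuracy

Multi-dataset evaluation

Evaluated on several standard benchmarks: AMI, DIHARD3, and VoxConverse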

Model capabilities

Voice activity detection

Overlapped speech detection

Speaker segmentation refinement

Audio feature extraction

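Use cases

Meeting transcription

Meeting speech segmentation

Automatically splits meeting recordings into per-speaker segments

Evaluated on the AMI dataset

Speech analytics

Overlapped speech detection

Detects moments in a conversation when several people speak at once

Evaluated on the DIHARD3 dataset

Speech processing refinement

Segmentation refinement

Refines and improves existing speech segmentation results

Evaluated on the VoxConverse dataset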

🚀 🎹 Speaker segmentation

This model addresses speaker segmentation in audio: it performs voice activity detection, overlapped speech detection, and resegmentation.

🚀 Quick start

This model is described in the paper End-to-end speaker segmentation for overlap-aware resegmentation by Hervé Bredin and Antoine Laurent.

An online demo is available on Hugging Face Spaces.

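✨ Key features

Multi-task: supports voice activity detection, overlapped speech detection, and resegmentation.
Research-backed: the model and its evaluation are described in a peer-reviewed Interspeech 2021 paper.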

📦 Installation

This model relies on pyannote.audio 2.0, which is currently under development; see the pyannote.audio installation instructions.

💻 Usage examples

Basic usage

Voice activity detection

from pyannote.audio.pipelines import VoiceActivityDetection
pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
HYPER_PARAMETERS = {
  # onset/offset activation thresholds
  "onset": 0.5, "offset": 0.5,
  # remove speech regions shorter than that many seconds.
  "min_duration_on": 0.0,
  # fill non-speech regions shorter than that many seconds.
  "min_duration_off": 0.0
}
pipeline.instantiate(HYPER_PARAMETERS)
vad = pipeline("audio.wav")
# `vad` is a pyannote.core.Annotation instance containing speech regions
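The `onset`/`offset` pair describes hysteresis thresholding: a region starts when the score rises above `onset` and ends only when it falls below `offset`, after which short gaps are filled (`min_duration_off`) and short regions dropped (`min_duration_on`). The sketch below illustrates that idea in plain Python on a list of per-frame scores. It is an assumption-based illustration, not pyannote.audio's own implementation; the helper `binarize` and the fixed frame step are hypothetical.

```python
from typing import List, Tuple


def binarize(scores: List[float], step: float, onset: float = 0.5,
             offset: float = 0.5, min_duration_on: float = 0.0,
             min_duration_off: float = 0.0) -> List[Tuple[float, float]]:
    """Turn per-frame scores into (start, end) regions in seconds.

    Hypothetical helper: frame i is assumed to span [i * step, (i + 1) * step).
    """
    regions = []
    active = False
    start = 0.0
    for i, score in enumerate(scores):
        t = i * step
        if not active and score > onset:
            # switch on only once the score rises above `onset`
            active, start = True, t
        elif active and score < offset:
            # switch off only once the score falls below `offset`
            regions.append((start, t))
            active = False
    if active:
        # close a region still open at the end of the signal
        regions.append((start, len(scores) * step))

    # fill non-speech gaps shorter than min_duration_off by merging neighbours
    merged: List[Tuple[float, float]] = []
    for region in regions:
        if merged and region[0] - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], region[1])
        else:
            merged.append(region)

    # remove regions shorter than min_duration_on
    return [(s, e) for s, e in merged if e - s >= min_duration_on]


scores = [0.1, 0.8, 0.6, 0.4, 0.9, 0.2, 0.75, 0.1]  # 0.5 s frames
print(binarize(scores, 0.5))  # [(0.5, 1.5), (2.0, 2.5), (3.0, 3.5)]
print(binarize(scores, 0.5, onset=0.7, offset=0.3))  # [(0.5, 2.5), (3.0, 3.5)]
```

With `onset == offset` this reduces to a plain threshold; a lower `offset` lets a region survive brief dips (the 0.4 frame above), which is why the tuned values in the advanced-usage tables differ per dataset.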

Overlapped speech detection

from pyannote.audio.pipelines import OverlappedSpeechDetection
pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
# reuses HYPER_PARAMETERS from the voice activity detection example above
pipeline.instantiate(HYPER_PARAMETERS)
osd = pipeline("audio.wav")
# `osd` is a pyannote.core.Annotation instance containing overlapped speech regions
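A frame is overlapped when at least two speakers are active at once, so one way to derive an overlap score from per-speaker activations is to take the second-highest activation in each frame (and the highest one for speech). The plain-Python sketch below illustrates that reduction; it is an assumption about the underlying idea rather than a copy of pyannote.audio's code, and the helper names are hypothetical.

```python
from typing import List


def overlap_scores(frames: List[List[float]]) -> List[float]:
    """Per-frame overlap score: activation of the second most active speaker.

    `frames[i][k]` is assumed to be the activation of local speaker k at
    frame i, with at least two speakers per frame.
    """
    return [sorted(speakers, reverse=True)[1] for speakers in frames]


def speech_scores(frames: List[List[float]]) -> List[float]:
    """Per-frame speech score: activation of the most active speaker."""
    return [max(speakers) for speakers in frames]


frames = [
    [0.9, 0.1, 0.0],   # one clear speaker
    [0.8, 0.7, 0.1],   # two speakers at once
    [0.2, 0.1, 0.05],  # near silence
]
print(overlap_scores(frames))  # [0.1, 0.7, 0.1]
print(speech_scores(frames))   # [0.9, 0.8, 0.2]
```

Either score sequence can then be thresholded with the same `onset`/`offset`/`min_duration_*` hyperparameters used by the pipelines.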

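Resegmentation

from pyannote.audio.pipelines import Resegmentation
# `diarization` names the key under which the baseline is passed in the input dict
pipeline = Resegmentation(segmentation="pyannote/segmentation",
                          diarization="baseline")
# reuses HYPER_PARAMETERS from the voice activity detection example above
pipeline.instantiate(HYPER_PARAMETERS)
# `baseline` must be a pyannote.core.Annotation instance holding the diarization to refine
resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})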
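Baseline diarizations are commonly stored as RTTM files, the NIST text format in which each `SPEAKER` line gives a file id, channel, onset, duration, and speaker label. Assuming the baseline comes in that form, the sketch below reads one file's turns with the standard library only; `parse_rttm` and the sample lines are hypothetical. The final conversion to `pyannote.core.Annotation` is left as comments so the block runs without pyannote installed.

```python
from typing import List, Tuple


def parse_rttm(text: str, uri: str) -> List[Tuple[float, float, str]]:
    """Return sorted (start, end, speaker) tuples for file `uri` from RTTM text."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        # skip blank lines, non-SPEAKER records, and other files
        if not fields or fields[0] != "SPEAKER" or fields[1] != uri:
            continue
        start, duration = float(fields[3]), float(fields[4])
        turns.append((start, start + duration, fields[7]))
    return sorted(turns)


rttm_text = """SPEAKER audio 1 0.00 2.50 <NA> <NA> spk0 <NA> <NA>
SPEAKER audio 1 2.00 1.50 <NA> <NA> spk1 <NA> <NA>
SPEAKER other 1 0.00 1.00 <NA> <NA> spk0 <NA> <NA>"""
print(parse_rttm(rttm_text, "audio"))  # [(0.0, 2.5, 'spk0'), (2.0, 3.5, 'spk1')]

# Building the Annotation expected by the pipeline (requires pyannote.core):
# from pyannote.core import Annotation, Segment
# baseline = Annotation(uri="audio")
# for start, end, speaker in parse_rttm(rttm_text, "audio"):
#     baseline[Segment(start, end)] = speaker
```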

Raw scores

from pyannote.audio import Inference
inference = Inference("pyannote/segmentation")
segmentation = inference("audio.wav")
# `segmentation` is a pyannote.core.SlidingWindowFeature
# instance containing raw segmentation scores
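To line raw scores up with time, each row of a `SlidingWindowFeature` is tied to a frame of its sliding window. The sketch below assumes the usual `pyannote.core.SlidingWindow` convention, in which frame i starts at `start + i * step` and lasts `duration` seconds; with real output you would read these three values from `segmentation.sliding_window`. The helpers are hypothetical and use plain Python only.

```python
from typing import List, Tuple


def frame_times(num_frames: int, start: float, duration: float,
                step: float) -> List[Tuple[float, float, float]]:
    """(frame_start, frame_center, frame_end) in seconds for each frame."""
    times = []
    for i in range(num_frames):
        frame_start = start + i * step
        times.append((frame_start, frame_start + duration / 2,
                      frame_start + duration))
    return times


def closest_frame(t: float, start: float, duration: float, step: float) -> int:
    """Index of the frame whose center is closest to time t (in seconds)."""
    return round((t - start - duration / 2) / step)


print(frame_times(3, start=0.0, duration=0.5, step=0.25))
# [(0.0, 0.25, 0.5), (0.25, 0.5, 0.75), (0.5, 0.75, 1.0)]
print(closest_frame(0.5, start=0.0, duration=0.5, step=0.25))  # 1
```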

Advanced usage

To reproduce the results of the paper "End-to-end speaker segmentation for overlap-aware resegmentation", use pyannote/segmentation@Interspeech2021 with the following hyperparameters:

Voice activity detection

Dataset	`onset`	`offset`	`min_duration_on`	`min_duration_off`
AMI Mix-Headset	0.684	0.577	0.181	0.037
DIHARD3	0.767	0.377	0.136	0.067
VoxConverse	0.767	0.713	0.182	0.501

Overlapped speech detection

Dataset	`onset`	`offset`	`min_duration_on`	`min_duration_off`
AMI Mix-Headset	0.448	0.362	0.116	0.187
DIHARD3	0.430	0.320	0.091	0.144
VoxConverse	0.587	0.426	0.337	0.112

Resegmentation of VBx

Dataset	`onset`	`offset`	`min_duration_on`	`min_duration_off`
AMI Mix-Headset	0.542	0.527	0.044	0.705
DIHARD3	0.592	0.489	0.163	0.182
VoxConverse	0.537	0.724	0.410	0.563
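Keeping the three tables above as a single lookup avoids copy-paste mistakes when switching task or dataset. The sketch below is a plain-Python transcription of those values; `PAPER_HYPER_PARAMETERS`, the task keys, and `hyper_parameters` are hypothetical names. The pipeline calls are shown only as comments, assuming they accept the checkpoint string from the paragraph above; whether a given pyannote.audio version expects extra keys for a pipeline is not covered here.

```python
from typing import Dict

# Values transcribed from the three tables above (Interspeech 2021 setup).
PAPER_HYPER_PARAMETERS: Dict[str, Dict[str, Dict[str, float]]] = {
    "voice_activity_detection": {
        "AMI Mix-Headset": {"onset": 0.684, "offset": 0.577, "min_duration_on": 0.181, "min_duration_off": 0.037},
        "DIHARD3": {"onset": 0.767, "offset": 0.377, "min_duration_on": 0.136, "min_duration_off": 0.067},
        "VoxConverse": {"onset": 0.767, "offset": 0.713, "min_duration_on": 0.182, "min_duration_off": 0.501},
    },
    "overlapped_speech_detection": {
        "AMI Mix-Headset": {"onset": 0.448, "offset": 0.362, "min_duration_on": 0.116, "min_duration_off": 0.187},
        "DIHARD3": {"onset": 0.430, "offset": 0.320, "min_duration_on": 0.091, "min_duration_off": 0.144},
        "VoxConverse": {"onset": 0.587, "offset": 0.426, "min_duration_on": 0.337, "min_duration_off": 0.112},
    },
    "resegmentation": {
        "AMI Mix-Headset": {"onset": 0.542, "offset": 0.527, "min_duration_on": 0.044, "min_duration_off": 0.705},
        "DIHARD3": {"onset": 0.592, "offset": 0.489, "min_duration_on": 0.163, "min_duration_off": 0.182},
        "VoxConverse": {"onset": 0.537, "offset": 0.724, "min_duration_on": 0.410, "min_duration_off": 0.563},
    },
}


def hyper_parameters(task: str, dataset: str) -> Dict[str, float]:
    """Return a copy, so callers can tweak values without altering the table."""
    return dict(PAPER_HYPER_PARAMETERS[task][dataset])


# Hypothetical usage with pyannote.audio:
# pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation@Interspeech2021")
# pipeline.instantiate(hyper_parameters("voice_activity_detection", "DIHARD3"))
```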

Expected outputs (along with the VBx baseline) are also provided in the /reproducible_research subdirectory.

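📚 Documentation

Support

Commercial and scientific consulting: please contact me by email.
Technical questions and bug reports: please use the pyannote.audio GitHub repository; ask technical questions in Discussions and report bugs in Issues.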

📄 License

This project is released under the MIT license.

🔖 Citation

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}