audio-transcribe: an open-source speech transcription model that quickly converts audio files to text for free

Audio Transcribe

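Developed by washeed

An automatic speech recognition (ASR) model built on Transformers that transcribes audio files to text.

#speech-recognition #speech-to-text #multilingual-transcription #high-precision-timestamps

Downloads: 257

Released: 2/3/2024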

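Model Overview

The model is intended for speech-to-text transcription. It supports batched processing and can return timestamps.

Model Features

Efficient batch processing

Audio files can be processed in batches, which improves throughput.

Timestamp return

The output can include timestamps for the transcribed text.

GPU acceleration

CUDA is supported for faster inference.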
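
As a rough illustration of the batch-processing feature, the sketch below collects the audio files in a folder and passes them to a pipeline in one call. It assumes a `pipe` object built as in the inference section of this guide; `list_audio_files`, `transcribe_folder`, and the extension set are hypothetical names introduced here, not part of the model or of Transformers.

```python
from pathlib import Path

# Assumed set of extensions; adjust to the formats you actually use.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a"}


def list_audio_files(folder):
    """Return the paths of audio files in `folder`, sorted by name."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def transcribe_folder(pipe, folder, batch_size=16):
    """Transcribe every audio file in `folder`; returns {file path: text}."""
    files = list_audio_files(folder)
    if not files:
        return {}
    # A pipeline called with a list of inputs returns one result per input.
    results = pipe(files, batch_size=batch_size)
    return {path: result["text"] for path, result in zip(files, results)}
```

Taking `pipe` as a parameter keeps the helper independent of how the model was loaded, so the same function works on CPU or GPU.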

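Model Capabilities

Audio transcription

Speech recognition

Timestamp generation

Use Cases

Meeting records

Automatic meeting minutes

Converts meeting recordings into written records automatically.

Speeds up minute-taking and reduces manual transcription time.

Media production

Video subtitle generation

Generates subtitles for video content automatically.

Simplifies the subtitling workflow.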

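🚀 Audio Transcription Model Setup Guide

This guide explains how to install the required tools and dependencies and how to run the audio transcription model.

🚀 Quick Start

Install Chocolatey

On Windows, run the following command in a Command Prompt (CMD) opened as administrator to install Chocolatey: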

@"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "[System.Net.ServicePointManager]::SecurityProtocol = 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))" && SET "PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin"

Install FFmpeg

After Chocolatey is installed, run the following command in CMD to install FFmpeg:

choco install ffmpeg
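
Before moving on, it can help to confirm that FFmpeg is reachable from Python, since the pipeline needs it to decode audio files passed as paths. The following is a minimal sketch using only the standard library; `ffmpeg_version` is a hypothetical helper name.

```python
import shutil
import subprocess


def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `ffmpeg -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = ffmpeg_version()
    print(version or "ffmpeg not found on PATH; try reopening the terminal")
```

If the function returns None right after installation, opening a new terminal usually picks up the updated PATH.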

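Install Python dependencies

Run the following commands in a terminal (for example, the one built into your Python IDE) to install the required dependencies: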

pip install --upgrade pip
pip install --upgrade git+https://github.com/huggingface/transformers.git accelerate datasets[audio]
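
To check that the installation succeeded, a short script can report which packages are present. This is a sketch under one assumption: `torch` is included in the list because the inference code imports it, even though the commands above do not install it explicitly. `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# torch is assumed to be installed as a dependency of the packages above.
REQUIRED = ("torch", "transformers", "accelerate", "datasets")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```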

Model inference

Run the following code to perform inference:

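import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the first GPU with half precision when CUDA is available; otherwise CPU with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "washeed/audio-transcribe"

# Load the model weights and move them to the selected device.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the audio feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,       # split long audio into 30-second chunks
    batch_size=16,           # number of chunks processed together
    return_timestamps=True,  # include timestamps in the output
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe a local audio file and print the full text.
result = pipe("audio.mp3")
print(result["text"])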
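
Because the pipeline is created with `return_timestamps=True`, the result also carries a `chunks` list in which each entry has a `text` and a `(start, end)` `timestamp` in seconds. As an illustrative sketch of the subtitle use case, the helpers below turn such chunks into SRT text. `format_srt_time` and `chunks_to_srt` are hypothetical names, and the handling of a missing end time (which can occur on the final chunk) is an assumption about how you might want to treat it.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks):
    """Build SRT subtitle text from chunks with (start, end) timestamps."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: fall back to the start time when no end is reported.
            end = start
        entries.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # Entries are separated by a blank line, as the SRT format expects.
    return "\n".join(entries)
```

With the result from the code above, `chunks_to_srt(result["chunks"])` returns a string that can be written to a `.srt` file.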

Transcribe instead of translate

If you want transcription rather than translation, replace the following line:

result = pipe("audio.mp3")

with:

result = pipe("audio.mp3", generate_kwargs={"task": "transcribe"})
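
If you switch between tasks often, a small helper can build the `generate_kwargs` dictionary and catch typos early. This is only a sketch: `build_generate_kwargs` is a hypothetical name, and both the `"translate"` task value and the optional `language` key are assumptions that hold for Whisper-style checkpoints; this page does not confirm them for this model.

```python
# Assumption: the model accepts Whisper-style task names.
VALID_TASKS = ("transcribe", "translate")


def build_generate_kwargs(task="transcribe", language=None):
    """Build the generate_kwargs dict to pass to the pipeline call."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    kwargs = {"task": task}
    if language is not None:
        # Assumption: the checkpoint accepts a language hint such as "english".
        kwargs["language"] = language
    return kwargs
```

It would then be used as `pipe("audio.mp3", generate_kwargs=build_generate_kwargs())`.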
