🚀 Wav2Vec2-Base-TIMIT
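This is a speech recognition model obtained by fine-tuning facebook/wav2vec2-base on the timit_asr dataset. When using this model, make sure the speech input is sampled at 16 kHz.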
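To make the 16 kHz requirement concrete, here is a minimal sketch of a pre-flight check that reads the sampling rate from a WAV header before any audio reaches the model. It uses only Python's standard `wave` module, so it assumes plain RIFF/WAV input. The helper names `wav_sample_rate`, `assert_16khz`, and `make_silent_wav` are hypothetical and not part of this project.

```python
import io
import wave

EXPECTED_RATE = 16_000  # Hz, the input rate this model card asks for


def wav_sample_rate(source):
    """Return the sampling rate (Hz) of a RIFF/WAV path or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def assert_16khz(source):
    """Raise ValueError unless the WAV audio is sampled at 16 kHz."""
    rate = wav_sample_rate(source)
    if rate != EXPECTED_RATE:
        raise ValueError(
            f"expected {EXPECTED_RATE} Hz audio, got {rate} Hz; resample first"
        )
    return rate


def make_silent_wav(rate, n_frames=160):
    """Build an in-memory mono 16-bit WAV of silence (illustrative fixture)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    buffer.seek(0)
    return buffer


print(assert_16khz(make_silent_wav(16_000)))  # prints 16000
```

Audio recorded at a different rate would need to be resampled with an audio library before being passed to the processor; this check only detects the mismatch.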
🚀 Quick Start
The model can be used directly (without a language model), as follows:
import soundfile as sf
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load the fine-tuned processor and model, then switch to inference mode.
model_name = "elgeish/wav2vec2-base-timit-asr"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name)
model.eval()

# Pick 10 random utterances from the TIMIT test split.
dataset = load_dataset("timit_asr", split="test").shuffle().select(range(10))
char_translations = str.maketrans({"-": " ", ",": "", ".": "", "?": ""})


def prepare_example(example):
    # Read the waveform (the sampling rate returned by sf.read is discarded).
    example["speech"], _ = sf.read(example["file"])
    # Normalize the reference text: strip punctuation, collapse spaces, lowercase.
    example["text"] = example["text"].translate(char_translations)
    example["text"] = " ".join(example["text"].split())
    example["text"] = example["text"].lower()
    return example


dataset = dataset.map(prepare_example, remove_columns=["file"])
inputs = processor(dataset["speech"], sampling_rate=16000, return_tensors="pt", padding="longest")

# Greedy decoding: take the most likely token at every frame.
with torch.no_grad():
    predicted_ids = torch.argmax(model(inputs.input_values).logits, dim=-1)

# Map any -100 (ignore-index) values to the pad token before decoding.
predicted_ids[predicted_ids == -100] = processor.tokenizer.pad_token_id
predicted_transcripts = processor.tokenizer.batch_decode(predicted_ids)

for reference, predicted in zip(dataset["text"], predicted_transcripts):
    print("reference:", reference)
    print("predicted:", predicted)
    print("--")
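As a rough illustration of what decoding "without a language model" means, the sketch below shows greedy CTC decoding on a single utterance: take one label per frame, collapse consecutive repeats, drop the blank label, and join what remains. The toy vocabulary and the choice of `<pad>` as the blank and `|` as the word delimiter are assumptions for illustration; the actual vocabulary ships with the model's tokenizer, and `greedy_ctc_decode` is a hypothetical helper, not the library's implementation.

```python
from itertools import groupby

# Hypothetical toy vocabulary; the real one comes with the model's tokenizer.
VOCAB = {0: "<pad>", 1: "|", 2: "a", 3: "c", 4: "t"}
BLANK_ID = 0          # assumption: the pad token doubles as the CTC blank
WORD_DELIMITER = "|"  # assumption: this token marks word boundaries


def greedy_ctc_decode(frame_ids):
    """Collapse repeated frame labels, drop blanks, and join into text."""
    # 1. Merge runs of identical consecutive labels into a single label.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Remove blanks, which only separate genuine repeats.
    tokens = [VOCAB[token_id] for token_id in collapsed if token_id != BLANK_ID]
    # 3. Turn word delimiters into spaces.
    return "".join(tokens).replace(WORD_DELIMITER, " ").strip()


# Per-frame argmax labels for a short, made-up utterance.
frame_ids = [3, 3, 0, 2, 2, 0, 0, 4, 1, 1, 2, 0, 4]
print(greedy_ctc_decode(frame_ids))  # prints "cat at"
```

Because each frame is decided independently, spelling slips in the transcripts are expected; a language model could rescore such hypotheses, but that is outside what the quick-start script does.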
Running the quick-start script prints output similar to the following:
reference: she had your dark suit in greasy wash water all year
predicted: she had your dark suit in greasy wash water all year
--
reference: where were you while we were away
predicted: where were you while we were away
--
reference: cory and trish played tag with beach balls for hours
predicted: tcory and trish played tag with beach balls for hours
--
reference: tradition requires parental approval for under age marriage
predicted: tradition requires parrental proval for under age marrage
--
reference: objects made of pewter are beautiful
predicted: objects made of puder are bautiful
--
reference: don't ask me to carry an oily rag like that
predicted: don't o ask me to carry an oily rag like that
--
reference: cory and trish played tag with beach balls for hours
predicted: cory and trish played tag with beach balls for ours
--
reference: don't ask me to carry an oily rag like that
predicted: don't ask me to carry an oily rag like that
--
reference: don't do charlie's dirty dishes
predicted: don't do chawly's tirty dishes
--
reference: only those story tellers will remain who can imitate the style of the virtuous
predicted: only those story tillaers will remain who can imvitate the style the virtuous
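One common way to quantify mismatches like those in the sample output is the word error rate (WER): the word-level edit distance between reference and prediction, divided by the number of reference words. The function below is a minimal pure-Python sketch under that definition; `word_error_rate` is a hypothetical helper, not part of this project, and no WER figure for this model is implied.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # distance between ref[:i] and an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "cory and trish played tag with beach balls for hours"
predicted = "cory and trish played tag with beach balls for ours"
print(word_error_rate(reference, predicted))  # prints 0.1 (1 error in 10 words)
```

Applied to the pairs printed by the script, this gives a per-utterance score; a corpus-level WER would instead sum the edit distances and divide by the total number of reference words.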
📚 Documentation
You can find the fine-tuning script used to train this model here.
📄 License
This project is licensed under Apache-2.0.
| Property | Details |
| --- | --- |
| Dataset | timit_asr |
| Tags | audio, automatic speech recognition, speech |
| License | Apache-2.0 |