Datasets:
- davidrrobinson/AnimalSpeak
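Model card for BioLingual
Model card for BioLingual: Transferable Models for Bioacoustics with Human Language Supervision
An audio-text model for bioacoustics, based on contrastive language-audio pretraining.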
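Usage
You can use this model for zero-shot audio classification of bioacoustic recordings, or fine-tune it for bioacoustic tasks.
Uses
Perform zero-shot audio classification
Using pipeline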
from datasets import load_dataset
from transformers import pipeline
dataset = load_dataset("ashraq/esc50")
audio = dataset["train"]["audio"][-1]["array"]
audio_classifier = pipeline(task="zero-shot-audio-classification", model="davidrrobinson/BioLingual")
output = audio_classifier(audio, candidate_labels=["Sound of a sperm whale", "Sound of a sea lion"])
print(output)
# Prints a list of {"score": ..., "label": ...} dicts, one per candidate label, sorted by descending score.
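The pipeline call above hides the scoring step. As a rough sketch of how CLAP-style zero-shot classification typically ranks labels, the snippet below takes the cosine similarity between one audio embedding and each label's text embedding, scales it, and applies a softmax over the candidates. The embeddings are made-up 3-dimensional toy vectors, and `zero_shot_scores` and the fixed `logit_scale` value are hypothetical, chosen only for illustration; the real model learns its own scale and uses much larger embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(audio_embed, text_embeds, labels, logit_scale=10.0):
    """Rank candidate labels for one audio clip.

    logit_scale is an assumed fixed temperature, not the model's learned value.
    """
    logits = [logit_scale * cosine_similarity(audio_embed, t) for t in text_embeds]
    probs = softmax(logits)
    # Sort by descending probability, like the pipeline output.
    ranked = sorted(zip(probs, labels), key=lambda pair: -pair[0])
    return [{"score": p, "label": label} for p, label in ranked]


# Toy embeddings, invented for illustration only.
audio = [0.9, 0.1, 0.0]
texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
labels = ["Sound of a sperm whale", "Sound of a sea lion"]

result = zero_shot_scores(audio, texts, labels)
print(result)
```

Because the softmax runs only over the labels you pass in, the scores always sum to 1 across the candidates. A high score therefore means "best match among these labels", not "this species is definitely present".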
Run the model:
You can also get the audio and text embeddings using ClapModel.
Run the model on CPU:
from datasets import load_dataset
from transformers import ClapModel, ClapProcessor
librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_sample = librispeech_dummy[0]
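# Load the BioLingual checkpoint (the same model ID used in the pipeline example) rather than the generic CLAP weights
model = ClapModel.from_pretrained("davidrrobinson/BioLingual")
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")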
inputs = processor(audios=audio_sample["audio"]["array"], return_tensors="pt")
audio_embed = model.get_audio_features(**inputs)
Run the model on GPU:
from datasets import load_dataset
from transformers import ClapModel, ClapProcessor
librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_sample = librispeech_dummy[0]
# Load the BioLingual checkpoint and move it to the first GPU (device index 0)
model = ClapModel.from_pretrained("davidrrobinson/BioLingual").to(0)
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")
inputs = processor(audios=audio_sample["audio"]["array"], return_tensors="pt").to(0)
audio_embed = model.get_audio_features(**inputs)
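The CPU and GPU snippets differ only in where the model and inputs are placed. One possible way to merge them is a small helper that picks the device at runtime. This is a minimal sketch: `pick_device` is a hypothetical helper name, and it assumes that falling back to the CPU is acceptable when PyTorch or a CUDA GPU is unavailable.

```python
def pick_device(prefer_gpu=True):
    """Return "cuda:0" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    if prefer_gpu and torch.cuda.is_available():
        return "cuda:0"
    return "cpu"


device = pick_device()
print(device)

# Intended use with the snippets above (not executed here):
#   model = ClapModel.from_pretrained(...).to(device)
#   inputs = processor(...).to(device)
```

Passing `prefer_gpu=False` forces CPU execution, which can help when comparing outputs between devices.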