license: apache-2.0
language:
- ta
pipeline_tag: text-to-speech
Nidum-Madurai-Tamil-TTS
🔊 Tamil text-to-speech (TTS) model developed by Nidum
🧪 Live demo: try it on Hugging Face Spaces

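🗣️ Overview
This is a Tamil text-to-speech (TTS) model developed by Nidum. It converts Tamil text input into clear, natural-sounding speech and is suited to voice assistants, screen readers, language-learning apps, and content narration.
🚀 Features
- ✅ Converts Tamil text to speech
- ✅ Natural, expressive voices
- ✅ Choice of a male or a female voice
- ✅ Easy-to-use demo available on Hugging Face Spaces
🧪 Live demo
👉 Try the demo on Hugging Face Spaces.
Enter Tamil text, choose a speaker, click Generate, and listen to the result right away.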
Speaker options

| Speaker ID | Voice type |
|------------|------------|
| 0 speaker  | Male       |
| 1 speaker  | Female     |

Start the prompt with the appropriate speaker ID, for example:
0 speaker: வணக்கம்!
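
The prompt format is a plain string prefix, so it can be assembled with a small helper. The sketch below is illustrative only: `build_prompt` and `SPEAKERS` are hypothetical names, not part of the model's API. Only the `<id> speaker: <text>` layout and the two speaker IDs are taken from this card.

```python
# Hypothetical helper (not part of the model's API): builds a prompt string
# in the "<id> speaker: <text>" layout described in this card.
SPEAKERS = {0: "male", 1: "female"}  # speaker IDs listed under "Speaker options"


def build_prompt(speaker_id: int, text: str) -> str:
    """Return the prompt for the given speaker ID and Tamil text."""
    if speaker_id not in SPEAKERS:
        raise ValueError(
            f"unknown speaker ID {speaker_id}; expected one of {sorted(SPEAKERS)}"
        )
    text = text.strip()
    if not text:
        raise ValueError("text must not be empty")
    return f"{speaker_id} speaker: {text}"


print(build_prompt(1, "வணக்கம்!"))  # 1 speaker: வணக்கம்!
```

Validating the speaker ID up front is a design choice of this sketch: it fails early instead of sending the model a prefix this card does not document.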
💻 Usage example (code)
import torch
import soundfile as sf
from transformers import AutoModelForCausalLM, AutoTokenizer
from snac import SNAC
fine_tuned_checkpoint = "<Model_ID>"
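
# Load the fine-tuned language model onto the GPU, plus its tokenizer.
print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(fine_tuned_checkpoint, torch_dtype=torch.bfloat16).cuda()
tokenizer = AutoTokenizer.from_pretrained(fine_tuned_checkpoint)

# Load the SNAC codec that turns audio codes into a waveform; it stays on the CPU.
print("Loading SNAC model...")
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").to("cpu")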

# Each prompt starts with the speaker ID ("0 speaker" or "1 speaker").
prompts = [
    "0 speaker: வணக்கம்! இந்த பயன்பாட்டை பயன்படுத்தி உங்கள் உரையை குரலாக்கலாம்.",
]

# Tokenize each prompt and wrap it in the start and end marker tokens.
all_input_ids = [tokenizer(p, return_tensors="pt").input_ids for p in prompts]
start_token = torch.tensor([[128259]], dtype=torch.int64)
end_tokens = torch.tensor([[128009, 128260]], dtype=torch.int64)
all_modified_input_ids = [torch.cat([start_token, ids, end_tokens], dim=1) for ids in all_input_ids]

# Left-pad every prompt to the longest length (pad ID 128263) and mask out the padding.
max_length = max(ids.shape[1] for ids in all_modified_input_ids)
all_padded_tensors, all_attention_masks = [], []
for modified_input_ids in all_modified_input_ids:
    padding = max_length - modified_input_ids.shape[1]
    padded_tensor = torch.cat([torch.full((1, padding), 128263, dtype=torch.int64), modified_input_ids], dim=1)
    attention_mask = torch.cat([torch.zeros((1, padding), dtype=torch.int64), torch.ones((1, modified_input_ids.shape[1]), dtype=torch.int64)], dim=1)
    all_padded_tensors.append(padded_tensor)
    all_attention_masks.append(attention_mask)
input_ids = torch.cat(all_padded_tensors, dim=0).cuda()
attention_mask = torch.cat(all_attention_masks, dim=0).cuda()

# Sample audio tokens until the end-of-sequence token (128258) or the 4800-token limit.
print("Generating speech...")
with torch.no_grad():
    generated_ids = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=4800,
        do_sample=True,
        temperature=1,
        top_p=1,
        repetition_penalty=1.1,
        num_return_sequences=1,
        eos_token_id=128258,
    )

print("Parsing output...")
# Keep only the tokens that follow the last occurrence of token 128257.
token_indices = (generated_ids == 128257).nonzero(as_tuple=True)
if len(token_indices[1]) > 0:
    last_occurrence_idx = token_indices[1][-1].item()
    cropped_tensor = generated_ids[:, last_occurrence_idx + 1:]
else:
    cropped_tensor = generated_ids

# Drop the end-of-sequence token (128258) from each row.
processed_rows = [row[row != 128258] for row in cropped_tensor]

# Trim each row to a whole number of 7-token frames and remove the 128266 ID offset.
code_lists = []
for row in processed_rows:
    row_length = row.size(0)
    new_length = (row_length // 7) * 7
    trimmed_row = row[:new_length] - 128266
    code_lists.append(trimmed_row.tolist())

def redistribute_codes(code_list):
    """Split 7-token frames into the three SNAC code layers and decode them to audio."""
    layer_1, layer_2, layer_3 = [], [], []
    # Iterate over complete frames only, so the indexing below cannot run past the end.
    for i in range(len(code_list) // 7):
        layer_1.append(code_list[7 * i])
        layer_2.append(code_list[7 * i + 1] - 4096)
        layer_3.append(code_list[7 * i + 2] - (2 * 4096))
        layer_3.append(code_list[7 * i + 3] - (3 * 4096))
        layer_2.append(code_list[7 * i + 4] - (4 * 4096))
        layer_3.append(code_list[7 * i + 5] - (5 * 4096))
        layer_3.append(code_list[7 * i + 6] - (6 * 4096))
    # Add a batch dimension to each layer before decoding.
    codes = [
        torch.tensor(layer_1).unsqueeze(0),
        torch.tensor(layer_2).unsqueeze(0),
        torch.tensor(layer_3).unsqueeze(0),
    ]
    return snac_model.decode(codes)

print("Decoding speech...")
audio_samples = [redistribute_codes(codes) for codes in code_lists]

# Write each decoded waveform to a 24 kHz WAV file.
for i, samples in enumerate(audio_samples):
    audio_data = samples.detach().squeeze().to("cpu").numpy()
    sf.write(f"output_{i}.wav", audio_data, samplerate=24000)
    print(f"Audio {i} saved as output_{i}.wav")
print("Done!")
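
The decoding step in the script treats every 7 consecutive audio tokens as one frame and spreads them over three SNAC code layers (1, 2, and 4 codes per frame), subtracting an offset of 4096 per frame position. The sketch below reproduces only that index arithmetic in plain Python, so it can be checked without a GPU or the model weights. `split_frames` is a hypothetical name, and the input is synthetic rather than real model output.

```python
CODEBOOK_SIZE = 4096  # per-position offset used by the script above
FRAME_SIZE = 7        # tokens per frame


def split_frames(code_list):
    """De-interleave 7-token frames into three layers (hypothetical helper).

    Position p of each frame carries an offset of p * 4096. Positions map to
    layers as: 0 -> layer 1; 1 and 4 -> layer 2; 2, 3, 5 and 6 -> layer 3.
    Trailing tokens that do not fill a whole frame are ignored.
    """
    layer_1, layer_2, layer_3 = [], [], []
    for i in range(len(code_list) // FRAME_SIZE):
        frame = code_list[FRAME_SIZE * i : FRAME_SIZE * (i + 1)]
        # Remove the per-position offset from every token in the frame.
        c = [tok - pos * CODEBOOK_SIZE for pos, tok in enumerate(frame)]
        layer_1.append(c[0])
        layer_2.extend([c[1], c[4]])
        layer_3.extend([c[2], c[3], c[5], c[6]])
    return layer_1, layer_2, layer_3


# Synthetic frame: the code at position p is p + 10 before the offset is added.
frame = [(p + 10) + p * CODEBOOK_SIZE for p in range(FRAME_SIZE)]
l1, l2, l3 = split_frames(frame + frame[:3])  # the 3 leftover tokens are dropped
print(l1, l2, l3)  # [10] [11, 14] [12, 13, 15, 16]
```

In the full script, the three lists are then wrapped as tensors with a batch dimension and passed to `snac_model.decode`.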
📬 Contact
For questions, feedback, or collaboration inquiries:
📧 info@nidum.ai