---
library_name: transformers
license: mit
datasets:
- bilalfaye/english-wolof-french-translation
- bilalfaye/english-wolof-french-translation-bis
language:
- wo
- fr
- en
metrics:
- bleu
- chrf
base_model:
- facebook/nllb-200-distilled-600M
pipeline_tag: translation
---
## Model Description

This model is a fine-tuned version of nllb-200-distilled-600M, optimized for French ↔ Wolof translation. It was trained on the thoroughly preprocessed bilalfaye/english-wolof-french-translation and bilalfaye/english-wolof-french-translation-bis datasets.

Supported translation directions:
- Wolof → French
- French → Wolof
- English → Wolof
- Wolof → English
- French → English
- English → French

Try the model here: https://huggingface.co/spaces/bilalfaye/WoFrEn-Translator
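NLLB models identify languages by FLORES-200 codes rather than plain names. As a convenience, the three codes used throughout this card can be wrapped in a small lookup helper (this helper is illustrative and not part of the model itself):

```python
# FLORES-200 codes for the three languages this model supports.
NLLB_CODES = {
    "wolof": "wol_Latn",
    "french": "fra_Latn",
    "english": "eng_Latn",
}

def nllb_code(language: str) -> str:
    """Return the FLORES-200 code for a supported language name."""
    try:
        return NLLB_CODES[language.lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {language!r}")

print(nllb_code("French"))  # fra_Latn
```

These are the exact strings passed as `src_lang`/`tgt_lang` in the examples below.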
## Usage

### 1. Manual Inference

Install the required library:

```bash
pip install transformers
```

Example translation code:
```python
from transformers import NllbTokenizer, AutoModelForSeq2SeqLM
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model_load_name = 'bilalfaye/nllb-200-distilled-600M-wo-fr-en'
model = AutoModelForSeq2SeqLM.from_pretrained(model_load_name).to(device)
tokenizer = NllbTokenizer.from_pretrained(model_load_name)


def translate(
    text, src_lang='wol_Latn', tgt_lang='fra_Latn',
    a=32, b=3, max_input_length=1024, num_beams=4, **kwargs
):
    """Translate a text or a list of texts."""
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    inputs = tokenizer(
        text, return_tensors='pt', padding=True, truncation=True,
        max_length=max_input_length
    )
    model.eval()
    result = model.generate(
        **inputs.to(model.device),
        # Force the decoder to start with the target-language token
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        # Output-length budget grows linearly with the input length
        max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
        num_beams=num_beams, **kwargs
    )
    return tokenizer.batch_decode(result, skip_special_tokens=True)


# Wolof → French
print(translate("Ndax mën nga ko waxaat su la neexee?", src_lang="wol_Latn", tgt_lang="fra_Latn")[0])
# Wolof → English
print(translate("Ndax mën nga ko waxaat su la neexee?", src_lang="wol_Latn", tgt_lang="eng_Latn")[0])
# French → Wolof
print(translate("Bonjour, où allez-vous?", src_lang="fra_Latn", tgt_lang="wol_Latn")[0])
# French → English
print(translate("Bonjour, où allez-vous?", src_lang="fra_Latn", tgt_lang="eng_Latn")[0])
# English → Wolof
print(translate("Hello, how are you?", src_lang="eng_Latn", tgt_lang="wol_Latn")[0])
# English → French
print(translate("Hello, how are you?", src_lang="eng_Latn", tgt_lang="fra_Latn")[0])
```
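The `max_new_tokens=int(a + b * input_length)` expression in `translate()` caps generation with a fixed offset `a` plus `b` output tokens allowed per input token. Isolated as a standalone helper (the function name here is ours, for illustration):

```python
def length_budget(input_length: int, a: int = 32, b: int = 3) -> int:
    """Generation cap used in translate(): a fixed offset `a` plus
    `b` new tokens of allowance per input token."""
    return int(a + b * input_length)

# With the defaults (a=32, b=3), a 10-token input permits up to 62 new tokens.
print(length_budget(10))  # 62
```

Raising `b` gives more headroom for target languages that are more verbose than the source; lowering it trades completeness for speed.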
### 2. Inference with Pipeline

Install the required library:

```bash
pip install transformers
```

Pipeline example:

```python
from transformers import pipeline
import torch

model_name = 'bilalfaye/nllb-200-distilled-600M-wo-fr-en'
device = "cuda" if torch.cuda.is_available() else "cpu"
translator = pipeline("translation", model=model_name, device=device)

# Wolof → French
print(translator("Ndax mën nga ko waxaat su la neexee?", src_lang="wol_Latn", tgt_lang="fra_Latn")[0]['translation_text'])
# French → Wolof
print(translator("Bonjour, où allez-vous?", src_lang="fra_Latn", tgt_lang="wol_Latn")[0]['translation_text'])
```
## Dependencies

Versions used during development and testing:

- transformers: 4.41.2
- torch: 2.4.0+cu121
- datasets: 3.2.0
- sentencepiece: 0.2.0
- sacrebleu: 2.5.1
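To reproduce this environment, the versions above can be pinned at install time. A sketch (the PyTorch index URL assumes the CUDA 12.1 build noted above; CPU-only setups can install `torch==2.4.0` from PyPI directly):

```shell
pip install transformers==4.41.2 datasets==3.2.0 sentencepiece==0.2.0 sacrebleu==2.5.1
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```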
## Author

Bilal Faye

Questions and suggestions for improvement are welcome!