from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Milos/slovak-gpt-j-405M")
model = AutoModelForCausalLM.from_pretrained("Milos/slovak-gpt-j-405M")
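When generating text, keep three things in mind:
Do not leave trailing whitespace (for example, "slovenčinu" and "slovenčinu " are tokenized differently).
Use US English double quotation marks, i.e. "".
For a line break, use \n\n rather than a single \n.
Example: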
>>> prompt = "Tradičné jedlo na Orave sú"
>>> encoded_input = tokenizer(prompt, return_tensors='pt')
>>> output = model.generate(**encoded_input)
>>> tokenizer.decode(output[0])
'Tradičné jedlo na Orave sú bryndzové halušky\n\nNa Orave sa v minulosti varilo viac druhov'
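The three prompt rules above can be applied automatically before tokenization. The following is a minimal sketch, not part of the model's own tooling: `normalize_prompt` is a hypothetical helper name, and the set of typographic quotes it replaces („ “ ”) is an assumption about which characters are likely to appear in Slovak text.

```python
import re

# Assumption: these typographic double quotes are the ones worth replacing
# with the plain US English double quote.
_TYPOGRAPHIC_QUOTES = str.maketrans({"\u201e": '"', "\u201c": '"', "\u201d": '"'})


def normalize_prompt(text: str) -> str:
    """Hypothetical helper that applies the three prompt rules."""
    # Rule 2: use plain US English double quotes.
    text = text.translate(_TYPOGRAPHIC_QUOTES)
    # Rule 3: turn each isolated \n into \n\n; existing runs of
    # two or more newlines are left untouched.
    text = re.sub(r"(?<!\n)\n(?!\n)", "\n\n", text)
    # Rule 1: drop trailing spaces and tabs (trailing newlines are kept).
    return text.rstrip(" \t")
```

The result can be passed to the tokenizer in place of the raw string, for example `tokenizer(normalize_prompt(prompt), return_tensors='pt')`.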