telugu-sentence-bert-nli: an open-source Telugu BERT model, free to use for sentence similarity

Telugu Sentence Bert Nli

Developed by l3cube-pune

A Telugu BERT model trained on an NLI dataset, intended for computing sentence similarity.

Text embedding

Transformers

Other: #Telugu sentence similarity #Multilingual NLI training #Indic language support

Downloads: 26

Released: 3/4/2023

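Model overview

This model is a sentence-transformers model trained on top of Telugu BERT (l3cube-pune/telugu-bert). It is mainly used for feature extraction and sentence similarity, and is released as part of the MahaNLP project.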

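Model features

Telugu-specific

A sentence embedding model optimized specifically for Telugu

Trained on NLI

Trained on a natural language inference (NLI) dataset to improve the quality of sentence representations

Sentence similarity

Computes the semantic similarity between Telugu sentences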

Model capabilities

Sentence feature extraction

Sentence similarity

Semantic search

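Use cases

Text processing

Semantic search

Find semantically similar sentences in Telugu documents

Text clustering

Group semantically similar Telugu sentences

Question answering

Match questions to answers by semantic similarity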
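
The clustering use case above can be sketched roughly as follows. This is a minimal illustration, not something shipped with the model: `cosine_similarity` and `greedy_cluster` are hypothetical helper names, the 0.8 threshold is an arbitrary assumption, and the 2-D vectors are toy stand-ins for the embeddings the model would produce. The grouping strategy is a simple greedy pass chosen for brevity, not a method described by the authors.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_cluster(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.8
) -> List[List[int]]:
    """Put each vector into the first cluster whose seed (first member) is
    at least `threshold` similar; otherwise start a new cluster."""
    clusters: List[List[int]] = []
    for idx, vec in enumerate(embeddings):
        for cluster in clusters:
            seed = embeddings[cluster[0]]
            if cosine_similarity(vec, seed) >= threshold:
                cluster.append(idx)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([idx])
    return clusters


# Toy 2-D vectors standing in for real sentence embeddings (assumption).
toy_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
clusters = greedy_cluster(toy_embeddings, threshold=0.8)
print(clusters)
```

In practice the toy vectors would be replaced by the output of the model's encoder, and the threshold would need tuning on real Telugu data. The result of a greedy pass depends on input order, so a proper clustering algorithm may be preferable for larger collections.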

🚀 TeluguSBERT

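This is a Telugu BERT model (l3cube-pune/telugu-bert) trained on an NLI dataset.
It is released as part of the MahaNLP project: https://github.com/l3cube-pune/MarathiNLP
A multilingual version of this model, supporting the major Indic languages with cross-lingual capabilities, is available here: indic-sentence-bert-nli

A better sentence similarity model (a fine-tuned version of this model) is available here: https://huggingface.co/l3cube-pune/telugu-sentence-similarity-sbert

More details on the dataset, models, and baseline results can be found in our paper.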

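🚀 Quick Start

Basic model information

Property	Details
Model type	Sentence similarity model
Training data	NLI dataset

Examples

The model card metadata below includes sample inputs for the sentence-similarity widget: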

pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
license: cc-by-4.0
language: te
widget:
- source_sentence: "ఒక మహిళ ఉల్లిపాయను కత్తిస్తోంది"
  sentences:
    - "ఒక స్త్రీ ఉల్లిపాయలు కోస్తోంది" 
    - "ఒక స్త్రీ బంగాళాదుంపను తొక్కడం"
    - "ఒక పిల్లి ఇంటి చుట్టూ నడుస్తోంది"
  example_title: "Example 1"

- source_sentence: "పిల్లల బృందం జంపింగ్ పోటీని నిర్వహిస్తోంది"
  sentences:
    - "పిల్లల గుంపు సరదాగా గడుపుతోంది"
    - "పిల్లలు పార్కులో ఆడుకోవడానికి ఇష్టపడతారు"
    - "ముగ్గురు అబ్బాయిలు నడుస్తున్నారు"
  example_title: "Example 2"

- source_sentence: "మీ రెండు ప్రశ్నలకు అవుననే సమాధానం వస్తుంది"
  sentences:
    - "రెండు ప్రశ్నలకు అవుననే సమాధానం వస్తోంది"
    - "మేము మీ అన్ని ప్రశ్నలకు సమాధానమిచ్చాము"
    - "నేను ఈ ప్రశ్నకు సమాధానం ఇస్తాను"
  example_title: "Example 3"

📦 Installation

Using this model is straightforward once sentence-transformers is installed:

pip install -U sentence-transformers

💻 Usage Examples

Basic usage (Sentence-Transformers)

With sentence-transformers installed, you can use the model like this:

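from sentence_transformers import SentenceTransformer

# Sentences to embed (replace with Telugu text in practice)
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load the model from the Hugging Face Hub by its model ID
model = SentenceTransformer('l3cube-pune/telugu-sentence-bert-nli')
# encode() returns one embedding vector per input sentence
embeddings = model.encode(sentences)
print(embeddings)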
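
To turn embeddings into similarity scores, one option is to rank candidate sentences by cosine similarity against a source sentence, mirroring the source/candidate layout of the widget examples. The sketch below is an assumption-laden illustration: `cosine_similarity`, `rank_by_similarity`, and `rank_with_model` are hypothetical helper names, and the model ID is taken from the Hub URL listed on this page. The toy vectors only demonstrate the ranking logic; `rank_with_model` is defined but not called here because it downloads the model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    source_vec: Sequence[float], candidate_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scored = [
        (i, cosine_similarity(source_vec, vec))
        for i, vec in enumerate(candidate_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def rank_with_model(
    source: str,
    candidates: List[str],
    model_name: str = "l3cube-pune/telugu-sentence-bert-nli",  # assumed Hub ID
) -> List[Tuple[int, float]]:
    """Encode the sentences with sentence-transformers, then rank the candidates.
    The import is done lazily so the pure-Python helpers work without the library."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    vectors = model.encode([source] + candidates)
    return rank_by_similarity(vectors[0], vectors[1:])


# Toy vectors (assumption) that exercise the ranking logic without the model.
toy_source = [1.0, 0.0]
toy_candidates = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
ranking = rank_by_similarity(toy_source, toy_candidates)
print(ranking)
```

With the library installed, calling `rank_with_model(source, candidates)` with one of the widget's `source_sentence` values and its `sentences` list returns the candidates ordered by score.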

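Advanced usage (HuggingFace Transformers)

Without sentence-transformers, you can use the model as follows: first pass your input through the transformer model, then apply the correct pooling operation on top of the contextualized word embeddings.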

from transformers import AutoTokenizer, AutoModel
import torch


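def cls_pooling(model_output, attention_mask):
    # model_output[0] holds the token embeddings; keep the first ([CLS]) token of each sentence.
    # attention_mask is not used by CLS pooling.
    return model_output[0][:,0]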


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('l3cube-pune/telugu-sentence-bert-nli')
model = AutoModel.from_pretrained('l3cube-pune/telugu-sentence-bert-nli')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, cls pooling.
sentence_embeddings = cls_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
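
To make the pooling step more concrete, the toy sketch below reproduces what `model_output[0][:, 0]` selects, using plain Python lists instead of tensors. The names `cls_pool` and `mean_pool` and all the numbers are illustrative assumptions. Mean pooling is shown only for contrast, to indicate where an attention mask would matter; the code above uses CLS pooling, and nothing here suggests mean pooling is appropriate for this model.

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[List[Vector]]) -> List[Vector]:
    """Keep only the first token's vector (the [CLS] position) of each sentence."""
    return [sentence[0] for sentence in token_embeddings]


def mean_pool(
    token_embeddings: List[List[Vector]], attention_mask: List[List[int]]
) -> List[Vector]:
    """Average the token vectors of each sentence, skipping padding (mask == 0).
    Assumes every sentence has at least one unmasked token."""
    pooled = []
    for sentence, mask in zip(token_embeddings, attention_mask):
        kept = [vec for vec, m in zip(sentence, mask) if m == 1]
        dim = len(sentence[0])
        pooled.append([sum(vec[d] for vec in kept) / len(kept) for d in range(dim)])
    return pooled


# Two toy "sentences", three token slots each, 2-D vectors.
# The last slot of the second sentence is padding (mask 0).
toy_tokens = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[0.5, 0.5], [1.5, 2.5], [9.0, 9.0]],
]
toy_mask = [[1, 1, 1], [1, 1, 0]]

cls_vectors = cls_pool(toy_tokens)
mean_vectors = mean_pool(toy_tokens, toy_mask)
print(cls_vectors)
print(mean_vectors)
```

CLS pooling reads a single fixed position, so padding never affects it, which is why the `attention_mask` argument goes unused in `cls_pooling` above.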

📚 Documentation

Citation

@article{deode2023l3cube,
  title={L3Cube-IndicSBERT: A simple approach for learning cross-lingual sentence representations using multilingual BERT},
  author={Deode, Samruddhi and Gadre, Janhavi and Kajale, Aditi and Joshi, Ananya and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2304.11434},
  year={2023}
}

@article{joshi2022l3cubemahasbert,
  title={L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi},
  author={Joshi, Ananya and Kajale, Aditi and Gadre, Janhavi and Deode, Samruddhi and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2211.11187},
  year={2022}
}

Other monolingual Indic sentence BERT models

[Marathi SBERT](https://huggingface.co/l3cube-pune/marathi-sentence-bert-nli)
[Hindi SBERT](https://huggingface.co/l3cube-pune/hindi-sentence-bert-nli)
[Kannada SBERT](https://huggingface.co/l3cube-pune/kannada-sentence-bert-nli)
[Telugu SBERT](https://huggingface.co/l3cube-pune/telugu-sentence-bert-nli)
[Malayalam SBERT](https://huggingface.co/l3cube-pune/malayalam-sentence-bert-nli)
[Tamil SBERT](https://huggingface.co/l3cube-pune/tamil-sentence-bert-nli)
[Gujarati SBERT](https://huggingface.co/l3cube-pune/gujarati-sentence-bert-nli)
[Odia SBERT](https://huggingface.co/l3cube-pune/odia-sentence-bert-nli)
[Bengali SBERT](https://huggingface.co/l3cube-pune/bengali-sentence-bert-nli)
[Punjabi SBERT](https://huggingface.co/l3cube-pune/punjabi-sentence-bert-nli)
[Indic SBERT (multilingual)](https://huggingface.co/l3cube-pune/indic-sentence-bert-nli)

Other monolingual similarity models

[Marathi similarity model](https://huggingface.co/l3cube-pune/marathi-sentence-similarity-sbert)
[Hindi similarity model](https://huggingface.co/l3cube-pune/hindi-sentence-similarity-sbert)
[Kannada similarity model](https://huggingface.co/l3cube-pune/kannada-sentence-similarity-sbert)
[Telugu similarity model](https://huggingface.co/l3cube-pune/telugu-sentence-similarity-sbert)
[Malayalam similarity model](https://huggingface.co/l3cube-pune/malayalam-sentence-similarity-sbert)
[Tamil similarity model](https://huggingface.co/l3cube-pune/tamil-sentence-similarity-sbert)
[Gujarati similarity model](https://huggingface.co/l3cube-pune/gujarati-sentence-similarity-sbert)
[Odia similarity model](https://huggingface.co/l3cube-pune/odia-sentence-similarity-sbert)
[Bengali similarity model](https://huggingface.co/l3cube-pune/bengali-sentence-similarity-sbert)
[Punjabi similarity model](https://huggingface.co/l3cube-pune/punjabi-sentence-similarity-sbert)
[Indic similarity model (multilingual)](https://huggingface.co/l3cube-pune/indic-sentence-similarity-sbert)