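StyleDistance: an open-source style embedding model that places texts with similar writing styles closer together and texts with different styles farther apart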

Styledistance

Developed by StyleDistance

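StyleDistance is a style embedding model designed to embed texts with similar writing styles close together and texts with different styles far apart, regardless of their content.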

Text Embedding

Safetensors

English · License: MIT · #style-embedding #content-independent-style-analysis #authorship-identification

Downloads: 492

Release date: 7/17/2024

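Model Overview

The model can be used for text style analysis, clustering, authorship identification and verification, and automatic style-transfer evaluation.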

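Model Features

Content-independent style embeddings

Embeds texts with similar writing styles close together and texts with different styles far apart, regardless of their content.

Trained on synthetic data

Trained contrastively on the SynthSTEL synthetic dataset, which contains positive and negative examples of 40 style features being used in text.

Strong style analysis

Achieves stronger content independence than existing style embedding models and is applicable to a variety of style-related tasks.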

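Model Capabilities

Text style analysis

Text clustering

Authorship identification

Authorship verification

Automatic style-transfer evaluation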
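The text clustering capability can be illustrated with a minimal sketch. It assumes the style embeddings have already been computed (for example with `model.encode`, as in the usage example) and groups them with a simple greedy rule: a text joins the first cluster whose seed is at least `threshold`-similar by cosine similarity, otherwise it starts a new cluster. The function names and the default `threshold` are hypothetical and would need tuning; this is a toy method, not a clustering procedure prescribed by StyleDistance.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def greedy_style_clusters(
    embeddings: List[Sequence[float]], threshold: float = 0.8
) -> List[List[int]]:
    """Group embedding indices; the first member of each cluster is its seed.

    `threshold` is a hypothetical cut-off that must be tuned per dataset.
    """
    clusters: List[List[int]] = []
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            # Join the first cluster whose seed is similar enough in style
            if cosine(emb, embeddings[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No sufficiently similar seed found: start a new cluster
            clusters.append([i])
    return clusters
```

Greedy seeding is order-dependent; a real analysis would more likely use an established algorithm (for example agglomerative clustering) on the same cosine similarities.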

Use Cases

Text analysis

Authorship identification

Identify the author of a text by analyzing its stylistic features.
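One way to sketch authorship identification with style embeddings, assuming a few reference texts are available for each candidate author, is nearest-centroid attribution: average each author's embeddings and pick the author whose centroid is most similar to the query text. The helper names are hypothetical, and the embeddings are assumed to come from the model's `encode` method.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def identify_author(
    query: Sequence[float], known: Dict[str, List[Sequence[float]]]
) -> str:
    """Return the author whose mean style embedding is most similar to `query`."""
    centroids = {author: centroid(vecs) for author, vecs in known.items()}
    return max(centroids, key=lambda author: cosine(query, centroids[author]))
```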

Style-transfer evaluation

Evaluate automatic style-transfer systems by comparing the style of a text before and after transfer.
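A rough way to turn style embeddings into a style-transfer score, assumed here for illustration rather than taken from the paper, is to compare how close the transferred output is to reference texts in the target style versus how close it still is to the source. With this hypothetical rule, a positive score means the output moved toward the target style.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def style_transfer_score(
    source: Sequence[float],
    output: Sequence[float],
    target_refs: List[Sequence[float]],
) -> float:
    """Similarity of `output` to the target-style centroid minus its similarity to `source`."""
    target = centroid(target_refs)
    return cosine(output, target) - cosine(output, source)
```

Because the embeddings aim to be content-independent, such a score is meant to reflect style rather than whether the meaning was preserved; content preservation would need a separate check.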

Educational research

Writing-style analysis

Analyze how students' writing styles change to provide personalized writing guidance.
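Tracking style change across a student's successive essays can be sketched as the cosine distance (1 minus cosine similarity) between consecutive essay embeddings, where larger values suggest a bigger stylistic shift. The helper below is hypothetical and assumes the embeddings were computed beforehand, in chronological order.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def consecutive_style_drift(essay_embeddings: List[Sequence[float]]) -> List[float]:
    """Cosine distance between each essay and the next one (len - 1 values)."""
    return [
        1.0 - cosine(prev, curr)
        for prev, curr in zip(essay_embeddings, essay_embeddings[1:])
    ]
```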

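🚀 StyleDistance Style Embedding Model

StyleDistance is a style embedding model designed to embed texts with similar writing styles close together and texts with different writing styles far apart, regardless of their content. It can be used for stylistic analysis of text, clustering, authorship identification and verification tasks, and automatic style-transfer evaluation.

🚀 Quick Start

This repository contains the model introduced in StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples (arXiv:2410.12757).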

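📦 Installation

No installation steps are provided. The usage example requires the `sentence-transformers` library, which can be installed with `pip install -U sentence-transformers`.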

💻 Usage Examples

Basic Usage

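```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Load the StyleDistance model from the Hugging Face Hub
model = SentenceTransformer("StyleDistance/styledistance")

# Anchor text written in informal text-speak ("h8", "2" for "to")
anchor = model.encode("Did you hear about the Wales wing? He'll h8 2 withdraw due 2 injuries from future competitions.")
# Candidate 1: different content, same text-speak style; candidate 2: same content, standard spelling
others = model.encode([
    "We're raising funds 2 improve our school's storage facilities and add new playground equipment!",
    "Did you hear about the Wales wing? He'll hate to withdraw due to injuries from future competitions.",
])
# Prints a 1 x 2 tensor of cosine similarities between the anchor and each candidate
print(cos_sim(anchor, others))
```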
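The basic usage prints raw similarity scores. For authorship verification, such scores have to be turned into a same-author or different-author decision. The sketch below assumes a small set of labeled text pairs whose cosine similarities were already computed, picks the threshold that maximizes accuracy on them, and then applies it to a new pair. The function names and the accuracy criterion are assumptions, not part of this model card.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def calibrate_threshold(scores: List[float], labels: List[bool]) -> float:
    """Return the observed score that, used as a threshold, gives the best accuracy."""
    best_t, best_acc = scores[0], -1.0
    for t in sorted(set(scores)):
        # Predict "same author" whenever the similarity reaches the threshold
        acc = sum((s >= t) == y for s, y in zip(scores, labels)) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def same_author(emb_a: Sequence[float], emb_b: Sequence[float], threshold: float) -> bool:
    """Verification decision: True if the two style embeddings are similar enough."""
    return cosine(emb_a, emb_b) >= threshold
```

The threshold should be calibrated on pairs from the same domain as the texts being verified, since similarity ranges can differ between domains.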

📚 Documentation

Model Information

| Property | Details |
|---|---|
| Base model | FacebookAI/roberta-base |
| Datasets | SynthSTEL/styledistance_training_triplets, StyleDistance/synthstel |
| Language | en |
| Library | sentence-transformers |
| License | mit |
| Task type | Sentence similarity |
| Tags | datadreamer, datadreamer-0.35.0, synthetic, sentence-transformers, feature-extraction, sentence-similarity |

Examples

Example 1
- Source sentence: Did you hear about the Wales wing? He'll h8 2 withdraw due 2 injuries from future competitions.
- Comparison sentences:
  - We're raising funds 2 improve our school's storage facilities and add new playground equipment!
  - Did you hear about the Wales wing? He'll hate to withdraw due to injuries from future competitions.
Example 2
- Source sentence: You planned the DesignMeets Decades of Design event; you executed it perfectly.
- Comparison sentences:
  - We'll find it hard to prove the thief didn't face a real threat!
  - You orchestrated the DesignMeets Decades of Design gathering; you actualized it flawlessly.
Example 3
- Source sentence: Did the William Barr maintain a commitment to allow Robert Mueller to finish the inquiry?
- Comparison sentences:
  - Will the artist be compiling a music album, or will there be a different focus in the future?
  - Did William Barr maintain commitment to allow Robert Mueller to finish inquiry?
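In each example above, the first comparison sentence appears to differ in content from the source but share its style (for instance, text-speak such as "2" for "to" in Example 1, or the use of articles in Example 3), while the second keeps the content but changes the style; this reading is ours and is not stated on the card. A small helper that ranks comparison sentences by style similarity makes such checks easy to repeat. It assumes an `encode` callable that maps a list of strings to a list of vectors, which matches how `model.encode` is called in the basic usage; `rank_by_style` itself is a hypothetical name.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_style(
    encode: Callable[[List[str]], Sequence[Sequence[float]]],
    source: str,
    candidates: List[str],
) -> List[Tuple[str, float]]:
    """Sort candidates by cosine similarity of their style embeddings to `source`."""
    # Encode the source and all candidates in a single batch
    vectors = encode([source] + candidates)
    src_vec, cand_vecs = vectors[0], vectors[1:]
    scored = [(text, cosine(src_vec, vec)) for text, vec in zip(candidates, cand_vecs)]
    # Most stylistically similar candidate first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the real model, `rank_by_style(model.encode, source, candidates)` would return the comparison sentences ordered from most to least stylistically similar; no specific scores are claimed here.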

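Training Data and Variants

StyleDistance was trained contrastively on SynthSTEL, a synthetically generated dataset of positive and negative examples of 40 style features being used in text. By leveraging this synthetic dataset, StyleDistance achieves stronger content independence than other current style embedding models. This particular model was trained on a combination of the synthetic dataset and [a real dataset that uses authorship data from Reddit to train style embeddings](https://aclanthology.org/2022.repl4nlp-1.26/). For a version trained only on synthetic data, see the other StyleDistance variant.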
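Contrastive training on positive and negative examples is commonly implemented with a triplet objective, and one of the listed datasets is named `styledistance_training_triplets`. The exact loss used for StyleDistance is not given on this card, so the following is only a generic triplet margin loss on cosine distance with a hypothetical `margin` value: it is zero once the anchor is closer to the same-style positive than to the different-style negative by at least the margin.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_margin_loss(
    anchor: Vector, positive: Vector, negative: Vector, margin: float = 0.5
) -> float:
    """Hinge loss on cosine distance: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = 1.0 - cosine(anchor, positive)  # distance to the same-style example
    d_neg = 1.0 - cosine(anchor, negative)  # distance to the different-style example
    return max(0.0, d_pos - d_neg + margin)


def mean_triplet_loss(
    triplets: List[Tuple[Vector, Vector, Vector]], margin: float = 0.5
) -> float:
    """Average loss over a batch of (anchor, positive, negative) embedding triplets."""
    return sum(triplet_margin_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
```

In practice, sentence-transformers ships a ready-made `TripletLoss` in `sentence_transformers.losses`, so a hand-written loss like this is only useful for understanding the objective.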

📄 License

This model is released under the MIT License.

📖 Citation

```bibtex
@misc{patel2025styledistancestrongercontentindependentstyle,
      title={StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples},
      author={Ajay Patel and Jiacheng Zhu and Justin Qiu and Zachary Horvitz and Marianna Apidianaki and Kathleen McKeown and Chris Callison-Burch},
      year={2025},
      eprint={2410.12757},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.12757},
}
```