Tip

You can now use the ONNX and OpenVINO backends for Sentence Transformer models. Read SentenceTransformer > Usage > Speeding up Inference to learn more about the new backends and what they can mean for your inference speed.

Tip

Sentence Transformers v3.4 was just released, introducing documentation for training with PEFT. Read SentenceTransformer > Training Examples > Training with PEFT Adapters to learn more about how to train embedding models without finetuning all model parameters.

SentenceTransformers Documentation

Sentence Transformers (a.k.a. SBERT) is the go-to Python module for accessing, using, and training state-of-the-art text and image embedding models. It can be used to compute embeddings with Sentence Transformer models (Quickstart) or to calculate similarity scores with Cross-Encoder models (Quickstart). This unlocks a wide range of applications, including semantic search, semantic textual similarity, and paraphrase mining.

A wide selection of over 5,000 pre-trained Sentence Transformers models is available for immediate use on 🤗 Hugging Face, including many of the state-of-the-art models from the Massive Text Embeddings Benchmark (MTEB) leaderboard. Additionally, it is easy to train or finetune your own models with Sentence Transformers, enabling you to create custom models for your specific use cases.

Sentence Transformers was created by UKPLab and is maintained by 🤗 Hugging Face. If something is broken or you have further questions, don't hesitate to open an issue on the Sentence Transformers repository.

Usage

See also

See the Quickstart for more quick information on how to use Sentence Transformers.

Using Sentence Transformer models is straightforward:

from sentence_transformers import SentenceTransformer

# 1. Load a pretrained Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L6-v2")

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# 2. Calculate embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# 3. Calculate the embedding similarities
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6660, 0.1046],
#         [0.6660, 1.0000, 0.1411],
#         [0.1046, 0.1411, 1.0000]])
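Under the hood, `model.similarity` defaults to cosine similarity: the dot product of L2-normalized embeddings. A toy illustration with hand-picked 4-dimensional vectors (real models output e.g. 384 dimensions), not the library API:

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: normalize each row, then take dot products."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    return normalized @ normalized.T

# Three toy "embeddings": the second overlaps both neighbors, the first and
# third are orthogonal (completely dissimilar).
toy = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
])
sims = cosine_similarity_matrix(toy)
print(np.round(sims, 2))
# [[1.  0.5 0. ]
#  [0.5 1.  0.5]
#  [0.  0.5 1. ]]
```

As in the real output above, the matrix is symmetric with ones on the diagonal: every embedding is maximally similar to itself.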

What's Next?

Consider reading one of the following sections to answer the related questions:

Citation

If you find this repository helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:

@inproceedings{reimers-2019-sentence-bert,
  title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
  author = "Reimers, Nils and Gurevych, Iryna",
  booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
  month = "11",
  year = "2019",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/1908.10084",
}

If you use one of the multilingual models, feel free to cite our publication Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation:

@inproceedings{reimers-2020-multilingual-sentence-bert,
  title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
  author = "Reimers, Nils and Gurevych, Iryna",
  booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
  month = "11",
  year = "2020",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/2004.09813",
}

If you use the code for data augmentation, feel free to cite our publication Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks:

@inproceedings{thakur-2020-AugSBERT,
  title = "Augmented {SBERT}: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks",
  author = "Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes and Gurevych, Iryna",
  booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
  month = jun,
  year = "2021",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2021.naacl-main.28",
  pages = "296--310",
}