NLP: SentenceTransformer usage example

Download a sentence-transformers model from the Hugging Face website

1. Import the required libraries

from transformers import AutoTokenizer, AutoModel
import numpy as np
import torch
import torch.nn.functional as F

2. Load the pretrained model

path = 'D:/Model/sentence-transformers/all-MiniLM-L6-v2' 
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModel.from_pretrained(path)

3. Define mean pooling

def mean_pooling(model_output, attention_mask):
    #First element of model_output contains all token embeddings
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum the unmasked token embeddings, then divide by the number of real tokens
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9)
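
To see what the attention mask contributes, here is a small sketch that runs the same pooling arithmetic on a hand-made tensor. The toy values and the names `toy_token_embeddings`, `toy_attention_mask`, and `toy_output` are made up for illustration; no model is loaded.

```python
import torch

def mean_pooling(model_output, attention_mask):
    # First element of model_output contains all token embeddings
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = torch.sum(token_embeddings * input_mask_expanded, 1)
    counts = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return summed / counts

# Hypothetical toy batch: 1 sentence, 3 token positions, embedding size 2.
# The third position is padding (mask = 0), so it must not affect the mean.
toy_token_embeddings = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
toy_attention_mask = torch.tensor([[1, 1, 0]])
toy_output = (toy_token_embeddings,)  # mimics indexing model_output[0]

pooled = mean_pooling(toy_output, toy_attention_mask)
print(pooled)  # mean of the two real tokens only: [[2., 3.]]
```

Without the mask, the padding row would pull the average toward 100, which is why padded batches need masked pooling rather than a plain `mean(dim=1)`.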

4. Embed the sentences

sentences = ['loved thisand know really bought wanted see pictures myselfIm lucky enough someone could justify buying present', 
             'issue pages stickers restuck really used configurations made regular pages rather taking pieces robot back', 
             'stickers dont stick well first time placing', 
             'Great fun grandson loves robots',
             'would suggest younger kids son 3']
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    model_output = model(**encoded_input)

sentence_embeddings1 = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings1)
# L2-normalize each embedding to unit length
sentence_embeddings2 = F.normalize(sentence_embeddings1, p=2, dim=1)
print("Normalized sentence embeddings:")
print(sentence_embeddings2)
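
One reason to normalize: once every row has unit length, a plain dot product gives the cosine similarity. The sketch below checks this on made-up 2-D vectors (the `embeddings` values are hypothetical, chosen so the numbers are easy to verify by hand).

```python
import torch
import torch.nn.functional as F

# Hypothetical 2-D "embeddings" with easy-to-check lengths (5, 10, 5).
embeddings = torch.tensor([[3.0, 4.0], [6.0, 8.0], [4.0, -3.0]])

normalized = F.normalize(embeddings, p=2, dim=1)
row_norms = normalized.norm(dim=1)
print(row_norms)  # every row now has length 1

# Dot product of unit vectors vs. cosine similarity of the original vectors.
dot_after = torch.dot(normalized[0], normalized[1])
cos_before = F.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(dot_after.item(), cos_before.item())

# Rows 0 and 2 are perpendicular, so their similarity is 0.
dot_orthogonal = torch.dot(normalized[0], normalized[2])
print(dot_orthogonal.item())
```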

5. Output

6. Define the similarity between sentences

def compute_sim_score(v1, v2):
    # Cosine similarity between two 1-D embedding tensors
    return torch.dot(v1, v2) / (torch.linalg.norm(v1) * torch.linalg.norm(v2))

7. Compute sentence similarity

# sentences[1]: 'issue pages stickers restuck really used configurations made regular pages rather taking pieces robot back'
# sentences[2]: 'stickers dont stick well first time placing'
compute_sim_score(sentence_embeddings1[1], sentence_embeddings1[2])
# result: tensor(0.5126)
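
To compare every pair of sentences at once instead of one pair per call, one option is a matrix product of the normalized embeddings. This is a sketch only: `fake_embeddings` is a random stand-in with the same shape as the embeddings in this post (5 sentences, 384 dimensions), so the similarity values themselves are meaningless.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Random stand-in for sentence_embeddings1: 5 sentences, 384 dimensions.
fake_embeddings = torch.randn(5, 384)

normalized = F.normalize(fake_embeddings, p=2, dim=1)
similarity_matrix = normalized @ normalized.T  # shape (5, 5)

# Entry [i, j] is the cosine similarity between sentence i and sentence j.
print(similarity_matrix.shape)

# For each sentence, find the index of its most similar *other* sentence.
masked = similarity_matrix.clone()
masked.fill_diagonal_(float('-inf'))  # exclude self-similarity
nearest = masked.argmax(dim=1)
print(nearest)
```

With the real embeddings, `similarity_matrix[1, 2]` would correspond to the pairwise score computed above.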

8. Check the shape of the embeddings

sentence_embeddings1.shape
#torch.Size([5, 384])
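
The manual steps in this post (tokenize, forward pass, mean pooling, normalize) are roughly what the `sentence-transformers` package wraps in a single call. The sketch below assumes that package is installed and that the model sits at a local path like the one used earlier; the helper name `embed_with_sentence_transformers` is hypothetical, and the results may differ slightly from the manual pipeline.

```python
def embed_with_sentence_transformers(sentences, model_path):
    """Return L2-normalized sentence embeddings as a torch tensor.

    Assumes the `sentence-transformers` package is installed and that
    `model_path` points to a downloaded sentence-transformers model.
    """
    # Imported lazily so this file can be loaded without the package present.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_path)
    return model.encode(
        sentences,
        convert_to_tensor=True,      # return a torch.Tensor instead of a NumPy array
        normalize_embeddings=True,   # unit-length rows, so dot product = cosine similarity
    )

# Example call (not executed here because it needs the model on disk):
# embeddings = embed_with_sentence_transformers(sentences, path)
```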

Summary and next steps:

Next, I will try embedding real user review sentences about items.
