A PyTorch Implementation of TextRNN

This post walks through how to reproduce TextRNN in PyTorch and use it to predict the next word of a sentence.

The reference paper is Finding Structure in Time (1990). If you already have a reasonable understanding of RNNs, you do not really need to read it; just work through how the code below implements the model. If you are not familiar with RNNs, please first read my post RNN Layer, which explains them in detail alongside PyTorch.

The problem setup is this: I have n sentences, each consisting of exactly 3 words. The goal is to train an RNN that takes the first two words of each sentence as input and predicts the last word as output. For example, for "i like dog" the input is ["i", "like"] and the target is "dog".

Import the libraries

'''
  code by Tae Hwan Jung(Jeff Jung) @graykode, modify by wmathor
'''
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data

dtype = torch.FloatTensor

Prepare the data

sentences = [ "i like dog", "i love coffee", "i hate milk"]

word_list = " ".join(sentences).split()
vocab = list(set(word_list))
word2idx = {w: i for i, w in enumerate(vocab)}
idx2word = {i: w for i, w in enumerate(vocab)}
n_class = len(vocab)

Preprocess the data, build the Dataset, define the DataLoader, and one-hot encode the inputs

# TextRNN Parameter
batch_size = 2
n_step = 2 # number of time steps (number of input words per sample)
n_hidden = 5 # number of hidden units in one cell

def make_data(sentences):
    input_batch = []
    target_batch = []

    for sen in sentences:
        word = sen.split()
        input = [word2idx[n] for n in word[:-1]]
        target = word2idx[word[-1]]

        input_batch.append(np.eye(n_class)[input])
        target_batch.append(target)

    return input_batch, target_batch

input_batch, target_batch = make_data(sentences)
input_batch, target_batch = torch.Tensor(np.array(input_batch)), torch.LongTensor(target_batch)
dataset = Data.TensorDataset(input_batch, target_batch)
loader = Data.DataLoader(dataset, batch_size, True)
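As a quick sanity check of the preprocessing (a small addition of mine, not part of the original code): with the three example sentences above the vocabulary contains 7 unique words, so the shapes should come out as noted in the comments.

print(input_batch.shape)   # torch.Size([3, 2, 7])  -> [num_sentences, n_step, n_class]
print(target_batch.shape)  # torch.Size([3])        -> one target word index per sentence
print(input_batch[0])      # one-hot rows for the first two words of "i like dog"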

The code above should not cause any trouble. Next we define the network architecture.

class TextRNN(nn.Module):
    def __init__(self):
        super(TextRNN, self).__init__()
        self.rnn = nn.RNN(input_size=n_class, hidden_size=n_hidden)
        # fc
        self.fc = nn.Linear(n_hidden, n_class)

    def forward(self, hidden, X):
        # X: [batch_size, n_step, n_class]
        X = X.transpose(0, 1) # X : [n_step, batch_size, n_class]
        out, hidden = self.rnn(X, hidden)
        # out : [n_step, batch_size, num_directions(=1) * n_hidden]
        # hidden : [num_layers(=1) * num_directions(=1), batch_size, n_hidden]
        out = out[-1] # [batch_size, num_directions(=1) * n_hidden] ⭐
        model = self.fc(out)
        return model

model = TextRNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Every step of the code above deserves a comment. First, the two arguments of nn.RNN(input_size, hidden_size): input_size is the dimensionality of each word's encoding. Since I use one-hot encoding rather than word embeddings, input_size equals the vocabulary size len(vocab), i.e. n_class. As for hidden_size, there is no fixed requirement; set it to whatever dimensionality you want the hidden state to have.
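If these shape conventions feel abstract, here is a tiny standalone check (my own illustration, not from the original tutorial): the last dimension of the input fed to nn.RNN must equal input_size, which here is the width of the one-hot vectors.

rnn = nn.RNN(input_size=n_class, hidden_size=n_hidden)
x = torch.randn(n_step, 3, n_class)   # [seq_len, batch_size, input_size]; last dim must equal input_size
h0 = torch.zeros(1, 3, n_hidden)      # [num_layers * num_directions, batch_size, hidden_size]
out, hn = rnn(x, h0)
print(out.shape, hn.shape)            # torch.Size([2, 3, 5]) torch.Size([1, 3, 5])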

For most neural networks the first dimension of the input is batch_size, but by default PyTorch's nn.RNN() expects batch_size on the second dimension, so we call X.transpose(0, 1) to swap the first and second dimensions of the input.
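As a side note, nn.RNN also accepts batch_first=True, in which case the batch dimension stays first and no transpose is needed. The code in this post does not use it; the following is just a sketch of that alternative.

rnn_bf = nn.RNN(input_size=n_class, hidden_size=n_hidden, batch_first=True)
x = torch.randn(3, n_step, n_class)                   # [batch_size, n_step, n_class], fed in directly
out_bf, h_bf = rnn_bf(x, torch.zeros(1, 3, n_hidden))
print(out_bf.shape)                                   # torch.Size([3, 2, 5]) -> [batch_size, n_step, n_hidden]
# with batch_first=True, the last time step is out_bf[:, -1, :] instead of out[-1]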

Next, the RNN's outputs. rnn returns two results, out and hidden in the code above. I covered the difference between them in an earlier post; if it is unclear, see the RNN Layer post mentioned above. In short, out holds the top layer's output at every time step, while hidden holds the final hidden state of every layer. What we need is the output of the last layer at the last time step, i.e. the value Y_3, which we extract with out = out[-1].
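A quick way to convince yourself of this (again my own check, not in the original code): for a single-layer, unidirectional RNN, the last time step of out is exactly the final hidden state, so out[-1] is precisely what the fc layer should receive.

rnn = nn.RNN(input_size=n_class, hidden_size=n_hidden)   # single layer, unidirectional
x = torch.randn(n_step, 3, n_class)
out, hn = rnn(x, torch.zeros(1, 3, n_hidden))
print(torch.allclose(out[-1], hn[0]))                    # True: out[-1] equals the final hidden state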

The rest is straightforward: just train and test.

# Training
for epoch in range(5000):
    for x, y in loader:
        # hidden : [num_layers * num_directions, batch_size, hidden_size]
        hidden = torch.zeros(1, x.shape[0], n_hidden)
        # x : [batch_size, n_step, n_class]
        pred = model(hidden, x)

        # pred : [batch_size, n_class], y : [batch_size] (LongTensor, not one-hot)
        loss = criterion(pred, y)
        if (epoch + 1) % 1000 == 0:
            print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
# Predict
input = [sen.split()[:2] for sen in sentences]
hidden = torch.zeros(1, len(input), n_hidden)
predict = model(hidden, input_batch).data.max(1, keepdim=True)[1]
print(input, '->', [idx2word[n.item()] for n in predict.squeeze()])
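As an optional extra (not in the original code), inference is usually run with the model in eval mode and gradient tracking disabled; the result is the same here, but it is the more common pattern.

model.eval()
with torch.no_grad():
    hidden = torch.zeros(1, len(sentences), n_hidden)
    pred_idx = model(hidden, input_batch).argmax(dim=1)
print([sen.split()[:2] for sen in sentences], '->', [idx2word[i.item()] for i in pred_idx])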

The complete code is as follows

'''
  code by Tae Hwan Jung(Jeff Jung) @graykode, modify by wmathor
'''
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data

dtype = torch.FloatTensor

sentences = [ "i like dog", "i love coffee", "i hate milk"]

word_list = " ".join(sentences).split()
vocab = list(set(word_list))
word2idx = {w: i for i, w in enumerate(vocab)}
idx2word = {i: w for i, w in enumerate(vocab)}
n_class = len(vocab)

# TextRNN Parameter
batch_size = 2
n_step = 2 # number of time steps (number of input words per sample)
n_hidden = 5 # number of hidden units in one cell

def make_data(sentences):
    input_batch = []
    target_batch = []

    for sen in sentences:
        word = sen.split()
        input = [word2idx[n] for n in word[:-1]]
        target = word2idx[word[-1]]

        input_batch.append(np.eye(n_class)[input])
        target_batch.append(target)

    return input_batch, target_batch

input_batch, target_batch = make_data(sentences)
input_batch, target_batch = torch.Tensor(np.array(input_batch)), torch.LongTensor(target_batch)
dataset = Data.TensorDataset(input_batch, target_batch)
loader = Data.DataLoader(dataset, batch_size, True)

class TextRNN(nn.Module):
    def __init__(self):
        super(TextRNN, self).__init__()
        self.rnn = nn.RNN(input_size=n_class, hidden_size=n_hidden)
        # fc
        self.fc = nn.Linear(n_hidden, n_class)

    def forward(self, hidden, X):
        # X: [batch_size, n_step, n_class]
        X = X.transpose(0, 1) # X : [n_step, batch_size, n_class]
        out, hidden = self.rnn(X, hidden)
        # out : [n_step, batch_size, num_directions(=1) * n_hidden]
        # hidden : [num_layers(=1) * num_directions(=1), batch_size, n_hidden]
        out = out[-1] # [batch_size, num_directions(=1) * n_hidden] ⭐
        model = self.fc(out)
        return model

model = TextRNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training
for epoch in range(5000):
    for x, y in loader:
        # hidden : [num_layers * num_directions, batch_size, hidden_size]
        hidden = torch.zeros(1, x.shape[0], n_hidden)
        # x : [batch_size, n_step, n_class]
        pred = model(hidden, x)

        # pred : [batch_size, n_class], y : [batch_size] (LongTensor, not one-hot)
        loss = criterion(pred, y)
        if (epoch + 1) % 1000 == 0:
            print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
  
# Predict
input = [sen.split()[:2] for sen in sentences]
hidden = torch.zeros(1, len(input), n_hidden)
predict = model(hidden, input_batch).data.max(1, keepdim=True)[1]
print(input, '->', [idx2word[n.item()] for n in predict.squeeze()])