A PyTorch Implementation of TextCNN

This article introduces a paper that applies CNNs to the NLP domain, and then gives a PyTorch implementation.
The paper is fairly short and the overall pipeline is not complicated. The most important part is the figure below: once you understand this figure, you know how to write the code. If you are not familiar with CNNs, please read my earlier article first.
The feature map in the figure below is obtained by mapping each word of a sentence through a word embedding. The width of the feature map is the embedding dimension, and its length is the number of words in the sentence. In the figure, each word is clearly encoded as a 6-dimensional vector, and the sentence contains 9 words.
The reason there are two feature maps is that you can think of the batch size as 2.
The red box is the convolution kernel, and it is clearly a kernel whose height and width differ. Interestingly, the kernel size along the word direction can be thought of as the n in n-gram: in the figure the kernel spans 2 words, so it covers the word vectors of "wait" and "for" at the same time, which makes this convolution behave like a bigram model (a small sketch of this is given after the figure).
[Figure: TextCNN model diagram (/2020/06/25/F5GjQbMdR3WgukT.png)]
The rest is the standard CNN pipeline: activation, pooling, and flattening, so there is not much to add.
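To make the bigram analogy concrete, here is a small stand-alone sketch (my own toy example, not code from the paper): a kernel of height 2 that spans the full embedding width slides down the sentence and produces one value per pair of adjacent words.

import torch
import torch.nn as nn

# toy "feature map": 1 sentence, 9 words, each word embedded into 6 dimensions
x = torch.randn(1, 1, 9, 6)  # [batch, channel, num_words, embedding_dim]

# kernel height 2 = two consecutive words (a bigram); kernel width 6 = the full embedding
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(2, 6))

out = conv(x)
print(out.shape)  # torch.Size([1, 1, 8, 1]) -> one activation per adjacent word pair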
Code Implementation (PyTorch)
The source code comes from , and I modified it (the original code seemed to have quite a few problems).
'''
code by Tae Hwan Jung(Jeff Jung) @graykode, modify by wmathor
'''
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data
import torch.nn.functional as F

dtype = torch.FloatTensor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
The code below defines the data and sets a few basic hyperparameters.
# 3-word sentences (sequence_length = 3)
sentences = ["i love you", "he loves me", "she likes baseball", "i hate you", "sorry for that", "this is awful"]
labels = [1, 1, 1, 0, 0, 0]  # 1 is good, 0 is not good.

# TextCNN Parameters
embedding_size = 2
sequence_length = len(sentences[0].split())  # every sentence contains sequence_length(=3) words
num_class = len(set(labels))  # num_class = 2
batch_size = 3

word_list = " ".join(sentences).split()
vocab = list(set(word_list))
word2idx = {w: i for i, w in enumerate(vocab)}
vocab_size = len(vocab)
Data Preprocessing
def make_data(sentences, labels):
    inputs = []
    for sen in sentences:
        inputs.append([word2idx[n] for n in sen.split()])

    targets = []
    for out in labels:
        targets.append(out)  # so that we can use Torch's softmax-based loss function
    return inputs, targets

input_batch, target_batch = make_data(sentences, labels)
input_batch, target_batch = torch.LongTensor(input_batch), torch.LongTensor(target_batch)

dataset = Data.TensorDataset(input_batch, target_batch)
loader = Data.DataLoader(dataset, batch_size, True)
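As a quick sanity check (my own addition, not part of the original code), the tensors produced above should have the following shapes:

print(input_batch.shape)   # torch.Size([6, 3]) -> 6 sentences, 3 word indices each
print(target_batch.shape)  # torch.Size([6])    -> one label per sentence
# with batch_size=3 and shuffle=True, the DataLoader yields 2 batches of shape [3, 3] per epoch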
Building the Model
class TextCNN(nn.Module):
    def __init__(self):
        super(TextCNN, self).__init__()
        self.W = nn.Embedding(vocab_size, embedding_size)
        output_channel = 3
        self.conv = nn.Sequential(
            # conv : [input_channel(=1), output_channel, (filter_height, filter_width), stride=1]
            nn.Conv2d(1, output_channel, (2, embedding_size)),
            nn.ReLU(),
            # pool : ((filter_height, filter_width))
            nn.MaxPool2d((2, 1)),
        )
        # fc
        self.fc = nn.Linear(output_channel, num_class)

    def forward(self, X):
        '''
        X: [batch_size, sequence_length]
        '''
        batch_size = X.shape[0]
        embedding_X = self.W(X)  # [batch_size, sequence_length, embedding_size]
        embedding_X = embedding_X.unsqueeze(1)  # add channel(=1): [batch_size, 1, sequence_length, embedding_size]
        conved = self.conv(embedding_X)  # [batch_size, output_channel, 1, 1]
        flatten = conved.view(batch_size, -1)  # [batch_size, output_channel*1*1]
        output = self.fc(flatten)
        return output
Let us now trace how the tensor shapes change as the data flows through the network. The input is a matrix of shape [batch_size, sequence_length], where each entry is the index of a word in the vocabulary.
The data first passes through the Embedding layer, which is simply a lookup table: each index is mapped to a vector, e.g. index 12 might become [0.3, 0.6, 0.12, ...]. This implicitly adds a dimension, so the data becomes [batch_size, sequence_length, embedding_size].
Then unsqueeze(1) adds another dimension, giving [batch_size, 1, sequence_length, embedding_size]. Only now can the data be convolved, because a conventional CNN expects input of shape [batch_size, in_channel, height, width].
An input of shape [batch_size, 1, 3, 2] passed through nn.Conv2d(1, 3, (2, 2)) yields a tensor of shape [batch_size, 3, 2, 1]; the ReLU activation does not change the shape, so it is not drawn. Finally, the nn.MaxPool2d((2, 1)) pooling produces a tensor of shape [batch_size, 3, 1, 1].
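This shape bookkeeping can be checked directly with dummy tensors. The snippet below is only an illustrative sketch that reuses the constants defined earlier (vocab_size, embedding_size, batch_size, sequence_length) and builds throwaway layers rather than the trained model:

X = torch.randint(0, vocab_size, (batch_size, sequence_length))  # [3, 3] word indices
emb = nn.Embedding(vocab_size, embedding_size)(X)                # [3, 3, 2]
emb = emb.unsqueeze(1)                                           # [3, 1, 3, 2]
conved = nn.Conv2d(1, 3, (2, embedding_size))(emb)               # [3, 3, 2, 1]
pooled = nn.MaxPool2d((2, 1))(conved)                            # [3, 3, 1, 1]
print(pooled.view(batch_size, -1).shape)                         # torch.Size([3, 3]) -> input to the fc layer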
Training
model = TextCNN().to(device)
criterion = nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Training
for epoch in range(5000):
    for batch_x, batch_y in loader:
        batch_x, batch_y = batch_x.to(device), batch_y.to(device)
        pred = model(batch_x)
        loss = criterion(pred, batch_y)
        if (epoch + 1) % 1000 == 0:
            print('Epoch:', '%04d' % (epoch + 1), 'loss =', '{:.6f}'.format(loss))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
Testing
# Test
test_text = 'i hate me'
tests = [[word2idx[n] for n in test_text.split()]]
test_batch = torch.LongTensor(tests).to(device)
# Predict
model = model.eval()
predict = model(test_batch).data.max(1, keepdim=True)[1]
if predict[0][0] == 0:
    print(test_text, "is Bad Mean...")
else:
    print(test_text, "is Good Mean!!")
The complete code is shown below:
'''
code by Tae Hwan Jung(Jeff Jung) @graykode, modify by wmathor
'''
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data
import torch.nn.functional as F

dtype = torch.FloatTensor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 3-word sentences (sequence_length = 3)
sentences = ["i love you", "he loves me", "she likes baseball", "i hate you", "sorry for that", "this is awful"]
labels = [1, 1, 1, 0, 0, 0]  # 1 is good, 0 is not good.

# TextCNN Parameters
embedding_size = 2
sequence_length = len(sentences[0].split())  # every sentence contains sequence_length(=3) words
num_class = 2  # 0 or 1
batch_size = 3

word_list = " ".join(sentences).split()
vocab = list(set(word_list))
word2idx = {w: i for i, w in enumerate(vocab)}
vocab_size = len(vocab)

def make_data(sentences, labels):
    inputs = []
    for sen in sentences:
        inputs.append([word2idx[n] for n in sen.split()])

    targets = []
    for out in labels:
        targets.append(out)  # so that we can use Torch's softmax-based loss function
    return inputs, targets

input_batch, target_batch = make_data(sentences, labels)
input_batch, target_batch = torch.LongTensor(input_batch), torch.LongTensor(target_batch)

dataset = Data.TensorDataset(input_batch, target_batch)
loader = Data.DataLoader(dataset, batch_size, True)

class TextCNN(nn.Module):
    def __init__(self):
        super(TextCNN, self).__init__()
        self.W = nn.Embedding(vocab_size, embedding_size)
        output_channel = 3
        self.conv = nn.Sequential(
            # conv : [input_channel(=1), output_channel, (filter_height, filter_width), stride=1]
            nn.Conv2d(1, output_channel, (2, embedding_size)),
            nn.ReLU(),
            # pool : ((filter_height, filter_width))
            nn.MaxPool2d((2, 1)),
        )
        # fc
        self.fc = nn.Linear(output_channel, num_class)

    def forward(self, X):
        '''
        X: [batch_size, sequence_length]
        '''
        batch_size = X.shape[0]
        embedding_X = self.W(X)  # [batch_size, sequence_length, embedding_size]
        embedding_X = embedding_X.unsqueeze(1)  # add channel(=1): [batch_size, 1, sequence_length, embedding_size]
        conved = self.conv(embedding_X)  # [batch_size, output_channel, 1, 1]
        flatten = conved.view(batch_size, -1)  # [batch_size, output_channel*1*1]
        output = self.fc(flatten)
        return output

model = TextCNN().to(device)
criterion = nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Training
for epoch in range(5000):
    for batch_x, batch_y in loader:
        batch_x, batch_y = batch_x.to(device), batch_y.to(device)
        pred = model(batch_x)
        loss = criterion(pred, batch_y)
        if (epoch + 1) % 1000 == 0:
            print('Epoch:', '%04d' % (epoch + 1), 'loss =', '{:.6f}'.format(loss))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Test
test_text = 'i hate me'
tests = [[word2idx[n] for n in test_text.split()]]
test_batch = torch.LongTensor(tests).to(device)
# Predict
model = model.eval()
predict = model(test_batch).data.max(1, keepdim=True)[1]
if predict[0][0] == 0:
    print(test_text, "is Bad Mean...")
else:
    print(test_text, "is Good Mean!!")
If you look carefully at the source code I referenced, you will find that it is written in a rather strange way:
for filter_size in filter_sizes:
    # conv : [input_channel(=1), output_channel(=3), (filter_height, filter_width), bias_option]
    conv = nn.Conv2d(1, num_filters, (filter_size, embedding_size), bias=True)(embedded_chars)
    h = F.relu(conv)
    # mp : ((filter_height, filter_width))
    mp = nn.MaxPool2d((sequence_length - filter_size + 1, 1))
    # pooled : [batch_size(=6), output_height(=1), output_width(=1), output_channel(=3)]
    pooled = mp(h).permute(0, 3, 2, 1)
    pooled_outputs.append(pooled)
He uses a loop that convolves the original input several times to obtain multiple feature maps. What is odd about this is that if you want more feature maps, you can simply increase the output_channel argument of nn.Conv2d(); why run the convolution repeatedly in a loop?
If his original intention was to build a deep convolutional network, that does not hold up either, because this code does not have that effect: every iteration of the loop operates on the original input data, not on the output of the previous convolution.
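For reference, looping over filters is only meaningful when the filter heights actually differ, so that the model looks at several n-gram sizes at once (this is what the TextCNN paper itself does, using several window sizes). Below is a minimal sketch of that variant written with nn.ModuleList; it is my own rewrite under the assumption of two filter heights (2 and 3), not the referenced source code:

class MultiSizeTextCNN(nn.Module):
    def __init__(self, filter_sizes=(2, 3), num_filters=3):
        super().__init__()
        self.W = nn.Embedding(vocab_size, embedding_size)
        # one convolution per filter height, each spanning the full embedding width
        self.convs = nn.ModuleList([
            nn.Conv2d(1, num_filters, (fs, embedding_size)) for fs in filter_sizes
        ])
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_class)

    def forward(self, X):
        emb = self.W(X).unsqueeze(1)              # [batch, 1, seq_len, emb]
        pooled = []
        for conv in self.convs:
            h = F.relu(conv(emb))                 # [batch, num_filters, seq_len - fs + 1, 1]
            p = F.max_pool2d(h, (h.shape[2], 1))  # max over all positions -> [batch, num_filters, 1, 1]
            pooled.append(p.view(X.shape[0], -1)) # [batch, num_filters]
        return self.fc(torch.cat(pooled, dim=1))  # [batch, num_class]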
