[Code Walkthrough & Run] Spatial Transformer Networks (STN)
0 Foreword
After working through the original STN paper, I decided to run the source code from GitHub to deepen my understanding of the spatial transformer. After all, talk is cheap, show me the code!
Besides, although the paper's authors released TensorFlow source code, I am not as fluent in TensorFlow as in PyTorch, so here I only studied the PyTorch tutorial's STN code. I found it written in great detail and very friendly to beginners, so I gave up the chance to interpret it myself and decided to simply reproduce the original tutorial, haha.
1 The Tutorial
Note: everything below is copied/translated from the tutorial, though I have added some comments of my own in the code.
Spatial transformer networks (STN for short) allow a neural network to learn how to perform spatial transformations on the input image, in order to enhance the geometric invariance of the model. For example, it can crop a region of interest, scale an image, and correct its orientation. This is a useful mechanism because CNNs are not invariant to rotation and scaling, or to more general affine transformations.
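Before diving in, it helps to see the two PyTorch primitives an STN is built on. The sketch below is my own minimal illustration (not part of the tutorial): a 2x3 affine matrix theta is turned into a sampling grid by F.affine_grid, and F.grid_sample resamples the image along that grid; with the identity matrix the image comes back unchanged.

import torch
import torch.nn.functional as F

img = torch.randn(1, 1, 28, 28)              # dummy (N, C, H, W) image
theta = torch.tensor([[[1., 0., 0.],
                       [0., 1., 0.]]])       # identity affine matrix: no transform
grid = F.affine_grid(theta, img.size(), align_corners=False)  # (N, H, W, 2) sampling grid
out = F.grid_sample(img, grid, align_corners=False)           # resample along the grid
print(torch.allclose(out, img, atol=1e-5))   # True: identity theta reproduces the input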
1.1 Importing Libraries
# License: BSD
# Author: Ghassen Hamrouni
from __future__ import print_function  # makes print a function on Python 2.x, as on Python 3.x
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np
plt.ion()  # interactive mode
1.2 Loading the Data
In this tutorial we use the classic MNIST handwritten-digit dataset.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Training dataset
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST(root='.', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean/std
                   ])), batch_size=64, shuffle=True, num_workers=4)
# Test dataset
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST(root='.', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])), batch_size=64, shuffle=True, num_workers=4)
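As a quick sanity check (my own addition, not in the tutorial), you can pull one batch from the loader and inspect it; each batch is a normalized (64, 1, 28, 28) tensor of grayscale images:

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 1, 28, 28]) -> (N, C, H, W)
print(labels.shape)  # torch.Size([64])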
1.3 Building the STN Model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        # Spatial transformer localization-network
        # The localization network is really just an ordinary CNN plus fully connected layers.
        # nn.Conv2d's leading parameters: in_channels, out_channels, kernel_size, stride=1, padding=0
        # nn.MaxPool2d's leading parameters: kernel_size, stride=None, padding=0, dilation=1,
        # return_indices=False, ceil_mode=False
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=8, kernel_size=7),
            nn.MaxPool2d(2, stride=2),
            nn.ReLU(True),
            nn.Conv2d(8, 10, kernel_size=5),
            nn.MaxPool2d(2, stride=2),
            nn.ReLU(True)
        )
        # Regressor for the 3 * 2 affine matrix
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 3 * 3, 32),  # in_features, out_features, bias=True
            nn.ReLU(True),
            nn.Linear(32, 3 * 2)
        )
        # Initialize the weights/bias with identity transformation
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    # Spatial transformer network forward function
    def stn(self, x):
        xs = self.localization(x)     # run the localization network first
        xs = xs.view(-1, 10 * 3 * 3)  # flatten to a vector
        theta = self.fc_loc(xs)       # fully connected layers regress the theta vector
        theta = theta.view(-1, 2, 3)  # reshape theta into a batch of 2x3 affine matrices
        # In F.affine_grid, theta has shape (N, 2, 3) and the size argument has format (N, C, H', W').
        # The returned grid has shape (N, H', W', 2); the trailing 2 is because a point's
        # coordinates need two numbers, x and y, to describe.
        grid = F.affine_grid(theta=theta, size=x.size())  # size is the output size; same as the input here
        # In F.grid_sample, x is the ST input image with shape (N, C, H, W); H' may differ
        # from H and W' from W. grid is the sampling grid obtained in the previous step.
        x = F.grid_sample(x, grid)
        return x

    def forward(self, x):
        # transform the input
        x = self.stn(x)
        # Perform the usual forward pass
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

model = Net().to(device)
Parts I personally found hard to understand:
1. The layer sizes inside the localization net. By my calculation, the input to the last pooling layer is 7x7, which in theory cannot be pooled evenly; would the output have to be "3.5x3.5" pixels? In practice, since the MaxPool layers use ceil_mode=False, the remainder that does not divide evenly is simply discarded. So in the third line of the code below, xs.view uses 10*3*3, where 10 is the number of output channels of the localization net's last conv layer (it only coincidentally matches MNIST's ten classes) and 3*3 is the image size after the last pooling layer.
def stn(self, x):
    xs = self.localization(x)     # run the localization network first
    xs = xs.view(-1, 10 * 3 * 3)  # flatten to a vector
The calculation, step by step: a 28x28 input -> conv 7x7 -> 22x22 -> maxpool 2 -> 11x11 -> conv 5x5 -> 7x7 -> maxpool 2 (floor mode) -> 3x3.
Also, the MNIST input is single-channel (C=1); after the localization net it becomes 10 channels, as the code states explicitly.
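To double-check this arithmetic (my own snippet, not from the tutorial, reusing the imports from section 1.1), run a dummy MNIST-sized tensor through a standalone copy of the localization layers and inspect the output shape:

loc = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2, stride=2), nn.ReLU(True),
    nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2, stride=2), nn.ReLU(True))
print(loc(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10, 3, 3]) -> 10 channels, 3x3 each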
2. F.grid_sample. It uses the grid obtained in the previous step to sample from the original image, producing an (N, C, H', W') output image.
grid = F.affine_grid(theta=theta, size=x.size())  # build the sampling grid
x = F.grid_sample(x, grid)                        # sample from the original image along the grid
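To make the tensor formats concrete, here is a standalone run of the same two calls (my own sketch, using the imports from section 1.1), with a theta that zooms in 2x on the image center:

x = torch.randn(4, 1, 28, 28)           # (N, C, H, W), as inside stn()
theta = torch.zeros(4, 2, 3)
theta[:, 0, 0] = theta[:, 1, 1] = 0.5   # scaling matrix: sample the central half -> 2x zoom
grid = F.affine_grid(theta, x.size())   # torch.Size([4, 28, 28, 2]) -> (N, H, W, 2)
x_out = F.grid_sample(x, grid)          # torch.Size([4, 1, 28, 28]), same size as the input
print(grid.shape, x_out.shape)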
1.4 Training
This part is just a standard deep-learning training loop.
Many people online ask how the ST is trained; in fact it needs no special treatment: add the ST module to your own CNN and backpropagation will adjust its parameters along with everything else.
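A quick way to convince yourself of this (my own check, not from the tutorial): run one backward pass through a throwaway Net instance and look at the gradient of the affine regressor; it is non-zero, so the STN is trained end to end together with the classifier:

check = Net().to(device)
out = check(torch.randn(4, 1, 28, 28, device=device))
loss = F.nll_loss(out, torch.tensor([0, 1, 2, 3], device=device))
loss.backward()
print(check.fc_loc[2].weight.grad.abs().sum() > 0)  # tensor(True): gradients reach the STN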
optimizer = optim.SGD(model.parameters(), lr=0.01)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)  # nll_loss pairs with the log_softmax used earlier
        loss.backward()
        optimizer.step()
        if batch_idx % 500 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
#
# A simple test procedure to measure the STN performance on MNIST.
#
def test():
    with torch.no_grad():
        model.eval()
        test_loss = 0
        correct = 0
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
        test_loss /= len(test_loader.dataset)
        print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'
              .format(test_loss, correct, len(test_loader.dataset),
                      100. * correct / len(test_loader.dataset)))
Parts I personally found hard to understand:
1. Why use nll_loss here? Because log_softmax was applied at the end of the network; log_softmax followed by nll_loss is equivalent to applying cross-entropy directly to the raw logits.
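A small numerical check of that equivalence (my own addition, using the imports from above):

logits = torch.randn(8, 10)          # fake raw class scores
labels = torch.randint(0, 10, (8,))
a = F.nll_loss(F.log_softmax(logits, dim=1), labels)
b = F.cross_entropy(logits, labels)  # cross_entropy = log_softmax + nll_loss in one call
print(torch.allclose(a, b))          # True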
1.5 Visualization & Run!
def convert_image_np(inp):
    """Convert a Tensor to numpy image."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    return inp
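One detail worth noting: the mean/std used here are the standard ImageNet statistics, while the data was actually normalized with MNIST's (0.1307,)/(0.3081,). Since the result is only used for display and is clipped to [0, 1], this is harmless, but the exact inverse of the normalization would be the following (my own variant, not from the tutorial):

inp = 0.3081 * inp + 0.1307  # exact inverse of transforms.Normalize((0.1307,), (0.3081,))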
# We want to visualize the output of the spatial transformer layer
# after the training: we visualize a batch of input images and
# the corresponding transformed batch using STN.

def visualize_stn():
    with torch.no_grad():
        # Get a batch of training data
        data = next(iter(test_loader))[0].to(device)
        input_tensor = data.cpu()
        transformed_input_tensor = model.stn(data).cpu()
        in_grid = convert_image_np(
            torchvision.utils.make_grid(input_tensor))
        out_grid = convert_image_np(
            torchvision.utils.make_grid(transformed_input_tensor))
        # Plot the results side-by-side
        f, axarr = plt.subplots(1, 2)
        axarr[0].imshow(in_grid)
        axarr[0].set_title('Dataset Images')
        axarr[1].imshow(out_grid)
        axarr[1].set_title('Transformed Images')
if __name__ == '__main__':
    for epoch in range(1, 20 + 1):
        train(epoch)
        test()
    # Visualize the STN transformation on some input batch
    visualize_stn()
    plt.ioff()
    plt.show()
2 Results
Only part of the output is shown.
————————————————————————————
Train Epoch: 19 [0/60000 (0%)]      Loss: 0.097642
Train Epoch: 19 [32000/60000 (53%)] Loss: 0.092502
Test set: Average loss: 0.0388, Accuracy: 9871/10000 (99%)
————————————————————————————
Train Epoch: 20 [0/60000 (0%)]      Loss: 0.042493
Train Epoch: 20 [32000/60000 (53%)] Loss: 0.025031
Test set: Average loss: 0.0396, Accuracy: 9874/10000 (99%)
————————————————————————————