[Code Walkthrough & Run] Spatial Transformer Networks (STN)
0 Foreword
After working through the original STN paper, I decided to run the source code from GitHub to deepen my understanding of the spatial transformer. After all, talk is cheap, show me the code!
Besides, although the paper's authors released TensorFlow source code, I am not as fluent in TensorFlow as in PyTorch, so here I only studied the PyTorch tutorial's STN code. I found it written in great detail and very friendly to beginners, so I gave up the chance to interpret it myself and decided to simply reproduce the original tutorial, haha.
1 The Tutorial
Note: everything below is copied/translated from the tutorial, though I have added some comments of my own in the code.
Spatial transformer networks (STN for short) allow a neural network to learn how to perform spatial transformations on the input image, in order to enhance the geometric invariance of the model. For example, it can crop a region of interest, scale an image, and correct its orientation. This is a useful mechanism because CNNs are not invariant to rotation and scaling, or to more general affine transformations.
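Before diving in, it helps to see the two PyTorch primitives an STN is built on. The sketch below is my own minimal illustration (not part of the tutorial): a 2x3 affine matrix theta is turned into a sampling grid by F.affine_grid, and F.grid_sample resamples the image along that grid; with the identity matrix the image comes back unchanged.

import torch
import torch.nn.functional as F

img = torch.randn(1, 1, 28, 28)              # dummy (N, C, H, W) image
theta = torch.tensor([[[1., 0., 0.],
                       [0., 1., 0.]]])       # identity affine matrix: no transform
grid = F.affine_grid(theta, img.size(), align_corners=False)  # (N, H, W, 2) sampling grid
out = F.grid_sample(img, grid, align_corners=False)           # resample along the grid
print(torch.allclose(out, img, atol=1e-5))   # True: identity theta reproduces the input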
1.1 Importing Libraries
# License: BSD
# Author: Ghassen Hamrouni
from __future__ import print_function  # makes print a function on Python 2.x, as on Python 3.x
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np
plt.ion()  # interactive mode
1.2 Loading the Data
In this tutorial we use the classic MNIST handwritten-digit dataset.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Training dataset
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST(root='.', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean/std
                   ])), batch_size=64, shuffle=True, num_workers=4)
# Test dataset
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST(root='.', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])), batch_size=64, shuffle=True, num_workers=4)
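As a quick sanity check (my own addition, not in the tutorial), you can pull one batch from the loader and inspect it; each batch is a normalized (64, 1, 28, 28) tensor of grayscale images:

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 1, 28, 28]) -> (N, C, H, W)
print(labels.shape)  # torch.Size([64])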
1.3 Building the STN Model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        # Spatial transformer localization-network
        # The localization network is really just an ordinary CNN plus fully connected layers.
        # nn.Conv2d's leading parameters: in_channels, out_channels, kernel_size, stride=1, padding=0
        # nn.MaxPool2d's leading parameters: kernel_size, stride=None, padding=0, dilation=1,
        # return_indices=False, ceil_mode=False
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=8, kernel_size=7),
            nn.MaxPool2d(2, stride=2),
            nn.ReLU(True),
            nn.Conv2d(8, 10, kernel_size=5),
            nn.MaxPool2d(2, stride=2),
            nn.ReLU(True)
        )
        # Regressor for the 3 * 2 affine matrix
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 3 * 3, 32),  # in_features, out_features, bias=True
            nn.ReLU(True),
            nn.Linear(32, 3 * 2)
        )
        # Initialize the weights/bias with identity transformation
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    # Spatial transformer network forward function
    def stn(self, x):
        xs = self.localization(x)     # run the localization network first
        xs = xs.view(-1, 10 * 3 * 3)  # flatten to a vector
        theta = self.fc_loc(xs)       # fully connected layers regress the theta vector
        theta = theta.view(-1, 2, 3)  # reshape theta into a batch of 2x3 affine matrices
        # In F.affine_grid, theta has shape (N, 2, 3) and the size argument has format (N, C, H', W').
        # The returned grid has shape (N, H', W', 2); the trailing 2 is because a point's
        # coordinates need two numbers, x and y, to describe.
        grid = F.affine_grid(theta=theta, size=x.size())  # size is the output size; same as the input here
        # In F.grid_sample, x is the ST input image with shape (N, C, H, W); H' may differ
        # from H and W' from W. grid is the sampling grid obtained in the previous step.
        x = F.grid_sample(x, grid)
        return x

    def forward(self, x):
        # transform the input
        x = self.stn(x)
        # Perform the usual forward pass
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

model = Net().to(device)
Parts I personally found hard to understand:
1. The layer sizes inside the localization net. By my calculation, the input to the last pooling layer is 7x7, which in theory cannot be pooled evenly; would the output have to be "3.5x3.5" pixels? In practice, since the MaxPool layers use ceil_mode=False, the remainder that does not divide evenly is simply discarded. So in the third line of the code below, xs.view uses 10*3*3, where 10 is the number of output channels of the localization net's last conv layer (it only coincidentally matches MNIST's ten classes) and 3*3 is the image size after the last pooling layer.
def stn(self, x):
    xs = self.localization(x)     # run the localization network first
    xs = xs.view(-1, 10 * 3 * 3)  # flatten to a vector
The calculation, step by step: a 28x28 input -> conv 7x7 -> 22x22 -> maxpool 2 -> 11x11 -> conv 5x5 -> 7x7 -> maxpool 2 (floor mode) -> 3x3.
Also, the MNIST input is single-channel (C=1); after the localization net it becomes 10 channels, as the code states explicitly.
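To double-check this arithmetic (my own snippet, not from the tutorial, reusing the imports from section 1.1), run a dummy MNIST-sized tensor through a standalone copy of the localization layers and inspect the output shape:

loc = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2, stride=2), nn.ReLU(True),
    nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2, stride=2), nn.ReLU(True))
print(loc(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10, 3, 3]) -> 10 channels, 3x3 each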
2. F.grid_sample. It uses the grid obtained in the previous step to sample from the original image, producing an (N, C, H', W') output image.
grid = F.affine_grid(theta=theta, size=x.size())  # build the sampling grid
x = F.grid_sample(x, grid)                        # sample from the original image along the grid
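To make the tensor formats concrete, here is a standalone run of the same two calls (my own sketch, using the imports from section 1.1), with a theta that zooms in 2x on the image center:

x = torch.randn(4, 1, 28, 28)           # (N, C, H, W), as inside stn()
theta = torch.zeros(4, 2, 3)
theta[:, 0, 0] = theta[:, 1, 1] = 0.5   # scaling matrix: sample the central half -> 2x zoom
grid = F.affine_grid(theta, x.size())   # torch.Size([4, 28, 28, 2]) -> (N, H, W, 2)
x_out = F.grid_sample(x, grid)          # torch.Size([4, 1, 28, 28]), same size as the input
print(grid.shape, x_out.shape)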
1.4 Training
This part is just a standard deep-learning training loop.
Many people online ask how the ST is trained; in fact it needs no special treatment: add the ST module to your own CNN and backpropagation will adjust its parameters along with everything else.
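A quick way to convince yourself of this (my own check, not from the tutorial): run one backward pass through a throwaway Net instance and look at the gradient of the affine regressor; it is non-zero, so the STN is trained end to end together with the classifier:

check = Net().to(device)
out = check(torch.randn(4, 1, 28, 28, device=device))
loss = F.nll_loss(out, torch.tensor([0, 1, 2, 3], device=device))
loss.backward()
print(check.fc_loc[2].weight.grad.abs().sum() > 0)  # tensor(True): gradients reach the STN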
optimizer = optim.SGD(model.parameters(), lr=0.01)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)  # nll_loss pairs with the log_softmax used earlier
        loss.backward()
        optimizer.step()
        if batch_idx % 500 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
#
# A simple test procedure to measure the STN performance on MNIST.
#
def test():
    with torch.no_grad():
        model.eval()
        test_loss = 0
        correct = 0
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
        test_loss /= len(test_loader.dataset)
        print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'
              .format(test_loss, correct, len(test_loader.dataset),
                      100. * correct / len(test_loader.dataset)))
Parts I personally found hard to understand:
1. Why use nll_loss here? Because log_softmax was applied at the end of the network; log_softmax followed by nll_loss is equivalent to applying cross-entropy directly to the raw logits.
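A small numerical check of that equivalence (my own addition, using the imports from above):

logits = torch.randn(8, 10)          # fake raw class scores
labels = torch.randint(0, 10, (8,))
a = F.nll_loss(F.log_softmax(logits, dim=1), labels)
b = F.cross_entropy(logits, labels)  # cross_entropy = log_softmax + nll_loss in one call
print(torch.allclose(a, b))          # True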
1.5 Visualization & Run!
def convert_image_np(inp):
    """Convert a Tensor to numpy image."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    return inp
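One detail worth noting: the mean/std used here are the standard ImageNet statistics, while the data was actually normalized with MNIST's (0.1307,)/(0.3081,). Since the result is only used for display and is clipped to [0, 1], this is harmless, but the exact inverse of the normalization would be the following (my own variant, not from the tutorial):

inp = 0.3081 * inp + 0.1307  # exact inverse of transforms.Normalize((0.1307,), (0.3081,))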
# We want to visualize the output of the spatial transformer layer
# after the training: we visualize a batch of input images and
# the corresponding transformed batch using STN.

def visualize_stn():
    with torch.no_grad():
        # Get a batch of training data
        data = next(iter(test_loader))[0].to(device)
        input_tensor = data.cpu()
        transformed_input_tensor = model.stn(data).cpu()
        in_grid = convert_image_np(
            torchvision.utils.make_grid(input_tensor))
        out_grid = convert_image_np(
            torchvision.utils.make_grid(transformed_input_tensor))
        # Plot the results side-by-side
        f, axarr = plt.subplots(1, 2)
        axarr[0].imshow(in_grid)
        axarr[0].set_title('Dataset Images')
        axarr[1].imshow(out_grid)
        axarr[1].set_title('Transformed Images')
if __name__ == '__main__':
    for epoch in range(1, 20 + 1):
        train(epoch)
        test()
    # Visualize the STN transformation on some input batch
    visualize_stn()
    plt.ioff()
    plt.show()
2 Results
Only part of the output is shown.
————————————————————————————
Train Epoch: 19 [0/60000 (0%)]      Loss: 0.097642
Train Epoch: 19 [32000/60000 (53%)] Loss: 0.092502
Test set: Average loss: 0.0388, Accuracy: 9871/10000 (99%)
————————————————————————————
Train Epoch: 20 [0/60000 (0%)]      Loss: 0.042493
Train Epoch: 20 [32000/60000 (53%)] Loss: 0.025031
Test set: Average loss: 0.0396, Accuracy: 9874/10000 (99%)
————————————————————————————