CNN for Handwritten Digit Recognition (Handwriting Recognition)
1. Common packages
torchvision.datasets: dataset wrappers; give raw data such as images or tensors a uniform interface
torch.utils.data.DataLoader: data loader; manages how a dataset is consumed by the program and can automatically emit data in batches
torch.utils.data.sampler: samplers; give the loader a strategy for drawing samples from the dataset for each batch, supporting sequential, random, or probability-weighted sampling
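The three packages above can be seen working together in a minimal sketch (using a toy TensorDataset rather than MNIST, so it runs on its own):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler

# A toy dataset of 10 samples: each a 4-dimensional feature vector with a label
features = torch.arange(40, dtype=torch.float32).view(10, 4)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# The sampler restricts the loader to a subset of indices, drawn in random order
sampler = SubsetRandomSampler(list(range(6)))
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

batches = list(loader)
print(len(batches))         # 6 sampled items / batch_size 2 = 3 batches
print(batches[0][0].shape)  # each batch of features has shape (2, 4)
```

This is the same pattern used below for the validation and test loaders: the dataset holds the data, the sampler chooses which indices each batch draws from, and the loader assembles the batches.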
2. Key concepts
Convolution: searching the input image for regions that resemble the convolution kernel. The kernel scans and matches pixel by pixel, left to right and top to bottom, and the matching results form a new image, usually called a feature map (Feature Map).
A convolutional layer has as many kernels as its output feature map has channels, and each kernel computes its result completely independently.
Sharpening an image (emphasising detail) and blurring it (suppressing detail) can both be viewed as convolving the original image with kernels of particular weights.
In general, early convolutional layers have few kernels, and the number grows in later layers.
In a feature map, each pixel is one neuron.
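The sharpening-as-convolution idea can be checked directly with F.conv2d (a minimal sketch with a toy 5x5 image; the kernel weights are the classic sharpening filter, not anything from this tutorial's network):

```python
import torch
import torch.nn.functional as F

# A toy 5x5 "image" (batch=1, channel=1) with a single bright pixel in the centre
img = torch.zeros(1, 1, 5, 5)
img[0, 0, 2, 2] = 1.0

# A classic 3x3 sharpening kernel: emphasises a pixel relative to its neighbours
sharpen = torch.tensor([[ 0., -1.,  0.],
                        [-1.,  5., -1.],
                        [ 0., -1.,  0.]]).view(1, 1, 3, 3)

# padding=1 keeps the output feature map the same 5x5 size as the input
feature_map = F.conv2d(img, sharpen, padding=1)
print(feature_map.shape)        # torch.Size([1, 1, 5, 5])
print(feature_map[0, 0, 2, 2])  # centre pixel amplified by the kernel's centre weight: 5.0
```

The output feature map is exactly the "new image" described above: each output pixel records how well the kernel matched at that position.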
The two stages of a convolutional network's computation:
Feed-forward stage (from input image to output probability distribution): all connection weights stay fixed; the network computes a classification from the input image, compares it with the labels in the data, and computes the cross-entropy as the loss function.
Back-propagation stage: the loss from the feed-forward stage is used to adjust every connection weight, which is how the network learns.
Padding: enlarging the original image by filling the added border region with zeros.
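The feed-forward and back-propagation stages can be sketched as one training step on a tiny linear classifier (a stand-in for illustration, not the ConvNet built below):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)  # 4 input features -> 3 classes
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)          # a batch of 8 samples
y = torch.randint(0, 3, (8,))  # their class labels

# Feed-forward stage: weights are fixed; compute outputs and the cross-entropy loss
logits = model(x)
loss = criterion(logits, y)

# Back-propagation stage: gradients of the loss adjust every weight
optimizer.zero_grad()
loss.backward()
before = model.weight.detach().clone()
optimizer.step()
print(bool((model.weight != before).any()))  # True: the weights were updated
```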
Pooling: shrinking the image to obtain coarse-grained, large-scale information; a summary and abstraction of the original image.
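Max pooling, the variant used below, keeps only the largest value in each window (a minimal sketch on a toy 4x4 image):

```python
import torch
import torch.nn.functional as F

# A 4x4 single-channel image; 2x2 max pooling keeps the largest value per window
img = torch.tensor([[ 1.,  2.,  5.,  6.],
                    [ 3.,  4.,  7.,  8.],
                    [ 9., 10., 13., 14.],
                    [11., 12., 15., 16.]]).view(1, 1, 4, 4)

pooled = F.max_pool2d(img, kernel_size=2)
print(pooled.shape)  # torch.Size([1, 1, 2, 2]) -- each side shrinks by 2x
print(pooled)        # the maximum of each 2x2 window survives
```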
Hyperparameters: values set by hand that determine the overall architecture, such as the number of layers, number of neurons, kernel window size, number of kernels, padding size, and pooling window size.
Parameters: values not set by hand; the network learns them automatically during training.
Activation function: gives the network its capacity for non-linear modelling.
Loss function: measures the gap between the network's predicted outputs and the actual values.
Dropout: during training, temporarily discards some neurons at random with a given probability, so each batch effectively trains a different network; at test time all neurons are used again. This improves the model's generalisation.
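The train/test asymmetry of dropout can be seen directly with nn.Dropout (a minimal sketch; the 0.5 rate here is just for illustration, the network below uses 0.4):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)  # each neuron dropped with probability 0.5
x = torch.ones(1000)

drop.train()       # training mode: random neurons are zeroed, and
y_train = drop(x)  # survivors are scaled by 1/(1-p) to preserve the mean
drop.eval()        # test mode: all neurons are used, no scaling
y_test = drop(x)

print((y_train == 0).float().mean())  # roughly half the units are dropped
print(torch.equal(y_test, x))         # True: dropout is a no-op at test time
```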
3. Implementing the handwritten digit recogniser
3.1 Data preparation
import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.optim as optim
import torch.nn.functional as F
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
torch.backends.cudnn.enabled = True
# Hyperparameters
image_size = 28  # image resolution 28*28
num_class = 10
num_epochs = 60
num_workers = 2
batch_size = 128
train_dataset = dsets.MNIST(root='./data',
                            train=True,
                            transform=transforms.Compose([transforms.ToTensor(),
                                                          transforms.RandomHorizontalFlip(),  # flip each image with probability 0.5
                                                          transforms.Normalize(mean=0.5, std=0.5)
                                                          ]),
                            download=True)
test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.Compose([transforms.ToTensor(),
                                                         transforms.Normalize(mean=0.5, std=0.5)
                                                         ]),
                           download=True)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)
# Split the test data in two: one part for validation, one part for testing
indices = range(len(test_dataset))
indices_val = indices[:4000]   # validation set
indices_test = indices[4000:]  # test set
# The samplers draw randomly from the original dataset, producing a permutation of the given indices that is then used to fetch the data
sampler_val = torch.utils.data.sampler.SubsetRandomSampler(indices_val)
sampler_test = torch.utils.data.sampler.SubsetRandomSampler(indices_test)
val_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False, sampler=sampler_val, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False, sampler=sampler_test, num_workers=num_workers)
Display the image and label of an arbitrary sample:
idx = 26
mnist_img = test_dataset[idx][0].numpy()  # the dataset supports indexing; each element is a (features, target) pair, and [0] selects the features
plt.imshow(mnist_img[0, ...])
print('Label:', test_dataset[idx][1])
[Figure: the displayed sample image and its label]
3.2 Building the network
class ConvNet(nn.Module):
    # Constructor, called whenever an instance of ConvNet is created
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(4, 8, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(image_size // 4 * image_size // 4 * 8, 512)
        self.fc2 = nn.Linear(512, num_class)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(self.pool(x)))
        x = self.pool(x)
        x = x.view(-1, image_size // 4 * image_size // 4 * 8)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training, p=0.4)  # randomly deactivate 40% of neurons to reduce overfitting
        x = F.log_softmax(self.fc2(x), dim=1)
        return x

    # Extract and return the feature maps of the two convolutional layers
    def retrieve_features(self, x):
        feature_map1 = F.relu(self.conv1(x))
        x = self.pool(feature_map1)
        feature_map2 = F.relu(self.conv2(x))
        return (feature_map1, feature_map2)
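A quick sanity check of the shape arithmetic: two rounds of 2x2 pooling reduce 28x28 to 7x7, and 7*7*8 = 392 matches fc1's input size. The sketch below re-creates the layer sizes standalone (outside the class) and pushes a dummy batch through:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# The same layer sizes as ConvNet above, re-created standalone:
# 28x28 -> conv1 (1->4 ch) -> pool -> 14x14 -> conv2 (4->8 ch) -> pool -> 7x7
conv1 = nn.Conv2d(1, 4, kernel_size=3, padding=1)
conv2 = nn.Conv2d(4, 8, kernel_size=3, padding=1)
pool = nn.MaxPool2d(2, 2)
fc1 = nn.Linear(7 * 7 * 8, 512)  # 392 inputs, matching in_features=392 in the printout below
fc2 = nn.Linear(512, 10)

x = torch.randn(2, 1, 28, 28)  # a dummy batch of 2 "images"
x = F.relu(conv1(x))
x = F.relu(conv2(pool(x)))
x = pool(x)
print(x.shape)                 # torch.Size([2, 8, 7, 7])
x = x.view(-1, 7 * 7 * 8)
out = F.log_softmax(fc2(F.relu(fc1(x))), dim=1)
print(out.shape)               # torch.Size([2, 10]) -- log-probabilities per class
```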
3.3 Running the model
net = ConvNet()
# Use multiple GPUs if available
if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net, device_ids=[0, 1])
net = net.to(device)
print(net)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(params=net.parameters(), lr=0.0001)
record = []   # record accuracies and other metrics
weights = []  # record the convolution kernels every so often

def rightness(output, target):
    # torch.max returns the maximum of each row along the given dimension, together with its index
    preds = output.data.max(dim=1, keepdim=True)[1]  # keepdim preserves the output's dimensionality
    return preds.eq(target.data.view_as(preds)).sum(), len(target)  # returns (number correct, number of samples)
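What rightness computes can be checked on a tiny fake batch (the function is restated here so the snippet runs on its own):

```python
import torch

def rightness(output, target):
    preds = output.data.max(dim=1, keepdim=True)[1]
    return preds.eq(target.data.view_as(preds)).sum(), len(target)

# Two fake score rows: row 0 predicts class 1, row 1 predicts class 0
output = torch.tensor([[0.1, 0.9],
                       [0.8, 0.2]])
target = torch.tensor([1, 1])  # true labels: class 1 for both samples
right, total = rightness(output, target)
print(int(right), total)       # 1 2 -- one of the two predictions is correct
```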
best_acc = 0.0   # best validation accuracy
best_epoch = 0   # epoch of the best accuracy
save_path = './ConvNet.pth'
for epoch in range(num_epochs):
    # Training
    net.train()        # re-enable dropout after the previous epoch's net.eval()
    train_rights = []  # number of correct predictions in this epoch
    # enumerate counts the loop iterations over train_loader and records the count in batch_idx
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data), Variable(target)  # data: digit image, target: its label
        output = net(data.to(device))
        loss = criterion(output, target.to(device))
        optimizer.zero_grad()  # clear the gradients of all optimised variables
        loss.backward()
        optimizer.step()       # a single optimisation step, updating all parameters
        train_rights.append(rightness(output, target.to(device)))
    # Validation
    net.eval()       # switch off all dropout layers
    val_rights = []  # number of correct predictions in validation
    with torch.no_grad():
        for (data, target) in val_loader:
            data, target = Variable(data), Variable(target)
            output = net(data.to(device))
            val_rights.append(rightness(output, target.to(device)))
    train_r = (sum([tup[0] for tup in train_rights]), sum([tup[1] for tup in train_rights]))
    val_r = (sum([tup[0] for tup in val_rights]), sum([tup[1] for tup in val_rights]))
    train_acc = 1.0 * train_r[0] / train_r[1]
    val_acc = 1.0 * val_r[0] / val_r[1]
    if val_acc > best_acc:
        best_acc = val_acc
        best_epoch = epoch + 1
        torch.save(net.state_dict(), save_path)
    print("[epoch {}] loss:{:.6f},train_acc:{:.2f}%,val_acc:{:.2f}%".format(
        epoch + 1, loss.item(),
        100 * train_acc, 100 * val_acc
    ))
    record.append((1 - train_acc, 1 - val_acc))
    base = net.module if isinstance(net, nn.DataParallel) else net  # unwrap DataParallel to reach the conv layers
    weights.append([base.conv1.weight.data.clone(), base.conv1.bias.data.clone(),
                    base.conv2.weight.data.clone(), base.conv2.bias.data.clone()])
print("best epoch: %d, best val_acc: %.2f" % (best_epoch, best_acc * 100))
The output is:
DataParallel(
  (module): ConvNet(
    (conv1): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (conv2): Conv2d(4, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fc1): Linear(in_features=392, out_features=512, bias=True)
    (fc2): Linear(in_features=512, out_features=10, bias=True)
  )
)
[epoch 1] loss:0.732432,train_acc:61.29%,val_acc:73.60%
[epoch 2] loss:0.699121,train_acc:79.38%,val_acc:78.98%
[epoch 3] loss:0.430334,train_acc:83.38%,val_acc:82.08%
[epoch 4] loss:0.310445,train_acc:85.96%,val_acc:84.68%
[epoch 5] loss:0.424920,train_acc:88.09%,val_acc:87.00%
[epoch 6] loss:0.297486,train_acc:89.76%,val_acc:88.73%
[epoch 7] loss:0.328308,train_acc:90.91%,val_acc:90.00%
[epoch 8] loss:0.198407,train_acc:92.00%,val_acc:90.85%
[epoch 9] loss:0.150639,train_acc:92.74%,val_acc:91.80%
[epoch 10] loss:0.186586,train_acc:93.15%,val_acc:92.58%
[epoch 11] loss:0.148867,train_acc:93.81%,val_acc:93.38%
[epoch 12] loss:0.161239,train_acc:94.33%,val_acc:93.50%
[epoch 13] loss:0.190747,train_acc:94.63%,val_acc:94.18%
[epoch 14] loss:0.141780,train_acc:94.93%,val_acc:94.33%
[epoch 15] loss:0.137817,train_acc:95.16%,val_acc:94.70%
[epoch 16] loss:0.092569,train_acc:95.43%,val_acc:95.00%
[epoch 17] loss:0.115552,train_acc:95.61%,val_acc:95.12%
[epoch 18] loss:0.155165,train_acc:95.85%,val_acc:95.53%
[epoch 19] loss:0.127627,train_acc:96.06%,val_acc:95.28%
[epoch 20] loss:0.053196,train_acc:96.17%,val_acc:95.85%
[epoch 21] loss:0.152282,train_acc:96.34%,val_acc:95.80%
[epoch 22] loss:0.047420,train_acc:96.44%,val_acc:95.90%
[epoch 23] loss:0.097075,train_acc:96.61%,val_acc:96.03%
[epoch 24] loss:0.209956,train_acc:96.66%,val_acc:96.25%
[epoch 25] loss:0.034327,train_acc:96.83%,val_acc:96.13%
[epoch 26] loss:0.238308,train_acc:96.90%,val_acc:96.40%
[epoch 27] loss:0.023966,train_acc:96.95%,val_acc:96.60%
[epoch 28] loss:0.161187,train_acc:97.05%,val_acc:96.18%
[epoch 29] loss:0.019604,train_acc:97.08%,val_acc:96.65%
[epoch 30] loss:0.041736,train_acc:97.20%,val_acc:96.70%
[epoch 31] loss:0.075512,train_acc:97.29%,val_acc:96.48%
[epoch 32] loss:0.103057,train_acc:97.38%,val_acc:96.45%
[epoch 33] loss:0.136958,train_acc:97.49%,val_acc:96.68%
[epoch 34] loss:0.143319,train_acc:97.41%,val_acc:96.78%
[epoch 35] loss:0.060183,train_acc:97.49%,val_acc:96.88%
[epoch 36] loss:0.032935,train_acc:97.58%,val_acc:96.93%
[epoch 37] loss:0.076284,train_acc:97.60%,val_acc:96.95%
[epoch 38] loss:0.040283,train_acc:97.65%,val_acc:96.95%
[epoch 39] loss:0.064808,train_acc:97.70%,val_acc:97.03%
[epoch 40] loss:0.231935,train_acc:97.83%,val_acc:96.85%
[epoch 41] loss:0.049855,train_acc:97.80%,val_acc:96.95%
[epoch 42] loss:0.042273,train_acc:97.84%,val_acc:97.13%
[epoch 43] loss:0.065264,train_acc:97.86%,val_acc:97.25%
[epoch 44] loss:0.147135,train_acc:97.84%,val_acc:97.23%
[epoch 45] loss:0.052399,train_acc:97.95%,val_acc:97.05%
[epoch 46] loss:0.053043,train_acc:97.90%,val_acc:97.13%
[epoch 47] loss:0.104675,train_acc:98.08%,val_acc:97.18%
[epoch 48] loss:0.042580,train_acc:98.06%,val_acc:97.20%
[epoch 49] loss:0.127764,train_acc:98.01%,val_acc:97.43%
[epoch 50] loss:0.038456,train_acc:98.10%,val_acc:97.50%
[epoch 51] loss:0.077706,train_acc:98.20%,val_acc:97.33%
[epoch 52] loss:0.072369,train_acc:98.17%,val_acc:97.40%
[epoch 53] loss:0.072277,train_acc:98.16%,val_acc:97.23%
[epoch 54] loss:0.036564,train_acc:98.22%,val_acc:97.30%
[epoch 55] loss:0.053939,train_acc:98.33%,val_acc:97.38%
[epoch 56] loss:0.103391,train_acc:98.31%,val_acc:97.38%
[epoch 57] loss:0.105614,train_acc:98.26%,val_acc:97.40%