The CAM pipeline, implemented in PyTorch
Updated 2021/2/28
I previously wrote a simplified version () of this visualization; the simplified version did not take the relationships between channels into account. This post walks through the CAM pipeline.
The next post:
Contents
Flowchart
Algorithm overview
1. Feed the image you want to visualize into the network and get its predicted class
2. Grab the output feature maps of the last convolutional layer
3. Use the predicted class to pick the corresponding weights, weight each channel of the feature maps with them, and sum the channels into a single-channel map
An example
Suppose we feed an image through the network and it is predicted to be class 500 (out of 1000 classes). The captured feature maps have shape (1, 512, 13, 13), and assume the classification head consists of a 1 x 1 convolution (which counts as part of the classification head here, not as the last convolutional layer) followed by global average pooling. The 1000 classes then come with 1000 sets of weights, i.e. there are 1000 different ways to weight the feature maps. Each set of weights attends to different things, which is why we need to know which class the image belongs to. Once we know it is class 500, we simply take the 500th class's weights and apply them to the feature maps.
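To make the weighting step concrete, here is a minimal sketch with dummy data; the shapes follow the example above, and the array names are made up for illustration (the real code appears in the walkthrough below):

import numpy as np

# dummy stand-ins for the real tensors
feature_map = np.random.rand(1, 512, 13, 13)   # output of the last conv layer
fc_weights  = np.random.rand(1000, 512)        # one 512-dim weight vector per class
class_idx   = 500                              # the predicted class

# weighted sum over the 512 channels -> a single 13 x 13 activation map
cam = fc_weights[class_idx].dot(feature_map.reshape(512, 13 * 13))
cam = cam.reshape(13, 13)
print(cam.shape)   # (13, 13)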
CAM comes with one constraint: it relies on a global average pooling operation. If the network ends with several fully connected layers, CAM no longer applies. Take VGG16, for example: the last convolutional layer is followed by three fully connected layers, and since the convolutional feature maps have to be flattened before entering them, after three fully connected layers it is hard to trace the relationship between channels, so the importance weight of each feature-map channel can no longer be computed. In that case you need the Grad-CAM algorithm instead.
Code walkthrough
1. Import the packages and read the class labels
from PIL import Image
import torch
from torchvision import models, transforms
from torch.autograd import Variable
import torch.nn.functional as F
import numpy as np
import cv2
import json

# read the class labels of the ImageNet dataset
json_path = './cam/labels.json'
with open(json_path, 'r') as load_f:
    load_json = json.load(load_f)
classes = {int(key): value for (key, value) in load_json.items()}
2. Read the image and preprocess it
# read an image from the ImageNet dataset
img_path = './cam/9933031-large.jpg'

normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)
# image preprocessing
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize
])
img_pil = Image.open(img_path)
img_tensor = preprocess(img_pil)
img_variable = Variable(img_tensor.unsqueeze(0))
3. Load a pretrained model
# load a pretrained model
model_id = 1
if model_id == 1:
    net = models.squeezenet1_1(pretrained=False)
    pthfile = r'./pretrained/squeezenet1_1-f364aa15.pth'
    net.load_state_dict(torch.load(pthfile))
    finalconv_name = 'features'  # module that produces the conv feature maps
elif model_id == 2:
    net = models.resnet18(pretrained=False)
    finalconv_name = 'layer4'
elif model_id == 3:
    net = models.densenet161(pretrained=False)
    finalconv_name = 'features'
net.eval()  # switch to evaluation mode
print(net)
I only downloaded the weights for squeezenet1_1; if you want to use the other two models, adapt the code in the same way.
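If you would rather not keep a local .pth file, a simpler option (assuming you have network access) is to let torchvision download the weights for you:

net = models.squeezenet1_1(pretrained=True)  # downloads the weights on first use
finalconv_name = 'features'
net.eval()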
The printed model:
SqueezeNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (3): Fire(
      (squeeze): Conv2d(64, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (4): Fire(
      (squeeze): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (6): Fire(
      (squeeze): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (7): Fire(
      (squeeze): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (8): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (9): Fire(
      (squeeze): Conv2d(256, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (10): Fire(
      (squeeze): Conv2d(384, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (11): Fire(
      (squeeze): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (12): Fire(
      (squeeze): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AdaptiveAvgPool2d(output_size=(1, 1))
  )
)
You can see that the feature-extraction part lives in (features) and the classification head in (classifier).
4. Get the feature maps
features_blobs = []   # will hold the captured feature maps

def hook_feature(module, input, output):
    features_blobs.append(output.data.cpu().numpy())

# capture the output of the `features` module
net._modules.get(finalconv_name).register_forward_hook(hook_feature)
register_forward_hook lets you capture the output of an intermediate layer; look it up if you want more details. A minimal standalone sketch is shown below.
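Here is a minimal self-contained sketch of how a forward hook behaves (a dummy module, not part of the CAM code):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
outputs = []

def hook(module, input, output):
    # called every time `conv` runs a forward pass
    outputs.append(output.detach())

handle = conv.register_forward_hook(hook)
_ = conv(torch.randn(1, 3, 13, 13))
print(outputs[0].shape)   # torch.Size([1, 8, 13, 13])
handle.remove()           # remove the hook once it is no longer needed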
5. Get the weights
# get the weights
net_name = []
params = []
for name, param in net.named_parameters():
    net_name.append(name)
    params.append(param)
print(net_name[-1], net_name[-2])  # classifier.1.bias classifier.1.weight
print(len(params))                 # 52
weight_softmax = np.squeeze(params[-2].data.numpy())  # shape: (1000, 512)
params holds every weight of the model, so how do we index the one we need? Look back at the printed model: pooling, dropout and ReLU layers carry no parameters, so counting only the convolution layers (each contributes a weight and a bias) gives 52 parameter tensors in total. The weights connecting the features module to the classifier module are the parameters of (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1)) inside classifier. You can also check the printed net_name: index -1 is the bias of classifier.1 and index -2 is its weight. The parameter we want is therefore the one at index -2; a couple of equivalent ways to fetch it directly are sketched below.
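Instead of counting positions in params, you could also (a hedged alternative, not part of the original code) grab the same tensor directly from the classifier module or the state dict:

# two equivalent ways to get the (1000, 512, 1, 1) weight of classifier.1
w1 = net.classifier[1].weight.data.numpy()
w2 = net.state_dict()['classifier.1.weight'].numpy()
weight_softmax = np.squeeze(w1)   # shape: (1000, 512)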
logit = net(img_variable)             # forward the input image through the network
print(logit.shape)                    # torch.Size([1, 1000])
print(params[-2].data.numpy().shape)  # 1000 sets of weights: (1000, 512, 1, 1)
print(features_blobs[0].shape)        # feature map size: (1, 512, 13, 13)

# there are 1000 classes; sort the scores and keep the sorted indices
h_x = F.softmax(logit, dim=1).data.squeeze()
print(h_x.shape)                      # torch.Size([1000])
probs, idx = h_x.sort(0, True)
probs = probs.numpy()                 # probabilities in descending order
idx = idx.numpy()                     # class indices, highest probability first
# look at the top-5 classes and their probabilities
for i in range(0, 5):
    print('{:.3f} -> {}'.format(probs[i], classes[idx[i]]))
'''
0.678 -> mountain bike, all-terrain bike, off-roader
0.088 -> bicycle-built-for-two, tandem bicycle, tandem
0.042 -> unicycle, monocycle
0.038 -> horse cart, horse-cart
0.019 -> lakeside, lakeshore
'''
6. Define the function that computes the CAM
# define the function that computes the CAM
def returnCAM(feature_conv, weight_softmax, class_idx):
    # upsample the class activation map to 256 x 256
    size_upsample = (256, 256)
    bz, nc, h, w = feature_conv.shape
    output_cam = []
    # apply the weights to the conv feature maps:
    # weight_softmax.shape is (1000, 512)
    # feature_conv.shape is (1, 512, 13, 13)
    # weight_softmax[class_idx] selects a single class's weights, so its shape is (1, 512)
    # after reshape((nc, h * w)), feature_conv has shape (512, 169)
    cam = weight_softmax[class_idx].dot(feature_conv.reshape((nc, h * w)))
    print(cam.shape)         # after the matrix product every channel has been weighted; shape is (1, 169)
    cam = cam.reshape(h, w)  # a single-channel feature map
    # normalize all elements of the map to 0-1
    cam_img = (cam - cam.min()) / (cam.max() - cam.min())
    # then rescale to 0-255
    cam_img = np.uint8(255 * cam_img)
    output_cam.append(cv2.resize(cam_img, size_upsample))
    return output_cam
7. Generate the images
# generate the class activation map for the highest-probability class
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[0]])
# blend the class activation map with the original image
img = cv2.imread(img_path)
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)), cv2.COLORMAP_JET)
result = heatmap * 0.3 + img * 0.7
cv2.imwrite('CAM0.jpg', result)
I will not repeat what cv2.applyColorMap does here; it was covered in the previous post.
# generate the class activation map for the fifth-ranked class
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[4]])
# blend the class activation map with the original image
img = cv2.imread(img_path)
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)), cv2.COLORMAP_JET)
result = heatmap * 0.3 + img * 0.7
cv2.imwrite('CAM1.jpg', result)