
CNN style transfer_[Paper Reading] Style Transfer Using Convolutional Networks

Updated: 2023-06-05 02:09:11


The paper: Image Style Transfer Using Convolutional Neural Networks.

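A popular application of style transfer is Prisma. For an introduction, see the article "AI修图艺术：Prisma背后的奇妙算法 | 深度" (The Art of AI Photo Editing: The Wonderful Algorithm Behind Prisma | In Depth). A glaring flaw of that article, though, is that Prisma is not based on the method in this paper. The article says: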

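Here alpha and beta are the weights of content and style, respectively. If alpha/beta is high, the generated image emphasizes content and is less stylized; if it is low, the image is strongly stylized but the content is diluted. This is the principle behind the adjustment Prisma offers its users.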

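However, changing this weight ratio means re-running the whole optimization, and Prisma does its processing in the cloud, so network latency would not allow smooth adjustment. If you are interested, download Prisma and play with it; it is a fun app.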

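"Prisma 背后的算法是怎样炼成的？它有什么其他应用价值？" (How was the algorithm behind Prisma made, and what other uses does it have?) offers a more expert analysis. In its comments I also found Naiyan Wang, an author of the paper Demystifying Neural Style Transfer; these people are real experts. That paper is on my reading list too (although I have recently found style transfer to be of limited use, mainly because its results cannot be quantified, so that plan is shelved indefinitely).

One answer in that Zhihu thread explains style transfer very clearly; here is an excerpt: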

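A convolutional neural network (CNN) can learn many kinds of image features:

1. Lower layers (close to the input) capture low-level information such as local patterns, edges and strokes;

2. Higher layers (close to the output) capture high-level information such as object parts and features;

Style transfer uses an algorithm to make the low-level representations of the output image match those of the specified "style" image while keeping the high-level representations of the original image, so that the transformed image looks like the original but has a different style.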

0. Abstract

A key to style transfer is separating content from style. This idea runs through the whole paper, so keep it in mind:

Arguably, a major limiting factor for previous approaches has been the lack of image representations that explicitly represent semantic information and, thus, allow to separate image content from style.

1. Introduction

Style transfer moves the texture of a specified "style" image into the output image while preserving the semantic content of the target image:

Transferring the style from one image onto another can be considered a problem of texture transfer. In texture transfer the goal is to synthesise a texture from a source image while constraining the texture synthesis in order to preserve the semantic content of a target image.

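Earlier algorithms used only low-level image features, whereas ideal style transfer should also preserve the semantic information of the target image.

The paper uses a CNN to obtain both low-level and high-level features.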

2. Deep image representations

The paper uses the VGG-19 network and normalizes (rescales) its weights:

We normalized the network by scaling the weights such that the mean activation of each convolutional filter over images and positions is equal to one.

At first I did not quite understand why the paper says this normalization does not change the network's output:

Such re-scaling can be done for the VGG network without changing its output, because it contains only rectifying linear activation functions and no normalization or pooling over feature maps.
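Why the output stays the same can be checked on a toy example (my own sketch, not code from the paper): ReLU is positively homogeneous, max(0, c·z) = c·max(0, z) for c > 0, so multiplying a filter's weights and bias by c multiplies its responses by c, and dividing the next layer's weights for that channel by c cancels the change. The tiny fully connected network below is a hypothetical stand-in for VGG-19; the scale factors are chosen, as in the quoted sentence, so that each filter's mean activation over the toy "images" becomes one.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def dense(W, b, x):
    """Affine map: y_i = sum_j W[i][j] * x[j] + b[i]."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def net(W1, b1, W2, b2, x):
    """One 'filter' layer with ReLU, then a linear read-out layer."""
    return dense(W2, b2, relu(dense(W1, b1, x)))

# Hypothetical toy weights: 2 inputs -> 3 filters -> 1 output.
W1 = [[1.0, -2.0], [0.5, 3.0], [-1.0, 1.0]]
b1 = [0.1, -0.2, 0.3]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.05]
images = [[0.7, -0.4], [1.0, 1.0], [-0.5, 0.2]]  # stand-ins for a dataset

# Mean activation of each filter over all toy images.
acts = [relu(dense(W1, b1, x)) for x in images]
means = [sum(a[i] for a in acts) / len(acts) for i in range(len(W1))]

# Scale filter i by c_i = 1 / mean_i (> 0) and undo it in the next layer.
c = [1.0 / m for m in means]
W1s = [[ci * w for w in row] for ci, row in zip(c, W1)]
b1s = [ci * bi for ci, bi in zip(c, b1)]
W2s = [[w / ci for w, ci in zip(row, c)] for row in W2]

for x in images:
    print(net(W1, b1, W2, b2, x), net(W1s, b1s, W2s, b2, x))
```

The two columns agree up to floating-point rounding. A normalization across feature maps, or pooling across channels, would break this argument, which is why the quote singles them out.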

The paper replaces max pooling with average pooling, reporting that this gives slightly more appealing results.
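A minimal sketch of the two pooling operations on a single feature map, assuming 2×2 windows with stride 2 (the pooling size VGG uses); `pool2x2` is a hypothetical helper, not a library function.

```python
def pool2x2(fmap, mode="avg"):
    """2x2 pooling with stride 2 on one feature map given as a list of rows.
    Assumes even height and width."""
    out = []
    for r in range(0, len(fmap), 2):
        row = []
        for c in range(0, len(fmap[0]), 2):
            window = [fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1]]
            row.append(sum(window) / 4.0 if mode == "avg" else max(window))
        out.append(row)
    return out

fmap = [[1.0, 3.0, 0.0, 0.0],
        [5.0, 7.0, 0.0, 8.0],
        [2.0, 2.0, 1.0, 1.0],
        [2.0, 2.0, 1.0, 1.0]]
print(pool2x2(fmap, "max"))  # [[7.0, 8.0], [2.0, 1.0]]
print(pool2x2(fmap, "avg"))  # [[4.0, 2.0], [2.0, 1.0]]
```

One plausible intuition (mine, not the paper's): when the image is optimized by backpropagation, average pooling passes gradient to every position in a window, while max pooling passes it only to the position of the maximum.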

2.1. Content representation

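First, let's work out what the variables mean:

A layer with $N_l$ distinct filters has $N_l$ feature maps each of size $M_l$, where $M_l$ is the height times the width of the feature map.

Here $l$ is the index of the layer, $N_l$ is the number of filters (which is also the number of channels), and $M_l$ is the area of each feature map. The responses of layer $l$ are stored in a matrix $F^l \in \mathbb{R}^{N_l \times M_l}$, where $F^l_{ij}$ is the activation of the $i$-th filter at position $j$.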
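To make the notation concrete: the $N_l$ feature maps of a layer, each of height H and width W, can be flattened row-major into an $N_l \times M_l$ matrix with $M_l = H \cdot W$, so that entry (i, j) is the response of filter i at position j. A sketch with a hypothetical two-filter layer; `to_feature_matrix` is an illustrative name.

```python
def to_feature_matrix(fmaps):
    """Flatten N_l feature maps (each H x W, given as lists of rows) into an
    N_l x M_l matrix: F[i][j] = response of filter i at row-major position j."""
    return [[v for row in fmap for v in row] for fmap in fmaps]

# Hypothetical layer with N_l = 2 filters on a 2 x 3 grid, so M_l = 6.
fmaps = [
    [[1, 2, 3], [4, 5, 6]],
    [[0, 0, 1], [1, 0, 0]],
]
F = to_feature_matrix(fmaps)
N_l, M_l = len(F), len(F[0])
print(N_l, M_l)  # 2 6
print(F[1])      # [0, 0, 1, 1, 0, 0]
```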

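Here is a figure from the paper (Figure 1: image representations in a CNN, with content and style reconstructions):

At first I did not understand the "Content Reconstructions". Even setting aside the pooling layers, an image changes a lot after convolution, yet the reconstructed images look very much like the original. For a long time I did not see what "reconstruction" meant, but it turns out the paper explains it quite clearly: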

To visualise the image information that is encoded at different layers of the hierarchy one can perform gradient descent on a white noise image to find another image that matches the feature responses of the original image.

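What does this mean? First think about backpropagation: its basic derivation relies on the relation between the gradient at one layer and the gradient at the next. Usually the gradient originates at the fully connected output layer, but if you think about it, every layer has such a gradient matrix. The idea of this paper is to pass a white noise image and the input image through the same convolutions. At layer $l$, the white noise image's feature map is treated as the node to be trained and the input image's feature map as the label. This gives the gradient at layer $l$, and backpropagation then pushes the white noise image to change.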

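The paper then defines the loss function

$$\mathcal{L}_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2}\sum_{i,j}\left(F^l_{ij} - P^l_{ij}\right)^2$$

where $\vec{p}$ is the original image, $\vec{x}$ is the generated image, and $P^l$ and $F^l$ are their feature representations in layer $l$. It then takes the derivative. I understood the case greater than 0 (just differentiate the squared term), but not the case less than 0, where the derivative is forced to 0:

$$\frac{\partial \mathcal{L}_{content}}{\partial F^l_{ij}} = \begin{cases} \left(F^l - P^l\right)_{ij} & \text{if } F^l_{ij} > 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$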
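A sketch of the content loss, one half of the summed squared difference between the generated image's responses F and the original image's responses P, together with its gradient. My assumption about the zero case (not spelled out in the paper): F is the output of a ReLU, so the gradient passed further back is taken with respect to the ReLU's input, and ReLU's derivative is 0 wherever that input is negative. The function therefore takes the pre-activations `Z`; `content_loss_and_grad` and the toy values are hypothetical.

```python
def content_loss_and_grad(Z, P):
    """Content loss at one layer and its gradient w.r.t. the pre-activations.

    Z: pre-activation responses (N_l x M_l) of the generated image.
    P: ReLU responses of the original image at the same layer.
    F = relu(Z); L = 1/2 * sum (F - P)^2.
    dL/dZ = (F - P) where Z > 0, and 0 elsewhere (ReLU is flat there).
    """
    F = [[max(0.0, z) for z in row] for row in Z]
    loss = 0.5 * sum((f - p) ** 2 for f_row, p_row in zip(F, P) for f, p in zip(f_row, p_row))
    grad = [[(f - p) if z > 0 else 0.0 for z, f, p in zip(z_row, f_row, p_row)]
            for z_row, f_row, p_row in zip(Z, F, P)]
    return loss, grad

Z = [[2.0, -1.0, 0.5]]  # hypothetical pre-activations (N_l = 1, M_l = 3)
P = [[1.0, 0.5, 0.5]]   # hypothetical responses of the original image
loss, grad = content_loss_and_grad(Z, P)
print(loss, grad)  # 0.625 [[1.0, 0.0, 0.0]]
```

The middle entry shows the point: there F - P = -0.5, yet the gradient is 0 because that unit is switched off, so nudging its input slightly does not change the loss.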


With the loss function in place, backpropagation can be used:

Thus we can change the initially random image $\vec{x}$ until it generates the same response in a certain layer of the Convolutional Neural Network as the original image $\vec{p}$.
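The reconstruction idea (start from white noise and descend on the image until its response at one layer matches the original's) can be sketched with a toy model. To keep the example short and guaranteed to converge, I assume a single linear "layer" `W` (5 filters over a 3-pixel "image") instead of VGG-19's convolutions, ReLUs and pooling; here the gradient W^T (F - P) is exactly what backpropagation would deliver to the input. Plain gradient descent stands in for the L-BFGS used in the paper.

```python
import random

def matvec(W, x):
    """W x for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matTvec(W, v):
    """W^T v."""
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(len(W[0]))]

# Hypothetical linear "layer": 5 filters over a 3-pixel image (stand-in for VGG).
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
p = [0.2, -0.7, 1.5]  # the original image
P = matvec(W, p)      # its response at this layer: the "label"

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in p]  # white-noise initialisation
lr = 0.1
for _ in range(300):
    F = matvec(W, x)                          # response of the current image
    residual = [f - t for f, t in zip(F, P)]  # F - P
    loss = 0.5 * sum(r * r for r in residual)
    grad_x = matTvec(W, residual)             # dL/dx = W^T (F - P)
    x = [xi - lr * g for xi, g in zip(x, grad_x)]

print([round(v, 6) for v in x], loss)  # x ends up very close to p
```

With this `W` the loss is a convex quadratic whose Hessian W^T W has eigenvalues 1, 2 and 4, so any step size below 0.5 converges. In a real CNN the loss is non-convex and many images can share the same high-level response, so reconstructions need not reproduce the original pixels.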

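The subsequent analysis of low-level versus high-level features is also excellent: reconstructions from lower layers reproduce the exact pixel values, while higher layers only retain the high-level content, so a higher layer is used for the content representation.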

2.2. Style representation


I do not quite understand the principle behind the Gram matrix described in the paper:

It consists of the correlations between the different filter responses, where the expectation is taken over the spatial extent of the feature maps.

Its role:

By including the feature correlations of multiple layers, we obtain a stationary, multi-scale representation of the input image, which captures its texture information but not the global arrangement.
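A sketch of the Gram matrix as the paper defines it, $G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$ (the inner product of filter i's and filter j's responses over all positions k), plus a small check of the quoted claim that it keeps texture statistics but not the global arrangement: shuffling the spatial positions, identically for every filter, leaves G unchanged. The feature matrix is made up.

```python
def gram(F):
    """G[i][j] = sum_k F[i][k] * F[j][k]: correlations between filter responses,
    summed over all M_l spatial positions k."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in F] for fi in F]

# Hypothetical layer: N_l = 2 filters, M_l = 4 positions.
F = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]
G = gram(F)
print(G)  # [[6.0, 5.0], [5.0, 11.0]]

# Shuffle the spatial positions (the same permutation for every filter).
perm = [3, 1, 0, 2]
F_shuffled = [[row[k] for k in perm] for row in F]
print(gram(F_shuffled) == G)  # True: where things are in the image is discarded
```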

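Each layer then contributes a loss

$$E_l = \frac{1}{4N_l^2M_l^2}\sum_{i,j}\left(G^l_{ij} - A^l_{ij}\right)^2$$

where $G^l_{ij} = \sum_k F^l_{ik}F^l_{jk}$ is the Gram matrix of the generated image $\vec{x}$ in layer $l$ and $A^l$ is that of the artwork $\vec{a}$. The total style loss is

$$\mathcal{L}_{style}(\vec{a}, \vec{x}) = \sum_{l=0}^{L} w_l E_l$$

Note that, unlike the content loss, which uses only one layer, this is a weighted sum over several layers. My understanding, based on how the paper's Section 4 (Discussion) describes style, is that style can be a low-level feature such as brush strokes or a high-level feature such as the composition of a scene, so the losses of several layers have to be combined: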

The separation of image content from style is not necessarily a well defined problem. This is mostly because it is not clear what exactly defines the style of an image. It might be the brush strokes in a painting, the colour map, certain dominant forms and shapes, but also the composition of a scene and the choice of the subject of the image - and probably it is a mixture of all of them and many more.
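A sketch of the per-layer style loss $E_l = \frac{1}{4N_l^2M_l^2}\sum_{i,j}(G^l_{ij} - A^l_{ij})^2$ and the weighted sum $\sum_l w_l E_l$ over several layers. The two toy layers, the "artwork" features and the equal weights $w_l = 1/2$ are assumptions for illustration only.

```python
def gram(F):
    """G_ij = sum_k F_ik * F_jk."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in F] for fi in F]

def layer_style_loss(F, A):
    """E_l = 1/(4 N_l^2 M_l^2) * sum_ij (G_ij - A_ij)^2, where G = gram(F) is the
    generated image's Gram matrix and A is the artwork's at the same layer."""
    N, M = len(F), len(F[0])
    G = gram(F)
    sq = sum((g - a) ** 2 for g_row, a_row in zip(G, A) for g, a in zip(g_row, a_row))
    return sq / (4.0 * N * N * M * M)

def style_loss(gen_feats, art_feats, weights):
    """L_style = sum_l w_l * E_l over the chosen layers."""
    return sum(w * layer_style_loss(F, gram(Fa))
               for F, Fa, w in zip(gen_feats, art_feats, weights))

# Two hypothetical layers with (N_l, M_l) = (2, 2) and (1, 2).
gen_feats = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
art_feats = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0]]]
weights = [0.5, 0.5]  # assumed equal layer weights w_l
print(style_loss(gen_feats, art_feats, weights))  # 0.125
```

The first layer already matches (E = 0); in the second, G = [[2]] versus A = [[4]] gives E = 4/16 = 0.25, weighted by 0.5.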

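For the final partial derivative, the trick is the chain rule through the Gram matrix:

$$\frac{\partial E_l}{\partial F^l_{ij}} = \sum_{m,n}\frac{\partial E_l}{\partial G^l_{mn}}\,\frac{\partial G^l_{mn}}{\partial F^l_{ij}}$$

As in Section 2.1, I do not understand the case below 0:

$$\frac{\partial E_l}{\partial F^l_{ij}} = \begin{cases} \frac{1}{N_l^2M_l^2}\left(\left(F^l\right)^{\mathrm{T}}\left(G^l - A^l\right)\right)_{ji} & \text{if } F^l_{ij} > 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$

The derivation for the case above 0 is:

$$\frac{\partial E_l}{\partial G^l_{mn}} = \frac{1}{2N_l^2M_l^2}\left(G^l_{mn} - A^l_{mn}\right)$$

and, since $G^l_{mn} = \sum_k F^l_{mk}F^l_{nk}$,

$$\frac{\partial G^l_{mn}}{\partial F^l_{ij}} = \delta_{mi}F^l_{nj} + \delta_{ni}F^l_{mj}$$

Substituting, and using the symmetry of $G^l - A^l$, the two resulting terms are equal:

$$\frac{\partial E_l}{\partial F^l_{ij}} = \frac{1}{N_l^2M_l^2}\sum_n \left(G^l - A^l\right)_{in}F^l_{nj} = \frac{1}{N_l^2M_l^2}\left(\left(F^l\right)^{\mathrm{T}}\left(G^l - A^l\right)\right)_{ji}$$

Note the subscript: it is $ji$, not $ij$, because $\left(F^l\right)^{\mathrm{T}}\left(G^l - A^l\right)$ is an $M_l \times N_l$ matrix.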
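Because the $ji$ index is easy to get wrong, a finite-difference check is a cheap way to confirm the closed form $\frac{1}{N_l^2M_l^2}((F^l)^{\mathrm T}(G^l - A^l))_{ji}$. This sketch covers only the active (> 0) case, i.e. it ignores ReLU masking, and uses made-up feature matrices; the helper names are hypothetical.

```python
def gram(F):
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in F] for fi in F]

def style_layer_loss(F, A):
    """E_l = 1 / (4 N^2 M^2) * sum_ij (G_ij - A_ij)^2."""
    N, M = len(F), len(F[0])
    G = gram(F)
    return sum((G[i][j] - A[i][j]) ** 2 for i in range(N) for j in range(N)) / (4.0 * N**2 * M**2)

def style_layer_grad(F, A):
    """Closed form dE_l/dF_ij = (F^T (G - A))_{ji} / (N^2 M^2), active units only."""
    N, M = len(F), len(F[0])
    G = gram(F)
    D = [[G[m][n] - A[m][n] for n in range(N)] for m in range(N)]
    # FtD is M x N: FtD[j][i] = sum_n F[n][j] * D[n][i]
    FtD = [[sum(F[n][j] * D[n][i] for n in range(N)) for i in range(N)] for j in range(M)]
    return [[FtD[j][i] / (N**2 * M**2) for j in range(M)] for i in range(N)]  # index ji

def numeric_grad(fn, F, h=1e-6):
    """Central finite differences, one entry at a time."""
    grad = []
    for i in range(len(F)):
        row = []
        for j in range(len(F[0])):
            Fp = [r[:] for r in F]
            Fm = [r[:] for r in F]
            Fp[i][j] += h
            Fm[i][j] -= h
            row.append((fn(Fp) - fn(Fm)) / (2 * h))
        grad.append(row)
    return grad

F = [[1.0, 0.5, 0.3], [0.2, 1.5, 0.7]]        # made-up generated-image features
A = gram([[0.5, 1.0, 0.0], [1.0, 0.0, 1.0]])  # made-up artwork Gram matrix
analytic = style_layer_grad(F, A)
numeric = numeric_grad(lambda X: style_layer_loss(X, A), F)
max_err = max(abs(a - b) for ra, rn in zip(analytic, numeric) for a, b in zip(ra, rn))
print(max_err)  # tiny, so the closed form (with index ji) matches
```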

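2.3. Style transfer

The idea for generating the style-transfer image is to make the new image close to both the content image and the style image at the same time: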

To transfer the style of an artwork $\vec{a}$ onto a photograph $\vec{p}$ we synthesise a new image that simultaneously matches the content representation of $\vec{p}$ and the style representation of $\vec{a}$. Thus we jointly minimise the distance of the feature representations of a white noise image from the content representation of the photograph in one layer and the style representation of the painting defined on a number of layers of the Convolutional Neural Network.

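The total loss function is

$$\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha\,\mathcal{L}_{content}(\vec{p}, \vec{x}) + \beta\,\mathcal{L}_{style}(\vec{a}, \vec{x})$$

Note that $\alpha$ and $\beta$ are weights that must be fixed before the optimization starts. This is one reason I say Prisma does not use this paper's algorithm (the other reason is speed!).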
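To see why results are described by the ratio α/β, here is a deliberately simplified one-dimensional analogue (my own toy, not the paper's model): replace the content and style losses by (x - p)²/2 and (x - a)²/2. Setting the derivative of α(x - p)²/2 + β(x - a)²/2 to zero gives x* = (αp + βa)/(α + β), which depends on α and β only through their ratio and moves toward the content target p as α/β grows.

```python
def toy_minimiser(alpha, beta, p, a):
    """Minimiser of alpha * (x - p)**2 / 2 + beta * (x - a)**2 / 2.
    A hypothetical 1-D stand-in for alpha * L_content + beta * L_style."""
    return (alpha * p + beta * a) / (alpha + beta)

p, a = 0.0, 1.0  # hypothetical "content" target p and "style" target a
for ratio in [1e-3, 1e-1, 1.0, 10.0, 1e3]:  # alpha / beta, with beta = 1
    x_star = toy_minimiser(ratio, 1.0, p, a)
    print(f"alpha/beta = {ratio:g}: x* = {x_star:.4f}")
```

In the real method the two terms do not have a closed-form joint minimum, so every new ratio means another full optimization run.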

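For L-BFGS, see "数值优化：理解L-BFGS算法" (Numerical Optimization: Understanding the L-BFGS Algorithm) on 码农场. The math involved is fairly complex; I will go through it in detail when I have time.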
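The paper minimizes the total loss with L-BFGS. Below is a minimal textbook-style sketch of the algorithm (the two-loop recursion for the search direction), assuming a simple Armijo backtracking line search instead of the Wolfe line search that production implementations use; `lbfgs` and its parameters are my own names. In practice one would call a library routine such as SciPy's `scipy.optimize.minimize(..., method="L-BFGS-B")`.

```python
def lbfgs(f, grad, x0, m=5, max_iter=100, tol=1e-10):
    """Minimise f with L-BFGS: two-loop recursion for the search direction,
    Armijo backtracking for the step length (a sketch, not production code)."""

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = list(x0)
    g = grad(x)
    s_hist, y_hist = [], []  # last m steps s_k and gradient changes y_k
    for _ in range(max_iter):
        if dot(g, g) < tol:
            break
        # First loop: newest pair to oldest.
        q = list(g)
        coeffs = []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):
            rho = 1.0 / dot(y, s)
            a_i = rho * dot(s, q)
            coeffs.append((rho, a_i))
            q = [qi - a_i * yi for qi, yi in zip(q, y)]
        # Initial inverse-Hessian guess H0 = gamma * I.
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
        r = [gamma * qi for qi in q]
        # Second loop: oldest pair to newest.
        for s, y, (rho, a_i) in zip(s_hist, y_hist, reversed(coeffs)):
            b_i = rho * dot(y, r)
            r = [ri + (a_i - b_i) * si for ri, si in zip(r, s)]
        d = [-ri for ri in r]  # search direction, approximately -H^{-1} g
        # Armijo backtracking line search.
        fx, slope, t = f(x), dot(g, d), 1.0
        while f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [u - v for u, v in zip(x_new, x)]
        y = [u - v for u, v in zip(g_new, g)]
        if dot(s, y) > 1e-12:  # keep the pair only if the curvature s^T y > 0
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > m:
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x


# Usage on a simple quadratic with its minimum at (3, -1).
def f(v):
    return (v[0] - 3.0) ** 2 + 10.0 * (v[1] + 1.0) ** 2

def grad_f(v):
    return [2.0 * (v[0] - 3.0), 20.0 * (v[1] + 1.0)]

print(lbfgs(f, grad_f, [0.0, 0.0]))
```

Only the last m pairs (s, y) are stored, which is what makes L-BFGS practical when the variable is a whole image with hundreds of thousands of pixels.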

3. Results

These are mainly experiments; read them on your own.

3.1. Trade-off between content and style matching

Analyzes how the ratio $\alpha/\beta$ affects the result.

3.2. Effect of different layers of the Convolutional Neural Network

Compares how the choice of convolutional layers affects the result.

3.3. Initialisation of gradient descent

Compares how the initial image affects the result.

3.4. Photorealistic style transfer

The generated images look very realistic!

4. Discussion

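Synthesis is slow: it can take up to an hour on an Nvidia K40 GPU.

The synthesized images are sometimes affected by low-level noise.

One regret is that the quality of style transfer is hard to quantify, which runs counter to the basic spirit of science: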

We are fully aware though that this evaluation criterion is neither mathematically precise nor universally agreed upon.

Article link: https://www.wtabcd.cn/fanwen/fan/90/134310.html
