
Image Generation: Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN)

Updated: 2023-05-18 14:06:59


(Compiled from the book Deep Learning with Python.)

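The key idea in image generation is to learn a low-dimensional latent space of representations in which any point can be mapped to a realistic-looking image. The module that performs this mapping is called a generator (for GANs) or a decoder (for VAEs).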

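Strengths and weaknesses of VAEs and GANs:

VAEs are well suited to learning well-structured latent spaces (continuous and low-dimensional).

GANs generate realistic-looking images, but their latent space may lack such structure.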

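1. Variational autoencoder (VAE)

Goal: encode the input into a low-dimensional latent space and then decode it back, so that the output has the same dimensions as the original image.

1. Encoding: the input image is turned into the parameters of a statistical distribution, namely a mean and a variance (the code stores the variance as a log-variance).

2. Decoding: a point is randomly sampled from the normal distribution defined by those parameters and decoded back to the original input.

3. Loss function: a reconstruction loss (which forces the decoded sample to match the initial input) plus a regularization loss (which helps learn a well-structured latent space).

In outline: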

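# Pseudocode: encoder, decoder, exp, and epsilon are placeholders
# Encode the input into a mean and a log-variance
z_mean, z_log_variance = encoder(input_img)
# Draw a latent point; epsilon is random noise sampled from N(0, 1)
z = z_mean + exp(0.5 * z_log_variance) * epsilon
# Decode the latent point back to an image
reconstructed_img = decoder(z)
# The model maps an input image to its reconstruction
model = Model(input_img, reconstructed_img)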
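The sampling step in the outline above can be tried without any deep-learning library. The following is a minimal sketch in plain Python, assuming the mean and the log-variance are given as lists of floats. The helper name `sample_latent` is hypothetical and is not part of the article or of Keras.

```python
import math
import random

def sample_latent(z_mean, z_log_var, rng=random):
    """Draw z = mean + exp(0.5 * log_var) * epsilon, with epsilon ~ N(0, 1)."""
    z = []
    for mean, log_var in zip(z_mean, z_log_var):
        epsilon = rng.gauss(0.0, 1.0)      # standard normal noise
        std = math.exp(0.5 * log_var)      # log-variance -> standard deviation
        z.append(mean + std * epsilon)
    return z

# Usage: a 2D latent point with means (0.5, -1.0) and variances (1.0, 0.25)
rng = random.Random(0)
print(sample_latent([0.5, -1.0], [0.0, math.log(0.25)], rng))
```

Because the randomness is isolated in epsilon, the result is a plain arithmetic function of the mean and the log-variance, which is what lets gradients flow back to the encoder during training.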

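Full code:

import keras
import numpy as np
from keras import layers
from keras import backend as K
from keras.models import Model

latent_dim = 2  # dimensionality of the latent space (two-dimensional here)

# Latent-space sampling function.
# z_mean and z_log_var are the two outputs of the encoder network,
# which this article does not show.
def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim), mean=0., stddev=1.)
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

# Wrap the sampling function in a Lambda layer
z = layers.Lambda(sampling)([z_mean, z_log_var])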

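# VAE decoder network: maps latent-space points to images.
# shape_before_flattening is the shape of the encoder's feature map just before
# its Flatten layer (the encoder is not shown in this article).
decoder_input = layers.Input(K.int_shape(z)[1:])
x = layers.Dense(np.prod(shape_before_flattening[1:]), activation='relu')(decoder_input)
x = layers.Reshape(shape_before_flattening[1:])(x)
x = layers.Conv2DTranspose(32, 3, padding='same', activation='relu', strides=(2, 2))(x)
# Assumption: the article's listing stops at the line above. A final one-channel
# sigmoid layer is added here because the loss and the plotting code expect a
# single-channel image with values in [0, 1].
x = layers.Conv2D(1, 3, padding='same', activation='sigmoid')(x)
decoder = Model(decoder_input, x)
z_decoded = decoder(z)

# Custom layer used to compute the VAE loss
class CustomVariationalLayer(keras.layers.Layer):

    def vae_loss(self, x, z_decoded):
        x = K.flatten(x)
        z_decoded = K.flatten(z_decoded)
        # Reconstruction loss
        xent_loss = keras.metrics.binary_crossentropy(x, z_decoded)
        # Regularization (KL divergence) loss
        kl_loss = -5e-4 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
        return K.mean(xent_loss + kl_loss)

    def call(self, inputs):
        x = inputs[0]
        z_decoded = inputs[1]
        loss = self.vae_loss(x, z_decoded)
        self.add_loss(loss, inputs=inputs)
        # The return value is not used; the layer exists only to register the loss
        return x

y = CustomVariationalLayer()([input_img, z_decoded])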
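To make the two loss terms concrete, here is a minimal sketch in plain Python of a per-sample version of the same computation. The function names are hypothetical, the inputs are assumed to be flat lists of floats, and the -5e-4 weight simply mirrors the custom layer.

```python
import math

def binary_crossentropy(x, x_decoded, eps=1e-7):
    """Mean per-pixel binary cross-entropy between target x and prediction x_decoded."""
    total = 0.0
    for target, pred in zip(x, x_decoded):
        pred = min(max(pred, eps), 1.0 - eps)   # clip to avoid log(0)
        total -= target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred)
    return total / len(x)

def kl_term(z_mean, z_log_var, weight=-5e-4):
    """weight * mean(1 + log_var - mean^2 - exp(log_var)) over the latent dimensions."""
    terms = [1.0 + lv - m * m - math.exp(lv) for m, lv in zip(z_mean, z_log_var)]
    return weight * sum(terms) / len(terms)

def vae_loss(x, x_decoded, z_mean, z_log_var):
    """Reconstruction loss plus the weighted regularization term."""
    return binary_crossentropy(x, x_decoded) + kl_term(z_mean, z_log_var)

# A posterior equal to the N(0, 1) prior incurs no regularization penalty
print(kl_term([0.0, 0.0], [0.0, 0.0]))
# Moving a mean away from 0 increases the penalty
print(kl_term([2.0, 0.0], [0.0, 0.0]))
```

Since exp(v) >= 1 + v for every v, each bracketed term is at most zero, so with a negative weight the regularization term is never negative and vanishes only when the mean is 0 and the log-variance is 0.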

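# Train the VAE (on MNIST)
from keras.datasets import mnist

# Assumption: batch_size is never defined in this article; 16 is an illustrative value
batch_size = 16

vae = Model(input_img, y)
# Assumption: the article omits the compile step, which fit() requires.
# The loss is registered inside the custom layer, so no external loss is passed.
vae.compile(optimizer='rmsprop', loss=None)
vae.summary()

(x_train, _), (x_test, y_test) = mnist.load_data()

# Scale pixels to [0, 1] and add a channel axis
x_train = x_train.astype('float32') / 255.
x_train = x_train.reshape(x_train.shape + (1,))
x_test = x_test.astype('float32') / 255.
x_test = x_test.reshape(x_test.shape + (1,))

vae.fit(x=x_train, y=None, shuffle=True, epochs=10, batch_size=batch_size, validation_data=(x_test, None))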

# Use the trained network: take a grid of points from the 2D latent space and decode them into images
import matplotlib.pyplot as plt
from scipy.stats import norm

n = 15  # the figure shows a grid of 15 × 15 digits
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))
# Transform linearly spaced quantiles with the inverse CDF of the normal distribution
grid_x = norm.ppf(np.linspace(0.05, 0.95, n))
grid_y = norm.ppf(np.linspace(0.05, 0.95, n))

for i, yi in enumerate(grid_x):
    for j, xi in enumerate(grid_y):
        z_sample = np.array([xi, yi])
        # Repeat the point so that it forms a complete batch
        z_sample = np.tile(z_sample, batch_size).reshape(batch_size, 2)
        x_decoded = decoder.predict(z_sample, batch_size=batch_size)
        # Take the first digit of the batch and reshape it to 28 × 28
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size:(i + 1) * digit_size, j * digit_size:(j + 1) * digit_size] = digit

plt.figure(figsize=(10, 10))
plt.imshow(figure, cmap='Greys_r')
plt.show()
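The grid coordinates come from the inverse CDF (quantile function) of the standard normal distribution. Below is a minimal sketch that uses only the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the same role as `norm.ppf` for the standard normal. The helper name `latent_grid_axis` is hypothetical.

```python
from statistics import NormalDist

def latent_grid_axis(n, low=0.05, high=0.95):
    """Return n latent coordinates at evenly spaced quantiles of N(0, 1)."""
    step = (high - low) / (n - 1)
    quantiles = [low + i * step for i in range(n)]
    return [NormalDist().inv_cdf(q) for q in quantiles]

axis = latent_grid_axis(15)
print([round(v, 2) for v in axis])
```

The resulting coordinates are denser near 0 and sparser toward the tails, so the grid spends most of its cells where the Gaussian prior puts most of its probability mass.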

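2. Generative adversarial network (GAN)

2.1 Components

Generator network: takes a random vector (a point in the latent space) as input and decodes it into a synthetic image.

Discriminator network: takes an image (real or synthetic) as input and predicts whether it came from the training set or was produced by the generator network.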

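2.2 Process

1. The generator network maps a latent-space vector of shape (latent_dim,) to an image of shape (32, 32, 3).

2. The discriminator network maps an image of shape (32, 32, 3) to a binary score that estimates the probability that the image is real.

3. The gan network chains the generator and the discriminator together: gan(x) = discriminator(generator(x)). It therefore maps latent vectors to the discriminator's verdict on the images decoded from them.

4. The discriminator is trained on real and fake images labeled "real" or "fake".

5. The generator is trained with the gradients of the gan model's loss with respect to the generator's weights. Each step moves those weights in a direction that makes the discriminator more likely to classify the generated images as real, that is, in a direction that fools the discriminator.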

Full code:

# Generator
import keras
from keras import layers
import numpy as np

latent_dim = 32
height = 32
width = 32
channels = 3

generator_input = keras.Input(shape=(latent_dim,))

# Transform the input into a 16 × 16 feature map with 128 channels
x = layers.Dense(128 * 16 * 16)(generator_input)
x = layers.LeakyReLU()(x)
x = layers.Reshape((16, 16, 128))(x)

x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)

# Upsample to 32 × 32
x = layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)
x = layers.LeakyReLU()(x)

x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)

# Produce a 32 × 32 feature map with `channels` channels
x = layers.Conv2D(channels, 7, activation='tanh', padding='same')(x)
# Instantiate the generator model, which maps an input of shape (latent_dim,) to an image of shape (32, 32, 3)
generator = keras.models.Model(generator_input, x)
generator.summary()

# Discriminator
discriminator_input = layers.Input(shape=(height, width, channels))
x = layers.Conv2D(128, 3)(discriminator_input)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.4)(x)
x = layers.Dense(1, activation='sigmoid')(x)  # classification layer
# Instantiate the discriminator, which turns an input of shape (32, 32, 3) into a binary classification decision (real/fake)
discriminator = keras.models.Model(discriminator_input, x)
discriminator.summary()

# Use gradient clipping in the optimizer to limit the range of gradient values,
# and learning-rate decay to stabilize training
# (lr and decay are the argument names of older Keras releases)
discriminator_optimizer = keras.optimizers.RMSprop(lr=0.0008, clipvalue=1.0, decay=1e-8)
discriminator.compile(optimizer=discriminator_optimizer, loss='binary_crossentropy')
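The spatial sizes produced by the generator and discriminator layers can be checked by hand. The sketch below, in plain Python, applies the standard output-size formulas for convolutions. It assumes Keras's default 'valid' padding for the discriminator's Conv2D layers (no padding argument is given there) and 'same' padding for the generator's Conv2DTranspose. The helper names are hypothetical.

```python
import math

def conv_output_size(size, kernel, stride=1, padding='valid'):
    """Spatial output size of a (strided) convolution along one dimension."""
    if padding == 'same':
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1      # 'valid': no padding

def conv_transpose_same_output_size(size, stride):
    """Spatial output size of a transposed convolution with 'same' padding."""
    return size * stride

# Generator: 16 x 16 map upsampled by Conv2DTranspose(256, 4, strides=2, padding='same')
print(conv_transpose_same_output_size(16, 2))   # 32

# Discriminator: Conv2D(128, 3), then three times Conv2D(128, 4, strides=2)
size = conv_output_size(32, 3)
sizes = [size]
for _ in range(3):
    size = conv_output_size(size, 4, stride=2)
    sizes.append(size)
print(sizes, 'flattened features:', size * size * 128)   # [30, 14, 6, 2] flattened features: 512
```

Under these assumptions the final Dense layer of the discriminator sees 512 features per image.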

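# Adversarial network: turns latent-space points into a classification decision.
# The discriminator must be frozen (this applies only inside the gan model).
discriminator.trainable = False

gan_input = keras.Input(shape=(latent_dim,))
gan_output = discriminator(generator(gan_input))
gan = keras.models.Model(gan_input, gan_output)

gan_optimizer = keras.optimizers.RMSprop(lr=0.0004, clipvalue=1.0, decay=1e-8)
# Assumption: the article's listing ends at the optimizer. The compile call is
# added here because train_on_batch requires a compiled model.
gan.compile(optimizer=gan_optimizer, loss='binary_crossentropy')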
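Both RMSprop optimizers above are created with clipvalue=1.0. As a minimal sketch of what clipping by value means, the plain-Python function below limits every gradient component to the range [-clipvalue, clipvalue]. The function name is hypothetical, and the sketch illustrates the idea rather than Keras's internal implementation.

```python
def clip_by_value(gradients, clipvalue=1.0):
    """Clip every gradient component to the range [-clipvalue, clipvalue]."""
    return [max(-clipvalue, min(clipvalue, g)) for g in gradients]

print(clip_by_value([0.3, -2.5, 7.0, -0.9]))   # [0.3, -1.0, 1.0, -0.9]
```

Components that are already inside the range pass through unchanged, so only unusually large gradients are affected.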

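# Training the DCGAN

Note: every training iteration performs the following steps.

1. Draw random points from the latent space (random noise).

2. Generate images from this noise with the generator.

3. Mix the generated images with real ones.

4. Train the discriminator on the mixed images and their labels.

5. Draw new random points from the latent space.

6. Train gan on these random vectors with labels that all say "these are real images". This updates the generator's weights.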
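The label convention in steps 4 and 6 is easy to get backwards. In this article's training code, generated images are labeled 1 and real images 0, so the "all real" targets used to train the generator are zeros. The following is a minimal sketch in plain Python. The function names are hypothetical, and the 0.05 noise factor is the one used in the article's code.

```python
import random

def discriminator_labels(batch_size, noise=0.05, rng=random):
    """Labels for [generated..., real...]: 1 = generated, 0 = real, plus small random noise."""
    labels = [1.0] * batch_size + [0.0] * batch_size
    return [label + noise * rng.random() for label in labels]

def misleading_targets(batch_size):
    """Targets for training the generator through gan: every image is claimed to be real (0)."""
    return [0.0] * batch_size

print(discriminator_labels(3, rng=random.Random(0)))
print(misleading_targets(3))
```

The noise only nudges each label by less than 0.05, which keeps the two classes clearly separated while making the discriminator's targets slightly less predictable.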

import os
from keras.preprocessing import image

# Load the CIFAR10 data
(x_train, y_train), (_, _) = keras.datasets.cifar10.load_data()
# Select the frog images (class 6)
x_train = x_train[y_train.flatten() == 6]
# Normalize the data
x_train = x_train.reshape((x_train.shape[0],) + (height, width, channels)).astype('float32') / 255.

iterations = 10000
batch_size = 20
save_dir = 'your_dir'  # directory where the sample images are saved

start = 0
for step in range(iterations):
    # Sample random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    # Decode them into fake images
    generated_images = generator.predict(random_latent_vectors)

    # Combine the fake images with real ones
    stop = start + batch_size
    real_images = x_train[start:stop]
    combined_images = np.concatenate([generated_images, real_images])
    # Labels: 1 for generated images, 0 for real ones
    labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
    # Add random noise to the labels
    labels += 0.05 * np.random.random(labels.shape)

    # Train the discriminator
    d_loss = discriminator.train_on_batch(combined_images, labels)

    # Sample new random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    # Assemble labels that claim every image is real
    misleading_targets = np.zeros((batch_size, 1))
    # Train the generator through the gan model (the discriminator's weights are frozen)
    a_loss = gan.train_on_batch(random_latent_vectors, misleading_targets)

    start += batch_size
    if start > len(x_train) - batch_size:
        start = 0

    # Every 100 steps, save the weights and log progress
    # (the article's listing tests `start` here; `step` matches the file names below)
    if step % 100 == 0:
        gan.save_weights('gan.h5')  # save the model weights
        print('discriminator loss:', d_loss)
        print('adversarial loss:', a_loss)
        # Save one generated image and one real image for comparison
        img = image.array_to_img(generated_images[0] * 255., scale=False)
        img.save(os.path.join(save_dir, 'generated_frog' + str(step) + '.png'))
        img = image.array_to_img(real_images[0] * 255., scale=False)
        img.save(os.path.join(save_dir, 'real_frog' + str(step) + '.png'))
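The training loop walks through the real images with a start index that wraps around to 0. The following is a minimal sketch of that cursor logic in plain Python, with a hypothetical generator function `batch_windows` and small illustrative sizes.

```python
def batch_windows(num_samples, batch_size, iterations):
    """Yield (start, stop) index pairs, wrapping to 0 before a batch would run past the data."""
    start = 0
    for _ in range(iterations):
        yield start, start + batch_size
        start += batch_size
        if start > num_samples - batch_size:
            start = 0

print(list(batch_windows(num_samples=50, batch_size=20, iterations=4)))
# [(0, 20), (20, 40), (0, 20), (20, 40)]
```

With this scheme, any leftover samples that cannot fill a whole batch are never visited (indices 40 to 49 in the example). This only matters when the number of samples is not a multiple of the batch size.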

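[Training tips]

1. Use tanh as the activation of the generator's last layer.

2. Sample points from the latent space with a normal distribution rather than a uniform distribution.

3. Introduce randomness: use dropout in the discriminator and add random noise to the discriminator's labels.

4. Reduce sparsity: downsample with strided convolutions instead of pooling, and use LeakyReLU instead of ReLU activations.

5. Avoid unequal coverage of pixel space in the generator: whenever strided convolutions or transposed convolutions are used in the generator or the discriminator, choose a kernel size that is divisible by the stride.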
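Tip 5 can be illustrated by counting how often each output position is written by a transposed convolution. The following is a minimal one-dimensional sketch in plain Python, assuming no padding. The function name is hypothetical.

```python
def transposed_conv_coverage(num_inputs, kernel, stride):
    """Count how many kernel applications touch each output position (1D, no padding)."""
    output_len = (num_inputs - 1) * stride + kernel
    coverage = [0] * output_len
    for i in range(num_inputs):
        for k in range(kernel):
            coverage[i * stride + k] += 1
    return coverage

# Kernel size divisible by the stride: uniform coverage away from the borders
print(transposed_conv_coverage(5, kernel=4, stride=2))   # [1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1]
# Kernel size not divisible by the stride: coverage alternates between 2 and 1
print(transposed_conv_coverage(5, kernel=3, stride=2))   # [1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1]
```

The alternating pattern in the second case is the one-dimensional analogue of the uneven pixel coverage that the tip warns about.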

Published: 2023-05-18 14:06:59

Article link: https://www.wtabcd.cn/fanwen/fan/78/682258.html
