Image Generation: Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN)

Updated: 2023-05-18 14:06:59

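(Compiled from the book "Deep Learning with Python" (《Python深度学习》).)
The key idea of image generation is to learn a low-dimensional latent space of representations in which any point can be mapped to a realistic-looking image. The module that performs this mapping is called a generator (in a GAN) or a decoder (in a VAE).
Strengths and weaknesses of VAEs and GANs:
VAEs are well suited to learning well-structured latent spaces (continuous and low-dimensional);
GANs generate realistic images, but their latent space may lack such structure.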
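1. Variational Autoencoder (VAE)
Goal: encode the input into a low-dimensional latent space, then decode it back to an output with the same dimensions as the original image.
1. Encoding: the encoder turns the input image into the parameters of a statistical distribution, namely a mean and a (log-)variance.
2. Decoding: a point is sampled at random from the normal distribution defined by those parameters and decoded back to the original input.
3. Loss function: a reconstruction loss (forces the decoded sample to match the initial input) plus a regularization loss (helps learn a well-structured latent space).
Rough code:
# Encode the input into a mean and a log-variance
z_mean, z_log_variance = encoder(input_img)
# Draw a latent point; epsilon is random noise drawn from N(0, 1)
z = z_mean + exp(0.5 * z_log_variance) * epsilon
# Decode z back to an image and tie input to reconstruction in one model
reconstructed_img = decoder(z)
model = Model(input_img, reconstructed_img)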
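To make the sampling step concrete, here is a minimal NumPy sketch of the reparameterization z = z_mean + exp(0.5 * z_log_var) * epsilon. The array values and the seed are made up for illustration and do not come from the book; no Keras is involved.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Hypothetical encoder outputs for a batch of 3 images and a 2-D latent space.
z_mean = np.array([[0.0, 0.0], [1.0, -1.0], [2.0, 0.5]])
z_log_var = np.array([[0.0, 0.0], [-2.0, -2.0], [-20.0, -20.0]])

# epsilon ~ N(0, I) carries all the randomness, so z remains a
# deterministic (and differentiable) function of z_mean and z_log_var.
epsilon = rng.standard_normal(z_mean.shape)
sigma = np.exp(0.5 * z_log_var)   # log-variance -> standard deviation
z = z_mean + sigma * epsilon

print(z.shape)                          # (3, 2)
print(np.abs(z[2] - z_mean[2]).max())   # very small: sigma is tiny for log-variance -20
```

When the log-variance is very negative the sample collapses onto the mean, which is why the regularization loss is needed to keep the encoder from turning into a plain autoencoder.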
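Concrete code:
# Imports needed by this listing. The encoder that defines input_img,
# z_mean, z_log_var, latent_dim and shape_before_flattening is not shown
# in this article.
import keras
import numpy as np
from keras import layers
from keras import backend as K
from keras.models import Model

# Latent-space sampling function
def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim), mean=0., stddev=1.)
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

# Wrap the sampling function in a Lambda layer
z = layers.Lambda(sampling)([z_mean, z_log_var])

# VAE decoder network: maps latent-space points to images
decoder_input = layers.Input(K.int_shape(z)[1:])
x = layers.Dense(np.prod(shape_before_flattening[1:]), activation='relu')(decoder_input)
x = layers.Reshape(shape_before_flattening[1:])(x)
x = layers.Conv2DTranspose(32, 3, padding='same', activation='relu', strides=(2, 2))(x)
# Project back to one channel in [0, 1] so the output matches the input images
x = layers.Conv2D(1, 3, padding='same', activation='sigmoid')(x)
decoder = Model(decoder_input, x)
z_decoded = decoder(z)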
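# Custom layer used to compute the VAE loss
class CustomVariationalLayer(keras.layers.Layer):
    def vae_loss(self, x, z_decoded):
        x = K.flatten(x)
        z_decoded = K.flatten(z_decoded)
        # Reconstruction loss: per-pixel binary cross-entropy
        xent_loss = keras.metrics.binary_crossentropy(x, z_decoded)
        # Regularization loss: down-weighted KL divergence to N(0, I)
        kl_loss = -5e-4 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
        return K.mean(xent_loss + kl_loss)

    def call(self, inputs):
        x = inputs[0]
        z_decoded = inputs[1]
        loss = self.vae_loss(x, z_decoded)
        self.add_loss(loss, inputs=inputs)
        # The output is not used; the layer exists to register the loss
        return x

# Call the layer on the input and the decoded output to get the model output
y = CustomVariationalLayer()([input_img, z_decoded])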
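The two loss terms can be checked by hand with a small NumPy sketch. It uses the textbook closed form of the KL divergence, -0.5 * sum(1 + log_var - mean^2 - exp(log_var)), whereas the layer above scales the mean of the same expression by -5e-4, which amounts to a much weaker weight on the regularizer. The helper names and the test values are illustrative assumptions.

```python
import numpy as np

def kl_to_standard_normal(z_mean, z_log_var):
    """Closed-form KL(N(mean, exp(log_var)) || N(0, I)), summed over latent dims."""
    return -0.5 * np.sum(1.0 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1)

def binary_crossentropy(x, x_hat, eps=1e-7):
    """Mean per-pixel binary cross-entropy between targets x and predictions x_hat."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))

# A posterior equal to the prior costs nothing; shifting one mean by 1 costs 0.5.
kl_zero = kl_to_standard_normal(np.zeros((1, 2)), np.zeros((1, 2)))
kl_shifted = kl_to_standard_normal(np.array([[1.0, 0.0]]), np.zeros((1, 2)))

# Predicting 0.5 for every pixel gives a cross-entropy of log(2).
bce = binary_crossentropy(np.array([0.0, 1.0]), np.array([0.5, 0.5]))

print(kl_zero, kl_shifted, bce)
```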
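# Train the VAE on MNIST
from keras.datasets import mnist
vae = Model(input_img, y)
# The loss is registered inside the custom layer, so no external loss is passed
vae.compile(optimizer='rmsprop', loss=None)
vae.summary()
(x_train, _), (x_test, y_test) = mnist.load_data()
# Scale pixels to [0, 1] and add a trailing channel axis
x_train = x_train.astype('float32') / 255.
x_train = x_train.reshape(x_train.shape + (1,))
x_test = x_test.astype('float32') / 255.
x_test = x_test.reshape(x_test.shape + (1,))
vae.fit(x=x_train, y=None, shuffle=True, epochs=10, batch_size=batch_size, validation_data=(x_test, None))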
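The preprocessing above can be tried without downloading MNIST. The sketch below runs the same two steps on a few random stand-in images; the fake data and the seed are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for MNIST: 4 fake grayscale images of 28x28 uint8 pixels.
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

x = fake_images.astype('float32') / 255.0   # scale pixel values to [0, 1]
x = x.reshape(x.shape + (1,))               # add a trailing channel axis

print(x.shape, x.dtype)   # (4, 28, 28, 1) float32
```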
# Use the trained network: sample a grid of points from the 2-D latent space and decode them to images
import matplotlib.pyplot as plt
from scipy.stats import norm
n = 15  # Display a 15 x 15 grid of digits
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))
# Map evenly spaced probabilities through the inverse Gaussian CDF to get latent coordinates
grid_x = norm.ppf(np.linspace(0.05, 0.95, n))
grid_y = norm.ppf(np.linspace(0.05, 0.95, n))
for i, yi in enumerate(grid_x):
    for j, xi in enumerate(grid_y):
        z_sample = np.array([xi, yi])
        # Repeat z so that it forms a full batch
        z_sample = np.tile(z_sample, batch_size).reshape(batch_size, 2)
        x_decoded = decoder.predict(z_sample, batch_size=batch_size)
        # Keep the first digit of the batch, reshaped from 28x28x1 to 28x28
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size:(i + 1) * digit_size, j * digit_size:(j + 1) * digit_size] = digit
plt.figure(figsize=(10, 10))
plt.imshow(figure, cmap='Greys_r')
plt.show()
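The grid coordinates come from pushing evenly spaced probabilities through the inverse CDF of the standard normal. A standard-library sketch of the same idea follows; it uses statistics.NormalDist in place of scipy.stats.norm, and the smaller n is only to keep the output short.

```python
from statistics import NormalDist

n = 5
# Evenly spaced probabilities in [0.05, 0.95], like np.linspace(0.05, 0.95, n).
probs = [0.05 + i * (0.95 - 0.05) / (n - 1) for i in range(n)]

# The inverse CDF turns them into latent coordinates that are denser near 0,
# where the standard normal prior puts most of its probability mass.
grid = [NormalDist().inv_cdf(p) for p in probs]

print([round(g, 3) for g in grid])
```

The resulting points are symmetric around 0 and spaced more tightly in the middle than at the edges, so the decoded grid spends most of its cells on the region the decoder saw most often during training.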
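2. Generative Adversarial Network (GAN)
2.1 Components
Generator network: takes a random vector (a point in the latent space) as input and decodes it into a synthetic image.
Discriminator network: takes an image (real or synthetic) as input and predicts whether it came from the training set or from the generator network.
2.2 Process
1. The generator network maps latent-space vectors of shape (latent_dim,) to images of shape (32, 32, 3).
2. The discriminator network maps images of shape (32, 32, 3) to a binary score that estimates the probability that the image is real.
3. The gan network chains the generator and the discriminator: gan(x) = discriminator(generator(x)). It maps a latent vector to the discriminator's verdict on the image generated from it.
4. The discriminator is trained on real and fake images labeled "real" or "fake".
5. The generator is trained using the gradients of the gan model's loss with respect to the generator's weights, so that each step moves the weights in a direction that makes the discriminator more likely to be fooled.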
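The composition gan(x) = discriminator(generator(x)) can be sketched with toy NumPy functions. The two "networks" below are single random linear maps, not trained models, and every name and scale factor is an assumption made for illustration; only the shapes mirror the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, height, width, channels = 32, 32, 32, 3
n_pixels = height * width * channels

# Toy stand-ins for the two networks: random weights, no training.
W_g = rng.normal(scale=0.1, size=(latent_dim, n_pixels))
w_d = rng.normal(scale=0.01, size=(n_pixels,))

def generator(z):
    """Map latent vectors (batch, latent_dim) to images with values in [-1, 1]."""
    return np.tanh(z @ W_g).reshape(-1, height, width, channels)

def discriminator(images):
    """Map images (batch, 32, 32, 3) to one score in (0, 1) per image."""
    logits = images.reshape(len(images), -1) @ w_d
    return (1.0 / (1.0 + np.exp(-logits)))[:, None]

def gan(z):
    """Chain the two: latent vector -> image -> score."""
    return discriminator(generator(z))

z = rng.normal(size=(4, latent_dim))
images = generator(z)
scores = gan(z)
print(images.shape, scores.shape)   # (4, 32, 32, 3) (4, 1)
```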
Concrete code:
# Generator
import keras
from keras import layers
import numpy as np
latent_dim = 32
height = 32
width = 32
channels = 3
generator_input = keras.Input(shape=(latent_dim,))
# Transform the input into a 16x16 feature map with 128 channels
x = layers.Dense(128 * 16 * 16)(generator_input)
x = layers.LeakyReLU()(x)
x = layers.Reshape((16, 16, 128))(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
# Upsample to 32x32
x = layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
# Produce a 32x32 feature map with `channels` channels (the shape of a CIFAR10 image)
x = layers.Conv2D(channels, 7, activation='tanh', padding='same')(x)
# Instantiate the generator model, which maps an input of shape (latent_dim,) to an image of shape (32, 32, 3)
generator = keras.models.Model(generator_input, x)
generator.summary()
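The spatial sizes in the generator can be traced by hand. The sketch below assumes the usual Keras rules for padding='same': a convolution yields ceil(size / stride) and a transposed convolution yields size * stride. The helper names are hypothetical.

```python
import math

def conv_same(size, stride=1):
    """Output size of a convolution with padding='same'."""
    return math.ceil(size / stride)

def conv_transpose_same(size, stride=1):
    """Output size of a transposed convolution with padding='same'."""
    return size * stride

size = 16                                    # after Reshape((16, 16, 128))
size = conv_same(size)                       # Conv2D(256, 5, padding='same')
size = conv_transpose_same(size, stride=2)   # Conv2DTranspose(256, 4, strides=2, padding='same')
for _ in range(3):                           # two Conv2D(256, 5) layers and the final Conv2D(channels, 7)
    size = conv_same(size)

print(size)  # 32
```

Only the transposed convolution changes the resolution; every other layer keeps it, which is why the output lands on 32x32.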
# Discriminator
discriminator_input = layers.Input(shape=(height, width, channels))
x = layers.Conv2D(128, 3)(discriminator_input)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.4)(x)
x = layers.Dense(1, activation='sigmoid')(x)  # Classification layer
# Instantiate the discriminator, which turns an input of shape (32, 32, 3) into a binary classification decision (real/fake)
discriminator = keras.models.Model(discriminator_input, x)
discriminator.summary()
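The discriminator's convolutions use the default padding ('valid'), so each one shrinks the feature map. A short sketch of the arithmetic, with a hypothetical helper, shows how a 32x32 image ends up as a 512-element vector before the final dense layer.

```python
def conv_valid(size, kernel, stride=1):
    """Output size of a convolution with padding='valid'."""
    return (size - kernel) // stride + 1

sizes = [32]
# (kernel, stride) of the four Conv2D layers in the discriminator
for kernel, stride in [(3, 1), (4, 2), (4, 2), (4, 2)]:
    sizes.append(conv_valid(sizes[-1], kernel, stride))

flat = sizes[-1] * sizes[-1] * 128   # Flatten() over a (2, 2, 128) feature map
print(sizes, flat)  # [32, 30, 14, 6, 2] 512
```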
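# Use gradient clipping (by value) in the optimizer to limit the range of gradient values, and learning-rate decay to stabilize training
# Note: lr and decay are legacy Keras argument names; newer releases use learning_rate instead of lr
discriminator_optimizer = keras.optimizers.RMSprop(lr=0.0008, clipvalue=1.0, decay=1e-8)
discriminator.compile(optimizer=discriminator_optimizer, loss='binary_crossentropy')
# Adversarial network: turns latent-space points into a classification decision.
# The discriminator weights must be frozen (this only affects the gan model)
discriminator.trainable = False
gan_input = keras.Input(shape=(latent_dim,))
gan_output = discriminator(generator(gan_input))
gan = keras.models.Model(gan_input, gan_output)
gan_optimizer = keras.optimizers.RMSprop(lr=0.0004, clipvalue=1.0, decay=1e-8)
gan.compile(optimizer=gan_optimizer, loss='binary_crossentropy')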
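What clipvalue and decay do can be illustrated in NumPy. This is a sketch under two assumptions: that clipvalue clips each gradient element to [-clipvalue, clipvalue], and that the legacy decay argument applies the time-based schedule lr / (1 + decay * iterations). The helper names are hypothetical and this is not the optimizer's actual code.

```python
import numpy as np

def clip_by_value(grad, clipvalue):
    """Clip every gradient element to the range [-clipvalue, clipvalue]."""
    return np.clip(grad, -clipvalue, clipvalue)

def decayed_lr(lr, decay, iterations):
    """Time-based decay (assumed schedule): lr / (1 + decay * iterations)."""
    return lr / (1.0 + decay * iterations)

grad = np.array([-3.0, -0.5, 0.2, 7.0])
clipped = clip_by_value(grad, 1.0)

lr_start = decayed_lr(0.0008, 1e-8, 0)
lr_end = decayed_lr(0.0008, 1e-8, 10000)

print(clipped)            # [-1.  -0.5  0.2  1. ]
print(lr_start, lr_end)
```

With decay=1e-8 the learning rate barely moves over 10000 iterations, so in this configuration the clipping is likely doing most of the stabilizing work.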
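# Training the DCGAN
Note: each training iteration performs the following steps:
1. Draw random points from the latent space (random noise);
2. Generate images with the generator from this noise;
3. Mix the generated images with real ones;
4. Train the discriminator on the mixed images and their labels;
5. Draw new random points from the latent space;
6. Train gan using these random vectors with labels that all say "real image"; this updates the generator's weights.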
import os
from keras.preprocessing import image
(x_train, y_train), (_, _) = keras.datasets.cifar10.load_data()
x_train = x_train[y_train.flatten() == 6]  # Select frog images (class 6)
# Reshape and scale pixel values to [0, 1]
x_train = x_train.reshape((x_train.shape[0],) + (height, width, channels)).astype('float32') / 255.
iterations = 10000
batch_size = 20
save_dir = 'your_dir'
start = 0
for step in range(iterations):
    # Sample random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    # Decode these points into fake images
    generated_images = generator.predict(random_latent_vectors)
    # Combine the fake images with real ones
    stop = start + batch_size
    real_images = x_train[start:stop]
    combined_images = np.concatenate([generated_images, real_images])
    # Labels: 1 for generated images, 0 for real images
    labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
    # Add random noise to the labels
    labels += 0.05 * np.random.random(labels.shape)
    # Train the discriminator
    d_loss = discriminator.train_on_batch(combined_images, labels)
    # Sample random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    # Assemble labels that claim all images are real (a deliberate lie)
    misleading_targets = np.zeros((batch_size, 1))
    # Train the generator through the gan model (discriminator weights are frozen)
    a_loss = gan.train_on_batch(random_latent_vectors, misleading_targets)
    start += batch_size
    if start > len(x_train) - batch_size:
        start = 0
    # Every 100 steps, save the weights, print the losses and save sample images
    if step % 100 == 0:
        gan.save_weights('gan.h5')  # Save model weights
        print('discriminator loss:', d_loss)
        print('adversarial loss:', a_loss)
        img = image.array_to_img(generated_images[0] * 255., scale=False)
        img.save(os.path.join(save_dir, 'generated_frog' + str(step) + '.png'))
        img = image.array_to_img(real_images[0] * 255., scale=False)
        img.save(os.path.join(save_dir, 'real_frog' + str(step) + '.png'))
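The label handling in the loop is easy to get backwards, so here is a standalone NumPy sketch of just that part. It follows the convention of the listing above (1 for generated images, 0 for real ones); the seed and the variable name noisy_labels are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 20

# Discriminator labels: generated images first (1), real images second (0).
labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])

# Label noise: a small uniform perturbation in [0, 0.05).
noisy_labels = labels + 0.05 * rng.random(labels.shape)

# Generator targets: every generated image is labeled "real" (0) on purpose,
# so the gradient pushes the generator toward images the discriminator accepts.
misleading_targets = np.zeros((batch_size, 1))

print(noisy_labels.shape, misleading_targets.shape)   # (40, 1) (20, 1)
```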
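[Training tips]
1. Use tanh as the last activation in the generator;
2. Sample points from the latent space with a normal distribution rather than a uniform one;
3. Introduce randomness: use dropout in the discriminator, and add random noise to the discriminator's labels;
4. Avoid sparse gradients: downsample with strided convolutions instead of pooling, and use LeakyReLU instead of ReLU;
5. Avoid uneven pixel-space coverage in the generator (checkerboard artifacts): whenever a strided convolution or transposed convolution is used in the generator or discriminator, the kernel size should be divisible by the stride.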
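Tip 5 can be seen in one dimension by counting how many input positions contribute to each output position of a transposed convolution. The sketch below is a simplified model with no padding or cropping, and the helper name is hypothetical.

```python
import numpy as np

def overlap_counts(n_in, kernel, stride):
    """Count the input positions that write to each output position
    of a 1-D transposed convolution (no padding, no cropping)."""
    n_out = (n_in - 1) * stride + kernel
    counts = np.zeros(n_out, dtype=int)
    for i in range(n_in):
        counts[i * stride : i * stride + kernel] += 1
    return counts

even = overlap_counts(6, kernel=4, stride=2)     # kernel divisible by stride
uneven = overlap_counts(6, kernel=3, stride=2)   # kernel not divisible by stride

print(even.tolist())     # [1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1]
print(uneven.tolist())   # [1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1]
```

With kernel 4 and stride 2 every interior pixel receives the same number of contributions, while kernel 3 and stride 2 alternates between one and two, which is the pattern behind checkerboard artifacts.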
