Diffusion Model

Latest recommended article published 2025-04-13 23:09:32

hawk2014bj


Article link: https://blog.csdn.net/hawk2014bj/article/details/143491877


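A diffusion model is an image-generation model. Its core idea is to start from a noisy image and gradually denoise it back into a clean image; in text-to-image systems, a prompt additionally guides what the final image looks like. This article uses the diffusion model on its own, without any prompt.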

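The figure below shows the Stable Diffusion architecture. This article uses only a UNet: there is no text encoder (Text) and no latent space (Latent), so the UNet works directly on image pixels.
[Figure: Stable Diffusion network architecture]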

Download the model and generate an image

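The model used here is google/ddpm-celebahq-256, a DDPM model. It has not been optimized for fast sampling and needs 1000 steps to generate an image, so it is only meant for learning; running it requires a GPU.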

from diffusers import DDPMPipeline

model_id = "google/ddpm-celebahq-256"

# load the pretrained UNet together with its DDPM scheduler
image_pipe = DDPMPipeline.from_pretrained(model_id)
image_pipe.to("cuda")

# run the full denoising loop (1000 steps by default); .images is a list of PIL images
image = image_pipe().images[0]
image

The model outputs a single image:
[Figure: image generated by google/ddpm-celebahq-256]
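The 1000 steps come from the fixed noise schedule a DDPM is trained with: T = 1000 values β_1, …, β_T that control how much noise is added at each step, and their cumulative product ᾱ_t, which says how much of the clean image is left at step t. The NumPy sketch below assumes the linear schedule from the original DDPM paper (β rising from 1e-4 to 0.02); it is an illustration only, and the actual values for a checkpoint should be read from its scheduler config (for example `image_pipe.scheduler.config`). The helper name `linear_beta_schedule` is hypothetical.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical helper: betas, alphas and their cumulative product for a linear schedule.

    The default values are an assumption (the ones used in the DDPM paper).
    """
    betas = np.linspace(beta_start, beta_end, num_steps, dtype=np.float64)
    alphas = 1.0 - betas
    # alphas_cumprod[t] = product of alphas[0..t]: fraction of the clean signal kept at step t
    alphas_cumprod = np.cumprod(alphas)
    return betas, alphas, alphas_cumprod

betas, alphas, alphas_cumprod = linear_beta_schedule()
print(len(betas))          # number of diffusion steps
print(alphas_cumprod[0])   # close to 1: almost no noise after the first step
print(alphas_cumprod[-1])  # close to 0: after the last step the image is essentially pure noise
```

Because ᾱ_T is so close to 0, sampling can start from pure Gaussian noise rather than from a noised real image, which is what the step-by-step example with `torch.randn` does.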
Using a model trained on the church dataset

from diffusers import UNet2DModel

repo_id = "google/ddpm-church-256"
model = UNet2DModel.from_pretrained(repo_id)

The model is a UNet. Its configuration can be inspected with:

model.config

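[Figure: output of model.config]
The UNet is what moves a noisy image toward a clean one: given a noisy image and a timestep, it predicts the noise contained in that image. The first step is to create a pure-noise image to start from.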

import torch

torch.manual_seed(0)

noisy_sample = torch.randn(
    1, model.config.in_channels, model.config.sample_size, model.config.sample_size
)
noisy_sample.shape

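[Figure: output of noisy_sample.shape]
Given a timestep, the model produces an output with the same shape as its input. This output is the predicted noise residual, not the denoised image itself.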

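with torch.no_grad():
    # the UNet returns an output object whose .sample field holds the predicted noise
    noisy_residual = model(sample=noisy_sample, timestep=2).sample

noisy_residual.shape  # same shape as noisy_sample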

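[Figure: output of the model call]
In a diffusion model, the scheduler adds noise to images during training. It is needed during inference as well: at each step it uses the predicted noise residual to compute a slightly less noisy image.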

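from diffusers import DDPMScheduler

# load the scheduler configuration stored in the model repository
# (passing a repo id to from_config is deprecated; from_pretrained is the supported way)
scheduler = DDPMScheduler.from_pretrained(repo_id)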

[Figure: scheduler configuration]
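What the scheduler does during training can be written as a closed-form forward process, x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε with ε ~ N(0, I), and the UNet is trained to predict ε from (x_t, t). That is why the UNet's output has the same shape as its input, and why a noise prediction is enough to estimate the clean image. The NumPy sketch below assumes a linear β schedule from 1e-4 to 0.02 over 1000 steps (not read from the church model's config); `add_noise` and `estimate_x0` are hypothetical helpers for illustration, not the diffusers API (diffusers schedulers have their own `add_noise` method).

```python
import numpy as np

# Assumed schedule: linear betas over 1000 steps (DDPM paper values).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

def add_noise(x0, noise, t):
    """Forward process q(x_t | x_0): blend the clean image with Gaussian noise."""
    a = alphas_cumprod[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def estimate_x0(x_t, predicted_noise, t):
    """Invert the forward process: estimate x_0 from x_t and a noise prediction."""
    a = alphas_cumprod[t]
    return (x_t - np.sqrt(1.0 - a) * predicted_noise) / np.sqrt(a)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(1, 3, 8, 8))  # toy "image" in [-1, 1], NCHW layout
noise = rng.standard_normal(x0.shape)
t = 500
x_t = add_noise(x0, noise, t)

# Training target: the UNet sees (x_t, t) and should output `noise`.
# With a perfect noise prediction, the clean image is recovered exactly.
x0_hat = estimate_x0(x_t, noise, t)
print(np.abs(x0_hat - x0).max())  # close to 0
```

A real UNet's prediction is only approximate, so jumping straight from x_t to x_0 at a large t gives a blurry guess; this is why inference instead takes many small steps with the scheduler.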
Run inference, displaying intermediate images along the way:

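import PIL.Image
import numpy as np
from IPython.display import display  # available by default in Jupyter; imported here for clarity

def display_sample(sample, i):
    # (N, C, H, W) tensor in [-1, 1] -> (N, H, W, C) uint8 array in [0, 255]
    image_processed = sample.cpu().permute(0, 2, 3, 1)
    image_processed = (image_processed.clamp(-1.0, 1.0) + 1.0) * 127.5
    image_processed = image_processed.numpy().astype(np.uint8)

    image_pil = PIL.Image.fromarray(image_processed[0])
    print(f"Image at step {i}")
    display(image_pil)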

from tqdm.auto import tqdm

model.to("cuda")
noisy_sample = noisy_sample.to("cuda")
sample = noisy_sample

# scheduler.timesteps runs from 999 down to 0
for i, t in enumerate(tqdm(scheduler.timesteps)):
    # 1. predict the noise residual
    with torch.no_grad():
        residual = model(sample, t).sample

    # 2. compute the less noisy image: x_t -> x_{t-1}
    sample = scheduler.step(residual, t, sample).prev_sample

    # 3. show the intermediate image every 300 steps
    if (i + 1) % 300 == 0:
        display_sample(sample, i + 1)

The image gradually becomes clearer.
[Figure: intermediate samples at steps 300, 600 and 900]
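Each `scheduler.step` call in the loop moves the sample from x_t to x_{t−1}. The NumPy sketch below writes out one DDPM reverse step in the ε form from the DDPM paper, x_{t−1} = (x_t − β_t / √(1 − ᾱ_t) · ε̂) / √α_t + σ_t · z, with σ_t² = (1 − ᾱ_{t−1}) / (1 − ᾱ_t) · β_t (the "fixed_small" variance). It is a simplified sketch under stated assumptions: a linear β schedule, no clipping of the predicted clean image (which DDPMScheduler can apply via its clip_sample option), and a hypothetical `oracle_noise` function standing in for the UNet. Because the oracle knows the target image exactly, the loop should end on that image.

```python
import numpy as np

# Assumed schedule: linear betas over 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)
alphas = 1.0 - betas
alphas_cumprod = np.cumprod(alphas)

def ddpm_step(x_t, eps_hat, t, rng):
    """One reverse step x_t -> x_{t-1} (epsilon parameterization, no clipping)."""
    mean = (x_t - betas[t] / np.sqrt(1.0 - alphas_cumprod[t]) * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added on the final step
    var = (1.0 - alphas_cumprod[t - 1]) / (1.0 - alphas_cumprod[t]) * betas[t]
    return mean + np.sqrt(var) * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
target = rng.uniform(-1.0, 1.0, size=(1, 3, 8, 8))  # hypothetical "clean image"

def oracle_noise(x_t, t):
    """Hypothetical stand-in for the UNet: the exact noise separating x_t from `target`."""
    a = alphas_cumprod[t]
    return (x_t - np.sqrt(a) * target) / np.sqrt(1.0 - a)

sample = rng.standard_normal(target.shape)  # start from pure noise, like torch.randn
for t in reversed(range(len(betas))):       # t = 999, 998, ..., 0
    sample = ddpm_step(sample, oracle_noise(sample, t), t, rng)

print(np.abs(sample - target).max())  # close to 0: the oracle loop lands on the target
```

With a real UNet the noise prediction is only approximate, so each step removes a little noise at a time, which matches the gradual sharpening seen in the intermediate images.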