Stable diffusion 3.5 support. #2578

Merged: 5 commits merged into main on Oct 27, 2024
Conversation

@LaurentMazare (Collaborator)

Image generated with 3.5 turbo.


@LaurentMazare (Collaborator, Author)

Another image generated with 3.5 large (non-turbo).

@super-fun-surf

Nice work cleaning this up and integrating it, thank you.
Where did you learn how to make these 3.5-specific changes?

@LaurentMazare (Collaborator, Author)

> Where did you learn how to make these 3.5-specific changes?

Mostly by looking at the Python reference implementation here: https://github.com/Stability-AI/sd3.5

@Czxck001 (Contributor) left a comment


I looked into sd3.5/mmdit-x.py and compared it with sd3-ref/mmdit.py. It looks like this PR is so far missing some changes to the MMDiT blocks (a rough sketch of the gating appears at the end of this comment), including:

  • an extra attention for the MMDiT-X block (code here),
  • a modified pre_attention for x_block (code here),
  • a modified post-attention for x_block (code here),
  • and a different block joining (between x and context) (code here).

I noticed you already finished adapting the projection layer. It looks like this PR is still WIP, or maybe changes to some files are missing from the push? Let me know if I got it right and if/how I can help.

Interestingly, I am still able to reproduce generating beautiful images, though. I'm not sure if my guess is right, but perhaps the original SD3.0 MMDiT can simply benefit from stacking more layers without having to change the architecture.
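To make that concrete, here is a minimal sketch of the gating described above, assuming candle's Tensor API; `x_block_tail` and the closure-based `attn2` are my simplification for illustration, not the reference module code:

```rust
use candle::{Result, Tensor};

// Sketch of the tail of an MMDiT-X x-block: after the joint attention
// shared with the context stream (the SD3.0 behavior), MMDiT-X adds a
// second, x-only self-attention. `attn2` stands in for that module.
fn x_block_tail(
    x: &Tensor,
    joint_attn_out: &Tensor,
    attn2: Option<&dyn Fn(&Tensor) -> Result<Tensor>>,
) -> Result<Tensor> {
    // SD3.0 path: residual add of the joint-attention output.
    let x = (x + joint_attn_out)?;
    match attn2 {
        // MMDiT-X path: extra self-attention on the image stream only.
        Some(attn2) => &x + &attn2(&x)?,
        None => Ok(x),
    }
}
```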

```diff
-    #[arg(long, default_value_t = 4.0)]
-    cfg_scale: f64,
+    #[arg(long)]
+    cfg_scale: Option<f64>,

     // Time shift factor (alpha).
     #[arg(long, default_value_t = 3.0)]
```
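For context, the switch to `Option<f64>` presumably lets the example pick a per-model default when the flag is omitted. A hedged sketch of that resolution, where the enum and the fallback values are placeholders of mine, not the PR's actual defaults:

```rust
// Hypothetical model selector; the real example defines its own.
#[derive(Clone, Copy)]
enum Which {
    V3Medium,
    V3_5Large,
    V3_5LargeTurbo,
}

// Resolve the CLI value, falling back to a per-model guidance scale.
// The numbers are illustrative; turbo-style models typically run with
// little or no classifier-free guidance.
fn resolve_cfg_scale(cli_value: Option<f64>, which: Which) -> f64 {
    cli_value.unwrap_or(match which {
        Which::V3_5LargeTurbo => 1.0,
        Which::V3Medium | Which::V3_5Large => 4.0,
    })
}
```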
@Czxck001 (Contributor)


I noticed SAI's reference implementation of sd3.5 seems to change the recommended time_shift for sd3.0-medium from 3.0 to 1.0, but that is for the dpmpp_2m sampler (code here). This wasn't the case in their original reference implementation for sd3.0-medium, which always uses 3.0, but with euler sampling (code here).

Not sure if it's worth changing the default behavior here. The difference should be small, though.
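For reference, a minimal sketch of what the time shift computes, assuming it matches the SD3-style schedule warp (my assumption; the function below is illustrative, not this repo's code):

```rust
// Warp a flow-matching timestep t in [0, 1] by the shift factor alpha.
// alpha = 1.0 leaves the schedule unchanged; alpha = 3.0 (the euler
// default discussed above) spends more of the schedule at high noise.
fn time_shift(alpha: f64, t: f64) -> f64 {
    alpha * t / (1.0 + (alpha - 1.0) * t)
}

fn main() {
    assert_eq!(time_shift(1.0, 0.5), 0.5);  // identity at alpha = 1
    assert_eq!(time_shift(3.0, 0.5), 0.75); // midpoint pushed toward noise
}
```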

@LaurentMazare (Collaborator, Author)


Right, as both the scheduler and the parameter changed, I preferred keeping everything as it was before and not modifying the existing behavior.

Comment on lines 265 to 272:

```rust
{
    let vb_vae = vb_fp16.rename_f(sd3_vae_vb_rename).pp("first_stage_model");
    let autoencoder = build_sd3_vae_autoencoder(vb_vae)?;

    // Apply TAESD3 scale factor. Seems to be significantly improving the quality of the image.
    // https://github.com/comfyanonymous/ComfyUI/blob/3c60ecd7a83da43d694e26a77ca6b93106891251/nodes.py#L721-L723
    autoencoder.decode(&((x / 1.5305)? + 0.0609)?)?
}
```
@Czxck001 (Contributor), Oct 26, 2024


Good to know the VAE hasn't changed, so this part can be shared across sd3.0 and sd3.5. Maybe it's worth wrapping those lines in a small function to avoid duplication, in case we need to change or tweak this part in the future. We can still keep the call site inside a curly-brace scope to control the lifecycle of the VAE model. (A sketch of such a helper follows below.)
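For illustration, a sketch of what such a helper could look like; the `Decode` trait is a stand-in of mine for the concrete type returned by build_sd3_vae_autoencoder, and only the constants come from the snippet above:

```rust
use candle::{Result, Tensor};

// Stand-in for the concrete autoencoder type; only `decode` is assumed.
trait Decode {
    fn decode(&self, xs: &Tensor) -> Result<Tensor>;
}

// Hypothetical shared helper for the sd3.0/sd3.5 decode path, applying
// the TAESD3 scale/shift from the snippet under review before decoding.
fn decode_latents<D: Decode>(autoencoder: &D, x: &Tensor) -> Result<Tensor> {
    autoencoder.decode(&((x / 1.5305)? + 0.0609)?)
}
```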

@LaurentMazare (Collaborator, Author)


Good point, I've pushed some changes to remove a bit of the duplication (the VAE creation and application, as well as the text model / MMDiT application).

@LaurentMazare (Collaborator, Author)

Thanks for looking into this. Note that the new attention mechanism you mentioned is, I think, triggered based on the existence of the attn2 weights here, but these do not exist for the published model, so all the logic predicated on x_block_self_attn that you mentioned doesn't get used here (a sketch of that probe follows below). So overall I feel we can avoid supporting this for the time being and revisit in a couple of days when sd3.5 medium gets released, in case it actually includes the new attention layers.
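For the record, a sketch of the kind of probe that gating implies, using candle-nn's VarBuilder::contains_tensor; the tensor path below is a guess at the checkpoint layout, for illustration only:

```rust
use candle_nn::VarBuilder;

// Only enable the MMDiT-X x_block_self_attn path when the checkpoint
// actually ships attn2 weights; the published sd3.5-large files do not.
// The tensor name here is hypothetical.
fn has_x_block_self_attn(vb: &VarBuilder) -> bool {
    vb.contains_tensor("joint_blocks.0.x_block.attn2.qkv.weight")
}
```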

@Czxck001 (Contributor) commented Oct 27, 2024

@LaurentMazare Got it! I didn't expect that. It seems the new architecture is for SD3.5-medium only. Yes, I agree we should wait until SD3.5-medium comes out, which should be around the corner on Oct 29. I think I'll have the bandwidth to do the support and testing over the next weekend. Let me know if you want me to take it on.

This PR LGTM. Thanks for the prompt support!

@LaurentMazare merged commit 37e0ab8 into main on Oct 27, 2024 (10 of 11 checks passed).
@LaurentMazare deleted the sd3.5 branch on October 27, 2024 at 09:01.
@LaurentMazare (Collaborator, Author)

Thanks, yeah, let's have a look on the 29th, and if it needs the new cross-attention it would be great if you could add it.
