
Reproducing editing results #41

Open
momopusheen opened this issue Apr 6, 2024 · 1 comment

Comments


Hi,

Thanks for your brilliant work!

I used the vanilla implementation provided in playground_real.ipynb for image editing, without any modifications, but got unexpected results.

For example, with the target prompt "a photo of a pixar superhero in NYC", the structure of the edited image did not align well with that of the original image.

I've installed the correct version of diffusers (0.15.0). The base model is SD v1.4.

Have you encountered similar cases on your end? Any insights or suggestions you could provide would be greatly appreciated.
Thanks for your support in advance.

Here are my editing results:
Target prompt: "a photo of a pixar superhero in NYC"

[edited image attached]

Target prompt: "a photo of a bronze horse in a museum"

[edited image attached]
ljzycmd (Collaborator) commented Apr 6, 2024

Hi @momopusheen, thanks for your attention. Note that MasaCtrl is designed for non-rigid editing and tries to maintain content consistency after editing. If you want to keep the layout unchanged after editing (e.g., changing the global style or a local object), you can use the following attention editor:

import torch

# Requires the AttentionBase class from the MasaCtrl codebase.
class MutualSelfAttentionStyle(AttentionBase):
    """
    Change the style of the original image while preserving its layout.
    """
    def __init__(self, end_step=25):
        super().__init__()
        self.end_step = end_step  # denoising step after which editing stops

    def forward(self, q, k, v, sim, attn, is_cross, place_in_unet, num_heads, **kwargs):
        # Only rewrite self-attention maps, and only during the early steps.
        if not is_cross and self.cur_step < self.end_step:
            # Batch layout: [uncond-ref, uncond-cur, cond-ref, cond-cur].
            attn_u_ref, attn_u_cur, attn_c_ref, attn_c_cur = attn.chunk(4)
            # Replace the current image's attention maps with the reference's.
            attn = torch.cat([attn_u_ref, attn_u_ref, attn_c_ref, attn_c_ref], dim=0)

        return super().forward(q, k, v, sim, attn, is_cross, place_in_unet, num_heads, **kwargs)

In this editor, the self-attention maps of the reference image are used to maintain the layout of the edited image, thus enabling style editing and local object editing.
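The chunk-and-replace step above can be illustrated in isolation. This is a minimal sketch, using NumPy as a stand-in for torch (so `np.split`/`np.concatenate` play the roles of `attn.chunk(4)`/`torch.cat`); the four-chunk batch layout (unconditional-reference, unconditional-current, conditional-reference, conditional-current) is assumed from the variable names in the editor:

```python
import numpy as np

def replace_with_reference(attn: np.ndarray) -> np.ndarray:
    """Overwrite the current image's self-attention maps with the
    reference image's maps, as the editor does before end_step.

    Assumed batch layout along axis 0:
    [uncond-ref, uncond-cur, cond-ref, cond-cur].
    """
    attn_u_ref, attn_u_cur, attn_c_ref, attn_c_cur = np.split(attn, 4, axis=0)
    return np.concatenate([attn_u_ref, attn_u_ref, attn_c_ref, attn_c_ref], axis=0)

# Toy batch: four chunks, each holding one 2x2 attention map.
attn = np.arange(16, dtype=float).reshape(4, 2, 2)
out = replace_with_reference(attn)

# The "current" chunks now equal the "reference" chunks, so the layout
# encoded in the reference self-attention is carried over to the edit.
assert np.array_equal(out[1], out[0])
assert np.array_equal(out[3], out[2])
```

Because the queries and values of the current image are left untouched, the edit still follows the new prompt; only the attention *maps* (where each position looks) are borrowed from the reference.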

Hope the above can help you.
