Classifier-free guidance? #6

Open
mehdidc opened this issue Apr 28, 2023 · 7 comments

Comments

mehdidc (Contributor) commented Apr 28, 2023

I might have missed it in the code, but I can't see whether we randomly drop the captions for classifier-free guidance (which is already used at inference).

vkramanuj (Contributor) commented Apr 28, 2023

Hi, thanks for the question. I didn't notice a practical difference between no text dropout and some text dropout in my experiments, so I left it out of this repo. However, I can push a branch later today and potentially merge after some testing. For reference, the implementation is just randomly substituting the input string with the empty string, similar to how it's done at inference time in diffusers (https://github.com/huggingface/diffusers/blob/384c83aa9a1f268e5587d5ea1ea9f4c040845167/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L371)
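
As a concrete illustration of the substitution described above, here is a minimal sketch (names are illustrative, not the actual code in this repo or in diffusers): with probability text_dropout the caption is replaced by the empty string, essentially the same unconditional input the linked diffusers pipeline encodes when guidance is enabled.

import random

def maybe_drop_caption(caption: str, text_dropout: float = 0.1) -> str:
    # Replace the caption with "" with probability `text_dropout`.
    # The empty caption stands in for the unconditional input that
    # classifier-free guidance contrasts against at sampling time.
    return "" if random.random() < text_dropout else caption

# Roughly a `text_dropout` fraction of the captions in a batch end up empty.
captions = ["a photo of a dog", "a watercolor landscape", "a red bicycle"]
print([maybe_drop_caption(c, text_dropout=0.1) for c in captions])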

mehdidc (Contributor, Author) commented Apr 28, 2023

Ah, you mean you could apply classifier-free guidance at inference even though the model never encountered empty strings during training (as done in the repo you mention)? Isn't that unexpected?

vkramanuj (Contributor) commented Apr 28, 2023

Yes, the guidance_scale parameter still worked, which is surprising. However, it's possible there's some hit in performance for longer training runs, so it does make sense to add text dropout to this repo.
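
For context, "the guidance_scale parameter still worked" refers to the usual classifier-free guidance combination at sampling time: the UNet is evaluated on both the empty-prompt and the text-conditioned embeddings, and the two noise predictions are mixed. A rough sketch of that combination (tensor names are illustrative; the formula matches the linked diffusers pipeline):

import torch

def apply_cfg(noise_pred_uncond: torch.Tensor,
              noise_pred_text: torch.Tensor,
              guidance_scale: float) -> torch.Tensor:
    # Standard classifier-free guidance mix; guidance_scale = 1.0
    # recovers the purely text-conditioned prediction.
    return noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

# Example with dummy tensors shaped like UNet noise predictions.
uncond = torch.zeros(1, 4, 64, 64)
cond = torch.ones(1, 4, 64, 64)
print(apply_cfg(uncond, cond, guidance_scale=7.5).mean())  # tensor(7.5000)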

mehdidc (Contributor, Author) commented Apr 28, 2023

I think it's interesting that it worked anyway. Which dataset were you training on?

vkramanuj (Contributor) commented

@mehdidc I've used several datasets with this setup, mostly various filtered versions of LAION-2B (e.g. LAION-Aesthetics and LAION high-res). I've added text dropout in the text-dropout branch (https://github.com/mlfoundations/open-diffusion/tree/text-dropout). Specifically, the changes are:

  1. Add the text_dropout option to the WebDataset class:
     text_dropout=0.0,
  2. Add the conditional dropout to the data pipeline:
     wds.map_dict(input_ids=lambda text: "" if r.random() < text_dropout else text),

I haven't been able to test this code recently due to lack of resources. Let me know if you get a chance to try this out, and I can merge it into main.
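
Putting the two changes above together, here is a rough, untested sketch of the idea (field and function names are placeholders, not the branch's actual code):

import random

def make_caption_dropper(text_dropout: float = 0.0):
    # Returns a per-sample function suitable for a map_dict-style stage:
    # it blanks the caption with probability `text_dropout` before tokenization.
    def drop(text: str) -> str:
        return "" if random.random() < text_dropout else text
    return drop

# In a webdataset pipeline this would be used roughly as
#   wds.map_dict(txt=make_caption_dropper(text_dropout))
# where the caption field name ("txt" here) depends on how the shards were written.

samples = [{"txt": "a cat on a sofa"}, {"txt": "a mountain at dusk"}]
drop = make_caption_dropper(text_dropout=0.5)
for s in samples:
    s["txt"] = drop(s["txt"])
print(samples)  # roughly half the captions are now ""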

mehdidc (Contributor, Author) commented May 5, 2023

Thanks @vkramanuj for the implementation, I can try to do some runs. Do you maybe have the config file you used in your tests with LAION-Aesthetics and/or high-res, so that we can compare more or less directly?

vkramanuj (Contributor) commented May 9, 2023

Here's one. I removed my wandb and some pathing info for privacy reasons. You'd need to replace the WebDataset path with one for LAION high-res aesthetics (either original width/height >= 512 or >= 1024). Try to make the global batch size 2048 with either more GPUs or gradient accumulation (a quick sanity check on that arithmetic follows the config below). Note this is using the SD v1 architecture, which I found has better throughput and allows for a greater per-GPU batch size.

system:
    gradient_accumulation: 1
    batch_size: 32
    workers: 6
    dist_backend: ${distributed.dist_backend}
    dist_url: ${distributed.dist_url}

distributed:
    dist_backend: 'nccl'
    dist_url: 'env://'

experiment:
    log_dir: <path>/sd-logs
    name: "laion-2b-aesthetics-hr"
    project: "diffusion"
    num_examples_to_see: 2000000000
    save_every: 2000
    requeue: True

optimizer:
    name: adamw
    params:
        learning_rate: 0.0001
        beta1: 0.9
        beta2: 0.98 # changed from initial sd value for training stability
        weight_decay: 0.01
        epsilon: 0.00000001

model:
    vae:
        pretrained: "<path>/stable-diffusion-v1-5-fp32"

    text_encoder:
        pretrained: "<path>/stable-diffusion-v1-5-fp32"

    tokenizer:
        pretrained: "<path>/stable-diffusion-v1-5-fp32"
    
    scheduler:
        pretrained: "<path>/stable-diffusion-v1-5-fp32"
    
    unet:
        target: UNet2DConditionModel
        params:
            act_fn: "silu"
            attention_head_dim: 8
            block_out_channels: [320, 640, 1280, 1280]
            center_input_sample: False
            cross_attention_dim: 768
            down_block_types: ["CrossAttnDownBlock2D","CrossAttnDownBlock2D", "CrossAttnDownBlock2D", "DownBlock2D"]
            downsample_padding: 1
            flip_sin_to_cos: true
            freq_shift: 0
            in_channels: 4
            layers_per_block: 2
            mid_block_scale_factor: 1
            norm_eps: 1e-05
            norm_num_groups: 32
            out_channels: 4
            sample_size: 32
            up_block_types: [
                "UpBlock2D",
                "CrossAttnUpBlock2D",
                "CrossAttnUpBlock2D",
                "CrossAttnUpBlock2D"
            ]
        
    use_ema: True
    mixed_precision: bf16
    gradient_checkpointing: True
    xformers: True


dataset:
    type: WebDataset
    params: 
        path: "pipe:aws s3 cp s3://s-datasets/laion5b/laion2B-data/{000000..231349}.tar -"
        batch_size: ${system.batch_size}
        workers: ${system.workers}
        num_examples_to_see: ${experiment.num_examples_to_see}
        resolution: 512
        text_dropout: 0.0

lr_scheduler:
    scheduler: "ConstantWithWarmup"
    params:
        learning_rate: ${optimizer.params.learning_rate}
        warmup_length: 500
