Classifier-free guidance? #6

Open
mehdidc opened this issue Apr 28, 2023 · 7 comments

Comments

mehdidc (Contributor) commented Apr 28, 2023

I might have missed it in the code, but I can't see whether we randomly drop the captions for classifier-free guidance (which is already used at inference).

vkramanuj (Contributor) commented Apr 28, 2023

Hi, thanks for the question. I didn't notice a practical difference between no text dropout and some text dropout in my experiments, so I left it out of this repo. However, I can push a branch later today and potentially merge after some testing. For reference, the implementation is just randomly substituting the input string with the empty string, similar to how it's done at inference time in diffusers (https://github.com/huggingface/diffusers/blob/384c83aa9a1f268e5587d5ea1ea9f4c040845167/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L371)
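
As a concrete illustration of the substitution described above, here is a minimal sketch (names are illustrative, not the actual code in this repo or in diffusers): with probability text_dropout the caption is replaced by the empty string, essentially the same unconditional input the linked diffusers pipeline encodes when guidance is enabled.

import random

def maybe_drop_caption(caption: str, text_dropout: float = 0.1) -> str:
    # Replace the caption with "" with probability `text_dropout`.
    # The empty caption stands in for the unconditional input that
    # classifier-free guidance contrasts against at sampling time.
    return "" if random.random() < text_dropout else caption

# Roughly a `text_dropout` fraction of the captions in a batch end up empty.
captions = ["a photo of a dog", "a watercolor landscape", "a red bicycle"]
print([maybe_drop_caption(c, text_dropout=0.1) for c in captions])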

mehdidc (Contributor, Author) commented Apr 28, 2023

Ah, you mean you could apply classifier-free guidance at inference even though the model never encountered empty strings during training (as done in the repo you mention)? Isn't that unexpected?

vkramanuj (Contributor) commented Apr 28, 2023

Yes, the guidance_scale parameter still worked, which is surprising. However, it's possible there's some hit in performance for longer training runs, so it does make sense to add text dropout to this repo.
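
For context, "the guidance_scale parameter still worked" refers to the usual classifier-free guidance combination at sampling time: the UNet is evaluated on both the empty-prompt and the text-conditioned embeddings, and the two noise predictions are mixed. A rough sketch of that combination (tensor names are illustrative; the formula matches the linked diffusers pipeline):

import torch

def apply_cfg(noise_pred_uncond: torch.Tensor,
              noise_pred_text: torch.Tensor,
              guidance_scale: float) -> torch.Tensor:
    # Standard classifier-free guidance mix; guidance_scale = 1.0
    # recovers the purely text-conditioned prediction.
    return noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

# Example with dummy tensors shaped like UNet noise predictions.
uncond = torch.zeros(1, 4, 64, 64)
cond = torch.ones(1, 4, 64, 64)
print(apply_cfg(uncond, cond, guidance_scale=7.5).mean())  # tensor(7.5000)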

mehdidc (Contributor, Author) commented Apr 28, 2023

I think it's interesting that it worked anyway. Which dataset were you training on?

vkramanuj (Contributor) commented

@mehdidc I've used several datasets with this setup, mostly various filtered versions of LAION-2B (e.g. LAION-Aesthetics and LAION high-res). I've added text dropout in the text-dropout branch (https://github.com/mlfoundations/open-diffusion/tree/text-dropout). Specifically, the changes are:

  1. Add the text_dropout option to the WebDataset class:
     text_dropout=0.0,
  2. Add the conditional dropout to the data pipeline:
     wds.map_dict(input_ids=lambda text: "" if r.random() < text_dropout else text),

I haven't been able to test this code recently due to lack of resources. Let me know if you get a chance to try this out, and I can merge it into main.
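
Putting the two changes above together, here is a rough, untested sketch of the idea (field and function names are placeholders, not the branch's actual code):

import random

def make_caption_dropper(text_dropout: float = 0.0):
    # Returns a per-sample function suitable for a map_dict-style stage:
    # it blanks the caption with probability `text_dropout` before tokenization.
    def drop(text: str) -> str:
        return "" if random.random() < text_dropout else text
    return drop

# In a webdataset pipeline this would be used roughly as
#   wds.map_dict(txt=make_caption_dropper(text_dropout))
# where the caption field name ("txt" here) depends on how the shards were written.

samples = [{"txt": "a cat on a sofa"}, {"txt": "a mountain at dusk"}]
drop = make_caption_dropper(text_dropout=0.5)
for s in samples:
    s["txt"] = drop(s["txt"])
print(samples)  # roughly half the captions are now ""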

mehdidc (Contributor, Author) commented May 5, 2023

Thanks @vkramanuj for the implementation, I can try to do some runs. Do you maybe have the config file you used in your tests with LAION-Aesthetics and/or high-res, so that we can compare more or less directly?

vkramanuj (Contributor) commented May 9, 2023

Here's one. I removed my wandb and some pathing info for privacy reasons. You'd need to replace the WebDataset path with one for LAION high-res aesthetics (either original width/height >= 512 or >= 1024). Try to make the global batch size 2048 with either more GPUs or gradient accumulation (a quick sanity check on that arithmetic follows the config below). Note this is using the SD v1 architecture, which I found has better throughput and allows for a greater per-GPU batch size.

system:
    gradient_accumulation: 1
    batch_size: 32
    workers: 6
    dist_backend: ${distributed.dist_backend}
    dist_url: ${distributed.dist_url}

distributed:
    dist_backend: 'nccl'
    dist_url: 'env://'

experiment:
    log_dir: <path>/sd-logs
    name: "laion-2b-aesthetics-hr"
    project: "diffusion"
    num_examples_to_see: 2000000000
    save_every: 2000
    requeue: True

optimizer:
    name: adamw
    params:
        learning_rate: 0.0001
        beta1: 0.9
        beta2: 0.98 # changed from initial sd value for training stability
        weight_decay: 0.01
        epsilon: 0.00000001

model:
    vae:
        pretrained: "<path>/stable-diffusion-v1-5-fp32"

    text_encoder:
        pretrained: "<path>/stable-diffusion-v1-5-fp32"

    tokenizer:
        pretrained: "<path>/stable-diffusion-v1-5-fp32"
    
    scheduler:
        pretrained: "<path>/stable-diffusion-v1-5-fp32"
    
    unet:
        target: UNet2DConditionModel
        params:
            act_fn: "silu"
            attention_head_dim: 8
            block_out_channels: [320, 640, 1280, 1280]
            center_input_sample: False
            cross_attention_dim: 768
            down_block_types: ["CrossAttnDownBlock2D","CrossAttnDownBlock2D", "CrossAttnDownBlock2D", "DownBlock2D"]
            downsample_padding: 1
            flip_sin_to_cos: true
            freq_shift: 0
            in_channels: 4
            layers_per_block: 2
            mid_block_scale_factor: 1
            norm_eps: 1e-05
            norm_num_groups: 32
            out_channels: 4
            sample_size: 32
            up_block_types: [
                "UpBlock2D",
                "CrossAttnUpBlock2D",
                "CrossAttnUpBlock2D",
                "CrossAttnUpBlock2D"
            ]
        
    use_ema: True
    mixed_precision: bf16
    gradient_checkpointing: True
    xformers: True


dataset:
    type: WebDataset
    params: 
        path: "pipe:aws s3 cp s3://s-datasets/laion5b/laion2B-data/{000000..231349}.tar -"
        batch_size: ${system.batch_size}
        workers: ${system.workers}
        num_examples_to_see: ${experiment.num_examples_to_see}
        resolution: 512
        text_dropout: 0.0

lr_scheduler:
    scheduler: "ConstantWithWarmup"
    params:
        learning_rate: ${optimizer.params.learning_rate}
        warmup_length: 500
