Someone made the LCM sampler work in only 10 steps, can you add it to the demo page and pipeline? #391

Open
FurkanGozukara opened this issue Oct 6, 2024 · 10 comments
Comments

FurkanGozukara commented Oct 6, 2024

Feature request

His results are great: https://www.reddit.com/r/StableDiffusion/comments/1fwzaw9/cogvideo_i2v_working_with_lcm_with_only_10_steps/?sort=new

It would be amazing if you could add it to the demo here:

https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space

jzhang38 commented Oct 6, 2024

I checked the link and there are no model weights?

@FurkanGozukara (Author)

> I checked the link and there are no model weights?

I think it uses CogVideoX-5B image-to-video.

@foreverpiano

I think it requires some fine-tuning effort?

kijai commented Oct 7, 2024

This is mostly a misunderstanding, as the Reddit poster seems to have confused what LCM actually is: all they did was change the sampler; there is no distillation involved. I have noticed that the I2V model innately performs decently well at lower step counts, LCM sampler or not.
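
For readers who want to try the "sampler swap" idea outside ComfyUI, here is a minimal sketch using the diffusers CogVideoXImageToVideoPipeline. It only swaps the scheduler; no LCM-distilled checkpoint is involved, and whether LCMScheduler's config maps cleanly onto CogVideoX's default scheduler is an assumption that has not been verified in this thread:

```python
# Sketch only: swap the scheduler ("sampler") on the CogVideoX-5B I2V pipeline.
# No distillation is involved; LCMScheduler compatibility with CogVideoX is an
# assumption, not something confirmed in this thread.
import torch
from diffusers import CogVideoXImageToVideoPipeline, LCMScheduler

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)  # the "sampler" change
pipe.enable_model_cpu_offload()  # optional: lowers VRAM usage
```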

@foreverpiano

@kijai But I tried it with 20 steps, and it could not give a satisfactory output; the output is all white.

python cli_demo.py --prompt "A golden retriever, sporting sleek black sunglasses, with its lengthy fur flowing in the breeze, sprints playfully across a rooftop terrace, recently refreshed by a light rain. The scene unfolds from a distance, the dog's energetic bounds growing larger as it approaches the camera, its tail wagging with unrestrained joy, while droplets of water glisten on the concrete behind it. The overcast sky provides a dramatic backdrop, emphasizing the vibrant golden coat of the canine as it dashes towards the viewer." \
    --model_path "/workspace/data/CogVideoX-5b" \
    --generate_type "t2v" \
    --num_inference_steps 20 \
    --guidance_scale 7.5

@foreverpiano

I think there are some tricks to reduce the number of steps.

@FurkanGozukara (Author)

> @kijai But I tried it with 20 steps, and it could not give a satisfactory output; the output is all white.
>
> python cli_demo.py --prompt "A golden retriever, sporting sleek black sunglasses, with its lengthy fur flowing in the breeze, sprints playfully across a rooftop terrace, recently refreshed by a light rain. The scene unfolds from a distance, the dog's energetic bounds growing larger as it approaches the camera, its tail wagging with unrestrained joy, while droplets of water glisten on the concrete behind it. The overcast sky provides a dramatic backdrop, emphasizing the vibrant golden coat of the canine as it dashes towards the viewer." \
>     --model_path "/workspace/data/CogVideoX-5b" \
>     --generate_type "t2v" \
>     --num_inference_steps 20 \
>     --guidance_scale 7.5

Same here.

Low step counts don't produce good results.

kijai commented Oct 7, 2024

> @kijai But I tried it with 20 steps, and it could not give a satisfactory output; the output is all white.
>
> python cli_demo.py --prompt "A golden retriever, sporting sleek black sunglasses, with its lengthy fur flowing in the breeze, sprints playfully across a rooftop terrace, recently refreshed by a light rain. The scene unfolds from a distance, the dog's energetic bounds growing larger as it approaches the camera, its tail wagging with unrestrained joy, while droplets of water glisten on the concrete behind it. The overcast sky provides a dramatic backdrop, emphasizing the vibrant golden coat of the canine as it dashes towards the viewer." \
>     --model_path "/workspace/data/CogVideoX-5b" \
>     --generate_type "t2v" \
>     --num_inference_steps 20 \
>     --guidance_scale 7.5

It won't work for text-to-video; from what I've seen, under 32 steps usually produces all-white outputs. Image-to-video is different and works with as few as 7 steps.
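
As a rough illustration of the low-step I2V behaviour described above, here is a sketch using the diffusers CogVideoXImageToVideoPipeline with the stock scheduler and the public THUDM/CogVideoX-5b-I2V weights; the input image path and prompt are placeholders, and the exact step count where quality holds up is not guaranteed:

```python
# Sketch: CogVideoX-5B image-to-video at a reduced step count (no LCM, stock scheduler).
# The input image path and prompt are placeholders.
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = load_image("input_frame.png")  # conditioning frame (placeholder)
frames = pipe(
    prompt="A golden retriever runs across a rooftop terrace after rain.",
    image=image,
    num_inference_steps=10,   # well below the usual ~50 steps
    guidance_scale=6.0,
).frames[0]
export_to_video(frames, "output.mp4", fps=8)
```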

@foreverpiano

@kijai Got it. It is true that fewer than 32 steps returns all-white output for t2v.

@foreverpiano

I agree that they just use fewer steps for i2v.

@zRzRzRzRzRzRzR self-assigned this on Oct 7, 2024