Why is it not working? #2

Open
MMMazart opened this issue Sep 22, 2024 · 38 comments
Labels: enhancement (New feature or request)
@MMMazart

I executed this command:

./wasmedge --dir .:. sd-api-server.wasm --model-name sd-v1.4 --model /mnt/data/zhangmingyang/t2i/models/stable-diffusion-v-1-4-GGUF/stable-diffusion-v1-4-Q8_0.gguf

The result is shown in the screenshot below, but when I send the following request, there is no response:

curl -X POST 'http://localhost:8080/v1/images/generations' \
  --header 'Content-Type: application/json' \
  --data '{"model": "sd-v1.4", "prompt": "A cute baby sea otter"}'

What is going on?

(screenshot of the server output)

@MMMazart
Author

Now it's working; I don't know why.

@MMMazart
Author

Another question: stable-diffusion responds quickly every time, but flux takes a long time each time. Is that because the model is being loaded? Does the flux model need to be reloaded on every POST request?

@MMMazart
Author

> Another question: stable-diffusion responds quickly every time, but flux takes a long time each time. Is that because the model is being loaded? Does the flux model need to be reloaded on every POST request?

All the time is spent on loading the model.

@apepkuss
Collaborator

> Now it's working; I don't know why.

If no new log messages show on the screen, please check whether the port you're using is already in use.
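A quick way to check this (a sketch, assuming the server's default port 8080; lsof is available on most Linux/macOS systems):

lsof -nP -iTCP:8080 -sTCP:LISTEN

If this prints a process, that process already owns the port and the server cannot bind to it.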

> Another question: stable-diffusion responds quickly every time, but flux takes a long time each time. Is that because the model is being loaded? Does the flux model need to be reloaded on every POST request?

The model is loaded ONLY at the stage of context initialization, and ONLY loaded once. Could you please share with us the following environment info?

  • Operating system
  • CPU
  • Memory
  • GPU and vRAM if present

Thanks a lot!

@apepkuss apepkuss self-assigned this Sep 25, 2024
@MMMazart
Author

> Could you please share with us the following environment info?

Thank you for your reply!

  • Operating system: (screenshot)
  • CPU: (screenshot)
  • Memory: (screenshot)
  • GPU: A100-SXM4-80GB

@MMMazart
Author

MMMazart commented Sep 25, 2024

(screenshot)
It gets stuck here and reloads every time. Sorry, I accidentally closed this issue.

@MMMazart
Author


Thank you. This is very important to me.

@MMMazart
Author

MMMazart commented Sep 25, 2024

Flux1-dev is like this every time as well. The GPU VRAM is released after every run.

@apepkuss
Collaborator

Thanks for the feedback. We'll check the issue ASAP. Thanks!

@apepkuss apepkuss added the enhancement New feature or request label Sep 25, 2024
@MMMazart
Author


Thanks a lot.

@hydai

hydai commented Sep 26, 2024

After checking the design of wasmedge-stablediffusion, the context should remain after loading.
@apepkuss Will the sd-api-server or llama-core try to init and drop the context per request?

@MMMazart
Author

The models of the stable-diffusion series will not be dropped, while those of the flux series will be dropped.

@apepkuss
Collaborator

@hydai According to our investigation, llama-core creates the text_to_image or image_to_image context once per request. A design improvement will come in the next release. Thanks!

@MMMazart
Author


Thank you! I'm looking forward to it very much!

@apepkuss
Collaborator

@MMMazart We released v0.1.5. Please try it. Thanks!

@MMMazart
Author


Thank you for your effort, but it seems there are a few bugs at the moment:

  1. It can only read relative paths (./), but cannot read absolute paths.
  2. When initially loading the context, both the text-to-image and image-to-image models are loaded simultaneously, which consumes a large amount of VRAM. In actual use, I only want to use the text-to-image model. Could you add an option to load only one model?
  3. The program crashes when sending the second request.
    (screenshot of the crash)

Additionally, I have a question: How do I load the flux1-merged model? Is it the same as flux1-dev and others?

@apepkuss
Collaborator

@MMMazart Thanks for your quick feedback!

> 1. It can only read relative paths (./), but cannot read absolute paths.

You have to do directory mapping because the running environment is a wasm sandbox. That's why you see --dir .:. in the command: it maps a guest directory to a host directory. The following example maps the local directory /Users/sam/workspace/demo/sd/dev to the root directory of the wasm sandbox environment:

wasmedge --dir .:/Users/sam/workspace/demo/sd/dev sd-api-server.wasm \
  --model-name flux1-dev \
  --diffusion-model flux1-dev-Q4_0.gguf \
  --vae ae.safetensors \
  --clip-l clip_l.safetensors \
  --t5xxl t5xxl-Q8_0.gguf
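
For instance, to use the absolute model path from your first command, a mapping along these lines should work (a sketch; the guest directory /models is an arbitrary name chosen for illustration):

wasmedge --dir /models:/mnt/data/zhangmingyang/t2i/models/stable-diffusion-v-1-4-GGUF sd-api-server.wasm \
  --model-name sd-v1.4 \
  --model /models/stable-diffusion-v1-4-Q8_0.gguf

Inside the sandbox, the model is then referred to by its guest path /models/stable-diffusion-v1-4-Q8_0.gguf instead of the host's absolute path.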
> 2. When initially loading the context, both the text-to-image and image-to-image models are loaded simultaneously, which consumes a large amount of VRAM. In actual use, I only want to use the text-to-image model. Could you add an option to load only one model?

Yeah, the major target of v0.1.5 is to solve the issue of context creation. In the next release, we will add a CLI option to control which context (or both) is created.

> 3. The program crashes when sending the second request.

Could you please provide more details about the issue, such as the request you used, CPU/GPU, memory/VRAM, etc.? That would help us reproduce the issue.

In addition, our wasmedge_stablediffusion plugin is based on stable-diffusion.cpp (master-e71ddce). According to our tests with flux.1-dev, stable-diffusion.cpp (master-e71ddce) causes segfault issues with some prompts. We plan to upgrade the wasmedge_stablediffusion plugin to stable-diffusion.cpp (master-14206fd), which has some fixes.

> How do I load the flux1-merged model? Is it the same as flux1-dev and others?

I have no idea about flux1-merged, so I cannot tell whether they are the same or not. If it is an open-source model, please share the link to the model with us. We'll check it.

@MMMazart
Author


My environment information is the same as mentioned before and has not been changed. This problem occurs every time.

@apepkuss
Collaborator

@MMMazart Do you mind sharing with us the prompt you're using? BTW, the issue is triggered while using flux.1-dev, right? Thanks!

@apepkuss
Collaborator

@MMMazart For issue 2 mentioned before, please try v0.1.6. This version adds a --context-type CLI option with the possible values text-to-image, image-to-image, and full. The default is full, meaning both the text-to-image and image-to-image contexts are created.
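
For example, adding the new option to the flux1-dev command shown earlier would create only the text-to-image context (a sketch, assuming the model files are in the current directory):

wasmedge --dir .:. sd-api-server.wasm \
  --model-name flux1-dev \
  --diffusion-model flux1-dev-Q4_0.gguf \
  --vae ae.safetensors \
  --clip-l clip_l.safetensors \
  --t5xxl t5xxl-Q8_0.gguf \
  --context-type text-to-image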

@MMMazart
Author


prompt: "a lovely cat holding a sign says 'flux.cpp'". Yes, both flux.1-dev and flux.1-schnell trigger this problem.

@MMMazart
Author


"a cat" will trigger it, too. This seems to have nothing to do with the prompt.

@apepkuss
Collaborator


@MMMazart Could you share the full request with us? For example, the steps value.

@MMMazart
Author

MMMazart commented Sep 27, 2024

text-to-image

import time
import requests

# endpoint from the earlier curl examples in this thread
url = 'http://localhost:8080/v1/images/generations'

headers = {
    'Content-Type': 'application/json'
}

data = {
    "model": "flux1-schnell",
    # "prompt": "a lovely cat holding a sign says 'flux.cpp'",
    "prompt": "a cat",
    "cfg_scale": 1.0,
    "sample_method": "euler",
    "steps": 8,
}

time_start = time.time()
response = requests.post(url, headers=headers, json=data)

This is my request, which is the same as the example. @apepkuss

@MMMazart
Copy link
Author

text-to-image

headers = { 'Content-Type': 'application/json' }

data = { "model": "flux1-schnell", # "prompt": "a lovely cat holding a sign says 'flux.cpp'", "prompt": "a cat", "cfg_scale": 1.0, "sample_method": "euler", "steps": 8,

} time_start = time.time() response = requests.post(url, headers=headers, json=data)

This is my request, which is the same as the example

@apepkuss

@MMMazart
Author

MMMazart commented Sep 27, 2024

(screenshot)

After the first inference completes, you can see that the GPU memory is released, so the second request directly results in an error.
@apepkuss

@apepkuss
Collaborator

@MMMazart Which version of CUDA are you using?

@MMMazart
Author

MMMazart commented Sep 27, 2024

11.5 @apepkuss

(screenshots)

@apepkuss
Collaborator

@MMMazart We don't have an A100, so we tried to reproduce the issue in an environment with a 3080 + CUDA 11.3 + Ubuntu 20.04. The entire process works correctly with no crash. Please refer to the following snapshot. Thanks!

(screenshot)

@MMMazart
Author

I see in your snapshot that only one request was sent. It crashes on the second request; can you send multiple requests? In my environment, the context is deleted after the first request. Thanks!

@MMMazart
Author

I changed the CUDA version to 12.2; after the first request, the context is still cleared. My Ubuntu version is 22.04, so it seems the biggest difference is the GPU.

@fabiopolimeni

I don't think it has anything to do with machines, GPUs, etc. I am getting the very same behaviour on a MacBook M3 Pro with 48 GB of shared RAM.

At the second request the server crashes:

segmentation fault  wasmedge --dir .:. sd-api-server.wasm --model-name flux1-schnell   --vae  

I followed the steps for the FLUX example.

Server runs with:

wasmedge --dir .:. sd-api-server.wasm \
  --model-name flux1-schnell \
  --diffusion-model flux1-schnell-Q4_0.gguf \
  --vae ae.safetensors \
  --clip-l clip_l.safetensors \
  --t5xxl t5xxl-Q8_0.gguf \
  --context-type text-to-image

The client request:

curl -X POST 'http://localhost:8080/v1/images/generations' \
  --header 'Content-Type: application/json' \
  --data '{
      "model": "flux1-schnell",
      "prompt": "a lovely cat",
      "cfg_scale": 1.0,
      "sample_method": "euler",
      "steps": 10
  }'

The second time I execute this request the server crashes.

@alabulei1
Contributor

Thanks for reporting, @fabiopolimeni and @MMMazart. We will release a new version to solve this problem. See the upstream issue: WasmEdge/WasmEdge#3803

@hydai

hydai commented Oct 11, 2024

Hi @fabiopolimeni and @MMMazart
We updated the plugin to fix this problem, please update the plugin and try again.

@MMMazart
Author

(screenshot of the error)
I encounter this error during initialization after the update.

@MMMazart
Author


My CUDA version is 11.5, but it seems to be unsupported. I switched to version 12.2, which works.

@hydai

hydai commented Oct 11, 2024


That's weird. This error shows that the address it tried to bind to is already in use, and it's not related to the CUDA version. Could you check whether any other application is using the same address/port when you run the cuda-11 version?
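
For reference, a stale process left over from earlier tests can hold the port. Something like this shows what is bound to it (a sketch, again assuming the default port 8080):

lsof -t -iTCP:8080 -sTCP:LISTEN

This prints only the PID(s) of the listening process(es); stopping them should free the port.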

@MMMazart
Author


This is indeed strange, but I was using the same port before and after. It worked after changing the CUDA version.
