Why is it not working? #2

Open
MMMazart opened this issue Sep 22, 2024 · 38 comments
Labels: enhancement (New feature or request)
@MMMazart

I executed this command:

./wasmedge --dir .:. sd-api-server.wasm --model-name sd-v1.4 --model /mnt/data/zhangmingyang/t2i/models/stable-diffusion-v-1-4-GGUF/stable-diffusion-v1-4-Q8_0.gguf

The result is shown in the screenshot below, but when I send the following request, there is no response:

curl -X POST 'http://localhost:8080/v1/images/generations' \
  --header 'Content-Type: application/json' \
  --data '{"model": "sd-v1.4", "prompt": "A cute baby sea otter"}'

What is going on?

(screenshot of the server output)

@MMMazart
Author

Now it's working; I don't know why.

@MMMazart
Author

Another question: stable-diffusion responds quickly every time, but flux takes a long time each time. Is that because the model is being loaded? Does the flux model need to be reloaded on every POST request?

@MMMazart
Author

> Another question: stable-diffusion responds quickly every time, but flux takes a long time each time. Is that because the model is being loaded? Does the flux model need to be reloaded on every POST request?

All the time is spent on loading the model.

@apepkuss
Collaborator

> Now it's working; I don't know why.

If no new log messages show on the screen, please check whether the port you're using is already in use.
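A quick way to check this (a sketch, assuming the server's default port 8080; lsof is available on most Linux/macOS systems):

lsof -nP -iTCP:8080 -sTCP:LISTEN

If this prints a process, that process already owns the port and the server cannot bind to it.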

> Another question: stable-diffusion responds quickly every time, but flux takes a long time each time. Is that because the model is being loaded? Does the flux model need to be reloaded on every POST request?

The model is loaded ONLY at the stage of context initialization, and ONLY loaded once. Could you please share with us the following environment info?

  • Operating system
  • CPU
  • Memory
  • GPU and vRAM if present

Thanks a lot!

@apepkuss apepkuss self-assigned this Sep 25, 2024
@MMMazart
Author

> Could you please share with us the following environment info?

Thank you for your reply!

  • Operating system: (screenshot)
  • CPU: (screenshot)
  • Memory: (screenshot)
  • GPU: A100-SXM4-80GB

@MMMazart
Author

MMMazart commented Sep 25, 2024

(screenshot)
It gets stuck here and reloads every time. Sorry, I accidentally closed this issue.

@MMMazart
Author


Thank you. This is very important to me.

@MMMazart
Author

MMMazart commented Sep 25, 2024

Flux1-dev is like this every time as well. The GPU VRAM is released after every run.

@apepkuss
Collaborator

Thanks for the feedback. We'll check the issue ASAP. Thanks!

@apepkuss apepkuss added the enhancement New feature or request label Sep 25, 2024
@MMMazart
Author


Thanks a lot.

@hydai

hydai commented Sep 26, 2024

After checking the design of wasmedge-stablediffusion, the context should remain after loading.
@apepkuss Will the sd-api-server or llama-core try to init and drop the context per request?

@MMMazart
Author

The models of the stable-diffusion series will not be dropped, while those of the flux series will be dropped.

@apepkuss
Collaborator

@hydai According to our investigation, llama-core creates the text_to_image or image_to_image context once per request. A design improvement will come in the next release. Thanks!

@MMMazart
Author


Thank you! I'm looking forward to it very much!

@apepkuss
Collaborator

@MMMazart We released v0.1.5. Please try it. Thanks!

@MMMazart
Author


Thank you for your effort, but it seems there are a few bugs at the moment:

  1. It can only read relative paths (./), but cannot read absolute paths.
  2. When initially loading the context, both the text-to-image and image-to-image models are loaded simultaneously, which consumes a large amount of VRAM. In actual use, I only want to use the text-to-image model. Could you add an option to load only one model?
  3. The program crashes when sending the second request.
    (screenshot of the crash)

Additionally, I have a question: How do I load the flux1-merged model? Is it the same as flux1-dev and others?

@apepkuss
Collaborator

@MMMazart Thanks for your quick feedback!

> 1. It can only read relative paths (./), but cannot read absolute paths.

You have to do directory mapping because the running environment is a wasm sandbox. That's why you see --dir .:. in the command: it maps a guest directory to a host directory. The following example maps the local directory /Users/sam/workspace/demo/sd/dev to the root directory of the wasm sandbox environment:

wasmedge --dir .:/Users/sam/workspace/demo/sd/dev sd-api-server.wasm \
  --model-name flux1-dev \
  --diffusion-model flux1-dev-Q4_0.gguf \
  --vae ae.safetensors \
  --clip-l clip_l.safetensors \
  --t5xxl t5xxl-Q8_0.gguf
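
For instance, to use the absolute model path from your first command, a mapping along these lines should work (a sketch; the guest directory /models is an arbitrary name chosen for illustration):

wasmedge --dir /models:/mnt/data/zhangmingyang/t2i/models/stable-diffusion-v-1-4-GGUF sd-api-server.wasm \
  --model-name sd-v1.4 \
  --model /models/stable-diffusion-v1-4-Q8_0.gguf

Inside the sandbox, the model is then referred to by its guest path /models/stable-diffusion-v1-4-Q8_0.gguf instead of the host's absolute path.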
> 2. When initially loading the context, both the text-to-image and image-to-image models are loaded simultaneously, which consumes a large amount of VRAM. In actual use, I only want to use the text-to-image model. Could you add an option to load only one model?

Yeah, the major target of v0.1.5 is to solve the issue of context creation. In the next release, we will add a CLI option to control which context (or both) is created.

> 3. The program crashes when sending the second request.

Could you please provide more details about the issue, such as the request you used, CPU/GPU, memory/VRAM, etc.? That would help us reproduce the issue.

In addition, our wasmedge_stablediffusion plugin is based on stable-diffusion.cpp (master-e71ddce). According to our tests with flux.1-dev, stable-diffusion.cpp (master-e71ddce) causes segfault issues with some prompts. We plan to upgrade the wasmedge_stablediffusion plugin to stable-diffusion.cpp (master-14206fd), which has some fixes.

> How do I load the flux1-merged model? Is it the same as flux1-dev and others?

I have no idea about flux1-merged, so I cannot tell whether they are the same or not. If it is an open-source model, please share the link to the model with us. We'll check it.

@MMMazart
Author


My environment information is the same as mentioned before and has not been changed. This problem occurs every time.

@apepkuss
Collaborator

@MMMazart Do you mind sharing with us the prompt you're using? BTW, the issue is triggered while using flux.1-dev, right? Thanks!

@apepkuss
Collaborator

@MMMazart For issue 2 mentioned before, please try v0.1.6. This version adds a --context-type CLI option with the possible values text-to-image, image-to-image, and full. The default is full, meaning both the text-to-image and image-to-image contexts are created.
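
For example, adding the new option to the flux1-dev command shown earlier would create only the text-to-image context (a sketch, assuming the model files are in the current directory):

wasmedge --dir .:. sd-api-server.wasm \
  --model-name flux1-dev \
  --diffusion-model flux1-dev-Q4_0.gguf \
  --vae ae.safetensors \
  --clip-l clip_l.safetensors \
  --t5xxl t5xxl-Q8_0.gguf \
  --context-type text-to-image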

@MMMazart
Author


prompt: "a lovely cat holding a sign says 'flux.cpp'". Yes, both flux.1-dev and flux.1-schnell trigger this problem.

@MMMazart
Author


"a cat" will trigger it, too. This seems to have nothing to do with the prompt.

@apepkuss
Collaborator


@MMMazart Could you share the full request with us? For example, the steps value.

@MMMazart
Author

MMMazart commented Sep 27, 2024

text-to-image

import time
import requests

# endpoint from the earlier curl examples in this thread
url = 'http://localhost:8080/v1/images/generations'

headers = {
    'Content-Type': 'application/json'
}

data = {
    "model": "flux1-schnell",
    # "prompt": "a lovely cat holding a sign says 'flux.cpp'",
    "prompt": "a cat",
    "cfg_scale": 1.0,
    "sample_method": "euler",
    "steps": 8,
}

time_start = time.time()
response = requests.post(url, headers=headers, json=data)

This is my request, which is the same as the example. @apepkuss

@MMMazart
Copy link
Author

text-to-image

headers = { 'Content-Type': 'application/json' }

data = { "model": "flux1-schnell", # "prompt": "a lovely cat holding a sign says 'flux.cpp'", "prompt": "a cat", "cfg_scale": 1.0, "sample_method": "euler", "steps": 8,

} time_start = time.time() response = requests.post(url, headers=headers, json=data)

This is my request, which is the same as the example

@apepkuss

@MMMazart
Author

MMMazart commented Sep 27, 2024

(screenshot)

After the first inference completes, you can see that the GPU memory is released, so the second request directly results in an error.
@apepkuss

@apepkuss
Collaborator

@MMMazart Which version of CUDA are you using?

@MMMazart
Author

MMMazart commented Sep 27, 2024

11.5 @apepkuss

(screenshots)

@apepkuss
Collaborator

@MMMazart We don't have an A100, so we tried to reproduce the issue in an environment with a 3080 + CUDA 11.3 + Ubuntu 20.04. The entire process works correctly with no crash. Please refer to the following snapshot. Thanks!

(screenshot)

@MMMazart
Author

I see in your snapshot that only one request was sent. It crashes on the second request; can you send multiple requests? In my environment, the context is deleted after the first request. Thanks!

@MMMazart
Author

I changed the CUDA version to 12.2; after the first request, the context is still cleared. My Ubuntu version is 22.04, so it seems the biggest difference is the GPU.

@fabiopolimeni

I don't think it has anything to do with machines, GPUs, etc. I am getting the very same behaviour on a MacBook M3 Pro with 48 GB of shared RAM.

At the second request the server crashes:

segmentation fault  wasmedge --dir .:. sd-api-server.wasm --model-name flux1-schnell   --vae  

I followed the steps for the FLUX example.

Server runs with:

wasmedge --dir .:. sd-api-server.wasm \
  --model-name flux1-schnell \
  --diffusion-model flux1-schnell-Q4_0.gguf \
  --vae ae.safetensors \
  --clip-l clip_l.safetensors \
  --t5xxl t5xxl-Q8_0.gguf \
  --context-type text-to-image

The client request:

curl -X POST 'http://localhost:8080/v1/images/generations' \
  --header 'Content-Type: application/json' \
  --data '{
      "model": "flux1-schnell",
      "prompt": "a lovely cat",
      "cfg_scale": 1.0,
      "sample_method": "euler",
      "steps": 10
  }'

The second time I execute this request the server crashes.

@alabulei1
Contributor

Thanks for reporting, @fabiopolimeni and @MMMazart. We will release a new version to solve this problem. See the upstream issue: WasmEdge/WasmEdge#3803

@hydai

hydai commented Oct 11, 2024

Hi @fabiopolimeni and @MMMazart
We updated the plugin to fix this problem, please update the plugin and try again.

@MMMazart
Author

(screenshot of the error)
I encounter this error during initialization after the update.

@MMMazart
Author


My CUDA version is 11.5, but it seems to be unsupported. I switched to version 12.2, which works.

@hydai

hydai commented Oct 11, 2024


That's weird. This error shows that the address it tried to bind to is already in use, and it's not related to the CUDA version. Could you check whether any other application is using the same address/port when you run the cuda-11 version?
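
For reference, a stale process left over from earlier tests can hold the port. Something like this shows what is bound to it (a sketch, again assuming the default port 8080):

lsof -t -iTCP:8080 -sTCP:LISTEN

This prints only the PID(s) of the listening process(es); stopping them should free the port.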

@MMMazart
Author


This is indeed strange, but I was using the same port before and after. It worked after changing the CUDA version.
