[BUG] CUDA Out of Memory when eval model. #133

Crystalxd · 2023-09-12T02:08:22Z

Required prerequisites

I have read the documentation https://github.com/baichuan-inc/baichuan-7B/blob/HEAD/README.md.
I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
Consider asking first in a Discussion.

System information

conda environment
torch=2.0.1
transformers=4.29.2
...

Problem description

I ran the evaluate_zh.py script on an A100 (80 GB) to evaluate the Baichuan model, but GPU memory usage kept growing until it ran out of memory. I then found that the model is loaded without being switched to eval mode, and that inference runs without no_grad.

Reproducible example code

The Python snippets:

https://github.com/baichuan-inc/Baichuan-7B/blob/6f3ef4633a90c2d8a3e0763d0dec1b8dc11588f5/evaluation/evaluate_zh.py#L97C13-L97C13
Put the model in eval mode on this line:
self.model = model.eval()

https://github.com/baichuan-inc/Baichuan-7B/blob/6f3ef4633a90c2d8a3e0763d0dec1b8dc11588f5/evaluation/evaluate_zh.py#L103
Add the following decorator at this line:
@torch.inference_mode()
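
A minimal sketch of the two changes together, run on a small stand-in network rather than the real Baichuan model. The names `model` and `score` are hypothetical and are not taken from evaluate_zh.py. Note that `eval()` mainly changes layer behavior such as dropout; the memory saving most likely comes from `torch.inference_mode()`, which stops autograd from keeping activations for a backward pass.

```python
import torch
from torch import nn

# Hypothetical stand-in for the evaluated model (assumption: the real
# script loads Baichuan-7B here instead).
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))

# eval() switches layers such as Dropout to inference behavior and
# returns the module itself, so the result can be assigned directly.
model = model.eval()


@torch.inference_mode()  # no autograd graph is recorded inside this function
def score(batch: torch.Tensor) -> torch.Tensor:
    """Run a forward pass without tracking gradients."""
    return model(batch)


logits = score(torch.randn(4, 8))
print(model.training, logits.requires_grad)
```

With both changes in place, the outputs carry no gradient history, so intermediate activations can be freed as soon as each forward pass finishes.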

Command lines:

Extra dependencies:

Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

Checklist

I have provided all relevant and necessary information above.
I have chosen a suitable title for this issue.

Guanze-Chen · 2023-10-19T12:19:03Z

Thank you. It works!!!

ICanFlyGFC · 2023-12-07T14:01:08Z

Thanks!

Guanze-Chen · 2023-12-07T14:01:29Z

Your email has been received; I will reply as soon as possible.

Young-X · 2024-07-24T10:56:58Z

When I train the model, the script uses GPU 0 by default. How do I switch it to GPU 1?

Guanze-Chen · 2024-07-24T10:57:30Z

Your email has been received; I will reply as soon as possible.

Crystalxd added the bug label on Sep 12, 2023
