Training process freezes without using GPUs #255

Open
TOP-RX opened this issue Sep 24, 2023 · 1 comment
Labels
bug Something isn't working

TOP-RX commented Sep 24, 2023

Description

I tried to run the GIANT-XRT training process for ogbn-arxiv, but the code appears to freeze without allocating any GPUs for training.

How to Reproduce?


Steps to reproduce


data_dir=./proc_data_xrt/ogbn-arxiv
bash xrt_train.sh ${data_dir}


What have you tried to solve it?

Error message or code output

The code gets stuck at this point, and no GPUs are used.

warnings.warn(
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher - ***** Running training *****
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher -   Num examples = 169286
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher -   Num labels = 32
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher -   Num Epochs = 4
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher -   Learning Rate Schedule = linear
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher -   Batch size = 256
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher -   Gradient Accumulation steps = 1
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher -   Total optimization steps = 2500
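
Since the log stops right after the training banner and no GPU is in use, one possible first check is whether the Python process can see a GPU at all. The sketch below is not part of the original report and is not a PECOS API. It assumes the training runs on PyTorch, and `gpu_visibility_report` is a hypothetical helper name. It only reads the `CUDA_VISIBLE_DEVICES` environment variable, calls `nvidia-smi -L` if that tool is on the PATH, and queries `torch.cuda` if PyTorch can be imported.

```python
import importlib.util
import os
import shutil
import subprocess


def gpu_visibility_report():
    """Collect basic facts about whether this process can see any GPU.

    Hypothetical diagnostic helper; it does not call into PECOS.
    """
    report = {
        # If this is set to an empty string, CUDA programs see no GPUs.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        "nvidia_smi_gpus": None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_available": None,
        "torch_device_count": None,
    }

    # Ask the driver tool for the list of GPUs, if the tool exists.
    if report["nvidia_smi_path"]:
        try:
            result = subprocess.run(
                [report["nvidia_smi_path"], "-L"],
                capture_output=True,
                text=True,
                timeout=30,
            )
            report["nvidia_smi_gpus"] = [
                line for line in result.stdout.splitlines() if line.strip()
            ]
        except (OSError, subprocess.TimeoutExpired):
            pass  # Leave the entry as None if the tool cannot be run.

    # Ask PyTorch what it can see, if PyTorch imports cleanly.
    if report["torch_installed"]:
        try:
            import torch

            report["torch_cuda_available"] = torch.cuda.is_available()
            report["torch_device_count"] = torch.cuda.device_count()
        except (ImportError, OSError):
            pass  # A broken install is reported as None values.

    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

Running this in the same shell and environment as `xrt_train.sh` would show whether the GPUs are hidden from the process, for example by an empty `CUDA_VISIBLE_DEVICES` or a CPU-only PyTorch build. It does not by itself explain the freeze.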

Environment

  • Operating system:
  • Python version:
  • PECOS version:


@TOP-RX TOP-RX added the bug Something isn't working label Sep 24, 2023
@Dong3759

Have you solved this? If so, how did you solve it? I am seeing the same problem.
