
Add stream ID to GPU operators and support multiple streams in simulate_execution #126

Merged 3 commits into main from stream-id-encode on Jul 13, 2024

Conversation

@TaekyungHeo (Contributor) commented Jul 11, 2024

Summary

Add stream ID to GPU operators and support multiple streams in simulate_execution. The current main branch has two problems. First, the converter does not encode the stream ID into GPU operators. Second, the simulate_execution method does not take stream IDs into account, so all GPU operators are serialized. As a result, for a specific trace, the measured runtime is 4087.535ms while the runtime reported by simulate_execution is 14802.3ms. This PR fixes the issue by encoding the stream ID into GPU operators and supporting multiple streams in simulate_execution. simulate_execution is now disabled by default because it takes too long now that the converter supports multiple streams.
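
To illustrate why ignoring stream IDs inflates the simulated runtime, here is a hypothetical Python sketch of per-stream scheduling (GpuOp and simulate are illustrative names, not the actual simulate_execution code): operators on the same stream serialize, while operators on different streams overlap.

# Hypothetical sketch, not the converter's actual implementation:
# each stream keeps its own "busy until" time, so GPU operators on
# different streams overlap while same-stream operators serialize.
from dataclasses import dataclass

@dataclass
class GpuOp:
    node_id: int
    stream_id: int
    issue_us: int       # earliest time the operator may start
    duration_us: int

def simulate(ops: list[GpuOp]) -> int:
    """Return the simulated finish time in microseconds."""
    stream_busy_until: dict[int, int] = {}
    finish = 0
    for op in sorted(ops, key=lambda o: o.issue_us):
        start = max(op.issue_us, stream_busy_until.get(op.stream_id, 0))
        end = start + op.duration_us
        stream_busy_until[op.stream_id] = end
        finish = max(finish, end)
    return finish

# Two operators on streams 7 and 36 (the streams seen in the logs
# below) overlap instead of serializing:
ops = [GpuOp(1, stream_id=7, issue_us=0, duration_us=10),
       GpuOp(2, stream_id=36, issue_us=0, duration_us=10)]
print(simulate(ops))  # 10, not 20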

Test Plan

1. Correlation 1

$ chakra_trace_link \
  --pytorch-et-file /Users/theo/Downloads/traces/et_0.json \
  --kineto-file /Users/theo/Downloads/traces/kineto_0.json \
  --output-file ~/megatron_0.json
$ chakra_converter --input_filename ~/megatron_0.json --output_filename megatron_0.chakra --input_type PyTorch --log_filename /tmp/rank_0 &

/tmp/rank_0

...
INFO [07/11/2024 07:22:34 AM] GPU Node ID 163402 on stream 7 completed at 3747218us, tid: stream 7
INFO [07/11/2024 07:22:34 AM] GPU Node ID 162604 on stream 36 completed at 3752832us, tid: stream 36
INFO [07/11/2024 07:22:34 AM] Simulation of Chakra node execution completed.
  • Measured: 4087.535ms
  • Simulated: 3752.832ms (91.8%)
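
The 91.8% figure can be reproduced from the log. A minimal sketch, assuming only the log format shown above, that takes the latest "completed at <N>us" timestamp as the simulated runtime:

# Minimal sketch assuming the log format above; the measured runtime
# comes from the profiler, and the simulated runtime is the latest
# "completed at <N>us" timestamp in the converter log.
import re

MEASURED_MS = 4087.535

def simulated_ms(log_path: str) -> float:
    last_us = 0
    pattern = re.compile(r"completed at (\d+)us")
    with open(log_path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                last_us = max(last_us, int(m.group(1)))
    return last_us / 1000.0

sim = simulated_ms("/tmp/rank_0")
print(f"Simulated: {sim:.3f}ms ({sim / MEASURED_MS:.1%} of measured)")
# Simulated: 3752.832ms (91.8% of measured)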

2. Correlation 2

$ pip install .      
Processing /Users/theo/chakra-dev
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: protobuf==4.* in /Users/theo/venv/lib/python3.10/site-packages (from chakra==0.0.4) (4.23.4)
Requirement already satisfied: graphviz in /Users/theo/venv/lib/python3.10/site-packages (from chakra==0.0.4) (0.20.1)
Requirement already satisfied: networkx in /Users/theo/venv/lib/python3.10/site-packages (from chakra==0.0.4) (3.2.1)
Requirement already satisfied: pydot in /Users/theo/venv/lib/python3.10/site-packages (from chakra==0.0.4) (2.0.0)
Requirement already satisfied: pyparsing>=3 in /Users/theo/venv/lib/python3.10/site-packages (from pydot->chakra==0.0.4) (3.1.1)
Building wheels for collected packages: chakra
  Building wheel for chakra (pyproject.toml) ... done
  Created wheel for chakra: filename=chakra-0.0.4-py3-none-any.whl size=52531 sha256=e7d74179181184ee0778b68a706c8abbeb707e6efd2b1a80c7adb9d6cf9f1ed2
  Stored in directory: /Users/theo/Library/Caches/pip/wheels/1f/cc/a0/f451e6630d3461090be1de9594059abe3c2f5be7ce264deca3
Successfully built chakra
Installing collected packages: chakra
  Attempting uninstall: chakra
    Found existing installation: chakra 0.0.4
    Uninstalling chakra-0.0.4:
      Successfully uninstalled chakra-0.0.4
Successfully installed chakra-0.0.4

$ python3 ci_tools/integration_tests.py --tgz_path tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05.tgz --num_ranks 8 --tolerance 0.05 --expected_times_ms 14597 14597 14968 14638 14649 14700 14677 14735
Extracting tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05.tgz to tests/data/1.0.2-chakra.0.0.4
Running command: chakra_trace_link --pytorch-et-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_0.json --kineto-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_0.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_0.json
Running command: chakra_trace_link --pytorch-et-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_1.json --kineto-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_1.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_1.json
Running command: chakra_trace_link --pytorch-et-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_2.json --kineto-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_2.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_2.json
Running command: chakra_trace_link --pytorch-et-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_3.json --kineto-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_3.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_3.json
Running command: chakra_trace_link --pytorch-et-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_4.json --kineto-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_4.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_4.json
Running command: chakra_trace_link --pytorch-et-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_5.json --kineto-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_5.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_5.json
Running command: chakra_trace_link --pytorch-et-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_6.json --kineto-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_6.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_6.json
Running command: chakra_trace_link --pytorch-et-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_7.json --kineto-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_7.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_7.json
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_0.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_0.chakra --input_type PyTorch --log_filename /tmp/rank_0.log
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_1.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_1.chakra --input_type PyTorch --log_filename /tmp/rank_1.log
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_2.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_2.chakra --input_type PyTorch --log_filename /tmp/rank_2.log
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_4.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_4.chakra --input_type PyTorch --log_filename /tmp/rank_4.log
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_5.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_5.chakra --input_type PyTorch --log_filename /tmp/rank_5.log
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_7.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_7.chakra --input_type PyTorch --log_filename /tmp/rank_7.log
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_6.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_6.chakra --input_type PyTorch --log_filename /tmp/rank_6.log
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_3.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_3.chakra --input_type PyTorch --log_filename /tmp/rank_3.log

Manually checked the simulated runtime in each rank's log; a sketch of the tolerance check follows the log excerpts.

==> rank_0.log <==
INFO [07/11/2024 07:53:23 AM] Issuing GPU Node ID 301192 (Memcpy DtoD (Device -> Device)) at 14488270us on stream 7 with duration 1us, tid: stream 7
INFO [07/11/2024 07:53:23 AM] GPU Node ID 301192 on stream 7 completed at 14488271us, tid: stream 7
INFO [07/11/2024 07:53:23 AM] Simulation of Chakra node execution completed.

==> rank_1.log <==
INFO [07/11/2024 08:33:09 AM] Issuing GPU Node ID 301192 (Memcpy DtoD (Device -> Device)) at 14489194us on stream 7 with duration 1us, tid: stream 7
INFO [07/11/2024 08:33:09 AM] GPU Node ID 301192 on stream 7 completed at 14489195us, tid: stream 7
INFO [07/11/2024 08:33:09 AM] Simulation of Chakra node execution completed.

==> rank_2.log <==
INFO [07/11/2024 07:44:05 AM] Issuing GPU Node ID 301192 (Memcpy DtoD (Device -> Device)) at 14550789us on stream 7 with duration 1us, tid: stream 7
INFO [07/11/2024 07:44:05 AM] GPU Node ID 301192 on stream 7 completed at 14550790us, tid: stream 7
INFO [07/11/2024 07:44:05 AM] Simulation of Chakra node execution completed.

==> rank_3.log <==
INFO [07/11/2024 07:50:59 AM] Issuing GPU Node ID 301192 (Memcpy DtoD (Device -> Device)) at 14418326us on stream 7 with duration 1us, tid: stream 7
INFO [07/11/2024 07:50:59 AM] GPU Node ID 301192 on stream 7 completed at 14418327us, tid: stream 7
INFO [07/11/2024 07:50:59 AM] Simulation of Chakra node execution completed.

==> rank_4.log <==
INFO [07/11/2024 07:50:59 AM] Issuing GPU Node ID 301192 (Memcpy DtoD (Device -> Device)) at 14500583us on stream 7 with duration 1us, tid: stream 7
INFO [07/11/2024 07:50:59 AM] GPU Node ID 301192 on stream 7 completed at 14500584us, tid: stream 7
INFO [07/11/2024 07:50:59 AM] Simulation of Chakra node execution completed.

==> rank_5.log <==
INFO [07/11/2024 07:47:32 AM] Issuing GPU Node ID 301192 (Memcpy DtoD (Device -> Device)) at 14308677us on stream 7 with duration 1us, tid: stream 7
INFO [07/11/2024 07:47:32 AM] GPU Node ID 301192 on stream 7 completed at 14308678us, tid: stream 7
INFO [07/11/2024 07:47:32 AM] Simulation of Chakra node execution completed.

==> rank_6.log <==
INFO [07/11/2024 07:50:13 AM] Issuing GPU Node ID 301192 (Memcpy DtoD (Device -> Device)) at 14385407us on stream 7 with duration 1us, tid: stream 7
INFO [07/11/2024 07:50:13 AM] GPU Node ID 301192 on stream 7 completed at 14385408us, tid: stream 7
INFO [07/11/2024 07:50:13 AM] Simulation of Chakra node execution completed.

==> rank_7.log <==
INFO [07/11/2024 07:45:30 AM] Issuing GPU Node ID 301192 (Memcpy DtoD (Device -> Device)) at 14398106us on stream 7 with duration 1us, tid: stream 7
INFO [07/11/2024 07:45:30 AM] GPU Node ID 301192 on stream 7 completed at 14398107us, tid: stream 7
INFO [07/11/2024 07:45:30 AM] Simulation of Chakra node execution completed.
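
The completion times above line up with the expected times passed to integration_tests.py. A sketch of the 5% tolerance check (assumed behavior of the --tolerance flag, not the script's actual code):

# Assumed behavior of integration_tests.py's --tolerance flag, sketched
# here for clarity; simulated values are the completion times from the
# rank logs above, converted from us to ms.
expected_ms = [14597, 14597, 14968, 14638, 14649, 14700, 14677, 14735]
simulated_ms = [14488.271, 14489.195, 14550.790, 14418.327,
                14500.584, 14308.678, 14385.408, 14398.107]
tolerance = 0.05

for rank, (exp, sim) in enumerate(zip(expected_ms, simulated_ms)):
    assert abs(sim - exp) / exp <= tolerance, f"rank {rank} out of tolerance"
    print(f"rank {rank}: {sim}ms vs expected {exp}ms -> within 5%")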

@TaekyungHeo TaekyungHeo requested a review from a team as a code owner July 11, 2024 11:17
@TaekyungHeo TaekyungHeo added the bug (Something isn't working) label on Jul 11, 2024

github-actions bot commented Jul 11, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@TaekyungHeo TaekyungHeo force-pushed the stream-id-encode branch 2 times, most recently from 0815fd4 to 6897ef7 on July 11, 2024 11:50
@srinivas212 srinivas212 merged commit 2d16e16 into main Jul 13, 2024
10 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 13, 2024
@TaekyungHeo TaekyungHeo deleted the stream-id-encode branch July 16, 2024 12:45