[cuda] Dump useful GPU characteristics #13955

Merged: 1 commit, Jun 5, 2023

@antiagainst antiagainst commented Jun 5, 2023

This commit implements `iree_hal_cuda_driver_dump_device_info` to print out GPU characteristics, including launch configuration size limits, per-block and per-multiprocessor resource limits, memory system characteristics, and others.

On an NVIDIA GeForce RTX 2070 SUPER, `iree-run-module --dump_devices` shows:

- gpu-compute-capability: 7.5
- driver-max-cuda-version: 12.1

- launch-max-block-dims: (1024, 1024, 64)
- launch-max-grid-dims: (2147483647, 65535, 65535)

- block-max-thread-count: 1024
- block-max-32-bit-register-count: 65536
- block-max-shared-memory: 49152 bytes

- multiprocessor-max-thread-count: 1024
- multiprocessor-max-block-count: 16
- multiprocessor-max-32-bit-register-count: 65536
- multiprocessor-max-shared-memory: 65536 bytes

- memory-has-unified-address-space: 1
- memory-supports-managed-memory: 1
- memory-can-map-host-memory-to-device: 1
- memory-supports-pageable-memory-access-from-device: 0
- memory-supports-concurrent-managed-access: 1
- memory-supports-memory-pools: 1
- memory-l2-cache-size: 4194304 bytes

- gpu-multiprocessor-count: 40
- gpu-clock-rate: 1815000 kHz
- gpu-warp-size: 32
- kernel-has-execution-timeout: 1
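
As a rough illustration of how these limits might be consumed, the Python sketch below parses a few lines of the dump and checks a launch shape against them. The helper names `parse_device_dump` and `check_launch` are hypothetical and not part of IREE; the sample values are copied from the output above, and the `- key: value` line format is assumed from that output only.

```python
import re

# Sample lines copied from the dump above (RTX 2070 SUPER).
DUMP = """\
- gpu-compute-capability: 7.5
- launch-max-block-dims: (1024, 1024, 64)
- launch-max-grid-dims: (2147483647, 65535, 65535)
- block-max-thread-count: 1024
- block-max-shared-memory: 49152 bytes
"""


def parse_device_dump(text):
    """Parse '- key: value' lines into a dict (hypothetical helper).

    Tuples like "(1024, 1024, 64)" become tuples of ints, plain integers
    with an optional unit ("49152 bytes") become ints, and anything else
    (such as "7.5") is kept as a string.
    """
    info = {}
    for line in text.splitlines():
        match = re.match(r"^\s*-\s*([\w-]+):\s*(.+?)\s*$", line)
        if not match:
            continue
        key, raw = match.groups()
        numbers = [int(n) for n in re.findall(r"\d+", raw)]
        if raw.startswith("("):
            info[key] = tuple(numbers)
        elif re.fullmatch(r"\d+(\s+\w+)?", raw):
            info[key] = numbers[0]
        else:
            info[key] = raw
    return info


def check_launch(info, block, grid):
    """Return a list of limits that a (block, grid) launch shape violates."""
    problems = []
    for axis, (size, limit) in enumerate(zip(block, info["launch-max-block-dims"])):
        if size > limit:
            problems.append(f"block dim {axis} = {size} exceeds {limit}")
    for axis, (size, limit) in enumerate(zip(grid, info["launch-max-grid-dims"])):
        if size > limit:
            problems.append(f"grid dim {axis} = {size} exceeds {limit}")
    # The per-dimension limits are not enough on their own: the product of
    # the block dimensions must also fit in the per-block thread limit.
    threads = block[0] * block[1] * block[2]
    if threads > info["block-max-thread-count"]:
        problems.append(
            f"{threads} threads per block exceeds {info['block-max-thread-count']}"
        )
    return problems


if __name__ == "__main__":
    info = parse_device_dump(DUMP)
    print(check_launch(info, block=(256, 1, 1), grid=(1024, 1, 1)))
    print(check_launch(info, block=(64, 64, 1), grid=(1, 1, 1)))
```

The second call shows why both `launch-max-block-dims` and `block-max-thread-count` are worth dumping: a 64x64x1 block passes every per-dimension limit but still asks for 4096 threads, which is over the 1024-thread cap.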

Progress towards #13245

@antiagainst antiagainst added the hal/cuda Runtime CUDA HAL backend label Jun 5, 2023
@antiagainst antiagainst self-assigned this Jun 5, 2023
@benvanik benvanik left a comment

really nice! have wanted this for ages!

@antiagainst antiagainst merged commit 686860c into iree-org:main Jun 5, 2023
@antiagainst antiagainst deleted the cuda2-attributes branch June 5, 2023 21:44
NatashaKnk pushed a commit to NatashaKnk/iree that referenced this pull request Jul 6, 2023
nhasabni pushed a commit to plaidml/iree that referenced this pull request Aug 24, 2023