Use MI300 chip_id instead of model to detect XCD count #448

Merged
benrichard-amd merged 1 commit into amd-staging from fix-xcd-detect
Oct 16, 2024

Conversation

benrichard-amd
Contributor

In a previous change we began using the shortened "MI300" for gpu_model instead of the full name ("MI300X_A0", "MI300X_A1", etc.).

The XCD detection code was still receiving gpu_model and expecting the full name, so the XCD count came out as 1. As a result, several metrics (e.g. VALU utilization, wavefront occupancy) were off by a factor of 8.

Passing chip_id instead of gpu_model to the XCD detection code fixes the issue.
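To make the failure mode concrete, here is a minimal sketch of how a lookup keyed on the full chip name can silently produce 1 when handed the shortened model name. The table `XCDS_BY_FULL_NAME`, the function `xcd_count`, and the variable `chip_name` are hypothetical and only illustrate the idea; the actual detection code and the form chip_id takes in the project may differ.

```python
# Hypothetical lookup table keyed on the full chip name (illustration only,
# not the project's actual data structure).
XCDS_BY_FULL_NAME = {
    "MI300X_A0": 8,
    "MI300X_A1": 8,
}


def xcd_count(name: str) -> int:
    """Return the XCD count for a full chip name, defaulting to 1 if unknown."""
    return XCDS_BY_FULL_NAME.get(name, 1)


gpu_model = "MI300"      # shortened family name introduced by the earlier change
chip_name = "MI300X_A1"  # assumed stand-in for what chip_id identifies

before = xcd_count(gpu_model)  # no match, so the default of 1 is returned
after = xcd_count(chip_name)   # full name matches the table

# Any metric normalized by the XCD count is skewed by this ratio.
scale_error = after // before
print(before, after, scale_error)
```

A silent default like this is why the symptom showed up as skewed metrics rather than as an error: the lookup never fails, it just returns the single-XCD value.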

In a previous change we started using "MI300" for gpu_model instead of the full
"MI300X_A0" or "MI300X_A1", etc.

The XCD detection code was using gpu_model and expecting the full name, causing
the XCD count = 1. Passing chip_id fixes the issue.

Signed-off-by: benrichard-amd <ben.richard@amd.com>
benrichard-amd merged commit a236fe0 into amd-staging on Oct 16, 2024
13 checks passed
benrichard-amd deleted the fix-xcd-detect branch on October 16, 2024 19:41
xuchen-amd pushed a commit to xuchen-amd/omniperf that referenced this pull request Oct 19, 2024
Signed-off-by: xuchen-amd <xuchen@amd.com>
xuchen-amd pushed a commit to xuchen-amd/omniperf that referenced this pull request Oct 22, 2024