Commit

Merge branch 'dev' of github.com:AMDResearch/omniperf into mi300
Signed-off-by: coleramos425 <colramos@amd.com>
coleramos425 committed Jun 19, 2023
2 parents fcb13dc + 79eecb4 commit a38e11e
Showing 31 changed files with 1,063 additions and 787 deletions.
3 changes: 0 additions & 3 deletions .github/workflows/rhel-8.yml
@@ -33,9 +33,6 @@ jobs:
yum -y install which
- name: Checkout
uses: actions/checkout@v3
with:
submodules: recursive
token: ${{ secrets.GH_PAT }}
- name: Install Python prereqs
run: |
python3.9 -m pip install -r requirements.txt
3 changes: 0 additions & 3 deletions .github/workflows/ubuntu-focal.yml
@@ -32,9 +32,6 @@ jobs:
sudo apt-get install -y cmake
- name: Checkout
uses: actions/checkout@v3
with:
submodules: recursive
token: ${{ secrets.GH_PAT }}
- name: Install Python prereqs
run: |
python3 -m pip install -r requirements.txt
10 changes: 0 additions & 10 deletions .gitmodules

This file was deleted.

24 changes: 23 additions & 1 deletion CHANGES
@@ -1,9 +1,31 @@
Version 1.0.8 (30 May 2023)

* add `--kernel-names` option to toggle kernelName overlay in standalone roofline plot (#93)
* remove unused python modules (#96)
* fix empirical roofline calculation for single dispatch workloads (#97)
* match color of arithmetic intensity points to corresponding bw lines

* ux improvements in standalone GUI (#101)
* enhanced readability for filtering dropdowns in standalone GUI (#102)
* new logfile to capture rocprofiler output (#106)
* roofline support for sles15 sp4 and future service packs (#109)
* adding dockerfiles for all supported Linux distros
* new examples for `--roof-only` and `--kernel` options added to documentation

* enable cli analysis in Windows (#110)
* optional random port number in standalone GUI (#111)
* limit length of visible kernelName in `--kernel-names` option (#115)
* adjust metric definitions (#117, #130)
* manually merge rocprof runs, overriding default rocprofiler implementation (#125)
* fixed compatibility issues with Python 3.11 (#131)

Version 1.0.8-PR2 (17 Apr 2023)

* ux improvements in standalone GUI (#101)
* enhanced readability for filtering dropdowns in standalone GUI (#102)
* new logfile to capture rocprofiler output (#106)
* roofline support for sles15 sp4 and future service packs (#109)
* adding dockerfiles for all supported Linux distos
* adding dockerfiles for all supported Linux distros
* new examples for `--roof-only` and `--kernel` options added to documentation

Version 1.0.8-PR1 (13 Mar 2023)
11 changes: 6 additions & 5 deletions README.md
@@ -40,18 +40,19 @@ This software can be cited using a Zenodo
style reference is provided below for convenience:

```
@software{xiamin_lu_2022_7314631
@software{xiaomin_lu_2022_7314631
author = {Xiaomin Lu and
Cole Ramos and
Fei Zheng and
Karl W. Schulz and
Jose Santos and
Keith Lowery},
title = {AMDResearch/omniperf: v1.0.8-PR2 (17 April 2023)},
month = apr,
Keith Lowery and
Cristian Di Pietrantonio},
title = {AMDResearch/omniperf: v1.0.8 (30 May 2023)},
month = may,
year = 2023,
publisher = {Zenodo},
version = {v1.0.8-PR1},
version = {v1.0.8},
doi = {10.5281/zenodo.7314631},
url = {https://doi.org/10.5281/zenodo.7314631}
}
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
1.0.8-PR2
1.0.8
1,327 changes: 660 additions & 667 deletions dashboards/Omniperf_v1.0.8_pub.json

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion src/mibench
Submodule mibench deleted from 9151f7
1 change: 0 additions & 1 deletion src/multevent
Submodule multevent deleted from 2367a3
35 changes: 24 additions & 11 deletions src/omniperf
@@ -38,7 +38,7 @@ from pathlib import Path as path

from parser import parse
from utils import specs
from utils.perfagg import perfmon_filter, pmc_filter
from utils.perfagg import perfmon_filter, pmc_filter, pmc_perf_split, join_prof
from utils import remove_workload
from utils import csv_converter # Import workload
from omniperf_analyze.omniperf_analyze import roofline_only # Standalone roofline
@@ -169,7 +169,7 @@ def gen_sysinfo(workload_name, workload_dir, ip_blocks, app_cmd, skip_roof):
header += "command,"
header += "host_name,host_cpu,host_distro,host_kernel,host_rocmver,date,"
header += "gpu_soc,numSE,numCU,numSIMD,waveSize,maxWavesPerCU,maxWorkgroupSize,"
header += "L1,L2,sclk,mclk,cur_sclk,cur_mclk,L2Banks,name,numSQC,hbmBW,"
header += "L1,L2,sclk,mclk,cur_sclk,cur_mclk,L2Banks,LDSBanks,name,numSQC,hbmBW,"
header += "ip_blocks\n"
sysinfo.write(header)

@@ -213,11 +213,11 @@ def gen_sysinfo(workload_name, workload_dir, ip_blocks, app_cmd, skip_roof):
blocks = []
hbmBW = int(mspec.cur_MCLK) / 1000 * 4096 / 8 * 2
if mspec.GPU == "gfx906":
param += ["16", "mi50", str(int(mspec.CU) // 4), str(hbmBW)]
param += ["16", "32", "mi50", str(int(mspec.CU) // 4), str(hbmBW)]
elif mspec.GPU == "gfx908":
param += ["32", "mi100", "48", str(hbmBW)]
param += ["32", "32", "mi100", "48", str(hbmBW)]
elif mspec.GPU == "gfx90a":
param += ["32", "mi200", "56", str(hbmBW)]
param += ["32", "32", "mi200", "56", str(hbmBW)]
if not skip_roof:
blocks.append("roofline")
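
The `hbmBW` expression in this hunk can be checked in isolation. The sketch below is only an illustration: the helper name `hbm_peak_bandwidth` is hypothetical, and reading the constants as "MHz to GHz, a 4096-bit bus, bits to bytes, two transfers per clock" is an assumption that the diff itself does not state.

```python
def hbm_peak_bandwidth(cur_mclk):
    """Reproduce the hbmBW arithmetic from gen_sysinfo.

    Assumption: cur_mclk is the memory clock in MHz (possibly as a string,
    as mspec.cur_MCLK appears to be), so the result would be in GB/s.
    """
    # Same operator order as the hunk: / 1000 * 4096 / 8 * 2
    return int(cur_mclk) / 1000 * 4096 / 8 * 2


# Hypothetical clock value, chosen only to exercise the formula.
bw = hbm_peak_bandwidth("1600")
print(f"hbmBW = {bw:.1f}")
```

The value is written to the sysinfo header as a string next to the new `LDSBanks` column, which this commit sets to "32" for every listed GPU.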

@@ -380,6 +380,9 @@ def characterize_app(args, VER):
# Perfmon filtering
pmc_filter(workload_dir, perfmon_dir, args.target)

# Separate pmc_perf runs
pmc_perf_split(workload_dir)

# Set up a log file
log = open(workload_dir + "/log.txt", "w")
print("Log: ", workload_dir + "/log.txt\n")
@@ -388,7 +391,7 @@ def characterize_app(args, VER):
for fname in glob.glob(workload_dir + "/perfmon/*.txt"):
# Kernel filtering (in-place replacement)
if not args.kernel == None:
run_subprocess(
success, output = capture_subprocess_output(
[
"sed",
"-i",
@@ -397,10 +400,11 @@ def characterize_app(args, VER):
fname,
]
)
log.write(output)

# Dispatch filtering (inplace replacement)
if not args.dispatch == None:
run_subprocess(
success, output = capture_subprocess_output(
[
"sed",
"-i",
@@ -409,14 +413,14 @@ def characterize_app(args, VER):
fname,
]
)
log.write(output)
print(fname)
if args.use_rocscope == True:
run_rocscope(args, fname)
else:
run_prof(fname, workload_dir, perfmon_dir, app_cmd, args.target, log, args.verbose)

# Close log
log.close()
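
The hunks above replace `run_subprocess` with a call that returns a `(success, output)` pair, so the `sed` output can be written to the log. The body of `capture_subprocess_output` is not part of this diff. The sketch below is a stand-in built on the standard `subprocess.run`, assuming only the return shape visible in the call sites; the name `capture_output_sketch` is hypothetical.

```python
import subprocess
import sys


def capture_output_sketch(cmd):
    """Run cmd and return (success, combined stdout/stderr text).

    Hypothetical stand-in: the real capture_subprocess_output may stream
    output line by line or behave differently in other ways.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured text
        text=True,
    )
    return result.returncode == 0, result.stdout


# A portable command is used here in place of the sed invocation.
success, output = capture_output_sketch(
    [sys.executable, "-c", "print('filtered')"]
)
```

A caller could then pass `output` to `log.write(...)`, as the updated call sites do.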



################################################
@@ -521,6 +525,9 @@ def omniperf_profile(args, VER):
# Perfmon filtering
perfmon_filter(workload_dir, perfmon_dir, args)

# Separate pmc_perf runs
pmc_perf_split(workload_dir)

# Set up a log file
log = open(workload_dir + "/log.txt", "w")
print("Log: ", workload_dir + "/log.txt\n")
@@ -598,7 +605,7 @@ def omniperf_profile(args, VER):
for fname in glob.glob(workload_dir + "/perfmon/*.txt"):
# Kernel filtering (in-place replacement)
if not args.kernel == None:
run_subprocess(
success, output = capture_subprocess_output(
[
"sed",
"-i",
@@ -607,10 +614,11 @@ def omniperf_profile(args, VER):
fname,
]
)
log.write(output)

# Dispatch filtering (inplace replacement)
if not args.dispatch == None:
run_subprocess(
success, output = capture_subprocess_output(
[
"sed",
"-i",
@@ -619,12 +627,17 @@ def omniperf_profile(args, VER):
fname,
]
)
log.write(output)
print(fname)
if args.use_rocscope == True:
run_rocscope(args, fname)
else:
run_prof(fname, workload_dir, perfmon_dir, args.remaining, args.target, log, args.verbose)

# Manually join each pmc_perf*.csv output
if args.use_rocscope == False:
join_prof(workload_dir, args.join_type, log, args.verbose)

# Generate sysinfo
gen_sysinfo(args.name, workload_dir, args.ipblocks, args.remaining, args.no_roof)
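
The new `join_prof` step merges the per-run `pmc_perf*.csv` outputs by hand. Its implementation lives in `utils.perfagg` and is not shown in this diff. The sketch below only illustrates the general idea of joining counter runs on a shared dispatch key. The function `join_runs`, the key column `Index`, and the sample data are all assumptions made for illustration, not the project's code.

```python
import csv
import io


def join_runs(csv_texts, key="Index"):
    """Merge rows from several counter runs that share a key column.

    Hypothetical sketch: columns seen in later runs are added to the row
    with the same key; a repeated column keeps the last value seen.
    """
    merged = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            merged.setdefault(row[key], {}).update(row)
    # Return rows ordered by numeric dispatch index.
    return [merged[k] for k in sorted(merged, key=int)]


# Two made-up runs, each collecting a different counter for the same dispatches.
run0 = "Index,KernelName,SQ_WAVES\n0,vecAdd,64\n1,vecCopy,32\n"
run1 = "Index,KernelName,SQ_INSTS_LDS\n0,vecAdd,128\n1,vecCopy,0\n"
rows = join_runs([run0, run1])
```

How the real function handles `args.join_type`, or columns that differ between runs, cannot be inferred from this hunk.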

@@ -104,11 +104,11 @@ Panel Config:
/ SQ_ACTIVE_INST_ANY))) / 5)
tips:
LDS BW:
value: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($L2Banks))
value: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($LDSBanks))
/ (EndNs - BeginNs)))
unit: GB/sec
peak: (($sclk * $numCU) * 0.128)
pop: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($L2Banks))
pop: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($LDSBanks))
/ (EndNs - BeginNs)) / (($sclk * $numCU) * 0.00128)))
tips:
LDS Bank Conflict:
12 changes: 9 additions & 3 deletions src/omniperf_analyze/configs/gfx906/0700_wavefront-launch.yaml
@@ -50,9 +50,15 @@ Panel Config:
unit: Wavefronts
tips:
VGPRs:
avg: AVG(vgpr)
min: MIN(vgpr)
max: MAX(vgpr)
avg: AVG(arch_vgpr)
min: MIN(arch_vgpr)
max: MAX(arch_vgpr)
unit: Registers
tips:
AGPRs:
avg: AVG(accum_vgpr)
min: MIN(accum_vgpr)
max: MAX(accum_vgpr)
unit: Registers
tips:
SGPRs:
14 changes: 4 additions & 10 deletions src/omniperf_analyze/configs/gfx906/1200_lds.yaml
@@ -23,7 +23,7 @@ Panel Config:
value: AVG(((200 * SQ_ACTIVE_INST_LDS) / (GRBM_GUI_ACTIVE * $numCU)))
tips:
Bandwidth (Pct-of-Peak):
value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($L2Banks))
value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($LDSBanks))
/ (EndNs - BeginNs)) / (($sclk * $numCU) * 0.00128)))
tips:
Bank Conflict Rate:
@@ -42,24 +42,18 @@ Panel Config:
unit: Unit
tips: Tips
metric:
Wave Cycles:
avg: AVG(((4 * SQ_WAVE_CYCLES) / SQ_WAVES))
min: MIN(((4 * SQ_WAVE_CYCLES) / SQ_WAVES))
max: MAX(((4 * SQ_WAVE_CYCLES) / SQ_WAVES))
unit: Cycles/Wave
tips:
LDS Instrs:
avg: AVG((SQ_INSTS_LDS / $denom))
min: MIN((SQ_INSTS_LDS / $denom))
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($L2Banks))
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($LDSBanks))
/ $denom))
min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($L2Banks))
min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($LDSBanks))
/ $denom))
max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($L2Banks))
max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($LDSBanks))
/ $denom))
unit: (Bytes + $normUnit)
tips:
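
The LDS bandwidth expressions in these configs now scale by `$LDSBanks` instead of `$L2Banks`. The sketch below evaluates the time-based form and its percent-of-peak variant for a single dispatch. The function names and all sample values are hypothetical, and the per-dispatch `AVG(...)` aggregation is left out.

```python
import math


def lds_bandwidth(idx_active, bank_conflict, lds_banks, begin_ns, end_ns):
    """(SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4 * LDSBanks / (EndNs - BeginNs).

    Bytes per nanosecond, which the config labels GB/sec.
    """
    return (idx_active - bank_conflict) * 4 * lds_banks / (end_ns - begin_ns)


def lds_pct_of_peak(bw, sclk, num_cu):
    """Divide by (sclk * numCU * 0.00128), as in the 'pop' expression."""
    return bw / (sclk * num_cu * 0.00128)


# Hypothetical counter values for one dispatch.
bw = lds_bandwidth(idx_active=10_000, bank_conflict=2_000,
                   lds_banks=32, begin_ns=0, end_ns=1_000)
peak = 1700 * 104 * 0.128          # 'peak' expression: sclk * numCU * 0.128
pct = lds_pct_of_peak(bw, sclk=1700, num_cu=104)
assert math.isclose(pct, bw / peak * 100)
```

The closing assertion shows that the `0.00128` divisor equals the `0.128` peak divided by 100, so `pop` is a percentage of the `peak` value. The `Bandwidth` metric in `1200_lds.yaml` uses the same numerator but divides by `$denom` rather than elapsed time.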
@@ -204,7 +204,7 @@ Panel Config:
+ TO_INT(TCC_REQ[22])) + TO_INT(TCC_REQ[23])) + TO_INT(TCC_REQ[24])) + TO_INT(TCC_REQ[25]))
+ TO_INT(TCC_REQ[26])) + TO_INT(TCC_REQ[27])) + TO_INT(TCC_REQ[28])) + TO_INT(TCC_REQ[29]))
+ TO_INT(TCC_REQ[30])) + TO_INT(TCC_REQ[31])) / 32) / $denom))
units: ( $normUnit )
units: ( + $normUnit)
tips:
L1 - L2 Read Req:
mean: AVG((((((((((((((((((((((((((((((((((TO_INT(TCC_READ[0]) + TO_INT(TCC_READ[1]))
@@ -247,7 +247,7 @@ Panel Config:
+ TO_INT(TCC_READ[24])) + TO_INT(TCC_READ[25])) + TO_INT(TCC_READ[26])) +
TO_INT(TCC_READ[27])) + TO_INT(TCC_READ[28])) + TO_INT(TCC_READ[29])) + TO_INT(TCC_READ[30]))
+ TO_INT(TCC_READ[31])) / 32) / $denom))
units: ( $normUnit )
units: ( + $normUnit)
tips:
L1 - L2 Write Req:
mean: AVG((((((((((((((((((((((((((((((((((TO_INT(TCC_WRITE[0]) + TO_INT(TCC_WRITE[1]))
@@ -294,7 +294,7 @@ Panel Config:
+ TO_INT(TCC_WRITE[24])) + TO_INT(TCC_WRITE[25])) + TO_INT(TCC_WRITE[26]))
+ TO_INT(TCC_WRITE[27])) + TO_INT(TCC_WRITE[28])) + TO_INT(TCC_WRITE[29]))
+ TO_INT(TCC_WRITE[30])) + TO_INT(TCC_WRITE[31])) / 32) / $denom))
units: ( $normUnit )
units: ( + $normUnit)
tips:
L1 - L2 Atomic Req:
mean: AVG((((((((((((((((((((((((((((((((((TO_INT(TCC_ATOMIC[0]) + TO_INT(TCC_ATOMIC[1]))
@@ -345,7 +345,7 @@ Panel Config:
+ TO_INT(TCC_ATOMIC[26])) + TO_INT(TCC_ATOMIC[27])) + TO_INT(TCC_ATOMIC[28]))
+ TO_INT(TCC_ATOMIC[29])) + TO_INT(TCC_ATOMIC[30])) + TO_INT(TCC_ATOMIC[31]))
/ 32) / $denom))
units: ( $normUnit )
units: ( + $normUnit)
tips:
L2 - EA Read Req:
mean: AVG((((((((((((((((((((((((((((((((((TO_INT(TCC_EA_RDREQ[0]) + TO_INT(TCC_EA_RDREQ[1]))
@@ -396,7 +396,7 @@ Panel Config:
+ TO_INT(TCC_EA_RDREQ[26])) + TO_INT(TCC_EA_RDREQ[27])) + TO_INT(TCC_EA_RDREQ[28]))
+ TO_INT(TCC_EA_RDREQ[29])) + TO_INT(TCC_EA_RDREQ[30])) + TO_INT(TCC_EA_RDREQ[31]))
/ 32) / $denom))
units: ( $normUnit )
units: ( + $normUnit)
tips:
L2 - EA Write Req:
mean: AVG((((((((((((((((((((((((((((((((((TO_INT(TCC_EA_WRREQ[0]) + TO_INT(TCC_EA_WRREQ[1]))
@@ -447,7 +447,7 @@ Panel Config:
+ TO_INT(TCC_EA_WRREQ[26])) + TO_INT(TCC_EA_WRREQ[27])) + TO_INT(TCC_EA_WRREQ[28]))
+ TO_INT(TCC_EA_WRREQ[29])) + TO_INT(TCC_EA_WRREQ[30])) + TO_INT(TCC_EA_WRREQ[31]))
/ 32) / $denom))
units: ( $normUnit )
units: ( + $normUnit)
tips:
L2 - EA Atomic Req:
mean: AVG((((((((((((((((((((((((((((((((((TO_INT(TCC_EA_ATOMIC[0]) + TO_INT(TCC_EA_ATOMIC[1]))
@@ -498,7 +498,7 @@ Panel Config:
+ TO_INT(TCC_EA_ATOMIC[26])) + TO_INT(TCC_EA_ATOMIC[27])) + TO_INT(TCC_EA_ATOMIC[28]))
+ TO_INT(TCC_EA_ATOMIC[29])) + TO_INT(TCC_EA_ATOMIC[30])) + TO_INT(TCC_EA_ATOMIC[31]))
/ 32) / $denom))
units: ( $normUnit )
units: ( + $normUnit)
tips:
L2 - EA Read Lat:
mean: AVG((((((((((((((((((((((((((((((((((TCC_EA_RDREQ_LEVEL[0] + TCC_EA_RDREQ_LEVEL[1])
@@ -104,11 +104,11 @@ Panel Config:
/ SQ_ACTIVE_INST_ANY))) / 5)
tips:
LDS BW:
value: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($L2Banks))
value: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($LDSBanks))
/ (EndNs - BeginNs)))
unit: GB/sec
peak: (($sclk * $numCU) * 0.128)
pop: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($L2Banks))
pop: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($LDSBanks))
/ (EndNs - BeginNs)) / (($sclk * $numCU) * 0.00128)))
tips:
LDS Bank Conflict:
12 changes: 9 additions & 3 deletions src/omniperf_analyze/configs/gfx908/0700_wavefront-launch.yaml
@@ -50,9 +50,15 @@ Panel Config:
unit: Wavefronts
tips:
VGPRs:
avg: AVG(vgpr)
min: MIN(vgpr)
max: MAX(vgpr)
avg: AVG(arch_vgpr)
min: MIN(arch_vgpr)
max: MAX(arch_vgpr)
unit: Registers
tips:
AGPRs:
avg: AVG(accum_vgpr)
min: MIN(accum_vgpr)
max: MAX(accum_vgpr)
unit: Registers
tips:
SGPRs: