ChangeLogP710.txt
2023-12-19 Daniel Barry <dbarry@vols.utk.edu>
* src/counter_analysis_toolkit/flops.c: cat: fix compile-time error
On some older versions of GCC (10.3.0), not having a statement
after 'default' in a switch-case statement can yield the compiler
error "label at end of compound statement". These changes fix
this error and have been tested on the AMD Zen3 architecture.
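The construct behind this entry can be sketched as follows. This is a minimal illustration, not code from flops.c; the names `precision_t` and `bytes_per_element` are hypothetical. A `default` label has to be followed by a statement, and an explicit `break;` is enough to satisfy compilers that reject a label at the end of a compound statement.

```c
/* Hypothetical example; not code from flops.c. */
typedef enum { PREC_SINGLE, PREC_DOUBLE, PREC_OTHER } precision_t;

/* Returns the element size for a precision, or 0 when it is not handled. */
int bytes_per_element(precision_t p)
{
    int bytes = 0;

    switch (p) {
    case PREC_SINGLE:
        bytes = 4;
        break;
    case PREC_DOUBLE:
        bytes = 8;
        break;
    default:
        /* Without a statement here, some compilers report
         * "label at end of compound statement". */
        break;
    }
    return bytes;
}
```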
2023-12-19 Giuseppe Congiu <gcongiu@icl.utk.edu>
* .../rocm/tests/hl_intercept_multi_thread_monitoring.cpp,
.../rocm/tests/hl_intercept_single_kernel_monitoring.cpp,
.../rocm/tests/hl_intercept_single_thread_monitoring.cpp,
.../rocm/tests/hl_sample_single_kernel_monitoring.cpp,
.../rocm/tests/hl_sample_single_thread_monitoring.cpp,
src/components/rocm/tests/matmul.cpp: rocm: fix warnings in the
rocm tests
* src/components/rocm/tests/Makefile: rocm: search for hipcc in
PAPI_ROCM_ROOT instead of using fixed path The path of hipcc in
the ROCm installation directory has changed. In order to be
location independent the rocm/tests Makefile should locate the
hipcc compiler in the installation directory rather than relying on
a fixed pathname.
* src/configure, src/configure.in: configure: search for rocm_smi
headers in PAPI_ROCMSMI_ROOT The configure script used to search
for rocm_smi headers in PAPI_ROCM_ROOT instead of
PAPI_ROCMSMI_ROOT. This was because the rocm headers are typically
installed under the same root. However, with rocm-6.0.0 the
rocm_smi.h causes a failure while building the sysdetect component
in PAPI (component that is enabled by default). Thus, we now look
explicitly for the rocm_smi header in PAPI_ROCMSMI_ROOT instead in
order to isolate the sysdetect & rocm components from rocm_smi.
* src/components/cuda/cupti_common.c: cuda: add cudaGetErrorString to
generate error messages cudaGetErrorString is used to provide the proper
disabled_message to users whenever there is a cuda-related
problem during initialization.
* src/components/cuda/cupti_common.c: cuda: refactor
get_gpu_compute_capability With exception made for trivial
functions (i.e. functions that cannot fail) every function should
return an error code for proper error handling. The
get_gpu_compute_capability does not account for error handling in
the case a cuda call failure happens.
* src/components/cuda/cupti_common.c: cuda: refactor
util_gpu_collection_kind With exception made for trivial functions
(i.e. functions that cannot fail) every function should return an
error code for proper error handling. The util_gpu_collection_kind
does not account for error handling in the case a cuda call failure
happens.
* src/components/cuda/cupti_common.c,
src/components/cuda/cupti_common.h,
src/components/cuda/cupti_profiler.c: cuda: refactor
cuptic_device_get_count With exception made for trivial functions
(i.e. functions that cannot fail) every function should return an
error code for proper error handling. The cuptic_device_get_count
does not account for error handling in the case a cuda call failure
happens.
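The pattern described in the three refactoring entries above might look like the sketch below: the result travels through an out-parameter so that the return value can carry a status code. All names here (`fake_driver_count`, `device_get_count`, `STATUS_OK`, `STATUS_FAIL`) are hypothetical, and the stand-in query replaces a real CUDA call; this is not the component's code.

```c
#include <stddef.h>

/* Hypothetical status codes standing in for the component's error codes. */
enum { STATUS_OK = 0, STATUS_FAIL = -1 };

/* Stand-in for a driver call that can fail; a negative input simulates failure. */
static int fake_driver_count(int simulated, int *count)
{
    if (simulated < 0) {
        return STATUS_FAIL;
    }
    *count = simulated;
    return STATUS_OK;
}

/* The count is delivered through an out-parameter so that the return
 * value is free to report success or failure to the caller. */
int device_get_count(int simulated, int *num_devices)
{
    int count = 0;

    if (num_devices == NULL) {
        return STATUS_FAIL;
    }
    if (fake_driver_count(simulated, &count) != STATUS_OK) {
        return STATUS_FAIL;   /* propagate instead of returning a bogus count */
    }
    *num_devices = count;
    return STATUS_OK;
}
```

On failure the caller's variable is left untouched, so a stale or default value is never mistaken for a valid count.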
2023-12-14 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_profiler.c: rocm: print all masks
descriptors for events that contain them
* src/components/rocm/roc_profiler.c: rocm: add comma separator
between event descriptor and masks
2023-12-18 Florian Weimer <fweimer@redhat.com>
* src/configure, src/configure.in: configure: Fix return values in
start thread routines Thread start routines must return a void *
value, and future compilers refuse to convert integers to pointers
with just a warning (the virtualtimer probe). Without this change,
the probe always fails to compile with future compilers (such as
GCC 14). For the tls probe, return a null pointer for future-
proofing, although current and upcoming C compilers do not treat
this omission as an error. Updates commit dd11311aadbd06ab6c76d
("configure: fix tls detection").
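As an illustration of the requirement described in this entry, the sketch below shows a routine with the signature that pthread_create expects from a start routine, `void *(*)(void *)`. The names `start_routine_t` and `probe_thread` are hypothetical and the body is not the configure probe itself; the example calls the routine directly so that it does not depend on a threading library.

```c
#include <stddef.h>

/* Signature required of a thread start routine. */
typedef void *(*start_routine_t)(void *);

/* Hypothetical probe body: it reports through its argument and
 * returns a null pointer rather than an integer value. */
void *probe_thread(void *arg)
{
    int *flag = arg;

    if (flag != NULL) {
        *flag = 1;
    }
    /* Returning an int variable here would need an int-to-pointer
     * conversion, which GCC 14 rejects without an explicit cast. */
    return NULL;
}
```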
2023-12-14 Daniel Barry <dbarry@vols.utk.edu>
* src/papi_events.csv: presets: various cache presets for SPR CPUs
Defines the presets for data cache and total cache activity in the
Intel Sapphire Rapids architecture. These changes have been tested
on the Intel Sapphire Rapids architecture using the Counter
Analysis Toolkit.
2023-12-08 Daniel Barry <dbarry@vols.utk.edu>
* src/papi_events.csv: presets: add total cache presets for Zen4 CPUs
Add preset definitions for L2 total cache hits and misses.
These changes have been tested on the AMD Zen4 architecture using
the Counter Analysis Toolkit.
* src/papi_events.csv: presets: correction to instr cache preset Fix
mistake introduced in commit
ef1cc48846b58156995db58f53314bd4c9ec9bc0, in which the definition
for PAPI_L2_ICM can realize negative values. These changes have
been tested on the AMD Zen4 architecture using the Counter Analysis
Toolkit.
2023-12-14 Daniel Barry <dbarry@vols.utk.edu>
* src/counter_analysis_toolkit/gen_seq_dlopen.sh: cat: reduce exec
time of instr cache benchmark Skip the most time-consuming kernels
in the CAT instruction cache benchmark. These changes have been
tested on the Intel Sapphire Rapids architecture.
* src/counter_analysis_toolkit/timing_kernels.c: cat: remove unused
variable Remove declaration for an unused variable. These changes
have been tested on the Intel Sapphire Rapids architecture.
* src/counter_analysis_toolkit/dcache.c: cat: account for proper
number of buffers Adjust the logic to properly account for how
many buffer sizes shall exceed the size of the last-level cache.
These changes have been tested on the Intel Sapphire Rapids
architecture.
2023-12-13 Daniel Barry <dbarry@vols.utk.edu>
* src/counter_analysis_toolkit/dcache.c,
src/counter_analysis_toolkit/driver.h,
src/counter_analysis_toolkit/hw_desc.h,
src/counter_analysis_toolkit/main.c: cat: read values from config
file as 'long long' Since some of the buffer sizes are very large,
the values for the cache sizes provided in the config file
should be interpreted as type 'long long.' These changes have been
tested on the Intel Sapphire Rapids architecture.
* src/counter_analysis_toolkit/dcache.c: cat: remove unnecessary
typecast Remove a typecast to 'long long,' which is unnecessary
because the variable is already the type 'long long.' These
changes have been tested on the Intel Sapphire Rapids architecture.
* src/counter_analysis_toolkit/dcache.c,
src/counter_analysis_toolkit/dcache.h: cat: use macro for LLC
factor Create a macro to more easily define the factor by which
the LLC is multiplied to attain the largest buffer size used in the
pointer chase. These changes have been tested on the Intel
Sapphire Rapids architecture.
* src/counter_analysis_toolkit/dcache.c: cat: ensure proper integer
arithmetic Append 'LL' to constant values that are added or
multiplied with 'long long' variables. These changes have been
tested on the Intel Sapphire Rapids architecture.
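The 'long long' entries above can be illustrated with a short sketch. The macro value and the function names (`LLC_FACTOR`, `max_buffer_bytes`, `kib_to_bytes`) are assumptions made for this example; the factor actually used by the toolkit is not stated here. The point is that an 'LL' suffix on the constant forces the arithmetic to be carried out in 'long long' rather than 'int'.

```c
/* Hypothetical factor; the value used by the toolkit is not stated here. */
#define LLC_FACTOR 16LL

/* Largest pointer-chase buffer, in bytes, for a given last-level cache size. */
long long max_buffer_bytes(long long llc_bytes)
{
    /* Both operands are 'long long', so the product is not computed in 'int'. */
    return LLC_FACTOR * llc_bytes;
}

/* Converts a size in KiB to bytes; '1024LL' forces 'long long' arithmetic
 * even though the argument is an 'int'. */
long long kib_to_bytes(int kib)
{
    return 1024LL * kib;
}
```

With a plain `1024 * kib`, a 4 GiB size expressed in KiB would overflow a 32-bit 'int' before the result is widened.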
2023-12-12 Daniel Barry <dbarry@vols.utk.edu>
* src/counter_analysis_toolkit/dcache.c: cat: allocate the proper max
buffer size Allocate enough space for the largest buffer size used
in the pointer chase. When values in the config file are provided,
this needs to account for them. These changes have been tested on
the Intel Sapphire Rapids architecture.
* src/counter_analysis_toolkit/dcache.c: cat: fix erroneous malloc
Fix an erroneous malloc() call by changing the size of each element
to that of 'long long.' These changes have been tested on the
Intel Sapphire Rapids architecture.
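A hedged sketch of the kind of allocation this entry refers to, using the hypothetical name `alloc_buffer_sizes` (not a function from dcache.c): sizing each element with `sizeof(*sizes)` keeps the request in step with the element type, which matters once the array holds 'long long' values instead of 'int'.

```c
#include <stdlib.h>

/* Allocates room for 'count' buffer sizes stored as 'long long'.
 * Returns NULL on failure; the caller frees the result. */
long long *alloc_buffer_sizes(size_t count)
{
    /* sizeof(*sizes) tracks the element type, so the request stays
     * correct if the type changes again later. */
    long long *sizes = malloc(count * sizeof(*sizes));

    if (sizes == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        sizes[i] = 0LL;
    }
    return sizes;
}
```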
* src/counter_analysis_toolkit/main.c: cat: fix memory leak Fix
memory leak by freeing dynamically allocated memory in case it was not
previously freed. These changes have been tested on the Intel
Sapphire Rapids architecture.
* src/counter_analysis_toolkit/prepareArray.c: cat: clean-up comments
Remove in-line comments and fix typos in comments for readability.
2023-12-06 Daniel Barry <dbarry@vols.utk.edu>
* src/counter_analysis_toolkit/main.c: cat: place MPI_Barrier before
MPI_Finalize When MPI is used, no rank should reach MPI_Finalize
until all ranks' work has completed. These changes have been
tested on the Intel Sapphire Rapids architecture.
* src/counter_analysis_toolkit/main.c: cat: only measure latencies
once When MPI is used, only one rank needs to run the latency
tests. These changes have been tested on the Intel Sapphire Rapids
architecture.
2023-12-08 Daniel Barry <dbarry@vols.utk.edu>
* src/papi_events.csv: presets: add instr cache presets for Intel SPR
Defines the instruction cache presets for the Intel Sapphire Rapids
architecture. These changes have been tested on the Intel Sapphire
Rapids architecture using the Counter Analysis Toolkit.
* src/components/intel_gpu/README.md: intel_gpu: fix small typo Fix
small typo in the README for the Intel GPU component.
2023-12-06 Daniel Barry <dbarry@vols.utk.edu>
* src/counter_analysis_toolkit/prepareArray.c: cat: fix memory leak
Free the dynamically allocated memory at the end of the function
that sets up the pointer chain. These changes have been tested on
the AMD Zen4 architecture.
* src/counter_analysis_toolkit/dcache.c,
src/counter_analysis_toolkit/dcache.h,
src/counter_analysis_toolkit/prepareArray.c,
src/counter_analysis_toolkit/prepareArray.h,
src/counter_analysis_toolkit/timing_kernels.c,
src/counter_analysis_toolkit/timing_kernels.h: cat: store buffer
sizes as 'long long' Use 'long long' instead of 'int' for buffer
sizes to prevent overflow from occurring for large buffer sizes.
These changes have been tested on the AMD Zen4 architecture.
2023-12-04 Daniel Barry <dbarry@vols.utk.edu>
* src/counter_analysis_toolkit/timing_kernels.c: cat: properly
normalize counter values Ensure that the number of pointer chain
accesses is evenly divisible by the work macros to prevent
incorrectly normalizing event counts. These changes have been
tested on the AMD Zen4 architecture.
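One way to picture the divisibility requirement in the normalization entry above is the sketch below. It assumes that the work macros perform a fixed number of accesses per expansion; the function names and the choice to round up rather than down are hypothetical and not taken from timing_kernels.c.

```c
/* Rounds 'accesses' up to the next multiple of 'unroll' (both positive). */
long long round_up_accesses(long long accesses, long long unroll)
{
    long long remainder = accesses % unroll;

    return (remainder == 0) ? accesses : accesses + (unroll - remainder);
}

/* Per-access average; it matches the work actually done only when the
 * access count is a whole number of unrolled blocks. */
double events_per_access(long long event_count, long long accesses)
{
    return (double)event_count / (double)accesses;
}
```

If the loop executes whole blocks of accesses but the divisor is not a multiple of the block size, the divisor no longer equals the number of accesses performed and the normalized count is skewed.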
* src/counter_analysis_toolkit/.cat_cfg,
src/counter_analysis_toolkit/dcache.c,
src/counter_analysis_toolkit/hw_desc.h,
src/counter_analysis_toolkit/main.c: cat: fix logic for memory
hierarchy parameters Make distinction between the "L4" and "MM"
levels of the memory hierarchy. These changes have been tested on
the AMD Zen4 architecture.
* src/counter_analysis_toolkit/main.c: cat: larger default PPB value
Make the default pages-per-block (PPB) value larger to accommodate
more recent architectures. These changes have been tested on the
AMD Zen4 architecture.
* src/counter_analysis_toolkit/.cat_cfg,
src/counter_analysis_toolkit/dcache.c,
src/counter_analysis_toolkit/hw_desc.h,
src/counter_analysis_toolkit/main.c: cat: create parameter for max
PPB in config file Allow the user to change the pages-per-block
(PPB) value via the configuration file. These changes have been
tested on the AMD Zen4 architecture.
* src/counter_analysis_toolkit/main.c: cat: probe fewer buffers per
cache level Make the default number of buffer sizes three (per
cache level) to decrease the benchmark execution time while still
sufficiently sampling each level in the memory hierarchy. These
changes have been tested on the AMD Zen4 architecture.
2023-12-01 Daniel Barry <dbarry@vols.utk.edu>
* src/counter_analysis_toolkit/dcache.c: cat: exclude cache sizes
from tests Do not use the exact cache sizes in the sweep of buffer
sizes in the data-cache kernels because there tends to be transient
behavior at these boundaries. These changes have been tested on
the AMD Zen4 architecture.
2023-12-01 Giuseppe Congiu <gcongiu@icl.utk.edu>
* .github/workflows/ci.sh, .github/workflows/main.yml: ci: run tests
with and without PAPI debug enabled Tests should make sure real
use cases work as expected. Some tests might not work correctly
if -O0 is used as optimization level in the compiler. For example,
the ROCm runtime submits a kernel of 4 waves if the tests are built
using -O0, which makes the tests fail. Update the github test
configuration matrix to include testing without PAPI debug.
2023-11-15 Aurelian MELINTE <ame01@gmx.net>
* src/components/sysdetect/arm_cpu_utils.c: PAPI: ARM Cortex-A76
support (Raspberry Pi 5)
2023-11-29 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/tests/Makefile: rocm: change opt level to user
choice for tests
2023-11-16 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_dispatch.c,
src/components/rocm/roc_dispatch.h,
src/components/rocm/roc_profiler.c,
src/components/rocm/roc_profiler.h, src/components/rocm/rocm.c:
rocm: add rocp_evt_code_to_info support This function is needed to
allow papi_native_avail to extract qualifier descriptions for the
event identifier.
2023-11-14 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_profiler.c, src/components/rocm/rocm.c:
rocm: add qualifier support This commit contains the core changes
of this feature set. It introduces the logic necessary to handle
event identifiers in such a way that device and instance attributes
are presented to the PAPI users as qualifiers. This means that
papi_native_avail will return:

Native Events in Component: rocm
================================================================================
| rocm:::SQ_WAIT_INST_LDS
|     Number of wave-cycles spent waiting for LDS instruction issue. In
|     units of 4 cycles. (per-simd, nondeterministic)
|     :device=0
|         mandatory device qualifier [devices: 0,1]
--------------------------------------------------------------------------------
| rocm:::TCP_TCP_TA_DATA_STALL_CYCLES
|     TCP stalls TA data interface. Now Windowed.
|     :device=0
|         mandatory device qualifier [devices: 0,1]
|     :instance=0
|         mandatory instance qualifier in range [0 - 15]
--------------------------------------------------------------------------------

The PAPI user will be able to use event names in
the same form as before (all previous tests will still work) with
the relaxation on the order of device and instance numbers.
* src/components/rocm/roc_profiler.c: rocm: add finalize_features
function This function is needed as features will be generated on
the fly for rocprofiler rather than saved in the event table.
Therefore, the feature names have to be freed when the rocprofiler
context is closed.
* src/components/rocm/roc_profiler.c: rocm: add unique metric utility
functions for intercept mode In intercept mode we are only
interested in unique events, i.e., events that have the same name
and instance (can be from different devices). This is because in
intercept mode all unique events are monitored on all devices.
Though, only the counters for the actual requested events will be
presented to the user. This is a design choice that accounts for
the fact that once set, callbacks for dispatch queues cannot be
updated (this includes the monitored events).
* src/components/rocm/roc_profiler.c: rocm: add event name to info
utility functions Add functions to extract event info from the
name (e.g., device number and instance number).
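Extracting a qualifier from an event name of the form "name:device=N:instance=M" could be sketched as below. The function `find_qualifier` is hypothetical and is not the component's implementation; it scans every ':'-separated field, so it does not depend on the order in which the qualifiers appear.

```c
#include <stdlib.h>
#include <string.h>

/* Looks for ":<key>=<number>" in 'event' and stores the number in '*value'.
 * Returns 0 on success, -1 if the qualifier is absent or malformed. */
int find_qualifier(const char *event, const char *key, int *value)
{
    size_t key_len = strlen(key);
    const char *p = event;

    while ((p = strchr(p, ':')) != NULL) {
        p++;   /* step past the ':' */
        if (strncmp(p, key, key_len) == 0 && p[key_len] == '=') {
            const char *digits = p + key_len + 1;
            char *end;
            long n = strtol(digits, &end, 10);

            if (end == digits || (*end != ':' && *end != '\0')) {
                return -1;   /* no digits, or trailing junk */
            }
            *value = (int)n;
            return 0;
        }
    }
    return -1;
}
```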
* src/components/rocm/roc_profiler.c: rocm: remove useless comments
2023-11-03 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_profiler.c: rocm: remove useless check for
intercept code path
* src/components/rocm/roc_profiler.c: rocm: move init_callbacks call
init_callbacks should be called only once, i.e., when the
intercept_global_state is initialized. After that happens the
callbacks for the dispatch queues in all devices are already set
and can no longer be changed.
* src/components/rocm/roc_profiler.c: rocm: remove intercept global
state macros Intercept mode macros were simple aliases to entries
in the global intercept mode state. Using explicit references to
the said data structure entries improves readability.
2023-11-21 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_common.c, src/components/rocm/roc_common.h,
src/components/rocm/roc_profiler.c: rocm: change type of device id
from unsigned int to int
2023-11-01 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_profiler.c: rocm: add event identifier
utility functions Add functions to create and query event id
attributes like device and instance.
2023-11-13 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_common.c, src/components/rocm/roc_common.h:
rocm: add bitmap utility functions Add rocc_dev_set and
rocc_dev_check. The first registers the presence of a device in the
passed-in bitmap, while the second checks whether the bit corresponding
to the passed-in device number is set in the bitmap.
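A set/check pair of this kind can be sketched as follows. The names `dev_bitmap_t`, `dev_set` and `dev_check`, the 64-device limit and the return conventions are assumptions for illustration only; the signatures of rocc_dev_set and rocc_dev_check are not given in this entry.

```c
#include <stdint.h>

/* Hypothetical bitmap type: one bit per device, up to 64 devices. */
typedef uint64_t dev_bitmap_t;

/* Marks device 'dev' as present. Returns 0 on success, -1 if out of range. */
int dev_set(dev_bitmap_t *bitmap, int dev)
{
    if (dev < 0 || dev >= 64) {
        return -1;
    }
    *bitmap |= (UINT64_C(1) << dev);
    return 0;
}

/* Returns 1 if device 'dev' is set in the bitmap, 0 otherwise. */
int dev_check(dev_bitmap_t bitmap, int dev)
{
    if (dev < 0 || dev >= 64) {
        return 0;
    }
    return (int)((bitmap >> dev) & UINT64_C(1));
}
```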
2023-11-01 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_common.c, src/components/rocm/roc_common.h,
src/components/rocm/roc_dispatch.c,
src/components/rocm/roc_dispatch.h,
src/components/rocm/roc_profiler.c,
src/components/rocm/roc_profiler.h,
src/components/rocm/roc_profiler_config.h,
src/components/rocm/rocm.c: rocm: change the event id type to
uint64_t in backend Preparatory commit to increase the size of the
event id datatype in the component backend layer so as to make it
ready for hosting information encoded in the event id, such as device
and instance numbers.
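To show why a 64-bit identifier leaves room for encoded attributes, here is a sketch with an invented bit layout. The field widths, shifts and function names are hypothetical and do not describe the layout used by the rocm component.

```c
#include <stdint.h>

/* Hypothetical layout (not taken from the component):
 *   bits  0..31  index of the event name
 *   bits 32..47  instance number
 *   bits 48..63  device number */
#define INSTANCE_SHIFT 32
#define DEVICE_SHIFT   48
#define FIELD16_MASK   UINT64_C(0xFFFF)
#define NAME_MASK      UINT64_C(0xFFFFFFFF)

/* Packs the three attributes into one 64-bit identifier. */
uint64_t evt_id_create(uint32_t name_idx, unsigned instance, unsigned device)
{
    return (uint64_t)name_idx
         | (((uint64_t)instance & FIELD16_MASK) << INSTANCE_SHIFT)
         | (((uint64_t)device & FIELD16_MASK) << DEVICE_SHIFT);
}

/* Accessors that undo the packing. */
unsigned evt_id_device(uint64_t id)
{
    return (unsigned)((id >> DEVICE_SHIFT) & FIELD16_MASK);
}

unsigned evt_id_instance(uint64_t id)
{
    return (unsigned)((id >> INSTANCE_SHIFT) & FIELD16_MASK);
}

uint32_t evt_id_name_idx(uint64_t id)
{
    return (uint32_t)(id & NAME_MASK);
}
```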
2023-07-21 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/template/README.md,
src/components/template/Rules.template,
src/components/template/template.c,
src/components/template/tests/Makefile,
src/components/template/tests/simple.c,
src/components/template/vendor_common.c,
src/components/template/vendor_common.h,
src/components/template/vendor_config.h,
src/components/template/vendor_dispatch.c,
src/components/template/vendor_dispatch.h,
src/components/template/vendor_profiler_v1.c,
src/components/template/vendor_profiler_v1.h: template: add
template for new components
2023-12-01 Anthony <adanalis@icl.utk.edu>
* src/counter_analysis_toolkit/timing_kernels.c: CAT: Initialize
variables to suppress warnings, and move them to correct scope.
2023-11-29 Daniel Barry <dbarry@vols.utk.edu>
* src/papi_events.csv: presets: add inst cache presets for Zen4 CPUs
Defines various instruction-cache related presets for Zen4. These
changes have been tested on the Zen4 architecture using the Counter
Analysis Toolkit.
2023-08-30 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/README.md: rocm: extend README with device
partitioning information
2023-11-01 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/sysdetect/tests/Makefile: sysdetect: add -ffree-form
to silence error in ARM comp
2023-11-09 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/README.md: rocm: add known problems with some
events to README
2023-11-17 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_profiler.c: rocm: fix bug in intercept mode
reset function
* src/components/rocm/roc_profiler.c: rocm: fix bug introduced by
commit 4991e1614
2023-11-14 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/libpfm4/.gitignore: libpfm4: remove leftover .gitignore file
2023-09-28 Clément Foyer <clement.foyer@univ-reims.fr>
* src/libpfm4/lib/pfmlib_intel_x86_arch.c: libpfm4: update to commit
535c204 Original commit: Add Intel IceLake and Intel
SapphireRapid performance counters to the event table
2023-11-10 Anthony <adanalis@icl.utk.edu>
* src/counter_analysis_toolkit/dcache.c,
src/counter_analysis_toolkit/dcache.h: CAT: Add information about
the cache sizes in the header of the output file.
2023-11-22 Daniel Barry <dbarry@vols.utk.edu>
* src/papi_events.csv: presets: add data cache presets for Zen4 CPUs
Includes various data-cache related presets for Zen4. These
changes have been tested on the Zen4 architecture using the Counter
Analysis Toolkit.
2023-11-12 Anthony <adanalis@icl.utk.edu>
* src/counter_analysis_toolkit/main.c: CAT: Add missing option in the
usage output.
2023-11-09 Anthony <adanalis@icl.utk.edu>
* src/utils/Makefile: utils: Fix bogus "Disabled" message in
papi_component_avail for the sde component. When the sde component
is initialized, in the context of an application that uses PAPI, it
looks for the availability of libsde symbols. The rationale is that
if the application is not linked against libsde, there are no SDEs
to read, so the component disables itself. Therefore,
papi_component_avail, which does not export any SDEs itself, always
reported the sde component as "Disabled". Adding the symbols to the
utility resolves this problem.
2023-10-27 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_profiler.c: rocm: fix bug in intercept mode
path The intercept mode path keeps track of intercepted events
using the same hash table used to map event names to entries in the
native event table. The event names don't collide because intercept
mode keeps track of the base name of the event (discarding device
id and instance number), while native event table entries are
referenced as "name:device=N:instance=M". The reason is that events
are intercepted on all devices' dispatch queues regardless of the
device id specified by the user (this approach follows rocprof
strategy). However, using only the event name without the instance
number will cause problems. Instances represent separate events and
should not be treated as a single event. The proposed patch uses a
separate hash table for intercept mode and inserts the feature name
rather than the event base name. This means that events with more
than one instance will have a hash table key of the form
"name[M]", where M represents the instance. If the event only has
one instance the key will be "name".
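The key format described in this entry can be sketched as follows. The function `make_intercept_key` and its parameters are hypothetical; only the two key shapes, "name[M]" for multi-instance events and "name" otherwise, come from the entry.

```c
#include <stdio.h>

/* Writes the intercept-mode key for an event into 'key'.
 * Returns 0 on success, -1 if the key does not fit. */
int make_intercept_key(char *key, size_t key_size, const char *name,
                       int instance, int num_instances)
{
    int written;

    if (num_instances > 1) {
        written = snprintf(key, key_size, "%s[%d]", name, instance);
    } else {
        written = snprintf(key, key_size, "%s", name);
    }
    return (written < 0 || (size_t)written >= key_size) ? -1 : 0;
}
```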
2023-11-06 Giuseppe Congiu <gcongiu@icl.utk.edu>
* .github/workflows/ci.sh: ci: add --enable-warnings to github
actions
* src/configure, src/configure.in: configure: add -Wall to the
--enable-warnings configure flag
2023-10-24 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/configure, src/configure.in: configure: add --enable-warnings
flag The --enable-warnings configure flag allows for a maintainer
build mode where the compiler (gcc) enables extra warnings
(-Wextra).
2023-11-07 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_profiler.c: rocm: refactor
get_context_counters The function already takes rocp_ctx as input
argument thus there is no need to pass events_id as input argument
as well.
* src/components/rocm/roc_profiler.c: rocm: get rid of asserts
* src/components/rocm/roc_profiler.c: rocm: set return code outside
fn_fail in init_event_table
2023-11-08 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/configure, src/configure.in: configure: fix for issue #112
2023-09-13 Josh Minor <josh.minor@arm.com>
* src/components/perf_event/pe_libpfm4_events.c: Set size of
perf_attr_struct prior to getting pfm encoding
2023-11-07 William Cohen <wcohen@redhat.com>
* src/ctests/thrspecific.c: ctests/thrspecific: Have the threads
clean up after themselves Each thread is doing memory
allocations via malloc. They should also free the memory once they
are done to eliminate the following coverity issues:
Error: CPPCHECK_WARNING (CWE-401): [#def10]
papi-7.0.1/src/ctests/thrspecific.c:77: error[memleak]: Memory leak: data.data
#   75|   }
#   76|   processing = 0;
#   77|-> }
#   78|   }
#   79|
Error: CPPCHECK_WARNING (CWE-401): [#def11]
papi-7.0.1/src/ctests/thrspecific.c:77: error[memleak]: Memory leak: data.id
#   75|   }
#   76|   processing = 0;
#   77|-> }
#   78|   }
#   79|
* src/components/sysdetect/linux_cpu_utils.c: sysdetect: Eliminate
file resource leak in get_vendor_id() function This fix eliminates
the following issue reported by coverity: Error: RESOURCE_LEAK
(CWE-772): [#def9]
papi-7.0.1/src/components/sysdetect/linux_cpu_utils.c:900:
alloc_fn: Storage is returned from allocation function "fopen".
papi-7.0.1/src/components/sysdetect/linux_cpu_utils.c:900:
var_assign: Assigning: "fp" = storage returned from
"fopen("/proc/cpuinfo", "r")".
papi-7.0.1/src/components/sysdetect/linux_cpu_utils.c:906:
noescape: Resource "fp" is not freed or pointed-to in
"search_cpu_info".
papi-7.0.1/src/components/sysdetect/linux_cpu_utils.c:968:
leaked_storage: Variable "fp" going out of scope leaks the storage
it points to.
* src/components/net/linux-net.c: net: Ensure that strings copied are
NULL terminated The strncpy function may not put a NULL at the end
of the destination buffer if the source string is as long as or longer
than the specified copy size. To ensure that the copied strings are null
terminated, use snprintf instead and check its return value to
ensure that the copied string was not truncated. The snprintf
function will always include a NULL at the end of the copy. This
particular fix addresses the following two coverity issues: Error:
BUFFER_SIZE (CWE-170): [#def6] papi-7.0.1/src/components/net/linux-
net.c:346: buffer_size_warning: Calling "strncpy" with a maximum
size argument of 128 bytes on destination array
"_net_native_events[i].name" of size 128 bytes might leave the
destination string unterminated. Error: BUFFER_SIZE (CWE-170):
[#def7] papi-7.0.1/src/components/net/linux-net.c:347:
buffer_size_warning: Calling "strncpy" with a maximum size argument
of 128 bytes on destination array
"_net_native_events[i].description" of size 128 bytes might leave
the destination string unterminated.
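The approach described in this entry, copying with snprintf and checking the return value for truncation, can be sketched as below. The helper name `copy_string_checked` is hypothetical and is not a function from linux-net.c.

```c
#include <stdio.h>

/* Copies 'src' into 'dst' (capacity 'dst_size', including the terminator).
 * Returns 0 on success, -1 if 'src' did not fit. 'dst' is always
 * NUL-terminated when dst_size > 0. */
int copy_string_checked(char *dst, size_t dst_size, const char *src)
{
    /* snprintf returns the length the full string would have had. */
    int needed = snprintf(dst, dst_size, "%s", src);

    if (needed < 0 || (size_t)needed >= dst_size) {
        return -1;   /* encoding error or truncated copy */
    }
    return 0;
}
```

Unlike strncpy with a size equal to the destination capacity, the destination stays terminated even when the source is too long, and the caller learns about the truncation.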
* src/components/coretemp/linux-coretemp.c: coretemp: Ensure strings
copied during initialization are NULL terminated The strncpy
function will not place a NULL character at the end of the string
if the string being copied is the same length or longer than the
destination of the strncpy function. Switch the code in the
_coretemp_init_component function to use snprintf and check the
return value of snprintf to verify that the copied string fits in the
destination.
* src/components/coretemp/linux-coretemp.c: coretemp: add closedir
operation to function exit Coverity flagged a resource leak on one
of the possible exit paths of the generateEventList function. This
patch adds the missing closedir.
2023-10-25 Daniel Barry <dbarry@vols.utk.edu>
* src/components/rocm/README.md: rocm: update README For versions of
ROCM >= 5.2.0, the ROCM library path structure is different. The
README has been updated to reflect this difference. This was
verified on the Frontier supercomputer.
2023-09-29 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/sde_lib/Makefile: sde_lib: do not build with debug symbols by
default
* src/configure, src/configure.in: configure: do not build with debug
symbols by default Remove -g being added by default in configure.
2023-10-19 Anustuv Pal <anustuv@icl.utk.edu>
* src/components/cuda/cupti_profiler.c: cuda: Fix papi_command_line
segfault when passed non-existent event name
2023-10-06 Anustuv Pal <anustuv@icl.utk.edu>
* src/components/cuda/cupti_profiler.c, src/components/cuda/linux-
cuda.c: cuda: Improve CUDA component PAPI_read() overhead, issue 85
2023-10-06 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_profiler.c,
.../rocm/tests/sample_multi_thread_monitoring.cpp,
.../rocm/tests/sample_single_thread_monitoring.cpp: rocm: fix
sampling mode multithread issue Issue #80 was causing sampling
mode multithreading not to work. This was caused by a bug in the
rocm component that tried to monitor multiple GPU devices using the
same rocprofiler queue. Assigning one independent queue
per device solves the issue.
2023-10-09 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_profiler.c: rocm: fix typo in ctx_open
2023-09-08 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_profiler.c: rocm: add logging to component
backend
* src/components/rocm/rocm.c: rocm: add logging to component frontend
* src/components/rocm/rocm.c: rocm: funnel exits through same point
in component frontend
2023-07-20 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_common.c: rocm: refactor
rocc_dev_get_{count,id} functions
* src/components/rocm/roc_common.c, src/components/rocm/roc_common.h,
src/components/rocm/roc_profiler.c: rocm: fix warning in callback
function
2023-07-18 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_common.c, src/components/rocm/roc_common.h,
src/components/rocm/roc_profiler.c: rocm: move thread id get
function to roc_common
2023-07-17 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_common.c: rocm: fix warning in roc_common.c
* src/components/rocm/roc_profiler.h: rocm: remove roc_common.h from
roc_profiler.h
* src/components/rocm/roc_common.c, src/components/rocm/roc_common.h,
src/components/rocm/roc_profiler.c: rocm: move agent to id function
to roc_common
2023-07-14 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_profiler.h: rocm: remove leftover
err_get_last function header
* src/components/rocm/roc_dispatch.c,
src/components/rocm/roc_dispatch.h,
src/components/rocm/roc_profiler.c,
src/components/rocm/roc_profiler.h, src/components/rocm/rocm.c:
rocm: rename evt_get_descr to evt_code_to_descr
2023-07-13 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/roc_common.h,
src/components/rocm/roc_dispatch.h,
src/components/rocm/{rocp_config.h => roc_profiler_config.h}: rocm:
rename rocp_config.h to roc_profiler_config.h
* src/components/rocm/roc_profiler.c: rocm: reformat roc_profiler.c
code
* src/components/rocm/roc_profiler.c: rocm: remove FIXME comment
* src/components/rocm/roc_profiler.c: rocm: use snprintf instead of
strncpy
* src/components/rocm/roc_common.c, src/components/rocm/roc_common.h,
src/components/rocm/roc_profiler.c: rocm: extract all device
booking and checking functions
2023-07-12 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/rocm.c, src/components/rocm/rocp_config.h:
rocm: move extern declarations to config header The rocm lock and
the profiling mode variables need to be shared between the front-
end and the back-end. The reason for the lock is that this has to
be initialized by the front-end which is the only one with access
to the required information. This lock design in PAPI is flawed as
it is hard to extend.
* src/components/rocm/roc_profiler.c: rocm: remove unneeded comments
* src/components/rocm/Rules.rocm, src/components/rocm/{rocc.c =>
roc_common.c}, src/components/rocm/{rocc.h => roc_common.h},
src/components/rocm/{rocd.c => roc_dispatch.c},
src/components/rocm/{rocd.h => roc_dispatch.h},
src/components/rocm/{rocp.c => roc_profiler.c},
src/components/rocm/{rocp.h => roc_profiler.h},
src/components/rocm/rocm.c: rocm: rename source files for better
readability
2023-05-17 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/Rules.rocm, src/components/rocm/common.h,
src/components/rocm/rocc.c, src/components/rocm/rocc.h,
src/components/rocm/rocd.c, src/components/rocm/rocd.h,
src/components/rocm/rocm.c, src/components/rocm/rocp.c,
src/components/rocm/rocp.h, src/components/rocm/rocp_config.h:
rocm: extract shared functionality Some functionality can be
shared with other profiler versions, if and when these become
available. Thus, it makes sense to extract such functionality from
the specific profiler implementation and make it available to
future profiler versions.
2023-07-12 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/rocd.c, src/components/rocm/rocp.c,
src/components/rocm/rocp.h, src/configure, src/configure.in: rocm:
remove ROCM_PROF_ROCPROFILER guard This guard was introduced when
rocmtools was planned instead of rocprofiler V2.
2023-05-16 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/rocp.c: rocm: update returned error codes
Errors associated with rocprofiler calls are assigned PAPI_EMISC,
while errors caused by unexpected user actions (e.g. starting an
eventset that is already running) are assigned PAPI_EINVAL.
Every other error that is not a memory allocation failure
(PAPI_ENOMEM) is assigned the PAPI_ECMP error.
* src/components/rocm/rocp.c: rocm: remove macros handling error
management
* src/components/rocm/rocp.c: rocm: rename hsa_agent_arr_t to
device_table_t
* src/components/rocm/rocp.c: rocm: replace trailing Ptr in rocm
functions with _p
2023-09-08 G-Ragghianti <ragghianti@icl.utk.edu>
* .github/workflows/ci.sh, .github/workflows/spack.sh: changing gcc
version for rocm compatibility
2023-09-29 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/sysdetect/tests/Makefile: sysdetect: fix compiler
flag selection in tests
* src/configure, src/configure.in: configure: fix tls detection
The configure TLS detection tests were failing because of wrong usage
of pthread_create(). The problem was caused by a wrong definition of
the thread functions, which must be void *f(void *) instead of int
f(void *) or void f(void *).
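The signature requirement described above can be illustrated with a minimal sketch. This is not the actual test from configure: the names tls_value, tls_probe and run_tls_probe are hypothetical, and the use of the __thread storage class is an assumption about what such a probe would exercise.

```c
#include <pthread.h>
#include <stdint.h>

/* Thread-local variable (assumption: the probe checks __thread support). */
static __thread int tls_value = 0;

/* pthread_create() expects a start routine of type void *(*)(void *).
 * Defining it as int f(void *) or void f(void *) is a type mismatch. */
static void *tls_probe(void *arg)
{
    tls_value = *(int *)arg;              /* writes this thread's own copy */
    return (void *)(intptr_t)tls_value;   /* hand the value back via join */
}

/* Hypothetical helper: run one probe thread and return what it stored,
 * or -1 if the thread could not be created or joined. */
int run_tls_probe(int value)
{
    pthread_t tid;
    void *ret = NULL;

    if (pthread_create(&tid, NULL, tls_probe, &value) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0)
        return -1;
    return (int)(intptr_t)ret;
}
```

Calling run_tls_probe(7) from the main thread should return 7 while leaving the main thread's own tls_value untouched. Build with -pthread.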
2023-09-26 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/smoke_tests/Makefile: smoke_tests: fix Makefile The Makefile
was missing a PAPI_ROOT path and also an additional -pthread in the
linker flags.
2023-09-15 Anustuv Pal <anustuv@icl.utk.edu>
* src/components/cuda/linux-cuda.c, src/papi.h,
src/utils/papi_component_avail.c: cuda: Revert "utils:
papi_component_avail does not support cuda component counters"
This reverts commit 4f15f3d15463df5acfda26fbc6367756e1f62f03.
* src/components/lmsensors/linux-lmsensors.c: lmsensors: Replace
numerical literal 1024 with PATH_MAX macro
2023-09-05 Anustuv Pal <anustuv@icl.utk.edu>
* src/components/lmsensors/README.md, src/components/lmsensors/linux-
lmsensors.c: lmsensors: Add lib/ to explicit search path to .so
loader
2023-09-15 Anustuv Pal <anustuv@icl.utk.edu>
* src/components/coretemp/linux-coretemp.c: coretemp: Fix snprintf
warnings for gcc 10
2023-07-12 Caleb Han <calebhantech@gmail.com>
* src/sde_lib/sde_lib.hpp: sde_lib: fixed make bug
2023-09-18 Anustuv Pal <anustuv@icl.utk.edu>
* src/components/sde/tests/Minimal/Minimal_Test.c: sde: Fix
Minimal_Test.c handle pointer
2023-07-06 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/rocp.c: rocm: fix snprintf handling The
expected return value from snprintf is < PAPI_MAX_STR_LEN. If it is
>= PAPI_MAX_STR_LEN, the input string was longer than the output
string and this is an unexpected condition that needs to be handled
properly.
* src/components/sysdetect/nvidia_gpu.c: sysdetect: fix snprintf n
argument in CUDA backend The n argument in snprintf specifies the
size of the output buffer, not the length of the input string.
* src/components/sysdetect/amd_gpu.c: sysdetect: fix snprintf n
argument in ROCm backend The n argument in snprintf specifies the
size of the output buffer, not the length of the input string.
* src/components/sysdetect/nvidia_gpu.c: sysdetect: do not null
terminate manually in CUDA backend snprintf always null-terminates
the output string, regardless of whether characters from the input
string are dropped (i.e. if the output string is shorter than the
input string).
* src/components/sysdetect/amd_gpu.c: sysdetect: do not null
terminate manually in ROCm backend snprintf always null-terminates
the output string, regardless of whether characters from the input
string are dropped (i.e. if the output string is shorter than the
input string).
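The snprintf points made in the entries above (n is the output size, the return value signals truncation, and the result is always null-terminated) can be sketched as follows. The helper name copy_name is hypothetical and is not taken from the rocm or sysdetect sources.

```c
#include <stdio.h>

/* Hypothetical helper: copy src into dst, a buffer of dst_size bytes.
 * Returns 0 on success, -1 on an output error or if src was truncated. */
int copy_name(char *dst, size_t dst_size, const char *src)
{
    /* n is the size of the OUTPUT buffer, not strlen(src). */
    int len = snprintf(dst, dst_size, "%s", src);

    /* len is the length the complete string would have had. A value
     * >= dst_size means characters were dropped. For dst_size > 0 the
     * buffer is still null-terminated, so no manual '\0' is needed. */
    if (len < 0 || (size_t)len >= dst_size)
        return -1;
    return 0;
}
```

With an 8-byte buffer, copying "rocm" succeeds, while a longer name is cut to 7 characters plus the terminator and reported as an error.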
2023-07-21 Lukas Alt <lukas.alt@rwth-aachen.de>
* src/components/rapl/linux-rapl.c: rapl: support for icelake-sp
2023-07-25 Daniel Barry <dbarry@vols.utk.edu>
* src/counter_analysis_toolkit/main.c: cat: add missing entry in
usage message Add a command-line flag for the instructions
benchmark to the usage message. These changes have been tested on
the Intel Sapphire Rapids architecture.
* src/counter_analysis_toolkit/main.c,
src/counter_analysis_toolkit/params.h: cat: add option for conf
file path Add an optional command-line flag for the path to the
configuration file. This is useful on systems which do not assume
the work directory is where the .cat_cfg file is located. These
changes have been tested on the Intel Sapphire Rapids architecture.
2023-09-06 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm_smi/rocs.c: rocm_smi: fix warning "variable
might be used uninitialized"
2023-09-01 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/tests/Makefile,
.../tests/hl_intercept_multi_thread_monitoring.cpp,
.../hl_intercept_single_thread_monitoring.cpp,
.../tests/hl_sample_single_thread_monitoring.cpp,
.../rocm/tests/multi_thread_monitoring.cpp,
.../rocm/tests/single_thread_monitoring.cpp: rocm: remove openmp
dependency Spack installations of PAPI with the rocm component have
dependency issues with openmp caused by the AMD llvm compiler.
Because component tests are always built in PAPI, this prevents
spack from installing PAPI on the system. Removing the openmp
dependency and replacing it with pthreads solves the issue.
2023-09-06 Anustuv Pal <anustuv@icl.utk.edu>
* src/components/cuda/cupti_profiler.c: cuda: fix event enumeration
2023-08-30 Anustuv Pal <anustuv@icl.utk.edu>
* src/components/cuda/cupti_common.c: cuda: fix dangerous
dl_iterate_phdr operation
2023-08-15 Anustuv Pal <anustuv@icl.utk.edu>
* src/components/cuda/linux-cuda.c, src/papi.h,
src/utils/papi_component_avail.c: utils: papi_component_avail does
not support cuda component counters
2023-08-24 Anustuv Pal <anustuv@icl.utk.edu>
* src/components/cuda/tests/runtest.sh: cuda: Remove x flag from
cuda/tests/runtest.sh
2023-08-18 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/rocp.c: rocm: fix instanced events Some events
have multiple instances. The way the component was handling those
events was wrong, causing such events to not work. This patch fixes
the problem.
2023-08-23 Bert Wesarg <bert.wesarg@tu-dresden.de>
* src/components/rocm/rocp.c: rocm: prefer librocprofiler64.so.1
`librocprofiler64.so` was a linker script in 5.6 which could not be
`dlopen`ed. In 5.7 it has vanished completely, thus try
`.so.1` first.
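The "try `.so.1` first" order described above can be sketched as a small fallback loop. This is an illustration only: the helper dlopen_first is hypothetical, and the dlopen flags and bare library names are assumptions, not what the rocm component actually passes.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: try each candidate name in order and return the
 * first handle that loads, or NULL if none does. The names array must be
 * NULL-terminated. If loaded is non-NULL it receives the name that worked. */
void *dlopen_first(const char *const names[], const char **loaded)
{
    for (size_t i = 0; names[i] != NULL; ++i) {
        /* Flags are an assumption made for this sketch. */
        void *handle = dlopen(names[i], RTLD_NOW | RTLD_GLOBAL);
        if (handle != NULL) {
            if (loaded != NULL)
                *loaded = names[i];
            return handle;
        }
    }
    return NULL;
}
```

A caller would pass the versioned name first, for example { "librocprofiler64.so.1", "librocprofiler64.so", NULL }, so the unversioned name is only tried as a fallback. On older glibc versions, link with -ldl.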
Fri Jun 30 15:06:22 2023 -0400 William Cohen <wcohen@redhat.com>
* src/libpfm4/lib/pfmlib_amd64_perf_event.c,
src/libpfm4/lib/pfmlib_common.c,
src/libpfm4/lib/pfmlib_intel_skx_unc_cha.c,
src/libpfm4/lib/pfmlib_intel_x86.c,
src/libpfm4/lib/pfmlib_intel_x86_perf_event.c: libpfm4: update to
commit efd10fb
Original commit: Correct the arguments in a number of printf statements
Adjusted the printf statements to fix the following issues flagged by
static analysis:
Error: PRINTF_ARGS (CWE-685): [#def66]
libpfm-4.13.0/lib/pfmlib_intel_x86.c:87: extra_argument: This
argument was not used by the format string: "e->fstr".
#   85|   __pfm_vbprintf(" any=%d", reg.sel_anythr);
#   86|
#   87|-> __pfm_vbprintf("]", e->fstr);
#   88|
#   89|   for (i = 1 ; i < e->count; i++)
Error: PRINTF_ARGS (CWE-685): [#def11]
libpfm-4.13.0/lib/pfmlib_amd64_perf_event.c:78: missing_argument:
No argument for format specifier "%d".
#   76|
#   77|   if (e->count > 1) {
#   78|->   DPRINT("%s: unsupported count=%d\n", e->count);
#   79|     return PFM_ERR_NOTSUPP;
#   80|   }
Error: PRINTF_ARGS (CWE-685): [#def14]
libpfm-4.13.0/lib/pfmlib_common.c:1151: missing_argument:
No argument for format specifier "%d".
# 1149|
# 1150|   if (pfmlib_is_blacklisted_pmu(p)) {
# 1151|->   DPRINT("%d PMU blacklisted, skipping initialization\n");
# 1152|     continue;
# 1153|   }
Error: PRINTF_ARGS (CWE-685): [#def15]
libpfm-4.13.0/lib/pfmlib_common.c:1367: missing_argument:
No argument for format specifier "%s".
# 1365|   ainfo->equiv= NULL;
# 1366|   if (*endptr) {
# 1367|->   DPRINT("raw umask (%s) is not a number\n");
# 1368|     return PFM_ERR_ATTR;
# 1369|
Error: PRINTF_ARGS (CWE-685): [#def34]
libpfm-4.13.0/lib/pfmlib_intel_skx_unc_cha.c:60: missing_argument:
No argument for format specifier "%x".
#   58|   f.val = e->codes[1];
#   59|
#   60|-> __pfm_vbprintf("[UNC_CHA_FILTER0=0x%"PRIx64" thread_id=%d source=0x%x state=0x%x"
#   61|     " state=0x%x]\n",
#   62|     f.val,
Error: PRINTF_ARGS (CWE-685): [#def83]
libpfm-4.13.0/lib/pfmlib_intel_x86_perf_event.c:100:
missing_argument: No argument for format specifier "%d".
#   98|
#   99|   if (e->count > 2) {
#  100|->   DPRINT("%s: unsupported count=%d\n", e->count);
#  101|     return PFM_ERR_NOTSUPP;
#  102|   }
2023-08-22 Anustuv Pal <anustuv@icl.utk.edu>
* src/components/cuda/cupti_common.c,
src/components/cuda/cupti_common.h: cuda: fix get linked shared
library link error gcc 10.0
* src/components/cuda/cupti_common.c,
src/components/cuda/cupti_common.h,
src/components/cuda/cupti_profiler.c: cuda: Load cuda shared
libraries from linked/rpath/LD_LIBRARY_PATH
2023-08-13 Anustuv Pal <anustuv@icl.utk.edu>
* src/papi.h: papi.h: Fix warnings for -Wstrict-prototypes
2023-07-25 Daniel Barry <dbarry@vols.utk.edu>
* src/papi_events.csv: add more Ice Lake FLOPs presets Since there
are enough counters available to monitor both single- and double-
precision floating-point events, PAPI_FP_OPS, PAPI_FP_INS, and
PAPI_VEC_INS are all defined. These presets have been validated
using the Counter Analysis Toolkit. These changes have been tested
on the Intel Ice Lake architecture.
2023-07-31 Giuseppe Congiu <gcongiu@icl.utk.edu>
* src/components/rocm/tests/Makefile: rocm: temporarily remove all
tests from being built Spack has issues building rocm tests
because of a broken dependency in hip (openmp). To avoid spack
failing to build PAPI altogether, this commit temporarily removes
the rocm component tests from the build. A better, permanent
solution will follow soon.
2023-07-26 Anustuv Pal <anustuv@icl.utk.edu>
* src/components/cuda/README.md, src/components/cuda/Rules.cuda,
src/components/cuda/cupti_common.c,
src/components/cuda/cupti_common.h,
src/components/cuda/cupti_config.h,
src/components/cuda/cupti_dispatch.c,
src/components/cuda/cupti_dispatch.h,
src/components/cuda/cupti_events.c,
src/components/cuda/cupti_events.h,
src/components/cuda/cupti_profiler.c,
src/components/cuda/cupti_profiler.h,
src/components/cuda/cupti_utils.c,
src/components/cuda/cupti_utils.h, src/components/cuda/htable.h,
src/components/cuda/lcuda_debug.h, src/components/cuda/linux-
cuda.c, src/components/cuda/sampling/Makefile,
src/components/cuda/sampling/README,
src/components/cuda/sampling/activity.c,
src/components/cuda/sampling/gpu_activity.c,
src/components/cuda/sampling/path.h.in,
src/components/cuda/sampling/test/matmul.cu,
.../cuda/sampling/test/sass_source_map.cubin,
.../cuda/tests/BlackScholes/BlackScholes.cu,
.../cuda/tests/BlackScholes/BlackScholes_gold.cpp,
.../tests/BlackScholes/BlackScholes_kernel.cuh,
src/components/cuda/tests/BlackScholes/Makefile,
.../cuda/tests/BlackScholes/NsightEclipse.xml,
.../cuda/tests/BlackScholes/README_SETUP.txt,
src/components/cuda/tests/BlackScholes/readme.txt,
.../cuda/tests/BlackScholes/testAllEvents.sh,
.../cuda/tests/BlackScholes/testSomeEvents.sh,
.../cuda/tests/BlackScholes/thr_BlackScholes.cu,
src/components/cuda/tests/HelloWorld.cu,
src/components/cuda/tests/HelloWorld_CUPTI11.cu,
src/components/cuda/tests/HelloWorld_NP_Ctx.cu,
src/components/cuda/tests/HelloWorld_noCuCtx.cu,
src/components/cuda/tests/LDLIB.src,
src/components/cuda/tests/Makefile,
src/components/cuda/tests/concurrent_profiling.cu,
.../cuda/tests/concurrent_profiling_noCuCtx.cu,
src/components/cuda/tests/cudaOpenMP.cu,
src/components/cuda/tests/cudaOpenMP_noCuCtx.cu,
src/components/cuda/tests/cudaTest_cupti_only.cu,
.../cuda/tests/cuda_ld_preload_example.README,
.../cuda/tests/cuda_ld_preload_example.c,
.../tests/cupti_multi_kernel_launch_monitoring.cu,
src/components/cuda/tests/gpu_work.h,
src/components/cuda/tests/likeComp_cupti_only.cu,
src/components/cuda/tests/nvlink_all.cu,
src/components/cuda/tests/nvlink_bandwidth.cu,
.../cuda/tests/nvlink_bandwidth_cupti_only.cu,
src/components/cuda/tests/pthreads.cu,
src/components/cuda/tests/pthreads_noCuCtx.cu,
src/components/cuda/tests/runAll.sh,
src/components/cuda/tests/runBW.sh,
src/components/cuda/tests/runCO.sh,
src/components/cuda/tests/runCTCO.sh,
src/components/cuda/tests/runSMG.sh,
src/components/cuda/tests/runtest.sh,
src/components/cuda/tests/simpleMultiGPU.cu,
.../cuda/tests/simpleMultiGPU_CUPTI11.cu,
.../cuda/tests/simpleMultiGPU_noCuCtx.cu,
.../cuda/tests/test_2thr_1gpu_not_allowed.cu,
.../cuda/tests/test_multi_read_and_reset.cu,
.../cuda/tests/test_multipass_event_fail.c,
.../cuda/tests/test_multipass_event_fail.cu: cuda: New cuda
component based on NVIDIA PerfWorks API.
2023-07-26 Kamil Iskra <iskra@mcs.anl.gov>
* src/components/powercap/linux-powercap.c: powercap: test counter
read permissions Check that the files inside /sys/class/powercap
/intel-rapl:<n> directories not only exist, but are readable. On
recent Linux kernels, "energy_uj" is by default readable by root
only, which is something that PAPI fails to detect, resulting in 0