[roofline] Profiling roofline counters only.
Rocprofiler-Compute version: 3.7.0
Profiler choice: rocprofiler-sdk
Output directory: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_names/MI200
Target: MI210
Command: ./tests/vcopy -n 1048576 -b 256 -i 3
Kernel Selection: None
Dispatch Selection: None
Filtered sections: ['4']

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters (Roofline Only)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generating native tool project using command: cmake -S /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib -B /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
-- Checking for module 'libdw'
--   Package 'libdw', required by 'virtual:world', not found
-- Could NOT find libdw (missing: libdw_LIBRARY libdw_INCLUDE_DIR)
-- {fmt} version: 12.1.0
-- Build type:
-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
Building native tool using command: cmake --build /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build --parallel
[  0%] Built target gsl_assert
[ 33%] Built target fmt
[100%] Built target rocprofiler-compute-tool
Searching /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src by lib/_build/lib/librocprofiler-compute-tool.so for native collector
Using native collector: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
Using native counter collection tool: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
[profiling] Iteration multiplexing: Disabled
[Run 1/3][Approximate profiling time left: pending first measurement...]
[profiling] Current input file: tests/workloads/kernel_names/MI200/perfmon/pmc_perf_0.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 18:58:49.626364 132248557805376 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.184146 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 18:58:49.626994 132248557805376 simple_timer.cpp:55] [rocprofv3] './tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 18:58:49.820968 132248557805376 tool.cpp:2423] HSA version 8.21.0 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 18:58:49.903650 132248557805376 simple_timer.cpp:55] [rocprofv3] './tests/vcopy -n 1048576 -b 256 -i 3' ::     0.276657 sec
   |-> [rocprofiler-sdk] W20260526 18:58:49.926571 132248557805376 generateRocpd.cpp:583] writing SQL database for process 2523334 on node 2976770398
   |-> [rocprofiler-sdk] E20260526 18:58:49.927393 132248557805376 generateRocpd.cpp:606] Opened result file: tests/workloads/kernel_names/MI200/out/pmc_1/smc4124-25-mi210-3c48/2523334_results.db (UUID=0001fa73-ffbf-7fbf-a1ed-3d8966f36c29)
   |-> [rocprofiler-sdk] W20260526 18:58:50.010963 132248557805376 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.007864 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.012213 132248557805376 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001233 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.013842 132248557805376 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.001615 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.024738 132248557805376 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.008653 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.165209 132248557805376 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.140454 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.180908 132248557805376 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.015670 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.180925 132248557805376 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000004 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.191756 132248557805376 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.010824 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.191771 132248557805376 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.191777 132248557805376 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.191784 132248557805376 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.191888 132248557805376 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000097 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.192103 132248557805376 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.265531 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.196035 132248557805376 simple_timer.cpp:55] [rocprofv3] output generation ::     0.290824 sec
   |-> [rocprofiler-sdk] W20260526 18:58:50.196108 132248557805376 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.292406 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: tests/workloads/kernel_names/MI200/out/pmc_1/2523334_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 2/3][Approximate profiling time left: 2 seconds]...
[profiling] Current input file: tests/workloads/kernel_names/MI200/perfmon/pmc_perf_1.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 18:58:51.711282 128392751177536 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.185007 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 18:58:51.711875 128392751177536 simple_timer.cpp:55] [rocprofv3] './tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 18:58:51.904081 128392751177536 tool.cpp:2423] HSA version 8.21.0 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 18:58:51.986332 128392751177536 simple_timer.cpp:55] [rocprofv3] './tests/vcopy -n 1048576 -b 256 -i 3' ::     0.274456 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.008406 128392751177536 generateRocpd.cpp:583] writing SQL database for process 2523342 on node 2976770398
   |-> [rocprofiler-sdk] E20260526 18:58:52.009212 128392751177536 generateRocpd.cpp:606] Opened result file: tests/workloads/kernel_names/MI200/out/pmc_1/smc4124-25-mi210-3c48/2523342_results.db (UUID=0001fa74-07e3-77e3-b0b5-bb2c99697bfe)
   |-> [rocprofiler-sdk] W20260526 18:58:52.092875 128392751177536 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.007776 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.094079 128392751177536 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001185 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.095683 128392751177536 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.001589 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.106064 128392751177536 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.008391 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.170126 128392751177536 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.064044 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.172357 128392751177536 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002214 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.172375 128392751177536 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000004 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.181836 128392751177536 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.009454 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.181850 128392751177536 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.181856 128392751177536 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.181862 128392751177536 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.181965 128392751177536 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000095 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.182169 128392751177536 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.173764 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.185315 128392751177536 simple_timer.cpp:55] [rocprofv3] output generation ::     0.197849 sec
   |-> [rocprofiler-sdk] W20260526 18:58:52.185369 128392751177536 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.198988 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: tests/workloads/kernel_names/MI200/out/pmc_1/2523342_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 3/3][Approximate profiling time left: 0 seconds]...
[profiling] Current input file: tests/workloads/kernel_names/MI200/perfmon/pmc_perf_2.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 18:58:53.676900 139066107211584 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.185338 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 18:58:53.677494 139066107211584 simple_timer.cpp:55] [rocprofv3] './tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 18:58:53.870804 139066107211584 tool.cpp:2423] HSA version 8.21.0 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 18:58:53.956118 139066107211584 simple_timer.cpp:55] [rocprofv3] './tests/vcopy -n 1048576 -b 256 -i 3' ::     0.278625 sec
   |-> [rocprofiler-sdk] W20260526 18:58:53.977792 139066107211584 generateRocpd.cpp:583] writing SQL database for process 2523350 on node 2976770398
   |-> [rocprofiler-sdk] E20260526 18:58:53.978575 139066107211584 generateRocpd.cpp:606] Opened result file: tests/workloads/kernel_names/MI200/out/pmc_1/smc4124-25-mi210-3c48/2523350_results.db (UUID=0001fa74-0f90-7f90-a229-acbd145c0605)
   |-> [rocprofiler-sdk] W20260526 18:58:54.061119 139066107211584 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.007800 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.062312 139066107211584 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001177 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.063976 139066107211584 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.001650 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.074720 139066107211584 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.008559 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.104303 139066107211584 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.029569 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.106722 139066107211584 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002404 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.106739 139066107211584 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000004 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.115637 139066107211584 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.008891 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.115651 139066107211584 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.115658 139066107211584 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.115664 139066107211584 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.115766 139066107211584 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000094 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.115953 139066107211584 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.138161 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.118775 139066107211584 simple_timer.cpp:55] [rocprofv3] output generation ::     0.161540 sec
   |-> [rocprofiler-sdk] W20260526 18:58:54.118826 139066107211584 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.162665 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: tests/workloads/kernel_names/MI200/out/pmc_1/2523350_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
PC sampling data collection skipped as block 21 is not specified.
[roofline] Checking for roofline.csv in tests/workloads/kernel_names/MI200
[roofline] Benchmark execution failed: 'L1'. Skipping roofline.
