Following counters might not be supported by rocprof: SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_FMA_F16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_TRANS_F64, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MUL_F16, SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_ADD_F32
Rocprofiler-Compute version: 3.7.0
Profiler choice: rocprofiler-sdk
Output directory: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100
Target: MI100
Command: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3
Kernel Selection: ['vecCopy']
Dispatch Selection: None
Filtered sections: All

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generating native tool project using command: cmake -S /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib -B /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
-- Checking for module 'libdw'
--   Package 'libdw', required by 'virtual:world', not found
-- Could NOT find libdw (missing: libdw_LIBRARY libdw_INCLUDE_DIR)
-- {fmt} version: 12.1.0
-- Build type:
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
Building native tool using command: cmake --build /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build --parallel
[  0%] Built target gsl_assert
[ 33%] Built target fmt
[100%] Built target rocprofiler-compute-tool
Searching /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src by lib/_build/lib/librocprofiler-compute-tool.so for native collector
Using native collector: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
Using native counter collection tool: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
[profiling] Iteration multiplexing: Disabled
[Run 1/12][Approximate profiling time left: pending first measurement...]
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/perfmon/pmc_perf_0.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:48:44.257504 140268297027392 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304510 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:48:44.265812 140268297027392 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.478872 140268297027392 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_FMA_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:44.613793 140268297027392 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.347982 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.653240 140268297027392 generateRocpd.cpp:582] writing SQL database for process 2382422 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:44.654505 140268297027392 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/dl385-20-mi100-3c48/2382422_results.db (UUID=00004314-2e59-7e59-b693-52f866234b2e)
   |-> [rocprofiler-sdk] W20260526 16:48:44.744414 140268297027392 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014423 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.745572 140268297027392 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001128 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.748178 140268297027392 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002577 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.753337 140268297027392 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003129 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.822731 140268297027392 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.069360 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.825414 140268297027392 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002652 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.825443 140268297027392 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.841085 140268297027392 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015627 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.841112 140268297027392 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.841124 140268297027392 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.841136 140268297027392 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.841337 140268297027392 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000181 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.841758 140268297027392 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.188518 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.847373 140268297027392 simple_timer.cpp:55] [rocprofv3] output generation ::     0.231025 sec
   |-> [rocprofiler-sdk] W20260526 16:48:44.847457 140268297027392 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.233612 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/2382422_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 2/12][Approximate profiling time left: 33 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/perfmon/pmc_perf_1.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:48:47.122655 135000754925376 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305759 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:48:47.132510 135000754925376 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.345137 135000754925376 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MUL_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:47.477386 135000754925376 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344876 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.517059 135000754925376 generateRocpd.cpp:582] writing SQL database for process 2382432 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:47.518317 135000754925376 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/dl385-20-mi100-3c48/2382432_results.db (UUID=00004314-3989-7989-bca4-0fa1cbe918e9)
   |-> [rocprofiler-sdk] W20260526 16:48:47.610877 135000754925376 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014516 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.612043 135000754925376 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001136 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.614631 135000754925376 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002559 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.619721 135000754925376 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003096 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.683029 135000754925376 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.063271 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.685792 135000754925376 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002729 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.685821 135000754925376 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.701946 135000754925376 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016111 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.701983 135000754925376 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.701996 135000754925376 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.702016 135000754925376 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.702226 135000754925376 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000196 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.702646 135000754925376 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.185588 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.708374 135000754925376 simple_timer.cpp:55] [rocprofv3] output generation ::     0.228465 sec
   |-> [rocprofiler-sdk] W20260526 16:48:47.708455 135000754925376 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.231015 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/2382432_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 3/12][Approximate profiling time left: 27 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/perfmon/pmc_perf_2.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:48:49.986671 124191714807616 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304254 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:48:49.996351 124191714807616 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.207078 124191714807616 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_TRANS_F64
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:50.339528 124191714807616 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343178 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.371960 124191714807616 generateRocpd.cpp:582] writing SQL database for process 2382443 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:50.372985 124191714807616 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/dl385-20-mi100-3c48/2382443_results.db (UUID=00004314-44bb-74bb-a0ed-22d31a97c38c)
   |-> [rocprofiler-sdk] W20260526 16:48:50.449923 124191714807616 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.011170 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.451001 124191714807616 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001053 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.453186 124191714807616 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002164 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.457437 124191714807616 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.002569 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.510329 124191714807616 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.052871 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.512512 124191714807616 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002160 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.512534 124191714807616 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.524802 124191714807616 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.012257 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.524823 124191714807616 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.524832 124191714807616 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.524841 124191714807616 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.524999 124191714807616 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000147 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.525376 124191714807616 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.153416 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.529680 124191714807616 simple_timer.cpp:55] [rocprofv3] output generation ::     0.187324 sec
   |-> [rocprofiler-sdk] W20260526 16:48:50.529745 124191714807616 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.190163 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/2382443_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 4/12][Approximate profiling time left: 23 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/perfmon/pmc_perf_3.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:48:52.780648 130980361756480 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.307502 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:48:52.790164 130980361756480 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.004371 130980361756480 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:53.137879 130980361756480 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.347715 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.177163 130980361756480 generateRocpd.cpp:582] writing SQL database for process 2382453 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:53.178434 130980361756480 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/dl385-20-mi100-3c48/2382453_results.db (UUID=00004314-4fa1-7fa1-87c2-9da1c49813dc)
   |-> [rocprofiler-sdk] W20260526 16:48:53.273535 130980361756480 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014487 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.274707 130980361756480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001140 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.277340 130980361756480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002605 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.282670 130980361756480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003265 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.354794 130980361756480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.072095 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.357446 130980361756480 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002620 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.357490 130980361756480 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.373480 130980361756480 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015976 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.373508 130980361756480 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.373520 130980361756480 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.373532 130980361756480 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.373743 130980361756480 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000190 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.374201 130980361756480 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.197039 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.380238 130980361756480 simple_timer.cpp:55] [rocprofv3] output generation ::     0.239861 sec
   |-> [rocprofiler-sdk] W20260526 16:48:53.380325 130980361756480 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.242395 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/2382453_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 5/12][Approximate profiling time left: 20 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/perfmon/pmc_perf_4.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:48:55.680840 137087521537856 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.307168 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:48:55.690608 137087521537856 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:55.902300 137087521537856 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:56.034636 137087521537856 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344029 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.073981 137087521537856 generateRocpd.cpp:582] writing SQL database for process 2382463 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:56.075230 137087521537856 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/dl385-20-mi100-3c48/2382463_results.db (UUID=00004314-5af6-7af6-99e6-769228c20023)
   |-> [rocprofiler-sdk] W20260526 16:48:56.167435 137087521537856 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014551 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.168600 137087521537856 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001134 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.171208 137087521537856 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002580 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.176384 137087521537856 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003194 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.239355 137087521537856 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.062942 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.242143 137087521537856 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002758 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.242172 137087521537856 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.258321 137087521537856 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016133 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.258354 137087521537856 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.258367 137087521537856 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.258379 137087521537856 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.258597 137087521537856 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000204 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.259262 137087521537856 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.185281 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.265291 137087521537856 simple_timer.cpp:55] [rocprofv3] output generation ::     0.228160 sec
   |-> [rocprofiler-sdk] W20260526 16:48:56.265391 137087521537856 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.230703 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/2382463_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 6/12][Approximate profiling time left: 17 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/perfmon/pmc_perf_5.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:48:58.515658 128904723840832 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304517 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:48:58.525421 128904723840832 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:58.737847 128904723840832 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:58.868676 128904723840832 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343256 sec
   |-> [rocprofiler-sdk] W20260526 16:48:58.908331 128904723840832 generateRocpd.cpp:582] writing SQL database for process 2382474 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:58.909614 128904723840832 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/dl385-20-mi100-3c48/2382474_results.db (UUID=00004314-660b-760b-910a-e60f95cd5f1d)
   |-> [rocprofiler-sdk] W20260526 16:48:59.002996 128904723840832 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.015385 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.004178 128904723840832 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001149 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.006747 128904723840832 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002536 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.012011 128904723840832 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003192 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.066459 128904723840832 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.054419 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.069139 128904723840832 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002650 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.069168 128904723840832 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.085091 128904723840832 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015908 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.085119 128904723840832 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.085131 128904723840832 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.085143 128904723840832 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.085360 128904723840832 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000197 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.085778 128904723840832 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.177448 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.091599 128904723840832 simple_timer.cpp:55] [rocprofv3] output generation ::     0.220393 sec
   |-> [rocprofiler-sdk] W20260526 16:48:59.091683 128904723840832 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.222953 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/2382474_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 7/12][Approximate profiling time left: 14 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/perfmon/pmc_perf_6.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:01.343476 130361938644800 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303968 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:01.353331 130361938644800 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.569292 130361938644800 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:01.700546 130361938644800 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.347215 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.739473 130361938644800 generateRocpd.cpp:582] writing SQL database for process 2382484 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:01.740750 130361938644800 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/dl385-20-mi100-3c48/2382484_results.db (UUID=00004314-7117-7117-8882-05643613c239)
   |-> [rocprofiler-sdk] W20260526 16:49:01.834666 130361938644800 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014365 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.835819 130361938644800 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001122 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.838082 130361938644800 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002235 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.843263 130361938644800 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003191 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.894469 130361938644800 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.051177 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.897302 130361938644800 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002801 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.897332 130361938644800 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.913202 130361938644800 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015856 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.913235 130361938644800 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.913247 130361938644800 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.913259 130361938644800 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.913472 130361938644800 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000195 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.913963 130361938644800 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.174490 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.920139 130361938644800 simple_timer.cpp:55] [rocprofv3] output generation ::     0.217105 sec
   |-> [rocprofiler-sdk] W20260526 16:49:01.920246 130361938644800 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.219651 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/2382484_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 8/12][Approximate profiling time left: 11 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/perfmon/pmc_perf_SQC_DCACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:04.180941 140093700849472 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304522 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:04.190720 140093700849472 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.402782 140093700849472 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:04.535489 140093700849472 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344770 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.574698 140093700849472 generateRocpd.cpp:582] writing SQL database for process 2382496 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:04.575971 140093700849472 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/dl385-20-mi100-3c48/2382496_results.db (UUID=00004314-7c2c-7c2c-a87c-7bcbfc7d8e79)
   |-> [rocprofiler-sdk] W20260526 16:49:04.670242 140093700849472 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014928 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.671443 140093700849472 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001171 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.674044 140093700849472 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002573 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.679243 140093700849472 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003131 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.773683 140093700849472 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.094406 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.776387 140093700849472 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002675 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.776416 140093700849472 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.792469 140093700849472 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016038 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.792498 140093700849472 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.792511 140093700849472 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.792522 140093700849472 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.792738 140093700849472 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000196 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.793240 140093700849472 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.218542 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.799165 140093700849472 simple_timer.cpp:55] [rocprofv3] output generation ::     0.261208 sec
   |-> [rocprofiler-sdk] W20260526 16:49:04.799270 140093700849472 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.263729 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/2382496_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 9/12][Approximate profiling time left: 8 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/perfmon/pmc_perf_SQC_ICACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:07.074541 126053413179200 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304630 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:07.084855 126053413179200 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.296344 126053413179200 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:07.431413 126053413179200 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.346559 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.470883 126053413179200 generateRocpd.cpp:582] writing SQL database for process 2382506 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:07.472198 126053413179200 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/dl385-20-mi100-3c48/2382506_results.db (UUID=00004314-877a-777a-a95f-07ae01a64229)
   |-> [rocprofiler-sdk] W20260526 16:49:07.564952 126053413179200 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014610 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.566156 126053413179200 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001150 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.568725 126053413179200 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002541 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.573906 126053413179200 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003181 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.666947 126053413179200 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.093013 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.669764 126053413179200 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002773 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.669793 126053413179200 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.685592 126053413179200 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015784 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.685619 126053413179200 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.685632 126053413179200 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.685643 126053413179200 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.685847 126053413179200 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000183 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.686338 126053413179200 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.215456 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.692503 126053413179200 simple_timer.cpp:55] [rocprofv3] output generation ::     0.258590 sec
   |-> [rocprofiler-sdk] W20260526 16:49:07.692597 126053413179200 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.261132 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/2382506_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 10/12][Approximate profiling time left: 5 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/perfmon/pmc_perf_SQ_IFETCH_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:10.013999 126951468191552 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.313440 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:10.024138 126951468191552 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.234578 126951468191552 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:10.370905 126951468191552 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.346768 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.409819 126951468191552 generateRocpd.cpp:582] writing SQL database for process 2382516 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:10.411109 126951468191552 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/dl385-20-mi100-3c48/2382516_results.db (UUID=00004314-92ed-72ed-a5e6-02ed838409ed)
   |-> [rocprofiler-sdk] W20260526 16:49:10.504199 126951468191552 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.015057 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.505400 126951468191552 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001169 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.508041 126951468191552 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002613 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.513195 126951468191552 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003122 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.648446 126951468191552 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.135223 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.651255 126951468191552 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002778 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.651284 126951468191552 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.666820 126951468191552 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015521 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.666848 126951468191552 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.666860 126951468191552 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.666872 126951468191552 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.667098 126951468191552 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000206 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.667574 126951468191552 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.257755 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.673356 126951468191552 simple_timer.cpp:55] [rocprofv3] output generation ::     0.299971 sec
   |-> [rocprofiler-sdk] W20260526 16:49:10.673447 126951468191552 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.302490 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/2382516_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 11/12][Approximate profiling time left: 2 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/perfmon/pmc_perf_SQ_INST_LEVEL_LDS_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:13.001521 134819447869248 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.311966 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:13.011427 134819447869248 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.226749 134819447869248 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:13.362302 134819447869248 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.350875 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.401441 134819447869248 generateRocpd.cpp:582] writing SQL database for process 2382526 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:13.402724 134819447869248 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/dl385-20-mi100-3c48/2382526_results.db (UUID=00004314-9e9a-7e9a-8dd5-ab64754148b6)
   |-> [rocprofiler-sdk] W20260526 16:49:13.495926 134819447869248 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014867 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.497121 134819447869248 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001165 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.499787 134819447869248 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002638 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.505659 134819447869248 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003750 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.630097 134819447869248 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.124409 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.632735 134819447869248 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002608 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.632764 134819447869248 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.649208 134819447869248 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016430 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.649236 134819447869248 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.649248 134819447869248 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.649259 134819447869248 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.649477 134819447869248 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000196 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.649944 134819447869248 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.248503 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.655742 134819447869248 simple_timer.cpp:55] [rocprofv3] output generation ::     0.290978 sec
   |-> [rocprofiler-sdk] W20260526 16:49:13.655823 134819447869248 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.293468 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/2382526_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 12/12][Approximate profiling time left: 0 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/perfmon/pmc_perf_SQ_LEVEL_WAVES_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:15.939219 131299910565696 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305093 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:15.947662 131299910565696 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.157868 131299910565696 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:16.291761 131299910565696 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344099 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.331804 131299910565696 generateRocpd.cpp:582] writing SQL database for process 2382536 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:16.333107 131299910565696 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/dl385-20-mi100-3c48/2382536_results.db (UUID=00004314-aa1a-7a1a-be20-f10fd22d4e7b)
   |-> [rocprofiler-sdk] W20260526 16:49:16.426791 131299910565696 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014657 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.428024 131299910565696 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001199 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.430709 131299910565696 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002657 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.436065 131299910565696 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003266 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.512635 131299910565696 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.076541 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.515514 131299910565696 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002850 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.515544 131299910565696 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.532108 131299910565696 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016549 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.532135 131299910565696 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.532148 131299910565696 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.532160 131299910565696 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.532367 131299910565696 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000189 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.532813 131299910565696 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.201010 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.538525 131299910565696 simple_timer.cpp:55] [rocprofv3] output generation ::     0.244276 sec
   |-> [rocprofiler-sdk] W20260526 16:49:16.538617 131299910565696 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.246805 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_substr/MI100/out/pmc_1/2382536_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
PC sampling data collection skipped as block 21 is not specified.
[roofline] Skipping roofline
