Following counters might not be supported by rocprof: SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_FMA_F16, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_TRANS_F64, SQ_INSTS_VALU_MUL_F16, SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_MFMA_MOPS_I8
Rocprofiler-Compute version: 3.7.0
Profiler choice: rocprofiler-sdk
Output directory: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100
Target: MI100
Command: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3
Kernel Selection: None
Dispatch Selection: None
Filtered sections: All

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generating native tool project using command: cmake -S /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib -B /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
-- Checking for module 'libdw'
--   Package 'libdw', required by 'virtual:world', not found
-- Could NOT find libdw (missing: libdw_LIBRARY libdw_INCLUDE_DIR)
-- {fmt} version: 12.1.0
-- Build type:
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
Building native tool using command: cmake --build /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build --parallel
[ 33%] Built target fmt
[ 33%] Built target gsl_assert
[100%] Built target rocprofiler-compute-tool
Searching /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src by lib/_build/lib/librocprofiler-compute-tool.so for native collector
Using native collector: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
Using native counter collection tool: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
[profiling] Iteration multiplexing: Disabled
[Run 1/12][Approximate profiling time left: pending first measurement...]
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/perfmon/pmc_perf_0.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:47:27.087232 132840310710080 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304145 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:47:27.097411 132840310710080 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.307951 132840310710080 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_FMA_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:47:27.440320 132840310710080 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.342909 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.479668 132840310710080 generateRocpd.cpp:582] writing SQL database for process 2382066 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:47:27.480971 132840310710080 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/dl385-20-mi100-3c48/2382066_results.db (UUID=00004313-00e7-70e7-8f4c-7a1a62b6c8d7)
   |-> [rocprofiler-sdk] W20260526 16:47:27.598185 132840310710080 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014300 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.599310 132840310710080 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001095 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.601817 132840310710080 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002479 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.606799 132840310710080 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003057 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.675659 132840310710080 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.068832 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.678351 132840310710080 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002662 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.678380 132840310710080 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.695114 132840310710080 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016719 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.695151 132840310710080 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.695163 132840310710080 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.695175 132840310710080 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.695406 132840310710080 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000218 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.696057 132840310710080 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.216389 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.701478 132840310710080 simple_timer.cpp:55] [rocprofv3] output generation ::     0.258676 sec
   |-> [rocprofiler-sdk] W20260526 16:47:27.701564 132840310710080 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.261193 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/2382066_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 2/12][Approximate profiling time left: 33 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/perfmon/pmc_perf_1.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:47:29.972015 128561158512448 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306481 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:47:29.981818 128561158512448 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.192560 128561158512448 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MUL_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:47:30.325511 128561158512448 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343693 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.364875 128561158512448 generateRocpd.cpp:582] writing SQL database for process 2382077 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:47:30.366146 128561158512448 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/dl385-20-mi100-3c48/2382077_results.db (UUID=00004313-0c2a-7c2a-b823-3618a9bf583d)
   |-> [rocprofiler-sdk] W20260526 16:47:30.456835 128561158512448 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014765 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.458005 128561158512448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001139 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.460582 128561158512448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002549 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.465723 128561158512448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003138 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.528565 128561158512448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.062813 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.531223 128561158512448 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002629 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.531252 128561158512448 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.547153 128561158512448 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015885 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.547186 128561158512448 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.547198 128561158512448 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.547210 128561158512448 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.547436 128561158512448 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000212 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.547991 128561158512448 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.183117 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.554100 128561158512448 simple_timer.cpp:55] [rocprofv3] output generation ::     0.226153 sec
   |-> [rocprofiler-sdk] W20260526 16:47:30.554189 128561158512448 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.228628 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/2382077_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 3/12][Approximate profiling time left: 27 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/perfmon/pmc_perf_2.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:47:32.829643 137331173207872 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305346 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:47:32.839390 137331173207872 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.050434 137331173207872 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_TRANS_F64
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:47:33.181404 137331173207872 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.342013 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.220851 137331173207872 generateRocpd.cpp:582] writing SQL database for process 2382087 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:47:33.222176 137331173207872 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/dl385-20-mi100-3c48/2382087_results.db (UUID=00004313-1754-7754-88f3-f86ad7b40749)
   |-> [rocprofiler-sdk] W20260526 16:47:33.311340 137331173207872 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014702 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.312439 137331173207872 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001068 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.314909 137331173207872 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002443 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.319884 137331173207872 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003055 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.387782 137331173207872 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.067865 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.390474 137331173207872 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002663 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.390503 137331173207872 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.406462 137331173207872 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015944 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.406491 137331173207872 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.406503 137331173207872 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.406515 137331173207872 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.406725 137331173207872 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000190 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.407280 137331173207872 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.186430 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.413238 137331173207872 simple_timer.cpp:55] [rocprofv3] output generation ::     0.229338 sec
   |-> [rocprofiler-sdk] W20260526 16:47:33.413322 137331173207872 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.231868 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/2382087_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 4/12][Approximate profiling time left: 24 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/perfmon/pmc_perf_3.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:47:35.692894 140269324746560 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306844 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:47:35.702586 140269324746560 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:35.917338 140269324746560 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:47:36.050648 140269324746560 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.348063 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.089654 140269324746560 generateRocpd.cpp:582] writing SQL database for process 2382097 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:47:36.090984 140269324746560 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/dl385-20-mi100-3c48/2382097_results.db (UUID=00004313-2282-7282-ab90-092221218882)
   |-> [rocprofiler-sdk] W20260526 16:47:36.180624 140269324746560 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014591 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.181751 140269324746560 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001095 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.184357 140269324746560 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002578 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.189415 140269324746560 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003097 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.261071 140269324746560 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.071621 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.263646 140269324746560 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002539 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.263675 140269324746560 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.279556 140269324746560 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015866 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.279587 140269324746560 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.279599 140269324746560 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.279611 140269324746560 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.279838 140269324746560 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000211 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.280412 140269324746560 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.190759 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.286347 140269324746560 simple_timer.cpp:55] [rocprofv3] output generation ::     0.233244 sec
   |-> [rocprofiler-sdk] W20260526 16:47:36.286438 140269324746560 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.235737 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/2382097_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 5/12][Approximate profiling time left: 20 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/perfmon/pmc_perf_4.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:47:38.558601 133221390626624 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303650 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:47:38.568299 133221390626624 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:38.781544 133221390626624 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:47:38.914671 133221390626624 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.346372 sec
   |-> [rocprofiler-sdk] W20260526 16:47:38.953667 133221390626624 generateRocpd.cpp:582] writing SQL database for process 2382107 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:47:38.954939 133221390626624 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/dl385-20-mi100-3c48/2382107_results.db (UUID=00004313-2db7-7db7-b7aa-ec2810478161)
   |-> [rocprofiler-sdk] W20260526 16:47:39.045231 133221390626624 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014285 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.046340 133221390626624 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001078 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.048552 133221390626624 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002184 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.053699 133221390626624 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003179 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.116422 133221390626624 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.062694 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.119239 133221390626624 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002787 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.119269 133221390626624 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.134759 133221390626624 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015476 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.134786 133221390626624 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.134798 133221390626624 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.134809 133221390626624 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.135024 133221390626624 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000198 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.135492 133221390626624 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.181826 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.141295 133221390626624 simple_timer.cpp:55] [rocprofv3] output generation ::     0.224143 sec
   |-> [rocprofiler-sdk] W20260526 16:47:39.141379 133221390626624 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.226655 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/2382107_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 6/12][Approximate profiling time left: 17 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/perfmon/pmc_perf_5.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:47:41.400966 130644410347328 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305829 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:47:41.410727 130644410347328 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.621969 130644410347328 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:47:41.750179 130644410347328 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.339452 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.790076 130644410347328 generateRocpd.cpp:582] writing SQL database for process 2382117 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:47:41.791352 130644410347328 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/dl385-20-mi100-3c48/2382117_results.db (UUID=00004313-38cf-78cf-b01d-c2cc57befade)
   |-> [rocprofiler-sdk] W20260526 16:47:41.880451 130644410347328 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014424 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.881555 130644410347328 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001072 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.884061 130644410347328 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002478 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.889053 130644410347328 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003089 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.942964 130644410347328 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.053883 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.945617 130644410347328 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002609 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.945647 130644410347328 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.961674 130644410347328 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016013 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.961702 130644410347328 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.961714 130644410347328 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.961726 130644410347328 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.961934 130644410347328 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000187 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.962353 130644410347328 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.172278 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.968129 130644410347328 simple_timer.cpp:55] [rocprofv3] output generation ::     0.215471 sec
   |-> [rocprofiler-sdk] W20260526 16:47:41.968208 130644410347328 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.217978 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/2382117_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 7/12][Approximate profiling time left: 14 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/perfmon/pmc_perf_6.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:47:44.246937 131479491997504 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.302438 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:47:44.255728 131479491997504 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.471911 131479491997504 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:47:44.600880 131479491997504 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.345153 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.639769 131479491997504 generateRocpd.cpp:582] writing SQL database for process 2382127 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:47:44.641089 131479491997504 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/dl385-20-mi100-3c48/2382127_results.db (UUID=00004313-43f1-73f1-837b-fb2a9dbb186b)
   |-> [rocprofiler-sdk] W20260526 16:47:44.729753 131479491997504 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014026 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.730866 131479491997504 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001083 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.733099 131479491997504 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002204 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.738163 131479491997504 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003129 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.789691 131479491997504 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.051500 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.792319 131479491997504 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002599 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.792348 131479491997504 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.808556 131479491997504 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016193 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.808589 131479491997504 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.808601 131479491997504 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.808613 131479491997504 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.808826 131479491997504 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000199 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.809461 131479491997504 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.169693 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.815426 131479491997504 simple_timer.cpp:55] [rocprofv3] output generation ::     0.212070 sec
   |-> [rocprofiler-sdk] W20260526 16:47:44.815510 131479491997504 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.214580 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/2382127_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 8/12][Approximate profiling time left: 11 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/perfmon/pmc_perf_SQC_DCACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:47:47.066722 139299002142528 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.301941 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:47:47.076363 139299002142528 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.287301 139299002142528 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:47:47.418163 139299002142528 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.341800 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.457828 139299002142528 generateRocpd.cpp:582] writing SQL database for process 2382137 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:47:47.459106 139299002142528 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/dl385-20-mi100-3c48/2382137_results.db (UUID=00004313-4ef5-7ef5-b480-5298d54ab75f)
   |-> [rocprofiler-sdk] W20260526 16:47:47.547784 139299002142528 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014554 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.548930 139299002142528 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001115 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.551329 139299002142528 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002370 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.556260 139299002142528 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003052 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.650373 139299002142528 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.094084 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.653043 139299002142528 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002635 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.653072 139299002142528 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.669717 139299002142528 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016629 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.669751 139299002142528 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.669763 139299002142528 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.669774 139299002142528 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.670022 139299002142528 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000229 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.670616 139299002142528 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.212789 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.676478 139299002142528 simple_timer.cpp:55] [rocprofv3] output generation ::     0.255810 sec
   |-> [rocprofiler-sdk] W20260526 16:47:47.676569 139299002142528 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.258357 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/2382137_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 9/12][Approximate profiling time left: 8 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/perfmon/pmc_perf_SQC_ICACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:47:49.934730 129916251676480 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306287 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:47:49.944687 129916251676480 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.155994 129916251676480 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:47:50.292263 129916251676480 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.347577 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.332534 129916251676480 generateRocpd.cpp:582] writing SQL database for process 2382147 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:47:50.333828 129916251676480 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/dl385-20-mi100-3c48/2382147_results.db (UUID=00004313-5a24-7a24-b87f-1304764ae5a5)
   |-> [rocprofiler-sdk] W20260526 16:47:50.423516 129916251676480 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014434 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.424661 129916251676480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001114 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.427221 129916251676480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002532 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.432310 129916251676480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003116 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.524935 129916251676480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.092596 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.527479 129916251676480 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002514 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.527508 129916251676480 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.543565 129916251676480 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016042 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.543594 129916251676480 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.543606 129916251676480 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.543618 129916251676480 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.543825 129916251676480 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000190 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.544290 129916251676480 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.211756 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.550307 129916251676480 simple_timer.cpp:55] [rocprofv3] output generation ::     0.255498 sec
   |-> [rocprofiler-sdk] W20260526 16:47:50.550399 129916251676480 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.258082 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/2382147_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 10/12][Approximate profiling time left: 5 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/perfmon/pmc_perf_SQ_IFETCH_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:47:52.846451 135982104686400 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.309538 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:47:52.856485 135982104686400 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.067755 135982104686400 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:47:53.202837 135982104686400 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.346352 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.242409 135982104686400 generateRocpd.cpp:582] writing SQL database for process 2382158 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:47:53.243680 135982104686400 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/dl385-20-mi100-3c48/2382158_results.db (UUID=00004313-6581-7581-83ee-9d06a6c37880)
   |-> [rocprofiler-sdk] W20260526 16:47:53.334076 135982104686400 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014963 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.335227 135982104686400 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001119 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.337780 135982104686400 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002520 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.342949 135982104686400 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003177 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.477960 135982104686400 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.134968 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.480709 135982104686400 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002701 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.480739 135982104686400 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.496721 135982104686400 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015964 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.496756 135982104686400 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.496768 135982104686400 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.496781 135982104686400 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.497028 135982104686400 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000226 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.497681 135982104686400 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.255272 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.503771 135982104686400 simple_timer.cpp:55] [rocprofv3] output generation ::     0.298469 sec
   |-> [rocprofiler-sdk] W20260526 16:47:53.503870 135982104686400 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.300983 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/2382158_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 11/12][Approximate profiling time left: 2 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/perfmon/pmc_perf_SQ_INST_LEVEL_LDS_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:47:55.815389 124847156596544 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.312649 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:47:55.825244 124847156596544 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.036827 124847156596544 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:47:56.169801 124847156596544 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344557 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.209321 124847156596544 generateRocpd.cpp:582] writing SQL database for process 2382168 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:47:56.210617 124847156596544 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/dl385-20-mi100-3c48/2382168_results.db (UUID=00004313-7117-7117-93cf-6ebff42ba0c0)
   |-> [rocprofiler-sdk] W20260526 16:47:56.297661 124847156596544 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014876 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.298789 124847156596544 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001097 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.301296 124847156596544 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002478 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.306025 124847156596544 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.002957 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.430144 124847156596544 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.124090 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.432751 124847156596544 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002577 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.432781 124847156596544 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.448325 124847156596544 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015529 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.448353 124847156596544 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.448365 124847156596544 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.448377 124847156596544 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.448581 124847156596544 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000189 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.449064 124847156596544 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.239743 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.454799 124847156596544 simple_timer.cpp:55] [rocprofv3] output generation ::     0.282559 sec
   |-> [rocprofiler-sdk] W20260526 16:47:56.454888 124847156596544 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.285037 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/2382168_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 12/12][Approximate profiling time left: 0 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/perfmon/pmc_perf_SQ_LEVEL_WAVES_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:47:58.743427 128988691709760 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306541 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:47:58.753173 128988691709760 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:58.965607 128988691709760 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:47:59.098827 128988691709760 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.345654 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.138385 128988691709760 generateRocpd.cpp:582] writing SQL database for process 2382178 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:47:59.139650 128988691709760 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/dl385-20-mi100-3c48/2382178_results.db (UUID=00004313-7c8d-7c8d-8edd-d2f9d393c2ca)
   |-> [rocprofiler-sdk] W20260526 16:47:59.228083 128988691709760 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014579 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.229251 128988691709760 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001138 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.231850 128988691709760 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002571 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.236958 128988691709760 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003175 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.313227 128988691709760 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.076224 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.315903 128988691709760 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002646 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.315931 128988691709760 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.331514 128988691709760 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015568 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.331545 128988691709760 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.331557 128988691709760 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.331568 128988691709760 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.331798 128988691709760 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000209 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.332359 128988691709760 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.193975 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.338255 128988691709760 simple_timer.cpp:55] [rocprofv3] output generation ::     0.236929 sec
   |-> [rocprofiler-sdk] W20260526 16:47:59.338341 128988691709760 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.239463 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/device_filter/MI100/out/pmc_1/2382178_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
PC sampling data collection skipped as block 21 is not specified.
[roofline] Skipping roofline
