Using report_hypre_128_1.sqlite for SQL queries.
Running [/apps/packages/compilers/nvidia-hpcsdk/Linux_x86_64/21.7/profilers/Nsight_Systems/target-linux-x64/reports/cudaapisum.py report_hypre_128_1.sqlite]...

Time(%)  Total Time (ns)  Num Calls       Average    Minimum      Maximum        StdDev  Name
-------  ---------------  ---------  ------------  ---------  -----------  ------------  -------------------------
   85.9      892,950,294         38  23,498,691.9        274  383,712,584  88,065,717.9  cudaFree
    7.7       80,229,013         21   3,820,429.2      2,927   78,645,954  17,145,040.7  cudaMalloc
    2.7       28,562,322         26   1,098,550.8      6,469    1,411,796     542,313.0  cudaMemcpy
    2.3       23,777,735          6   3,962,955.8  3,866,117    4,167,558     116,013.6  cudaMallocHost
    1.0       10,600,294          6   1,766,715.7  1,654,775    1,893,557      93,166.1  cudaFreeHost
    0.2        1,564,868         39      40,124.8        872      140,274      41,957.1  cudaDeviceSynchronize
    0.1        1,073,463         21      51,117.3     12,105       94,300      27,948.5  cudaMemcpyAsync
    0.1          525,133         43      12,212.4      4,381       31,275       9,162.3  cudaLaunchKernel
    0.0          387,263         12      32,271.9     14,977       52,143      13,592.8  cudaMemset
    0.0           42,668         83         514.1        352        2,965         323.1  cudaEventDestroy
    0.0           38,697         80         483.7        373        2,119         306.7  cudaEventCreateWithFlags
    0.0           37,411         14       2,672.2        958       12,911       3,008.0  cudaEventQuery
    0.0           35,925         14       2,566.1      1,034       12,866       3,002.6  cudaStreamSynchronize
    0.0           23,017         14       1,644.1        783        2,444         534.4  cudaEventRecord
    0.0           21,382          3       7,127.3      3,515       13,648       5,658.0  cudaStreamCreateWithFlags
    0.0           19,604          3       6,534.7      2,617       11,997       4,877.0  cudaStreamDestroy
    0.0            5,561          3       1,853.7      1,350        2,288         472.8  cuInit
    0.0            2,362          3         787.3        569        1,157         321.9  cudaEventCreate

Running [/apps/packages/compilers/nvidia-hpcsdk/Linux_x86_64/21.7/profilers/Nsight_Systems/target-linux-x64/reports/gpukernsum.py report_hypre_128_1.sqlite]...

Time(%)  Total Time (ns)  Instances   Average  Minimum  Maximum    StdDev  Name
-------  ---------------  ---------  --------  -------  -------  --------  ----------------------------------------------------------------------------------------------------
   48.0          983,610         12  81,967.5   77,535   84,319   2,726.4  void axpy_kernel_val(cublasAxpyParamsVal)
   23.3          477,818          8  59,727.3   54,911   63,135   3,808.2  void dot_kernel, cublasGemvTensorStr…
   19.1          391,229         12  32,602.4   14,016   67,584  19,806.4  void nrm2_kernel(cublasNrm2Params)
    7.7          157,407          3  52,469.0   52,415   52,512      49.4  void scal_kernel_val(cublasScalParamsVal)
    1.9           38,144          8   4,768.0    4,608    5,632     350.1  void reduce_1Block_kernel, cublasGemvTensorS…

Running [/apps/packages/compilers/nvidia-hpcsdk/Linux_x86_64/21.7/profilers/Nsight_Systems/target-linux-x64/reports/gpumemtimesum.py report_hypre_128_1.sqlite]...

Time(%)  Total Time (ns)  Operations    Average  Minimum    Maximum     StdDev  Operation
-------  ---------------  ----------  ---------  -------  ---------  ---------  ------------------
   57.8       16,321,896          23  709,647.7    1,600  1,358,678  692,419.5  [CUDA memcpy HtoD]
   40.8       11,518,352          23  500,797.9    2,208  1,277,175  635,532.4  [CUDA memcpy DtoH]
    1.2          335,229          12   27,935.8   25,760     29,280      966.6  [CUDA memset]
    0.2           54,720           1   54,720.0   54,720     54,720        0.0  [CUDA memcpy DtoD]

Running [/apps/packages/compilers/nvidia-hpcsdk/Linux_x86_64/21.7/profilers/Nsight_Systems/target-linux-x64/reports/gpumemsizesum.py report_hypre_128_1.sqlite]...

      Total  Operations     Average     Minimum     Maximum     StdDev  Operation
-----------  ----------  ----------  ----------  ----------  ---------  ------------------
196,668.594          23   8,550.808       0.109  16,384.000  8,365.379  [CUDA memcpy HtoD]
196,608.000          12  16,384.000  16,384.000  16,384.000      0.000  [CUDA memset]
 16,384.000           1  16,384.000  16,384.000  16,384.000      0.000  [CUDA memcpy DtoD]
147,456.109          23   6,411.135       0.008  16,384.000  8,175.790  [CUDA memcpy DtoH]

Running [/apps/packages/compilers/nvidia-hpcsdk/Linux_x86_64/21.7/profilers/Nsight_Systems/target-linux-x64/reports/osrtsum.py report_hypre_128_1.sqlite]... SKIPPED: report_hypre_128_1.sqlite does not contain OS Runtime trace data

Running [/apps/packages/compilers/nvidia-hpcsdk/Linux_x86_64/21.7/profilers/Nsight_Systems/target-linux-x64/reports/nvtxsum.py report_hypre_128_1.sqlite]...

Time(%)  Total Time (ns)  Instances           Average         Minimum         Maximum    StdDev  Style    Range
-------  ---------------  ---------  ----------------  --------------  --------------  --------  -------  -------------------
   99.0   12,894,222,629          1  12,894,222,629.0  12,894,222,629  12,894,222,629       0.0  PushPop  solve_data
    0.5       67,167,958          1      67,167,958.0      67,167,958      67,167,958       0.0  PushPop  MPI:MPI_Init_thread
    0.3       42,826,997          1      42,826,997.0      42,826,997      42,826,997       0.0  PushPop  MPI:MPI_Finalize
    0.1       18,562,542      6,994           2,654.1           2,201         139,871   3,424.3  PushPop  MPI:MPI_Allreduce
    0.0          203,346          6          33,891.0           2,317          56,167  18,707.4  PushPop  MPI:MPI_Allgatherv
    0.0           97,954         38           2,577.7              76          11,774   4,102.3  PushPop  MPI:MPI_Waitall
    0.0           79,065         12           6,588.8           2,672          14,205   3,832.6  PushPop  MPI:MPI_Scan
    0.0           76,648         17           4,508.7             998          33,032   7,663.4  PushPop  MPI:MPI_Allgather
    0.0           30,743         57             539.4              77           8,020   1,476.5  PushPop  MPI:MPI_Bcast
    0.0           10,013          1          10,013.0          10,013          10,013       0.0  PushPop  MPI:MPI_Barrier

Running [/apps/packages/compilers/nvidia-hpcsdk/Linux_x86_64/21.7/profilers/Nsight_Systems/target-linux-x64/reports/openmpevtsum.py report_hypre_128_1.sqlite]... SKIPPED: report_hypre_128_1.sqlite does not contain OpenMP event data.

Running [/apps/packages/compilers/nvidia-hpcsdk/Linux_x86_64/21.7/profilers/Nsight_Systems/target-linux-x64/reports/vulkanmarkerssum.py report_hypre_128_1.sqlite]... SKIPPED: report_hypre_128_1.sqlite does not contain Vulkan Debug Extension (Vulkan Debug Util) data

Running [/apps/packages/compilers/nvidia-hpcsdk/Linux_x86_64/21.7/profilers/Nsight_Systems/target-linux-x64/reports/pixsum.py report_hypre_128_1.sqlite]... SKIPPED: report_hypre_128_1.sqlite does not contain DX11/DX12 CPU debug markers

Running [/apps/packages/compilers/nvidia-hpcsdk/Linux_x86_64/21.7/profilers/Nsight_Systems/target-linux-x64/reports/khrdebugsum.py report_hypre_128_1.sqlite]... SKIPPED: report_hypre_128_1.sqlite does not contain KHR Extension (KHR_DEBUG) data