<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Thank you Barry and Stefano,<div class=""><br class=""/></div><div class=""><div class="">Below is the output from the example, which I ran with an added option since my MPI is not GPU-aware. I believe this may be responsible for the error. The reason I chose to compile with the option </div><div class=""><br class=""/></div><div class=""><blockquote type="cite" class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div class="" style="word-wrap: break-word; line-break: after-white-space;"><div class=""><div class=""><blockquote type="cite" class=""><div class=""><div class="" style="word-wrap: break-word; line-break: after-white-space;"><div class="">--download-hypre-configure-arguments=--enable-unified-memory \</div></div></div></blockquote></div></div></div></blockquote></div></blockquote><br class=""/></div><div class="">is that it appears in config/examples/arch-ci-linux-cuda-pkgs.py. There are several other examples, and I had no particular reason for choosing this one other than that it uses hypre. I didn’t think too much about it. After recompiling without this option, the example ran successfully. I will look into building Open MPI with CUDA support. 
</div><div class=""><br class=""/></div><div class="">Thanks!</div><div class=""><br class=""/></div><div class="">For the sake of reference:</div><div class=""><br class=""/></div><div class="">With the --download-hypre-configure-arguments=--enable-unified-memory option</div><div class=""><br class=""/></div><div class="">$ mpiexec -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -da_refine 3 -snes_monitor_short -ksp_norm_type unpreconditioned -pc_type hypre -use_gpu_aware_mpi 0 -info > log.ex19 2>&1</div><div class=""><br class=""/></div><div class=""><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscDetermineInitialFPTrap(): Floating point trapping is off by default 0</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType cuda supported, initializing</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType hip not supported</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType sycl not supported</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscInitialize_Common(): PETSc successfully started: number of processors = 1</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscGetHostName(): Rejecting domainname, likely is NIS node021.(none)</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); 
color: rgb(0, 0, 0);" class="">[0] PetscInitialize_Common(): Running on machine: node021</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscCommDuplicate(): Duplicating a communicator 140679929097504 30289408 max tags = 8388607</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscCommDuplicate(): Using internal PETSc communicator 140679929097504 30289408</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscCommDuplicate(): Using internal PETSc communicator 140679929097504 30289408</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscCommDuplicate(): Using internal PETSc communicator 140679929097504 30289408</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscCommDuplicate(): Duplicating a communicator 140679929096992 30157712 max tags = 8388607</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscCommDuplicate(): Using internal PETSc communicator 140679929096992 30157712</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscCommDuplicate(): Using internal PETSc communicator 140679929096992 30157712</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscCommDuplicate(): Using internal PETSc communicator 140679929096992 30157712</span><br style="caret-color: 
rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscCommDuplicate(): Using internal PETSc communicator 140679929096992 30157712</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] DMGetDMSNES(): Creating new DMSNES</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscGetHostName(): Rejecting domainname, likely is NIS node021.(none)</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] configure(): Configured device 0</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">lid velocity = 0.0016, prandtl # = 1., grashof # = 1.</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 2500 X 2500; storage space: 0 unneeded,48400 used</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 20</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 2500) < 0.6. 
Do not use</span><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""> </span><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">CompressedRow routines.</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] DMGetDMKSP(): Creating new DMKSP</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscCommDuplicate(): Using internal PETSc communicator 140679929096992 30157712</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscDeviceContextSetupGlobalContext_Private(): Initializing global PetscDeviceContext</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""> </span><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">0 SNES Function norm 0.0406612</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] ISColoringCreate(): Number of colors 20</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscCommDuplicate(): Using internal PETSc communicator 140679929096992 30157712</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PetscCommDuplicate(): Using internal PETSc communicator 140679929096992 30157712</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] MatFDColoringSetUp_SeqXAIJ(): ncolors 20, brows 66 and bcols 15 are used.</span><br 
style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] SNESComputeJacobian(): Rebuilding preconditioner</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] PCSetUp(): Setting up PC for first time</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] MatConvert(): Check superclass seqhypre seqaijcusparse -> 0</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] MatConvert(): Check superclass mpihypre seqaijcusparse -> 0</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] MatConvert(): Check superclass mpihypre seqaijcusparse -> 0</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] MatConvert(): Check specialized (1) MatConvert_seqaijcusparse_seqhypre_C (seqaijcusparse) -> 0</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] MatConvert(): Check specialized (1) MatConvert_seqaijcusparse_mpihypre_C (seqaijcusparse) -> 0</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">[0] MatConvert(): Check specialized (1) MatConvert_seqaijcusparse_hypre_C (seqaijcusparse) -> 1</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139</span><br style="caret-color: rgb(0, 0, 
0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">--------------------------------------------------------------------------</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">Primary job</span><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""> </span><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">terminated normally, but 1 process returned</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">a non-zero exit code. Per user-direction, the job has been aborted.</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">--------------------------------------------------------------------------</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">--------------------------------------------------------------------------</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">mpiexec detected that one or more processes exited with non-zero status, thus causing</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">the job to be terminated. 
The first process to do so was:</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""> </span><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">Process name: [[51372,1],0]</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""> </span><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">Exit code:</span><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""> </span><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">1</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">--------------------------------------------------------------------------</span><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""/><div style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);"><br class=""/></div></div></div><div class="">Without the <span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">--download-hypre-configure-arguments=--enable-unified-memory option</span></div><div class=""><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""><br class=""/></span></div><div class=""><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">$ </span><font color="#000000" class=""><span style="caret-color: rgb(0, 0, 0);" class="">mpiexec -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -da_refine 3 -snes_monitor_short -ksp_norm_type unpreconditioned -pc_type hypre -use_gpu_aware_mpi 0 -info > log.ex19 2>&1</span></font></div><div class=""><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""><br class=""/></span></div><div class=""><font color="#000000" class=""><span style="caret-color: 
rgb(0, 0, 0);" class="">[0] PetscDetermineInitialFPTrap(): Floating point trapping is off by default 0<br class=""/>[0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType cuda supported, initializing<br class=""/>[0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType hip not supported<br class=""/>[0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType sycl not supported<br class=""/>[0] PetscInitialize_Common(): PETSc successfully started: number of processors = 1<br class=""/>[0] PetscGetHostName(): Rejecting domainname, likely is NIS node021.(none)<br class=""/>[0] PetscInitialize_Common(): Running on machine: node021<br class=""/>[0] PetscCommDuplicate(): Duplicating a communicator 140322706697504 29662720 max tags = 8388607<br class=""/>[0] PetscCommDuplicate(): Using internal PETSc communicator 140322706697504 29662720<br class=""/>[0] PetscCommDuplicate(): Using internal PETSc communicator 140322706697504 29662720<br class=""/>[0] PetscCommDuplicate(): Using internal PETSc communicator 140322706697504 29662720<br class=""/>[0] PetscCommDuplicate(): Duplicating a communicator 140322706696992 29531024 max tags = 8388607<br class=""/>[0] PetscCommDuplicate(): Using internal PETSc communicator 140322706696992 29531024<br class=""/>[0] PetscCommDuplicate(): Using internal PETSc communicator 140322706696992 29531024<br class=""/>[0] PetscCommDuplicate(): Using internal PETSc communicator 140322706696992 29531024<br class=""/>[0] PetscCommDuplicate(): Using internal PETSc communicator 140322706696992 29531024<br class=""/>[0] DMGetDMSNES(): Creating new DMSNES<br class=""/>[0] PetscGetHostName(): Rejecting domainname, likely is NIS node021.(none)<br class=""/>[0] configure(): Configured device 0<br class=""/>lid velocity = 0.0016, prandtl # = 1., grashof # = 1.<br class=""/>[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 2500 X 2500; storage space: 0 unneeded,48400 used<br class=""/>[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs 
during MatSetValues() is 0<br class=""/>[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 20<br class=""/>[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 2500) < 0.6. Do not use CompressedRow routines.<br class=""/>[0] DMGetDMKSP(): Creating new DMKSP<br class=""/>[0] PetscCommDuplicate(): Using internal PETSc communicator 140322706696992 29531024<br class=""/>[0] PetscDeviceContextSetupGlobalContext_Private(): Initializing global PetscDeviceContext<br class=""/> 0 SNES Function norm 0.0406612 <br class=""/>[0] ISColoringCreate(): Number of colors 20<br class=""/>[0] PetscCommDuplicate(): Using internal PETSc communicator 140322706696992 29531024<br class=""/>[0] PetscCommDuplicate(): Using internal PETSc communicator 140322706696992 29531024<br class=""/>[0] MatFDColoringSetUp_SeqXAIJ(): ncolors 20, brows 66 and bcols 15 are used.<br class=""/>[0] SNESComputeJacobian(): Rebuilding preconditioner<br class=""/>[0] PCSetUp(): Setting up PC for first time<br class=""/>[0] MatConvert(): Check superclass seqhypre seqaijcusparse -> 0<br class=""/>[0] MatConvert(): Check superclass mpihypre seqaijcusparse -> 0<br class=""/>[0] MatConvert(): Check specialized (1) MatConvert_seqaijcusparse_seqhypre_C (seqaijcusparse) -> 0<br class=""/>[0] MatConvert(): Check specialized (1) MatConvert_seqaijcusparse_mpihypre_C (seqaijcusparse) -> 0<br class=""/>[0] MatConvert(): Check specialized (1) MatConvert_seqaijcusparse_hypre_C (seqaijcusparse) -> 1<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] 
PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since 
operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving 
PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
unchanged<br class=""/>[0] KSPConvergedDefault(): Linear solver has converged. Residual norm 3.001654795047e-07 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 4.066115181565e-02 at iteration 33<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] SNESSolve_NEWTONLS(): iter=0, linear solve iterations=33<br class=""/>[0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 2.890238131751e+01 near zero implies inconsistent rhs<br class=""/>[0] PetscSplitReductionGet(): Putting reduction data in an MPI_Comm 29662720<br class=""/>[0] SNESLineSearchApply_BT(): Initial fnorm 4.066115181565e-02 gnorm 3.338338626166e-06<br class=""/>[0] SNESSolve_NEWTONLS(): fnorm=4.0661151815649638e-02, gnorm=3.3383386261659113e-06, ynorm=5.4373378910396353e-01, lssucceed=0<br class=""/> 1 SNES Function norm 3.33834e-06 <br class=""/>[0] SNESComputeJacobian(): Rebuilding preconditioner<br class=""/>[0] PCSetUp(): Setting up PC with same nonzero pattern<br class=""/>[0] MatConvert(): Check superclass seqhypre seqaijcusparse -> 0<br class=""/>[0] MatConvert(): Check superclass mpihypre seqaijcusparse -> 0<br class=""/>[0] MatConvert(): Check specialized (1) MatConvert_seqaijcusparse_seqhypre_C (seqaijcusparse) -> 0<br class=""/>[0] MatConvert(): Check specialized (1) MatConvert_seqaijcusparse_mpihypre_C (seqaijcusparse) -> 0<br class=""/>[0] MatConvert(): Check specialized (1) MatConvert_seqaijcusparse_hypre_C (seqaijcusparse) -> 1<br class=""/>[0] PetscCommGetComm(): Reusing a communicator 29662720 68829840<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with 
identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br 
class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] KSPConvergedDefault(): Linear solver has converged. 
Residual norm 2.753325754967e-11 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 3.338338626166e-06 at iteration 29<br class=""/>[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged<br class=""/>[0] SNESSolve_NEWTONLS(): iter=1, linear solve iterations=29<br class=""/>[0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 3.172675080131e+01 near zero implies inconsistent rhs<br class=""/>[0] SNESLineSearchApply_BT(): Initial fnorm 3.338338626166e-06 gnorm 2.754150439906e-11<br class=""/>[0] SNESSolve_NEWTONLS(): fnorm=3.3383386261659113e-06, gnorm=2.7541504399056686e-11, ynorm=1.6805315020558734e-05, lssucceed=0<br class=""/> 2 SNES Function norm 2.754e-11 <br class=""/>[0] SNESConvergedDefault(): Converged due to function norm 2.754150439906e-11 < 4.066115181565e-10 (relative tolerance)<br class=""/>Number of SNES iterations = 2<br class=""/>[0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc communicator embedded in a user MPI_Comm 29531024<br class=""/>[0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 140322706696992 is being unlinked from inner PETSc comm 29531024<br class=""/>[0] PetscCommDestroy(): Deleting PETSc MPI_Comm 29531024<br class=""/>[0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an MPI_Comm 29531024<br class=""/>[0] PetscFinalize(): PetscFinalize() called<br class=""/>[0] Petsc_DelViewer(): Removing viewer data attribute in an MPI_Comm 29662720<br class=""/>[0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc communicator embedded in a user MPI_Comm 29662720<br class=""/>[0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 140322706697504 is being unlinked from inner PETSc comm 29662720<br class=""/>[0] PetscCommDestroy(): Deleting PETSc MPI_Comm 29662720<br class=""/>[0] Petsc_DelReduction(): Deleting reduction data in an MPI_Comm 29662720<br class=""/>[0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an MPI_Comm 
29662720<br class=""/></span></font><br class=""/></div><div class=""><br class=""/></div><div class=""><br class=""/><div><br class=""/><blockquote type="cite" class=""><div class="">On Jul 14, 2022, at 1:56 PM, Stefano Zampini <<a href="mailto:stefano.zampini@gmail.com" class="">stefano.zampini@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"/><div class=""><div dir="auto" class="">You don't need unified memory for boomeramg to work. </div><br class=""/><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jul 14, 2022, 18:55 Barry Smith <<a href="mailto:bsmith@petsc.dev" class="">bsmith@petsc.dev</a>> wrote:<br class=""/></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space" class=""><div class=""><br class=""/></div> So the PETSc tests all run, including the test that uses a GPU.<div class=""><br class=""/></div><div class=""> The hypre test is failing. It is impossible to tell from the output why. </div><div class=""><br class=""/></div><div class=""> You can run it manually, cd src/snes/tutorials</div><div class=""><br class=""/></div><div class="">make ex19</div><div class="">mpiexec -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -da_refine 3 -snes_monitor_short -ksp_norm_type unpreconditioned -pc_type hypre -info > somefile</div><div class=""><br class=""/></div><div class="">then take a look at the output in somefile and send it to us. 
</div><div class=""><br class=""/></div><div class=""> Barry</div><div class=""><br class=""/></div><div class=""><br class=""/><div class=""><br class=""/><blockquote type="cite" class=""><div class="">On Jul 14, 2022, at 12:32 PM, Juan Pablo de Lima Costa Salazar via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank" rel="noreferrer" class="">petsc-users@mcs.anl.gov</a>> wrote:</div><br class=""/><div class=""><div style="word-wrap:break-word;line-break:after-white-space" class="">Hello,<div class=""><br class=""/></div><div class="">I was hoping to get help regarding a runtime error I am encountering <span class="">on a cluster node with 4 Tesla K40m GPUs</span> after configuring PETSc with the following command:</div><div class=""><br class=""/></div><div class="">$./configure --force \</div><div class=""> --with-precision=double \</div><div class=""> --with-debugging=0 \</div><div class=""> --COPTFLAGS=-O3 \</div><div class=""> --CXXOPTFLAGS=-O3 \</div><div class=""> --FOPTFLAGS=-O3 \</div><div class=""> PETSC_ARCH=linux64GccDPInt32-spack \</div><div class=""> --download-fblaslapack \</div><div class=""> --download-openblas \</div><div class=""> --download-hypre \</div><div class=""> --download-hypre-configure-arguments=--enable-unified-memory \</div><div class=""> --with-mpi-dir=/opt/ohpc/pub/mpi/openmpi4-gnu9/4.0.4 \</div><div class=""> --with-cuda=1 \</div><div class=""> --download-suitesparse \</div><div class=""> --download-dir=downloads \</div><div class=""> --with-cudac=/opt/ohpc/admin/spack/0.15.0/opt/spack/linux-centos8-ivybridge/gcc-9.3.0/cuda-11.7.0-hel25vgwc7fixnvfl5ipvnh34fnskw3m/bin/nvcc \</div><div class=""> --with-packages-download-dir=downloads \</div><div class=""> --download-sowing=downloads/v1.1.26-p4.tar.gz \</div><div class=""> --with-cuda-arch=35</div><br class=""/><div class="">When I run</div><div class=""><br class=""/></div><div class="">$ make PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda 
PETSC_ARCH=linux64GccDPInt32-spack check</div><div class="">Running check examples to verify correct installation<br class=""/>Using PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda and PETSC_ARCH=linux64GccDPInt32-spack<br class=""/>C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process<br class=""/>C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes<br class=""/>3,5c3,15<br class=""/>< 1 SNES Function norm 4.12227e-06 <br class=""/>< 2 SNES Function norm 6.098e-11 <br class=""/>< Number of SNES iterations = 2<br class=""/>---<br class=""/>> CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139<br class=""/>> CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139<br class=""/>> --------------------------------------------------------------------------<br class=""/>> Primary job terminated normally, but 1 process returned<br class=""/>> a non-zero exit code. Per user-direction, the job has been aborted.<br class=""/>> --------------------------------------------------------------------------<br class=""/>> --------------------------------------------------------------------------<br class=""/>> mpiexec detected that one or more processes exited with non-zero status, thus causing<br class=""/>> the job to be terminated. 
The first process to do so was:<br class=""/>> <br class=""/>> Process name: [[52712,1],0]<br class=""/>> Exit code: 1<br class=""/>> --------------------------------------------------------------------------<br class=""/>/home/juan/OpenFOAM/juan-v2206/petsc-cuda/src/snes/tutorials<br class=""/>Possible problem with ex19 running with hypre, diffs above<br class=""/>=========================================<br class=""/>C/C++ example src/snes/tutorials/ex19 run successfully with cuda<br class=""/>C/C++ example src/snes/tutorials/ex19 run successfully with suitesparse<br class=""/>Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process<br class=""/>Completed test examples<br class=""/><br class=""/></div><div class="">I have compiled the code on the head node (without GPUs) and on the compute node where there are 4 GPUs. </div><div class=""><br class=""/></div><div class=""><font class=""><span class="">$nvidia-debugdump -l<br class=""/>Found 4 NVIDIA devices<br class=""/><span style="white-space:pre-wrap" class=""> </span>Device ID: 0<br class=""/><span style="white-space:pre-wrap" class=""> </span>Device name: Tesla K40m<br class=""/><span style="white-space:pre-wrap" class=""> </span>GPU internal ID: 0320717032250<br class=""/><br class=""/><span style="white-space:pre-wrap" class=""> </span>Device ID: 1<br class=""/><span style="white-space:pre-wrap" class=""> </span>Device name: Tesla K40m<br class=""/><span style="white-space:pre-wrap" class=""> </span>GPU internal ID: 0320717031968<br class=""/><br class=""/><span style="white-space:pre-wrap" class=""> </span>Device ID: 2<br class=""/><span style="white-space:pre-wrap" class=""> </span>Device name: Tesla K40m<br class=""/><span style="white-space:pre-wrap" class=""> </span>GPU internal ID: 0320717032246<br class=""/><br class=""/><span style="white-space:pre-wrap" class=""> </span>Device ID: 3<br class=""/><span style="white-space:pre-wrap" class=""> </span>Device name: Tesla K40m<br 
class=""/><span style="white-space:pre-wrap" class=""> </span>GPU internal ID: 0320717032235</span></font></div><div class=""><font class=""><span class=""><br class=""/></span></font></div><div class=""><font class=""><span class="">Attached are the log files from configure and make.</span></font></div><div class=""><font class=""><span class=""><br class=""/></span></font></div><div class=""><font class="">Any pointers are highly appreciated. My intention is to use PETSc as a linear solver for OpenFOAM, leveraging the availability of GPUs at the same time. Currently I can run PETSc without GPU support. </font></div><div class=""><font class=""><br class=""/></font></div><div class=""><font class="">Cheers,</font></div><div class=""><font class="">Juan S.</font></div><div class=""><font class=""><br class=""/></font></div><div class=""></div></div><div style="word-wrap:break-word;line-break:after-white-space" class=""><div class=""></div></div><div style="word-wrap:break-word;line-break:after-white-space" class=""><div class=""></div><div class=""><font class=""><br class=""/></font></div><div class=""><br class=""/></div><div class=""><br class=""/></div><div class=""><br class=""/></div></div><span id="m_-5781595029270269969cid:8DC2CCDC-0FE5-4765-B588-199A913130BF" class="">&lt;configure.log.tar.gz&gt;</span><span id="m_-5781595029270269969cid:C86A7C79-FFAD-45ED-A9DE-4F61EEC7B01F" class="">&lt;make.log.tar.gz&gt;</span></div></blockquote></div><br class=""/></div></div></blockquote></div>
</div></blockquote></div><br class=""/></div></body></html>