<div dir="ltr">Marcos,<div> yes, refer to the example script Matt mentioned for Summit. Feel free to turn options on or off in the file. In my experience, gcc is the easiest of these compilers to use.</div><div> Also, I found <a href="https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus">https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus</a>, which describes a setup that might be similar to your machine (4 GPUs per node). The key point is: <span style="color:rgba(0,0,0,0.87);font-family:proxima-nova,sans-serif;font-size:19px">The Cray MPI on Polaris does not currently support binding MPI ranks to GPUs. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set </span>CUDA_VISIBLE_DEVICES <span style="color:rgba(0,0,0,0.87);font-family:proxima-nova,sans-serif;font-size:19px">for each MPI rank.</span></div><div> So you can try the helper script <span style="background-color:rgb(245,245,245);color:rgb(54,70,78);font-family:"Roboto Mono",SFMono-Regular,Consolas,Menlo,monospace;font-size:16.15px;font-variant-ligatures:none">set_affinity_gpu_polaris.sh </span>to set CUDA_VISIBLE_DEVICES manually for each rank. 
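<div> For illustration only, here is a minimal Python sketch of what such a per-rank helper does. It is not the ALCF script: the name bind_gpu.py, the round-robin mapping, and the 4-GPUs-per-node constant are assumptions. It reads SLURM_LOCALID (the task's rank within its node, which srun sets for each task), exports CUDA_VISIBLE_DEVICES with a single GPU index, and then execs the real program.</div>
<pre>
```python
#!/usr/bin/env python3
"""bind_gpu.py: hypothetical per-rank GPU binding wrapper (illustration only).

Assumed usage: srun -N 2 -n 16 bind_gpu.py PROGRAM [ARGS...]
"""
import os
import sys

NUM_GPUS_PER_NODE = 4  # assumption: 4 GPUs per node, as in this thread


def gpu_for_local_rank(local_rank: int, num_gpus: int = NUM_GPUS_PER_NODE) -> int:
    """Round-robin mapping: node-local rank i gets GPU (i mod num_gpus)."""
    return local_rank % num_gpus


def main(argv):
    # SLURM_LOCALID is the task's rank within its node; srun sets it per task.
    local_rank = int(os.environ.get("SLURM_LOCALID", "0"))
    gpu = gpu_for_local_rank(local_rank)
    if not argv[1:]:
        # Dry run: no program given, so only report the choice.
        print(f"local rank {local_rank} -> CUDA_VISIBLE_DEVICES={gpu}")
        return
    # Expose exactly one GPU to this rank, then replace this process with
    # the real program so it inherits the modified environment.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    os.execvp(argv[1], argv[1:])


if __name__ == "__main__":
    main(sys.argv)
```
</pre>
<div> With 8 ranks per node, this round-robin choice puts local ranks 0-7 on GPUs 0,1,2,3,0,1,2,3, i.e. 2 ranks per GPU; packing consecutive ranks onto the same GPU would balance equally well. Because each rank then sees exactly one GPU, CUDA reports that GPU as device 0 inside the process.</div>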
In other words, put the script on your PATH and then run your job with</div><div> <span style="color:rgb(0,0,0);font-family:"Courier New",monospace;font-size:16px">srun -N 2 -n 16 </span><span style="background-color:rgb(245,245,245);color:rgb(54,70,78);font-family:"Roboto Mono",SFMono-Regular,Consolas,Menlo,monospace;font-size:16.15px;font-variant-ligatures:none">set_affinity_gpu_polaris.sh </span><span style="color:rgb(0,0,0);font-family:"Courier New",monospace;font-size:16px">/home/mnv/Firemodels_fork/fds/</span><span style="color:rgb(0,0,0);font-family:"Courier New",monospace;font-size:16px">Build/ompi_gnu_linux/fds_ompi_</span><span style="color:rgb(0,0,0);font-family:"Courier New",monospace;font-size:16px">gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda</span><br></div><div><br></div><div> Then check again with <span style="color:rgb(0,0,0);font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16px">nvidia-smi to see whether GPU memory is evenly distributed across the GPUs.</span></div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">--Junchao Zhang</div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Aug 22, 2023 at 3:03 PM Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Tue, Aug 22, 2023 at 2:54 PM Vanella, Marcos (Fed) via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Junchao, on our system neither the slurm <span style="font-family:"Courier New",monospace">scontrol show job_id -dd</span> output nor
<span style="font-family:"Courier New",monospace">CUDA_VISIBLE_DEVICES</span> tells us which MPI process is associated with which GPU in the node. I can see this with nvidia-smi, but if you have any other suggestion
using slurm, I would like to hear it.</div>
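<div> One Slurm-only way to see the mapping, sketched here under assumptions: launch a tiny script with the same srun options as the job and have each task print its global task rank (SLURM_PROCID, which under srun typically matches the MPI rank), its node-local rank (SLURM_LOCALID), its host, and CUDA_VISIBLE_DEVICES. The script name report_binding.py is hypothetical. If CUDA_VISIBLE_DEVICES is unset, or lists all four GPUs for every task, then Slurm is not binding individual ranks to individual GPUs.</div>
<pre>
```python
#!/usr/bin/env python3
"""report_binding.py: hypothetical per-task binding report.

Assumed usage: srun -N 2 -n 16 python3 report_binding.py | sort
"""
import os
import socket


def binding_report(env) -> str:
    """Build one line from Slurm/CUDA environment variables (missing values become '?')."""
    rank = env.get("SLURM_PROCID", "?")    # global task rank
    local = env.get("SLURM_LOCALID", "?")  # task rank within its node
    gpus = env.get("CUDA_VISIBLE_DEVICES", "unset")
    host = socket.gethostname()
    return f"rank {rank} (local {local}) on {host}: CUDA_VISIBLE_DEVICES={gpus}"


if __name__ == "__main__":
    # Each srun task prints exactly one line describing its own environment.
    print(binding_report(os.environ), flush=True)
```
</pre>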
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I've been trying to compile the code and PETSc on Summit, but I have been running into all sorts of issues related to Spectrum MPI and the different compilers provided there (I tried gcc, nvhpc, pgi, and xl; some of them don't handle Fortran 2018, others give errors about repeated
MPI definitions, etc.). </div></div></div></blockquote><div><br></div><div>The PETSc configure examples are in the repository:</div><div><br></div><div> <a href="https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads" target="_blank">https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads</a></div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I also wanted to ask: do you know whether it is possible to compile PETSc with the xl/16.1.1-10 suite? </div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks!<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I configured the library with --with-cuda, and when compiling I get a compilation error from CUDAC:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o</span>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this
message. [-W#pragma-messages]</span></div>
<div><span style="font-family:"Courier New",monospace"> THRUST_COMPILER_DEPRECATION(Clang 7.0);</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'</span></div>
<div><span style="font-family:"Courier New",monospace"> THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'</span></div>
<div><span style="font-family:"Courier New",monospace"># define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'</span></div>
<div><span style="font-family:"Courier New",monospace"># define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace"><scratch space>:141:6: note: expanded from here</span></div>
<div><span style="font-family:"Courier New",monospace"> GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]</span></div>
<div><span style="font-family:"Courier New",monospace"> CUB_COMPILER_DEPRECATION(Clang 7.0);</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'</span></div>
<div><span style="font-family:"Courier New",monospace"> CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'</span></div>
<div><span style="font-family:"Courier New",monospace"># define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'</span></div>
<div><span style="font-family:"Courier New",monospace"># define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace"><scratch space>:198:6: note: expanded from here</span></div>
<div><span style="font-family:"Courier New",monospace"> GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this
message. [-W#pragma-messages]</span></div>
<div><span style="font-family:"Courier New",monospace"> THRUST_COMPILER_DEPRECATION(Clang 7.0);</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'</span></div>
<div><span style="font-family:"Courier New",monospace"> THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'</span></div>
<div><span style="font-family:"Courier New",monospace"># define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'</span></div>
<div><span style="font-family:"Courier New",monospace"># define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace"><scratch space>:149:6: note: expanded from here</span></div>
<div><span style="font-family:"Courier New",monospace"> GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]</span></div>
<div><span style="font-family:"Courier New",monospace"> CUB_COMPILER_DEPRECATION(Clang 7.0);</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'</span></div>
<div><span style="font-family:"Courier New",monospace"> CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'</span></div>
<div><span style="font-family:"Courier New",monospace"># define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'</span></div>
<div><span style="font-family:"Courier New",monospace"># define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace"><scratch space>:208:6: note: expanded from here</span></div>
<div><span style="font-family:"Courier New",monospace"> GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(a);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(a);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(len);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(t);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(s);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(flg);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(n);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(s);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(n);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(t);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(a);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(b);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(a);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(b);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(tmp);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(haystack);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(needle);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(tmp);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(t);
</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">fatal error: too many errors emitted, stopping now [-ferror-limit=]</span></div>
<div><span style="font-family:"Courier New",monospace">20 errors generated.</span></div>
<div><span style="font-family:"Courier New",monospace">Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp.</span></div>
<div><span style="font-family:"Courier New",monospace">gmake[3]: *** [gmakefile:209: arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1</span></div>
<div><span style="font-family:"Courier New",monospace">gmake[2]: *** [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2</span></div>
<div><span style="font-family:"Courier New",monospace">**************************ERROR*************************************</span></div>
<div><span style="font-family:"Courier New",monospace"> Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log</span></div>
<div><span style="font-family:"Courier New",monospace"> Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to <a href="mailto:petsc-maint@mcs.anl.gov" target="_blank">petsc-maint@mcs.anl.gov</a></span></div>
<div><span style="font-family:"Courier New",monospace">********************************************************************</span></div>
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_1321177721242751015m_3108646833317763144appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_1321177721242751015m_3108646833317763144divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> Monday, August 21, 2023 4:17 PM<br>
<b>To:</b> Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b> PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Guan, Collin X. (Fed) <<a href="mailto:collin.guan@nist.gov" target="_blank">collin.guan@nist.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">That is a good question. Looking at <a href="https://slurm.schedmd.com/gres.html#GPU_Management" target="_blank">https://slurm.schedmd.com/gres.html#GPU_Management</a>,
I was wondering whether you could share the output of your job, so that we can search it for CUDA_VISIBLE_DEVICES and see how the GPUs were allocated.
<div><br>
<div>
<div dir="ltr">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div>
<div dir="ltr">On Mon, Aug 21, 2023 at 2:38 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
OK, thanks Junchao. So is GPU 0 actually allocating memory for the meshes of all 8 MPI processes but only working on 2 of them? </div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
The script output says it has allocated 2.4 GB.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Best,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Marcos<br>
</div>
<div id="m_1321177721242751015m_3108646833317763144x_m_3869060330462788085appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_1321177721242751015m_3108646833317763144x_m_3869060330462788085divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> Monday, August 21, 2023 3:29 PM<br>
<b>To:</b> Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b> PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Guan, Collin X. (Fed) <<a href="mailto:collin.guan@nist.gov" target="_blank">collin.guan@nist.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">Hi, Marcos,
<div> If you look at the PIDs in the nvidia-smi output, you will find only 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node.</div>
<div> The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in the MPI implementation). So your job script and output are all good.<br>
<div><br>
</div>
</div>
<div> Thanks.</div>
</div>
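<div>A small sketch for doing that check without reading the table by eye: it tallies how many processes are listed on each GPU and how many distinct PIDs there are overall. It assumes nvidia-smi's CSV query mode with the gpu_bus_id and pid fields (see <code>nvidia-smi --help-query-compute-apps</code>); the function name is hypothetical:</div>

```bash
#!/bin/bash
# Tally GPU processes per device and distinct PIDs overall.
# Expects CSV lines "gpu_bus_id, pid" on stdin, e.g. from
#   nvidia-smi --query-compute-apps=gpu_bus_id,pid --format=csv,noheader
summarize_gpu_pids() {
  awk -F', *' '
    NF >= 2 {
      per_gpu[$1]++                  # one more process listed on this GPU
      if (!($2 in seen)) {           # first time this PID appears
        seen[$2] = 1
        unique++
      }
    }
    END {
      for (g in per_gpu) printf "%s %d\n", g, per_gpu[g]
      printf "unique_pids %d\n", unique
    }
  ' | LC_ALL=C sort                  # stable order: bus ids, then the total
}
```

<div>After sourcing the function on a compute node while the job runs, it could be used as <code>nvidia-smi --query-compute-apps=gpu_bus_id,pid --format=csv,noheader | summarize_gpu_pids</code>. Counting the table quoted below by hand gives 8 entries on GPU 0 (bus 00000004:04:00.0), 2 on each of the other three GPUs, and 8 distinct PIDs.</div>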
<br>
<div>
<div dir="ltr">On Mon, Aug 21, 2023 at 2:00 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Junchao, something I'm noticing when running with CUDA-enabled linear solvers (CG+HYPRE, CG+GAMG) is that, in multi-CPU/multi-GPU calculations, GPU 0 on each node seems to hold the sub-matrices of all the MPI processes on that node. This is the nvidia-smi output on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 V100 GPUs:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<pre style="font-family:'Courier New',monospace">Mon Aug 21 14:36:07 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           On  | 00000004:04:00.0 Off |                    0 |
| N/A   34C    P0              63W / 300W |   2488MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB           On  | 00000004:05:00.0 Off |                    0 |
| N/A   38C    P0              56W / 300W |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB           On  | 00000035:03:00.0 Off |                    0 |
| N/A   35C    P0              52W / 300W |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB           On  | 00000035:04:00.0 Off |                    0 |
| N/A   38C    P0              53W / 300W |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    0   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    0   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    1   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    1   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    2   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    2   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    3   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    3   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
+---------------------------------------------------------------------------------------+</pre>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
You can see that GPU 0 is connected to all 8 MPI processes, each taking about 300 MB on it, whereas GPUs 1, 2 and 3 are each working with only 2 MPI processes. I'm wondering whether this is expected or whether I need to change something in my submission script or runtime parameters.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPUs/node):</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<div><span style="font-family:"Courier New",monospace">#!/bin/bash</span></div>
<div><span style="font-family:"Courier New",monospace"># ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH -J test </span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --partition=gpu</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --ntasks=16</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --ntasks-per-node=8</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --cpus-per-task=1</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --nodes=2</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --time=01:00:00</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --gres=gpu:4</span></div>
<br>
<div><span style="font-family:"Courier New",monospace">export OMP_NUM_THREADS=1</span></div>
<div><span style="font-family:"Courier New",monospace"># modules</span></div>
<div><span style="font-family:"Courier New",monospace">module load cuda/11.7</span></div>
<div><span style="font-family:"Courier New",monospace">module load gcc/11.2.1/toolset</span></div>
<div><span style="font-family:"Courier New",monospace">module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">cd /home/mnv/Firemodels_fork/fds/Issues/PETSc</span></div>
<div><br>
</div>
<div></div>
<span style="font-family:"Courier New",monospace">srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda</span>
<div></div>
<span style="font-family:"Courier New",monospace"> </span><br>
</div>
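<div>One possible change, sketched here under assumptions rather than as a tested recipe: launch FDS through a small wrapper that restricts each MPI rank to a single GPU by setting CUDA_VISIBLE_DEVICES from SLURM_LOCALID (the node-local task index SLURM exports to each task). The script name bind_gpu.sh and the round-robin mapping (local rank i uses GPU i mod the number of GPUs) are illustrative choices, not part of FDS, PETSc, or SLURM:</div>

```bash
#!/bin/bash
# bind_gpu.sh (hypothetical name): run a command so that it sees only
# one GPU, chosen from this task's node-local rank (SLURM_LOCALID).

# Echo the GPU id for local rank $1 from a comma-separated id list $2,
# using a round-robin mapping: rank i -> id at index (i mod list length).
gpu_for_rank() {
  local local_rank=$1 gpu_list=$2
  local -a gpus
  IFS=',' read -r -a gpus <<< "$gpu_list"
  local ngpus=${#gpus[@]}
  if (( ngpus == 0 )); then
    echo "gpu_for_rank: empty GPU list" >&2
    return 1
  fi
  echo "${gpus[local_rank % ngpus]}"
}

# Restrict CUDA_VISIBLE_DEVICES to one GPU, then exec the real program
# so no extra shell process is left between srun and the program.
launch_bound() {
  local local_rank=${SLURM_LOCALID:?SLURM_LOCALID is not set; launch with srun}
  # Start from the GPUs SLURM made visible to the step; if none are
  # listed, assume devices are numbered 0..n-1 as shown by nvidia-smi -L.
  local gpu_list=${CUDA_VISIBLE_DEVICES:-}
  if [[ -z $gpu_list ]]; then
    local n
    n=$(nvidia-smi -L | wc -l)
    gpu_list=$(seq -s, 0 $((n - 1)))
  fi
  local gpu
  gpu=$(gpu_for_rank "$local_rank" "$gpu_list") || return 1
  export CUDA_VISIBLE_DEVICES=$gpu
  exec "$@"
}

# Launch only when a command was given, so the functions can be sourced.
if (( $# > 0 )); then
  launch_bound "$@"
fi
```

<div>With 8 ranks and 4 GPUs per node, local ranks 0 and 4 would use the first listed GPU, 1 and 5 the second, and so on. After <code>chmod +x bind_gpu.sh</code>, the launch line would become <code>srun -N 2 -n 16 ./bind_gpu.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda</code>, and nvidia-smi can then be checked again to see whether GPU memory is spread evenly.</div>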
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thank you for the advice,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Marcos<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_1321177721242751015m_3108646833317763144x_m_3869060330462788085x_m_-2525567993800845248appendonsend"></div>
<br>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div></blockquote></div><br clear="all"><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>
</blockquote></div>