[petsc-dev] [GPU] Crash on ex19 with mpirun -np 2 (optimized build)

Projet_TRIOU triou at cea.fr
Wed Jan 15 09:17:07 CST 2014


I rebuilt the optimized PETSc library with several different sets of
options and ran:

mpirun -np 2 ./ex19 -cuda_show_devices -dm_mat_type aijcusp -dm_vec_type cusp \
    -ksp_type fgmres -ksp_view -log_summary -pc_type none \
    -snes_monitor_short -snes_rtol 1.e-5

Options used:
  --with-pthread=1 -O3                     -> crash
  --with-pthread=0 -O2                     -> crash
  --with-debugging=1 --with-pthread=1 -O2  -> OK

So --with-debugging=1 is what avoids the crash, which of course is
not good for performance...

Hope this helps,

Pierre

> Previously, I had noticed strange behaviour when running the GPU code 
> with the threadComm package. It might be worth trying to disable that 
> code in the build to see if the problem persists?
> -Paul
>
>
> On Tue, Jan 14, 2014 at 9:19 AM, Karl Rupp <rupp at mcs.anl.gov> wrote:
>
>     Hi Pierre,
>
>
>     >> I could reproduce the problem and also get some uninitialized
>     >> variable warnings in Valgrind. The debug version detects these
>     >> errors, hence you only see the errors in the debug build. For the
>     >> optimized build, chances are good that the computed values are
>     >> either already wrong or may become wrong in other environments.
>     >> I'll see what I can do when I'm at a GPU machine again tomorrow
>     >> (parallel GPU debugging via SSH is not great...)
>
>         Sorry, I meant:
>
>         Parallel calculations on CPU or GPU run well with the
>         non-optimized PETSc library.
>         Parallel calculations on GPU crash with the optimized PETSc
>         library (on CPU it is OK).
>
>
>     The fact that it happens to run in one mode out of {debug,
>     optimized} but not in the other is at most a lucky coincidence,
>     but it still means that this is a bug we need to solve :-)
>
>
>
>         I should add that "mpirun -np 1 ex19" runs well with all
>         builds, on both CPU and GPU.
>
>
>     I see valgrind warnings in the vector scatter routines, which is
>     likely the reason why it doesn't work with multiple MPI ranks.
>
>     Best regards,
>     Karli
>
>


-- 
*Trio_U support team*
Marthe ROUX (Saclay)
Pierre LEDAC (Grenoble)
-------------- next part --------------
Attached logs (non-text attachments scrubbed by the archive; type text/x-log):

  mpirun_np_2_gpu_linux_opt.log (14966 bytes)
    <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment.bin>
  mpirun_np_2_gpu_linux.log (14946 bytes)
    <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0001.bin>
  mpirun_np_2_cpu_linux_opt.log (15780 bytes)
    <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0002.bin>
  mpirun_np_2_cpu_linux.log (15761 bytes)
    <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0003.bin>
  mpirun_np_1_gpu_linux_opt.log (16021 bytes)
    <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0004.bin>
  mpirun_np_1_gpu_linux.log (16012 bytes)
    <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0005.bin>
  mpirun_np_1_cpu_linux_opt.log (15694 bytes)
    <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0006.bin>
  mpirun_np_1_cpu_linux.log (15675 bytes)
    <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0007.bin>


More information about the petsc-dev mailing list