[petsc-dev] [GPU] Crash on ex19 with mpirun -np 2 (optimized build)
Projet_TRIOU
triou at cea.fr
Wed Jan 15 09:17:07 CST 2014
I rebuilt the optimized PETSc library with several different sets of
options and ran:
mpirun -np 2 ./ex19 -cuda_show_devices -dm_mat_type aijcusp -dm_vec_type cusp \
    -ksp_type fgmres -ksp_view -log_summary -pc_type none \
    -snes_monitor_short -snes_rtol 1.e-5
Options used:
--with-pthread=1 -O3 -> crash
--with-pthread=0 -O2 -> crash
--with-debugging=1 --with-pthread=1 -O2 -> OK
So --with-debugging=1 is the key to avoiding the crash. That is not
good for performance, of course...
Hope this helps,
Pierre
> Previously, I had noticed strange behaviour when running the GPU code
> with the threadComm package. It might be worth trying to disable that
> code in the build to see if the problem persists?
> -Paul
>
>
> On Tue, Jan 14, 2014 at 9:19 AM, Karl Rupp <rupp at mcs.anl.gov> wrote:
>
> Hi Pierre,
>
>
> I could reproduce the problem and also get some uninitialized
> variable warnings in Valgrind. The debug version detects these
> errors, hence you only see the errors in the debug build. For the
> optimized build, chances are good that the computed values are
> either wrong or may become wrong in other environments. I'll see
> what I can do when I'm back at the GPU machine tomorrow (parallel
> GPU debugging via SSH is not great...)
>
> Sorry, I mean:
>
> Parallel calculations on CPU or GPU run well with the non-optimized
> PETSc library.
> Parallel calculations on GPU crash with the optimized PETSc library
> (on CPU it is OK).
>
>
> The fact that it happens to run in one mode out of {debug,
> optimized} but not in the other is at most a lucky coincidence,
> but it still means that this is a bug we need to solve :-)
>
>
>
> I could add that "mpirun -np 1 ex19" runs well for all builds on
> CPU and GPU.
>
>
> I see valgrind warnings in the vector scatter routines, which is
> likely the reason why it doesn't work with multiple MPI ranks.
>
> Best regards,
> Karli
>
>
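For anyone who wants to look for the Valgrind warnings mentioned in the quoted reply, the two-rank GPU run could be wrapped roughly as follows. This is only a sketch: it assumes valgrind and mpirun are on the PATH and that ./ex19 is in the current directory, the variable names (NP, EX19_OPTS, VG_OPTS, CMD) are made up for illustration, and the memcheck flags are generic Valgrind options rather than anything prescribed in this thread. The script only prints the command; the last line has to be uncommented to actually run it.

```shell
#!/bin/sh
# Sketch: build and print a Valgrind-wrapped version of the 2-rank GPU run.
# Assumption: valgrind and mpirun are installed and ./ex19 has been built.
NP=2

# Solver options taken from the failing run reported above.
EX19_OPTS="-dm_mat_type aijcusp -dm_vec_type cusp -ksp_type fgmres -pc_type none -snes_monitor_short -snes_rtol 1.e-5"

# Generic memcheck flags; %p makes Valgrind write one log file per process ID,
# so the output of the two MPI ranks does not get interleaved.
VG_OPTS="--tool=memcheck -q --num-callers=20 --track-origins=yes --log-file=valgrind.log.%p"

CMD="mpirun -np $NP valgrind $VG_OPTS ./ex19 $EX19_OPTS"
echo "$CMD"

# Uncomment to actually run it (expect it to be much slower than a normal run):
# $CMD
```

Each rank then leaves its own valgrind.log.<pid> file, which makes it easier to see whether the uninitialized-value reports come from the vector scatter routines on one rank or on both.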
--
*Trio_U support team*
Marthe ROUX (Saclay)
Pierre LEDAC (Grenoble)
Attachments (text/x-log):
mpirun_np_2_gpu_linux_opt.log (14966 bytes)
  <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment.bin>
mpirun_np_2_gpu_linux.log (14946 bytes)
  <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0001.bin>
mpirun_np_2_cpu_linux_opt.log (15780 bytes)
  <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0002.bin>
mpirun_np_2_cpu_linux.log (15761 bytes)
  <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0003.bin>
mpirun_np_1_gpu_linux_opt.log (16021 bytes)
  <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0004.bin>
mpirun_np_1_gpu_linux.log (16012 bytes)
  <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0005.bin>
mpirun_np_1_cpu_linux_opt.log (15694 bytes)
  <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0006.bin>
mpirun_np_1_cpu_linux.log (15675 bytes)
  <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140115/df605b9a/attachment-0007.bin>