[petsc-dev] [GPU] Crash on ex19 with mpirun -np 2 (optimized build)

Karl Rupp rupp at mcs.anl.gov
Wed Jan 15 09:41:39 CST 2014


Hi Pierre,

> I tried to rebuild the optimized PETSc library by changing several
> options and ran:
>
> mpirun -np 2 ./ex19 -cuda_show_devices -dm_mat_type aijcusp \
>     -dm_vec_type cusp -ksp_type fgmres -ksp_view -log_summary \
>     -pc_type none -snes_monitor_short -snes_rtol 1.e-5
>
> Options used:
> --with-pthread=1 -O3  -> crash
> --with-pthread=0 -O2  -> crash
> --with-debugging=1 --with-pthread=1 -O2 -> OK
>
> So --with-debugging=1 is the key to avoiding the crash. Not good for
> performance, of course...
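(For concreteness: I assume those three builds correspond to configure
lines roughly like

    ./configure --with-debugging=0 --with-pthread=1 COPTFLAGS="-O3" \
        --with-cuda=1 --with-cusp=1 --with-thrust=1

with --with-debugging and the optimization level varied as listed. The
CUDA/CUSP/Thrust flags are my guess, based on the aijcusp/cusp run-time
options, so please correct me if your build differs.)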

thanks for the input. I get crashes even when using --with-debugging=1, 
and valgrind spits out a couple of errors as soon as
   -dm_vec_type cusp
is provided. I'll keep digging; the error is somewhere in the VecScatter
routines when using CUSP...
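For reference, the valgrind run is essentially the following (assuming
valgrind is available on each node; --track-origins=yes is optional, but
it makes the uninitialized-value reports much easier to read):

    mpirun -np 2 valgrind --track-origins=yes ./ex19 \
        -dm_mat_type aijcusp -dm_vec_type cusp \
        -ksp_type fgmres -pc_type none -snes_monitor_short -snes_rtol 1.e-5

To take SNES and the DM out of the picture, a minimal scatter-only
reproducer along these lines might be useful (an untested sketch; it
only uses the standard Vec/VecScatter API and picks up the GPU type from
-vec_type cusp on the command line):

    #include <petscvec.h>

    int main(int argc, char **argv)
    {
      Vec            x, y;
      VecScatter     scatter;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

      /* distributed vector; -vec_type cusp selects the CUSP back-end */
      ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
      ierr = VecSetSizes(x, PETSC_DECIDE, 64);CHKERRQ(ierr);
      ierr = VecSetFromOptions(x);CHKERRQ(ierr);
      ierr = VecSet(x, 1.0);CHKERRQ(ierr);

      /* gather the whole vector onto every rank; with -np 2 this goes
         through the MPI VecScatter path that valgrind complains about */
      ierr = VecScatterCreateToAll(x, &scatter, &y);CHKERRQ(ierr);
      ierr = VecScatterBegin(scatter, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = VecScatterEnd(scatter, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

      ierr = VecScatterDestroy(&scatter);CHKERRQ(ierr);
      ierr = VecDestroy(&x);CHKERRQ(ierr);
      ierr = VecDestroy(&y);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

If valgrind flags uninitialized reads here as well, the bug is in the
CUSP scatter path itself and not in anything ex19-specific.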

Best regards,
Karli


>
> If it helps,
>
> Pierre
>
>> Previously, I had noticed strange behaviour when running the GPU code
>> with the threadComm package. It might be worth trying to disable that
>> code in the build to see if the problem persists?
>> -Paul
>>
>>
>> On Tue, Jan 14, 2014 at 9:19 AM, Karl Rupp <rupp at mcs.anl.gov> wrote:
>>
>>     Hi Pierre,
>>
>>
>>             I could reproduce the problem and also get some
>>             uninitialized variable warnings in Valgrind. The debug
>>             version detects these errors, hence you only see the
>>             errors in the debug build. For the optimized build,
>>             chances are good that the computed values are either
>>             wrong or may become wrong in other environments. I'll
>>             see what I can do when I'm again at a GPU machine
>>             tomorrow (parallel GPU debugging via SSH is not
>>             great...)
>>
>>         Sorry, I mean:
>>
>>         Parallel calculation on the CPU or the GPU runs well with the
>>         non-optimized PETSc library; parallel calculation on the GPU
>>         crashes with the optimized library (on the CPU it is OK).
>>
>>
>>     The fact that it happens to run in one mode out of {debug,
>>     optimized} but not in the other is at most a lucky coincidence,
>>     but it still means that this is a bug we need to solve :-)
>>
>>
>>
>>         I should add that "mpirun -np 1 ./ex19" runs well for all
>>         builds, on both CPU and GPU.
>>
>>
>>     I see valgrind warnings in the vector scatter routines, which is
>>     likely the reason why it doesn't work with multiple MPI ranks.
>>
>>     Best regards,
>>     Karli
>>
>>
>
>
> --
> *Trio_U support team*
> Marthe ROUX (Saclay)
> Pierre LEDAC (Grenoble)



