[petsc-dev] [GPU] Crash on ex19 with mpirun -np 2 (optimized build)
Karl Rupp
rupp at mcs.anl.gov
Wed Jan 15 09:41:39 CST 2014
Hi Pierre,
> I tried to rebuild the optimized PETSc library by changing several
> options and ran:
>
> mpirun -np 2 ./ex19 -cuda_show_devices -dm_mat_type aijcusp
> -dm_vec_type cusp -ksp_type fgmres -ksp_view -log_summary
> -pc_type none -snes_monitor_short -snes_rtol 1.e-5
>
> Options used:
> --with-pthread=1 -O3 -> crash
> --with-pthread=0 -O2 -> crash
> --with-debugging=1 --with-pthread=1 -O2 -> OK
>
> So --with-debugging=1 is the key to avoiding the crash. Not good for
> performance, of course...
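(Just to make sure I understand your setup: I read those three builds as
configure lines roughly like

   ./configure --with-debugging=0 --with-pthread=1 COPTFLAGS=-O3
   ./configure --with-debugging=0 --with-pthread=0 COPTFLAGS=-O2
   ./configure --with-debugging=1 --with-pthread=1 COPTFLAGS=-O2

each with your usual CUDA/CUSP options on top; please correct me if that
guess is wrong.)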
Thanks for the input. I get crashes even when using --with-debugging=1,
and valgrind spits out a couple of errors as soon as
-dm_vec_type cusp
is provided. I'll keep digging; the error is somewhere in the VecScatter
routines when using CUSP...
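In case you want to double-check on your side, running the same case under
valgrind, e.g. something like

  mpirun -np 2 valgrind --track-origins=yes ./ex19 -dm_mat_type aijcusp \
    -dm_vec_type cusp -ksp_type fgmres -pc_type none -snes_monitor_short \
    -snes_rtol 1.e-5

should show the uninitialized-value warnings (the exact valgrind options
are only a suggestion).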
Best regards,
Karli
>
> If it helps,
>
> Pierre
>
>> Previously, I had noticed strange behaviour when running the GPU code
>> with the threadComm package. It might be worth trying to disable that
>> code in the build to see if the problem persists?
>> -Paul
>>
>>
>> On Tue, Jan 14, 2014 at 9:19 AM, Karl Rupp <rupp at mcs.anl.gov> wrote:
>>
>> Hi Pierre,
>>
>>
>> >> I could reproduce the problem and also get some uninitialized
>> >> variable warnings in Valgrind. The debug version detects these
>> >> errors, hence you only see the errors in the debug build. For the
>> >> optimized build, chances are good that the computed values are
>> >> either wrong or may become wrong in other environments. I'll see
>> >> what I can do when I'm again at a GPU machine tomorrow (parallel
>> >> GPU debugging via SSH is not great...)
>>
>> > Sorry, I mean:
>> >
>> > Parallel calculation on CPU or GPU runs well with the non-optimized
>> > PETSc library. Parallel calculation on GPU crashes with the
>> > optimized PETSc library (on the CPU it is OK).
>>
>>
>> The fact that it happens to run in one mode out of {debug,
>> optimized} but not in the other is at most a lucky coincidence,
>> but it still means that this is a bug we need to solve :-)
>>
>>
>>
>> > I could add that "mpirun -np 1 ex19" runs well for all builds,
>> > on both CPU and GPU.
>>
>>
>> I see valgrind warnings in the vector scatter routines, which is
>> likely the reason why it doesn't work with multiple MPI ranks.
>>
>> Best regards,
>> Karli
>>
>>
>
>
> --
> *Trio_U support team*
> Marthe ROUX (Saclay)
> Pierre LEDAC (Grenoble)