[petsc-users] superlu_dist+parmetis

Smith, Barry F. bsmith at mcs.anl.gov
Fri Feb 15 00:38:39 CST 2019



> On Feb 15, 2019, at 12:30 AM, Marius Buerkle <mbuerkle at web.de> wrote:
> 
> It works with every value of "-mat_superlu_dist_colperm" except parmetis. Or do you mean that I should change some option other than colperm?

  No, I didn't have any other suggestions. It could be that this option just more easily introduces zero pivots. You could also report it to Sherry Li, the main SuperLU_DIST developer, and see what she says.

   Barry
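
For anyone who wants to repeat the comparison described above (every colperm value works except parmetis), here is a minimal sketch that only builds the command lines for such a sweep. The launcher name `mpiexec`, the executable `./transomat`, and the rank count are placeholders, not taken from the original run. The list of colperm names is what I believe the PETSc SuperLU_DIST interface accepts; treat it as an assumption and confirm it against the `-help` output of your own build.

```python
import shlex

# Column-permutation names believed to be accepted by
# -mat_superlu_dist_colperm (an assumption; check -help for your build).
COLPERM_CHOICES = [
    "NATURAL",
    "MMD_AT_PLUS_A",
    "MMD_ATA",
    "METIS_AT_PLUS_A",
    "PARMETIS",
]


def build_commands(executable, nprocs, extra_args=()):
    """Return one shell command line per column-permutation choice.

    `executable`, `nprocs`, and `extra_args` are caller-supplied;
    nothing is executed here, the strings are only assembled.
    """
    commands = []
    for colperm in COLPERM_CHOICES:
        argv = [
            "mpiexec", "-n", str(nprocs), executable,
            "-mat_superlu_dist_colperm", colperm,
            *extra_args,
        ]
        # shlex.join quotes arguments so the line is safe to paste in a shell.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Hypothetical executable name and rank count, for illustration only.
    for line in build_commands("./transomat", 8):
        print(line)
```

Running each printed line in turn and noting which ones end in the floating point exception gives a small table that can be attached to a report to the SuperLU_DIST developers.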

> 
>> Sent: Friday, February 15, 2019 at 04:16
>> From: "Smith, Barry F." <bsmith at mcs.anl.gov>
>> To: "Marius Buerkle" <mbuerkle at web.de>
>> Cc: "PETSc users list" <petsc-users at mcs.anl.gov>
>> Subject: Re: [petsc-users] superlu_dist+parmetis
>> 
>> 
>>  Given the message, the likely cause is that SuperLU_DIST hit a zero pivot on process 6. Presumably the parmetis column permutation is what induced the zero pivot.
>> 
>>  Have you tried other superlu_dist options to find ones that do not cause this?
>> 
>>   Barry
>> 
>> 
>>> On Feb 14, 2019, at 6:44 PM, Marius Buerkle via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>> 
>>> Dear PETSc team,
>>> 
>>> I am trying to run superlu_dist+parmetis with "-mat_superlu_dist_colperm parmetis -mat_superlu_dist_parsymbfact", which gives me the following error:
>>> 
>>> [6]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
>>> [6]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [6]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [6]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [651]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>>> [653]PETSC ERROR: ------------------------------------------------------------------------
>>> [653]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
>>> [653]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [653]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [653]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [653]PETSC ERROR: likely location of problem given in stack below
>>> [653]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>>> [653]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>> [653]PETSC ERROR:       INSTEAD the line number of the start of the function
>>> [653]PETSC ERROR:       is given.
>>> [653]PETSC ERROR: [653] SuperLU_DIST:pzgssvx line 465 /home/cdfmat_marius/prog/petsc/git/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>>> [653]PETSC ERROR: [653] MatLUFactorNumeric_SuperLU_DIST line 314 /home/cdfmat_marius/prog/petsc/git/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>>> [653]PETSC ERROR: [653] MatLUFactorNumeric line 3124 /home/cdfmat_marius/prog/petsc/git/petsc/src/mat/interface/matrix.c
>>> [653]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> [653]PETSC ERROR: Signal received
>>> [653]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> [653]PETSC ERROR: Petsc Development GIT revision: v3.10.3-980-g66b342c  GIT Date: 2018-12-26 13:49:21 -0600
>>> [653]PETSC ERROR: /home/cdfmat_marius/prog/transomat_latest_openmpi4.0/transomat on a  named h023 by cdfmat_marius Wed Feb 13 23:58:21 2019
>>> 
>>> Any idea?
>>> 
>>> best,
>>> marius
>> 
>> 
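
The diagnosis in this thread rests on reading the error output by rank: rank 6 reports signal 8 (FPE), while rank 653 reports signal 15 (Terminate), which its own message attributes to another process or the batch system ending the job. With hundreds of ranks that is tedious to do by eye, so here is a small sketch that groups ranks by the signal they reported. The function name and the regular expression are mine and are based only on the line format visible in the output quoted above; other PETSc versions may format these lines differently.

```python
import re

# Matches lines such as
#   [6]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,...
# The pattern is an assumption derived from the log quoted in this thread.
SIGNAL_LINE = re.compile(r"^\[(\d+)\]PETSC ERROR: Caught signal number (\d+)\b")


def ranks_by_signal(log_text):
    """Map each signal number to the list of MPI ranks that reported it."""
    found = {}
    for line in log_text.splitlines():
        # Drop email quote markers ('>') and leading blanks, if present.
        line = line.lstrip("> ")
        match = SIGNAL_LINE.match(line)
        if match:
            rank = int(match.group(1))
            signum = int(match.group(2))
            found.setdefault(signum, []).append(rank)
    return found


if __name__ == "__main__":
    sample = "\n".join([
        "[6]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero",
        "[653]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end",
    ])
    for signum, ranks in sorted(ranks_by_signal(sample).items()):
        print(f"signal {signum}: ranks {ranks}")
```

Ranks listed under signal 8 are the ones worth attaching a debugger to; ranks that only report signal 15 were most likely just shut down after the first failure.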


