<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">Tim:</div><div class="gmail_quote">With '-mat_mumps_icntl_28 1', i.e., sequential analysis, I can run ex56 with np=3 or larger np successfully.</div><div class="gmail_quote"><br></div><div class="gmail_quote">With '-mat_mumps_icntl_28 2', i.e., parallel analysis, I can run up to np=3. </div><div class="gmail_quote"><br></div><div class="gmail_quote">For np=4:</div><div class="gmail_quote">mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 -start_in_debugger </div><div class="gmail_quote"><br></div><div class="gmail_quote">code crashes inside mumps:</div><div class="gmail_quote"><div class="gmail_quote">Program received signal SIGSEGV, Segmentation fault.</div><div class="gmail_quote">0x00007f33d75857cb in dmumps_parallel_analysis::dmumps_build_scotch_graph (</div><div class="gmail_quote">    id=..., first=..., last=..., ipe=..., </div><div class="gmail_quote">    pe=<error reading variable: Cannot access memory at address 0x0>, work=...)</div><div class="gmail_quote">    at dana_aux_par.F:1450</div><div class="gmail_quote">1450                MAPTAB(J) = I</div><div class="gmail_quote">(gdb) bt</div><div class="gmail_quote">#0  0x00007f33d75857cb in dmumps_parallel_analysis::dmumps_build_scotch_graph (</div><div class="gmail_quote">    id=..., first=..., last=..., ipe=..., </div><div class="gmail_quote">    pe=<error reading variable: Cannot access memory at address 0x0>, work=...)</div><div class="gmail_quote">    at dana_aux_par.F:1450</div><div class="gmail_quote">#1  0x00007f33d759207c in dmumps_parallel_analysis::dmumps_parmetis_ord (</div><div class="gmail_quote">    id=..., ord=..., work=...) at dana_aux_par.F:400</div><div class="gmail_quote">#2  0x00007f33d7592d14 in dmumps_parallel_analysis::dmumps_do_par_ord (id=..., </div><div class="gmail_quote">    ord=..., work=...) at dana_aux_par.F:351</div><div class="gmail_quote">#3  0x00007f33d7593aa9 in dmumps_parallel_analysis::dmumps_ana_f_par (id=..., </div><div class="gmail_quote">    work1=..., work2=..., nfsiz=..., </div><div class="gmail_quote">    fils=<error reading variable: Cannot access memory at address 0x0>, </div><div class="gmail_quote">    frere=<error reading variable: Cannot access memory at address 0x0>)</div><div class="gmail_quote">    at dana_aux_par.F:98</div><div class="gmail_quote">#4  0x00007f33d74c622a in dmumps_ana_driver (id=...) at dana_driver.F:563</div><div class="gmail_quote">#5  0x00007f33d747706b in dmumps (id=...) 
at dmumps_driver.F:1108</div><div class="gmail_quote">#6  0x00007f33d74721b5 in dmumps_f77 (job=1, sym=0, par=1, </div><div class="gmail_quote">    comm_f77=-2080374779, n=10000, icntl=..., cntl=..., keep=..., dkeep=..., </div><div class="gmail_quote">    keep8=..., nz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0, </div><div class="gmail_quote">    nz_loc=7500, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, </div><div class="gmail_quote">    a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., </div><div class="gmail_quote">    eltvarhere=0, a_elt=..., a_elthere=0, perm_in=..., perm_inhere=0, rhs=..., </div><div class="gmail_quote">    rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., </div><div class="gmail_quote">    rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=..., </div><div class="gmail_quote">---Type <return> to continue, or q <return> to quit---</div><div class="gmail_quote">    ar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, colsca=..., </div><div class="gmail_quote">    colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, </div><div class="gmail_quote">    rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, irhs_sparse=..., </div><div class="gmail_quote">    irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., isol_lochere=0, </div><div class="gmail_quote">    nz_rhs=0, lsol_loc=0, schur_mloc=0, schur_nloc=0, schur_lld=0, mblock=0, nblock=0, </div><div class="gmail_quote">    nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., tmpdirlen=20, </div><div class="gmail_quote">    prefixlen=20, write_problemlen=20) at dmumps_f77.F:260</div><div class="gmail_quote">#7  0x00007f33d74709b1 in dmumps_c (mumps_par=0x16126f0) at mumps_c.c:415</div><div class="gmail_quote">#8  0x00007f33d68408ca in MatLUFactorSymbolic_AIJMUMPS (F=0x1610280, A=0x14bafc0, </div><div class="gmail_quote">    r=0x160cc30, c=0x1609ed0, info=0x15c6708)</div><div class="gmail_quote">    at /scratch/hzhang/petsc/src/mat/impls/aij/mpi/mumps/mumps.c:1487</div><div class="gmail_quote"><br></div><div class="gmail_quote">-mat_mumps_icntl_29 = 0 or 1 give same error.<br></div><div class="gmail_quote">I'm cc'ing this email to mumps developer, who may help to resolve this matter.</div><div class="gmail_quote"><br></div><div class="gmail_quote">Hong</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br>

I have some problems with PETSc using MUMPS and ParMETIS.
In some cases it works fine, but in others it doesn't, so I am
trying to understand what is happening.

I just picked the following example:
http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex53.c.html

Now, when I start it with fewer than 4 processes it works as expected:
mpirun -n 3 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1
-mat_mumps_icntl_29 2

With 4 or more processes, however, it crashes, but only when I am using ParMETIS:
mpirun -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1
-mat_mumps_icntl_29 2

METIS worked in every case I tried without any problems.

I wonder whether I am doing something wrong, or whether this is a general
problem or even a bug. Is ParMETIS supposed to work with that example on
4 processes?
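
For reference, below is a minimal sketch of how the same MUMPS controls can be
set from code rather than through the options database. It is not taken from
ex53 itself, and it assumes the PETSc 3.7-era names
PCFactorSetMatSolverPackage()/PCFactorSetUpMatSolverPackage() (renamed to
...MatSolverType() in later releases):

#include <petscksp.h>

/* Minimal sketch (not part of ex53): solve A x = b with MUMPS LU and set
   ICNTL(28)/ICNTL(29) programmatically instead of via -mat_mumps_icntl_XX. */
PetscErrorCode SolveWithMumps(Mat A,Vec b,Vec x)
{
  KSP            ksp;
  PC             pc;
  Mat            F;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);         /* direct solve only */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);CHKERRQ(ierr);
  ierr = PCFactorSetUpMatSolverPackage(pc);CHKERRQ(ierr);  /* create the MUMPS factor matrix */
  ierr = PCFactorGetMatrix(pc,&F);CHKERRQ(ierr);
  ierr = MatMumpsSetIcntl(F,28,1);CHKERRQ(ierr);  /* ICNTL(28): 1 = sequential, 2 = parallel analysis */
  ierr = MatMumpsSetIcntl(F,29,2);CHKERRQ(ierr);  /* ICNTL(29): 2 = ParMETIS for the parallel ordering */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Changing the ICNTL(28) argument to 2 requests the parallel analysis phase
discussed above.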

Thanks a lot and kind regards.

Volker


Here is the error log of process 0:

Entering DMUMPS 5.0.1 driver with JOB, N =   1       10000
 =================================================
 MUMPS compiled with option -Dmetis
 MUMPS compiled with option -Dparmetis
 =================================================
L U Solver for unsymmetric matrices
Type of parallelism: Working host

 ****** ANALYSIS STEP ********

 ** Max-trans not allowed because matrix is distributed
Using ParMETIS for parallel ordering.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR:       INSTEAD the line number of the start of the function
[0]PETSC ERROR:       is given.
[0]PETSC ERROR: [0] MatLUFactorSymbolic_AIJMUMPS line 1395 /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/impls/aij/mpi/mumps/mumps.c
[0]PETSC ERROR: [0] MatLUFactorSymbolic line 2927 /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/interface/matrix.c
[0]PETSC ERROR: [0] PCSetUp_LU line 101 /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/pc/impls/factor/lu/lu.c
[0]PETSC ERROR: [0] PCSetUp line 930 /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: [0] KSPSetUp line 305 /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: [0] KSPSolve line 563 /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016
[0]PETSC ERROR: ./ex53 on a linux-manni-mumps named manni by 133 Wed Oct 19 16:39:49 2016
[0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-shared-libraries=1 --with-valgrind-dir=~/usr/valgrind/ --with-mpi-dir=/home/software/intel/Intel-2016.4/compilers_and_libraries_2016.4.258/linux/mpi --download-scalapack --download-mumps --download-metis --download-metis-shared=0 --download-parmetis --download-parmetis-shared=0
[0]PETSC ERROR: #1 User provided function() line 0 in  unknown file
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0