<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">Alfredo:</div><div class="gmail_quote">Sure, I got the mumps-5.0.2 tarball; I will test it and update the petsc-mumps interface. I'll let you know if the problem remains.</div><div class="gmail_quote"><br></div><div class="gmail_quote">Hong</div><div class="gmail_quote"><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear all,<br>
sorry for the late reply. The petsc installation went supersmooth and<br>
I could easily reproduce the issue. I dumped the matrix generated by<br>
petsc and read it back with a standalone mumps tester in order to<br>
confirm the bug. This bug had already been reported by another user;<br>
it was fixed a few months ago, and the fix is included in the 5.0.2<br>
release. Could you please check whether everything works well with mumps<br>
5.0.2?<br>
<br>
Kind regards,<br>
the MUMPS team<br>
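<br>
A minimal sketch of one way to confirm which MUMPS release a run actually uses, assuming the driver banner seen in the error log quoted further down ("Entering DMUMPS 5.0.1 driver ...") keeps the same wording in 5.0.2. The helper name mumps_version is hypothetical and introduced only for this illustration.<br>
<br>
```shell
#!/bin/sh
# Extract the MUMPS version from driver output read on stdin.
# mumps_version is a hypothetical helper; it only pattern-matches the
# "Entering <X>MUMPS <version> driver" banner and prints the first match.
mumps_version() {
  sed -n 's/.*Entering [A-Z]*MUMPS \([0-9][0-9.]*\) driver.*/\1/p' | head -n 1
}

# Demo on the banner line taken from the log in this thread.
echo " Entering DMUMPS 5.0.1 driver with JOB, N =   1       10000" | mumps_version
```
<br>
Piping a real run through it, for example "mpiexec -n 4 ./ex53 -n 10000 -mat_mumps_icntl_4 2 -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 | mumps_version", should print 5.0.2 after the upgrade. Here -mat_mumps_icntl_4 sets the MUMPS print level; whether the banner appears at that level is an assumption to check on your build.<br>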
<div class="m_8646920820409431598HOEnZb"><div class="m_8646920820409431598h5"><br>
<br>
<br>
<br>
On Thu, Oct 20, 2016 at 4:44 PM, Hong <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>> wrote:<br>
> Alfredo:<br>
> It would be much easier to install petsc with mumps and parmetis and<br>
> debug this case there. Here is what you can do on a linux machine<br>
> (see <a href="http://www.mcs.anl.gov/petsc/documentation/installation.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/d<wbr>ocumentation/installation.html</a><wbr>):<br>
><br>
> 1) get petsc-release:<br>
> git clone -b maint <a href="https://bitbucket.org/petsc/petsc" rel="noreferrer" target="_blank">https://bitbucket.org/petsc/pe<wbr>tsc</a> petsc<br>
><br>
> cd petsc<br>
> git pull<br>
> export PETSC_DIR=$PWD<br>
> export PETSC_ARCH=<><br>
><br>
> 2) configure petsc with additional options<br>
> '--download-metis --download-parmetis --download-mumps --download-scalapack<br>
> --download-ptscotch'<br>
> see <a href="http://www.mcs.anl.gov/petsc/documentation/installation.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/d<wbr>ocumentation/installation.html</a><br>
><br>
> 3) build petsc and test<br>
> make<br>
> make test<br>
><br>
> 4) test ex53.c:<br>
> cd $PETSC_DIR/src/ksp/ksp/example<wbr>s/tutorials<br>
> make ex53<br>
> mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2<br>
> -mat_mumps_icntl_29 2<br>
><br>
> 5) debugging ex53.c:<br>
> mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2<br>
> -mat_mumps_icntl_29 2 -start_in_debugger<br>
><br>
> Give it a try. Contact us if you cannot reproduce this case.<br>
><br>
> Hong<br>
><br>
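<br>
To narrow down which process counts and analysis modes fail, the run from step 4 can be repeated over a small grid. The following is a sketch under stated assumptions, not part of the original instructions: it assumes ex53 has been built in the current directory, and RUN is a hypothetical switch introduced here so that the commands are only printed by default.<br>
<br>
```shell
#!/bin/sh
# Sweep MPI process counts and MUMPS analysis modes for ex53.
# RUN is a hypothetical switch for this sketch: RUN=1 executes the
# commands, anything else just prints them (dry run).
RUN="${RUN:-0}"

sweep() {
  for np in 1 2 3 4; do
    # ICNTL(28): 1 = sequential analysis, 2 = parallel analysis
    for analysis in 1 2; do
      cmd="mpiexec -n $np ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 $analysis -mat_mumps_icntl_29 2"
      if [ "$RUN" = "1" ]; then
        # Report a failing combination but keep sweeping.
        $cmd || echo "FAILED: $cmd"
      else
        echo "$cmd"
      fi
    done
  done
}

sweep
```
<br>
Running it once as a dry run and then with RUN=1 gives a list of failing combinations that can be attached to a bug report.<br>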
>> Dear all,<br>
>> this may well be due to a bug in the parallel analysis. Do you think you<br>
>> can reproduce the problem in a standalone MUMPS program (i.e., without going<br>
>> through PETSc)? That would save a lot of time in tracking down the bug, since we do<br>
>> not have a PETSc install at hand. Otherwise we'll take a shot at<br>
>> installing petsc and reproducing the problem on our side.<br>
>><br>
>> Kind regards,<br>
>> the MUMPS team<br>
>><br>
>><br>
>><br>
>> On Wed, Oct 19, 2016 at 8:32 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
>>><br>
>>><br>
>>> Tim,<br>
>>><br>
>>> You can/should also run with valgrind to pinpoint where the memory<br>
>>> corruption first occurs.<br>
>>><br>
>>> Barry<br>
>>><br>
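<br>
A minimal sketch of what such a valgrind run could look like for the failing case, assuming valgrind is on the PATH and ex53 is in the current directory. The function name valgrind_cmd and the RUN switch are hypothetical and exist only for this illustration; the valgrind flags themselves are standard memcheck options.<br>
<br>
```shell
#!/bin/sh
# Build (and optionally run) a valgrind-wrapped version of the failing command.
# RUN is a hypothetical switch for this sketch: RUN=1 executes the command,
# anything else just prints it.
RUN="${RUN:-0}"

valgrind_cmd() {
  # $1 = number of MPI processes.
  # %p in --log-file is expanded by valgrind to the process id,
  # so every MPI rank writes its own log file.
  echo "mpiexec -n $1 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./ex53 -n 10000 -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2"
}

cmd="$(valgrind_cmd 4)"
if [ "$RUN" = "1" ]; then
  $cmd
else
  echo "$cmd"
fi
```
<br>
The first invalid read or write reported in the valgrind.log.* files is usually the place to start. The PETSc FAQ linked in the error log also suggests disabling PETSc's own malloc tracing under valgrind; the option name depends on the PETSc version, so it is left out here.<br>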
>>> > On Oct 19, 2016, at 11:08 AM, Hong <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>> wrote:<br>
>>> ><br>
>>> > Tim:<br>
>>> > With '-mat_mumps_icntl_28 1', i.e., sequential analysis, I can run ex53<br>
>>> > with np=3 or larger np successfully.<br>
>>> ><br>
>>> > With '-mat_mumps_icntl_28 2', i.e., parallel analysis, I can run up to<br>
>>> > np=3.<br>
>>> ><br>
>>> > For np=4:<br>
>>> > mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2<br>
>>> > -mat_mumps_icntl_29 2 -start_in_debugger<br>
>>> ><br>
>>> > code crashes inside mumps:<br>
>>> > Program received signal SIGSEGV, Segmentation fault.<br>
>>> > 0x00007f33d75857cb in<br>
>>> > dmumps_parallel_analysis::dmum<wbr>ps_build_scotch_graph (<br>
>>> > id=..., first=..., last=..., ipe=...,<br>
>>> > pe=<error reading variable: Cannot access memory at address 0x0>,<br>
>>> > work=...)<br>
>>> > at dana_aux_par.F:1450<br>
>>> > 1450 MAPTAB(J) = I<br>
>>> > (gdb) bt<br>
>>> > #0 0x00007f33d75857cb in<br>
>>> > dmumps_parallel_analysis::dmum<wbr>ps_build_scotch_graph (<br>
>>> > id=..., first=..., last=..., ipe=...,<br>
>>> > pe=<error reading variable: Cannot access memory at address 0x0>,<br>
>>> > work=...)<br>
>>> > at dana_aux_par.F:1450<br>
>>> > #1 0x00007f33d759207c in dmumps_parallel_analysis::dmum<wbr>ps_parmetis_ord<br>
>>> > (<br>
>>> > id=..., ord=..., work=...) at dana_aux_par.F:400<br>
>>> > #2 0x00007f33d7592d14 in dmumps_parallel_analysis::dmum<wbr>ps_do_par_ord<br>
>>> > (id=...,<br>
>>> > ord=..., work=...) at dana_aux_par.F:351<br>
>>> > #3 0x00007f33d7593aa9 in dmumps_parallel_analysis::dmum<wbr>ps_ana_f_par<br>
>>> > (id=...,<br>
>>> > work1=..., work2=..., nfsiz=...,<br>
>>> > fils=<error reading variable: Cannot access memory at address 0x0>,<br>
>>> > frere=<error reading variable: Cannot access memory at address<br>
>>> > 0x0>)<br>
>>> > at dana_aux_par.F:98<br>
>>> > #4 0x00007f33d74c622a in dmumps_ana_driver (id=...) at<br>
>>> > dana_driver.F:563<br>
>>> > #5 0x00007f33d747706b in dmumps (id=...) at dmumps_driver.F:1108<br>
>>> > #6 0x00007f33d74721b5 in dmumps_f77 (job=1, sym=0, par=1,<br>
>>> > comm_f77=-2080374779, n=10000, icntl=..., cntl=..., keep=...,<br>
>>> > dkeep=...,<br>
>>> > keep8=..., nz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=...,<br>
>>> > ahere=0,<br>
>>> > nz_loc=7500, irn_loc=..., irn_lochere=1, jcn_loc=...,<br>
>>> > jcn_lochere=1,<br>
>>> > a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0,<br>
>>> > eltvar=...,<br>
>>> > eltvarhere=0, a_elt=..., a_elthere=0, perm_in=..., perm_inhere=0,<br>
>>> > rhs=...,<br>
>>> > rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=...,<br>
>>> > infog=...,<br>
>>> > rinfog=..., deficiency=0, lwk_user=0, size_schur=0,<br>
>>> > listvar_schur=...,<br>
>>> > ar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0,<br>
>>> > colsca=...,<br>
>>> > colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1,<br>
>>> > lrhs=0, lredrhs=0,<br>
>>> > rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0,<br>
>>> > irhs_sparse=...,<br>
>>> > irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=...,<br>
>>> > isol_lochere=0,<br>
>>> > nz_rhs=0, lsol_loc=0, schur_mloc=0, schur_nloc=0, schur_lld=0,<br>
>>> > mblock=0, nblock=0,<br>
>>> > nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=...,<br>
>>> > write_problem=..., tmpdirlen=20,<br>
>>> > prefixlen=20, write_problemlen=20) at dmumps_f77.F:260<br>
>>> > #7 0x00007f33d74709b1 in dmumps_c (mumps_par=0x16126f0) at<br>
>>> > mumps_c.c:415<br>
>>> > #8 0x00007f33d68408ca in MatLUFactorSymbolic_AIJMUMPS (F=0x1610280,<br>
>>> > A=0x14bafc0,<br>
>>> > r=0x160cc30, c=0x1609ed0, info=0x15c6708)<br>
>>> > at /scratch/hzhang/petsc/src/mat/<wbr>impls/aij/mpi/mumps/mumps.c:14<wbr>87<br>
>>> ><br>
>>> > -mat_mumps_icntl_29 = 0 or 1 gives the same error.<br>
>>> > I'm cc'ing this email to the MUMPS developers, who may help to resolve this<br>
>>> > matter.<br>
>>> ><br>
>>> > Hong<br>
>>> ><br>
>>> ><br>
>>> > Hi all,<br>
>>> ><br>
>>> > I have some problems with PETSc using MUMPS and PARMETIS.<br>
>>> > In some cases it works fine, but in some others it doesn't, so I am<br>
>>> > trying to understand what is happening.<br>
>>> ><br>
>>> > I just picked the following example:<br>
>>> ><br>
>>> > <a href="http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex53.c.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/p<wbr>etsc-current/src/ksp/ksp/examp<wbr>les/tutorials/ex53.c.html</a><br>
>>> ><br>
>>> > Now, when I start it with fewer than 4 processes it works as expected:<br>
>>> > mpirun -n 3 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1<br>
>>> > -mat_mumps_icntl_29 2<br>
>>> ><br>
>>> > With 4 or more processes it crashes, but only when I am using<br>
>>> > Parmetis:<br>
>>> > mpirun -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1<br>
>>> > -mat_mumps_icntl_29 2<br>
>>> ><br>
>>> > Metis worked in every case I tried without any problems.<br>
>>> ><br>
>>> > I wonder if I am doing something wrong or if this is a general problem<br>
>>> > or even a bug? Is Parmetis supposed to work with that example with 4<br>
>>> > processes?<br>
>>> ><br>
>>> > Thanks a lot and kind regards.<br>
>>> ><br>
>>> > Volker<br>
>>> ><br>
>>> ><br>
>>> > Here is the error log of process 0:<br>
>>> ><br>
>>> > Entering DMUMPS 5.0.1 driver with JOB, N = 1 10000<br>
>>> > ==============================<wbr>===================<br>
>>> > MUMPS compiled with option -Dmetis<br>
>>> > MUMPS compiled with option -Dparmetis<br>
>>> > ==============================<wbr>===================<br>
>>> > L U Solver for unsymmetric matrices<br>
>>> > Type of parallelism: Working host<br>
>>> ><br>
>>> > ****** ANALYSIS STEP ********<br>
>>> ><br>
>>> > ** Max-trans not allowed because matrix is distributed<br>
>>> > Using ParMETIS for parallel ordering.<br>
>>> > [0]PETSC ERROR:<br>
>>> ><br>
>>> > ------------------------------<wbr>------------------------------<wbr>------------<br>
>>> > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,<br>
>>> > probably memory access out of range<br>
>>> > [0]PETSC ERROR: Try option -start_in_debugger or<br>
>>> > -on_error_attach_debugger<br>
>>> > [0]PETSC ERROR: or see<br>
>>> > <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/d<wbr>ocumentation/faq.html#valgrind</a><br>
>>> > [0]PETSC ERROR: or try <a href="http://valgrind.org" rel="noreferrer" target="_blank">http://valgrind.org</a> on GNU/linux and Apple Mac<br>
>>> > OS X to find memory corruption errors<br>
>>> > [0]PETSC ERROR: likely location of problem given in stack below<br>
>>> > [0]PETSC ERROR: --------------------- Stack Frames<br>
>>> > ------------------------------<wbr>------<br>
>>> > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not<br>
>>> > available,<br>
>>> > [0]PETSC ERROR: INSTEAD the line number of the start of the<br>
>>> > function<br>
>>> > [0]PETSC ERROR: is given.<br>
>>> > [0]PETSC ERROR: [0] MatLUFactorSymbolic_AIJMUMPS line 1395<br>
>>> ><br>
>>> > /fsgarwinhpc/133/petsc/sources<wbr>/petsc-3.7.4a/src/mat/impls/<wbr>aij/mpi/mumps/mumps.c<br>
>>> > [0]PETSC ERROR: [0] MatLUFactorSymbolic line 2927<br>
>>> > /fsgarwinhpc/133/petsc/sources<wbr>/petsc-3.7.4a/src/mat/interfac<wbr>e/matrix.c<br>
>>> > [0]PETSC ERROR: [0] PCSetUp_LU line 101<br>
>>> ><br>
>>> > /fsgarwinhpc/133/petsc/sources<wbr>/petsc-3.7.4a/src/ksp/pc/<wbr>impls/factor/lu/lu.c<br>
>>> > [0]PETSC ERROR: [0] PCSetUp line 930<br>
>>> ><br>
>>> > /fsgarwinhpc/133/petsc/sources<wbr>/petsc-3.7.4a/src/ksp/pc/<wbr>interface/precon.c<br>
>>> > [0]PETSC ERROR: [0] KSPSetUp line 305<br>
>>> ><br>
>>> > /fsgarwinhpc/133/petsc/sources<wbr>/petsc-3.7.4a/src/ksp/ksp/<wbr>interface/itfunc.c<br>
>>> > [0]PETSC ERROR: [0] KSPSolve line 563<br>
>>> ><br>
>>> > /fsgarwinhpc/133/petsc/sources<wbr>/petsc-3.7.4a/src/ksp/ksp/<wbr>interface/itfunc.c<br>
>>> > [0]PETSC ERROR: --------------------- Error Message<br>
>>> > ------------------------------<wbr>------------------------------<wbr>--<br>
>>> > [0]PETSC ERROR: Signal received<br>
>>> > [0]PETSC ERROR: See<br>
>>> > <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/d<wbr>ocumentation/faq.html</a> for trouble<br>
>>> > shooting.<br>
>>> > [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016<br>
>>> > [0]PETSC ERROR: ./ex53 on a linux-manni-mumps named manni by 133 Wed<br>
>>> > Oct 19 16:39:49 2016<br>
>>> > [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc<br>
>>> > --with-fc=mpiifort --with-shared-libraries=1<br>
>>> > --with-valgrind-dir=~/usr/valg<wbr>rind/<br>
>>> ><br>
>>> > --with-mpi-dir=/home/software/<wbr>intel/Intel-2016.4/compilers_a<wbr>nd_libraries_2016.4.258/linux/<wbr>mpi<br>
>>> > --download-scalapack --download-mumps --download-metis<br>
>>> > --download-metis-shared=0 --download-parmetis<br>
>>> > --download-parmetis-shared=0<br>
>>> > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file<br>
>>> > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0<br>
>>> ><br>
>>><br>
>><br>
>><br>
>><br>
>> --<br>
>> ------------------------------<wbr>-----------<br>
>> Alfredo Buttari, PhD<br>
>> CNRS-IRIT<br>
>> 2 rue Camichel, 31071 Toulouse, France<br>
>> <a href="http://buttari.perso.enseeiht.fr" rel="noreferrer" target="_blank">http://buttari.perso.enseeiht.<wbr>fr</a><br>
><br>
><br>
<br>
<br>
<br>
--<br>
------------------------------<wbr>-----------<br>
Alfredo Buttari, PhD<br>
CNRS-IRIT<br>
2 rue Camichel, 31071 Toulouse, France<br>
<a href="http://buttari.perso.enseeiht.fr" rel="noreferrer" target="_blank">http://buttari.perso.enseeiht.<wbr>fr</a><br>
</div></div></blockquote></div><br></div></div>