[petsc-users] Irritating behavior of MUMPS with PETSc

Gunnar Jansen jansen.gunnar at gmail.com
Thu Jun 26 07:37:17 CDT 2014


OK, I tried superlu_dist as well. Unfortunately, the program seems to hang at
more or less the same position.

Sadly, I cannot check another version of OpenMPI, since only this version is
currently installed on the cluster (which it needs to be because of CUDA
support for the other programmers).

The -info option told me that the processes were successfully started on
both nodes. In the GMRES case this also leads to a clean run of the
program.

The -log_trace output tells me that the problem occurs within the numeric
factorization of the matrix: every rank enters MatLUFactorNum, but none of
them ever leaves it.

    [5] 0.00311184 Event begin: MatLUFactorSym
    [1] 0.0049789 Event begin: MatLUFactorSym
    [3] 0.00316596 Event begin: MatLUFactorSym
    [4] 0.00345397 Event begin: MatLUFactorSym
    [0] 0.00546789 Event end: MatLUFactorSym
    [0] 0.0054841 Event begin: MatLUFactorNum
    [2] 0.00545907 Event end: MatLUFactorSym
    [2] 0.005476 Event begin: MatLUFactorNum
    [1] 0.00542402 Event end: MatLUFactorSym
    [1] 0.00544 Event begin: MatLUFactorNum
    [4] 0.00369906 Event end: MatLUFactorSym
    [4] 0.00372505 Event begin: MatLUFactorNum
    [3] 0.00371909 Event end: MatLUFactorSym
    [3] 0.00374603 Event begin: MatLUFactorNum
    [5] 0.00367594 Event end: MatLUFactorSym
    [5] 0.00370193 Event begin: MatLUFactorNum
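
If it helps, I can also attach a debugger to one of the hung ranks to see
where it blocks, along these lines (a rough sketch; this assumes gdb is
available on the compute nodes, and <pid> stands for the process id of a
hung rank):

    ssh node01            # log in to a compute node (hostname is a placeholder)
    pgrep -f ./ex2        # list the pids of the ranks running there
    gdb -p <pid>          # attach to one of the hung ranks
    (gdb) bt              # the backtrace shows where the rank is blocked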

Any hints?




2014-06-25 17:17 GMT+02:00 Satish Balay <balay at mcs.anl.gov>:

> Suggest running the non-mumps case with -log_summary [to confirm that
> '-np 6' is actually used in both cases]
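>
> For example [a sketch; the same ex2 run with the default Krylov method
> instead of MUMPS]:
>
>     mpirun -np 6 ./ex2 -ksp_type gmres -log_summary -m 100 -n 100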
>
> Secondly - you can try a 'release' version of openmpi or mpich and see
> if that works. [I don't see a mention of openmpi-1.9a on the website]
>
> Also, you can try -log_trace to see where it's hanging [or figure out how
> to run the code in a debugger on this cluster]. But that might not help in
> figuring out the solution to the hang.
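>
> PETSc can also launch the debugger for you, e.g. [a sketch; 'noxterm'
> attaches gdb on each rank without opening X windows]:
>
>     mpirun -np 6 ./ex2 -pc_type lu -pc_factor_mat_solver_package mumps \
>       -ksp_type preonly -m 100 -n 100 -start_in_debugger noxterm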
>
> Satish
>
> On Wed, 25 Jun 2014, Matthew Knepley wrote:
>
> > On Wed, Jun 25, 2014 at 7:09 AM, Gunnar Jansen <jansen.gunnar at gmail.com>
> > wrote:
> >
> > > You are right about the queuing system. The job is submitted with a PBS
> > > script specifying the number of nodes/processors. On the cluster, PETSc
> > > is configured in a module environment which sets the appropriate flags
> > > for compilers/rules etc.
> > >
> > > The exact same job script on the exact same nodes with a standard Krylov
> > > method does not give any trouble but executes nicely on all processors
> > > (and also gives the correct result).
> > >
> > > Therefore my suspicion is a missing flag in the MUMPS interface. Is this
> > > maybe rather a topic for the mumps-dev team?
> > >
> >
> > I doubt this. The whole point of MPI is to shield code from these
> > details.
> >
> > Can you first try this system with SuperLU_dist?
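> >
> > For example, reusing the options from this thread [a sketch; it assumes
> > PETSc was configured with --download-superlu_dist]:
> >
> >     mpirun ./ex2 -on_error_abort -pc_type lu \
> >       -pc_factor_mat_solver_package superlu_dist \
> >       -ksp_type preonly -log_summary -options_left -m 100 -n 100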
> >
> >   Thanks,
> >
> >      Matt
> >
> >
> > > Best, Gunnar
> > >
> > >
> > >
> > > 2014-06-25 15:52 GMT+02:00 Dave May <dave.mayhem23 at gmail.com>:
> > >
> > >> This sounds weird.
> > >>
> > >> The launch line you provided doesn't include any information regarding
> > >> how many processors to use (nodes / cores per node). I presume you are
> > >> using a queuing system. My guess is that there could be an issue with
> > >> either (i) your job script, (ii) the configuration of the job scheduler
> > >> on the machine, or (iii) the MPI installation on the machine.
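> > >>
> > >> For reference, a minimal PBS script for this case could look like the
> > >> following [a sketch only; the resource-request syntax may differ on
> > >> your cluster]:
> > >>
> > >>     #!/bin/bash
> > >>     #PBS -l nodes=2:ppn=3
> > >>     cd $PBS_O_WORKDIR
> > >>     mpirun -np 6 ./ex2 -pc_type lu \
> > >>       -pc_factor_mat_solver_package mumps -ksp_type preonly -m 100 -n 100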
> > >>
> > >> Have you been able to successfully run other PETSc (or any MPI) codes
> > >> with the same launch options (2 nodes, 3 procs per node)?
> > >>
> > >> Cheers.
> > >>   Dave
> > >>
> > >>
> > >>
> > >>
> > >> On 25 June 2014 15:44, Gunnar Jansen <jansen.gunnar at gmail.com> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> I try to solve a problem in parallel with MUMPS as the direct solver.
> > >>> As long as I run the program on only 1 node with 6 processors,
> > >>> everything works fine! But using 2 nodes with 3 processors each gets
> > >>> MUMPS stuck in the factorization.
> > >>>
> > >>> For the purpose of testing I run ex2.c on a resolution of 100x100
> > >>> (which is of course way too small for a direct solver in parallel).
> > >>>
> > >>> The code is run with:
> > >>> mpirun ./ex2 -on_error_abort -pc_type lu -pc_factor_mat_solver_package
> > >>> mumps -ksp_type preonly -log_summary -options_left -m 100 -n 100
> > >>> -mat_mumps_icntl_4 3
> > >>>
> > >>> The PETSc configuration I used is:
> > >>> --prefix=/opt/Petsc/3.4.4.extended --with-mpi=yes
> > >>> --with-mpi-dir=/opt/Openmpi/1.9a/ --with-debugging=no --download-mumps
> > >>> --download-scalapack --download-parmetis --download-metis
> > >>>
> > >>> Is this common behavior? Or is there an error in the PETSc
> > >>> configuration I am using here?
> > >>>
> > >>> Best,
> > >>> Gunnar