<div dir="ltr">OK, I tried superlu_dist as well. Unfortunately, the program seems to hang at more or less the same point. <br><br>Sadly, I cannot test another version of openmpi, since this is the only version installed on the cluster at the moment (other programmers need it for CUDA).<div>

<br>The -info option told me that the processes were started successfully on both nodes. With GMRES, the same setup also runs through cleanly.<br><br>The -log_trace output tells me that the problem occurs within the numeric factorization of the matrix: every rank begins MatLUFactorNum, but none reports its end.<br>

<div><br></div><div>    [5] 0.00311184 Event begin: MatLUFactorSym</div><div>    [1] 0.0049789 Event begin: MatLUFactorSym</div><div>    [3] 0.00316596 Event begin: MatLUFactorSym</div><div>    [4] 0.00345397 Event begin: MatLUFactorSym</div>

<div>    [0] 0.00546789 Event end: MatLUFactorSym</div><div>    [0] 0.0054841 Event begin: MatLUFactorNum</div><div>    [2] 0.00545907 Event end: MatLUFactorSym</div><div>    [2] 0.005476 Event begin: MatLUFactorNum</div>

<div>    [1] 0.00542402 Event end: MatLUFactorSym</div><div>    [1] 0.00544 Event begin: MatLUFactorNum</div><div>    [4] 0.00369906 Event end: MatLUFactorSym</div><div>    [4] 0.00372505 Event begin: MatLUFactorNum</div>

<div>    [3] 0.00371909 Event end: MatLUFactorSym</div><div>    [3] 0.00374603 Event begin: MatLUFactorNum</div><div>    [5] 0.00367594 Event end: MatLUFactorSym</div><div>    [5] 0.00370193 Event begin: MatLUFactorNum<br>
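<br>In case it helps anyone reading longer traces, here is a minimal Python sketch (not part of PETSc) that lists, per rank, the events that were begun but never ended. It assumes the plain-text -log_trace layout shown above, i.e. "[rank] time Event begin|end: name"; the names open_events and SAMPLE are made up for illustration.<br>

```python
import re
from collections import defaultdict

# Matches entries such as "[5] 0.00311184 Event begin: MatLUFactorSym".
# Assumption: the trace uses exactly this layout, as in the excerpt above.
EVENT_RE = re.compile(r"\[(\d+)\]\s+(\S+)\s+Event (begin|end):\s+(\w+)")


def open_events(trace_text):
    """Return {rank: [events begun but never ended]} for a trace dump."""
    stacks = defaultdict(list)
    for match in EVENT_RE.finditer(trace_text):
        rank = int(match.group(1))
        kind = match.group(3)
        name = match.group(4)
        if kind == "begin":
            stacks[rank].append(name)
        elif name in stacks[rank]:
            # Drop the most recent matching "begin" for this rank.
            idx = len(stacks[rank]) - 1 - stacks[rank][::-1].index(name)
            del stacks[rank][idx]
        # An "end" without a recorded "begin" (truncated trace) is ignored.
    return {rank: names for rank, names in sorted(stacks.items()) if names}


# A few lines taken from the excerpt above (ranks 0, 1 and 5 only).
SAMPLE = """
[5] 0.00311184 Event begin: MatLUFactorSym
[1] 0.0049789 Event begin: MatLUFactorSym
[0] 0.00546789 Event end: MatLUFactorSym
[0] 0.0054841 Event begin: MatLUFactorNum
[1] 0.00542402 Event end: MatLUFactorSym
[1] 0.00544 Event begin: MatLUFactorNum
[5] 0.00367594 Event end: MatLUFactorSym
[5] 0.00370193 Event begin: MatLUFactorNum
"""

print(open_events(SAMPLE))
```

<br>For these lines it reports MatLUFactorNum as still open on ranks 0, 1 and 5, which is consistent with a hang inside the numeric factorization.<br>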

<br>Any hints?<br><br><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-06-25 17:17 GMT+02:00 Satish Balay <span dir="ltr"><<a href="mailto:balay@mcs.anl.gov" target="_blank">balay@mcs.anl.gov</a>></span>:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Suggest running the non-mumps case with -log_summary [to confirm that<br>

'-np 6' is actually used in both cases]<br>

<br>

Secondly - you can try a 'release' version of openmpi or mpich and see<br>

if that works. [I don't see a mention of openmpi-1.9a on the website]<br>

<br>

Also you can try -log_trace to see where it's hanging [or figure out how<br>

to run code in debugger on this cluster]. But that might not help in<br>

figuring out the solution to the hang..<br>

<span class="HOEnZb"><font color="#888888"><br>

Satish<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

On Wed, 25 Jun 2014, Matthew Knepley wrote:<br>

<br>

> On Wed, Jun 25, 2014 at 7:09 AM, Gunnar Jansen <<a href="mailto:jansen.gunnar@gmail.com">jansen.gunnar@gmail.com</a>><br>

> wrote:<br>

><br>

> > You are right about the queuing system. The job is submitted with a PBS<br>

> > script specifying the number of nodes/processors. On the cluster petsc is<br>

> > configured in a module environment which sets the appropriate flags for<br>

> > compilers/rules etc.<br>

> ><br>

> > The same exact job script on the same exact nodes with a standard krylov<br>

> > method does not give any trouble but executes nicely on all processors (and<br>

> > also gives the correct result).<br>

> ><br>

> > Therefore my suspicion is a missing flag in the mumps interface. Is this<br>

> > maybe rather a topic for the mumps-dev team?<br>

> ><br>

><br>

> I doubt this. The whole point of MPI is to shield code from these details.<br>

><br>

> Can you first try this system with SuperLU_dist?<br>

<br>

><br>

>   Thanks,<br>

><br>

>      Matt<br>

><br>

><br>

> > Best, Gunnar<br>

> ><br>

> ><br>

> ><br>

> > 2014-06-25 15:52 GMT+02:00 Dave May <<a href="mailto:dave.mayhem23@gmail.com">dave.mayhem23@gmail.com</a>>:<br>

> ><br>

> > This sounds weird.<br>

> >><br>

> >> The launch line you provided doesn't include any information regarding<br>

> >> how many processors (nodes/cores per node) to use. I presume you are using<br>

> >> a queuing system. My guess is that there could be an issue with either (i)<br>

> >> your job script, (ii) the configuration of the job scheduler on the<br>

> >> machine, or (iii) the mpi installation on the machine.<br>

> >><br>

> >> Have you been able to successfully run other petsc (or any mpi) codes<br>

> >> with the same launch options (2 nodes, 3 procs per node)?<br>

> >><br>

> >> Cheers.<br>

> >>   Dave<br>

> >><br>

> >><br>

> >><br>

> >><br>

> >> On 25 June 2014 15:44, Gunnar Jansen <<a href="mailto:jansen.gunnar@gmail.com">jansen.gunnar@gmail.com</a>> wrote:<br>

> >><br>

> >>> Hi,<br>

> >>><br>

> >>> I am trying to solve a problem in parallel with MUMPS as the direct solver. As<br>

> >>> long as I run the program on only 1 node with 6 processors everything works<br>

> >>> fine! But using 2 nodes with 3 processors each gets mumps stuck in the<br>

> >>> factorization.<br>

> >>><br>

> >>> For the purpose of testing I run ex2.c at a resolution of 100x100<br>

> >>> (which is of course way too small for a direct solver in parallel).<br>

> >>><br>

> >>> The code is run with :<br>

> >>> mpirun ./ex2 -on_error_abort -pc_type lu -pc_factor_mat_solver_package<br>

> >>> mumps -ksp_type preonly -log_summary -options_left -m 100 -n 100<br>

> >>> -mat_mumps_icntl_4 3<br>

> >>><br>

> >>> The petsc-configuration I used is:<br>

> >>> --prefix=/opt/Petsc/3.4.4.extended --with-mpi=yes<br>

> >>> --with-mpi-dir=/opt/Openmpi/1.9a/ --with-debugging=no --download-mumps<br>

> >>>  --download-scalapack --download-parmetis --download-metis<br>

> >>><br>

> >>> Is this common behavior? Or is there an error in the petsc configuration<br>

> >>> I am using here?<br>

> >>><br>

> >>> Best,<br>

> >>> Gunnar<br>

> >>><br>

> >><br>

> >><br>

> ><br>

><br>

><br>

><br>

<br>

</div></div></blockquote></div><br></div>