<div dir="ltr">OK, I tried SuperLU_dist as well. Unfortunately, the run seems to hang at more or less the same point. <br><br>Sadly, I cannot test another version of Open MPI, since only this version is currently installed on the cluster (and it has to stay that way because other users need it for CUDA).<div>
<br>The -info output showed that the processes were successfully started on both nodes. In the GMRES case, the program also runs through cleanly.<br><br>The -log_trace output shows that the problem occurs within the numeric factorization of the matrix (MatLUFactorNum):<br>
<div><br></div><div> [5] 0.00311184 Event begin: MatLUFactorSym</div><div> [1] 0.0049789 Event begin: MatLUFactorSym</div><div> [3] 0.00316596 Event begin: MatLUFactorSym</div><div> [4] 0.00345397 Event begin: MatLUFactorSym</div>
<div> [0] 0.00546789 Event end: MatLUFactorSym</div><div> [0] 0.0054841 Event begin: MatLUFactorNum</div><div> [2] 0.00545907 Event end: MatLUFactorSym</div><div> [2] 0.005476 Event begin: MatLUFactorNum</div>
<div> [1] 0.00542402 Event end: MatLUFactorSym</div><div> [1] 0.00544 Event begin: MatLUFactorNum</div><div> [4] 0.00369906 Event end: MatLUFactorSym</div><div> [4] 0.00372505 Event begin: MatLUFactorNum</div>
<div> [3] 0.00371909 Event end: MatLUFactorSym</div><div> [3] 0.00374603 Event begin: MatLUFactorNum</div><div> [5] 0.00367594 Event end: MatLUFactorSym</div><div> [5] 0.00370193 Event begin: MatLUFactorNum<br>
<br>Any hints?<br><br><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-06-25 17:17 GMT+02:00 Satish Balay <span dir="ltr"><<a href="mailto:balay@mcs.anl.gov" target="_blank">balay@mcs.anl.gov</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Suggest running the non-mumps case with -log_summary [to confirm that<br>
'-np 6' is actually used in both cases]<br>
<br>
Secondly - you can try a 'release' version of openmpi or mpich and see<br>
if that works. [I don't see a mention of openmpi-1.9a on the website]<br>
<br>
Also, you can try -log_trace to see where it's hanging [or figure out how<br>
to run the code in a debugger on this cluster]. But that might not help in<br>
figuring out the solution to the hang.<br>
<span class="HOEnZb"><font color="#888888"><br>
Satish<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
On Wed, 25 Jun 2014, Matthew Knepley wrote:<br>
<br>
> On Wed, Jun 25, 2014 at 7:09 AM, Gunnar Jansen <<a href="mailto:jansen.gunnar@gmail.com">jansen.gunnar@gmail.com</a>><br>
> wrote:<br>
><br>
> > You are right about the queuing system. The job is submitted with a PBS<br>
> > script specifying the number of nodes/processors. On the cluster, PETSc is<br>
> > configured in a module environment that sets the appropriate flags for<br>
> > compilers, rules, etc.<br>
> ><br>
> > The exact same job script on the exact same nodes with a standard Krylov<br>
> > method causes no trouble at all and executes nicely on all processors (and<br>
> > also gives the correct result).<br>
> ><br>
> > Therefore, my suspicion is a missing flag in the MUMPS interface. Is this<br>
> > perhaps rather a topic for the MUMPS developers?<br>
> ><br>
><br>
> I doubt this. The whole point of MPI is to shield code from these details.<br>
><br>
> Can you first try this system with SuperLU_dist?<br>
<br>
><br>
> Thanks,<br>
><br>
> Matt<br>
><br>
><br>
> > Best, Gunnar<br>
> ><br>
> ><br>
> ><br>
> > 2014-06-25 15:52 GMT+02:00 Dave May <<a href="mailto:dave.mayhem23@gmail.com">dave.mayhem23@gmail.com</a>>:<br>
> ><br>
> > This sounds weird.<br>
> >><br>
> >> The launch line you provided doesn't include any information about<br>
> >> how many processors to use (number of nodes, cores per node). I presume you are using<br>
> >> a queuing system. My guess is that there could be an issue with either (i)<br>
> >> your job script, (ii) the configuration of the job scheduler on the<br>
> >> machine, or (iii) the mpi installation on the machine.<br>
> >><br>
> >> Have you been able to successfully run other petsc (or any mpi) codes<br>
> >> with the same launch options (2 nodes, 3 procs per node)?<br>
> >><br>
> >> Cheers.<br>
> >> Dave<br>
> >><br>
> >><br>
> >><br>
> >><br>
> >> On 25 June 2014 15:44, Gunnar Jansen <<a href="mailto:jansen.gunnar@gmail.com">jansen.gunnar@gmail.com</a>> wrote:<br>
> >><br>
> >>> Hi,<br>
> >>><br>
> >>> I am trying to solve a problem in parallel with MUMPS as the direct solver. As<br>
> >>> long as I run the program on only 1 node with 6 processors, everything works<br>
> >>> fine! But when using 2 nodes with 3 processors each, MUMPS gets stuck in the<br>
> >>> factorization.<br>
> >>><br>
> >>> For testing purposes, I run ex2.c at a resolution of 100x100<br>
> >>> (which is of course far too small for a parallel direct solver).<br>
> >>><br>
> >>> The code is run with:<br>
> >>> mpirun ./ex2 -on_error_abort -pc_type lu -pc_factor_mat_solver_package<br>
> >>> mumps -ksp_type preonly -log_summary -options_left -m 100 -n 100<br>
> >>> -mat_mumps_icntl_4 3<br>
> >>><br>
> >>> The PETSc configuration I used is:<br>
> >>> --prefix=/opt/Petsc/3.4.4.extended --with-mpi=yes<br>
> >>> --with-mpi-dir=/opt/Openmpi/1.9a/ --with-debugging=no --download-mumps<br>
> >>> --download-scalapack --download-parmetis --download-metis<br>
> >>><br>
> >>> Is this common behavior? Or is there an error in the PETSc configuration<br>
> >>> I am using here?<br>
> >>><br>
> >>> Best,<br>
> >>> Gunnar<br>
> >>><br>
> >><br>
> >><br>
> ><br>
><br>
><br>
><br>
<br>
</div></div></blockquote></div><br></div>