Stalling once linear system becomes a certain size

Satish Balay balay at mcs.anl.gov
Mon Apr 7 09:42:27 CDT 2008


Matt,

>  > I'm using PETSC_COMM_SELF in order to construct the same matrix
>  > on each processor (and solve the system with a different
>  > right-hand side vector on each processor),

So it's a bunch of similar sequential solves over PETSC_COMM_SELF - a
sequential solve on a given MPI process should not affect another
sequential solve on another process.
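
For reference, a minimal sketch of that kind of setup - every MPI
process assembling and solving its own small system over
PETSC_COMM_SELF. This is not David's actual code: the identity matrix,
the constant right-hand side, and n=315 are placeholders, and it uses
the PETSc 2.3.3-era calling sequence mentioned later in the thread.

  #include "petscksp.h"

  int main(int argc, char **argv)
  {
    Mat      A;
    Vec      b, x;
    KSP      ksp;
    PetscInt i, n = 315;                /* placeholder system size */

    PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);

    /* each process assembles its own matrix (here just the identity) */
    MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 1, PETSC_NULL, &A);
    for (i = 0; i < n; i++) {
      PetscScalar one = 1.0;
      MatSetValues(A, 1, &i, 1, &i, &one, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    /* per-process right-hand side; in the real code this differs
       from rank to rank */
    VecCreateSeq(PETSC_COMM_SELF, n, &b);
    VecDuplicate(b, &x);
    VecSet(b, 1.0);

    /* the solves are independent: no communication between ranks */
    KSPCreate(PETSC_COMM_SELF, &ksp);
    KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);
    KSPSetFromOptions(ksp);             /* honors -ksp_type, -pc_type, ... */
    KSPSolve(ksp, b, x);

    KSPDestroy(ksp); VecDestroy(x); VecDestroy(b); MatDestroy(A);
    PetscFinalize();
    return 0;
  }

Since every KSP lives on PETSC_COMM_SELF, the solves themselves involve
no MPI communication at all.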

Satish

On Mon, 7 Apr 2008, Matthew Knepley wrote:

> It sounds like he is saying that the iterative solvers fail to
> converge. It could be that the systems become much more
> ill-conditioned. When solving anything, first use LU
> 
>   -ksp_type preonly -pc_type lu
> 
> to determine if the system is consistent. Then use something simple, like
> GMRES by itself
> 
>   -ksp_type gmres -pc_type none -ksp_monitor_singular_value -ksp_gmres_restart 500
> 
> to get an idea of the condition number. Then start trying other solvers and PCs.
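> 
> For example (reusing the "mpiexec -n 4 ./exe" invocation from Satish's
> mail below as a stand-in for the real run command), the two checks
> could look like
> 
>   mpiexec -n 4 ./exe -ksp_type preonly -pc_type lu
>   mpiexec -n 4 ./exe -ksp_type gmres -pc_type none -ksp_monitor_singular_value -ksp_gmres_restart 500
> 
> With -ksp_monitor_singular_value, the ratio of the largest to the
> smallest reported singular value gives a rough estimate of the
> condition number of the (preconditioned) operator.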
> 
>    Matt
> 
> On Mon, Apr 7, 2008 at 8:28 AM, Satish Balay <balay at mcs.anl.gov> wrote:
> >
> > On Mon, 7 Apr 2008, David Knezevic wrote:
> >
> >  > Hello,
> >  >
> >  > I am trying to run a PETSc code on a parallel machine (it may be relevant that
> >  > each node contains four AMD Opteron Quad-Core 64-bit processors (16 cores in
> >  > all) as an SMP unit with 32GB of memory) and I'm observing some behaviour I
> >  > don't understand.
> >  >
> >  > I'm using PETSC_COMM_SELF in order to construct the same matrix on each
> >  > processor (and solve the system with a different right-hand side vector on
> >  > each processor). When each linear system is around 315x315 (block-sparse),
> >  > each linear system is solved very quickly on each processor (approx
> >  > 7x10^{-4} seconds), but when I increase the size of the linear system to
> >  > 350x350 (or larger), the linear solves completely stall. I've tried a number
> >  > of different solvers and preconditioners, but nothing seems to help. Also,
> >  > this code has worked very well on other machines, although the machines I have
> >  > used it on before have not had this architecture in which each node is an SMP
> >  > unit. I was wondering if you have observed this kind of issue before?
> >  >
> >  > I'm using PETSc 2.3.3, compiled with the Intel 10.1 compiler.
> >
> >  I would suggest running the code in a debugger to determine the exact
> >  location where the stall happens [with the minimum number of procs]
> >
> >  mpiexec -n 4 ./exe -start_in_debugger
> >
> >  By default the above tries to open xterms on the localhost - so to get
> >  this working on the cluster you might need proper ssh X11
> >  port-forwarding set up to the node, and then use the extra
> >  command-line option '-display'
> >
> >  [when the job appears to hang - I would do ctrl-c in gdb and look at
> >  the stack trace on each MPI process]
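> >
> >  For example (assuming working X forwarding; "mydesktop:0" is just a
> >  placeholder for the real display):
> >
> >  mpiexec -n 4 ./exe -start_in_debugger -display mydesktop:0
> >
> >  and then, in each gdb xterm, ctrl-c followed by 'bt' prints the
> >  stack trace of that process.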
> >
> >  Satish
> >
> >