[petsc-users] problem of running jobs on cluster

Wen Jiang jiangwen84 at gmail.com
Mon Oct 24 15:37:19 CDT 2011


Hi guys,

I reported this problem a few days ago, but I still have not been able to
fix it. Right now I am learning how to debug parallel code, and I would
like to get some suggestions before I figure out how the debugger works.

The trouble shows up in a large run of my own FEM code, which has almost
the same structure as ex3 in the KSP examples. The code (the largest
problem I have tried is around 65,000 dofs) runs totally fine on one
compute node with any number of processes, and smaller problems (fewer
than 5,000 dofs) also work fine on more than one compute node. However,
I run into a problem when I try to run a larger job (for example, 10,000
dofs) on two compute nodes.

The problem is that my code gets stuck at the MatAssemblyEnd() stage.
Running with the option -info, I find that only some of the processes
print the MatAssemblyEnd_SeqAIJ() information, and the code then hangs
there.
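
Before I get the debugger working, I was thinking of adding something
like the sketch below around the assembly calls to see which ranks
actually reach them (rough and untested; AssembleWithTrace is just a
placeholder name, not a routine in my code):

#include <petscmat.h>
#include <stdio.h>

/* Print from every rank around the assembly calls.  Since
   MatAssemblyBegin()/MatAssemblyEnd() are collective, any rank that
   never prints the "entering" line would leave the other ranks
   blocked inside MatAssemblyEnd(), which matches the hang I see. */
PetscErrorCode AssembleWithTrace(Mat A)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;

  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  printf("[rank %d] entering MatAssemblyBegin\n",rank); fflush(stdout);
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  printf("[rank %d] finished MatAssemblyEnd\n",rank); fflush(stdout);
  return 0;
}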

I have a couple of questions:

1. In ex3, the comments say that the matrix is intentionally laid out
across processors differently from the way it is assembled. As far as I
understand, this means that MatSetValues() may insert values into rows
owned by other processors (am I correct?). Since generating entries on
the 'wrong' process is expensive, I am wondering whether there is a
better way to do this, especially for assembling the global stiffness
matrix in FEM. (In my code, each call to MatSetValues() adds a 64 by 64
element stiffness matrix; a simplified sketch of the loop is below.)
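
The assembly loop looks roughly like this (a simplified, untested
sketch rather than my actual code; nlocal_elems and the step that
fills idx[] and Ke[] are placeholders):

#include <petscmat.h>

/* Each rank loops over the elements assigned to it and adds the 64x64
   element stiffness with ADD_VALUES.  Entries whose rows belong to
   another rank are stashed by PETSc and shipped to their owners during
   MatAssemblyBegin()/MatAssemblyEnd(). */
PetscErrorCode AssembleGlobalStiffness(Mat A,PetscInt nlocal_elems)
{
  PetscErrorCode ierr;
  PetscInt       e,idx[64];
  PetscScalar    Ke[64*64];

  for (e = 0; e < nlocal_elems; e++) {
    /* fill idx[] with the 64 global dof numbers of element e and Ke[]
       with the row-major element stiffness (application-specific) */
    ierr = MatSetValues(A,64,idx,64,idx,Ke,ADD_VALUES);CHKERRQ(ierr);
  }
  /* collective: every rank must make these calls, even if it has no
     elements left to insert */
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  return 0;
}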

2. Since my code (around 10,000 dofs) works fine on a single node but
gets stuck on two nodes, I am guessing this might be due to the large
chunk of data that needs to be communicated between nodes during the
MatAssembly stage. Is data communication between different nodes much
slower than within a single node? One thing I could check is how many
entries each rank generates outside the rows it owns; see the sketch
below.
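
A rough, untested sketch of that check (elem_dofs is a placeholder for
my element connectivity array, 64 dofs per element):

#include <petscmat.h>
#include <stdio.h>

/* Count how many of the rows this rank generates fall outside the
   range it owns.  Rows outside [rstart,rend) have to be stashed and
   sent to their owning rank during MatAssemblyBegin()/MatAssemblyEnd(),
   so a large count would mean a lot of inter-node traffic at assembly. */
PetscErrorCode CountOffProcessRows(Mat A,PetscInt nlocal_elems,
                                   const PetscInt *elem_dofs)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  PetscInt       rstart,rend,e,i,row,off = 0;

  ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  for (e = 0; e < nlocal_elems; e++) {
    for (i = 0; i < 64; i++) {
      row = elem_dofs[64*e + i];
      if (row < rstart || row >= rend) off++;
    }
  }
  printf("[rank %d] %ld off-process rows to stash\n",rank,(long)off);
  fflush(stdout);
  return 0;
}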

I appreciate any suggestions, and I will also keep working on the
debugging.

Thanks,
Wen