profiling PETSc code

Matthew Knepley knepley at gmail.com
Wed Aug 2 16:50:41 CDT 2006


On 8/2/06, Matt Funk <mafunk at nmsu.edu> wrote:
> Hi Matt,
>
> thanks for all the help so far. The -info option is really very helpful, and I
> think I have straightened out the actual errors. However, now I am back to my
> original question: why does it take so much longer on 4 procs than on 1 proc?

So you have a 1.4 load imbalance (the max/min time ratio across processes) for
MatMult(), which probably cascades to give the 133.7 load imbalance for
VecDot(). You probably have either:

  1) VERY bad load imbalance

  2) a screwed-up network

  3) bad contention on the network (loaded cluster)

Can you help us narrow this down?
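
If it is 2) or 3), a standalone MPI ping-pong run between two of the same
nodes will show it immediately, independent of PETSc. A minimal sketch
(illustrative only; a hypothetical pingpong.c, compiled with mpicc and run
with 2 processes):

  #include <mpi.h>
  #include <stdio.h>
  #include <string.h>

  /* Bounce a tiny message between ranks 0 and 1 to estimate point-to-point
     latency; a slow or contended network shows up as a round-trip time far
     above the usual few-to-tens of microseconds. */
  int main(int argc, char **argv)
  {
    int    rank, i, reps = 1000;
    char   buf[8];
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 0, sizeof(buf));

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
      if (rank == 0) {
        MPI_Send(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      } else if (rank == 1) {
        MPI_Recv(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
      }
    }
    t1 = MPI_Wtime();
    if (rank == 0)
      printf("average round-trip latency: %g us\n", 1e6 * (t1 - t0) / reps);
    MPI_Finalize();
    return 0;
  }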


   Matt

> I profiled the KSPSolve(...) as stage 2:
>
> For 1 proc i have:
> --- Event Stage 2: Stage 2 of ChomboPetscInterface
>
> VecDot              4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0 18  0  0  0   2 18  0  0  0   474
> VecNorm             8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 0.0e+00 4.0e+03  1 36  0  0 28   7 36  0  0 33   214
> VecAYPX             4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0 18  0  0  0   5 18  0  0  0   173
> MatMult             4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  9  0  0  0  12  9  0  0  0    32
> MatSolve            8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  3 18  0  0  0  36 18  0  0  0    22
> KSPSolve            4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 0.0e+00 1.2e+04  7 100  0  0 84  97 100  0  0 100    45
> PCApply             8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00  3 18  0  0  0  38 18  0  0  0    21
>
>
> for 4 procs i have :
> --- Event Stage 2: Stage 2 of ChomboPetscInterface
>
> VecDot              4000 1.0 3.5884e+01 133.7 2.17e+07 133.7 0.0e+00 0.0e+00 4.0e+03  8 18  0  0  5   9 18  0  0 14     1
> VecNorm             8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 0.0e+00 8.0e+03  0 36  0  0 10   0 36  0  0 29   133
> VecSet              8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAYPX             4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 0.0e+00 0.0e+00  0 18  0  0  0   0 18  0  0  0   410
> VecScatterBegin     4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 38  0  0  0  0  45  0  0  0  0     0
> VecScatterEnd       4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 37  0  0  0  0  44  0  0  0  0     0
> MatMult             4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 0.0e+00 0.0e+00 75  9  0  0  0  89  9  0  0  0     0
> MatSolve            8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 18  0  0  0   0 18  0  0  0    83
> MatLUFactorNum         1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    21
> MatILUFactorSym        1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSetup               1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve            4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 0.0e+00 2.8e+04 84 100  0  0 34 100 100  0  0 100     1
> PCSetUp                1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     6
> PCSetUpOnBlocks     4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
> PCApply             8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 0.0e+00 8.0e+03  1 18  0  0 10   1 18  0  0 29    28
> ------------------------------------------------------------------------------------------------------------------------
>
> Now, if I understand it right, each of these events summarizes all the calls
> made between the push and pop commands. That would mean that the majority of
> the time is spent in MatMult, and within that in the VecScatterBegin and
> VecScatterEnd calls.
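
(For reference, this is how such a stage is typically set up around
KSPSolve(); a minimal sketch using the PetscLogStage API as spelled in
current PETSc -- some older releases ordered the PetscLogStageRegister()
arguments differently -- where ierr, ksp, b, and x are assumed from the
surrounding code:)

  PetscLogStage stage;

  /* Register a named stage once, then bracket the region of interest;
     every event logged between push and pop (KSPSolve, MatMult, VecDot,
     VecScatterBegin/End, ...) is accumulated under that stage in the
     -log_summary output. */
  ierr = PetscLogStageRegister("Stage 2 of ChomboPetscInterface", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);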
>
> My problem size is really small, so I am wondering whether the problem lies
> in that (namely, that most of the time is simply spent communicating between
> processors), or whether there is still something wrong with how I wrote the
> code.
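
(One quick way to check is to print each rank's share of the vector; a
minimal, illustrative sketch, assuming x is the solution vector and ierr is
declared in the surrounding code:)

  PetscInt    nlocal, nglobal;
  PetscMPIInt rank;

  /* If each rank owns only a few thousand rows, one MatMult does mere
     microseconds of flops while every VecScatter/VecDot pays at least one
     network latency, so communication dominates regardless of how the
     rest of the code is written. */
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  ierr = VecGetLocalSize(x, &nlocal);CHKERRQ(ierr);
  ierr = VecGetSize(x, &nglobal);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF, "[%d] owns %d of %d rows\n",
                     rank, (int)nlocal, (int)nglobal);CHKERRQ(ierr);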
>
>
> thanks
> mat
>
>
>
> On Tuesday 01 August 2006 18:28, Matthew Knepley wrote:
> > On 8/1/06, Matt Funk <mafunk at nmsu.edu> wrote:
> > > Actually the errors occur in my calls to PETSc functions after calling
> > > PetscInitialize.
> >
> > Yes, it is the error I pointed out in the last message.
> >
> >    Matt
> >
> > > mat
>
>


-- 
"Failure has a thousand explanations. Success doesn't need one" -- Sir
Alec Guiness



