profiling PETSc code
Matt Funk
mafunk at nmsu.edu
Tue Aug 15 13:39:31 CDT 2006
On Tuesday 15 August 2006 11:52, Barry Smith wrote:
> > MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 0.0e+00
>
> ^^^^^ balance   (pointing at the 1.4 time-ratio column)
>
> Hmmm, I would guess that the matrix entries are not so well balanced?
> One process takes 1.4 times as long for the triangular solves as the other,
> so either one matrix has many more entries or one processor is slower than
> the other.
>
> Barry
Well, it would seem that way at first, but I don't know how that could be,
since I allocate exactly the same number of points on both processors (see
previous email).
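(One way to check Barry's first suggestion directly is to ask each rank for its
local nonzero count with MatGetInfo(); a minimal sketch, reusing the
m_globalMatrix handle from the allocation code quoted further down:

    MatInfo info;
    int     rank;

    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
    MatGetInfo(m_globalMatrix, MAT_LOCAL, &info);   /* statistics for this rank only */
    PetscSynchronizedPrintf(PETSC_COMM_WORLD,
        "[%d] local nonzeros used: %g, allocated: %g\n",
        rank, info.nz_used, info.nz_allocated);
    PetscSynchronizedFlush(PETSC_COMM_WORLD);       /* newer PETSc versions also take a FILE*, e.g. PETSC_STDOUT */

If the used counts differ noticeably between the two ranks, Barry's first
explanation applies; if they match, as the -mat_view_info output below suggests,
the imbalance comes from somewhere else.)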
Further, I used the -mat_view_info option. Here is what it gives me:
...
Matrix Object:
type=mpiaij, rows=119808, cols=119808
[1] PetscCommDuplicate Using internal PETSc communicator 91 168
[1] PetscCommDuplicate Using internal PETSc communicator 91 168
...
...
Matrix Object:
type=seqaij, rows=59904, cols=59904
total: nonzeros=407400, allocated nonzeros=407400
not using I-node routines
Matrix Object:
type=seqaij, rows=59904, cols=59904
total: nonzeros=407400, allocated nonzeros=407400
not using I-node routines
...
So to me it looks well split up. Is there anything else that somebody can
think of? The machine I am running on has identical processors throughout.
By the way, I am not sure if the '[1] PetscCommDuplicate Using internal PETSc
communicator 91 168' message is something I need to worry about?
mat
>
> On Tue, 15 Aug 2006, Matt Funk wrote:
> > Hi Matt,
> >
> > Sorry for the delay since the last email, but there were some other
> > things I needed to do.
> >
> > Anyway, I hope that maybe I can get some more help from you guys with
> > respect to the load-imbalance problem I have. Here is the situation:
> > I run my code on 2 procs. I profile my KSPSolve call, and here is what I
> > get:
> >
> > ...
> >
> > --- Event Stage 2: Stage 2 of ChomboPetscInterface
> >
> > VecDot            20000 1.0 1.9575e+01 2.0 2.39e+08 2.0 0.0e+00 0.0e+00 2.0e+04   2  8  0  0 56    7  8  0  0 56   245
> > VecNorm           16000 1.0 4.0559e+01 3.2 1.53e+08 3.2 0.0e+00 0.0e+00 1.6e+04   3  7  0  0 44   13  7  0  0 44    95
> > VecCopy            4000 1.0 1.5148e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    1  0  0  0  0     0
> > VecSet            16000 1.0 3.1937e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    1  0  0  0  0     0
> > VecAXPY           16000 1.0 8.2395e+00 1.4 3.22e+08 1.4 0.0e+00 0.0e+00 0.0e+00   1  7  0  0  0    3  7  0  0  0   465
> > VecAYPX            8000 1.0 3.6370e+00 1.3 3.46e+08 1.3 0.0e+00 0.0e+00 0.0e+00   0  3  0  0  0    2  3  0  0  0   527
> > VecScatterBegin   12000 1.0 5.7721e-01 1.1 0.00e+00 0.0 2.4e+04 2.1e+04 0.0e+00   0  0 100 100  0    0  0 100 100  0     0
> > VecScatterEnd     12000 1.0 1.4059e+01 9.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   1  0  0  0  0    4  0  0  0  0     0
> > MatMult           12000 1.0 5.0567e+01 1.0 1.82e+08 1.0 2.4e+04 2.1e+04 0.0e+00   5 32 100 100  0   25 32 100 100  0   361
> > MatSolve          16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 0.0e+00 0.0e+00  10 43  0  0  0   47 43  0  0  0   214
> > MatLUFactorNum        1 1.0 1.9693e-02 1.0 5.79e+07 1.1 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0   110
> > MatILUFactorSym       1 1.0 7.8840e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00   0  0  0  0  0    0  0  0  0  0     0
> > MatGetOrdering        1 1.0 1.2250e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00   0  0  0  0  0    0  0  0  0  0     0
> > KSPSetup              1 1.0 1.1921e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0     0
> > KSPSolve           4000 1.0 2.0419e+02 1.0 1.39e+08 1.0 2.4e+04 2.1e+04 3.6e+04  21 100 100 100 100  100 100 100 100 100   278
> > PCSetUp               1 1.0 2.8828e-02 1.1 3.99e+07 1.1 0.0e+00 0.0e+00 3.0e+00   0  0  0  0  0    0  0  0  0  0    75
> > PCSetUpOnBlocks    4000 1.0 3.5605e-02 1.1 3.35e+07 1.1 0.0e+00 0.0e+00 3.0e+00   0  0  0  0  0    0  0  0  0  0    61
> > PCApply           16000 1.0 1.1661e+02 1.4 1.46e+08 1.4 0.0e+00 0.0e+00 0.0e+00  10 43  0  0  0   49 43  0  0  0   207
> > ------------------------------------------------------------------------------------------------------------------------
> >
> > ...
> >
> >
> > Some things to note are the following:
> > I allocate my vector as:
> > VecCreateMPI(PETSC_COMM_WORLD, //communicator
> > a_totallocal_numPoints[a_thisproc], //local points on this proc
> > a_totalglobal_numPoints, //total number of global points
> > &m_globalRHSVector); //the vector to be created
> >
> > where the array a_totallocal_numPoints is:
> > a_totallocal_numPoints: 59904 59904
> >
> > The matrix is allocated as:
> > m_ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, //communicator
> > a_totallocal_numPoints[a_thisproc], //number of local rows (that is, rows residing on this proc)
> > a_totallocal_numPoints[a_thisproc], //number of local columns, corresponding to the local part of the parallel vector
> > a_totalglobal_numPoints, //number of global rows
> > a_totalglobal_numPoints, //number of global columns
> > PETSC_NULL,
> > a_NumberOfNZPointsInDiagonalMatrix,
> > PETSC_NULL,
> > a_NumberOfNZPointsInOffDiagonalMatrix,
> > &m_globalMatrix);
> >
> > With the -info option I checked, and there are no extra mallocs at all.
> > My problem setup is symmetric, so it seems that everything is set up so
> > that it should be essentially perfectly balanced. However, the numbers
> > given above certainly do not reflect that.
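(If the layout really is balanced, the skewed ratios may mostly be time spent
waiting at synchronization points such as VecDot/VecNorm rather than unequal
work. A rough way to separate the two is to line the ranks up with a barrier and
time the solve on each rank; a sketch, where m_ksp and m_solutionVector are
placeholder names for the KSP object and solution vector in the interface code:

    double t0, t1;

    MPI_Barrier(PETSC_COMM_WORLD);   /* remove any skew accumulated before the solve */
    t0 = MPI_Wtime();
    KSPSolve(m_ksp, m_globalRHSVector, m_solutionVector);
    t1 = MPI_Wtime();
    PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[KSPSolve] %g s on this rank\n", t1 - t0);
    PetscSynchronizedFlush(PETSC_COMM_WORLD);

If the per-rank times come out close while the event ratios above do not, the
work itself is balanced and the ratios mainly reflect which rank reaches the
reductions first.)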
> >
> > However, in all other parts of my code (except the PETSc calls), I get
> > the expected, almost perfect load balance.
> >
> > Is there anything that I am overlooking? Any help is greatly appreciated.
> >
> > thanks
> > mat
> >
> > On Wednesday 02 August 2006 16:21, Matt Funk wrote:
> >> Hi Matt,
> >>
> >> It could be a bad load imbalance because I don't let PETSc decide the
> >> partitioning. I need to fix that anyway, so I think I'll try that first and
> >> then let you know. Thanks, though, for the quick response and for helping
> >> me to interpret those numbers ...
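(For reference, 'letting PETSc decide' just means passing PETSC_DECIDE for the
local sizes so that PETSc picks the distribution itself; a sketch with
illustrative names, where N is the global problem size and nz a typical number
of nonzeros per row:

    Vec v;
    Mat A;

    VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, N, &v);   /* PETSc chooses the local length */
    MatCreateMPIAIJ(PETSC_COMM_WORLD,
                    PETSC_DECIDE, PETSC_DECIDE,            /* PETSc chooses the local row/column split */
                    N, N,                                  /* global sizes */
                    nz, PETSC_NULL,                        /* rough preallocation for the diagonal block */
                    nz, PETSC_NULL,                        /* and for the off-diagonal block */
                    &A);

Since the rows are already split evenly in the code above, this is unlikely to
change the balance by itself, but it removes one variable.)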
> >>
> >>
> >> mat
> >>
> >> On Wednesday 02 August 2006 15:50, Matthew Knepley wrote:
> >>> On 8/2/06, Matt Funk <mafunk at nmsu.edu> wrote:
> >>>> Hi Matt,
> >>>>
> >>>> Thanks for all the help so far. The -info option is really very
> >>>> helpful, so I think I have straightened the actual errors out. However,
> >>>> now I am back to the original question I had, that is, why it takes so
> >>>> much longer on 4 procs than on 1 proc.
> >>>
> >>> So you have a 1.5 load imbalance for MatMult(), which probably cascades
> >>> to give the 133! load imbalance for VecDot(). You probably have either:
> >>>
> >>> 1) VERY bad load imbalance
> >>>
> >>> 2) a screwed up network
> >>>
> >>> 3) bad contention on the network (loaded cluster)
> >>>
> >>> Can you help us narrow this down?
> >>>
> >>>
> >>> Matt
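(To rule out options 2 and 3 independently of PETSc, a bare MPI ping-pong
between two ranks gives a quick read on latency and contention; a self-contained
sketch, not part of the ChomboPetscInterface code:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
      int        rank, i, nreps = 1000;
      double     buf[1024] = {0.0};   /* 8 KB payload */
      double     t0, t1;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Barrier(MPI_COMM_WORLD);
      t0 = MPI_Wtime();
      for (i = 0; i < nreps; i++) {
        if (rank == 0) {
          MPI_Send(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
          MPI_Recv(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
          MPI_Recv(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
          MPI_Send(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
      }
      t1 = MPI_Wtime();
      if (rank == 0)
        printf("average 8 KB round trip: %g s\n", (t1 - t0) / nreps);

      MPI_Finalize();
      return 0;
    }

Round-trip times that vary by orders of magnitude between runs point at a loaded
or misconfigured network rather than at the PETSc code.)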
> >>>
> >>>> I profiled the KSPSolve(...) as stage 2:
> >>>>
> >>>> For 1 proc i have:
> >>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface
> >>>>
> >>>> VecDot             4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 0.0e+00 0.0e+00   0 18  0  0  0    2 18  0  0  0   474
> >>>> VecNorm            8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 0.0e+00 4.0e+03   1 36  0  0 28    7 36  0  0 33   214
> >>>> VecAYPX            4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 0.0e+00 0.0e+00   0 18  0  0  0    5 18  0  0  0   173
> >>>> MatMult            4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 0.0e+00 0.0e+00   1  9  0  0  0   12  9  0  0  0    32
> >>>> MatSolve           8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00   3 18  0  0  0   36 18  0  0  0    22
> >>>> KSPSolve           4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 0.0e+00 1.2e+04   7 100  0  0 84   97 100  0  0 100    45
> >>>> PCApply            8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00   3 18  0  0  0   38 18  0  0  0    21
> >>>>
> >>>>
> >>>> for 4 procs i have :
> >>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface
> >>>>
> >>>> VecDot             4000 1.0 3.5884e+01 133.7 2.17e+07 133.7 0.0e+00 0.0e+00 4.0e+03   8 18  0  0  5    9 18  0  0 14     1
> >>>> VecNorm            8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 0.0e+00 8.0e+03   0 36  0  0 10    0 36  0  0 29   133
> >>>> VecSet             8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0     0
> >>>> VecAYPX            4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 0.0e+00 0.0e+00   0 18  0  0  0    0 18  0  0  0   410
> >>>> VecScatterBegin    4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  38  0  0  0  0   45  0  0  0  0     0
> >>>> VecScatterEnd      4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  37  0  0  0  0   44  0  0  0  0     0
> >>>> MatMult            4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 0.0e+00 0.0e+00  75  9  0  0  0   89  9  0  0  0     0
> >>>> MatSolve           8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 0.0e+00 0.0e+00   0 18  0  0  0    0 18  0  0  0    83
> >>>> MatLUFactorNum        1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0    21
> >>>> MatILUFactorSym       1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00   0  0  0  0  0    0  0  0  0  0     0
> >>>> MatGetOrdering        1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00   0  0  0  0  0    0  0  0  0  0     0
> >>>> KSPSetup              1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0     0
> >>>> KSPSolve           4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 0.0e+00 2.8e+04  84 100  0  0 34  100 100  0  0 100     1
> >>>> PCSetUp               1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 0.0e+00 4.0e+00   0  0  0  0  0    0  0  0  0  0     6
> >>>> PCSetUpOnBlocks    4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 0.0e+00 4.0e+00   0  0  0  0  0    0  0  0  0  0     0
> >>>> PCApply            8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 0.0e+00 8.0e+03   1 18  0  0 10    1 18  0  0 29    28
> >>>> --------------------------------------------------------------------------------------------------------------------
> >>>>
> >>>> Now, if I understand it right, all these entries summarize all calls
> >>>> made between the push and pop commands. That would mean that the
> >>>> majority of the time is spent in MatMult and, within that, in the
> >>>> VecScatterBegin and VecScatterEnd calls (if I understand it right).
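(That reading is correct: everything logged between the push and the pop is
charged to that stage. For reference, a minimal sketch of the pattern, with the
stage name and solver arguments illustrative only; note that the argument order
of PetscLogStageRegister differs between PETSc versions:

    PetscLogStage stage;

    PetscLogStageRegister("ChomboPetscInterface solve", &stage);  /* older PETSc: PetscLogStageRegister(&stage, "name") */
    PetscLogStagePush(stage);
    KSPSolve(ksp, b, x);       /* every event triggered in here is attributed to the stage */
    PetscLogStagePop();

The second group of percentage columns in the tables above is relative to that
stage, which is what makes the MatMult/VecScatter share visible.)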
> >>>>
> >>>> My problem size is really small, so I was wondering whether the problem
> >>>> lies in that (namely, that most of the time is simply spent communicating
> >>>> between processors), or whether there is still something wrong with how
> >>>> I wrote the code.
> >>>>
> >>>>
> >>>> thanks
> >>>> mat
> >>>>
> >>>> On Tuesday 01 August 2006 18:28, Matthew Knepley wrote:
> >>>>> On 8/1/06, Matt Funk <mafunk at nmsu.edu> wrote:
> >>>>>> Actually, the errors occur in my calls to PETSc functions after
> >>>>>> calling PetscInitialize.
> >>>>>
> >>>>> Yes, it is the error I pointed out in the last message.
> >>>>>
> >>>>> Matt
> >>>>>
> >>>>>> mat