[petsc-dev] bad cpu/MPI performance problem

Mark Adams mfadams at lbl.gov
Sun Jan 8 15:32:25 CST 2023


On Sun, Jan 8, 2023 at 4:13 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>    There is a bug in the routine DMPlexLabelComplete_Internal()! The code
> should definitely not branch on if (nroots >= 0), because checking the
> nroots value to decide on the code route is simply nonsense (if one "knows"
> "by contract" that nroots is >= 0, then the if () test is not needed).
>
>    The first thing to do is to fix the bug with a PetscCheck(), remove the
> nonsensical if (nroots >= 0) check, and rerun your code to see what happens.
>

This does not fix the bug, right? It just fails cleanly, right?
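
To spell out what is at stake: the if (nroots >= 0) guard is only safe if
every rank takes the same side of the test, because DMLabelGather() creates a
PetscSF (see the stack trace below) and so has to be entered by all ranks of
the communicator. Here is a minimal sketch in plain C, with no PETSc or MPI,
that models this. The function names and the array standing in for per-rank
nroots values are hypothetical, for illustration only; in a real run the
values live on separate ranks and would have to be compared with a reduction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: nroots[r] stands for the value rank r would get back
   from PetscSFGetGraph(); a negative value means the SF graph is not set. */

/* True when every rank takes the same side of "if (nroots >= 0)", i.e. a
   collective call inside that branch is entered by all ranks or by none. */
static bool branch_is_collective_safe(const long *nroots, size_t nranks)
{
  assert(nranks > 0);
  const bool first = nroots[0] >= 0;
  for (size_t r = 1; r < nranks; ++r) {
    if ((nroots[r] >= 0) != first) return false;
  }
  return true;
}

/* Number of ranks that would skip the branch because nroots < 0. */
static size_t count_ranks_skipping(const long *nroots, size_t nranks)
{
  size_t n = 0;
  for (size_t r = 0; r < nranks; ++r) n += (nroots[r] < 0) ? 1 : 0;
  return n;
}
```

With nroots = {10, -1, 7, 3} the first function returns false and the second
reports one rank skipping the branch, which is the situation that would leave
the other three ranks waiting inside the collective. As I understand it, a
PetscCheck() in place of the if would turn that into an error on the rank with
nroots < 0; it would not by itself explain why that rank's SF was never set up.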

I do have lots of empty processors in the first GAMG coarse grid. I just
saw that the first GAMG coarse grid reduces the processor count to 4, from
4K.
This is one case where the coarse grids could be repartitioned; for once,
that option could actually be used.

Do you have a bug fix suggestion for me to try?

Thanks


>   Barry
>
> Yes, it is possible that in your run nroots is always >= 0 and some MPI
> bug is causing the problem, but this doesn't change the fact that the
> current code is buggy and needs to be fixed before blaming some other bug
> for the problem.
>
>
>
> On Jan 8, 2023, at 4:04 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
>
>
> On Sun, Jan 8, 2023 at 2:44 PM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Sun, Jan 8, 2023 at 9:28 AM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>>   Mark,
>>>
>>>   Looks like the error checking in PetscCommDuplicate() is doing its
>>> job. It is reporting an attempt to use a PETSc object constructor on a
>>> subset of ranks of an MPI_Comm (which is, of course, fundamentally
>>> impossible in the PETSc/MPI model).
>>>
>>> Note that nroots can be negative on a particular rank but
>>> DMPlexLabelComplete_Internal() is collective on sf based on the comment in
>>> the code below
>>>
>>>
>>> struct _p_PetscSF {
>>> ....
>>>   PetscInt     nroots;  /* Number of root vertices on current process
>>> (candidates for incoming edges) */
>>>
>>> But the next routine calls a collective only when nroots >= 0
>>>
>>> static PetscErrorCode DMPlexLabelComplete_Internal(DM dm, DMLabel label,
>>> PetscBool completeCells){
>>> ...
>>>   PetscCall(PetscSFGetGraph(sfPoint, &nroots, NULL, NULL, NULL));
>>>   if (nroots >= 0) {
>>>     DMLabel         lblRoots, lblLeaves;
>>>     IS              valueIS, pointIS;
>>>     const PetscInt *values;
>>>     PetscInt        numValues, v;
>>>
>>>     /* Pull point contributions from remote leaves into local roots */
>>>     PetscCall(DMLabelGather(label, sfPoint, &lblLeaves));
>>>
>>>
>>> The code is four years old? How come this problem of calling the
>>> constructor on a subset of ranks hasn't come up since day 1?
>>>
>>
>> The contract here is that it should be impossible to have nroots < 0
>> (meaning the SF is not setup) on a subset of processes. Do we know that
>> this is happening?
>>
>
> Can't imagine a code bug here. Very simple code.
>
> This code does use GAMG as the coarse grid solver in a pretty extreme way.
> GAMG is fairly complicated and not used on such small problems with high
> parallelism.
> It is conceivable that it's a GAMG bug, but that is not what was going on
> in my initial email here.
>
> Here is a run that timed out, but it should not have, so I think this is
> the same issue. I always have perfectly distributed grids like this.
>
> DM Object: box 2048 MPI processes
>   type: plex
> box in 2 dimensions:
>   Min/Max of 0-cells per rank: 8385/8580
>   Min/Max of 1-cells per rank: 24768/24960
>   Min/Max of 2-cells per rank: 16384/16384
> Labels:
>   celltype: 3 strata with value/size (1 (24768), 3 (16384), 0 (8385))
>   depth: 3 strata with value/size (0 (8385), 1 (24768), 2 (16384))
>   marker: 1 strata with value/size (1 (385))
>   Face Sets: 1 strata with value/size (1 (381))
>   Defined by transform from:
>   DM_0x84000002_1 in 2 dimensions:
>     Min/Max of 0-cells per rank:   2145/2244
>     Min/Max of 1-cells per rank:   6240/6336
>     Min/Max of 2-cells per rank:   4096/4096
>   Labels:
>     celltype: 3 strata with value/size (1 (6240), 3 (4096), 0 (2145))
>     depth: 3 strata with value/size (0 (2145), 1 (6240), 2 (4096))
>     marker: 1 strata with value/size (1 (193))
>     Face Sets: 1 strata with value/size (1 (189))
>     Defined by transform from:
>     DM_0x84000002_2 in 2 dimensions:
>       Min/Max of 0-cells per rank:     561/612
>       Min/Max of 1-cells per rank:     1584/1632
>       Min/Max of 2-cells per rank:     1024/1024
>     Labels:
>       celltype: 3 strata with value/size (1 (1584), 3 (1024), 0 (561))
>       depth: 3 strata with value/size (0 (561), 1 (1584), 2 (1024))
>       marker: 1 strata with value/size (1 (97))
>       Face Sets: 1 strata with value/size (1 (93))
>       Defined by transform from:
>       DM_0x84000002_3 in 2 dimensions:
>         Min/Max of 0-cells per rank:       153/180
>         Min/Max of 1-cells per rank:       408/432
>         Min/Max of 2-cells per rank:       256/256
>       Labels:
>         celltype: 3 strata with value/size (1 (408), 3 (256), 0 (153))
>         depth: 3 strata with value/size (0 (153), 1 (408), 2 (256))
>         marker: 1 strata with value/size (1 (49))
>         Face Sets: 1 strata with value/size (1 (45))
>         Defined by transform from:
>         DM_0x84000002_4 in 2 dimensions:
>           Min/Max of 0-cells per rank:         45/60
>           Min/Max of 1-cells per rank:         108/120
>           Min/Max of 2-cells per rank:         64/64
>         Labels:
>           celltype: 3 strata with value/size (1 (108), 3 (64), 0 (45))
>           depth: 3 strata with value/size (0 (45), 1 (108), 2 (64))
>           marker: 1 strata with value/size (1 (25))
>           Face Sets: 1 strata with value/size (1 (21))
>           Defined by transform from:
>           DM_0x84000002_5 in 2 dimensions:
>             Min/Max of 0-cells per rank:           15/24
>             Min/Max of 1-cells per rank:           30/36
>             Min/Max of 2-cells per rank:           16/16
>           Labels:
>             celltype: 3 strata with value/size (1 (30), 3 (16), 0 (15))
>             depth: 3 strata with value/size (0 (15), 1 (30), 2 (16))
>             marker: 1 strata with value/size (1 (13))
>             Face Sets: 1 strata with value/size (1 (9))
>             Defined by transform from:
>             DM_0x84000002_6 in 2 dimensions:
>               Min/Max of 0-cells per rank:             6/12
>               Min/Max of 1-cells per rank:             9/12
>               Min/Max of 2-cells per rank:             4/4
>             Labels:
>               depth: 3 strata with value/size (0 (6), 1 (9), 2 (4))
>               celltype: 3 strata with value/size (0 (6), 1 (9), 3 (4))
>               marker: 1 strata with value/size (1 (7))
>               Face Sets: 1 strata with value/size (1 (3))
> 0 TS dt 0.001 time 0.
> MHD    0) time =         0, Eergy=  2.3259668003585e+00 (plot ID 0)
>     0 SNES Function norm 5.415286407365e-03
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> slurmstepd: error: *** STEP 245100.0 ON crusher002 CANCELLED AT
> 2023-01-08T15:32:43 DUE TO TIME LIMIT ***
>
>
>
>>
>>   Thanks,
>>
>>     Matt
>>
>>
>>> On Jan 8, 2023, at 12:21 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> I am running on Crusher, CPU only, 64 cores per node with Plex/PetscFE.
>>> In going up to 64 nodes, something really catastrophic is happening.
>>> I understand I am not using the machine the way it was intended, but I
>>> just want to see if there are any options that I could try for a quick
>>> fix/help.
>>>
>>> In a debug build I get a stack trace on many but not all of the 4K
>>> processes.
>>> Alas, I am not sure why this job was terminated but every process that I
>>> checked, that had an "ERROR", had this stack:
>>>
>>> 11:57 main *+=
>>> crusher:/gpfs/alpine/csc314/scratch/adams/mg-m3dc1/src/data$ grep ERROR
>>> slurm-245063.out |g 3160
>>> [3160]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [3160]PETSC ERROR: Caught signal number 15 Terminate: Some process (or
>>> the batch system) has told this process to end
>>> [3160]PETSC ERROR: Try option -start_in_debugger or
>>> -on_error_attach_debugger
>>> [3160]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and
>>> https://petsc.org/release/faq/
>>> [3160]PETSC ERROR: ---------------------  Stack Frames
>>> ------------------------------------
>>> [3160]PETSC ERROR: The line numbers in the error traceback are not
>>> always exact.
>>> [3160]PETSC ERROR: #1 MPI function
>>> [3160]PETSC ERROR: #2 PetscCommDuplicate() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/sys/objects/tagm.c:248
>>> [3160]PETSC ERROR: #3 PetscHeaderCreate_Private() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/sys/objects/inherit.c:56
>>> [3160]PETSC ERROR: #4 PetscSFCreate() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/interface/sf.c:65
>>> [3160]PETSC ERROR: #5 DMLabelGather() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/label/dmlabel.c:1932
>>> [3160]PETSC ERROR: #6 DMPlexLabelComplete_Internal() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/impls/plex/plexsubmesh.c:177
>>> [3160]PETSC ERROR: #7 DMPlexLabelComplete() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/impls/plex/plexsubmesh.c:227
>>> [3160]PETSC ERROR: #8 DMCompleteBCLabels_Internal() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:5301
>>> [3160]PETSC ERROR: #9 DMCopyDS() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:6117
>>> [3160]PETSC ERROR: #10 DMCopyDisc() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:6143
>>> [3160]PETSC ERROR: #11 SetupDiscretization() at
>>> /gpfs/alpine/csc314/scratch/adams/mg-m3dc1/src/mhd_2field.c:755
>>>
>>> Maybe the MPI is just getting overwhelmed.
>>>
>>> And I was able to get one run to work (one TS with beuler), but the
>>> solver performance was horrendous, and I see this (attached):
>>>
>>> Time (sec):           1.601e+02     1.001   1.600e+02
>>> VecMDot           111712 1.0 5.1684e+01 1.4 2.32e+07 12.8 0.0e+00 0.0e+00 1.1e+05 30  4  0  0 23  30  4  0  0 23   499
>>> VecNorm           163478 1.0 6.6660e+01 1.2 1.51e+07 21.5 0.0e+00 0.0e+00 1.6e+05 39  2  0  0 34  39  2  0  0 34   139
>>> VecNormalize      154599 1.0 6.3942e+01 1.2 2.19e+07 23.3 0.0e+00 0.0e+00 1.5e+05 38  2  0  0 32  38  2  0  0 32   189
>>> etc,
>>> KSPSolve               3 1.0 1.1553e+02 1.0 1.34e+09 47.1 2.8e+09 6.0e+01 2.8e+05 72 95 45 72 58  72 95 45 72 58  4772
>>>
>>> Any ideas would be welcome,
>>> Thanks,
>>> Mark
>>> <cushersolve.txt>
>>>
>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>>
>
>