[petsc-dev] bad cpu/MPI performance problem

Jed Brown jed at jedbrown.org
Sun Jan 8 15:29:24 CST 2023


nroots=-1 is like a collective setup_called=false flag, except that this SF is always in that not-set-up state for serial runs. I don't think that's a bug per so, though perhaps you'd like it to be conveyed differently.

Barry Smith <bsmith at petsc.dev> writes:

>    There is a bug in the routine DMPlexLabelComplete_Internal()! The code should definitely not have the code route around if (nroots >=0) because checking the nroots value to decide on the code route is simply nonsense (if one "knows" "by contract" that nroots is >=0 then the if () test is not needed.
>
>    The first thing to do is to fix the bug with a PetscCheck() remove the nonsensical if (nroots >=0) check and rerun you code to see what happens.
>
>   Barry
>
> Yes it is possible that in your run the nroots is always >= 0 and some MPI bug is causing the problem but this doesn't change the fact that the current code is buggy and needs to be fixed before blaming some other bug for the problem.
>
>
>
>> On Jan 8, 2023, at 4:04 PM, Mark Adams <mfadams at lbl.gov> wrote:
>> 
>> 
>> 
>> On Sun, Jan 8, 2023 at 2:44 PM Matthew Knepley <knepley at gmail.com <mailto:knepley at gmail.com>> wrote:
>>> On Sun, Jan 8, 2023 at 9:28 AM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
>>>> 
>>>>   Mark,
>>>> 
>>>>   Looks like the error checking in PetscCommDuplicate() is doing its job. It is reporting an attempt to use an PETSc object constructer on a subset of ranks of an MPI_Comm (which is, of course, fundamentally impossible in the PETSc/MPI model)
>>>> 
>>>> Note that nroots can be negative on a particular rank but DMPlexLabelComplete_Internal() is collective on sf based on the comment in the code below
>>>> 
>>>> 
>>>> struct _p_PetscSF {
>>>> ....
>>>>   PetscInt     nroots;  /* Number of root vertices on current process (candidates for incoming edges) */
>>>> 
>>>> But the next routine calls a collective only when nroots >= 0 
>>>> 
>>>> static PetscErrorCode DMPlexLabelComplete_Internal(DM dm, DMLabel label, PetscBool completeCells){
>>>> ...
>>>>   PetscCall(PetscSFGetGraph(sfPoint, &nroots, NULL, NULL, NULL));
>>>>   if (nroots >= 0) {
>>>>     DMLabel         lblRoots, lblLeaves;
>>>>     IS              valueIS, pointIS;
>>>>     const PetscInt *values;
>>>>     PetscInt        numValues, v;
>>>> 
>>>>     /* Pull point contributions from remote leaves into local roots */
>>>>     PetscCall(DMLabelGather(label, sfPoint, &lblLeaves));
>>>> 
>>>> 
>>>> The code is four years old? How come this problem of calling the constructure on a subset of ranks hasn't come up since day 1? 
>>> 
>>> The contract here is that it should be impossible to have nroots < 0 (meaning the SF is not setup) on a subset of processes. Do we know that this is happening?
>> 
>> Can't imagine a code bug here. Very simple code.
>> 
>> This code does use GAMG as the coarse grid solver in a pretty extreme way.
>> GAMG is fairly complicated and not used on such small problems with high parallelism.
>> It is conceivable that its a GAMG bug, but that is not what was going on in my initial emal here.
>> 
>> Here is a run that timed out, but it should not have so I think this is the same issue. I always have perfectly distributed grids like this.
>> 
>> DM Object: box 2048 MPI processes
>>   type: plex
>> box in 2 dimensions:
>>   Min/Max of 0-cells per rank: 8385/8580
>>   Min/Max of 1-cells per rank: 24768/24960
>>   Min/Max of 2-cells per rank: 16384/16384
>> Labels:
>>   celltype: 3 strata with value/size (1 (24768), 3 (16384), 0 (8385))
>>   depth: 3 strata with value/size (0 (8385), 1 (24768), 2 (16384))
>>   marker: 1 strata with value/size (1 (385))
>>   Face Sets: 1 strata with value/size (1 (381))
>>   Defined by transform from:
>>   DM_0x84000002_1 in 2 dimensions:
>>     Min/Max of 0-cells per rank:   2145/2244
>>     Min/Max of 1-cells per rank:   6240/6336
>>     Min/Max of 2-cells per rank:   4096/4096
>>   Labels:
>>     celltype: 3 strata with value/size (1 (6240), 3 (4096), 0 (2145))
>>     depth: 3 strata with value/size (0 (2145), 1 (6240), 2 (4096))
>>     marker: 1 strata with value/size (1 (193))
>>     Face Sets: 1 strata with value/size (1 (189))
>>     Defined by transform from:
>>     DM_0x84000002_2 in 2 dimensions:
>>       Min/Max of 0-cells per rank:     561/612
>>       Min/Max of 1-cells per rank:     1584/1632
>>       Min/Max of 2-cells per rank:     1024/1024
>>     Labels:
>>       celltype: 3 strata with value/size (1 (1584), 3 (1024), 0 (561))
>>       depth: 3 strata with value/size (0 (561), 1 (1584), 2 (1024))
>>       marker: 1 strata with value/size (1 (97))
>>       Face Sets: 1 strata with value/size (1 (93))
>>       Defined by transform from:
>>       DM_0x84000002_3 in 2 dimensions:
>>         Min/Max of 0-cells per rank:       153/180
>>         Min/Max of 1-cells per rank:       408/432
>>         Min/Max of 2-cells per rank:       256/256
>>       Labels:
>>         celltype: 3 strata with value/size (1 (408), 3 (256), 0 (153))
>>         depth: 3 strata with value/size (0 (153), 1 (408), 2 (256))
>>         marker: 1 strata with value/size (1 (49))
>>         Face Sets: 1 strata with value/size (1 (45))
>>         Defined by transform from:
>>         DM_0x84000002_4 in 2 dimensions:
>>           Min/Max of 0-cells per rank:         45/60
>>           Min/Max of 1-cells per rank:         108/120
>>           Min/Max of 2-cells per rank:         64/64
>>         Labels:
>>           celltype: 3 strata with value/size (1 (108), 3 (64), 0 (45))
>>           depth: 3 strata with value/size (0 (45), 1 (108), 2 (64))
>>           marker: 1 strata with value/size (1 (25))
>>           Face Sets: 1 strata with value/size (1 (21))
>>           Defined by transform from:
>>           DM_0x84000002_5 in 2 dimensions:
>>             Min/Max of 0-cells per rank:           15/24
>>             Min/Max of 1-cells per rank:           30/36
>>             Min/Max of 2-cells per rank:           16/16
>>           Labels:
>>             celltype: 3 strata with value/size (1 (30), 3 (16), 0 (15))
>>             depth: 3 strata with value/size (0 (15), 1 (30), 2 (16))
>>             marker: 1 strata with value/size (1 (13))
>>             Face Sets: 1 strata with value/size (1 (9))
>>             Defined by transform from:
>>             DM_0x84000002_6 in 2 dimensions:
>>               Min/Max of 0-cells per rank:             6/12
>>               Min/Max of 1-cells per rank:             9/12
>>               Min/Max of 2-cells per rank:             4/4
>>             Labels:
>>               depth: 3 strata with value/size (0 (6), 1 (9), 2 (4))
>>               celltype: 3 strata with value/size (0 (6), 1 (9), 3 (4))
>>               marker: 1 strata with value/size (1 (7))
>>               Face Sets: 1 strata with value/size (1 (3))
>> 0 TS dt 0.001 time 0.
>> MHD    0) time =         0, Eergy=  2.3259668003585e+00 (plot ID 0)
>>     0 SNES Function norm 5.415286407365e-03
>> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>> slurmstepd: error: *** STEP 245100.0 ON crusher002 CANCELLED AT 2023-01-08T15:32:43 DUE TO TIME LIMIT ***
>> 
>>  
>>> 
>>>   Thanks,
>>> 
>>>     Matt
>>>  
>>>>> On Jan 8, 2023, at 12:21 PM, Mark Adams <mfadams at lbl.gov <mailto:mfadams at lbl.gov>> wrote:
>>>>> 
>>>>> I am running on Crusher, CPU only, 64 cores per node with Plex/PetscFE. 
>>>>> In going up to 64 nodes, something really catastrophic is happening. 
>>>>> I understand I am not using the machine the way it was intended, but I just want to see if there are any options that I could try for a quick fix/help.
>>>>> 
>>>>> In a debug build I get a stack trace on many but not all of the 4K processes. 
>>>>> Alas, I am not sure why this job was terminated but every process that I checked, that had an "ERROR", had this stack:
>>>>> 
>>>>> 11:57 main *+= crusher:/gpfs/alpine/csc314/scratch/adams/mg-m3dc1/src/data$ grep ERROR slurm-245063.out |g 3160
>>>>> [3160]PETSC ERROR: ------------------------------------------------------------------------
>>>>> [3160]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
>>>>> [3160]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>> [3160]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
>>>>> [3160]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>>>>> [3160]PETSC ERROR: The line numbers in the error traceback are not always exact.
>>>>> [3160]PETSC ERROR: #1 MPI function
>>>>> [3160]PETSC ERROR: #2 PetscCommDuplicate() at /gpfs/alpine/csc314/scratch/adams/petsc/src/sys/objects/tagm.c:248
>>>>> [3160]PETSC ERROR: #3 PetscHeaderCreate_Private() at /gpfs/alpine/csc314/scratch/adams/petsc/src/sys/objects/inherit.c:56
>>>>> [3160]PETSC ERROR: #4 PetscSFCreate() at /gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/interface/sf.c:65
>>>>> [3160]PETSC ERROR: #5 DMLabelGather() at /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/label/dmlabel.c:1932
>>>>> [3160]PETSC ERROR: #6 DMPlexLabelComplete_Internal() at /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/impls/plex/plexsubmesh.c:177
>>>>> [3160]PETSC ERROR: #7 DMPlexLabelComplete() at /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/impls/plex/plexsubmesh.c:227
>>>>> [3160]PETSC ERROR: #8 DMCompleteBCLabels_Internal() at /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:5301
>>>>> [3160]PETSC ERROR: #9 DMCopyDS() at /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:6117
>>>>> [3160]PETSC ERROR: #10 DMCopyDisc() at /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:6143
>>>>> [3160]PETSC ERROR: #11 SetupDiscretization() at /gpfs/alpine/csc314/scratch/adams/mg-m3dc1/src/mhd_2field.c:755
>>>>> 
>>>>> Maybe the MPI is just getting overwhelmed. 
>>>>> 
>>>>> And I was able to get one run to to work (one TS with beuler), and the solver performance was horrendous and I see this (attached):
>>>>> 
>>>>> Time (sec):           1.601e+02     1.001   1.600e+02
>>>>> VecMDot           111712 1.0 5.1684e+01 1.4 2.32e+07 12.8 0.0e+00 0.0e+00 1.1e+05 30  4  0  0 23  30  4  0  0 23   499
>>>>> VecNorm           163478 1.0 6.6660e+01 1.2 1.51e+07 21.5 0.0e+00 0.0e+00 1.6e+05 39  2  0  0 34  39  2  0  0 34   139
>>>>> VecNormalize      154599 1.0 6.3942e+01 1.2 2.19e+07 23.3 0.0e+00 0.0e+00 1.5e+05 38  2  0  0 32  38  2  0  0 32   189
>>>>> etc,
>>>>> KSPSolve               3 1.0 1.1553e+02 1.0 1.34e+09 47.1 2.8e+09 6.0e+01 2.8e+05 72 95 45 72 58  72 95 45 72 58  4772
>>>>> 
>>>>> Any ideas would be welcome,
>>>>> Thanks,
>>>>> Mark
>>>>> <cushersolve.txt>
>>>> 
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>


More information about the petsc-dev mailing list