[petsc-dev] bad cpu/MPI performance problem
Mark Adams
mfadams at lbl.gov
Sun Jan 8 15:04:37 CST 2023
On Sun, Jan 8, 2023 at 2:44 PM Matthew Knepley <knepley at gmail.com> wrote:
> On Sun, Jan 8, 2023 at 9:28 AM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>> Mark,
>>
>> Looks like the error checking in PetscCommDuplicate() is doing its job.
>> It is reporting an attempt to use a PETSc object constructor on a subset
>> of the ranks of an MPI_Comm (which is, of course, fundamentally impossible
>> in the PETSc/MPI model).
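>>
>> For illustration only (a made-up fragment, not something from Mark's code):
>> every PETSc object constructor is collective on its communicator, so a
>> pattern like
>>
>>   if (rank == 0) PetscCall(PetscSFCreate(PETSC_COMM_WORLD, &sf)); /* wrong: only rank 0 enters the collective constructor */
>>
>> is erroneous, because the other ranks never make the matching call on
>> PETSC_COMM_WORLD.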
>>
>> Note that nroots can be negative on a particular rank, but
>> DMPlexLabelComplete_Internal() is collective on the sf, based on the
>> comment in the code below:
>>
>>
>> struct _p_PetscSF {
>>   ....
>>   PetscInt nroots; /* Number of root vertices on current process (candidates for incoming edges) */
>>
>> But the next routine calls a collective only when nroots >= 0
>>
>> static PetscErrorCode DMPlexLabelComplete_Internal(DM dm, DMLabel label, PetscBool completeCells)
>> {
>>   ...
>>   PetscCall(PetscSFGetGraph(sfPoint, &nroots, NULL, NULL, NULL));
>>   if (nroots >= 0) {
>>     DMLabel         lblRoots, lblLeaves;
>>     IS              valueIS, pointIS;
>>     const PetscInt *values;
>>     PetscInt        numValues, v;
>>
>>     /* Pull point contributions from remote leaves into local roots */
>>     PetscCall(DMLabelGather(label, sfPoint, &lblLeaves));
>>
>>
>> The code is four years old? How come this problem of calling the
>> constructor on a subset of ranks hasn't come up since day 1?
>>
>
> The contract here is that it should be impossible to have nroots < 0
> (meaning the SF is not set up) on a subset of processes. Do we know that
> this is happening?
>
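One way to actually check that, just a sketch (CheckSFSetupConsistency is a
hypothetical helper I am making up here, not anything in PETSc), would be an
allreduce on the sign of nroots right before the branch in
DMPlexLabelComplete_Internal():

#include <petscsf.h>

static PetscErrorCode CheckSFSetupConsistency(PetscSF sf)
{
  PetscInt    nroots;
  PetscMPIInt local, min, max;
  MPI_Comm    comm;

  PetscFunctionBegin;
  PetscCall(PetscObjectGetComm((PetscObject)sf, &comm));
  /* nroots is negative on a rank whose SF graph has not been set */
  PetscCall(PetscSFGetGraph(sf, &nroots, NULL, NULL, NULL));
  local = (nroots >= 0) ? 1 : 0;
  PetscCallMPI(MPI_Allreduce(&local, &min, 1, MPI_INT, MPI_MIN, comm));
  PetscCallMPI(MPI_Allreduce(&local, &max, 1, MPI_INT, MPI_MAX, comm));
  /* if min != max, some ranks would take the collective branch and others would not */
  PetscCheck(min == max, comm, PETSC_ERR_PLIB, "PetscSF graph is set on only a subset of ranks");
  PetscFunctionReturn(0);
}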
That said, I can't imagine a code bug here; the code is very simple.
This code does use GAMG as the coarse grid solver in a pretty extreme way:
GAMG is fairly complicated and is not normally used on such small problems
with this much parallelism.
It is conceivable that it's a GAMG bug, but that is not what was going on in
my initial email here.
Here is a run that timed out, but it should not have, so I think this is the
same issue. I always have perfectly distributed grids like this:
DM Object: box 2048 MPI processes
type: plex
box in 2 dimensions:
Min/Max of 0-cells per rank: 8385/8580
Min/Max of 1-cells per rank: 24768/24960
Min/Max of 2-cells per rank: 16384/16384
Labels:
celltype: 3 strata with value/size (1 (24768), 3 (16384), 0 (8385))
depth: 3 strata with value/size (0 (8385), 1 (24768), 2 (16384))
marker: 1 strata with value/size (1 (385))
Face Sets: 1 strata with value/size (1 (381))
Defined by transform from:
DM_0x84000002_1 in 2 dimensions:
Min/Max of 0-cells per rank: 2145/2244
Min/Max of 1-cells per rank: 6240/6336
Min/Max of 2-cells per rank: 4096/4096
Labels:
celltype: 3 strata with value/size (1 (6240), 3 (4096), 0 (2145))
depth: 3 strata with value/size (0 (2145), 1 (6240), 2 (4096))
marker: 1 strata with value/size (1 (193))
Face Sets: 1 strata with value/size (1 (189))
Defined by transform from:
DM_0x84000002_2 in 2 dimensions:
Min/Max of 0-cells per rank: 561/612
Min/Max of 1-cells per rank: 1584/1632
Min/Max of 2-cells per rank: 1024/1024
Labels:
celltype: 3 strata with value/size (1 (1584), 3 (1024), 0 (561))
depth: 3 strata with value/size (0 (561), 1 (1584), 2 (1024))
marker: 1 strata with value/size (1 (97))
Face Sets: 1 strata with value/size (1 (93))
Defined by transform from:
DM_0x84000002_3 in 2 dimensions:
Min/Max of 0-cells per rank: 153/180
Min/Max of 1-cells per rank: 408/432
Min/Max of 2-cells per rank: 256/256
Labels:
celltype: 3 strata with value/size (1 (408), 3 (256), 0 (153))
depth: 3 strata with value/size (0 (153), 1 (408), 2 (256))
marker: 1 strata with value/size (1 (49))
Face Sets: 1 strata with value/size (1 (45))
Defined by transform from:
DM_0x84000002_4 in 2 dimensions:
Min/Max of 0-cells per rank: 45/60
Min/Max of 1-cells per rank: 108/120
Min/Max of 2-cells per rank: 64/64
Labels:
celltype: 3 strata with value/size (1 (108), 3 (64), 0 (45))
depth: 3 strata with value/size (0 (45), 1 (108), 2 (64))
marker: 1 strata with value/size (1 (25))
Face Sets: 1 strata with value/size (1 (21))
Defined by transform from:
DM_0x84000002_5 in 2 dimensions:
Min/Max of 0-cells per rank: 15/24
Min/Max of 1-cells per rank: 30/36
Min/Max of 2-cells per rank: 16/16
Labels:
celltype: 3 strata with value/size (1 (30), 3 (16), 0 (15))
depth: 3 strata with value/size (0 (15), 1 (30), 2 (16))
marker: 1 strata with value/size (1 (13))
Face Sets: 1 strata with value/size (1 (9))
Defined by transform from:
DM_0x84000002_6 in 2 dimensions:
Min/Max of 0-cells per rank: 6/12
Min/Max of 1-cells per rank: 9/12
Min/Max of 2-cells per rank: 4/4
Labels:
depth: 3 strata with value/size (0 (6), 1 (9), 2 (4))
celltype: 3 strata with value/size (0 (6), 1 (9), 3 (4))
marker: 1 strata with value/size (1 (7))
Face Sets: 1 strata with value/size (1 (3))
0 TS dt 0.001 time 0.
MHD 0) time = 0, Eergy= 2.3259668003585e+00 (plot ID 0)
0 SNES Function norm 5.415286407365e-03
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 245100.0 ON crusher002 CANCELLED AT
2023-01-08T15:32:43 DUE TO TIME LIMIT ***
>
> Thanks,
>
> Matt
>
>
>> On Jan 8, 2023, at 12:21 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>> I am running on Crusher, CPU only, 64 cores per node with Plex/PetscFE.
>> In going up to 64 nodes, something really catastrophic is happening.
>> I understand I am not using the machine the way it was intended, but I
>> just want to see if there are any options that I could try for a quick
>> fix/help.
>>
>> In a debug build I get a stack trace on many but not all of the 4K
>> processes.
>> Alas, I am not sure why this job was terminated, but every process that I
>> checked that had an "ERROR" had this stack:
>>
>> 11:57 main *+= crusher:/gpfs/alpine/csc314/scratch/adams/mg-m3dc1/src/data$ grep ERROR slurm-245063.out |g 3160
>> [3160]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [3160]PETSC ERROR: Caught signal number 15 Terminate: Some process (or
>> the batch system) has told this process to end
>> [3160]PETSC ERROR: Try option -start_in_debugger or
>> -on_error_attach_debugger
>> [3160]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and
>> https://petsc.org/release/faq/
>> [3160]PETSC ERROR: --------------------- Stack Frames
>> ------------------------------------
>> [3160]PETSC ERROR: The line numbers in the error traceback are not always
>> exact.
>> [3160]PETSC ERROR: #1 MPI function
>> [3160]PETSC ERROR: #2 PetscCommDuplicate() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/sys/objects/tagm.c:248
>> [3160]PETSC ERROR: #3 PetscHeaderCreate_Private() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/sys/objects/inherit.c:56
>> [3160]PETSC ERROR: #4 PetscSFCreate() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/interface/sf.c:65
>> [3160]PETSC ERROR: #5 DMLabelGather() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/label/dmlabel.c:1932
>> [3160]PETSC ERROR: #6 DMPlexLabelComplete_Internal() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/impls/plex/plexsubmesh.c:177
>> [3160]PETSC ERROR: #7 DMPlexLabelComplete() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/impls/plex/plexsubmesh.c:227
>> [3160]PETSC ERROR: #8 DMCompleteBCLabels_Internal() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:5301
>> [3160]PETSC ERROR: #9 DMCopyDS() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:6117
>> [3160]PETSC ERROR: #10 DMCopyDisc() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:6143
>> [3160]PETSC ERROR: #11 SetupDiscretization() at
>> /gpfs/alpine/csc314/scratch/adams/mg-m3dc1/src/mhd_2field.c:755
>>
>> Maybe the MPI is just getting overwhelmed.
>>
>> And I was able to get one run to work (one TS with beuler), and the
>> solver performance was horrendous; I see this (attached):
>>
>> Time (sec):           1.601e+02     1.001   1.600e+02
>> VecMDot      111712 1.0 5.1684e+01 1.4 2.32e+07 12.8 0.0e+00 0.0e+00 1.1e+05 30  4  0  0 23  30  4  0  0 23   499
>> VecNorm      163478 1.0 6.6660e+01 1.2 1.51e+07 21.5 0.0e+00 0.0e+00 1.6e+05 39  2  0  0 34  39  2  0  0 34   139
>> VecNormalize 154599 1.0 6.3942e+01 1.2 2.19e+07 23.3 0.0e+00 0.0e+00 1.5e+05 38  2  0  0 32  38  2  0  0 32   189
>> etc.
>> KSPSolve          3 1.0 1.1553e+02 1.0 1.34e+09 47.1 2.8e+09 6.0e+01 2.8e+05 72 95 45 72 58  72 95 45 72 58  4772
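>>
>> If I am reading the usual -log_view columns right (my reading, not
>> definitive): the 12.8/21.5/23.3/47.1 entries are max/min flop ratios, so
>> the vector work is badly imbalanced across ranks, and the 1.1e+05-2.8e+05
>> entries are reduction counts, so most of the run time is sitting in global
>> reductions inside the Krylov solve at this process count. If it really is
>> reduction latency, a pipelined Krylov method might be worth a try, e.g.
>> (just a guess on my part, not something I have tested here):
>>
>>   -ksp_type pipefgmres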
>>
>> Any ideas would be welcome,
>> Thanks,
>> Mark
>> <cushersolve.txt>
>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>