[petsc-users] Sporadic MPI_Allreduce() called in different locations on larger core counts

Mark Lohry mlohry at gmail.com
Sun Aug 18 13:19:19 CDT 2019


Barry, thanks for your suggestion to do the serial coloring on the mesh
itself (the block-size-1 case) first and then manually color the blocks. It
works like a charm. The 2-million-cell case is small enough to create the
sparse system on one process and color it in about a second.
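
A minimal sketch of that expansion step, for reference. The names here are
placeholders for what the application already has: cellColor[] holds one color
per local cell from the serial coloring of the block-size-1 matrix, nCells and
nCellColors are the local cell count and number of cell colors, and ctx.JPre is
the blocked Jacobian.

    ISColoringValue *dofColor;
    ISColoring      iscoloring;
    MatFDColoring   fdcoloring;
    PetscInt        bs = 50; /* block size: 5 equations x 10 nodes per element */

    PetscMalloc1(nCells*bs, &dofColor);
    for (PetscInt c = 0; c < nCells; c++)
      for (PetscInt k = 0; k < bs; k++)
        /* DOF k of cell c: columns within a block get distinct colors, and blocks
           that share a row (distance-2 cell neighbors) never share a base color */
        dofColor[c*bs + k] = (ISColoringValue)(cellColor[c]*bs + k);

    ISColoringCreate(PETSC_COMM_WORLD, nCellColors*bs, nCells*bs, dofColor,
                     PETSC_OWN_POINTER, &iscoloring);
    MatFDColoringCreate(ctx.JPre, iscoloring, &fdcoloring);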

On Sun, Aug 11, 2019 at 9:41 PM Mark Lohry <mlohry at gmail.com> wrote:

> So the parallel JP runs just as proportionally slow in serial as it does
> in parallel.
>
> valgrind --tool=callgrind shows essentially 100% of the runtime in
> jp.c:255-262, within the larger loop commented
> /* pass two -- color it by looking at nearby vertices and building a mask
> */
>
> for (j=0; j<ncols; j++) {
>   if (seen[cols[j]] != cidx) {
>     bidx++;
>     seen[cols[j]]  = cidx;
>     idxbuf[bidx]   = cols[j];
>     distbuf[bidx]  = dist+1;
>   }
> }
>
> I'll dig into how this algorithm is supposed to work, but anything obvious
> in there? It kinda feels like something is doing something N^2 or worse
> when it doesn't need to be.
>
> On Sun, Aug 11, 2019 at 3:47 PM Mark Lohry <mlohry at gmail.com> wrote:
>
>> Sorry, forgot to reply to the mailing list.
>>
>> where does your matrix come from? A mesh? Structured, unstructured, a
>>> graph, something else? What type of discretization?
>>
>>
>> Unstructured tetrahedral mesh (CGNS; I can give links to the files if
>> that's of interest). The discretization is arbitrary-order discontinuous
>> Galerkin for compressible Navier-Stokes: 5 coupled equations x 10 nodes per
>> element for this 2nd-order case, giving the 50x50 blocks. Each tet cell
>> depends on its face neighbors, so for tets there are 4 extra off-diagonal
>> blocks per cell.
>>
>> I would expect one could exploit the large block size here in computing
>> the coloring -- the underlying mesh is 2M nodes with the same connectivity
>> as a standard cell-centered finite volume method.
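>>
>> A rough sketch of building that block-size-1 cell-connectivity matrix, which
>> could then be handed to a serial MatColoring (or built on one rank with
>> MatCreateSeqAIJ instead). The names ncellsLocal, cellGlobalIndex[], and
>> neighborGlobalIndex[][] are hypothetical stand-ins for what the mesh already
>> provides, with boundary faces marked by a negative neighbor index:
>>
>>     Mat Cadj;
>>     MatCreateAIJ(PETSC_COMM_WORLD, ncellsLocal, ncellsLocal, PETSC_DETERMINE,
>>                  PETSC_DETERMINE, 5, NULL, 4, NULL, &Cadj);
>>     for (PetscInt c = 0; c < ncellsLocal; c++) {
>>       PetscInt    row = cellGlobalIndex[c];
>>       PetscScalar one = 1.0;
>>       MatSetValues(Cadj, 1, &row, 1, &row, &one, INSERT_VALUES);
>>       for (PetscInt f = 0; f < 4; f++) { /* the 4 face neighbors of a tet */
>>         PetscInt col = neighborGlobalIndex[c][f];
>>         if (col >= 0) MatSetValues(Cadj, 1, &row, 1, &col, &one, INSERT_VALUES);
>>       }
>>     }
>>     MatAssemblyBegin(Cadj, MAT_FINAL_ASSEMBLY);
>>     MatAssemblyEnd(Cadj, MAT_FINAL_ASSEMBLY);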
>>
>>
>>
>> On Sun, Aug 11, 2019 at 2:12 PM Smith, Barry F. <bsmith at mcs.anl.gov>
>> wrote:
>>
>>>
>>>   These are due to attempting to copy the entire matrix to one process
>>> and do the sequential coloring there. That definitely won't work for larger
>>> problems, so we'll need to focus on
>>>
>>> 1) having useful parallel coloring and
>>> 2) maybe using an alternative way to determine the coloring:
>>>
>>>      where does your matrix come from? A mesh? Structured, unstructured,
>>> a graph, something else? What type of discretization?
>>>
>>>    Barry
>>>
>>>
>>> > On Aug 11, 2019, at 10:21 AM, Mark Lohry <mlohry at gmail.com> wrote:
>>> >
>>> > On the very large case, there does appear to be some kind of overflow
>>> ending up with an attempt to allocate too much memory in MatFDColoringCreate,
>>> even with --with-64-bit-indices. Full terminal output here:
>>> >
>>> https://raw.githubusercontent.com/mlohry/petsc_miscellany/master/slurm-3451378.out
>>> >
>>> > In particular:
>>> > PETSC ERROR: Memory requested 1036713571771129344
>>> >
>>> > Log filename here:
>>> > https://github.com/mlohry/petsc_miscellany/blob/master/petsclogfile.0
>>> >
>>> > On Sun, Aug 11, 2019 at 9:49 AM Mark Lohry <mlohry at gmail.com> wrote:
>>> > Hi Barry, I made a minimum example comparing the colorings on a very
>>> small case. You'll need to unzip the jacobian_sparsity.tgz to run it.
>>> >
>>> > https://github.com/mlohry/petsc_miscellany
>>> >
>>> > This is a sparse block system with 50x50 blocks, ~7,680 of them.
>>> Comparing the coloring types sl, lf, jp, id, and greedy, I get these
>>> wallclock timings running with -np 16:
>>> >
>>> > SL: 1.5s
>>> > LF: 1.3s
>>> > JP: 29s !
>>> > ID: 1.4s
>>> > greedy: 2s
>>> >
>>> > As far as I'm aware, JP is the only parallel coloring implemented? It
>>> looks as though I'm simply running out of memory with the sequential
>>> methods (I should apologize to my cluster admin for chewing up 10TB and
>>> crashing...).
>>> >
>>> > On this small problem JP is taking 30 seconds wallclock, but that time
>>> grows exponentially with larger problems (last I tried it, I killed the job
>>> after 24 hours of spinning).
>>> >
>>> > Also as I mentioned, the "greedy" method appears to be producing an
>>> invalid coloring for me unless I also specify weights "lexical". But
>>> "-mat_coloring_test" doesn't complain. I'll have to make a different
>>> example to actually show it's an invalid coloring.
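>>> >
>>> > One way to show it concretely (just a sketch, not something I've run --
>>> > Jcolored and Jdense are hypothetical handles, Jdense a MATDENSE matrix on a
>>> > problem small enough to form it, and the SNES Jacobian wired to the
>>> > MatFDColoring) would be to compare the colored FD Jacobian against the
>>> > brute-force uncolored one:
>>> >
>>> >     PetscReal nrm;
>>> >     SNESComputeJacobian(snes, x, Jcolored, Jcolored);          /* colored FD path */
>>> >     SNESComputeJacobianDefault(snes, x, Jdense, Jdense, NULL); /* brute-force FD */
>>> >     MatAXPY(Jdense, -1.0, Jcolored, DIFFERENT_NONZERO_PATTERN);
>>> >     MatNorm(Jdense, NORM_FROBENIUS, &nrm); /* should be ~0 for a valid coloring */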
>>> >
>>> > Thanks,
>>> > Mark
>>> >
>>> >
>>> >
>>> > On Sat, Aug 10, 2019 at 4:38 PM Smith, Barry F. <bsmith at mcs.anl.gov>
>>> wrote:
>>> >
>>> >   Mark,
>>> >
>>> >    Would you be able to cook up an example (or examples) that
>>> demonstrate the problem (or problems) and how to run it? If you send it to
>>> us and we can reproduce the problem then we'll fix it. If need be you can
>>> send large matrices to petsc-maint at mcs.anl.gov; don't send them to
>>> petsc-users, since it will reject large files.
>>> >
>>> >    Barry
>>> >
>>> >
>>> > > On Aug 10, 2019, at 1:56 PM, Mark Lohry <mlohry at gmail.com> wrote:
>>> > >
>>> > > Thanks Barry, I've been trying all of the above. I think I've homed
>>> in on an out-of-memory and/or integer overflow inside MatColoringApply,
>>> which makes some sense since I only have a sequential coloring algorithm
>>> working...
>>> > >
>>> > > Is anyone out there using coloring in parallel? I still have the
>>> same previously mentioned issues with MATCOLORINGJP (on small problems it
>>> takes upwards of 30 minutes to run), which as far as I can see is the only
>>> "parallel" implementation. MATCOLORINGSL and MATCOLORINGID both work on
>>> smaller problems, MATCOLORINGGREEDY works on smaller problems if and only
>>> if I set the weight type to MAT_COLORING_WEIGHT_LEXICAL, and all 3 are
>>> failing on larger problems.
>>> > >
>>> > > On Tue, Aug 6, 2019 at 9:36 AM Smith, Barry F. <bsmith at mcs.anl.gov>
>>> wrote:
>>> > >
>>> > >   There is also
>>> > >
>>> > > $ ./configure --help | grep color
>>> > >   --with-is-color-value-type=<char,short>
>>> > >        char, short can store 256, 65536 colors  current: short
>>> > >
>>> > > I can't imagine you have over 65k colors, but it's something to check.
>>> > >
>>> > >
>>> > > > On Aug 6, 2019, at 8:19 AM, Mark Lohry <mlohry at gmail.com> wrote:
>>> > > >
>>> > > > My first guess is that the code is getting integer overflow
>>> somewhere. 25 billion is well over the 2 billion that 32 bit integers can
>>> hold.
>>> > > >
>>> > > > Mine as well -- though in later tests I have the same issue when
>>> using --with-64-bit-indices. Ironically I had removed that flag at some
>>> point because the coloring / index set was using a serious chunk of total
>>> memory on medium-sized problems.
>>> > >
>>> > >   Understood
>>> > >
>>> > > >
>>> > > > Questions on the PETSc internals there, though: are matrices
>>> indexed with two integers (i,j) so the max matrix dimension is (int limit)
>>> x (int limit) or a single integer so the max dimension is sqrt(int limit)?
>>> > > > Also I was operating under the assumption the 32 bit limit should
>>> only constrain per-process problem sizes (25B over 400 processes giving 62M
>>> non-zeros per process), is that not right?
>>> > >
>>> > >    It is mostly right but may not be right for everything in PETSc.
>>> For example, I don't know about the MatFD code.
>>> > >
>>> > >    Since using a debugger is not practical at large core counts to
>>> find the point where the two processes diverge, you can try
>>> > >
>>> > > -log_trace
>>> > >
>>> > > or
>>> > >
>>> > > -log_trace filename
>>> > >
>>> > > In the second case it will generate one file per core called
>>> filename.%d. Note that it will produce a lot of output.
>>> > >
>>> > >   Good luck
>>> > >
>>> > >
>>> > >
>>> > > >
>>> > > >    We are adding more tests to nicely handle integer overflow but
>>> it is not easy since it can occur in so many places
>>> > > >
>>> > > > Totally understood. I know the pain of only finding an overflow
>>> bug after days of waiting in a cluster queue for a big job.
>>> > > >
>>> > > > We urge you to upgrade.
>>> > > >
>>> > > > I'll do that today and hope for the best. On first tests on
>>> 3.11.3, I still have a couple issues with the coloring code:
>>> > > >
>>> > > > * I am still getting the nasty hangs with MATCOLORINGJP mentioned
>>> here:
>>> https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2017-October/033746.html
>>> > > > * MatColoringSetType(coloring, MATCOLORINGGREEDY);  this produces
>>> a wrong jacobian unless I also set MatColoringSetWeightType(coloring,
>>> MAT_COLORING_WEIGHT_LEXICAL);
>>> > > > * MATCOLORINGMIS mentioned in the documentation doesn't seem to
>>> exist.
>>> > > >
>>> > > > Thanks,
>>> > > > Mark
>>> > > >
>>> > > > On Tue, Aug 6, 2019 at 8:56 AM Smith, Barry F. <bsmith at mcs.anl.gov>
>>> wrote:
>>> > > >
>>> > > >    My first guess is that the code is getting integer overflow
>>> somewhere. 25 billion is well over the 2 billion that 32 bit integers can
>>> hold.
>>> > > >
>>> > > >    We urge you to upgrade.
>>> > > >
>>> > > >    Regardless, for problems this large you likely need the
>>> ./configure option --with-64-bit-indices
>>> > > >
>>> > > >    We are adding more tests to nicely handle integer overflow but
>>> it is not easy since it can occur in so many places
>>> > > >
>>> > > >    Hopefully this will resolve your problem with large process
>>> counts
>>> > > >
>>> > > >    Barry
>>> > > >
>>> > > >
>>> > > > > On Aug 6, 2019, at 7:43 AM, Mark Lohry via petsc-users <
>>> petsc-users at mcs.anl.gov> wrote:
>>> > > > >
>>> > > > > I'm running some larger cases than I have previously with a
>>> working code, and I'm running into failures I don't see on smaller cases.
>>> Failures are on 400 cores, ~100M unknowns, 25B non-zero Jacobian entries.
>>> It runs successfully on the half-size case on 200 cores.
>>> > > > >
>>> > > > > 1) The first error output from PETSc is "MPI_Allreduce() called
>>> in different locations". Is this a red herring, suggesting some process
>>> failed prior to this and processes have diverged?
>>> > > > >
>>> > > > > 2) I don't think I'm running out of memory -- globally at least.
>>> Slurm output shows e.g.
>>> > > > > Memory Utilized: 459.15 GB (estimated maximum)
>>> > > > > Memory Efficiency: 26.12% of 1.72 TB (175.78 GB/node)
>>> > > > > I did try with and without --with-64-bit-indices.
>>> > > > >
>>> > > > > 3) The debug traces seem to vary; see below. I *think* the
>>> failure might be happening in the vicinity of a coloring call. I'm using
>>> MatFDColoring like so:
>>> > > > >
>>> > > > >     ISColoring    iscoloring;
>>> > > > >     MatFDColoring fdcoloring;
>>> > > > >     MatColoring   coloring;
>>> > > > >
>>> > > > >     MatColoringCreate(ctx.JPre, &coloring);
>>> > > > >     MatColoringSetType(coloring, MATCOLORINGGREEDY);
>>> > > > >
>>> > > > >     // convergence stalls badly without this on small cases; don't know why
>>> > > > >     MatColoringSetWeightType(coloring, MAT_COLORING_WEIGHT_LEXICAL);
>>> > > > >
>>> > > > >     // none of these worked:
>>> > > > >     // MatColoringSetType(coloring, MATCOLORINGJP);
>>> > > > >     // MatColoringSetType(coloring, MATCOLORINGSL);
>>> > > > >     // MatColoringSetType(coloring, MATCOLORINGID);
>>> > > > >     MatColoringSetFromOptions(coloring);
>>> > > > >
>>> > > > >     MatColoringApply(coloring, &iscoloring);
>>> > > > >     MatColoringDestroy(&coloring);
>>> > > > >     MatFDColoringCreate(ctx.JPre, iscoloring, &fdcoloring);
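>>> > > > >
>>> > > > > (A typical continuation after this -- sketched here with FormRHSFunction
>>> > > > > as a placeholder for my residual routine -- would be:)
>>> > > > >
>>> > > > >     MatFDColoringSetFunction(fdcoloring,
>>> > > > >                              (PetscErrorCode (*)(void))FormRHSFunction, &ctx);
>>> > > > >     MatFDColoringSetFromOptions(fdcoloring);
>>> > > > >     MatFDColoringSetUp(ctx.JPre, iscoloring, fdcoloring);
>>> > > > >     ISColoringDestroy(&iscoloring);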
>>> > > > >
>>> > > > > I have had issues in the past with getting a functional coloring
>>> setup for finite difference Jacobians, and the above is the only
>>> configuration I've managed to get working successfully. Have there been any
>>> significant development changes to that area of code since v3.8.3? I'll try
>>> upgrading in the meantime and hope for the best.
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > Any ideas?
>>> > > > >
>>> > > > >
>>> > > > > Thanks,
>>> > > > > Mark
>>> > > > >
>>> > > > >
>>> > > > > *************************************
>>> > > > >
>>> > > > > mlohry at lancer:/ssd/dev_ssd/cmake-build$ grep "\[0\]"
>>> slurm-3429773.out
>>> > > > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
>>> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations
>>> (functions) on different processors
>>> > > > > [0]PETSC ERROR: See
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n19
>>> by mlohry Tue Aug  6 06:05:02 2019
>>> > > > > [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt
>>> --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
>>> --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
>>> --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
>>> COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
>>> --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
>>> --with-mpiexec=/usr/bin/srun --with-64-bit-indices
>>> > > > > [0]PETSC ERROR: #1 TSSetMaxSteps() line 2944 in
>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>> > > > > [0]PETSC ERROR: #2 TSSetMaxSteps() line 2944 in
>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>> > > > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Invalid argument
>>> > > > > [0]PETSC ERROR: Enum value must be same on all processes,
>>> argument # 2
>>> > > > > [0]PETSC ERROR: See
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n19
>>> by mlohry Tue Aug  6 06:05:02 2019
>>> > > > > [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt
>>> --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
>>> --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
>>> --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
>>> COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
>>> --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
>>> --with-mpiexec=/usr/bin/srun --with-64-bit-indices
>>> > > > > [0]PETSC ERROR: #3 TSSetExactFinalTime() line 2250 in
>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>> > > > > [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Caught signal number 15 Terminate: Some process
>>> (or the batch system) has told this process to end
>>> > > > > [0]PETSC ERROR: Try option -start_in_debugger or
>>> -on_error_attach_debugger
>>> > > > > [0]PETSC ERROR: or see
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> > > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and
>>> Apple Mac OS X to find memory corruption errors
>>> > > > > [0]PETSC ERROR: likely location of problem given in stack below
>>> > > > > [0]PETSC ERROR: ---------------------  Stack Frames
>>> ------------------------------------
>>> > > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are
>>> not available,
>>> > > > > [0]PETSC ERROR:       INSTEAD the line number of the start of
>>> the function
>>> > > > > [0]PETSC ERROR:       is given.
>>> > > > > [0]PETSC ERROR: [0] PetscCommDuplicate line 130
>>> /home/mlohry/build/external/petsc/src/sys/objects/tagm.c
>>> > > > > [0]PETSC ERROR: [0] PetscHeaderCreate_Private line 34
>>> /home/mlohry/build/external/petsc/src/sys/objects/inherit.c
>>> > > > > [0]PETSC ERROR: [0] DMCreate line 36
>>> /home/mlohry/build/external/petsc/src/dm/interface/dm.c
>>> > > > > [0]PETSC ERROR: [0] DMShellCreate line 983
>>> /home/mlohry/build/external/petsc/src/dm/impls/shell/dmshell.c
>>> > > > > [0]PETSC ERROR: [0] TSGetDM line 5287
>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>> > > > > [0]PETSC ERROR: [0] TSSetIFunction line 1310
>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>> > > > > [0]PETSC ERROR: [0] TSSetExactFinalTime line 2248
>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>> > > > > [0]PETSC ERROR: [0] TSSetMaxSteps line 2942
>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>> > > > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Signal received
>>> > > > > [0]PETSC ERROR: See
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n19
>>> by mlohry Tue Aug  6 06:05:02 2019
>>> > > > > [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt
>>> --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
>>> --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
>>> --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
>>> COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
>>> --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
>>> --with-mpiexec=/usr/bin/srun --with-64-bit-indices
>>> > > > > [0]PETSC ERROR: #4 User provided function() line 0 in  unknown
>>> file
>>> > > > >
>>> > > > >
>>> > > > > *************************************
>>> > > > >
>>> > > > >
>>> > > > > mlohry at lancer:/ssd/dev_ssd/cmake-build$ grep "\[0\]"
>>> slurm-3429158.out
>>> > > > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
>>> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations
>>> (code lines) on different processors
>>> > > > > [0]PETSC ERROR: See
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h21c2n1
>>> by mlohry Mon Aug  5 23:58:19 2019
>>> > > > > [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt
>>> --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
>>> --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
>>> --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
>>> COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
>>> --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
>>> --with-mpiexec=/usr/bin/srun
>>> > > > > [0]PETSC ERROR: #1 MatSetBlockSizes() line 7206 in
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: #2 MatSetBlockSizes() line 7206 in
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: #3 MatSetBlockSize() line 7170 in
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
>>> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations
>>> (code lines) on different processors
>>> > > > > [0]PETSC ERROR: See
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h21c2n1
>>> by mlohry Mon Aug  5 23:58:19 2019
>>> > > > > [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt
>>> --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
>>> --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
>>> --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
>>> COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
>>> --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
>>> --with-mpiexec=/usr/bin/srun
>>> > > > > [0]PETSC ERROR: #4 VecSetSizes() line 1310 in
>>> /home/mlohry/build/external/petsc/src/vec/vec/interface/vector.c
>>> > > > > [0]PETSC ERROR: #5 VecSetSizes() line 1310 in
>>> /home/mlohry/build/external/petsc/src/vec/vec/interface/vector.c
>>> > > > > [0]PETSC ERROR: #6 VecCreateMPIWithArray() line 609 in
>>> /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/pbvec.c
>>> > > > > [0]PETSC ERROR: #7 MatSetUpMultiply_MPIAIJ() line 111 in
>>> /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mmaij.c
>>> > > > > [0]PETSC ERROR: #8 MatAssemblyEnd_MPIAIJ() line 735 in
>>> /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>> > > > > [0]PETSC ERROR: #9 MatAssemblyEnd() line 5243 in
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>> Violation, probably memory access out of range
>>> > > > > [0]PETSC ERROR: Try option -start_in_debugger or
>>> -on_error_attach_debugger
>>> > > > > [0]PETSC ERROR: or see
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> > > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and
>>> Apple Mac OS X to find memory corruption errors
>>> > > > > [0]PETSC ERROR: likely location of problem given in stack below
>>> > > > > [0]PETSC ERROR: ---------------------  Stack Frames
>>> ------------------------------------
>>> > > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are
>>> not available,
>>> > > > > [0]PETSC ERROR:       INSTEAD the line number of the start of
>>> the function
>>> > > > > [0]PETSC ERROR:       is given.
>>> > > > > [0]PETSC ERROR: [0] PetscSFSetGraphLayout line 497
>>> /home/mlohry/build/external/petsc/src/vec/is/utils/pmap.c
>>> > > > > [0]PETSC ERROR: [0] GreedyColoringLocalDistanceTwo_Private line
>>> 208 /home/mlohry/build/external/petsc/src/mat/color/impls/greedy/greedy.c
>>> > > > > [0]PETSC ERROR: [0] MatColoringApply_Greedy line 559
>>> /home/mlohry/build/external/petsc/src/mat/color/impls/greedy/greedy.c
>>> > > > > [0]PETSC ERROR: [0] MatColoringApply line 357
>>> /home/mlohry/build/external/petsc/src/mat/color/interface/matcoloring.c
>>> > > > > [0]PETSC ERROR: [0] VecSetSizes line 1308
>>> /home/mlohry/build/external/petsc/src/vec/vec/interface/vector.c
>>> > > > > [0]PETSC ERROR: [0] VecCreateMPIWithArray line 605
>>> /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/pbvec.c
>>> > > > > [0]PETSC ERROR: [0] MatSetUpMultiply_MPIAIJ line 24
>>> /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mmaij.c
>>> > > > > [0]PETSC ERROR: [0] MatAssemblyEnd_MPIAIJ line 698
>>> /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>> > > > > [0]PETSC ERROR: [0] MatAssemblyEnd line 5234
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: [0] MatSetBlockSizes line 7204
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: [0] MatSetBlockSize line 7167
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Signal received
>>> > > > > [0]PETSC ERROR: See
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h21c2n1
>>> by mlohry Mon Aug  5 23:58:19 2019
>>> > > > > [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt
>>> --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
>>> --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
>>> --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
>>> COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
>>> --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
>>> --with-mpiexec=/usr/bin/srun
>>> > > > > [0]PETSC ERROR: #10 User provided function() line 0 in  unknown
>>> file
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > *************************
>>> > > > >
>>> > > > >
>>> > > > > mlohry at lancer:/ssd/dev_ssd/cmake-build$ grep "\[0\]"
>>> slurm-3429134.out
>>> > > > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
>>> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations
>>> (code lines) on different processors
>>> > > > > [0]PETSC ERROR: See
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n1
>>> by mlohry Mon Aug  5 23:24:23 2019
>>> > > > > [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt
>>> --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
>>> --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
>>> --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
>>> COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
>>> --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
>>> --with-mpiexec=/usr/bin/srun
>>> > > > > [0]PETSC ERROR: #1 PetscSplitOwnership() line 88 in
>>> /home/mlohry/build/external/petsc/src/sys/utils/psplit.c
>>> > > > > [0]PETSC ERROR: #2 PetscSplitOwnership() line 88 in
>>> /home/mlohry/build/external/petsc/src/sys/utils/psplit.c
>>> > > > > [0]PETSC ERROR: #3 PetscLayoutSetUp() line 137 in
>>> /home/mlohry/build/external/petsc/src/vec/is/utils/pmap.c
>>> > > > > [0]PETSC ERROR: #4 VecCreate_MPI_Private() line 489 in
>>> /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/pbvec.c
>>> > > > > [0]PETSC ERROR: #5 VecCreate_MPI() line 537 in
>>> /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/pbvec.c
>>> > > > > [0]PETSC ERROR: #6 VecSetType() line 51 in
>>> /home/mlohry/build/external/petsc/src/vec/vec/interface/vecreg.c
>>> > > > > [0]PETSC ERROR: #7 VecCreateMPI() line 40 in
>>> /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/vmpicr.c
>>> > > > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Object is in wrong state
>>> > > > > [0]PETSC ERROR: Vec object's type is not set: Argument # 1
>>> > > > > [0]PETSC ERROR: See
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n1
>>> by mlohry Mon Aug  5 23:24:23 2019
>>> > > > > [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt
>>> --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
>>> --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
>>> --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
>>> COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
>>> --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
>>> --with-mpiexec=/usr/bin/srun
>>> > > > > [0]PETSC ERROR: #8 VecGetLocalSize() line 665 in
>>> /home/mlohry/build/external/petsc/src/vec/vec/interface/vector.c
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > **************************************
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > mlohry at lancer:/ssd/dev_ssd/cmake-build$ grep "\[0\]"
>>> slurm-3429102.out
>>> > > > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
>>> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations
>>> (code lines) on different processors
>>> > > > > [0]PETSC ERROR: See
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n16
>>> by mlohry Mon Aug  5 22:50:12 2019
>>> > > > > [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt
>>> --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
>>> --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
>>> --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
>>> COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
>>> --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
>>> --with-mpiexec=/usr/bin/srun
>>> > > > > [0]PETSC ERROR: #1 TSSetExactFinalTime() line 2250 in
>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>> > > > > [0]PETSC ERROR: #2 TSSetExactFinalTime() line 2250 in
>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>> > > > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
>>> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations
>>> (code lines) on different processors
>>> > > > > [0]PETSC ERROR: See
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n16
>>> by mlohry Mon Aug  5 22:50:12 2019
>>> > > > > [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt
>>> --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
>>> --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
>>> --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
>>> COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
>>> --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
>>> --with-mpiexec=/usr/bin/srun
>>> > > > > [0]PETSC ERROR: #3 MatSetBlockSizes() line 7206 in
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: #4 MatSetBlockSizes() line 7206 in
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: #5 MatSetBlockSize() line 7170 in
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
>>> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations
>>> (code lines) on different processors
>>> > > > > [0]PETSC ERROR: See
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n16
>>> by mlohry Mon Aug  5 22:50:12 2019
>>> > > > > [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt
>>> --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
>>> --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
>>> --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
>>> COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
>>> --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
>>> --with-mpiexec=/usr/bin/srun
>>> > > > > [0]PETSC ERROR: #6 MatStashScatterBegin_Ref() line 476 in
>>> /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
>>> > > > > [0]PETSC ERROR: #7 MatStashScatterBegin_Ref() line 476 in
>>> /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
>>> > > > > [0]PETSC ERROR: #8 MatStashScatterBegin_Private() line 455 in
>>> /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
>>> > > > > [0]PETSC ERROR: #9 MatAssemblyBegin_MPIAIJ() line 679 in
>>> /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>> > > > > [0]PETSC ERROR: #10 MatAssemblyBegin() line 5154 in
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>> Violation, probably memory access out of range
>>> > > > > [0]PETSC ERROR: Try option -start_in_debugger or
>>> -on_error_attach_debugger
>>> > > > > [0]PETSC ERROR: or see
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> > > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and
>>> Apple Mac OS X to find memory corruption errors
>>> > > > > [0]PETSC ERROR: likely location of problem given in stack below
>>> > > > > [0]PETSC ERROR: ---------------------  Stack Frames
>>> ------------------------------------
>>> > > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are
>>> not available,
>>> > > > > [0]PETSC ERROR:       INSTEAD the line number of the start of
>>> the function
>>> > > > > [0]PETSC ERROR:       is given.
>>> > > > > [0]PETSC ERROR: [0] MatStashScatterEnd_Ref line 137
>>> /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
>>> > > > > [0]PETSC ERROR: [0] MatStashScatterEnd_Private line 126
>>> /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
>>> > > > > [0]PETSC ERROR: [0] MatAssemblyEnd_MPIAIJ line 698
>>> /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>> > > > > [0]PETSC ERROR: [0] MatAssemblyEnd line 5234
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: [0] MatStashScatterBegin_Ref line 473
>>> /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
>>> > > > > [0]PETSC ERROR: [0] MatStashScatterBegin_Private line 454
>>> /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
>>> > > > > [0]PETSC ERROR: [0] MatAssemblyBegin_MPIAIJ line 676
>>> /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>> > > > > [0]PETSC ERROR: [0] MatAssemblyBegin line 5143
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: [0] MatSetBlockSizes line 7204
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: [0] MatSetBlockSize line 7167
>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> > > > > [0]PETSC ERROR: [0] TSSetExactFinalTime line 2248
>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>> > > > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > > > [0]PETSC ERROR: Signal received
>>> > > > > [0]PETSC ERROR: See
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n16
>>> by mlohry Mon Aug  5 22:50:12 2019
>>> > > > > [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt
>>> --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
>>> --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
>>> --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
>>> COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
>>> --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
>>> --with-mpiexec=/usr/bin/srun
>>> > > > > [0]PETSC ERROR: #11 User provided function() line 0 in  unknown
>>> file
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>>