[petsc-users] Sporadic MPI_Allreduce() called in different locations on larger core counts

Smith, Barry F. bsmith at mcs.anl.gov
Sun Aug 18 13:38:35 CDT 2019



  Excellent, we'll still need to fix the parallel coloring, but I'm glad we can put that off :-)

  Barry

> On Aug 18, 2019, at 1:19 PM, Mark Lohry <mlohry at gmail.com> wrote:
> 
> Barry, thanks for your suggestion to do the serial coloring on the mesh itself / block size 1 case first, and then manually color the blocks. Works like a charm. The 2 million cell case is small enough to create the sparse system on one process and color it in about a second. 
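A minimal sketch of the "color the mesh first, then color the blocks" idea described above. It assumes that a cell with color c in the block-size-1 coloring gives its bs unknowns the colors c*bs through c*bs+bs-1. The function name expand_cell_coloring and the array layout are hypothetical, and this is not necessarily how the blocks were colored in the actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Expand a per-cell coloring into a per-unknown coloring.
 *   cell_color[c] : color of cell c from a coloring of the mesh (block size 1) graph
 *   bs            : unknowns per cell (50 in the case discussed in this thread)
 *   dof_color     : output array of length ncells*bs
 * Unknown k of cell c receives color cell_color[c]*bs + k, so unknowns inside one
 * dense block never share a color, and cells with different colors stay separated.
 * Returns the total number of colors used. */
long expand_cell_coloring(const int *cell_color, size_t ncells, int bs, long *dof_color)
{
    long max_cell_color = -1;
    assert(bs > 0);
    for (size_t c = 0; c < ncells; c++) {
        assert(cell_color[c] >= 0);
        if (cell_color[c] > max_cell_color) max_cell_color = cell_color[c];
        for (int k = 0; k < bs; k++) {
            dof_color[c * (size_t)bs + (size_t)k] = (long)cell_color[c] * bs + k;
        }
    }
    return (max_cell_color + 1) * bs;
}
```

With 50x50 blocks, a mesh coloring that uses N colors becomes a coloring with 50*N colors for the full system, so the expensive graph coloring only ever sees the 2M-cell mesh graph.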
> 
> On Sun, Aug 11, 2019 at 9:41 PM Mark Lohry <mlohry at gmail.com> wrote:
> So the parallel JP runs proportionally just as slowly in serial as it does in parallel.
> 
> valgrind --tool=callgrind shows essentially 100% of the runtime in jp.c:255-262, within the larger loop commented 
> /* pass two -- color it by looking at nearby vertices and building a mask */
> 
> for (j=0;j<ncols;j++) {
>   if (seen[cols[j]] != cidx) {
>     bidx++;
>     seen[cols[j]] = cidx;
>     idxbuf[bidx] = cols[j];
>     distbuf[bidx] = dist+1;
>   }
> }
> 
> I'll dig into how this algorithm is supposed to work, but anything obvious in there? It kinda feels like something is doing something N^2 or worse when it doesn't need to be.
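The quoted loop marks vertices in a `seen` array with a per-vertex stamp (`cidx`) so that each neighbor is queued only once and the array never has to be cleared. The sketch below shows that stamping pattern on a small CSR graph, collecting the distinct vertices within two hops of one vertex. It is an illustration only, not the jp.c code; the function name two_hop_neighbors and its arguments are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Collect the distinct vertices within two hops of v in a CSR graph.
 *   rowptr, colidx : CSR adjacency (rowptr has n+1 entries)
 *   seen           : scratch array of length n; no entry may equal stamp on entry
 *   stamp          : unique tag for this call, so seen never needs clearing
 *   out            : output buffer of length >= n
 * Returns the number of vertices written to out (v itself is excluded). */
size_t two_hop_neighbors(const int *rowptr, const int *colidx, int v,
                         int *seen, int stamp, int *out)
{
    size_t count = 0;
    assert(v >= 0);
    seen[v] = stamp;
    /* first hop: direct neighbors of v */
    for (int j = rowptr[v]; j < rowptr[v + 1]; j++) {
        int u = colidx[j];
        if (seen[u] != stamp) { seen[u] = stamp; out[count++] = u; }
    }
    /* second hop: expand only the first-hop vertices found above */
    size_t first_hop = count;
    for (size_t i = 0; i < first_hop; i++) {
        int u = out[i];
        for (int j = rowptr[u]; j < rowptr[u + 1]; j++) {
            int w = colidx[j];
            if (seen[w] != stamp) { seen[w] = stamp; out[count++] = w; }
        }
    }
    return count;
}
```

On its own this pattern costs time proportional to the edges it visits. Whether the real code repeats the traversal more often than it needs to is a separate question that this sketch does not answer.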
> 
> On Sun, Aug 11, 2019 at 3:47 PM Mark Lohry <mlohry at gmail.com> wrote:
> Sorry, forgot to reply to the mailing list.
> 
> where does your matrix come from? A mesh? Structured, unstructured, a graph, something else? What type of discretization?
> 
> Unstructured tetrahedral mesh (CGNS, I can give links to the files if that's of interest), the discretization is arbitrary order discontinuous galerkin for compressible navier-stokes. 5 coupled equations x 10 nodes per element for this 2nd order case to give the 50x50 blocks. Each tet cell dependent on neighbors, so for tets 4 extra off-diagonal blocks per cell.
> 
> I would expect one could exploit the large block size here in computing the coloring -- the underlying mesh is 2M nodes with the same connectivity as a standard cell-centered finite volume method.
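The sizes quoted in this thread follow from the discretization described above. A small sketch of the arithmetic, assuming 2M cells, 5 equations, 10 nodes per cell and 4 face neighbors per tet (boundary cells have fewer, so the nonzero count is an upper bound). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int64_t block_size; /* unknowns per cell */
    int64_t unknowns;   /* global number of rows */
    int64_t nonzeros;   /* upper bound on Jacobian entries */
} SystemSize;

/* Each cell contributes one diagonal block plus one block per neighbor,
 * and every block is block_size x block_size and dense. */
SystemSize dg_system_size(int64_t cells, int64_t equations,
                          int64_t nodes_per_cell, int64_t neighbors_per_cell)
{
    SystemSize s;
    assert(cells > 0 && equations > 0 && nodes_per_cell > 0 && neighbors_per_cell >= 0);
    s.block_size = equations * nodes_per_cell;
    s.unknowns   = cells * s.block_size;
    s.nonzeros   = cells * (1 + neighbors_per_cell) * s.block_size * s.block_size;
    return s;
}
```

Calling dg_system_size(2000000, 5, 10, 4) gives a block size of 50, 100M unknowns and 25B nonzeros, which matches the "~100M unknowns, 25B non-zero jacobian entries" reported for the failing case.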
> 
> 
> 
> On Sun, Aug 11, 2019 at 2:12 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
>   These are due to attempting to copy the entire matrix to one process and do the sequential coloring there. That definitely won't work for larger problems; we'll need to focus on
> 
> 1) having useful parallel coloring and 
> 2) maybe using an alternative way to determine the coloring:
> 
>      where does your matrix come from? A mesh? Structured, unstructured, a graph, something else? What type of discretization?
> 
>    Barry
> 
> 
> > On Aug 11, 2019, at 10:21 AM, Mark Lohry <mlohry at gmail.com> wrote:
> > 
> > On the very large case, there does appear to be some kind of overflow ending up with an attempt to allocate too much memory in MatFDColorCreate, even with --with-64-bit-indices. Full terminal output here:
> > https://raw.githubusercontent.com/mlohry/petsc_miscellany/master/slurm-3451378.out
> > 
> > In particular:
> > PETSC ERROR: Memory requested 1036713571771129344
> > 
> > Log filename here:
> > https://github.com/mlohry/petsc_miscellany/blob/master/petsclogfile.0
> > 
> > On Sun, Aug 11, 2019 at 9:49 AM Mark Lohry <mlohry at gmail.com> wrote:
> > Hi Barry, I made a minimum example comparing the colorings on a very small case. You'll need to unzip the jacobian_sparsity.tgz to run it.
> > 
> > https://github.com/mlohry/petsc_miscellany
> > 
> > This is a sparse block system with 50x50 block sizes, ~7,680 blocks. Comparing the coloring types sl, lf, jp, id, greedy, I get these wallclock timings, running with -np 16:
> > 
> > SL: 1.5s
> > LF: 1.3s
> > JP: 29s !
> > ID: 1.4s
> > greedy: 2s
> > 
> > As far as I'm aware, JP is the only parallel coloring implemented? It is looking as though I'm simply running out of memory with the sequential methods (I should apologize to my cluster admin for chewing up 10TB and crashing...).
> > 
> > On this small problem JP is taking 30 seconds wallclock, but that time grows exponentially with larger problems (last I tried it, I killed the job after 24 hours of spinning.)
> > 
> > Also as I mentioned, the "greedy" method appears to be producing an invalid coloring for me unless I also specify weights "lexical". But "-mat_coloring_test" doesn't complain. I'll have to make a different example to actually show it's an invalid coloring.
> > 
> > Thanks,
> > Mark
> > 
> > 
> > 
> > On Sat, Aug 10, 2019 at 4:38 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> > 
> >   Mark,
> > 
> >    Would you be able to cook up an example (or examples) that demonstrates the problem (or problems) and how to run it? If you send it to us and we can reproduce the problem then we'll fix it. If need be you can send large matrices to petsc-maint at mcs.anl.gov; don't send them to petsc-users since it will reject large files.
> > 
> >    Barry
> > 
> > 
> > > On Aug 10, 2019, at 1:56 PM, Mark Lohry <mlohry at gmail.com> wrote:
> > > 
> > > Thanks Barry, been trying all of the above. I think I've narrowed it down to an out-of-memory condition and/or integer overflow inside MatColoringApply. Which makes some sense since I only have a sequential coloring algorithm working...
> > > 
> > > Is anyone out there using coloring in parallel? I still have the same previously mentioned issues with MATCOLORINGJP (on small problems takes upwards of 30 minutes to run) which as far as I can see is the only "parallel" implementation. MATCOLORINGSL and MATCOLORINGID both work on less large problems, MATCOLORINGGREEDY works on less large problems if and only if I set weight type to MAT_COLORING_WEIGHT_LEXICAL, and all 3 are failing on larger problems.
> > > 
> > > On Tue, Aug 6, 2019 at 9:36 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> > > 
> > >   There is also 
> > > 
> > > $ ./configure --help | grep color
> > >   --with-is-color-value-type=<char,short>
> > >        char, short can store 256, 65536 colors  current: short
> > > 
> > > I can't imagine you have over 65k colors, but it is something to check.
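A quick way to sanity-check the color count against the storage type named in the configure help above. This sketch assumes the colors are stored as unsigned values of the given width, which is what the quoted limits of 256 and 65536 suggest. The function name colors_fit is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if ncolors distinct colors fit in an unsigned integer type
 * that is nbytes wide (1 for char, 2 for short), 0 otherwise. */
int colors_fit(long ncolors, int nbytes)
{
    assert(ncolors >= 0 && nbytes >= 1 && nbytes <= 4);
    unsigned long long capacity = 1ULL << (CHAR_BIT * nbytes);
    return (unsigned long long)ncolors <= capacity;
}
```

For example, with 50 unknowns per cell, a hypothetical mesh coloring of 100 colors expands to 5000 colors for the full system; colors_fit(5000, 2) succeeds while colors_fit(5000, 1) does not.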
> > > 
> > > 
> > > > On Aug 6, 2019, at 8:19 AM, Mark Lohry <mlohry at gmail.com> wrote:
> > > > 
> > > > My first guess is that the code is getting integer overflow somewhere. 25 billion is well over the 2 billion that 32 bit integers can hold.
> > > > 
> > > > Mine as well -- though in later tests I have the same issue when using --with-64-bit-indices. Ironically I had removed that flag at some point because the coloring / index set was using a serious chunk of total memory on medium sized problems.
> > > 
> > >   Understood
> > > 
> > > > 
> > > > Questions on the petsc internals there though: Are matrices indexed with two integers (i,j) so the max matrix dimension is (int limit) x (int limit) or a single integer so the max dimension is sqrt(int limit)? 
> > > > Also I was operating under the assumption the 32 bit limit should only constrain per-process problem sizes (25B over 400 processes giving 62M non-zeros per process), is that not right?
> > > 
> > >    It is mostly right but may not be right for everything in PETSc. For example I don't know about the MatFD code
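The distinction between global and per-process counts can be checked with a few lines of arithmetic. A minimal sketch, assuming the nonzeros are split evenly across processes; the helper names fits_int32 and per_process are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if n can be represented in a signed 32-bit integer. */
int fits_int32(int64_t n)
{
    return n >= INT32_MIN && n <= INT32_MAX;
}

/* Largest share any process holds when total items are split evenly. */
int64_t per_process(int64_t total, int64_t nprocs)
{
    assert(total >= 0 && nprocs > 0);
    return (total + nprocs - 1) / nprocs;
}
```

With 25B nonzeros on 400 processes, per_process returns 62.5M, which fits in 32 bits, while the global count of 25B does not. Any code path that forms a global count or gathers the whole matrix onto one process therefore still needs 64-bit indices.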
> > > 
> > >    Since using a debugger is not practical for large core counts, to find the point where the two processes diverge you can try 
> > > 
> > > -log_trace 
> > > 
> > > or 
> > > 
> > > -log_trace filename
> > > 
> > > In the second case it will generate one file per core called filename.%d. Note that it will produce a lot of output.
> > > 
> > >   Good luck
> > > 
> > > 
> > > 
> > > > 
> > > >    We are adding more tests to nicely handle integer overflow but it is not easy since it can occur in so many places
> > > > 
> > > > Totally understood. I know the pain of only finding an overflow bug after days of waiting in a cluster queue for a big job.
> > > > 
> > > > We urge you to upgrade.
> > > > 
> > > > I'll do that today and hope for the best. On first tests on 3.11.3, I still have a couple issues with the coloring code:
> > > > 
> > > > * I am still getting the nasty hangs with MATCOLORINGJP mentioned here: https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2017-October/033746.html
> > > > * MatColoringSetType(coloring, MATCOLORINGGREEDY);  this produces a wrong jacobian unless I also set MatColoringSetWeightType(coloring, MAT_COLORING_WEIGHT_LEXICAL);
> > > > * MATCOLORINGMIS mentioned in the documentation doesn't seem to exist.
> > > > 
> > > > Thanks,
> > > > Mark
> > > > 
> > > > On Tue, Aug 6, 2019 at 8:56 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> > > > 
> > > >    My first guess is that the code is getting integer overflow somewhere. 25 billion is well over the 2 billion that 32 bit integers can hold.
> > > > 
> > > >    We urge you to upgrade.
> > > > 
> > > >    Regardless for problems this large you likely need  the ./configure option --with-64-bit-indices
> > > > 
> > > >    We are adding more tests to nicely handle integer overflow but it is not easy since it can occur in so many places
> > > > 
> > > >    Hopefully this will resolve your problem with large process counts
> > > > 
> > > >    Barry
> > > > 
> > > > 
> > > > > On Aug 6, 2019, at 7:43 AM, Mark Lohry via petsc-users <petsc-users at mcs.anl.gov> wrote:
> > > > > 
> > > > > I'm running some larger cases than I have previously with a working code, and I'm running into failures I don't see on smaller cases. Failures are on 400 cores, ~100M unknowns, 25B non-zero jacobian entries. Runs successfully on half size case on 200 cores.
> > > > > 
> > > > > 1) The first error output from petsc is "MPI_Allreduce() called in different locations". Is this a red herring, suggesting some process failed prior to this and processes have diverged?
> > > > > 
> > > > > 2) I don't think I'm running out of memory -- globally at least. Slurm output shows e.g.
> > > > > Memory Utilized: 459.15 GB (estimated maximum)
> > > > > Memory Efficiency: 26.12% of 1.72 TB (175.78 GB/node)
> > > > > I did try with and without --with-64-bit-indices.
> > > > > 
> > > > > 3) The debug traces seem to vary, see below. I *think* the failure might be happening in the vicinity of a Coloring call. I'm using MatFDColoring like so:
> > > > > 
> > > > >     ISColoring    iscoloring;
> > > > >     MatFDColoring fdcoloring;
> > > > >     MatColoring   coloring;
> > > > > 
> > > > >     MatColoringCreate(ctx.JPre, &coloring);
> > > > >     MatColoringSetType(coloring, MATCOLORINGGREEDY);
> > > > > 
> > > > >     // convergence stalls badly without this on small cases, don't know why
> > > > >     MatColoringSetWeightType(coloring, MAT_COLORING_WEIGHT_LEXICAL);
> > > > > 
> > > > >     // none of these worked.
> > > > >     // MatColoringSetType(coloring, MATCOLORINGJP);
> > > > >     // MatColoringSetType(coloring, MATCOLORINGSL);
> > > > >     // MatColoringSetType(coloring, MATCOLORINGID);
> > > > >     MatColoringSetFromOptions(coloring);
> > > > > 
> > > > >     MatColoringApply(coloring, &iscoloring);
> > > > >     MatColoringDestroy(&coloring);
> > > > >     MatFDColoringCreate(ctx.JPre, iscoloring, &fdcoloring);
> > > > > 
> > > > > I have had issues in the past with getting a functional coloring setup for finite difference jacobians, and the above is the only configuration I've managed to get working successfully. Have there been any significant development changes to that area of code since v3.8.3? I'll try upgrading in the meantime and hope for the best.
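On question 1): a check of the "MPI_Allreduce() called in different locations" kind can be built by having each rank contribute its call-site line number and its negation to a max-reduction, then comparing the two results. The sketch below simulates that with a plain array in place of MPI ranks. It illustrates the general technique only, under the assumption that debug builds of PETSc do something along these lines; the function name same_call_site is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* line_on_rank[r] is the source line from which simulated rank r entered
 * the collective. Returns 1 when every rank reports the same line. */
int same_call_site(const int *line_on_rank, size_t nranks)
{
    assert(nranks > 0);
    int max_neg = -line_on_rank[0]; /* running max of -line, i.e. -min(line) */
    int max_pos = line_on_rank[0];  /* running max of line */
    for (size_t r = 1; r < nranks; r++) {
        if (-line_on_rank[r] > max_neg) max_neg = -line_on_rank[r];
        if (line_on_rank[r] > max_pos) max_pos = line_on_rank[r];
    }
    /* min(line) equals max(line) only if all ranks agree */
    return -max_neg == max_pos;
}
```

A check like this fires at the first collective where the ranks are no longer in the same place, so the functions it names are where the divergence was noticed and not necessarily where it began. That would be consistent with the reported locations varying from run to run.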
> > > > > 
> > > > > 
> > > > > 
> > > > > Any ideas?
> > > > > 
> > > > > 
> > > > > Thanks,
> > > > > Mark
> > > > > 
> > > > > 
> > > > > *************************************
> > > > > 
> > > > > mlohry at lancer:/ssd/dev_ssd/cmake-build$ grep "\[0\]" slurm-3429773.out
> > > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (functions) on different processors
> > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n19 by mlohry Tue Aug  6 06:05:02 2019
> > > > > [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS --with-mpiexec=/usr/bin/srun --with-64-bit-indices
> > > > > [0]PETSC ERROR: #1 TSSetMaxSteps() line 2944 in /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> > > > > [0]PETSC ERROR: #2 TSSetMaxSteps() line 2944 in /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> > > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Invalid argument
> > > > > [0]PETSC ERROR: Enum value must be same on all processes, argument # 2
> > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n19 by mlohry Tue Aug  6 06:05:02 2019
> > > > > [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS --with-mpiexec=/usr/bin/srun --with-64-bit-indices
> > > > > [0]PETSC ERROR: #3 TSSetExactFinalTime() line 2250 in /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> > > > > [0]PETSC ERROR: ------------------------------------------------------------------------
> > > > > [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
> > > > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> > > > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> > > > > [0]PETSC ERROR: likely location of problem given in stack below
> > > > > [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> > > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> > > > > [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> > > > > [0]PETSC ERROR:       is given.
> > > > > [0]PETSC ERROR: [0] PetscCommDuplicate line 130 /home/mlohry/build/external/petsc/src/sys/objects/tagm.c
> > > > > [0]PETSC ERROR: [0] PetscHeaderCreate_Private line 34 /home/mlohry/build/external/petsc/src/sys/objects/inherit.c
> > > > > [0]PETSC ERROR: [0] DMCreate line 36 /home/mlohry/build/external/petsc/src/dm/interface/dm.c
> > > > > [0]PETSC ERROR: [0] DMShellCreate line 983 /home/mlohry/build/external/petsc/src/dm/impls/shell/dmshell.c
> > > > > [0]PETSC ERROR: [0] TSGetDM line 5287 /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> > > > > [0]PETSC ERROR: [0] TSSetIFunction line 1310 /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> > > > > [0]PETSC ERROR: [0] TSSetExactFinalTime line 2248 /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> > > > > [0]PETSC ERROR: [0] TSSetMaxSteps line 2942 /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> > > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Signal received
> > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n19 by mlohry Tue Aug  6 06:05:02 2019
> > > > > [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS --with-mpiexec=/usr/bin/srun --with-64-bit-indices
> > > > > [0]PETSC ERROR: #4 User provided function() line 0 in  unknown file
> > > > > 
> > > > > 
> > > > > *************************************
> > > > > 
> > > > > 
> > > > > mlohry at lancer:/ssd/dev_ssd/cmake-build$ grep "\[0\]" slurm-3429158.out
> > > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code lines) on different processors
> > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h21c2n1 by mlohry Mon Aug  5 23:58:19 2019
> > > > > [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS --with-mpiexec=/usr/bin/srun
> > > > > [0]PETSC ERROR: #1 MatSetBlockSizes() line 7206 in /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: #2 MatSetBlockSizes() line 7206 in /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: #3 MatSetBlockSize() line 7170 in /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code lines) on different processors
> > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h21c2n1 by mlohry Mon Aug  5 23:58:19 2019
> > > > > [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS --with-mpiexec=/usr/bin/srun
> > > > > [0]PETSC ERROR: #4 VecSetSizes() line 1310 in /home/mlohry/build/external/petsc/src/vec/vec/interface/vector.c
> > > > > [0]PETSC ERROR: #5 VecSetSizes() line 1310 in /home/mlohry/build/external/petsc/src/vec/vec/interface/vector.c
> > > > > [0]PETSC ERROR: #6 VecCreateMPIWithArray() line 609 in /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/pbvec.c
> > > > > [0]PETSC ERROR: #7 MatSetUpMultiply_MPIAIJ() line 111 in /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mmaij.c
> > > > > [0]PETSC ERROR: #8 MatAssemblyEnd_MPIAIJ() line 735 in /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
> > > > > [0]PETSC ERROR: #9 MatAssemblyEnd() line 5243 in /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: ------------------------------------------------------------------------
> > > > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> > > > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> > > > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> > > > > [0]PETSC ERROR: likely location of problem given in stack below
> > > > > [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> > > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> > > > > [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> > > > > [0]PETSC ERROR:       is given.
> > > > > [0]PETSC ERROR: [0] PetscSFSetGraphLayout line 497 /home/mlohry/build/external/petsc/src/vec/is/utils/pmap.c
> > > > > [0]PETSC ERROR: [0] GreedyColoringLocalDistanceTwo_Private line 208 /home/mlohry/build/external/petsc/src/mat/color/impls/greedy/greedy.c
> > > > > [0]PETSC ERROR: [0] MatColoringApply_Greedy line 559 /home/mlohry/build/external/petsc/src/mat/color/impls/greedy/greedy.c
> > > > > [0]PETSC ERROR: [0] MatColoringApply line 357 /home/mlohry/build/external/petsc/src/mat/color/interface/matcoloring.c
> > > > > [0]PETSC ERROR: [0] VecSetSizes line 1308 /home/mlohry/build/external/petsc/src/vec/vec/interface/vector.c
> > > > > [0]PETSC ERROR: [0] VecCreateMPIWithArray line 605 /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/pbvec.c
> > > > > [0]PETSC ERROR: [0] MatSetUpMultiply_MPIAIJ line 24 /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mmaij.c
> > > > > [0]PETSC ERROR: [0] MatAssemblyEnd_MPIAIJ line 698 /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
> > > > > [0]PETSC ERROR: [0] MatAssemblyEnd line 5234 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: [0] MatSetBlockSizes line 7204 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: [0] MatSetBlockSize line 7167 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Signal received
> > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h21c2n1 by mlohry Mon Aug  5 23:58:19 2019
> > > > > [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS --with-mpiexec=/usr/bin/srun
> > > > > [0]PETSC ERROR: #10 User provided function() line 0 in  unknown file
> > > > > 
> > > > > 
> > > > > 
> > > > > *************************
> > > > > 
> > > > > 
> > > > > mlohry at lancer:/ssd/dev_ssd/cmake-build$ grep "\[0\]" slurm-3429134.out
> > > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code lines) on different processors
> > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n1 by mlohry Mon Aug  5 23:24:23 2019
> > > > > [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS --with-mpiexec=/usr/bin/srun
> > > > > [0]PETSC ERROR: #1 PetscSplitOwnership() line 88 in /home/mlohry/build/external/petsc/src/sys/utils/psplit.c
> > > > > [0]PETSC ERROR: #2 PetscSplitOwnership() line 88 in /home/mlohry/build/external/petsc/src/sys/utils/psplit.c
> > > > > [0]PETSC ERROR: #3 PetscLayoutSetUp() line 137 in /home/mlohry/build/external/petsc/src/vec/is/utils/pmap.c
> > > > > [0]PETSC ERROR: #4 VecCreate_MPI_Private() line 489 in /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/pbvec.c
> > > > > [0]PETSC ERROR: #5 VecCreate_MPI() line 537 in /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/pbvec.c
> > > > > [0]PETSC ERROR: #6 VecSetType() line 51 in /home/mlohry/build/external/petsc/src/vec/vec/interface/vecreg.c
> > > > > [0]PETSC ERROR: #7 VecCreateMPI() line 40 in /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/vmpicr.c
> > > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Object is in wrong state
> > > > > [0]PETSC ERROR: Vec object's type is not set: Argument # 1
> > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n1 by mlohry Mon Aug  5 23:24:23 2019
> > > > > [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS --with-mpiexec=/usr/bin/srun
> > > > > [0]PETSC ERROR: #8 VecGetLocalSize() line 665 in /home/mlohry/build/external/petsc/src/vec/vec/interface/vector.c
> > > > > 
> > > > > 
> > > > > 
> > > > > **************************************
> > > > > 
> > > > > 
> > > > > 
> > > > > mlohry at lancer:/ssd/dev_ssd/cmake-build$ grep "\[0\]" slurm-3429102.out
> > > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code lines) on different processors
> > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n16 by mlohry Mon Aug  5 22:50:12 2019
> > > > > [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS --with-mpiexec=/usr/bin/srun
> > > > > [0]PETSC ERROR: #1 TSSetExactFinalTime() line 2250 in /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> > > > > [0]PETSC ERROR: #2 TSSetExactFinalTime() line 2250 in /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> > > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code lines) on different processors
> > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n16 by mlohry Mon Aug  5 22:50:12 2019
> > > > > [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS --with-mpiexec=/usr/bin/srun
> > > > > [0]PETSC ERROR: #3 MatSetBlockSizes() line 7206 in /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: #4 MatSetBlockSizes() line 7206 in /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: #5 MatSetBlockSize() line 7170 in /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Petsc has generated inconsistent data
> > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code lines) on different processors
> > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n16 by mlohry Mon Aug  5 22:50:12 2019
> > > > > [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS --with-mpiexec=/usr/bin/srun
> > > > > [0]PETSC ERROR: #6 MatStashScatterBegin_Ref() line 476 in /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: #7 MatStashScatterBegin_Ref() line 476 in /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: #8 MatStashScatterBegin_Private() line 455 in /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: #9 MatAssemblyBegin_MPIAIJ() line 679 in /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
> > > > > [0]PETSC ERROR: #10 MatAssemblyBegin() line 5154 in /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: ------------------------------------------------------------------------
> > > > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> > > > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> > > > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> > > > > [0]PETSC ERROR: likely location of problem given in stack below
> > > > > [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> > > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> > > > > [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> > > > > [0]PETSC ERROR:       is given.
> > > > > [0]PETSC ERROR: [0] MatStashScatterEnd_Ref line 137 /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: [0] MatStashScatterEnd_Private line 126 /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: [0] MatAssemblyEnd_MPIAIJ line 698 /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
> > > > > [0]PETSC ERROR: [0] MatAssemblyEnd line 5234 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: [0] MatStashScatterBegin_Ref line 473 /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: [0] MatStashScatterBegin_Private line 454 /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: [0] MatAssemblyBegin_MPIAIJ line 676 /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
> > > > > [0]PETSC ERROR: [0] MatAssemblyBegin line 5143 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: [0] MatSetBlockSizes line 7204 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: [0] MatSetBlockSize line 7167 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: [0] TSSetExactFinalTime line 2248 /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> > > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Signal received
> > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n16 by mlohry Mon Aug  5 22:50:12 2019
> > > > > [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS --with-mpiexec=/usr/bin/srun
> > > > > [0]PETSC ERROR: #11 User provided function() line 0 in  unknown file
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 


