[petsc-users] Bug or misuse of 64-bit-index PETSc MPI linear solver server with more than 8 cores

Satish Balay balay.anl at fastmail.org
Thu Sep 5 08:48:08 CDT 2024


I fixed the "garbled" text at https://gitlab.com/petsc/petsc/-/issues/1643 - it is best if additional follow-up goes on the issue tracker [so that replies to this issue don't get scattered].

Satish

On Thu, 5 Sep 2024, Mark Adams wrote:

> Barry's suggestion for testing got garbled in the gitlab issue posting.
> Here it is, I think:
> 
> 07:53 main *= ~/Codes/petsc$ make test s=ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1
> /usr/local/bin/gmake --no-print-directory -f /Users/markadams/Codes/petsc/gmakefile.test PETSC_ARCH=arch-macosx-gnu-O PETSC_DIR=/Users/markadams/Codes/petsc test
> Using MAKEFLAGS: --no-print-directory -- PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-O s=ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1
> Application at path ( /Users/markadams/Codes/petsc/arch-macosx-gnu-O/bin/mpiexec.hydra ) removed from firewall
> Application at path ( /Users/markadams/Codes/petsc/arch-macosx-gnu-O/bin/mpiexec.hydra ) added to firewall
> Incoming connection to the application is blocked
>        TEST arch-macosx-gnu-O/tests/counts/ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1.counts
>  ok ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1
>  ok diff-ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1
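
For context, the test above runs src/ksp/ksp/tutorials/ex1.c under the server option, e.g. mpiexec -n 4 ./ex1 -mpi_linear_solver_server. Below is a minimal sketch of that usage pattern, condensed from ex1.c; the comments about what -mpi_linear_solver_server does are inferred from the PCMPI documentation and from the traces quoted further down, so treat them as assumptions rather than a definitive description.

#include <petscksp.h>

int main(int argc, char **args)
{
  Mat         A;
  Vec         x, b;
  KSP         ksp;
  PetscInt    i, n = 100, col[3], its;
  PetscScalar value[3];

  /* With -mpi_linear_solver_server, ranks > 0 are expected to stay inside
     PetscInitialize() (PCMPIServerBegin in the traces below) acting as the
     solver server, while rank 0 runs the rest of main(). */
  PetscCall(PetscInitialize(&argc, &args, NULL, NULL));

  /* Assemble a simple tridiagonal test matrix, as ex1.c does. */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatSetUp(A));
  value[0] = -1.0; value[1] = 2.0; value[2] = -1.0;
  for (i = 1; i < n - 1; i++) {
    col[0] = i - 1; col[1] = i; col[2] = i + 1;
    PetscCall(MatSetValues(A, 1, &i, 3, col, value, INSERT_VALUES));
  }
  i = n - 1; col[0] = n - 2; col[1] = n - 1;
  PetscCall(MatSetValues(A, 1, &i, 2, col, value, INSERT_VALUES));
  i = 0; col[0] = 0; col[1] = 1; value[0] = 2.0; value[1] = -1.0;
  PetscCall(MatSetValues(A, 1, &i, 2, col, value, INSERT_VALUES));
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  /* The solve is where the assembled system is handed to the server ranks;
     in the traces below that hand-off appears as PCMPISetMat() scattering the
     CSR data. */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPGetIterationNumber(ksp, &its));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "iterations %" PetscInt_FMT "\n", its));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}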
> 
> On Wed, Sep 4, 2024 at 3:59 PM Lin_Yuxiang <linyx199071 at gmail.com> wrote:
> 
> > To whom it may concern:
> >
> >
> >
> > I recently tried to use the 64-bit-index build of PETSc to replace the legacy
> > code's solver through the MPI linear solver server. However, it gives an error
> > when I use more than 8 cores, saying:
> >
> >
> >
> > Get NNZ
> > MatsetPreallocation
> > MatsetValue
> > MatSetValue Time per kernel: 43.1147 s
> > Matassembly
> > VecsetValue
> > pestc_solve
> > Read -1, expected 1951397280, errno = 14
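
A speculative aside (nothing in this thread establishes this as the cause): with --with-64-bit-indices the ia/ja index arrays that PCMPISetMat() ships to the server ranks take 8 bytes per entry instead of 4, while MPI collectives such as the MPI_Scatterv/MPI_Allgather in the traces below still take plain int counts. A sketch of the kind of check that would rule out a count overflow; CheckMPICount and nnz_local are made-up names:

#include <petscsys.h>

/* Hypothetical helper: verify that a PetscInt-sized count of CSR index entries
   still fits in the int ("PetscMPIInt") counts MPI collectives accept.
   PetscMPIIntCast() raises a clean PETSc error instead of silently wrapping. */
static PetscErrorCode CheckMPICount(PetscInt nnz_local)
{
  PetscMPIInt count;

  PetscFunctionBeginUser;
  PetscCall(PetscMPIIntCast(nnz_local, &count));
  PetscCall(PetscPrintf(PETSC_COMM_SELF, "count %d fits; index bytes with 64-bit indices: %" PetscInt64_FMT "\n",
                        (int)count, (PetscInt64)nnz_local * (PetscInt64)sizeof(PetscInt)));
  PetscFunctionReturn(PETSC_SUCCESS);
}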
> >
> >
> >
> > When I tried -start_in_debugger, the error seems to come from MPI_Scatterv:
> >
> >
> >
> > Rank 0:
> >
> > #3  0x00001555512e4de5 in mca_pml_ob1_recv () from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so
> > #4  0x0000155553e01e60 in PMPI_Scatterv () from /lib/x86_64-linux-gnu/libmpi.so.40
> > #5  0x0000155554b13eab in PCMPISetMat (pc=pc@entry=0x0) at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/ksp/pc/impls/mpi/pcmpi.c:230
> > #6  0x0000155554b17403 in PCMPIServerBegin () at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/ksp/pc/impls/mpi/pcmpi.c:464
> > #7  0x00001555540b9aa4 in PetscInitialize_Common (prog=0x7fffffffe27b "geosimtrs_mpiserver", file=file@entry=0x0,
> >     help=help@entry=0x55555555a1e0 <help> "Solves a linear system in parallel with KSP.\nInput parameters include:\n  -view_exact_sol   : write exact solution vector to stdout\n  -m <mesh_x>       : number of mesh points in x-direction\n  -n <mesh"...,
> >     ftn=ftn@entry=PETSC_FALSE, readarguments=readarguments@entry=PETSC_FALSE, len=len@entry=0)
> >     at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/sys/objects/pinit.c:1109
> > #8  0x00001555540bba82 in PetscInitialize (argc=argc@entry=0x7fffffffda8c, args=args@entry=0x7fffffffda80, file=file@entry=0x0,
> >     help=help@entry=0x55555555a1e0 <help> "Solves a linear system in parallel with KSP.\nInput parameters include:\n  -view_exact_sol   : write exact solution vector to stdout\n  -m <mesh_x>       : number of mesh points in x-direction\n  -n <mesh"...)
> >     at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/sys/objects/pinit.c:1274
> > #9  0x0000555555557673 in main (argc=<optimized out>, args=<optimized out>) at geosimtrs_mpiserver.c:29
> >
> >
> >
> > Rank 1-10:
> >
> > #3  0x0000155553e1f030 in ompi_coll_base_allgather_intra_bruck () from /lib/x86_64-linux-gnu/libmpi.so.40
> > #4  0x0000155550f62aaa in ompi_coll_tuned_allgather_intra_dec_fixed () from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so
> > #5  0x0000155553ddb431 in PMPI_Allgather () from /lib/x86_64-linux-gnu/libmpi.so.40
> > #6  0x00001555541a2289 in PetscLayoutSetUp (map=0x555555721ed0) at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/vec/is/utils/pmap.c:248
> > #7  0x000015555442e06a in MatMPIAIJSetPreallocationCSR_MPIAIJ (B=0x55555572d850, Ii=0x15545a778010, J=0x15545beacb60, v=0x1554cff55e60)
> >     at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/mat/impls/aij/mpi/mpiaij.c:3885
> > #8  0x00001555544284e3 in MatMPIAIJSetPreallocationCSR (B=0x55555572d850, i=0x15545a778010, j=0x15545beacb60, v=0x1554cff55e60)
> >     at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/mat/impls/aij/mpi/mpiaij.c:3998
> > #9  0x0000155554b1412f in PCMPISetMat (pc=pc@entry=0x0) at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/ksp/pc/impls/mpi/pcmpi.c:250
> > #10 0x0000155554b17403 in PCMPIServerBegin () at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/ksp/pc/impls/mpi/pcmpi.c:464
> > #11 0x00001555540b9aa4 in PetscInitialize_Common (prog=0x7fffffffe27b "geosimtrs_mpiserver", file=file@entry=0x0,
> >     help=help@entry=0x55555555a1e0 <help> "Solves a linear system in parallel with KSP.\nInput parameters include:\n  -view_exact_sol   : write exact solution vector to stdout\n  -m <mesh_x>       : number of mesh points in x-direction\n  -n <mesh"...,
> >     ftn=ftn@entry=PETSC_FALSE, readarguments=readarguments@entry=PETSC_FALSE, len=len@entry=0)
> >     at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/sys/objects/pinit.c:1109
> > #12 0x00001555540bba82 in PetscInitialize (argc=argc@entry=0x7fffffffda8c, args=args@entry=0x7fffffffda80, file=file@entry=0x0,
> >     help=help@entry=0x55555555a1e0 <help> "Solves a linear system in parallel with KSP.\nInput parameters include:\n  -view_exact_sol   : write exact solution vector to stdout\n  -m <mesh_x>       : number of mesh points in x-direction\n  -n <mesh"...)
> >     at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/sys/objects/pinit.c:1274
> > #13 0x0000555555557673 in main (argc=<optimized out>, args=<optimized out>) at geosimtrs_mpiserver.c:29
> >
> >
> >
> > This does not happen with the 32-bit-index PETSc build: with the MPI linear
> > solver server it runs smoothly on more than 8 cores. It also does not happen
> > with the 64-bit-index build run as an ordinary MPI program (without the MPI
> > linear solver server). It only happens with the 64-bit-index MPI linear solver
> > server, so I think it may be a bug.
> >
> >
> >
> > Any advice would be greatly appreciated. The matrix and the ia, ja arrays are
> > too big to upload, so if you need anything else to debug this, please let me know.
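
A possible alternative to uploading raw arrays (a suggestion, not something requested in the thread): write the assembled system to a PETSc binary file with a viewer so it can be reloaded via MatLoad()/VecLoad(). A minimal sketch; the helper name and file name are arbitrary:

#include <petscmat.h>

/* Hypothetical helper: dump the assembled matrix and right-hand side in PETSc
   binary format ("linsys.bin" is an arbitrary name) for later MatLoad()/VecLoad(). */
static PetscErrorCode DumpSystem(Mat A, Vec b)
{
  PetscViewer viewer;

  PetscFunctionBeginUser;
  PetscCall(PetscViewerBinaryOpen(PetscObjectComm((PetscObject)A), "linsys.bin", FILE_MODE_WRITE, &viewer));
  PetscCall(MatView(A, viewer));
  PetscCall(VecView(b, viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  PetscFunctionReturn(PETSC_SUCCESS);
}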
> >
> >
> >
> >    - Machine type: HPC
> >    - OS version and type: Linux houamd009 6.1.55-cggdb11-1 #1 SMP Fri Sep 29 10:09:13 UTC 2023 x86_64 GNU/Linux
> >    - PETSc version:
> >      #define PETSC_VERSION_RELEASE    1
> >      #define PETSC_VERSION_MAJOR      3
> >      #define PETSC_VERSION_MINOR      20
> >      #define PETSC_VERSION_SUBMINOR   4
> >      #define PETSC_RELEASE_DATE       "Sep 28, 2023"
> >      #define PETSC_VERSION_DATE       "Jan 29, 2024"
> >    - MPI implementation: OpenMPI
> >    - Compiler and version: GNU
> >
> >
> > Yuxiang Lin
> >
> 

