[petsc-users] Bug or misuse of 64-bit-index PETSc MPI linear solver server with more than 8 cores

Mark Adams mfadams at lbl.gov
Thu Sep 5 06:56:27 CDT 2024


Barry's suggestion for testing got garbled in the GitLab issue posting.
Here it is, I think:

07:53 main *= ~/Codes/petsc$ make test s=ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1
/usr/local/bin/gmake --no-print-directory -f /Users/markadams/Codes/petsc/gmakefile.test PETSC_ARCH=arch-macosx-gnu-O PETSC_DIR=/Users/markadams/Codes/petsc test
Using MAKEFLAGS: --no-print-directory -- PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-O s=ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1
Application at path ( /Users/markadams/Codes/petsc/arch-macosx-gnu-O/bin/mpiexec.hydra ) removed from firewall
Application at path ( /Users/markadams/Codes/petsc/arch-macosx-gnu-O/bin/mpiexec.hydra ) added to firewall
Incoming connection to the application is blocked
       TEST arch-macosx-gnu-O/tests/counts/ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1.counts
 ok ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1
 ok diff-ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1
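
For anyone reading this later: the pattern the test exercises (and, from the report below, what the failing application does as well) is a sequential client that assembles its system on PETSC_COMM_SELF and lets the extra MPI ranks act as the solver server; those ranks never leave PetscInitialize(), which is exactly where the rank 1-10 backtraces below sit (PCMPIServerBegin). The following is a hand-written sketch, not the actual ex1.c: the tridiagonal system, the sizes, and the run line mpiexec -n <N> ./sketch -mpi_linear_solver_server (plus whatever extra options the real test adds) are my assumptions.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      b, x;
  KSP      ksp;
  PetscInt n = 100, i;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* Assemble a small tridiagonal system entirely on this (client) rank. */
  PetscCall(MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 3, NULL, &A));
  for (i = 0; i < n; i++) {
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
    if (i > 0) PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  /* A sequential KSP; when the program is launched with
     -mpi_linear_solver_server, the solve is handled through the server
     ranks that were parked inside PetscInitialize(). */
  PetscCall(KSPCreate(PETSC_COMM_SELF, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}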

On Wed, Sep 4, 2024 at 3:59 PM Lin_Yuxiang <linyx199071 at gmail.com> wrote:

> To whom it may concern:
>
>
>
> I recently tried to use the 64-bit-index build of PETSc to replace our
> legacy code's solver via the MPI linear solver server. However, it gives
> me an error when I use more than 8 cores, saying
>
>
>
> Get NNZ
>
> MatsetPreallocation
>
> MatsetValue
>
> MatSetValue Time per kernel: 43.1147 s
>
> Matassembly
>
> VecsetValue
>
> pestc_solve
>
> Read -1, expected 1951397280, errno = 14
>
>
>
> When I tried -start_in_debugger, the error seems to come from MPI_Scatterv:
>
>
>
> Rank 0:
>
> #3  0x00001555512e4de5 in mca_pml_ob1_recv () from
> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so
>
> #4  0x0000155553e01e60 in PMPI_Scatterv () from
> /lib/x86_64-linux-gnu/libmpi.so.40
>
> #5  0x0000155554b13eab in PCMPISetMat (pc=pc at entry=0x0) at
> /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/ksp/pc/impls/mpi/pcmpi.c:230
>
> #6  0x0000155554b17403 in PCMPIServerBegin () at
> /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/ksp/pc/impls/mpi/pcmpi.c:464
>
> #7  0x00001555540b9aa4 in PetscInitialize_Common (prog=0x7fffffffe27b
> "geosimtrs_mpiserver", file=file at entry=0x0,
>
>     help=help at entry=0x55555555a1e0 <help> "Solves a linear system in
> parallel with KSP.\nInput parameters include:\n  -view_exact_sol   : write
> exact solution vector to stdout\n  -m <mesh_x>       : number of mesh
> points in x-direction\n  -n <mesh"..., ftn=ftn at entry=PETSC_FALSE,
> readarguments=readarguments at entry=PETSC_FALSE, len=len at entry=0)
>
>     at
> /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/sys/objects/pinit.c:1109
>
> #8  0x00001555540bba82 in PetscInitialize (argc=argc at entry=0x7fffffffda8c,
> args=args at entry=0x7fffffffda80, file=file at entry=0x0,
>
>     help=help at entry=0x55555555a1e0 <help> "Solves a linear system in
> parallel with KSP.\nInput parameters include:\n  -view_exact_sol   : write
> exact solution vector to stdout\n  -m <mesh_x>       : number of mesh
> points in x-direction\n  -n <mesh"...) at
> /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/sys/objects/pinit.c:1274
>
> #9  0x0000555555557673 in main (argc=<optimized out>, args=<optimized
> out>) at geosimtrs_mpiserver.c:29
>
>
>
>  Ranks 1-10:
>
> 0x0000155553e1f030 in ompi_coll_base_allgather_intra_bruck () from
> /lib/x86_64-linux-gnu/libmpi.so.40
>
> #4  0x0000155550f62aaa in ompi_coll_tuned_allgather_intra_dec_fixed ()
> from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so
>
> #5  0x0000155553ddb431 in PMPI_Allgather () from
> /lib/x86_64-linux-gnu/libmpi.so.40
>
> #6  0x00001555541a2289 in PetscLayoutSetUp (map=0x555555721ed0) at
> /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/vec/is/utils/pmap.c:248
>
> #7  0x000015555442e06a in MatMPIAIJSetPreallocationCSR_MPIAIJ
> (B=0x55555572d850, Ii=0x15545a778010, J=0x15545beacb60, v=0x1554cff55e60)
>
>     at
> /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/mat/impls/aij/mpi/mpiaij.c:3885
>
> #8  0x00001555544284e3 in MatMPIAIJSetPreallocationCSR (B=0x55555572d850,
> i=0x15545a778010, j=0x15545beacb60, v=0x1554cff55e60) at
> /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/mat/impls/aij/mpi/mpiaij.c:3998
>
> #9  0x0000155554b1412f in PCMPISetMat (pc=pc at entry=0x0) at
> /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/ksp/pc/impls/mpi/pcmpi.c:250
>
> #10 0x0000155554b17403 in PCMPIServerBegin () at
> /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/ksp/pc/impls/mpi/pcmpi.c:464
>
> #11 0x00001555540b9aa4 in PetscInitialize_Common (prog=0x7fffffffe27b
> "geosimtrs_mpiserver", file=file at entry=0x0,
>
>     help=help at entry=0x55555555a1e0 <help> "Solves a linear system in
> parallel with KSP.\nInput parameters include:\n  -view_exact_sol   : write
> exact solution vector to stdout\n  -m <mesh_x>       : number of mesh
> points in x-direction\n  -n <mesh"..., ftn=ftn at entry=PETSC_FALSE,
> readarguments=readarguments at entry=PETSC_FALSE, len=len at entry=0) at
> /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/sys/objects/pinit.c:1109
>
> #12 0x00001555540bba82 in PetscInitialize (argc=argc at entry=0x7fffffffda8c,
> args=args at entry=0x7fffffffda80, file=file at entry=0x0,
>
>     help=help at entry=0x55555555a1e0 <help> "Solves a linear system in
> parallel with KSP.\nInput parameters include:\n  -view_exact_sol   : write
> exact solution vector to stdout\n  -m <mesh_x>       : number of mesh
> points in x-direction\n  -n <mesh"...) at
> /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/sys/objects/pinit.c:1274
>
> #13 0x0000555555557673 in main (argc=<optimized out>, args=<optimized
> out>) at geosimtrs_mpiserver.c:29
>
>
>
> This did not happen with the 32-bit-index PETSc build: runs with more than
> 8 cores go through smoothly with the MPI linear solver server. Nor did it
> happen with the 64-bit-index build used as a regular MPI solver (without
> mpi_linear_solver_server). It only happens with the 64-bit-index PETSc MPI
> linear solver server, so I think it may be a bug?
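
Purely a guess, not something established in this thread: the rank 0 backtrace shows PCMPISetMat handing the CSR data to the server ranks through MPI_Scatterv, whose count and displacement arguments are plain C ints, and the payload per nonzero doubles when PetscInt goes from 4 to 8 bytes. Below is a back-of-the-envelope sketch of where such sizes sit relative to the 32-bit boundary; the nonzero count is purely an assumption, read off the "expected 1951397280" bytes in the log as 1951397280 / 8 = 243924660 eight-byte entries.

#include <petsc.h>
#include <limits.h>

int main(int argc, char **argv)
{
  /* Hypothetical nonzero count: 1951397280 bytes / 8 bytes per entry.
     These numbers are illustrative only, not data from the failing run. */
  PetscInt64 nnz      = 243924660;
  PetscInt64 ja_bytes = nnz * (PetscInt64)sizeof(PetscInt);    /* column indices */
  PetscInt64 va_bytes = nnz * (PetscInt64)sizeof(PetscScalar); /* matrix values  */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(PetscPrintf(PETSC_COMM_SELF, "sizeof(PetscInt) = %d bytes\n", (int)sizeof(PetscInt)));
  PetscCall(PetscPrintf(PETSC_COMM_SELF,
            "ja = %" PetscInt64_FMT " bytes, values = %" PetscInt64_FMT
            " bytes, ja+values = %" PetscInt64_FMT " bytes, INT_MAX = %d\n",
            ja_bytes, va_bytes, ja_bytes + va_bytes, INT_MAX));
  PetscCall(PetscFinalize());
  return 0;
}

Nothing here proves anything; it just puts concrete numbers next to the 32-bit limits that MPI count and displacement arguments still use, which is the kind of boundary a 64-bit-index build is more likely to cross.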
>
>
>
> Any advice would be greatly appreciated. The matrix and the ia, ja arrays
> are too big to upload, so if there is anything you need for debugging,
> please let me know.
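
If it does become necessary to share the system: the assembled matrix can be written in PETSc binary format and reloaded exactly with MatLoad() on any number of ranks, which is usually the easiest form for someone else to reproduce a solve from. A minimal sketch, assuming the client-side matrix is called A; the helper name and file name are made up.

#include <petscmat.h>

/* Write an already assembled matrix in PETSc binary format so it can be
   reloaded later with MatLoad(). */
static PetscErrorCode DumpMatrix(Mat A, const char *filename)
{
  PetscViewer viewer;

  PetscFunctionBeginUser;
  PetscCall(PetscViewerBinaryOpen(PetscObjectComm((PetscObject)A), filename, FILE_MODE_WRITE, &viewer));
  PetscCall(MatView(A, viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  PetscFunctionReturn(PETSC_SUCCESS);
}

The binary file carries the full AIJ structure (row pointers, column indices, values), so the ia/ja arrays would not need to be shipped separately alongside it.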
>
>
>
>    - Machine type: HPC
>
>    - OS version and type: Linux houamd009 6.1.55-cggdb11-1 #1 SMP Fri Sep
>      29 10:09:13 UTC 2023 x86_64 GNU/Linux
>
>    - PETSc version:
>      #define PETSC_VERSION_RELEASE    1
>      #define PETSC_VERSION_MAJOR      3
>      #define PETSC_VERSION_MINOR      20
>      #define PETSC_VERSION_SUBMINOR   4
>      #define PETSC_RELEASE_DATE       "Sep 28, 2023"
>      #define PETSC_VERSION_DATE       "Jan 29, 2024"
>
>    - MPI implementation: OpenMPI
>
>    - Compiler and version: GNU
>
>
>
> Yuxiang Lin
>