[petsc-users] MatTransposeMatMult ends up with an MPI error

Thomas Witkowski thomas.witkowski at tu-dresden.de
Wed Oct 17 14:57:05 CDT 2012


On 17.10.2012 17:50, Hong Zhang wrote:
> Thomas:
>
> Does this occur only for large matrices?
> Can you dump your matrices into petsc binary files
> (e.g., A.dat, B.dat) and send to us for debugging?
>
> Lately, we added a new implementation of MatTransposeMatMult() in
> petsc-dev, which has been shown to be much faster than the released
> MatTransposeMatMult().
> You might give it a try by
> 1. install petsc-dev (see 
> http://www.mcs.anl.gov/petsc/developers/index.html)
> 2. run your code with option '-mattransposematmult_viamatmatmult 1'
> Let us know what you get.
>
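For reference, writing a matrix to a PETSc binary file, as suggested above, can be done roughly like this (the viewer variable is just for illustration; "A.dat" is the file name Hong mentioned):

    /* Write matrix A to the binary file "A.dat" so it can be sent for debugging */
    PetscViewer    viewer;
    PetscErrorCode ierr;
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.dat", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
    ierr = MatView(A, viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
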
I checked the problem with petsc-dev. There, the code just hangs 
somewhere inside MatTransposeMatMult. I also checked what MatTranspose 
does on the corresponding matrix, and the behavior is the same. I extracted 
the matrix from my simulations; it is of size 123,432 x 1,533,726 and very 
sparse (2 to 8 nonzeros per row). I'm sorry, but this is the smallest matrix 
for which I found the problem (I will send the matrix file to petsc-maint). 
I wrote a small piece of code that just reads the matrix and runs 
MatTranspose. With 1 MPI task, it works fine. With a small number of MPI 
tasks (around 8), I get the following error message:

[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[1]PETSC ERROR: likely location of problem given in stack below
[1]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[1]PETSC ERROR:       INSTEAD the line number of the start of the function
[1]PETSC ERROR:       is given.
[1]PETSC ERROR: [1] PetscSFReduceEnd line 1259 src/sys/sf/sf.c
[1]PETSC ERROR: [1] MatTranspose_MPIAIJ line 2045 src/mat/impls/aij/mpi/mpiaij.c
[1]PETSC ERROR: [1] MatTranspose line 4341 src/mat/interface/matrix.c


With 32 MPI tasks, which is also what I use in my simulation, the code 
hangs in MatTranspose.
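
For orientation, the small test code does essentially the following; this is only a sketch (the file name, matrix type, and mpiexec line are placeholders, not the exact code I ran):

    /* Sketch: load a matrix from a PETSc binary file and transpose it.
       Run e.g. as: mpiexec -n 8 ./reproducer (executable name is a placeholder). */
    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat            A, At;
      PetscViewer    viewer;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

      /* Read the extracted matrix from "matrix.dat" (file name illustrative) */
      ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "matrix.dat", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
      ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
      ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);
      ierr = MatLoad(A, viewer);CHKERRQ(ierr);
      ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

      /* Explicit transpose; this is where the error or hang shows up */
      ierr = MatTranspose(A, MAT_INITIAL_MATRIX, &At);CHKERRQ(ierr);

      ierr = MatDestroy(&At);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return 0;
    }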

If there is anything more I can do to help you find the problem, 
please let me know!

Thomas

> Hong
>
>     My code makes use of the function MatTransposeMatMult, and usually
>     it works fine! For some larger input data, it now stops with a lot
>     of MPI errors:
>
>     Fatal error in PMPI_Barrier: Other MPI error, error stack:
>     PMPI_Barrier(476)..: MPI_Barrier(comm=0x84000001) failed
>     MPIR_Barrier(82)...:
>     MPI_Waitall(261): MPI_Waitall(count=9, req_array=0xa787ba0,
>     status_array=0xa789240) failed
>     MPI_Waitall(113): The supplied request in array element 8 was
>     invalid (kind=0)
>     Fatal error in PMPI_Barrier: Other MPI error, error stack:
>     PMPI_Barrier(476)..: MPI_Barrier(comm=0x84000001) failed
>     MPIR_Barrier(82)...:
>     mpid_irecv_done(98): read from socket failed - request
>     state:recv(pde)done
>
>
>     Here is the stack print from the debugger:
>
>     6,                MatTransposeMatMult (matrix.c:8907)
>     6,                  MatTransposeMatMult_MPIAIJ_MPIAIJ
>     (mpimatmatmult.c:809)
>     6,                    MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ
>     (mpimatmatmult.c:1136)
>     6,                      PetscGatherMessageLengths2 (mpimesg.c:213)
>     6,                        PMPI_Waitall
>     6,                          MPIR_Err_return_comm
>     6,                            MPID_Abort
>
>
>     I use PETSc 3.3-p3. Any idea whether this could be related to a bug
>     in PETSc, or whether I am using the function incorrectly in some way?
>
>     Thomas
>
>

