<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 17.10.2012 17:50, Hong
Zhang wrote:<br>
</div>
<blockquote
cite="mid:CAGCphBtYv8ccjtKWqJHAVLFPfVXFR7Xrt+vTLmfpWtG+OWiiQg@mail.gmail.com"
type="cite">Thomas:
<div><br>
</div>
<div>Does this occur only for large matrices?</div>
<div>Can you dump your matrices into PETSc binary files </div>
<div>(e.g., A.dat, B.dat) and send them to us for debugging?</div>
<div><br>
</div>
<div>Recently, we added a new implementation
of MatTransposeMatMult() to petsc-dev,</div>
<div>which has been shown to be much faster than the
released MatTransposeMatMult().</div>
<div>You can try it as follows:</div>
<div>1. install petsc-dev (see <a moz-do-not-send="true"
href="http://www.mcs.anl.gov/petsc/developers/index.html">http://www.mcs.anl.gov/petsc/developers/index.html</a>)</div>
<div>2. run your code with the option '<span
style="color:rgb(34,34,34);font-family:'courier
new',monospace;font-size:13px;background-color:rgb(255,255,255)">-mattransposematmult_viamatmatmult
1</span>'</div>
<div><span style="color:rgb(34,34,34);font-family:'courier
new',monospace;font-size:13px;background-color:rgb(255,255,255)">Let
us know what you get.</span></div>
<div><span style="color: rgb(34, 34, 34); font-family: 'courier
new',monospace; font-size: 13px; background-color: rgb(255,
255, 255);"><br>
</span></div>
</blockquote>
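A minimal sketch of how the matrices could be dumped as requested above, assuming the PETSc 3.3 calling conventions (<code>PetscViewerDestroy()</code> taking a pointer, <code>CHKERRQ</code> error checking). The helper name <code>DumpMatrixBinary</code> is hypothetical and not part of PETSc; the sketch needs a PETSc installation to compile.<br>
<pre>
```c
#include "petscmat.h"

/* Hypothetical helper (not a PETSc routine): write an assembled matrix
   to a PETSc binary file such as "A.dat", readable later with MatLoad(). */
PetscErrorCode DumpMatrixBinary(Mat M, const char filename[])
{
  MPI_Comm       comm;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* Open the file on the matrix's own communicator: this is collective,
     so every rank that owns part of M must make this call. */
  ierr = PetscObjectGetComm((PetscObject)M, &comm);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(comm, filename, FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  /* MatView() on a binary viewer writes sizes, nonzero structure and values. */
  ierr = MatView(M, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
```
</pre>
Calling it as <code>DumpMatrixBinary(A, "A.dat")</code> and <code>DumpMatrixBinary(B, "B.dat")</code> just before the failing <code>MatTransposeMatMult()</code> call would produce the two files mentioned in the request.<br>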
I tested the problem with petsc-dev. There, the code just hangs
somewhere inside MatTransposeMatMult. I also checked what MatTranspose
does on the corresponding matrix, and the behavior is the same. I
extracted the matrix from my simulations; it is of size 123,432 x
1,533,726 and very sparse (2 to 8 nonzeros per row). I'm sorry, but this
is the smallest matrix for which I could reproduce the problem (I will send
the matrix file to petsc-maint). I wrote a small piece of code that
just reads the matrix and runs MatTranspose. With 1 MPI task, it
works fine. With a small number of MPI tasks (around 8), I get the
following error message:<br>
<br>
[1]PETSC ERROR:
------------------------------------------------------------------------<br>
[1]PETSC ERROR: Caught signal number 15 Terminate: Somet process (or
the batch system) has told this process to end<br>
[1]PETSC ERROR: Try option -start_in_debugger or
-on_error_attach_debugger<br>
[1]PETSC ERROR: or see
<a class="moz-txt-link-freetext" href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind">http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a><br>
[1]PETSC ERROR: or try <a class="moz-txt-link-freetext" href="http://valgrind.org">http://valgrind.org</a> on GNU/linux and Apple Mac OS X to
find memory corruption errors<br>
[1]PETSC ERROR: likely location of problem given in stack below<br>
[1]PETSC ERROR: --------------------- Stack Frames
------------------------------------<br>
[1]PETSC ERROR: Note: The EXACT line numbers in the stack are not
available,<br>
[1]PETSC ERROR: INSTEAD the line number of the start of the
function<br>
[1]PETSC ERROR: is given.<br>
[1]PETSC ERROR: [1] PetscSFReduceEnd line 1259 src/sys/sf/sf.c<br>
[1]PETSC ERROR: [1] MatTranspose_MPIAIJ line 2045
src/mat/impls/aij/mpi/mpiaij.c<br>
[1]PETSC ERROR: [1] MatTranspose line 4341
src/mat/interface/matrix.c<br>
<br>
<br>
With 32 MPI tasks, which is the number I also use in my simulation,
the code hangs in MatTranspose.<br>
<br>
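A minimal sketch of a reproducer like the one described above (read the matrix, then call <code>MatTranspose()</code>). It is written for illustration and is not Thomas's actual code; it assumes the PETSc 3.3 API and a hypothetical input file named <code>A.dat</code>, and it needs PETSc and MPI to build.<br>
<pre>
```c
#include "petscmat.h"

static char help[] = "Loads a matrix from a PETSc binary file and transposes it.\n";

int main(int argc, char **argv)
{
  Mat            A, At;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, (char *)0, help);if (ierr) return ierr;

  /* "A.dat" is an assumed file name, e.g. written earlier with MatView(). */
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.dat", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);   /* MPIAIJ when run on more than one rank */
  ierr = MatLoad(A, viewer);CHKERRQ(ierr);      /* rows distributed with PETSc's default layout */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  /* The call that aborts (about 8 ranks) or hangs (32 ranks) in the report. */
  ierr = MatTranspose(A, MAT_INITIAL_MATRIX, &At);CHKERRQ(ierr);

  ierr = MatDestroy(&At);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
```
</pre>
Launching the same binary with <code>mpiexec -n 1</code>, <code>-n 8</code> and <code>-n 32</code> would compare the three cases reported above.<br>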
If there is anything more I can do to help you find the problem,
please let me know!<br>
<br>
Thomas<br>
<br>
<blockquote
cite="mid:CAGCphBtYv8ccjtKWqJHAVLFPfVXFR7Xrt+vTLmfpWtG+OWiiQg@mail.gmail.com"
type="cite">
<div><span style="color:rgb(34,34,34);font-family:'courier
new',monospace;font-size:13px;background-color:rgb(255,255,255)">
</span></div>
<div><span style="color:rgb(34,34,34);font-family:'courier
new',monospace;font-size:13px;background-color:rgb(255,255,255)">Hong</span></div>
<div><br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
My code makes use of the function MatTransposeMatMult, and
it usually works fine. For some larger input data, however, it now
stops with a lot of MPI errors:<br>
<br>
fatal error in PMPI_Barrier: Other MPI error, error stack:<br>
PMPI_Barrier(476)..: MPI_Barrier(comm=0x84000001) failed<br>
MPIR_Barrier(82)...:<br>
MPI_Waitall(261): MPI_Waitall(count=9, req_array=0xa787ba0,
status_array=0xa789240) failed<br>
MPI_Waitall(113): The supplied request in array element 8
was invalid (kind=0)<br>
Fatal error in PMPI_Barrier: Other MPI error, error stack:<br>
PMPI_Barrier(476)..: MPI_Barrier(comm=0x84000001) failed<br>
MPIR_Barrier(82)...:<br>
mpid_irecv_done(98): read from socket failed - request
state:recv(pde)done<br>
<br>
<br>
Here is the stack trace from the debugger:<br>
<br>
6, MatTransposeMatMult (matrix.c:8907)<br>
6, MatTransposeMatMult_MPIAIJ_MPIAIJ
(mpimatmatmult.c:809)<br>
6, MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ
(mpimatmatmult.c:1136)<br>
6, PetscGatherMessageLengths2
(mpimesg.c:213)<br>
6, PMPI_Waitall<br>
6, MPIR_Err_return_comm<br>
6, MPID_Abort<br>
<br>
<br>
I use PETSc 3.3-p3. Any idea whether this is, or could be,
related to a bug in PETSc, or whether I am using
the function incorrectly in some way?<span class="HOEnZb"><font
color="#888888"><br>
<br>
Thomas<br>
<br>
</font></span></blockquote>
</div>
<br>
</div>
</blockquote>
<br>
<br>
</body>
</html>