<div dir="ltr"><div>Oh. You're opening a can of worms but maybe that's your intent ;) I see the block Jacobi preconditioner in the valgrind logs.</div><div> </div><div>Do, </div><div>mpirun -n 1 (or 2) ./ex7 -mat_type mpiaijcusparse -vec_type mpicusp -pc_type none</div>
<div> </div><div>From here, we can try to sort out the VecScatterInitializeForGPU problem when mpirun/exec is not used.</div><div> </div><div>If you want to implement block jacobi preconditioner on multiple GPUs, that's a larger problem to solve. I had some code that sort of worked. We'd have to sit down and discuss.</div>
<div>-Paul</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Jan 22, 2014 at 10:48 AM, Dominic Meiser <span dir="ltr"><<a href="mailto:dmeiser@txcorp.com" target="_blank">dmeiser@txcorp.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>Attached are the logs with 1 rank and 2
ranks. As far as I can tell these are different errors.<br>
<br>
For the log attached to the previous email I chose to run ex7
without mpirun so that valgrind checks ex7 and not mpirun. Is
there a way to have valgrind check the mpi processes rather than
mpirun?<br>
<br>
Cheers,<br>
Dominic<div><div class="h5"><br>
<br>
<br>
On 01/22/2014 10:37 AM, Paul Mullowney wrote:<br>
</div></div></div><div><div class="h5">
<blockquote type="cite">
<div dir="ltr">
<div>Hmmm. I may not have protected against the case where the
mpiaijcusp(arse) classes are used without mpirun/mpiexec.
I suppose it should have occurred to me that someone would do
this.</div>
<div> </div>
<div>try : </div>
<div>mpirun -n 1 ./ex7 -mat_type mpiaijcusparse -vec_type cusp</div>
<div> </div>
<div>In this scenario, the sequential to sequential vecscatters
should be called.</div>
<div> </div>
<div>Then,</div>
<div>mpirun -n 2 ./ex7 -mat_type mpiaijcusparse -vec_type cusp</div>
<div> </div>
<div>In this scenario, MPI_General vecscatters should be called
... and work correctly if you have a system with multiple
GPUs.</div>
<div> </div>
<div>-Paul</div>
</div>
<div class="gmail_extra">
<br>
<br>
<div class="gmail_quote">On Wed, Jan 22, 2014 at 10:32 AM,
Dominic Meiser <span dir="ltr"><<a href="mailto:dmeiser@txcorp.com" target="_blank">dmeiser@txcorp.com</a>></span>
wrote:<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
Hey Paul,<br>
<br>
Thanks for providing background on this.
<div><br>
<br>
On Wed 22 Jan 2014 10:05:13 AM MST, Paul Mullowney wrote:<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
<br>
Dominic,<br>
A few years ago, I was trying to minimize the amount of
data transferred<br>
to and from the GPU (for multi-GPU MatMult) by
inspecting the indices<br>
of the data that needed to be messaged to and from the
device. Then, I<br>
would call gather kernels on the GPU which pulled the
scattered data<br>
into contiguous buffers, which were then transferred to the
host<br>
asynchronously (while the MatMult was occurring). That
was the<br>
motivation for VecScatterInitializeForGPU: it builds
the necessary<br>
buffers as needed.<br>
An alternative approach is to message the smallest
contiguous buffer<br>
containing all the data with a single cudaMemcpyAsync.
This is the<br>
method currently implemented.<br>
I never found a case where the former implementation
(with a GPU<br>
gather-kernel) performed better than the alternative
approach which<br>
messaged the smallest contiguous buffer. I looked at
many, many matrices.<br>
Now, as far as I understand the VecScatter kernels, this
method should<br>
only get called if the transfer is MPI_General (i.e.
PtoP parallel to<br>
parallel). Other VecScatter methods are called in other
circumstances<br>
where the scatter is not MPI_General. That
assumption could be<br>
wrong though.<br>
</blockquote>
<br>
<br>
</div>
I see. I figured there was some logic in place to make sure
that this function only gets called in cases where the
transfer type is MPI_General. I'm getting segfaults in this
function where the todata and fromdata are of a different
type. This could easily be user error but I'm not sure. Here
is an example valgrind error:<br>
<br>
==27781== Invalid read of size 8<br>
==27781== at 0x1188080: VecScatterInitializeForGPU
(vscatcusp.c:46)<br>
==27781== by 0xEEAE5D: MatMult_MPIAIJCUSPARSE(_p_Mat*,
_p_Vec*, _p_Vec*) (mpiaijcusparse.cu:108)<br>
==27781== by 0xA20CC3: MatMult (matrix.c:2242)<br>
==27781== by 0x4645E4: main (ex7.c:93)<br>
==27781== Address 0x286305e0 is 1,616 bytes inside a block
of size 1,620 alloc'd<br>
==27781== at 0x4C26548: memalign (vg_replace_malloc.c:727)<br>
==27781== by 0x4654F9: PetscMallocAlign(unsigned long, int,
char const*, char const*, void**) (mal.c:27)<br>
==27781== by 0xCAEECC: PetscTrMallocDefault(unsigned long,
int, char const*, char const*, void**) (mtr.c:186)<br>
==27781== by 0x5A5296: VecScatterCreate (vscat.c:1168)<br>
==27781== by 0x9AF3C5: MatSetUpMultiply_MPIAIJ (mmaij.c:116)<br>
==27781== by 0x96F0F0: MatAssemblyEnd_MPIAIJ(_p_Mat*,
MatAssemblyType) (mpiaij.c:706)<br>
==27781== by 0xA45358: MatAssemblyEnd (matrix.c:4959)<br>
==27781== by 0x464301: main (ex7.c:78)<br>
<br>
This was produced by src/ksp/ksp/tutorials/ex7.c. The
command line options are<br>
<br>
./ex7 -mat_type mpiaijcusparse -vec_type cusp<br>
<br>
In this particular case the todata is of type
VecScatter_Seq_Stride and fromdata is of type
VecScatter_Seq_General. The complete valgrind log (including
configure options for petsc) is attached.<br>
<br>
Any comments or suggestions are appreciated.<br>
Cheers,<br>
Dominic<br>
<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
<div>
<br>
-Paul<br>
<br>
<br>
On Wed, Jan 22, 2014 at 9:49 AM, Dominic Meiser <<a href="mailto:dmeiser@txcorp.com" target="_blank">dmeiser@txcorp.com</a>> wrote:<br>
</div>
<div>
<br>
Hi,<br>
<br>
I'm trying to understand VecScatterInitializeForGPU in<br>
</div>
src/vec/vec/utils/veccusp/vscatcusp.c. I don't
understand why
<div><br>
this function can get away with casting the fromdata and
todata in<br>
the inctx to VecScatter_MPI_General. Don't we need to
inspect the<br>
VecScatterType fields of the todata and fromdata?<br>
<br>
Cheers,<br>
Dominic<br>
<br>
-- <br>
Dominic Meiser<br>
Tech-X Corporation<br>
5621 Arapahoe Avenue<br>
Boulder, CO 80303<br>
USA<br>
</div>
Telephone: <a href="tel:303-996-2036" target="_blank" value="+13039962036">303-996-2036</a><br>
Fax: <a href="tel:303-448-7756" target="_blank" value="+13034487756">303-448-7756</a><br>
<a href="http://www.txcorp.com" target="_blank">www.txcorp.com</a><br>
<br>
<br>
</blockquote>
<div>
<div>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
<br>
</div></div></div>
</blockquote></div><br></div>