<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 08/20/14 16:03, Karl Rupp wrote:<br>
</div>
<blockquote cite="mid:53F4AABF.1070309@iue.tuwien.ac.at" type="cite">
<br>
<blockquote type="cite">
<blockquote type="cite">What you could do with 4N procs for
PETSc is to define your own matrix
<br>
layout, where only one out of four processes actually owns
part of the
<br>
matrix. After MatAssemblyBegin()/MatAssemblyEnd() the full
data gets
<br>
correctly transferred to N procs, with the other 3*N procs
being
<br>
'empty'. You should then be able to run the solver with all
4*N
<br>
processors, but only N of them actually do the work on the
GPUs.
<br>
</blockquote>
OK, I understand your solution; I was already thinking along
those lines,
<br>
so thanks for confirming it. My fear, though, was that performance
would not
<br>
improve. Indeed, I still don't understand (even after
<br>
analyzing -log_summary profiles and searching the petsc-dev
archives)
<br>
what slows things down when several MPI tasks share one GPU,
compared to
<br>
one MPI task working with one GPU...
<br>
In the proposed solution, 4*N processes will still exchange MPI
messages
<br>
during a KSP iteration, and the amount of data copied between
<br>
GPU and CPU(s) will be the same, so if you could enlighten
<br>
me, I would be glad.
<br>
</blockquote>
<br>
One of the causes of the performance penalty you observe is the
increased PCI-Express traffic: if four ranks share a single
GPU, then each matrix-vector product requires at least 8 vector
transfers between host and device, rather than just 2 with a
single MPI rank. Similarly, you have four times the number of
kernel launches. It may well be that these overheads just eat up
all the performance gains you could otherwise obtain. I don't know
your profiling data, so I can't be more specific at this point.
</blockquote>
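<br>
The arithmetic in the reply above can be written down as a small
sketch. It assumes each rank sharing a GPU needs at least two
host/device vector transfers per matrix-vector product (which gives
the quoted 8 for four ranks and 2 for one rank) and that kernel
launches scale with the number of ranks. The function name
matvec_overhead and its parameters are hypothetical and only
illustrate the counting; they are not part of PETSc.<br>
<pre>
```python
def matvec_overhead(ranks_per_gpu, transfers_per_rank=2):
    """Lower-bound overhead per matrix-vector product on one shared GPU.

    transfers_per_rank=2 is an assumption taken from the figures in the
    thread (2 transfers for one rank, at least 8 for four ranks).
    """
    return {
        # Each rank moves its own vector data to and from the device.
        "min_vector_transfers": ranks_per_gpu * transfers_per_rank,
        # Each rank launches its own kernels, so launches scale linearly.
        "kernel_launch_factor": ranks_per_gpu,
    }


one_rank = matvec_overhead(1)
four_ranks = matvec_overhead(4)
```
</pre>
With these assumptions, four ranks per GPU give at least 8 transfers
and four times the kernel launches of the single-rank case.<br>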
<br>
Thanks a lot, Karli, for the explanation. I am currently trying your
solution.<br>
<br>
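A minimal sketch of how the local row counts for such a layout could
be computed, assuming that the ranks whose number is a multiple of 4
are the ones that own matrix rows (which ranks actually share a GPU
depends on how the MPI launcher places them). The function name
local_row_counts and its parameters are hypothetical; each rank would
then pass its own count as the local row size to MatSetSizes().<br>
<pre>
```python
def local_row_counts(global_rows, num_ranks, ranks_per_gpu=4):
    """Return the number of matrix rows owned by each MPI rank.

    Assumption: only ranks with rank % ranks_per_gpu == 0 own rows;
    every other rank gets 0 rows and stays 'empty'.
    """
    if num_ranks % ranks_per_gpu != 0:
        raise ValueError("num_ranks must be a multiple of ranks_per_gpu")
    num_owners = num_ranks // ranks_per_gpu
    # Spread the rows as evenly as possible over the owning ranks.
    base, extra = divmod(global_rows, num_owners)
    counts = []
    for rank in range(num_ranks):
        if rank % ranks_per_gpu == 0:
            owner_index = rank // ranks_per_gpu
            # The first 'extra' owners take one additional row each.
            counts.append(base + (1 if extra > owner_index else 0))
        else:
            counts.append(0)
    return counts


# Example: 10 rows on 8 ranks, i.e. N = 2 owning ranks.
layout = local_row_counts(10, 8)
```
</pre>
Here layout is [5, 0, 0, 0, 5, 0, 0, 0]: ranks 0 and 4 own five rows
each and the other six ranks own none.<br>
<br>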
Pierre<br>
<blockquote cite="mid:53F4AABF.1070309@iue.tuwien.ac.at" type="cite">
<br>
Best regards,
<br>
Karli
<br>
</blockquote>
<br>
<br>
<div class="moz-signature">-- <br>
<b>Trio_U support team</b>
<br>
Marthe ROUX (01 69 08 00 02) Saclay
<br>
Pierre LEDAC (04 38 78 91 49) Grenoble
</div>
</body>
</html>