<div class="gmail_quote">On Mon, Jun 4, 2012 at 9:52 AM, TAY wee-beng <span dir="ltr">&lt;<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi,<br>

<br>

I work in computational fluid dynamics and have a 3D finite volume code, with the data partitioned in the z direction. At times, I need to copy some boundary data (one 2D slice) from one processor to another. The data are the velocity components u, v and w, and they are contiguous in memory.<br>

</blockquote><div><br></div><div>I see you frequently on the PETSc list, so I'll point out that VecScatter (and DMDA for structured grids) handles this generically and can map the communication onto MPI in several different ways.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Is it recommended to<br>

<br>

1. send u, v, w in separate MPI calls, or<br>

<br>

2. copy all the u, v, w data into a 1D array and send just once, then copy and update the data, or<br>

<br>

3. use a derived datatype to group all these data together, then copy and update the data?<br>

<br>

Which is the better choice? Does it depend on the size of the data? I think my cluster uses InfiniBand, if I'm not wrong.</blockquote><div><br></div><div>Interlacing u, v, w together in memory is generally better for serial performance because it reuses cache more effectively and keeps the number of memory streams manageable. It is also better for packing buffers because the relevant data is less scattered in memory.</div>
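<div><br></div><div>As a rough illustration of the packing point, here is a minimal C sketch. It assumes the arrays are stored with i fastest and k (the partitioned z index) slowest; the function names and the index layout are hypothetical, not taken from your code. With three separate arrays, one z-slice takes three copies into a send buffer. With an interlaced array, the same slice is already one contiguous block of 3*nx*ny values and needs no packing.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: i fastest, then j, then k (the partitioned z index). */
static size_t idx3(size_t nx, size_t ny, size_t i, size_t j, size_t k)
{
    return (k * ny + j) * nx + i;
}

/* Separate arrays (option 2 in the question): copy slice k of u, v and w
   into one send buffer. buf must hold 3*nx*ny doubles. */
static void pack_slice_separate(const double *u, const double *v,
                                const double *w, size_t nx, size_t ny,
                                size_t nz, size_t k, double *buf)
{
    assert(k < nz);
    size_t n = nx * ny;                 /* values per component in a slice */
    size_t off = idx3(nx, ny, 0, 0, k); /* start of slice k in each array  */
    memcpy(buf,         u + off, n * sizeof *buf);
    memcpy(buf + n,     v + off, n * sizeof *buf);
    memcpy(buf + 2 * n, w + off, n * sizeof *buf);
}

/* Interlaced array q with component c stored at 3*idx3(i,j,k) + c:
   slice k is already contiguous, so return a pointer to it and its
   length. No copy is made. */
static const double *slice_interlaced(const double *q, size_t nx, size_t ny,
                                      size_t nz, size_t k, size_t *count)
{
    assert(k < nz);
    *count = 3 * nx * ny;
    return q + 3 * idx3(nx, ny, 0, 0, k);
}
```

<div>In the interlaced case the returned pointer and count could be passed straight to a single send, which is why that layout makes the choice between options 2 and 3 largely moot for z-slices.</div>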

<div><br></div><div>The relative performance of user packing versus derived datatypes is quite implementation- and hardware-dependent. You can implement both, or implement just one and write the other only if you have evidence that it will be tangibly better (and that your time is best spent tuning at that level).</div>

</div>