[mpich-discuss] Sending data many times, packing data or using derived type?

Jeff Hammond jhammond at alcf.anl.gov
Tue Jun 5 02:14:42 CDT 2012


This is a fairly common question, so please post your test code and
results if you can.  I'm sure others would appreciate it.

Torsten Hoefler (and others) have written papers (sometimes in
EuroMPI) on this subject.  You might check out what Google Scholar
gives you.  It might save you some trouble writing a benchmark,
although the benchmarks for MPI datatypes that I recall seeing do not
cover your use case (even though it is probably the most common one,
which suggests my knowledge of the MPI datatype literature is
incomplete).

I suspect this is architecture-dependent.  For example, Blue Gene has
a relatively slow CPU compared to its network, so using MPI datatypes
tends not to pay off there, while datatypes should be effective on a
late-model Intel machine with Ethernet.
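
For what it's worth, here is a minimal sketch of such a benchmark in C.
The field names (u, v, w), the slice size, and the two-rank setup are my
own assumptions rather than anything from your code; it times one
exchange of a boundary slice with each of the three strategies you ask
about below.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) MPI_Abort(MPI_COMM_WORLD, 1);  /* sketch assumes 2 ranks */

    const int n = 128 * 128;             /* points in one 2D slice (assumed) */
    double *u = malloc(n * sizeof *u), *v = malloc(n * sizeof *v);
    double *w = malloc(n * sizeof *w), *buf = malloc(3 * n * sizeof *buf);
    for (int i = 0; i < n; i++) { u[i] = i; v[i] = 2.0 * i; w[i] = 3.0 * i; }
    int partner = 1 - rank;

    /* Strategy 3: one derived type describing the three separate blocks,
     * addressed absolutely so it can be sent from MPI_BOTTOM. */
    MPI_Datatype slice;
    int blocklens[3] = { n, n, n };
    MPI_Aint displs[3];
    MPI_Get_address(u, &displs[0]);
    MPI_Get_address(v, &displs[1]);
    MPI_Get_address(w, &displs[2]);
    MPI_Type_create_hindexed(3, blocklens, displs, MPI_DOUBLE, &slice);
    MPI_Type_commit(&slice);

    double t;
    if (rank == 0) {
        /* Strategy 1: three separate messages. */
        t = MPI_Wtime();
        MPI_Send(u, n, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD);
        MPI_Send(v, n, MPI_DOUBLE, partner, 1, MPI_COMM_WORLD);
        MPI_Send(w, n, MPI_DOUBLE, partner, 2, MPI_COMM_WORLD);
        printf("separate sends: %g s\n", MPI_Wtime() - t);

        /* Strategy 2: pack by hand, send once. */
        t = MPI_Wtime();
        memcpy(buf,         u, n * sizeof *u);
        memcpy(buf + n,     v, n * sizeof *v);
        memcpy(buf + 2 * n, w, n * sizeof *w);
        MPI_Send(buf, 3 * n, MPI_DOUBLE, partner, 1, MPI_COMM_WORLD);
        printf("manual pack:    %g s\n", MPI_Wtime() - t);

        /* Strategy 3: let the MPI datatype gather the three blocks. */
        t = MPI_Wtime();
        MPI_Send(MPI_BOTTOM, 1, slice, partner, 2, MPI_COMM_WORLD);
        printf("derived type:   %g s\n", MPI_Wtime() - t);
    } else {
        MPI_Recv(u, n, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(v, n, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(w, n, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(buf, 3 * n, MPI_DOUBLE, partner, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(MPI_BOTTOM, 1, slice, partner, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* A real benchmark should repeat each exchange many times and average;
     * a single message mostly measures latency and first-touch effects. */
    MPI_Type_free(&slice);
    free(u); free(v); free(w); free(buf);
    MPI_Finalize();
    return 0;
}

Run it with something like "mpiexec -n 2 ./slice_bench", substitute your
real slice sizes, and whichever strategy wins on your cluster is the one
worth tuning further.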

Jeff

On Mon, Jun 4, 2012 at 10:20 PM, TAY wee-beng <zonexo at gmail.com> wrote:
> Hi Jed,
>
> Thanks. For a while I thought I had emailed the wrong mailing list ;-)
>
> I'll write a simple subroutine to check.
>
> Yours sincerely,
>
> TAY wee-beng
>
>
> On 4/6/2012 5:02 PM, Jed Brown wrote:
>
> On Mon, Jun 4, 2012 at 9:52 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>> Hi,
>>
>> I am doing computational fluid dynamics and I have a 3D finite volume
>> code. I have partitioned the data in the z direction. At times, I need to
>> copy some boundary data (one 2D slice) from one processor to another. The
>> data are the velocities u, v, and w, each contiguous in memory.
>
>
> I notice you post frequently on the PETSc list, so I'll point out that
> VecScatter (and DMDA for structured grids) handles this in a generic way
> that can be mapped to MPI in several different ways.
>
>>
>> Is it recommended to
>>
>> 1. send u, v, w as separate MPI calls, or
>>
>> 2. copy all the u, v, w data into a 1D array and send just once, then
>> copy and update the data, or
>>
>> 3. use a derived type to group all these data together, then copy and
>> update the data?
>>
>> Which is the best choice? Does it depend on the size of the data? I
>> think my cluster uses InfiniBand, if I'm not wrong.
>
>
> Interlacing u,v,w together in memory is generally better for serial
> performance because it reuses cache more effectively and keeps the number of
> memory streams manageable. It is also better for packing buffers because the
> relevant data is less scattered in memory.
>
> Relative performance of user packing versus datatypes is quite
> implementation- and hardware-dependent. You can implement both or just
> implement one and plan to write the other implementation if you have
> evidence that it will be tangibly better (and your time is best spent tuning
> at that level).
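
To make the interlacing point above concrete, here is a small sketch; the
struct layout, slice size, and function name are assumed for illustration,
not taken from the original code. With u, v, w interleaved per point, the
whole boundary slice is one contiguous block, so it can be sent with a
single MPI_Send and no packing or derived type at all.

#include <mpi.h>

#define N_SLICE (128 * 128)   /* points in one 2D boundary slice (assumed) */

/* Interleaved ("array of structs") layout: the three velocity components
 * of a point sit next to each other; three doubles have no padding on
 * common ABIs, so the slice is a contiguous run of 3 * N_SLICE doubles. */
typedef struct { double u, v, w; } Vel;

static void send_slice_interleaved(const Vel *slice, int dest, MPI_Comm comm)
{
    /* One contiguous message: no manual pack, no derived type needed. */
    MPI_Send(slice, 3 * N_SLICE, MPI_DOUBLE, dest, 0, comm);
}

/* With separate u, v, w arrays, the same exchange needs three sends, a
 * manual pack, or a derived type, as in the benchmark sketch above. */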
>
>
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
>



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond

