[mpich-discuss] Bcast question?

Sat Aug 11 20:40:55 CDT 2012

John,

I found the following instructive.  I ended up using MPI_File_write_all with file views (page 35, with a nice example starting on page 48).  With a file view, each process specifies its own access information.

https://fs.hlrs.de/projects/par/events/2011/parallel_prog_2011/2011XE6-1/07-IO_Optimization.pdf
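
As a rough illustration of the "own access information" each process supplies, here is a minimal sketch in plain C that computes one rank's byte displacement and element count for a contiguous slice of a shared file. The helper name slice_for_rank, the view_slice struct, and the choice to give the first few ranks one extra element are all assumptions for illustration; they are not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: one rank's contiguous slice of a shared file. */
typedef struct {
    long long disp_bytes;  /* byte offset where this rank's slice begins */
    long long count;       /* number of elements this rank reads or writes */
} view_slice;

/* Split total_elems items over nprocs ranks. The first (total_elems % nprocs)
 * ranks each take one extra element (an assumed policy, not the only one). */
view_slice slice_for_rank(int rank, int nprocs, long long total_elems,
                          size_t elem_size)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    long long base  = total_elems / nprocs;
    long long extra = total_elems % nprocs;
    view_slice s;
    s.count = base + (rank < extra ? 1 : 0);
    /* Elements owned by lower ranks: rank * base plus one per "extra" rank. */
    long long before = rank * base + (rank < extra ? rank : extra);
    s.disp_bytes = before * (long long)elem_size;
    return s;
}
```

In an MPI program, disp_bytes would be passed (as an MPI_Offset) for the disp argument of MPI_File_set_view, and count would be the count given to MPI_File_write_all.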

Daniel Kokron
NASA Ames (ARC-TN)
SciCon group
301-286-3959

________________________________________
From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] On Behalf Of John Chludzinski [john.chludzinski at gmail.com]
Sent: Saturday, August 11, 2012 6:47 AM
To: William Gropp
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Bcast question?

I followed up on your suggestion to look into MPI-IO - GREAT suggestion.

I found an example at http://beige.ucs.indiana.edu/I590/node92.html and added code to gather the pieces of the file read in by each process:

MPI_Gather( read_buffer, number_of_bytes, MPI_BYTE, rbuf, number_of_bytes, MPI_BYTE, MASTER_RANK, MPI_COMM_WORLD);

All processes execute this line.  The problem is that number_of_bytes may be different for the last process if total_number_of_bytes is not a multiple of pool_size (i.e., total_number_of_bytes % pool_size != 0).  And if the value isn't the same for all processes, you get:

Fatal error in PMPI_Gather: Message truncated

If I set pool_size (the number of processes) so that total_number_of_bytes is a multiple of it (i.e., total_number_of_bytes % pool_size == 0), the code executes without error.

I thought I read in Peter Pacheco's book that equal counts on every process are not necessarily required?
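
For reference, MPI_Gather expects the same count from every process, while MPI_Gatherv accepts a per-rank count and displacement array on the root. The sketch below, in plain C, builds those two arrays under the assumption that every rank reads total / nprocs bytes and the last rank also takes the remainder, as described above. The function name gatherv_layout is hypothetical.

```c
#include <assert.h>

/* Fill counts[] and displs[] (each of length nprocs) for MPI_Gatherv.
 * Assumed split: every rank holds total / nprocs bytes and the last rank
 * additionally holds the total % nprocs leftover bytes. */
void gatherv_layout(int total, int nprocs, int *counts, int *displs)
{
    assert(total >= 0 && nprocs > 0);
    int base = total / nprocs;
    int offset = 0;
    for (int r = 0; r < nprocs; r++) {
        counts[r] = base;
        if (r == nprocs - 1)
            counts[r] += total % nprocs;  /* leftover bytes go to the last rank */
        displs[r] = offset;               /* where rank r's bytes start in rbuf */
        offset += counts[r];
    }
}
```

With these arrays, the gather call would look something like MPI_Gatherv(read_buffer, number_of_bytes, MPI_BYTE, rbuf, counts, displs, MPI_BYTE, MASTER_RANK, MPI_COMM_WORLD), where number_of_bytes on each rank equals that rank's entry in counts. Only the root needs valid counts and displs.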

---John

On Fri, Aug 10, 2012 at 9:58 AM, William Gropp <wgropp at illinois.edu> wrote:
The most likely newbie mistake is that you are timing the time that the MPI_Bcast is waiting - for example, if your code looks like this:

if (rank == 0) { tr = MPI_Wtime(); /* read data */ tr = MPI_Wtime() - tr; }
tb = MPI_Wtime(); MPI_Bcast(…); tb = MPI_Wtime() - tb;

then on all but rank 0, you are timing the time that MPI_Bcast is waiting for the read data step to finish.  Instead, consider adding an MPI_Barrier before the MPI_Bcast:

if (rank == 0) { tr = MPI_Wtime(); /* read data */ tr = MPI_Wtime() - tr; }
MPI_Barrier(MPI_COMM_WORLD);
tb = MPI_Wtime(); MPI_Bcast(…); tb = MPI_Wtime() - tb;

*Only* do this when you are trying to answer such timing questions.

You may also want to consider using MPI-IO to parallelize the read step.

Bill

William Gropp
Director, Parallel Computing Institute
Deputy Director for Research
Institute for Advanced Computing Applications and Technologies
Paul and Cynthia Saylor Professor of Computer Science
University of Illinois Urbana-Champaign

On Aug 10, 2012, at 3:03 AM, John Chludzinski wrote:

> I have a problem which requires all processes to have a copy of an array of data that is read in from a file (by process 0).  I Bcast the array to all processes (using MPI_COMM_WORLD).
>
> I instrumented the code with some calls to Wtime to find the time consumed by different actions.  In particular, I was interested in comparing the time required for Bcasts vs. freads.  The array holds 1,200,000 elements of type MPI_DOUBLE.
>
> For a 3 process run:
>
> RANK = 0      fread_time = 2.323575
>
> vs.
>
> RANK = 2      bcast_time = 2.361233
> RANK = 0      bcast_time = 0.081910
> RANK = 1      bcast_time = 2.399790
>
> These numbers seem to indicate that Bcast-ing the data is as slow as reading the data from a file (on my Western Digital Passport USB drive).  Am I making a newbie mistake?
>
> ---John
>
> PS> I'm using a Fedora 16 (32-bit) notebook with a dual-core AMD Phenom processor.
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss