[mpich-discuss] How to implement this case

Eric A. Borisch eborisch at ieee.org
Mon Jan 3 15:16:56 CST 2011


Looks about right... I'm assuming there is a <do actual work here> step to be
inserted between the MPI_Waitany and the MPI_Irecv within the N*M-sized
loop... and I count from 0 rather than 1 by force of habit... :)

I think the logic below will end up posting one more MPI_Irecv per worker than
desired (a final receive that is never matched by a send); perhaps change

if trunk_sent[index] != M do
   MPI_Irecv(buffer[index], index, requests[index])
   trunk_sent[index]++
end

to

if ++trunk_sent[index] != M do
   MPI_Irecv(buffer[index], index, requests[index])
end
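
Spelled out as a minimal C sketch of the corrected master loop (the message
size, datatype, tag choice, and workers-on-ranks-1..N layout are assumptions,
not fixed by the pseudocode above; assumed to run on rank 0 between MPI_Init
and MPI_Finalize):

#include <mpi.h>

#define N     5      /* number of workers (ranks 1..N), assumed */
#define M     10     /* trunks sent by each worker, assumed     */
#define COUNT 1024   /* elements per trunk, assumed             */

static double      buffer[N][COUNT];
static MPI_Request requests[N];

void master(void)
{
    int trunks_done[N] = {0};
    MPI_Status status;
    int i, index;

    /* post the first receive for every worker; tag == worker rank */
    for (i = 0; i < N; i++)
        MPI_Irecv(buffer[i], COUNT, MPI_DOUBLE, i + 1, i + 1,
                  MPI_COMM_WORLD, &requests[i]);

    /* harvest all N*M trunks in whatever order they complete */
    for (i = 0; i < N * M; i++) {
        MPI_Waitany(N, requests, &index, &status);

        /* <do actual work on buffer[index] here, e.g. write it to file> */

        if (++trunks_done[index] != M)   /* more trunks from this worker? */
            MPI_Irecv(buffer[index], COUNT, MPI_DOUBLE, index + 1,
                      index + 1, MPI_COMM_WORLD, &requests[index]);
    }
}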

 -Eric

On Mon, Jan 3, 2011 at 3:00 PM, Xiao Li <shinelee.thewise at gmail.com> wrote:

> Hi Eric,
>
> Thanks for your detailed suggestion. After reading the MPI documentation, I
> propose the following algorithm:
>
> //begin collecting data for the first trunk
> for i=1 to N do
>     MPI_Irecv(buffer[i], i, requests[i])
> end
> //set data sending counter
> for i=1 to N do
>     trunk_sent[i] = 0
> end
> //begin collecting data
> for i=1 to N*M do
>     MPI_Waitany(N, requests, &index, &status)
>     if trunk_sent[index] != M do
>          MPI_Irecv(buffer[index], index, requests[index])
>          trunk_sent[index]++
>     end
> end
>
> May I know your opinion of this algorithm?
>
> cheers
> Xiao
>
>
> On Mon, Jan 3, 2011 at 3:31 PM, Eric A. Borisch <eborisch at ieee.org> wrote:
>
>> Xiao,
>>
>> You should be able to get by with just N buffers, one for each client.
>> After you have processed the i-th iteration for client n, re-issue an
>> MPI_Irecv with the same buffer. This will match up with the next MPI_Send
>> from client n. You don't have to worry about synchronizing -- the MPI_Irecv
>> does not need to be posted before the MPI_Send. (But the MPI_Send won't
>> complete until it has been, of course...)
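>>
>> For reference, a minimal sketch of the matching worker-side loop (message
>> size, datatype, rank 0 as the master, and using the worker's rank as the
>> tag are assumptions):
>>
>> #include <mpi.h>
>>
>> #define M     10     /* trunks per worker, assumed  */
>> #define COUNT 1024   /* elements per trunk, assumed */
>>
>> void worker(int my_rank)
>> {
>>     double trunk[COUNT];
>>     int j;
>>
>>     for (j = 0; j < M; j++) {
>>         /* <fill trunk[] with this iteration's results> */
>>         MPI_Send(trunk, COUNT, MPI_DOUBLE, 0, my_rank, MPI_COMM_WORLD);
>>     }
>> }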
>>
>> You could always roll your own sockets, but MPI does a nice job of
>> managing connections and messages for you. In addition, MPI can be used
>> fairly efficiently on a wide range of interconnects, from shared memory to
>> Infiniband with little to no change on the user's part.
>>
>> In addition, you could likely improve performance in MPI by having two
>> sets (call them A and B) of buffers to send from on each worker; one is in
>> the "send" state (let's call this one A, started with an MPI_Isend after it
>> was initially filled) while you're filling B. After B is filled, initiate a
>> new MPI_Isend (very quick) on B and then wait for A's first send (MPI_Wait)
>> to complete. Once the first send on A is completed, you can start populating
>> A with the next iteration's output, initiate A's send, wait for B's send to
>> complete, and the cycle begins again.
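>>
>> A rough sketch of that double-buffered worker (again, sizes, the master
>> rank, and the tag are assumptions):
>>
>> #include <mpi.h>
>>
>> #define M     10     /* iterations/trunks per worker, assumed */
>> #define COUNT 1024   /* elements per trunk, assumed           */
>>
>> void worker_double_buffered(int my_rank)
>> {
>>     double      buf[2][COUNT];
>>     MPI_Request req[2] = { MPI_REQUEST_NULL, MPI_REQUEST_NULL };
>>     int         cur = 0, j;
>>
>>     for (j = 0; j < M; j++) {
>>         /* wait for this buffer's previous send to finish (a no-op the
>>          * first time through, since the request starts as MPI_REQUEST_NULL) */
>>         MPI_Wait(&req[cur], MPI_STATUS_IGNORE);
>>
>>         /* <fill buf[cur] with iteration j's output> */
>>
>>         MPI_Isend(buf[cur], COUNT, MPI_DOUBLE, 0, my_rank,
>>                   MPI_COMM_WORLD, &req[cur]);
>>         cur = 1 - cur;               /* switch to the other buffer */
>>     }
>>
>>     /* drain any sends still in flight before returning */
>>     MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
>> }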
>>
>> This approach allows you to overlap communication and computation,
>> and still works with the MPI_Waitany() approach to harvesting completed jobs
>> in first-completed order on the master. This is an almost trivial thing to
>> implement in MPI, but achieving it with sockets requires (IMHO) much more
>> programmer overhead...
>>
>> Just my 2c.
>>
>>  Eric
>>
>>
>> On Mon, Jan 3, 2011 at 1:24 PM, Xiao Li <shinelee.thewise at gmail.com>wrote:
>>
>>> Hi Eric,
>>>
>>> Assume I have N workers and M trunks of data to send from each worker;
>>> then I have to create N*M data buffers for MPI_Irecv. Is this method too
>>> costly?
>>>
>>> Or, if I write raw socket code, is that better? Just like traditional
>>> client/server socket programming, where the master listens on a port and
>>> spawns a new thread to accept each worker's data storage request?
>>>
>>> cheers
>>> Xiao
>>>
>>>
>>> On Mon, Jan 3, 2011 at 2:13 PM, Eric A. Borisch <eborisch at ieee.org>wrote:
>>>
>>>> Look at the documentation for MPI_Irecv and MPI_Testany ... these should
>>>> help you do what you want.
>>>>
>>>>  Eric
>>>>
>>>> On Mon, Jan 3, 2011 at 12:45 PM, Xiao Li <shinelee.thewise at gmail.com>wrote:
>>>>
>>>>> Hi MPICH2 people,
>>>>>
>>>>> I have an application composed of a single master and many workers. The
>>>>> requirement is very simple: workers finish some jobs and send data to the
>>>>> master, and the master stores these data into separate files. I can simply
>>>>> use MPI_Send on the worker side to send data to the master, but the master
>>>>> does not know the order in which the data will arrive. Some workers are
>>>>> fast while others are slow. More specifically, suppose there are 5 workers;
>>>>> then the send order may be 1,3,4,5,2 or 2,5,4,1,3. If I just write a for
>>>>> loop for(i=1 to 5) on the master side with MPI_Recv to get the data, the
>>>>> master and the faster workers may have to wait for a long time. I know
>>>>> MPI_Gather could implement this, but I am not sure whether MPI_Gather works
>>>>> in parallel or is just a sequence of MPI_Recv calls. Another issue is that
>>>>> my data is extremely large: more than 1 GB needs to be sent to the master.
>>>>> If I divide the data into pieces, I do not think MPI_Gather can work. I
>>>>> also considered raw socket programming, but I do not think it is good
>>>>> practice. Would you please give me some suggestions?
>>>>>
>>>>> cheers
>>>>> Xiao
>>>>>

