[mpich-discuss] How to implement this case

Xiao Li shinelee.thewise at gmail.com
Mon Jan 3 15:00:33 CST 2011


Hi Eric,

Thanks for your detailed suggestion. After reading the MPI documentation, I
propose the following algorithm:

//post one receive per worker for its first trunk
for i=1 to N do
    MPI_Irecv(buffer[i], i, requests[i])
end
//count the trunks received from each worker
for i=1 to N do
    trunk_sent[i] = 0
end
//collect the data in order of arrival
for i=1 to N*M do
    MPI_Waitany(N, requests, &index, &status)
    //store buffer[index] to its file here, before the buffer is reused
    trunk_sent[index]++
    //re-post only if this worker still has trunks to send
    if trunk_sent[index] < M do
         MPI_Irecv(buffer[index], index, requests[index])
    end
end
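
One detail that is easy to get off by one in a loop like this is the point at
which the per-worker counter is tested. As a rough check, here is a plain-C
model of the bookkeeping. It makes no MPI calls: the arrivals array, the
function name dangling_receives and the 0-based worker indices are all
assumptions for illustration, standing in for what MPI_Waitany would report.
It counts how many posted receives are left without a matching send when the
loop finishes.

```c
#include <assert.h>
#include <stdlib.h>

/* Plain-C model of the master's receive bookkeeping (no MPI calls).
 * arrivals[k] is the worker whose message completes at step k; in real
 * code this index would come from MPI_Waitany. Workers are 0-based here.
 * check_before_increment = 1 tests the counter before incrementing it;
 * check_before_increment = 0 increments first and tests afterwards.
 * Returns the number of receives still posted when the loop ends,
 * or -1 if memory allocation fails. */
int dangling_receives(int n, int m, const int *arrivals,
                      int check_before_increment)
{
    int *posted = malloc((size_t)n * sizeof *posted);
    int *count = calloc((size_t)n, sizeof *count);
    int dangling = 0;

    if (!posted || !count) {
        free(posted);
        free(count);
        return -1;
    }
    for (int i = 0; i < n; i++)
        posted[i] = 1;              /* stands for the initial MPI_Irecv */

    for (int k = 0; k < n * m; k++) {
        int w = arrivals[k];
        /* a completion is only possible on a receive that was posted */
        assert(w >= 0 && w < n && posted[w]);
        posted[w] = 0;              /* this receive has completed */
        /* the buffer would be written to file here, before reuse */
        if (check_before_increment) {
            if (count[w] != m) {
                posted[w] = 1;      /* re-post, then count */
                count[w]++;
            }
        } else {
            count[w]++;
            if (count[w] < m)
                posted[w] = 1;      /* re-post only if more data is due */
        }
    }
    for (int i = 0; i < n; i++)
        dangling += posted[i];
    free(posted);
    free(count);
    return dangling;
}
```

With N=2, M=2 and arrival order 0,1,1,0, testing before incrementing leaves 2
receives posted that no send will ever match (in real MPI code they would
have to be cleaned up, for example with MPI_Cancel), while incrementing first
leaves 0.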

May I know your opinion of this algorithm?

cheers
Xiao

On Mon, Jan 3, 2011 at 3:31 PM, Eric A. Borisch <eborisch at ieee.org> wrote:

> Xiao,
>
> You should be able to get by with just N buffers, one for each client.
> After you have processed the i-th iteration for client n, re-issue an
> MPI_Irecv with the same buffer. This will match up with the next MPI_Send
> from client n. You don't have to worry about synchronizing -- the MPI_Irecv
> does not need to be posted before the MPI_Send. (But the MPI_Send won't
> complete until it has been, of course...)
>
> You could always roll your own sockets, but MPI does a nice job of managing
> connections and messages for you. In addition, MPI can be used fairly
> efficiently on a wide range of interconnects, from shared memory to
> InfiniBand, with little to no change on the user's part.
>
> In addition, you could likely improve performance in MPI by having two sets
> (call them A and B) of buffers to send from on each worker; one is in the
> "send" state (let's call this one A, started with an MPI_Isend after it was
> initially filled) while you're filling B. After B is filled, initiate a new
> MPI_Isend (very quick) on B and then wait for A's first send (MPI_Wait) to
> complete. Once the first send on A is completed, you can start populating A
> with the next iteration's output, initiate A's send, wait for B's send to
> complete, and the cycle begins again.
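
The A/B cycle described above can be traced with a small plain-C model. This
is only a sketch of the ordering, not MPI code: the function name
double_buffer_trace and the fX/sX/wX tokens are made up for illustration, and
the comments mark where MPI_Isend and MPI_Wait would go in a real worker.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one two-letter token (for example "fA") to out, space separated.
 * Returns 0 on success, -1 if out is too small. */
static int emit(char *out, size_t cap, char op, int buf)
{
    size_t used = strlen(out);
    int n = snprintf(out + used, cap - used, "%s%c%c",
                     used ? " " : "", op, "AB"[buf]);
    return (n < 0 || (size_t)n >= cap - used) ? -1 : 0;
}

/* Plain-C trace of the two-buffer send schedule on a worker (no MPI calls).
 *   fX = fill buffer X with the next iteration's output
 *   sX = start sending X   (MPI_Isend in real code)
 *   wX = wait for X's send (MPI_Wait in real code)
 * Returns 0 on success, -1 if out is too small. */
int double_buffer_trace(int iterations, char *out, size_t cap)
{
    int in_flight[2] = {0, 0};  /* 1 while a send from that buffer is pending */

    assert(out != NULL);
    if (cap == 0)
        return -1;
    out[0] = '\0';

    for (int k = 0; k < iterations; k++) {
        int b = k % 2;          /* alternate between buffer A and buffer B */
        if (in_flight[b]) {     /* never refill a buffer that is being sent */
            if (emit(out, cap, 'w', b))
                return -1;
            in_flight[b] = 0;
        }
        if (emit(out, cap, 'f', b) || emit(out, cap, 's', b))
            return -1;
        in_flight[b] = 1;
    }
    /* drain the sends that are still pending, oldest first */
    for (int j = 0; j < 2; j++) {
        int b = (iterations + j) % 2;
        if (in_flight[b]) {
            if (emit(out, cap, 'w', b))
                return -1;
            in_flight[b] = 0;
        }
    }
    return 0;
}
```

For 3 iterations the trace is "fA sA fB sB wA fA sA wB wA": B is filled while
A's send is pending, and A is refilled only after its own send has been
waited on.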
>
> This approach allows you to overlap communication and computation times,
> and still works with the MPI_Waitany() approach to harvesting completed jobs
> in first-completed order on the master. This is an almost trivial thing to
> implement in MPI, but achieving it with sockets requires (IMHO) much more
> programmer overhead...
>
> Just my 2c.
>
>  Eric
>
>
> On Mon, Jan 3, 2011 at 1:24 PM, Xiao Li <shinelee.thewise at gmail.com> wrote:
>
>> Hi Eric,
>>
>> Assume I have N workers and each worker sends M trunks of data. Then I
>> have to create N*M data buffers for MPI_Irecv. Is this method too
>> costly?
>>
>> Or would raw socket programming be better, like traditional
>> client/server socket programming, where the master listens on a port
>> and spawns a new thread to accept each worker's data storage request?
>>
>> cheers
>> Xiao
>>
>>
>> On Mon, Jan 3, 2011 at 2:13 PM, Eric A. Borisch <eborisch at ieee.org> wrote:
>>
>>> Look at the documentation for MPI_Irecv and MPI_Testany ... these should
>>> help you do what you want.
>>>
>>>  Eric
>>>
>>> On Mon, Jan 3, 2011 at 12:45 PM, Xiao Li <shinelee.thewise at gmail.com> wrote:
>>>
>>>> Hi MPICH2 people,
>>>>
>>>> I have an application composed of a single master and many workers.
>>>> The requirement is very simple: workers finish some jobs and send
>>>> data to the master, and the master stores these data in separate
>>>> files. I can simply use MPI_Send on the worker side to send data to
>>>> the master, but the master does not know the order in which the data
>>>> will arrive, because some workers are fast while others are slow.
>>>> More specifically, suppose there are 5 workers; then the sending
>>>> order may be 1,3,4,5,2 or 2,5,4,1,3. If I just write a loop
>>>> for(i=1 to 5) on the master side with MPI_Recv to get the data, the
>>>> master and some of the faster workers have to wait for a long time.
>>>> I know MPI_Gather can implement this, but I am not sure whether
>>>> MPI_Gather works in parallel or is just a sequence of MPI_Recv
>>>> calls. Another issue is that my data is extremely large: more than
>>>> 1GB needs to be sent to the master. If I divide the data into
>>>> pieces, I do not think MPI_Gather can work. I also thought about raw
>>>> socket programming, but I do not think it is good practice. Would
>>>> you give me some suggestions, please?
>>>>
>>>> cheers
>>>> Xiao
>>>>
>>>> _______________________________________________
>>>> mpich-discuss mailing list
>>>> mpich-discuss at mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss