[mpich-discuss] How to implement this case

Eric A. Borisch eborisch at ieee.org
Tue Jan 4 12:18:35 CST 2011


That shouldn't be the case; MPI_Waitany sets the completed request at index to
MPI_REQUEST_NULL, and a null request left in the array doesn't hurt anything:

**** Test program; only uses two ranks (0 and 1) as written! ****
#include <cstdlib>  // for malloc
#include <iostream>

#include <mpi.h>

int main(int argc, char * argv[])
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

#define BUFSIZE (0x1 << 20) // 1M-element
#define COUNT 4
  if (rank == 1)
  {
    int * buffer = (int *) malloc(BUFSIZE * sizeof(int));
    // send COUNT messages with tags in shuffled order: 0, 3, 2, 1
    for (int i=0; i < COUNT; i++)
      MPI_Send(buffer, BUFSIZE, MPI_INT, 0, (i*3)%COUNT, MPI_COMM_WORLD);
  }
  else if (!rank)
  {
    int * buffers[COUNT];
    MPI_Request requests[COUNT];
    // post one nonblocking receive per expected message (tag == i)
    for (int i=0; i < COUNT; i++)
    {
      buffers[i] = (int *) malloc(BUFSIZE * sizeof(int));
      MPI_Irecv(buffers[i], BUFSIZE, MPI_INT, 1, i, MPI_COMM_WORLD,
                requests + i);
    }
    int rec = 0;
    while (rec != MPI_UNDEFINED)
    {
      std::cout << "Waiting... ";
      if(MPI_Waitany(COUNT, requests, &rec, MPI_STATUS_IGNORE))
      {
        std::cout << "Error in wait?" << std::endl;
        break;
      }
      else if (rec != MPI_UNDEFINED)
        std::cout << "Received message " << rec;
      else
        std::cout << "None left!        ";

      std::cout << " Requests: ";
      for (int i=0; i < COUNT; i++)
        if (requests[i] == MPI_REQUEST_NULL)
          std::cout << "N"; // NULL
        else
          std::cout << "V"; // VALID
      std::cout << std::endl;
    }
  }

  MPI_Finalize();
  return 0;
}
**** Execution output ****

Waiting... Received message 0 Requests: NVVV
Waiting... Received message 3 Requests: NVVN
Waiting... Received message 2 Requests: NVNN
Waiting... Received message 1 Requests: NNNN
Waiting... None left!         Requests: NNNN

 -Eric

On Mon, Jan 3, 2011 at 10:19 PM, Xiao Li <shinelee.thewise at gmail.com> wrote:

> Hi Eric,
>
> The if statement may cause an error; I think it should be altered as follows:
>
>  if ++trunk_sent[index] != M do
>          MPI_Irecv(buffer[index], index, requests[index])
>  else
>          // remove the MPI_Request object of the finished process, or else
>          // the wait might hang forever
>          remove requests[index]
>  end
>
> cheers
> Xiao
>
> On Mon, Jan 3, 2011 at 4:24 PM, Xiao Li <shinelee.thewise at gmail.com> wrote:
>
>> Hi Eric,
>>
>> You are right; an extra MPI_Irecv would be executed at the end. Thanks for
>> your comment.
>>
>> cheers
>> Xiao
>>
>>
>> On Mon, Jan 3, 2011 at 4:16 PM, Eric A. Borisch <eborisch at ieee.org> wrote:
>>
>>> Looks about right... I'm assuming there is a <do actual work here> to be
>>> inserted between the MPI_Waitany and MPI_Irecv within the N*M-sized loop....
>>> and I count from 0 rather than 1 by force of habit... :)
>>>
>>> I think the logic below will end up attempting one more MPI_Irecv than
>>> desired; perhaps change
>>>
>>> if trunk_sent[index] != M do
>>>    MPI_Irecv(buffer[index], index, requests[index])
>>>    trunk_sent[index]++
>>> end
>>>
>>> to
>>>
>>> if ++trunk_sent[index] != M do
>>>    MPI_Irecv(buffer[index], index, requests[index])
>>> end
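>>>
>>> In rough, untested C++ that might look something like the sketch below
>>> (collect(), CHUNK_LEN, and the buffer argument are just placeholders for
>>> whatever your real code uses; workers are assumed to be ranks 1..N, and each
>>> buffer[i] is assumed to already hold CHUNK_LEN ints):
>>>
>>> #include <vector>
>>> #include <mpi.h>
>>>
>>> // Master-side sketch: collect M chunks from each of N workers.
>>> void collect(int N, int M, int CHUNK_LEN,
>>>              std::vector< std::vector<int> > & buffer)
>>> {
>>>   std::vector<MPI_Request> requests(N);
>>>   std::vector<int> trunk_sent(N, 0);
>>>   // post the first receive from each worker
>>>   for (int i = 0; i < N; i++)
>>>     MPI_Irecv(&buffer[i][0], CHUNK_LEN, MPI_INT, i + 1, MPI_ANY_TAG,
>>>               MPI_COMM_WORLD, &requests[i]);
>>>   for (int done = 0; done < N * M; done++)
>>>   {
>>>     int index;
>>>     MPI_Waitany(N, &requests[0], &index, MPI_STATUS_IGNORE);
>>>     // <do actual work on buffer[index] here>
>>>     if (++trunk_sent[index] != M)  // re-post unless this worker is finished
>>>       MPI_Irecv(&buffer[index][0], CHUNK_LEN, MPI_INT, index + 1,
>>>                 MPI_ANY_TAG, MPI_COMM_WORLD, &requests[index]);
>>>   }
>>> }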
>>>
>>>  -Eric
>>>
>>>
>>> On Mon, Jan 3, 2011 at 3:00 PM, Xiao Li <shinelee.thewise at gmail.com> wrote:
>>>
>>>> Hi Eric,
>>>>
>>>> Thanks for your detailed suggestion. After reading the MPI documents, I
>>>> propose the following algorithm:
>>>>
>>>> //begin collecting data for the first trunk
>>>> for i=1 to N do
>>>>     MPI_Irecv(buffer[i], i, requests[i])
>>>> end
>>>> //set data sending counter
>>>> for i=1 to N do
>>>>     trunk_sent[i] = 0
>>>> end
>>>> //begin collecting data
>>>> for i=1 to N*M do
>>>>     MPI_Waitany(N, requests, &index, &status)
>>>>     if trunk_sent[index] != M do
>>>>          MPI_Irecv(buffer[index], index, requests[index])
>>>>          trunk_sent[index]++
>>>>     end
>>>> end
>>>>
>>>> May I know your opinion of this algorithm?
>>>>
>>>> cheers
>>>> Xiao
>>>>
>>>>
>>>> On Mon, Jan 3, 2011 at 3:31 PM, Eric A. Borisch <eborisch at ieee.org> wrote:
>>>>
>>>>> Xiao,
>>>>>
>>>>> You should be able to get by with just N buffers, one for each client.
>>>>> After you have processed the i-th iteration for client n, re-issue an
>>>>> MPI_Irecv with the same buffer. This will match up with the next MPI_Send
>>>>> from client n. You don't have to worry about synchronizing -- the MPI_Irecv
>>>>> does not need to be posted before the MPI_Send. (But the MPI_Send won't
>>>>> complete until it has been, of course...)
>>>>>
>>>>> You could always roll your own sockets, but MPI does a nice job of
>>>>> managing connections and messages for you. In addition, MPI can be used
>>>>> fairly efficiently on a wide range of interconnects, from shared memory to
>>>>> Infiniband with little to no change on the user's part.
>>>>>
>>>>> In addition, you could likely improve performance in MPI by having two
>>>>> sets (call them A and B) of buffers to send from on each worker; one is in
>>>>> the "send" state (let's call this one A, started with an MPI_Isend after it
>>>>> was initially filled) while you're filling B. After B is filled, initiate a
>>>>> new MPI_Isend (very quick) on B and then wait for A's first send (MPI_Wait)
>>>>> to complete. Once the first send on A is completed, you can start populating
>>>>> A with the next iteration's output, initiate A's send, wait for B's send to
>>>>> complete, and the cycle begins again.
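>>>>>
>>>>> A rough, untested sketch of that worker-side double buffering (CHUNK_LEN,
>>>>> M, and fill_buffer() are placeholders; the master is assumed to be rank 0):
>>>>>
>>>>> #include <vector>
>>>>> #include <mpi.h>
>>>>>
>>>>> // placeholder for whatever actually computes iteration i's output
>>>>> void fill_buffer(std::vector<int> & out, int i);
>>>>>
>>>>> void produce_and_send(int M, int CHUNK_LEN)
>>>>> {
>>>>>   std::vector<int> buf[2];
>>>>>   buf[0].resize(CHUNK_LEN);
>>>>>   buf[1].resize(CHUNK_LEN);
>>>>>   MPI_Request req[2] = { MPI_REQUEST_NULL, MPI_REQUEST_NULL };
>>>>>   for (int i = 0; i < M; i++)
>>>>>   {
>>>>>     int cur = i % 2;                         // alternate between A and B
>>>>>     MPI_Wait(&req[cur], MPI_STATUS_IGNORE);  // prior send from this buffer done?
>>>>>     fill_buffer(buf[cur], i);                // compute while the other send runs
>>>>>     MPI_Isend(&buf[cur][0], CHUNK_LEN, MPI_INT, 0, 0, MPI_COMM_WORLD,
>>>>>               &req[cur]);
>>>>>   }
>>>>>   MPI_Wait(&req[0], MPI_STATUS_IGNORE);      // drain the final sends
>>>>>   MPI_Wait(&req[1], MPI_STATUS_IGNORE);
>>>>> }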
>>>>>
>>>>> This approach allows you to overlap communication and computation,
>>>>> and it still works with the MPI_Waitany() approach to harvesting
>>>>> completed jobs in first-completed order on the master. This is an almost
>>>>> trivial thing to implement in MPI, but achieving it with sockets requires
>>>>> (IMHO) much more programmer overhead...
>>>>>
>>>>> Just my 2c.
>>>>>
>>>>>  Eric
>>>>>
>>>>>
>>>>> On Mon, Jan 3, 2011 at 1:24 PM, Xiao Li <shinelee.thewise at gmail.com> wrote:
>>>>>
>>>>>> Hi Eric,
>>>>>>
>>>>>> Assume I have N workers and M trunks of data to send from each worker;
>>>>>> then I have to create N*M data buffers for MPI_Irecv. Is this method too
>>>>>> costly?
>>>>>>
>>>>>> Or would it be better to write raw socket code, like traditional
>>>>>> client/server socket programming, where the master listens on a port and
>>>>>> spawns a new thread to accept each worker's data storage request?
>>>>>>
>>>>>> cheers
>>>>>> Xiao
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 3, 2011 at 2:13 PM, Eric A. Borisch <eborisch at ieee.org> wrote:
>>>>>>
>>>>>>> Look at the documentation for MPI_Irecv and MPI_Testany ... these
>>>>>>> should help you do what you want.
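>>>>>>>
>>>>>>> Roughly (untested sketch; N, requests, and buffer are placeholders):
>>>>>>>
>>>>>>> // poll the outstanding receives without blocking
>>>>>>> int index, flag;
>>>>>>> MPI_Testany(N, &requests[0], &index, &flag, MPI_STATUS_IGNORE);
>>>>>>> if (flag && index != MPI_UNDEFINED)
>>>>>>> {
>>>>>>>   // requests[index] completed; process buffer[index], then re-post
>>>>>>>   // its MPI_Irecv if more chunks are expected from that worker
>>>>>>> }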
>>>>>>>
>>>>>>>  Eric
>>>>>>>
>>>>>>> On Mon, Jan 3, 2011 at 12:45 PM, Xiao Li <shinelee.thewise at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi MPICH2 people,
>>>>>>>>
>>>>>>>> I have an application composed of a single master and many workers. The
>>>>>>>> requirement is very simple: workers finish some jobs and send data to the
>>>>>>>> master, and the master stores the data in separate files. I can simply use
>>>>>>>> MPI_Send on the worker side to send data to the master, but the master does
>>>>>>>> not know the order in which the data will arrive. Some workers are fast
>>>>>>>> while others are slow. More specifically, suppose there are 5 workers; the
>>>>>>>> sending order may be 1,3,4,5,2 or 2,5,4,1,3. If I just write a for loop
>>>>>>>> for(i=1 to 5) on the master side with MPI_Recv to get the data, the master
>>>>>>>> and the faster workers may have to wait a long time. I know MPI_Gather can
>>>>>>>> implement this, but I am not sure whether MPI_Gather works in parallel or
>>>>>>>> is just a sequential MPI_Recv. Another issue is that my data is extremely
>>>>>>>> large: more than 1 GB needs to be sent to the master. If I divide the data
>>>>>>>> into pieces, I do not think MPI_Gather can work. I also thought about raw
>>>>>>>> socket programming, but I do not think it is good practice. Would you
>>>>>>>> please give me some suggestions?
>>>>>>>>
>>>>>>>> cheers
>>>>>>>> Xiao
>>>>>>>>