[mpich-discuss] Persistent communications between 2 computers
Thierry Roudier
thierry.roudier at kiastek.com
Thu Sep 8 16:24:58 CDT 2011
Thanks for your reply, Darius.
Unfortunately, what you suggested doesn't work; I still get the same
error message. But since the error only occurs on some transactions, I
avoid the crash by calling MPI_Errhandler_set(..., MPI_ERRORS_RETURN) and
handling the error in my program.
I also noticed that setting the MPI parameter "channel" to sock in the
Windows Registry makes persistent communications more stable.
-Thierry
On 08/09/2011 5:25 PM, Darius Buntinas wrote:
> The error message indicates that the second process was unable to look up the address of the other process. Rather than using "localhost" use the actual address or name of the machine. See if this helps.
>
> -d
>
>
> On Sep 8, 2011, at 1:46 AM, Thierry Roudier wrote:
>
>> Hi all,
>>
>> I want to use persistent communications to exchange data (the code I use is below). It works well on a single computer (mpiexec -n 2 teststart). But when I distribute the code across 2 computers (Win7) with the command line mpiexec -hosts 2 localhost 192.168.1.12 teststart, it doesn't work, and I get the following message:
>>> Fatal error in PMPI_Waitall: Other MPI error, error stack:
>>> PMPI_Waitall(274)....................: MPI_Waitall(count=2, req_array=004208B0, status_array=00420880) failed
>>> MPIR_Waitall_impl(121)...............:
>>> MPIDI_CH3I_Progress(402).............:
>>> MPID_nem_mpich2_blocking_recv(905)...:
>>> MPID_nem_newtcp_module_poll(37)......:
>>> MPID_nem_newtcp_module_connpoll(2669):
>>> gen_cnting_fail_handler(1738)........: connect failed - The semaphore timeout period has expired.
>>> (errno 121)
>>>
>>> ****** Persistent Communications *****
>>> Trials= 1
>>> Reps/trial= 1000
>>> Message Size Bandwidth (bytes/sec)
>>> Fatal error in PMPI_Waitall: Other MPI error, error stack:
>>> PMPI_Waitall(274)....................: MPI_Waitall(count=2, req_array=004208B0, status_array=00420880) failed
>>> MPIR_Waitall_impl(121)...............:
>>> MPIDI_CH3I_Progress(402).............:
>>> MPID_nem_mpich2_blocking_recv(905)...:
>>> MPID_nem_newtcp_module_poll(37)......:
>>> MPID_nem_newtcp_module_connpoll(2655):
>>> gen_read_fail_handler(1145)..........: read from socket failed - The specified network name is no longer available.
>> And below is the code I use.
>>
>> Thanks a lot for your help.
>>
>> -Thierry
>>
>>> #include <mpi.h>
>>> #include <stdio.h>
>>>
>>> /* Modify these to change timing scenario */
>>> #define TRIALS 1
>>> #define STEPS 1
>>> #define MAX_MSGSIZE 1048576 /* buffer size; must be at least 2^STEPS */
>>> #define REPS 1000
>>> #define MAXPOINTS 10000
>>>
>>> int numtasks, rank, tag=999, n, i, j, k, this, msgsizes[MAXPOINTS];
>>> double mbytes, tbytes, results[MAXPOINTS], ttime, t1, t2;
>>> char sbuff[MAX_MSGSIZE], rbuff[MAX_MSGSIZE];
>>> MPI_Status stats[2];
>>> MPI_Request reqs[2];
>>>
>>> int main(int argc, char *argv[]) {
>>>
>>>     MPI_Init(&argc,&argv);
>>>     MPI_Comm_size(MPI_COMM_WORLD,&numtasks);
>>>     MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>>>
>>>     /**************************** task 0 ***********************************/
>>>     if (rank == 0) {
>>>
>>>         /* Initializations */
>>>         n=1;
>>>         for (i=0; i<=STEPS; i++) {
>>>             msgsizes[i] = n;
>>>             results[i] = 0.0;
>>>             n=n*2;
>>>         }
>>>         for (i=0; i<MAX_MSGSIZE; i++)
>>>             sbuff[i] = 'x';
>>>
>>>         /* Greetings */
>>>         printf("\n****** Persistent Communications *****\n");
>>>         printf("Trials= %8d\n",TRIALS);
>>>         printf("Reps/trial= %8d\n",REPS);
>>>         printf("Message Size Bandwidth (bytes/sec)\n");
>>>
>>>         /* Begin timings */
>>>         for (k=0; k<TRIALS; k++) {
>>>
>>>             n=1;
>>>             for (j=0; j<=STEPS; j++) {
>>>
>>>                 /* Setup persistent requests for both the send and receive */
>>>                 MPI_Recv_init (rbuff, n, MPI_CHAR, 1, tag, MPI_COMM_WORLD,&reqs[0]);
>>>                 MPI_Send_init (sbuff, n, MPI_CHAR, 1, tag, MPI_COMM_WORLD,&reqs[1]);
>>>
>>>                 t1 = MPI_Wtime();
>>>                 for (i=1; i<=REPS; i++){
>>>                     MPI_Startall (2, reqs);
>>>                     MPI_Waitall (2, reqs, stats);
>>>                 }
>>>                 t2 = MPI_Wtime();
>>>
>>>                 /* Compute bandwidth and save best result over all TRIALS */
>>>                 ttime = t2 - t1;
>>>                 tbytes = sizeof(char) * n * 2.0 * (float)REPS;
>>>                 mbytes = tbytes/ttime; /* bytes per second, despite the name */
>>>                 if (results[j]< mbytes)
>>>                     results[j] = mbytes;
>>>
>>>                 /* Free persistent requests */
>>>                 MPI_Request_free (&reqs[0]);
>>>                 MPI_Request_free (&reqs[1]);
>>>                 n=n*2;
>>>             } /* end j loop */
>>>         } /* end k loop */
>>>
>>>         /* Print results */
>>>         for (j=0; j<=STEPS; j++) {
>>>             printf("%9d %16d\n", msgsizes[j], (int)results[j]);
>>>         }
>>>
>>>     } /* end of task 0 */
>>>
>>>
>>>
>>>     /**************************** task 1 ************************************/
>>>     if (rank == 1) {
>>>
>>>         /* Begin timing tests */
>>>         for (k=0; k<TRIALS; k++) {
>>>
>>>             n=1;
>>>             for (j=0; j<=STEPS; j++) {
>>>
>>>                 /* Setup persistent requests for both the send and receive */
>>>                 MPI_Recv_init (rbuff, n, MPI_CHAR, 0, tag, MPI_COMM_WORLD,&reqs[0]);
>>>                 MPI_Send_init (sbuff, n, MPI_CHAR, 0, tag, MPI_COMM_WORLD,&reqs[1]);
>>>
>>>                 for (i=1; i<=REPS; i++){
>>>                     MPI_Startall (2, reqs);
>>>                     MPI_Waitall (2, reqs, stats);
>>>                 }
>>>
>>>                 /* Free persistent requests */
>>>                 MPI_Request_free (&reqs[0]);
>>>                 MPI_Request_free (&reqs[1]);
>>>                 n=n*2;
>>>
>>>             } /* end j loop */
>>>         } /* end k loop */
>>>     } /* end task 1 */
>>>
>>>
>>>     MPI_Finalize();
>>>     return 0;
>>>
>>> } /* end of main */
>> _______________________________________________
>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss