[mpich-discuss] Poor MPICH Performance on Windows Vista vs. XP

Rahul Mukerjee r.mukerjee at gmail.com
Tue Nov 3 15:01:18 CST 2009


Jayesh,

On Windows Vista, when I call mpiexec without "-channel nemesis", my
application runs fine, but with nemesis, it crashes. I get the
following error:

application called MPI_Abort(MPI_COMM_WORLD, 59) - process 1
Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(773).......................: MPI_Allreduce(sbuf=06313540, rbuf=063135C8, count=16, MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD) failed
MPIR_Allreduce(467)......................:
MPIC_Recv(83)............................:
MPIC_Wait(513)...........................:
MPIDI_CH3I_Progress(150).................:
MPID_nem_mpich2_blocking_recv(948).......:
MPID_nem_newtcp_module_poll(154).........:
MPID_nem_newtcp_module_connpoll(1841)....:
state_commrdy_handler(1663)..............:
MPID_nem_newtcp_module_recv_handler(1547):
MPID_nem_newtcp_module_recv_handler(1546): read from socket failed, An existing connection was forcibly closed by the remote host. (errno 10054)

job aborted:
rank: node: exit code[: error message]
0: Computer138: 1: process 0 exited without calling finalize
1: Computer133: 59: process 1 exited without calling finalize
2: Computer136: 123

Would you know why this is happening?

Sincerely,

Rahul.

On Thu, Oct 29, 2009 at 8:04 PM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:
> Hi,
>  The default channel used by MPICH2 on Windows is the sock channel (all MPI communication is performed over TCP sockets). The nemesis channel performs local (same-node) communication using shared memory and remote (cross-node) communication using TCP sockets (on Windows).
>
> Regards,
> Jayesh
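As a rough illustration of the channel description above, the following C sketch models which transport each channel would use between two ranks. It is a hypothetical model, not MPICH2 source code: `channel_t`, `transport_t`, `transport_for` and `shared_memory_pairs` are invented names, and a node is identified by a plain case-sensitive host-name comparison for simplicity. It also bears on the -localonly question later in the thread: when every rank runs on one host, nemesis would route every pair through shared memory, while sock would still use TCP for all of them.

```c
#include <string.h>

/* Hypothetical model of the two MPICH2 channels as described in the thread;
   not taken from MPICH2 itself. */
typedef enum { CHANNEL_SOCK, CHANNEL_NEMESIS } channel_t;
typedef enum { TRANSPORT_TCP, TRANSPORT_SHARED_MEMORY } transport_t;

/* Transport used between two ranks, given the host each rank runs on. */
transport_t transport_for(channel_t channel, const char *host_a, const char *host_b)
{
    if (channel == CHANNEL_NEMESIS && strcmp(host_a, host_b) == 0)
        return TRANSPORT_SHARED_MEMORY;   /* nemesis, same node */
    return TRANSPORT_TCP;                 /* sock always, or nemesis across nodes */
}

/* Count rank pairs (i < j) that would communicate over shared memory. */
int shared_memory_pairs(channel_t channel, const char *const hosts[], int nranks)
{
    int pairs = 0;
    for (int i = 0; i < nranks; i++)
        for (int j = i + 1; j < nranks; j++)
            if (transport_for(channel, hosts[i], hosts[j]) == TRANSPORT_SHARED_MEMORY)
                pairs++;
    return pairs;
}
```

For example, four ranks on one quad-core host form 6 rank pairs; in this model all 6 use shared memory with nemesis and none do with sock.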
> ----- Original Message -----
> From: "Rahul Mukerjee" <r.mukerjee at gmail.com>
> To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
> Cc: mpich-discuss at mcs.anl.gov
> Sent: Thursday, October 29, 2009 5:34:17 PM GMT -06:00 US/Canada Central
> Subject: Re: [mpich-discuss] Poor MPICH Performance on Windows Vista vs. XP
>
> Hi Jayesh,
>
> I installed the latest release of MPICH2, 1.2, on the Vista machines.
> The -channel nemesis option showed impressive gains. Here are the
> results:
>
> Without -channel nemesis (default sock channel):
> 3 hosts: Time = 0.354110 secs
> 4 hosts: Time = 13.612531 secs
> 5 hosts: Time = 13.849166 secs
>
> With -channel nemesis:
> 3 hosts: Time = 0.045664 secs
> 4 hosts: Time = 0.253233 secs
> 5 hosts: Time = 0.270557 secs
>
> Can you please tell me what exactly was the issue here? Also, I am yet
> to test on the XP machines but I am assuming that this option will
> perform equally well there. Is that correct? And finally, I'd also
> like to mention that I had seen very poor scalability for my
> application on a quad-core machine with the -localonly option. I hope
> to see some dramatic improvements there with nemesis as well. Please correct
> me if I am wrong.
>
> Thank you very much for your help.
>
> Sincerely,
>
> Rahul Mukerjee.
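To put the timings in the message above on a per-operation basis, the sketch below divides each 100-barrier total by the barrier count and computes the speedup from switching to nemesis. The helper names (`per_barrier_ms`, `speedup`, `print_comparison`) are hypothetical; the input numbers are the ones reported in that message.

```c
#include <stdio.h>

/* Per-barrier latency in milliseconds, given the total time in seconds
   accumulated over `count` barriers. */
double per_barrier_ms(double total_seconds, int count)
{
    return count > 0 ? 1000.0 * total_seconds / count : 0.0;
}

/* How many times faster `after` is than `before`. */
double speedup(double before_seconds, double after_seconds)
{
    return before_seconds / after_seconds;
}

/* Print the comparison using the 100-barrier totals reported in the thread. */
void print_comparison(void)
{
    const int hosts[] = {3, 4, 5};
    const double sock[] = {0.354110, 13.612531, 13.849166};
    const double nemesis[] = {0.045664, 0.253233, 0.270557};

    for (int i = 0; i < 3; i++)
        printf("%d hosts: %.3f ms -> %.3f ms per barrier (%.1fx)\n",
               hosts[i], per_barrier_ms(sock[i], 100),
               per_barrier_ms(nemesis[i], 100), speedup(sock[i], nemesis[i]));
}
```

Worked from those numbers, the per-barrier time drops from about 3.54 ms to 0.46 ms with 3 hosts (roughly 7.8x), from about 136 ms to 2.5 ms with 4 hosts (roughly 54x), and from about 138 ms to 2.7 ms with 5 hosts (roughly 51x).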
>
> On Thu, Oct 29, 2009 at 12:52 PM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:
>> Hi,
>>  Oh... and try the latest stable release, 1.2, of MPICH2 (1.0.8 is old) available at http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads .
>>  Let us know your findings.
>>
>> (PS: With WinXP make sure that you have SP3 installed before installing MPICH2.)
>> Regards,
>> Jayesh
>> ----- Original Message -----
>> From: "Jayesh Krishna" <jayesh at mcs.anl.gov>
>> To: "r mukerjee" <r.mukerjee at gmail.com>
>> Cc: mpich-discuss at mcs.anl.gov
>> Sent: Thursday, October 29, 2009 12:47:39 PM GMT -06:00 US/Canada Central
>> Subject: Re: [mpich-discuss] Poor MPICH Performance on Windows Vista vs. XP
>>
>> Hi,
>>  Did you try the nemesis channel (mpiexec -n 2 -channel nemesis barrier_test.exe)?
>>
>> Regards,
>> Jayesh
>> ----- Original Message -----
>> From: "Rahul Mukerjee" <r.mukerjee at gmail.com>
>> To: mpich-discuss at mcs.anl.gov
>> Sent: Thursday, October 29, 2009 12:17:01 PM GMT -06:00 US/Canada Central
>> Subject: [mpich-discuss] Poor MPICH Performance on Windows Vista vs. XP
>>
>> Hi,
>>
>> I am using MPICH2 version 1.0.8p1. I am using a simple C program to
>> measure the time taken by 100 MPI_Barrier calls on Windows Vista vs.
>> XP. Here is the code:
>>
>> #include "mpi.h"
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int cpuid, ncpu;
>>     int i, count;
>>     double t1, t2, sum;
>>
>>     count = 100;   /* number of timed barriers */
>>     sum = 0.0;     /* total barrier time, accumulated on rank 0 only */
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_size(MPI_COMM_WORLD, &ncpu);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &cpuid);
>>     printf("NCPU:%d, CPUID:%d\n", ncpu, cpuid);
>>     fflush(stdout);
>>
>>     printf("start barrier\n"); fflush(stdout);
>>     for (i = 0; i < count; i++) {
>>         /* time each barrier individually */
>>         t1 = MPI_Wtime();
>>         MPI_Barrier(MPI_COMM_WORLD);
>>         t2 = MPI_Wtime();
>>         if (cpuid == 0) sum = sum + (t2 - t1);
>>     }
>>     printf("end barrier\n"); fflush(stdout);
>>     if (cpuid == 0) printf("Time = %f\n", sum);
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> The results are:
>>
>> Windows XP:
>> 3 hosts: Time = 0.034093 secs
>> 4 hosts: Time = 0.041328 secs
>>
>> Windows Vista:
>> 3 hosts: Time = 0.347072 secs (Slow, but still bearable)
>> 4 hosts: Time = 11.820926 secs (WTF???)
>>
>> Someone posted earlier about poor MPICH performance with HPC
>> Server 2008 / Vista; this is, in a way, a continuation of that thread.
>> Any help would be greatly appreciated. Thank you.
>>
>> Sincerely,
>>
>> Rahul Mukerjee.
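One possible refinement of the benchmark above, sketched under assumptions: run one untimed warm-up barrier first, so that any one-time cost (for example, TCP connections that may be set up on first use) is not charged to the measurement, and time the whole loop once instead of summing per-call deltas. To keep the timing logic checkable without an MPI runtime, the barrier and the clock are passed in as function pointers. `mean_barrier_seconds`, `fake_barrier` and `fake_now` are hypothetical names, and the two fake functions are deterministic stand-ins only, not part of any MPI API.

```c
/* Callback types so the timing logic can be exercised without MPI. */
typedef void (*barrier_fn)(void);
typedef double (*clock_fn)(void);

/* Mean seconds per barrier over `count` timed calls, after one untimed
   warm-up call that absorbs any one-time setup cost. */
double mean_barrier_seconds(int count, barrier_fn barrier, clock_fn now)
{
    double t_start, t_end;

    if (count <= 0)
        return 0.0;
    barrier();                      /* warm-up, not timed */
    t_start = now();
    for (int i = 0; i < count; i++)
        barrier();
    t_end = now();
    return (t_end - t_start) / count;
}

/* Deterministic stand-ins used only to check the logic above: the first
   "barrier" costs 0.5 s (simulated setup), each later one costs 1 ms. */
static double fake_clock_seconds = 0.0;
static int fake_barrier_calls = 0;

void fake_barrier(void)
{
    fake_clock_seconds += (fake_barrier_calls++ == 0) ? 0.5 : 0.001;
}

double fake_now(void)
{
    return fake_clock_seconds;
}
```

In the real program one would call `mean_barrier_seconds(count, world_barrier, MPI_Wtime)` on every rank, assuming `world_barrier` is a hypothetical one-line wrapper around `MPI_Barrier(MPI_COMM_WORLD)` (`MPI_Wtime` already has the `double (void)` shape), and then combine the per-rank results on rank 0 with `MPI_Reduce(&mean, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD)` so that the slowest rank is what gets reported.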
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>

