[mpich-discuss] FW: Regarding MPICH2-1.1.1p1 testing basing on open-mx

Rajeev Thakur thakur at mcs.anl.gov
Mon Mar 22 08:46:01 CDT 2010


Not sure if this message went out on the list...

  _____  

From: 李俊丽 [mailto:limu713 at gmail.com] 
Sent: Sunday, March 21, 2010 10:25 PM
To: mpich-discuss at mcs.anl.gov; Brice.Goglin at inria.fr; mercier at labri.fr
Subject: Re: [mpich-discuss] Regarding MPICH2-1.1.1p1 testing basing on open-mx


Hello,

We used RHEL Server 5.4. There are two cores per node, but only one core per node was used. 
Details are attached. The other nodes have the same configuration.

When testing, we get this error message:
[root at cu02 ~]# mpiexec -n 4 /usr/lib64/mpich2/bin/mpitests-IMB-EXT  Unidir_Get

>> rank 0 in job 8 cu02.hpc.com_54277 caused collective abort of all ranks
>>   exit status of rank 0: killed by signal 9

  The same error occurs with
  "mpiexec -n 4 /usr/lib64/mpich2/bin/mpitests-IMB-EXT Bidir_Get".
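
On reading the exit status in that message: on Linux, signal 9 is SIGKILL, which a process cannot catch or ignore. It is normally sent by another process or by the kernel (for example, the out-of-memory killer) rather than raised by a fault in the program itself, as SIGSEGV would be. The Python sketch below decodes such a line with the standard `re` and `signal` modules. The regular expression is an assumption based only on the MPD messages quoted in this thread, and `decode_abort` is a hypothetical helper, not part of MPICH2.

```python
import re
import signal

# Pattern is an assumption based only on the MPD abort messages quoted in
# this thread; whitespace between "all" and "ranks" may be a space or a newline.
ABORT_RE = re.compile(
    r"rank (?P<rank>\d+) in job (?P<job>\d+)\s+(?P<host>\S+)\s+"
    r"caused collective abort of all\s+ranks\s+"
    r"exit status of rank \d+: killed by signal (?P<sig>\d+)"
)


def decode_abort(message: str) -> dict:
    """Hypothetical helper: extract rank, job, host and signal name."""
    m = ABORT_RE.search(message)
    if m is None:
        raise ValueError("not an MPD collective-abort message")
    signum = int(m.group("sig"))
    return {
        "rank": int(m.group("rank")),
        "job": int(m.group("job")),
        "host": m.group("host"),
        # Map the numeric signal to its symbolic name for this platform.
        "signal": signal.Signals(signum).name,
    }


if __name__ == "__main__":
    msg = ("rank 0 in job 8 cu02.hpc.com_54277 caused collective abort of all ranks\n"
           "  exit status of rank 0: killed by signal 9")
    print(decode_abort(msg))
```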



 

2010/3/20 Brice Goglin <Brice.Goglin at inria.fr>


Some bugs were reported in the past about some MPICH2 tests not working,
but we never reproduced them with recent MPICH2 and Open-MX versions.
I'd like to know what kind of interfaces, hosts, and kernels were used
here, and also how many processes per node were used.

Brice




Dave Goodell wrote:
> I don't think that we have tested OpenMX with the mx netmod, so I'm
> not sure if there are any bugs there.  I've CCed the primary
> developers of both OpenMX and our mx netmod in case they have any
> information on this.
>
> Do simpler tests work?  The "examples/cpi" program in your MPICH2
> build directory is a good simple sanity test.
>
> -Dave
>
> On Mar 19, 2010, at 3:31 AM, 李俊丽 wrote:
>
>> Hello,
>>
>> I simply ran:
>> ./configure --with-device=ch3:nemesis:mx \
>>     --with-mx-lib=/opt/open-mx/lib/ --with-mx-include=/opt/open-mx/include/
>>
>> make
>>
>> make install
>>
>> Then I started the Open-MX service and tested MPICH2 on top of Open-MX.
>>
>>
>>
>> [root at cu02 ~]# mpiexec -n 4 /usr/lib64/mpich2/bin/mpitests-IMB-EXT Unidir_Get
>>
>>
>>
>> It fails with this error message:
>>
>> rank 0 in job 8 cu02.hpc.com_54277 caused collective abort of all ranks
>>   exit status of rank 0: killed by signal 9
>>
>> The same error occurs with
>> "mpiexec -n 4 /usr/lib64/mpich2/bin/mpitests-IMB-EXT Bidir_Get".
>>
>> Is there any way to solve this problem?
>>
>> Thanks!
>>
>> Regards
>>
>> Lily
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
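
On Dave's suggestion of `examples/cpi` as a sanity test: to our understanding, cpi estimates π with the midpoint rule applied to 4/(1+x²) on [0, 1], deals the intervals out round-robin across ranks, and combines the partial sums with MPI_Reduce. It therefore exercises only basic collective communication, not the one-sided MPI_Get path used by the IMB-EXT Unidir_Get and Bidir_Get tests. The serial Python sketch below mimics that work split so the arithmetic can be checked without MPI; `cpi_serial` and the loop over simulated ranks are hypothetical stand-ins, not MPICH2 code.

```python
import math


def f(x: float) -> float:
    """Integrand whose integral over [0, 1] equals pi."""
    return 4.0 / (1.0 + x * x)


def cpi_serial(n: int, numprocs: int) -> float:
    """Hypothetical serial stand-in for the cpi work split.

    Each simulated rank sums every numprocs-th interval midpoint; the
    partial results are then added together, as MPI_Reduce with MPI_SUM
    would do across real ranks.
    """
    h = 1.0 / n
    partials = []
    for myid in range(numprocs):           # one iteration per simulated rank
        s = 0.0
        for i in range(myid + 1, n + 1, numprocs):
            x = h * (i - 0.5)              # midpoint of interval i
            s += f(x)
        partials.append(h * s)
    return sum(partials)                   # stands in for the reduction


if __name__ == "__main__":
    approx = cpi_serial(10_000, 4)
    print(f"pi ~= {approx:.16f}, error = {abs(approx - math.pi):.2e}")
```

If the real cpi runs cleanly under `mpiexec -n 4` over Open-MX while Unidir_Get still aborts, that would suggest looking at the one-sided path rather than at basic setup.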



