[mpich-discuss] nemesis

Guillaume Mercier mercierg at mcs.anl.gov
Mon Aug 17 00:46:28 CDT 2009


Hmm... this is strange.
I'm going to check this and let you know.
All I can say is that Nemesis/MX *should* be compatible. This could 
also be a bug in Open-MX.
Did you try compiling everything statically, without building shared 
libraries?


Guillaume


shenqian at tsinghua.org.cn wrote:
> Hi Rajeev,
>
> I ran "make testing" in the top-level mpich2-1.1.1 directory and got many errors: 94 of the 553 tests in summary.xml failed. My configure options are:
>
> ./configure --prefix=/opt/mpich2-install --with-device=ch3:nemesis:mx  --with-mx-lib=/opt/open-mx/lib/ --with-mx-include=/opt/open-mx/include/ --enable-sharedlibs=gcc 
>
> I also built mpich2-1.1.1 with the default settings and ran "make testing" in the top-level directory. All 553 tests passed, with no failures in summary.xml.
>
> So is Nemesis/MX really compatible with Open-MX? Or am I missing some configure options?
>
> Thanks,
> Qian Shen
>
>   
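With 94 of 553 tests failing under ch3:nemesis:mx and none under the default build, it can help to list exactly which tests regress between the two builds. The following is a minimal Python sketch of one way to do that. It assumes each test is recorded in summary.xml as an <MPITEST> element with <NAME>, <NP>, <WORKDIR> and <STATUS> children and that passing tests are marked "pass"; check that against your own file first. It uses regular expressions rather than a strict XML parser, a deliberate choice so that one malformed record does not abort the whole comparison.

```python
import re

# Assumed layout of one test record in summary.xml (verify against your file):
#   <MPITEST><NAME>..</NAME><NP>..</NP><WORKDIR>..</WORKDIR><STATUS>..</STATUS></MPITEST>
RECORD = re.compile(r"<MPITEST>(.*?)</MPITEST>", re.DOTALL)


def field(record, tag):
    """Return the text of <tag>...</tag> inside one record, or '' if absent."""
    m = re.search(r"<%s>(.*?)</%s>" % (tag, tag), record, re.DOTALL)
    return m.group(1).strip() if m else ""


def load_results(text):
    """Map (workdir, name, np) -> status for every test record in the text."""
    results = {}
    for rec in RECORD.findall(text):
        key = (field(rec, "WORKDIR"), field(rec, "NAME"), field(rec, "NP"))
        results[key] = field(rec, "STATUS")
    return results


def regressions(baseline, candidate):
    """Tests that pass in the baseline build but not in the candidate build."""
    return sorted(k for k, status in candidate.items()
                  if status != "pass" and baseline.get(k) == "pass")


def report(baseline_path, candidate_path):
    """Print the candidate's failure count, then each regressed test."""
    with open(baseline_path) as f:
        base = load_results(f.read())
    with open(candidate_path) as f:
        cand = load_results(f.read())
    failed = [k for k, s in cand.items() if s != "pass"]
    print("%d of %d tests did not pass" % (len(failed), len(cand)))
    for workdir, name, np in regressions(base, cand):
        print("%s/%s (np=%s)" % (workdir, name, np))
```

For example, report("default-build/test/mpi/summary.xml", "mx-build/test/mpi/summary.xml") would print the failure count for the MX build followed by each test that passed in the default build but not in the MX build; both paths are hypothetical.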
>> You can also run "make testing" in the top-level mpich2 directory. It
>> will run the entire test suite in test/mpi. If those tests pass, it would
>> indicate there is something wrong with your program.
>>
>> Rajeev 
>>
>>     
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of 
>>> shenqian at tsinghua.org.cn
>>> Sent: Sunday, August 16, 2009 8:54 AM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] nemesis
>>>
>>> Hi Guillaume,
>>>
>>> Thanks for your response. 
>>>
>>> Because the program is another company's property, I can't 
>>> send it to you :-(  If I can reproduce it with my own 
>>> program, I will send it to you as soon as possible. 
>>>
>>> I'm trying to use Nemesis/MX on top of Open-MX for high
>>> performance. I got a good performance improvement by using
>>> Open-MX instead of TCP/IP, but the hang in MPI_Barrier()
>>> has been puzzling me lately. Could you suggest some
>>> way to debug it? 
>>>
>>> Thanks,
>>> Qian Shen
>>>  
>>>       
>>>> Hello,
>>>>
>>>> Yes, Nemesis/MX is supposed to be compatible with Open-MX. Could you
>>>> send me your example program so that I can find out what the
>>>> problem is?
>>>>
>>>>
>>>> Thanks.
>>>> Guillaume
>>>>
>>>> shenqian at tsinghua.org.cn wrote:
>>>>> Hi,
>>>>>
>>>>> I built mpich2-1.1.1 on top of open-mx-1.1.1, with ch3:nemesis:mx
>>>>> enabled. Nemesis/MX should be compatible with Open-MX, shouldn't it?
>>>>> I tested the examples shipped with mpich2, and they work well. But my
>>>>> MPI program always hangs in MPI_Barrier(). The output messages are:
>>>>> Open-MX: Send request (seqnum 105 sesnum 0) timeout, already sent 
>>>>> 1001 times, resetting partner status
>>>>> Open-MX: Cleaning partner 00:11:09:5b:7d:16 endpoint 0
>>>>> Open-MX: Dropped 1 pending send requests to partner
>>>>>
>>>>> It seems that the requests cannot be sent to the other nodes. If I
>>>>> switch to the traditional TCP/IP stack, using the ch3:nemesis device,
>>>>> the program runs successfully.
>>>>> Could anyone tell me how to handle this issue? 
>>>>>
>>>>> Regards,
>>>>> Qian Shen
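The three Open-MX messages in the original report show the sender giving up on one particular peer, identified by its MAC address and endpoint, after the request had already been sent 1001 times. When such messages are collected from several nodes, a small script can tally which peers were reset, so the host or link worth inspecting first stands out. A minimal Python sketch, assuming the messages appear in the logs worded exactly as quoted (other Open-MX versions may word them differently) and that each "Dropped" message refers to the partner named just before it:

```python
import re
from collections import Counter

# Patterns taken from the Open-MX messages quoted in the report; the exact
# wording is an assumption and may differ between Open-MX versions.
EVENT = re.compile(
    r"Send request \(seqnum \d+ sesnum \d+\) timeout"
    r"|Cleaning partner (?P<mac>[0-9a-fA-F:]{17}) endpoint (?P<ep>\d+)"
    r"|Dropped (?P<dropped>\d+) pending send requests")


def summarize(log_text):
    """Return (number of send timeouts, {(mac, endpoint): dropped sends}).

    Assumes each 'Dropped N ...' message refers to the partner named by the
    most recent 'Cleaning partner ...' message, as in the quoted excerpt.
    """
    # Collapse all whitespace so a message wrapped over two lines still matches.
    text = " ".join(log_text.split())
    timeouts = 0
    dropped = Counter()
    partner = None
    for m in EVENT.finditer(text):
        if m.group("mac"):
            partner = (m.group("mac").lower(), int(m.group("ep")))
            dropped[partner] += 0  # record the partner even if nothing is dropped
        elif m.group("dropped"):
            if partner is not None:
                dropped[partner] += int(m.group("dropped"))
        else:
            timeouts += 1
    return timeouts, dict(dropped)
```

Applied to the excerpt above, this would report one timeout and one dropped send for partner 00:11:09:5b:7d:16, endpoint 0. A partner that shows up in the logs of many nodes is a reasonable first suspect.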