[mpich-discuss] nemesis

shenqian at tsinghua.org.cn shenqian at tsinghua.org.cn
Mon Aug 17 03:07:12 CDT 2009


I tried again, but still got many errors. My configure options are:

./configure --with-device=ch3:nemesis:mx  --with-mx-lib=/opt/open-mx/lib/ --with-mx-include=/opt/open-mx/include/ 
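
To compare the failing tests between builds, a minimal sketch for tallying the results in summary.xml could look like the following. It assumes each test is recorded as an <MPITEST> element with <NAME> and <STATUS> children (status "pass" or "fail"). That layout and the helper name count_results are assumptions for illustration, so adjust them to what the file actually contains.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_results(xml_text):
    """Return (status counts, names of tests whose status is not 'pass')."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    not_passed = []
    # Assumed layout: one <MPITEST> element per test, with <NAME> and <STATUS>.
    for test in root.iter("MPITEST"):
        status = (test.findtext("STATUS") or "unknown").strip().lower()
        counts[status] += 1
        if status != "pass":
            not_passed.append((test.findtext("NAME") or "?").strip())
    return counts, not_passed
```

Calling count_results(open("summary.xml").read()) on each build's file and comparing the two name lists would show which tests fail only with ch3:nemesis:mx.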


> 
> I think that if you omit the option entirely, it should work fine.
> 
> 
> 
> shenqian at tsinghua.org.cn wrote:
> >> Hmm... this is strange.
> >> I'm going to check this and let you know.
> >> All I can say is that Nemesis/MX *should* be compatible. This could
> >> also be a bug in Open-MX.
> >> Did you try to compile everything statically, without creating shared
> >> libs?
> >>     
> >
> > Which option should I use: --enable-sharedlibs=none or --enable-dynamiclibs=none?
> >
> >
> >   
> >> Guillaume
> >>
> >>
> >> shenqian at tsinghua.org.cn wrote:
> >>     
> >>> Hi Rajeev,
> >>>
> >>> I ran "make testing" in the top-level mpich2-1.1.1 directory and got many errors: 94 of the 553 tests in summary.xml failed. My configure options are:
> >>>
> >>> ./configure --prefix=/opt/mpich2-install --with-device=ch3:nemesis:mx  --with-mx-lib=/opt/open-mx/lib/ --with-mx-include=/opt/open-mx/include/ --enable-sharedlibs=gcc 
> >>>
> >>> I also built mpich2-1.1.1 with the default settings and ran "make testing" in the top-level directory. All 553 tests passed, with no failures in summary.xml.
> >>>
> >>> So is Nemesis/MX really compatible with Open-MX? Or am I missing some configure options?
> >>>
> >>> Thanks,
> >>> Qian Shen
> >>>
> >>>   
> >>>       
> >>>> You can also run "make testing" in the top-level mpich2 directory. It
> >>>> will run the entire test suite in test/mpi. If they run, it would
> >>>> indicate there is something wrong with your program.
> >>>>
> >>>> Rajeev 
> >>>>
> >>>>     
> >>>>         
> >>>>> -----Original Message-----
> >>>>> From: mpich-discuss-bounces at mcs.anl.gov 
> >>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of 
> >>>>> shenqian at tsinghua.org.cn
> >>>>> Sent: Sunday, August 16, 2009 8:54 AM
> >>>>> To: mpich-discuss at mcs.anl.gov
> >>>>> Subject: Re: [mpich-discuss] nemesis
> >>>>>
> >>>>> Hi Guillaume,
> >>>>>
> >>>>> Thanks for your response. 
> >>>>>
> >>>>> Because the program is another company's property, I can't
> >>>>> send it to you :-(  If I can reproduce the problem with my own
> >>>>> program, I will send it to you as soon as possible.
> >>>>>
> >>>>> I'm trying to use Nemesis/MX on top of Open-MX for high
> >>>>> performance. I got a good performance improvement by using
> >>>>> Open-MX instead of TCP/IP, but the hang at MPI_Barrier()
> >>>>> has been puzzling me. Could you suggest a way to debug it?
> >>>>>
> >>>>> Thanks,
> >>>>> Qian Shen
> >>>>>  
> >>>>>       
> >>>>>           
> >>>>>> Hello,
> >>>>>>
> >>>>>> Yes, Nemesis/MX is supposed to be compatible with Open-MX.
> >>>>>> Could you send me your example program so that I can find out
> >>>>>> what the problem is?
> >>>>>>
> >>>>>>
> >>>>>> Thanks.
> >>>>>> Guillaume
> >>>>>>
> >>>>>> shenqian at tsinghua.org.cn wrote:
> >>>>>>         
> >>>>>>             
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I built mpich2-1.1.1 on top of open-mx-1.1.1 with
> >>>>>>> ch3:nemesis:mx enabled. Nemesis/MX should be compatible with
> >>>>>>> Open-MX, shouldn't it? I tested the examples shipped with mpich2,
> >>>>>>> and they work well. But my MPI program always hangs in
> >>>>>>> MPI_Barrier(). The output messages are:
> >>>>>       
> >>>>>           
> >>>>>>> Open-MX: Send request (seqnum 105 sesnum 0) timeout, already sent 
> >>>>>>> 1001 times, resetting partner status
> >>>>>>> Open-MX: Cleaning partner 00:11:09:5b:7d:16 endpoint 0
> >>>>>>> Open-MX: Dropped 1 pending send requests to partner
> >>>>>>>
> >>>>>>> It seems that the requests cannot be sent to the other
> >>>>>>> nodes. If I switch to the traditional TCP/IP stack and use the
> >>>>>>> ch3:nemesis device, the program runs successfully.
> >>>>>>>
> >>>>>>> Could anyone tell me how to handle this issue?
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Qian Shen


