[mpich-discuss] nemesis

shenqian at tsinghua.org.cn
Sun Aug 16 23:47:28 CDT 2009


Hi Rajeev,

I ran "make testing" in the top-level mpich2-1.1.1 directory and got many errors: according to summary.xml, 94 of the 553 tests failed. My configure options are:

./configure --prefix=/opt/mpich2-install --with-device=ch3:nemesis:mx  --with-mx-lib=/opt/open-mx/lib/ --with-mx-include=/opt/open-mx/include/ --enable-sharedlibs=gcc 

I also built mpich2-1.1.1 with the default settings and ran "make testing" in the top-level directory. All 553 tests passed, with no failures in summary.xml.

So is Nemesis/MX really compatible with Open-MX? Or am I missing some configure options?
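
To compare the two builds, it may help to list exactly which tests failed in each summary.xml. The following is a minimal sketch, not part of the MPICH2 tooling. It assumes that each result is an <MPITEST> element with <NAME> and <STATUS> children and that a passing test has status "pass"; the sample data and the function name failed_tests are hypothetical. A real summary.xml may contain test output that a strict XML parser rejects, in which case the file would need cleaning first.

```python
import xml.etree.ElementTree as ET


def failed_tests(xml_text):
    """Return the names of all tests whose STATUS is not 'pass'.

    Assumes one <MPITEST> element per test, each with <NAME> and
    <STATUS> children (an assumption about the summary.xml layout).
    """
    root = ET.fromstring(xml_text)
    failed = []
    for test in root.iter("MPITEST"):
        name = (test.findtext("NAME") or "").strip()
        status = (test.findtext("STATUS") or "").strip().lower()
        if status != "pass":
            failed.append(name)
    return failed


# Hypothetical sample data, standing in for a real summary.xml.
SAMPLE = """<MPITESTRESULTS>
<MPITEST><NAME>barrier</NAME><STATUS>pass</STATUS></MPITEST>
<MPITEST><NAME>sendrecv1</NAME><STATUS>fail</STATUS></MPITEST>
</MPITESTRESULTS>"""

print(failed_tests(SAMPLE))
```

For a real run, the file contents could be read with open("summary.xml").read() and passed to failed_tests in place of SAMPLE. A list of failing test names is easier to act on than the raw count, for example to see whether the failures cluster in one area of the test suite.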

Thanks,
Qian Shen

> 
> You can also run "make testing" in the top-level mpich2 directory. It
> will run the entire test suite in test/mpi. If those tests pass, it
> would indicate that there is something wrong with your program.
> 
> Rajeev 
> 
> > -----Original Message-----
> > From: mpich-discuss-bounces at mcs.anl.gov 
> > [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of 
> > shenqian at tsinghua.org.cn
> > Sent: Sunday, August 16, 2009 8:54 AM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: Re: [mpich-discuss] nemesis
> > 
> > Hi Guillaume,
> > 
> > Thanks for your response. 
> > 
> > Because the program is another company's property, I can't 
> > send it to you :-(  If I can reproduce it with my own 
> > program, I will send it to you as soon as possible. 
> > 
> > I'm trying to use Nemesis/MX on top of Open-MX for high 
> > performance. I got a good performance improvement by using 
> > Open-MX instead of TCP/IP, but the hang in MPI_Barrier() 
> > has puzzled me for days. Could you suggest a way to 
> > debug it? 
> > 
> > Thanks,
> > Qian Shen
> >  
> > > 
> > > Hello,
> > > 
> > > Yes, Nemesis/MX is supposed to be compatible with Open-MX. Could you
> > > send me your example program so that I can find out what the
> > > problem is?
> > > 
> > > 
> > > Thanks.
> > > Guillaume
> > > 
> > > shenqian at tsinghua.org.cn wrote:
> > > > Hi,
> > > >
> > > > I built mpich2-1.1.1 on top of open-mx-1.1.1 with ch3:nemesis:mx
> > > > enabled. Nemesis/MX should be compatible with Open-MX, shouldn't it?
> > > > The examples shipped with mpich2 work well, but my MPI program
> > > > always hangs in MPI_Barrier(). The output messages are:
> > > >
> > > > Open-MX: Send request (seqnum 105 sesnum 0) timeout, already sent 
> > > > 1001 times, resetting partner status
> > > > Open-MX: Cleaning partner 00:11:09:5b:7d:16 endpoint 0
> > > > Open-MX: Dropped 1 pending send requests to partner
> > > >
> > > > It seems that the requests cannot be sent to the other nodes. If I
> > > > switch to the traditional TCP/IP stack and use the ch3:nemesis
> > > > device, the program runs successfully.
> > > >
> > > > Could anyone tell me how to handle this issue? 
> > > >
> > > > Regards,
> > > > Qian Shen
> > > 
> > 
> > 
> 


