[mpich-discuss] nemesis

shenqian at tsinghua.org.cn shenqian at tsinghua.org.cn
Sun Aug 16 08:54:24 CDT 2009


Hi Guillaume,

Thanks for your response. 

Because the program is another company's property, I can't send it to you :-(  If I can reproduce it with my own program, I will send it to you as soon as possible. 

I'm trying to use Nemesis/MX on top of Open-MX for high performance. I got a good performance improvement by using Open-MX instead of TCP/IP, but the hang in MPI_Barrier() has puzzled me for days. Could you suggest a way to debug it? 

Thanks,
Qian Shen
 
> 
> Hello,
> 
> Yes, Nemesis/MX is supposed to be compatible with Open-MX. Could you 
> send me your example program so
> that I can find out what the problem is?
> 
> 
> Thanks.
> Guillaume
> 
> shenqian at tsinghua.org.cn wrote:
> > Hi,
> >
> > I built mpich2-1.1.1 on top of open-mx-1.1.1 with ch3:nemesis:mx enabled. Nemesis/MX should be compatible with Open-MX, shouldn't it? I tested the examples shipped with mpich2, and they work well. But my MPI program always hangs in MPI_Barrier(). The output messages are:
> >
> > Open-MX: Send request (seqnum 105 sesnum 0) timeout, already sent 1001 times, resetting partner status
> > Open-MX: Cleaning partner 00:11:09:5b:7d:16 endpoint 0
> > Open-MX: Dropped 1 pending send requests to partner
> >
> > It seems that the requests cannot be sent to the other nodes. If I switch to the traditional TCP/IP stack and use the ch3:nemesis device, the program runs successfully. 
> >
> > Could anyone tell me how to handle this issue? 
> >
> > Regards,
> > Qian Shen
> >
> 


