[mpich-discuss] Getting runtime exception

Pavan Balaji balaji at mcs.anl.gov
Fri Feb 5 09:07:19 CST 2010


While I don't foresee any problem in the setup you described, I should
point out that this is not the kind of setup we do our testing on. So,
if there's something wrong, we won't be able to reproduce it here.

But there are a few basic tests you can run to make sure the setup
is OK.

1. Use the mpdcheck and mpdringtest utilities to make sure mpd can
launch fine.

2. Use the hydra process manager instead of mpd:

% mpiexec.hydra -f machinefile -n 4 hostname

The file machinefile should list n1 and n2, one host name per line.
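The two checks above can be combined into one small shell script. This is only a minimal sketch, and it rests on a few assumptions. The hosts must resolve as n1 and n2, and the MPICH2 tools must be on PATH. mpdringtest also needs an mpd ring that is already running, for example one started with mpdboot. The helper run_if_present is hypothetical: it exists only so the script reports and skips any tool that is not installed. The loop count 10 given to mpdringtest is an arbitrary choice.

```shell
#!/bin/sh
# Sketch of the suggested setup checks. Assumptions (not from MPICH2 docs):
# nodes resolve as n1 and n2; MPICH2 tools are on PATH; an mpd ring is
# already up for mpdringtest. Missing tools are reported and skipped.

# Hydra's host file: one host name per line.
printf 'n1\nn2\n' > machinefile

# Hypothetical helper: run a command only if its executable is installed,
# otherwise print a short notice instead of failing.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipping: $1 not found on PATH"
    fi
}

# 1. mpd sanity checks: local configuration, then a message around the ring.
run_if_present mpdcheck
run_if_present mpdringtest 10

# 2. Launch the same trivial job through hydra instead of mpd.
run_if_present mpiexec.hydra -f machinefile -n 4 hostname
```

Run it from the node you plan to launch jobs from (n2 in the report below). If the hydra launch prints the host names while mpd-based runs still fail, the problem is more likely in the mpd setup than in the nodes themselves.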

 -- Pavan

On 02/05/2010 07:24 AM, Rajnish wrote:
> 
> First, thank you all for providing help here.
> 
> I have two Linux SMP nodes, say n1 and n2.
> 
> n1 runs Linux 2.1.18 with gcc 4.1.2, and n2 runs Linux 2.1.20 with gcc
> 4.1.1. Both are on NFS and ssh-accessible.
> 
> I got MPICH2 installed and running. mpdringtest with both nodes on the
> ring runs fine.
> 
> When I schedule tasks on the same node, they run fine.
> 
> However, when I schedule tasks across both nodes, with n2 as the master
> node, I get the following message on n1:
> 
> mpd_uncaught_except_tb handling:
>  exceptions.KeyError: 'process_mapping'
>    /usr/local/bin/mpd  1354  do_mpdrun
>        msg['process_mapping'][lorank] = self.myHost
>    /usr/local/bin/mpd  984  handle_lhs_input
>        self.do_mpdrun(msg)
>    /usr/local/bin/mpdlib.py  780  handle_active_streams
>        handler(stream,*args)
>    /usr/local/bin/mpd  301  runmainloop
>        rv = self.streamHandler.handle_active_streams(timeout=8.0)
>    /usr/local/bin/mpd  270  run
>        self.runmainloop()
>    /usr/local/bin/mpd  1643  ?
>        mpd.run()
> n1-wulf.myhost.org_mpdman_1 (run 287): invalid msg from lhs; expecting
> ringsize got: {}
> 
> 
> After running mpdallexit, n2 shows the following message:
> 
> mpiexec_n2-wulf.myhost.org (mpiexec
> 377): no msg recvd from mpd when expecting ack of request
> --------------------
> 
> My question: Do I need exactly the same OS version on both n1 and n2,
> or the same gcc version on both, or might I have some other installation
> problem?
> 
> Thanks in advance,
> - Rajnish.
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

