[mpich-discuss] Getting runtime exception
Pavan Balaji
balaji at mcs.anl.gov
Fri Feb 5 09:07:19 CST 2010
While I don't foresee any problem in the setup you described, I should
point out that this is not the kind of setup we do our testing on. So,
if there's something wrong, we won't be able to reproduce it here.
But, there might be some basic tests you can do to make sure the setup
is OK.
1. Use the mpdcheck and mpdringtest utilities to make sure mpd can
launch properly.
2. Use the hydra process manager instead of mpd:
% mpiexec.hydra -f machinefile -n 4 hostname
machinefile should contain n1 and n2, one hostname per line.
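
If it helps, here is a rough Python sketch of step 2: it writes the two-line machinefile and builds the same mpiexec.hydra command line shown above. The helper names (write_machinefile, hydra_command) are made up for illustration and are not part of MPICH2; the host names n1 and n2 are taken from your description.

```python
from pathlib import Path


def write_machinefile(path, hosts):
    """Write one hostname per line (hypothetical helper, not an MPICH2 API)."""
    Path(path).write_text("".join(f"{host}\n" for host in hosts))
    return path


def hydra_command(machinefile, nprocs, program="hostname"):
    """Build the command line from step 2 as an argument list."""
    return ["mpiexec.hydra", "-f", str(machinefile), "-n", str(nprocs), program]
```

To try it, call write_machinefile("machinefile", ["n1", "n2"]) and pass the result of hydra_command("machinefile", 4) to subprocess.run on a node where mpiexec.hydra is installed. If the setup is healthy, the output should include the host names of both nodes.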
-- Pavan
On 02/05/2010 07:24 AM, Rajnish wrote:
>
> First thank you all a lot for providing help here.
>
> I have two Linux SMP nodes, say n1 and n2.
>
> n1 has Linux 2.1.18 with gcc 4.1.2 and n2 has Linux 2.1.20 with gcc
> 4.1.1 running. Both are on NFS and ssh-accessible.
>
> I got MPICH2 installed and running. /mpdringtest/ with both nodes on the
> ring runs fine.
>
> When I schedule tasks on the same node, they run fine.
>
> However, when I schedule tasks across both nodes, with n2 as the master
> node, I get the following message on n1:
>
> mpd_uncaught_except_tb handling:
> exceptions.KeyError: 'process_mapping'
> /usr/local/bin/mpd 1354 do_mpdrun
> msg['process_mapping'][lorank] = self.myHost
> /usr/local/bin/mpd 984 handle_lhs_input
> self.do_mpdrun(msg)
> /usr/local/bin/mpdlib.py 780 handle_active_streams
> handler(stream,*args)
> /usr/local/bin/mpd 301 runmainloop
> rv = self.streamHandler.handle_active_streams(timeout=8.0)
> /usr/local/bin/mpd 270 run
> self.runmainloop()
> /usr/local/bin/mpd 1643 ?
> mpd.run()
> n1-wulf.myhost.org_mpdman_1 (run 287): invalid msg from lhs; expecting
> ringsize got: {}
>
>
> After doing /mpdallexit/, n2 shows the following message:
>
> mpiexec_n2-wulf.myhost.org (mpiexec
> 377): no msg recvd from mpd when expecting ack of request
> --------------------
>
> My question: Do I need exactly the same OS version on both n1 and n2, or the
> same gcc version on both, or might I have some other installation problem?
>
> Thanks in advance,
> - Rajnish.
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji