[mpich-discuss] Getting runtime exception
Dave Goodell
goodell at mcs.anl.gov
Fri Feb 5 11:37:51 CST 2010
On Feb 5, 2010, at 7:24 AM, Rajnish wrote:
[snip]
> However, when I schedule tasks across both nodes, with n2 as the
> master node, I get the following message on n1:
>
> mpd_uncaught_except_tb handling:
> exceptions.KeyError: 'process_mapping'
> /usr/local/bin/mpd 1354 do_mpdrun
> msg['process_mapping'][lorank] = self.myHost
> /usr/local/bin/mpd 984 handle_lhs_input
> self.do_mpdrun(msg)
> /usr/local/bin/mpdlib.py 780 handle_active_streams
> handler(stream,*args)
> /usr/local/bin/mpd 301 runmainloop
> rv = self.streamHandler.handle_active_streams(timeout=8.0)
> /usr/local/bin/mpd 270 run
> self.runmainloop()
> /usr/local/bin/mpd 1643 ?
> mpd.run()
> n1-wulf.myhost.org_mpdman_1 (run 287): invalid msg from lhs;
> expecting ringsize got: {}
What version of MPICH2 are you running?
This message seems to indicate that you have somehow installed
incompatible versions of "mpd.py" on the two hosts. What output
do you get from running the following commands on both hosts?
-------8<-------
tail -n +2 `which mpd.py` | md5sum
tail -n +2 `which mpdman.py` | md5sum
-------8<-------
(The "tail" step skips the first line of each file. The install step
usually rewrites the shebang line, which would otherwise change the
checksum.)
Results for a few releases:
-------8<-------
release mpich2-1.2.1
68a128402fb44c6fdebe631bbc1c4b7f mpd.py
b79fd98d6e4f9d9b80c295e05e01591c mpdman.py
release mpich2-1.2
be37cc1347b915a0ec32cba54c928f63 mpd.py
5a9cd3f44b5986584b27a648f889bf31 mpdman.py
release mpich2-1.1.1p1
be37cc1347b915a0ec32cba54c928f63 mpd.py
5a9cd3f44b5986584b27a648f889bf31 mpdman.py
release mpich2-1.1.1
550958d41e76cdef0ceaa74d540760de mpd.py
5a9cd3f44b5986584b27a648f889bf31 mpdman.py
release mpich2-1.1
f59c7e766dd2d3488b6df212a663ccb9 mpd.py
07129c1f68cd815c56bd186eb1b59038 mpdman.py
release mpich2-1.0.8
65fb3b8b1c9e3d053bb97d5ef2ae86ad mpd.py
2083f8908d0b9698eb0550c32ef3d153 mpdman.py
-------8<-------
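If it is more convenient to script the comparison, the same check can be
sketched in Python. This is only a rough equivalent of the "tail | md5sum"
pipeline above, not part of MPICH2. The names md5_without_first_line,
releases_for and KNOWN_MPD_SUMS are made up for illustration, and the
table holds only the mpd.py sums listed above.

```python
import hashlib
import sys

# MD5 sums of mpd.py (first line excluded), copied from the table above.
KNOWN_MPD_SUMS = {
    "68a128402fb44c6fdebe631bbc1c4b7f": ["mpich2-1.2.1"],
    "be37cc1347b915a0ec32cba54c928f63": ["mpich2-1.2", "mpich2-1.1.1p1"],
    "550958d41e76cdef0ceaa74d540760de": ["mpich2-1.1.1"],
    "f59c7e766dd2d3488b6df212a663ccb9": ["mpich2-1.1"],
    "65fb3b8b1c9e3d053bb97d5ef2ae86ad": ["mpich2-1.0.8"],
}


def md5_without_first_line(path):
    """Rough equivalent of: tail -n +2 PATH | md5sum"""
    with open(path, "rb") as f:
        f.readline()  # skip the shebang line, which the install step rewrites
        return hashlib.md5(f.read()).hexdigest()


def releases_for(path, known=KNOWN_MPD_SUMS):
    """Return the releases whose mpd.py sum matches PATH (empty list if none)."""
    return known.get(md5_without_first_line(path), [])


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_mpd.py /usr/local/bin/mpd.py
    for p in sys.argv[1:]:
        print(p, md5_without_first_line(p), releases_for(p) or "unknown")
```

Run it on both hosts against the installed mpd.py. If the two hosts print
different sums, or report different releases, the installs do not match.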
> After doing mpdallexit, n2 shows the following message:
>
> mpiexec_n2-wulf.myhost.org (mpiexec 377): no msg recvd from mpd when
> expecting ack of request
You can ignore this message; it's a consequence of the earlier error.
-Dave