[mpich-discuss] Getting runtime exception
Rajnish
rajnish99 at gmail.com
Fri Feb 5 11:59:08 CST 2010
Thank you, Pavan, for suggesting the use of hydra; it was not in my default
installation, so I am adding it right now.
Thank you, Dave, for pointing out the version incompatibility. Yes, the
md5sums are indeed different. I had installed 1.2.1 on node n1; node n2
already had MPICH2 installed, but I am not sure which version, so the two
must be incompatible.
I am going to do a fresh install and will send an update.
- Rajnish.
On Fri, Feb 5, 2010 at 12:37 PM, Dave Goodell <goodell at mcs.anl.gov> wrote:
> On Feb 5, 2010, at 7:24 AM, Rajnish wrote:
>
> [snip]
>
>> However, when I schedule tasks across both nodes, with n2 as the master
>> node, I get the following message on n1:
>>
>> mpd_uncaught_except_tb handling:
>> exceptions.KeyError: 'process_mapping'
>> /usr/local/bin/mpd 1354 do_mpdrun
>> msg['process_mapping'][lorank] = self.myHost
>> /usr/local/bin/mpd 984 handle_lhs_input
>> self.do_mpdrun(msg)
>> /usr/local/bin/mpdlib.py 780 handle_active_streams
>> handler(stream,*args)
>> /usr/local/bin/mpd 301 runmainloop
>> rv = self.streamHandler.handle_active_streams(timeout=8.0)
>> /usr/local/bin/mpd 270 run
>> self.runmainloop()
>> /usr/local/bin/mpd 1643 ?
>> mpd.run()
>> n1-wulf.myhost.org_mpdman_1 (run 287): invalid msg from lhs; expecting
>> ringsize got: {}
>>
>
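> For context, the KeyError itself is just a newer mpd indexing a field that
> the older mpd on the other side of the ring never sent. Roughly (a
> simplified Python sketch of the situation, not the actual mpd code):
>
> -------8<-------
> # A run request as an older mpd might send it: no 'process_mapping' key.
> msg = {'cmd': 'mpdrun', 'ringsize': 2}
>
> lorank = 0
> my_host = 'n1-wulf.myhost.org'
>
> try:
>     # What the newer mpd effectively does (see the traceback line above).
>     msg['process_mapping'][lorank] = my_host
> except KeyError as err:
>     print('mpd-style failure:', err)   # KeyError: 'process_mapping'
> -------8<-------
>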
> What version of MPICH2 are you running?
>
> This message seems to indicate that you have somehow installed incompatible
> versions of "mpd.py" between the two hosts. What output do you get from
> running the following commands on both hosts?
>
> -------8<-------
> tail -n +2 `which mpd.py` | md5sum
> tail -n +2 `which mpdman.py` | md5sum
> -------8<-------
>
> (the "tail" business is needed because the shebang line is usually altered
> by the install step)
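>
> If md5sum happens to be missing on one of the hosts, here is a rough Python
> equivalent of that pipeline (illustrative only, not part of MPICH2):
>
> -------8<-------
> import hashlib
> import subprocess
>
> def md5_without_shebang(path):
>     # Hash everything after the first line, like `tail -n +2 FILE | md5sum`.
>     with open(path, 'rb') as f:
>         f.readline()                      # skip the (install-modified) shebang
>         return hashlib.md5(f.read()).hexdigest()
>
> for tool in ('mpd.py', 'mpdman.py'):
>     # Resolve the installed copy, as `which` does in the shell version.
>     path = subprocess.check_output(['which', tool]).decode().strip()
>     print(md5_without_shebang(path), tool)
> -------8<-------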
>
> Results for a few releases:
> -------8<-------
> release mpich2-1.2.1
> 68a128402fb44c6fdebe631bbc1c4b7f mpd.py
> b79fd98d6e4f9d9b80c295e05e01591c mpdman.py
> release mpich2-1.2
> be37cc1347b915a0ec32cba54c928f63 mpd.py
> 5a9cd3f44b5986584b27a648f889bf31 mpdman.py
> release mpich2-1.1.1p1
> be37cc1347b915a0ec32cba54c928f63 mpd.py
> 5a9cd3f44b5986584b27a648f889bf31 mpdman.py
> release mpich2-1.1.1
> 550958d41e76cdef0ceaa74d540760de mpd.py
> 5a9cd3f44b5986584b27a648f889bf31 mpdman.py
> release mpich2-1.1
> f59c7e766dd2d3488b6df212a663ccb9 mpd.py
> 07129c1f68cd815c56bd186eb1b59038 mpdman.py
> release mpich2-1.0.8
> 65fb3b8b1c9e3d053bb97d5ef2ae86ad mpd.py
> 2083f8908d0b9698eb0550c32ef3d153 mpdman.py
> -------8<-------
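>
> If it helps, the table above can be matched mechanically with a few lines
> of Python (a throwaway lookup; the checksums are just copied from the list):
>
> -------8<-------
> import sys
>
> # md5 of mpd.py (minus shebang) -> release, copied from the table above.
> MPD_RELEASES = {
>     '68a128402fb44c6fdebe631bbc1c4b7f': 'mpich2-1.2.1',
>     'be37cc1347b915a0ec32cba54c928f63': 'mpich2-1.2 or mpich2-1.1.1p1',
>     '550958d41e76cdef0ceaa74d540760de': 'mpich2-1.1.1',
>     'f59c7e766dd2d3488b6df212a663ccb9': 'mpich2-1.1',
>     '65fb3b8b1c9e3d053bb97d5ef2ae86ad': 'mpich2-1.0.8',
> }
>
> checksum = sys.argv[1] if len(sys.argv) > 1 else input('md5 of mpd.py: ')
> print(MPD_RELEASES.get(checksum.strip(), 'unknown or locally modified mpd.py'))
> -------8<-------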
>
>
>> After doing mpdallexit, n2 shows the following message:
>>
>> mpiexec_n2-wulf.myhost.org (mpiexec 377): no msg recvd from mpd when
>> expecting ack of request
>>
>
> You can ignore this message; it's a consequence of the earlier error.
>
> -Dave
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>