[MPICH] Strange problem when running mpich2 application
Marcelo Nardelli
marcelo.nardelli at gmail.com
Sun Oct 8 17:35:19 CDT 2006
Hi,
I have an application that uses mpich2 and creates processes dynamically
through MPI_Comm_spawn. The application may use up to 10 machines (three
are Pentium 4 and the other seven are AMD Athlon). The machine names
are:
pos-04.cic.unb.br
carbona.laico.cic.unb.br
magicien.laico.cic.unb.br
fau.laico.cic.unb.br
pos-14.cic.unb.br
pos-10.cic.unb.br
pos-09.cic.unb.br
pos-08.cic.unb.br
pos-06.cic.unb.br
pos-03.cic.unb.br
Usually everything runs fine. However, the following message appeared on
the console during one recent run of the application (the message is copied
exactly as it was written to the console):
----------------------------------------
[nardelli@carbona nardelli]$ pos-14.cic.unb.br_mpdman_4_s (recv_dict_msg 377):recv_dict_msg: errmsg=::
mpdtb:
/home/nardelli/mpich2-install/bin/mpdlib.py, 377, recv_dict_msg
/home/nardelli/mpich2-install/bin/mpdman.py, 464, handle_lhs_input
/home/nardelli/mpich2-install/bin/mpdlib.py, 488,
handle_active_streams
/home/nardelli/mpich2-install/bin/mpdman.py, 413, run
/home/nardelli/mpich2-install/bin/mpd, 1284, launch_mpdman_via_fork
/home/nardelli/mpich2-install/bin/mpd, 1205, run_one_cli
/home/nardelli/mpich2-install/bin/mpd, 1061, do_mpdrun
/home/nardelli/mpich2-install/bin/mpd, 755, handle_lhs_input
/home/nardelli/mpich2-install/bin/mpdlib.py, 488,
handle_active_streams
/home/nardelli/mpich2-install/bin/mpd, 266, runmainloop
/home/nardelli/mpich2-install/bin/mpd, 240, run
/home/nardelli/mpich2-install/bin/mpd, 1344, ?
mpd_cli_app=/home/nardelli/SW_Teste/SlaveMain
fau.laico.cic.unb.br_mpdman_7_s: mpd_uncaught_except_tb handling:
exceptions.AttributeError:
'int' object has no attribute 'send_dict_msg'
/home/nardelli/mpich2-install/bin/mpdman.py 564 handle_lhs_input
self.ring.rhsSock.send_dict_msg(msg)
/home/nardelli/mpich2-install/bin/mpdlib.py 488 handle_active_streams
handler(stream,*args)
/home/nardelli/mpich2-install/bin/mpdman.py 413 run
rv = self.streamHandler.handle_active_streams(timeout=5.0)
/home/nardelli/mpich2-install/bin/mpd 1284 launch_mpdman_via_fork
mpdman.run()
/home/nardelli/mpich2-install/bin/mpd 1205 run_one_cli
(manPid,toManSock) = self.launch_mpdman_via_fork(msg,man_env)
/home/nardelli/mpich2-install/bin/mpd 1061 do_mpdrun
self.run_one_cli(rank,msg)
/home/nardelli/mpich2-install/bin/mpd 755 handle_lhs_input
self.do_mpdrun(msg)
/home/nardelli/mpich2-install/bin/mpdlib.py 488 handle_active_streams
handler(stream,*args)
/home/nardelli/mpich2-install/bin/mpd 266 runmainloop
rv = self.streamHandler.handle_active_streams(timeout=8.0)
/home/nardelli/mpich2-install/bin/mpd 240 run
self.runmainloop()
/home/nardelli/mpich2-install/bin/mpd 1344 ?
mpd.run()
mpd_cli_app=/home/nardelli/SW_Teste/SlaveMain
----------------------------------------
Does anyone know what this is? I searched Google for an answer, but I'm
really lost here. The error has not appeared again (at least not so far).
Maybe it was a problem that happened during an MPI_Recv call? Any ideas
about the error would be appreciated.
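For readers unfamiliar with the Python exception in the second traceback: "'int' object has no attribute 'send_dict_msg'" means that, at mpdman.py line 564, the attribute self.ring.rhsSock held a plain integer rather than a socket object when mpdman tried to forward a message to its right-hand neighbour. The sketch below is a hypothetical reproduction of that class of error only. FakeSocket, FakeRing, close_rhs and forward are invented names, not mpd's code, and the assumption that a torn-down connection leaves an integer behind is merely a guess consistent with the message.

```python
# Hypothetical illustration only: these classes are NOT mpd's actual code.


class FakeSocket:
    """Stand-in for a socket wrapper that offers send_dict_msg()."""

    def __init__(self):
        self.sent = []

    def send_dict_msg(self, msg):
        # Record the message instead of writing it to a real socket.
        self.sent.append(msg)


class FakeRing:
    """Holds the right-hand-side neighbour connection, like ring.rhsSock in the log."""

    def __init__(self):
        self.rhsSock = FakeSocket()

    def close_rhs(self):
        # Assumption: after the connection is torn down, the attribute is
        # reset to an integer placeholder instead of a socket object.
        self.rhsSock = 0


def forward(ring, msg):
    """Forward msg to the right-hand neighbour, reporting a stale handle instead of crashing."""
    sock = ring.rhsSock
    if not hasattr(sock, "send_dict_msg"):
        return "rhs connection gone (rhsSock=%r)" % (sock,)
    sock.send_dict_msg(msg)
    return "sent"


if __name__ == "__main__":
    ring = FakeRing()
    print(forward(ring, {"cmd": "ping"}))
    ring.close_rhs()
    try:
        # Unguarded call, as in the traceback: raises AttributeError.
        ring.rhsSock.send_dict_msg({"cmd": "ping"})
    except AttributeError as exc:
        print("AttributeError:", exc)
    print(forward(ring, {"cmd": "ping"}))
```

Running it prints "sent", then the same AttributeError text that appears in the log above, then the guarded message from forward(). The point is only that the exception indicates a stale connection handle inside mpd's ring, not a failure inside the application's own MPI_Recv call; whether that is the actual cause here is not established by the log.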
Thanks,
Marcelo Nardelli
More information about the mpich-discuss mailing list