Hi,<br><br>I have an application that uses MPICH2 and creates processes dynamically through MPI_Comm_spawn. The application may use up to 10 machines (3 are Pentium 4 and the others are AMD Athlon). The machine names are:
<br><br><a href="http://pos-04.cic.unb.br">pos-04.cic.unb.br</a><br><a href="http://carbona.laico.cic.unb.br">carbona.laico.cic.unb.br</a><br><a href="http://magicien.laico.cic.unb.br">magicien.laico.cic.unb.br</a><br><a href="http://fau.laico.cic.unb.br">
fau.laico.cic.unb.br</a><br><a href="http://pos-14.cic.unb.br">pos-14.cic.unb.br</a><br><a href="http://pos-10.cic.unb.br">pos-10.cic.unb.br</a><br><a href="http://pos-09.cic.unb.br">pos-09.cic.unb.br</a><br><a href="http://pos-08.cic.unb.br">
pos-08.cic.unb.br</a><br><a href="http://pos-06.cic.unb.br">pos-06.cic.unb.br</a><br><a href="http://pos-03.cic.unb.br">pos-03.cic.unb.br</a><br><br>Usually everything runs fine. However, the following message appeared on the console during one recent run of the application (copied exactly as it was written to the console):
<br><br>----------------------------------------<br>[nardelli@carbona nardelli]$ pos-14.cic.unb.br_mpdman_4_s (recv_dict_msg 377):recv_dict_msg: errmsg=::<br> mpdtb:<br> /home/nardelli/mpich2-install/bin/mpdlib.py, 377, recv_dict_msg
<br> /home/nardelli/mpich2-install/bin/mpdman.py, 464, handle_lhs_input<br> /home/nardelli/mpich2-install/bin/mpdlib.py, 488, handle_active_streams<br> /home/nardelli/mpich2-install/bin/mpdman.py, 413, run
<br> /home/nardelli/mpich2-install/bin/mpd, 1284, launch_mpdman_via_fork<br> /home/nardelli/mpich2-install/bin/mpd, 1205, run_one_cli
<br> /home/nardelli/mpich2-install/bin/mpd, 1061, do_mpdrun<br> /home/nardelli/mpich2-install/bin/mpd, 755, handle_lhs_input<br> /home/nardelli/mpich2-install/bin/mpdlib.py, 488, handle_active_streams
<br> /home/nardelli/mpich2-install/bin/mpd, 266, runmainloop<br> /home/nardelli/mpich2-install/bin/mpd, 240, run<br> /home/nardelli/mpich2-install/bin/mpd, 1344, ?
<br> mpd_cli_app=/home/nardelli/SW_Teste/SlaveMain<br> fau.laico.cic.unb.br_mpdman_7_s: mpd_uncaught_except_tb handling:<br>
exceptions.AttributeError: 'int' object has no attribute 'send_dict_msg'<br> /home/nardelli/mpich2-install/bin/mpdman.py 564 handle_lhs_input<br>
self.ring.rhsSock.send_dict_msg(msg)<br> /home/nardelli/mpich2-install/bin/mpdlib.py 488 handle_active_streams<br> handler(stream,*args)
<br> /home/nardelli/mpich2-install/bin/mpdman.py 413 run<br> rv = self.streamHandler.handle_active_streams(timeout=5.0)<br> /home/nardelli/mpich2-install/bin/mpd 1284 launch_mpdman_via_fork
<br> mpdman.run()<br> /home/nardelli/mpich2-install/bin/mpd 1205 run_one_cli<br> (manPid,toManSock) =
self.launch_mpdman_via_fork(msg,man_env)<br> /home/nardelli/mpich2-install/bin/mpd 1061 do_mpdrun<br> self.run_one_cli(rank,msg)<br> /home/nardelli/mpich2-install/bin/mpd 755 handle_lhs_input
<br> self.do_mpdrun(msg)<br> /home/nardelli/mpich2-install/bin/mpdlib.py 488 handle_active_streams<br> handler(stream,*args)
<br> /home/nardelli/mpich2-install/bin/mpd 266 runmainloop<br> rv = self.streamHandler.handle_active_streams(timeout=8.0)<br> /home/nardelli/mpich2-install/bin/mpd 240 run
<br> self.runmainloop()<br> /home/nardelli/mpich2-install/bin/mpd 1344 ?<br> mpd.run()<br> mpd_cli_app=/home/nardelli/SW_Teste/SlaveMain
<br>----------------------------------------<br><br>Does anyone know what this means? I searched Google for an answer, but I'm really lost here. The error has not appeared again (at least not so far). Maybe the problem occurred during an MPI_Recv call. Any ideas about the error would be appreciated.
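<br><br>For what it's worth, the AttributeError in the second traceback says that self.ring.rhsSock held a plain integer when send_dict_msg was called on it. Below is a minimal Python sketch of how that kind of error can arise, assuming (and this is only a guess) that the socket attribute is set to an integer placeholder while the right-hand-side connection is absent or closed. The Ring and FakeSock classes and the forward function are hypothetical and are not mpd's actual code:

```python
class FakeSock:
    """Hypothetical stand-in for a connected socket wrapper."""

    def __init__(self):
        self.sent = []

    def send_dict_msg(self, msg):
        # Record the message instead of writing to a real socket.
        self.sent.append(msg)


class Ring:
    """Hypothetical ring node whose rhsSock is an int until connected."""

    def __init__(self):
        # Assumption: an integer sentinel meaning "no connection".
        self.rhsSock = 0

    def connect(self):
        self.rhsSock = FakeSock()


def forward(ring, msg):
    """Send msg to the right-hand neighbour; return the error text on failure."""
    try:
        ring.rhsSock.send_dict_msg(msg)
        return None
    except AttributeError as exc:
        return str(exc)


ring = Ring()
error_before_connect = forward(ring, {"cmd": "ping"})  # rhsSock is still an int
ring.connect()
error_after_connect = forward(ring, {"cmd": "ping"})   # rhsSock is a socket object
print(error_before_connect)
print(error_after_connect)
```

If something like this is what happened, the message would have arrived at the manager on fau while its right-hand neighbour was not connected, which might also explain why it shows up only occasionally.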
<br><br>Thanks,<br>Marcelo Nardelli<br>