[MPICH] Strange problem when running mpich2 application

Marcelo Nardelli marcelo.nardelli at gmail.com
Sun Oct 8 17:35:19 CDT 2006


Hi,

I have an application that uses MPICH2 and creates processes dynamically
through MPI_Comm_spawn. The application may use up to 10 machines (3 of
them are Pentium 4 and the others are AMD Athlon). The machine names
are:

pos-04.cic.unb.br
carbona.laico.cic.unb.br
magicien.laico.cic.unb.br
fau.laico.cic.unb.br
pos-14.cic.unb.br
pos-10.cic.unb.br
pos-09.cic.unb.br
pos-08.cic.unb.br
pos-06.cic.unb.br
pos-03.cic.unb.br

Usually everything runs fine. However, during one recent run of the
application the following message appeared on the console (copied as it
was written to the console):

----------------------------------------
[nardelli at carbona nardelli]$ pos-14.cic.unb.br_mpdman_4_s (recv_dict_msg 377):recv_dict_msg: errmsg=::
    mpdtb:
    /home/nardelli/mpich2-install/bin/mpdlib.py,  377,  recv_dict_msg
    /home/nardelli/mpich2-install/bin/mpdman.py,  464,  handle_lhs_input
    /home/nardelli/mpich2-install/bin/mpdlib.py,  488,  handle_active_streams
    /home/nardelli/mpich2-install/bin/mpdman.py,  413,  run
    /home/nardelli/mpich2-install/bin/mpd,  1284,  launch_mpdman_via_fork
    /home/nardelli/mpich2-install/bin/mpd,  1205,  run_one_cli
    /home/nardelli/mpich2-install/bin/mpd,  1061,  do_mpdrun
    /home/nardelli/mpich2-install/bin/mpd,  755,  handle_lhs_input
    /home/nardelli/mpich2-install/bin/mpdlib.py,  488,  handle_active_streams
    /home/nardelli/mpich2-install/bin/mpd,  266,  runmainloop
    /home/nardelli/mpich2-install/bin/mpd,  240,  run
    /home/nardelli/mpich2-install/bin/mpd,  1344,  ?
    mpd_cli_app=/home/nardelli/SW_Teste/SlaveMain

fau.laico.cic.unb.br_mpdman_7_s: mpd_uncaught_except_tb handling:
  exceptions.AttributeError: 'int' object has no attribute 'send_dict_msg'
    /home/nardelli/mpich2-install/bin/mpdman.py  564  handle_lhs_input
        self.ring.rhsSock.send_dict_msg(msg)
    /home/nardelli/mpich2-install/bin/mpdlib.py  488  handle_active_streams
        handler(stream,*args)
    /home/nardelli/mpich2-install/bin/mpdman.py  413  run
        rv = self.streamHandler.handle_active_streams(timeout=5.0)
    /home/nardelli/mpich2-install/bin/mpd  1284  launch_mpdman_via_fork
        mpdman.run()
    /home/nardelli/mpich2-install/bin/mpd  1205  run_one_cli
        (manPid,toManSock) = self.launch_mpdman_via_fork(msg,man_env)
    /home/nardelli/mpich2-install/bin/mpd  1061  do_mpdrun
        self.run_one_cli(rank,msg)
    /home/nardelli/mpich2-install/bin/mpd  755  handle_lhs_input
        self.do_mpdrun(msg)
    /home/nardelli/mpich2-install/bin/mpdlib.py  488  handle_active_streams
        handler(stream,*args)
    /home/nardelli/mpich2-install/bin/mpd  266  runmainloop
        rv = self.streamHandler.handle_active_streams(timeout=8.0)
    /home/nardelli/mpich2-install/bin/mpd  240  run
        self.runmainloop()
    /home/nardelli/mpich2-install/bin/mpd  1344  ?
        mpd.run()
    mpd_cli_app=/home/nardelli/SW_Teste/SlaveMain
----------------------------------------
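
As far as I can tell, the AttributeError in the second traceback means that
self.ring.rhsSock was a plain integer, not a socket object, at the moment
send_dict_msg was called on it. Below is a minimal Python sketch of that
failure mode only. It is not mpd's actual code: the names FakeSock, FakeRing
and forward, and the -1 placeholder value, are hypothetical and chosen just
for illustration.

```python
class FakeSock:
    """Hypothetical stand-in for a socket wrapper that can send dict messages."""

    def __init__(self):
        self.sent = []

    def send_dict_msg(self, msg):
        # Record the message instead of writing it to a real socket.
        self.sent.append(msg)


class FakeRing:
    """Hypothetical stand-in for the ring object seen in the traceback."""

    def __init__(self):
        # Assumption: an integer placeholder is stored until a neighbour connects.
        self.rhsSock = -1


def forward(ring, msg):
    """Try to forward msg to the right-hand neighbour; return the outcome as text."""
    try:
        ring.rhsSock.send_dict_msg(msg)
        return "sent"
    except AttributeError as exc:
        return str(exc)


ring = FakeRing()
before = forward(ring, {"cmd": "hello"})   # rhsSock is still an int here
ring.rhsSock = FakeSock()
after = forward(ring, {"cmd": "hello"})    # rhsSock is now a real object
print(before)
print(after)
```

If this reading is right, the message was forwarded while the right-hand
side connection of the ring was not set up (or had already been torn down),
but I cannot confirm that from the output alone.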

Does anyone know what this is? I tried to find an answer on Google, but
I'm really lost here. The error message has not appeared again (at least
not so far). Maybe the problem happened during an MPI_Recv call. Any
ideas about the error would be appreciated.

Thanks,
Marcelo Nardelli