[MPICH] MPICH2 ring breaking; three times in two days
Rajeev Thakur
thakur at mcs.anl.gov
Tue Jun 27 17:34:25 CDT 2006
Can you try upgrading to 1.0.3 and see if the problem goes away?
Rajeev
_____
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Galton, Simon
Sent: Tuesday, June 27, 2006 8:34 AM
To: 'mpich-discuss at mcs.anl.gov'
Subject: [MPICH] MPICH2 ring breaking; three times in two days
Help! :)
We've been seeing a problem where most of our nodes drop out of the MPICH2
ring; this has happened three times in the last two days. It's not always
the same nodes, either :(
The syslog file on our head node shows the following error:
(handle_rhs_challenge_response 1010): INVALID msg for rhs response msg=:{}: from host=xxxxx
xxxxx represents the various hosts which drop out of the ring.
Could this be a misbehaving job?
Help! :)
I'm using mpich2-1.0.1 on RHEL3.
Simon