[mpich-discuss] Hydra issues

Scott Atchley atchley at myri.com
Wed Aug 26 08:58:22 CDT 2009


On Aug 26, 2009, at 9:22 AM, Scott Atchley wrote:

> On Aug 26, 2009, at 7:14 AM, Scott Atchley wrote:
>
>> On Aug 25, 2009, at 9:48 PM, Pavan Balaji wrote:
>>
>>> Scott,
>>>
>>> Are you using Hydra from mpich2-1.1.1p1? Are other programs  
>>> running fine?
>>
>> Yes, this is based on 1.1.1p1. I have the same issue with 1.1.1p1's  
>> ch3:nemesis:mx.
>
> I meant to add that I have not tried any other programs yet.
>
> Scott

OK, I was not patient enough. If I let it run, it eventually starts.
The actual wall time is nearly the same as when I use proxies, but
stdout is delayed until the application completes.


On a side note, starting the proxies and then immediately calling  
mpiexec.hydra fails:

$ time mpiexec.hydra -boot-proxies -f hosts && time mpiexec.hydra -use-persistent -f hosts -n 16 $PWD/IMB-MPI1 -npmin 16

real	0m0.004s
user	0m0.000s
sys	0m0.002s
HYDU_sock_connect (141): connect error (Connection refused)
launch_helper (57): unable to connect to proxy
HYDU_sock_connect (141): connect error (Connection refused)
launch_helper (57): unable to connect to proxy
HYDU_sock_read (276): read errno (Transport endpoint is not connected)
HYD_PMCD_pmi_serv_control_cb (271): unable to read status from proxy
HYD_DMX_wait_for_event (167): callback returned error status
HYD_PMCI_wait_for_completion (479): error waiting for event
main (248): process manager error waiting for completion

real	0m0.006s
user	0m0.001s
sys	0m0.004s
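
The "Connection refused" errors would be consistent with a startup race, where the second mpiexec.hydra tries to connect before the proxies are listening. That is an assumption; the log only shows that the connect was refused. If so, one possible workaround is to poll each proxy's TCP port before launching. A minimal sketch in Python follows; wait_for_port is a hypothetical helper, not part of Hydra, and the port in the usage comment is a placeholder rather than Hydra's actual proxy port:

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.1):
    """Return True once a TCP connect to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (for example
            # ConnectionRefusedError) while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Hypothetical usage: PROXY_PORT is a placeholder, and the hosts file is
# assumed to hold one host name per line.
#
#   PROXY_PORT = 9899
#   with open("hosts") as f:
#       ready = all(wait_for_port(h.strip(), PROXY_PORT) for h in f if h.strip())
```

Something like this could run between the -boot-proxies step and the -use-persistent launch, so the launch only starts once every proxy accepts connections.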

Scott

