[mpich-discuss] good system gone bad..

SULLIVAN David (AREVA) David.Sullivan at areva.com
Tue Dec 7 07:22:54 CST 2010


Darius,

Thank you again. I looked at SOME of the NFS mounts before I posted, but
obviously not enough of them. One of the nodes did not properly mount
/global. Embarrassingly simple, but I am so happy that it's up and
running that I will deal with it.
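
For anyone who hits the same thing: a quick loop like the one below would
have flagged the missing mount right away. This is just a sketch -- the
hosts file name (~/hosts, one node name per line) is a placeholder for
whatever your cluster uses:

  for node in $(cat ~/hosts); do
    printf '%s: ' "$node"
    ssh "$node" "mountpoint -q /global && echo mounted || echo '/global NOT mounted'"
  done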

Dave

-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius Buntinas
Sent: Monday, December 06, 2010 9:54 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] good system gone bad..


I wonder if something got messed up in the restart.  Can you try ssh'ing
into each node like this:
  ssh a_node ls -l /global/mpich2-1.3/bin/hydra_pmi_proxy
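
Or, to check every node in one pass, something along these lines should
work (assuming a plain hosts file with one node name per line -- the file
name here is just a placeholder):

  for node in $(cat hosts); do
    echo "== $node =="
    ssh "$node" ls -l /global/mpich2-1.3/bin/hydra_pmi_proxy
  done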

-d

On Dec 6, 2010, at 6:30 PM, SULLIVAN David (AREVA) wrote:

> Yes, it is. The installation was built in an NFS-mounted folder. As I
> said, this worked in the morning. After a restart it fails to run.
> 
> Dave
> 
> 
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov on behalf of Darius Buntinas
> Sent: Mon 12/6/2010 5:53 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] good system gone bad..
> 
> Check that the hydra proxy is installed (and accessible) on every node.
> 
> -d
> 
> On Dec 6, 2010, at 4:27 PM, SULLIVAN David (AREVA) wrote:
> 
>> All,
>> 
>> I am trying to troubleshoot a problem with my MPICH2 install. I
>> compiled version 1.3 with gcc and Intel Fortran. When I try to execute
>> anything I am greeted with the following:
>> 
>> bash: /global/mpich2-1.3/bin/hydra_pmi_proxy: No such file or directory
>> 
>> It is lying, though:
>> 
>> total 2916
>> -rwxr-xr-x 1 root root   1656 Dec  6 16:54 bt2line
>> -rwxr-xr-x 1 root root  10008 Dec  6 16:54 check_callstack
>> -rwxr-xr-x 1 root root  56622 Dec  6 16:54 clog2_join
>> -rwxr-xr-x 1 root root   1970 Dec  6 16:54 clog2print
>> -rwxr-xr-x 1 root root  53798 Dec  6 16:54 clog2_print
>> -rwxr-xr-x 1 root root  53799 Dec  6 16:54 clog2_repair
>> -rwxr-xr-x 1 root root   1959 Dec  6 16:54 clog2TOslog2
>> -rwxr-xr-x 1 root root   1960 Dec  6 16:54 clogprint
>> -rwxr-xr-x 1 root root   1956 Dec  6 16:54 clogTOslog2
>> -rwxr-xr-x 1 root root  84887 Dec  6 16:54 hwloc-bind
>> -rwxr-xr-x 1 root root  83889 Dec  6 16:54 hwloc-calc
>> -rwxr-xr-x 1 root root  79196 Dec  6 16:54 hwloc-distrib
>> lrwxrwxrwx 1 root root      6 Dec  6 16:54 hwloc-info -> lstopo
>> lrwxrwxrwx 1 root root      6 Dec  6 16:54 hwloc-ls -> lstopo
>> lrwxrwxrwx 1 root root     10 Dec  6 16:54 hwloc-mask -> hwloc-calc
>> -rwxr-xr-x 1 root root 422199 Dec  6 16:54 hydra_nameserver
>> -rwxr-xr-x 1 root root 420869 Dec  6 16:54 hydra_persist
>> -rwxr-xr-x 1 root root 586873 Dec  6 16:54 hydra_pmi_proxy
>> -rwxr-xr-x 1 root root   1946 Dec  6 16:54 jumpshot
>> -rwxr-xr-x 1 root root   1944 Dec  6 16:54 logconvertor
>> -rwxr-xr-x 1 root root 136310 Dec  6 16:54 lstopo
>> lrwxrwxrwx 1 root root      6 Dec  6 16:54 mpic++ -> mpicxx
>> -rwxr-xr-x 1 root root   8974 Dec  6 16:54 mpicc
>> -rwxr-xr-x 1 root root   8874 Dec  6 16:54 mpich2version
>> -rwxr-xr-x 1 root root   8659 Dec  6 16:54 mpicxx
>> lrwxrwxrwx 1 root root     13 Dec  6 16:54 mpiexec -> mpiexec.hydra
>> -rwxr-xr-x 1 root root 748519 Dec  6 16:54 mpiexec.hydra
>> -rwxr-xr-x 1 root root  10640 Dec  6 16:54 mpif77
>> -rwxr-xr-x 1 root root  12615 Dec  6 16:54 mpif90
>> lrwxrwxrwx 1 root root     13 Dec  6 16:54 mpirun -> mpiexec.hydra
>> -rwxr-xr-x 1 root root   3430 Dec  6 16:54 parkill
>> -rwxr-xr-x 1 root root  20427 Dec  6 16:54 plpa-info
>> -rwxr-xr-x 1 root root  40775 Dec  6 16:54 plpa-taskset
>> -rwxr-xr-x 1 root root   1965 Dec  6 16:54 slog2filter
>> -rwxr-xr-x 1 root root   1983 Dec  6 16:54 slog2navigator
>> -rwxr-xr-x 1 root root   2015 Dec  6 16:54 slog2print
>> -rwxr-xr-x 1 root root   1974 Dec  6 16:54 slog2updater
>> Clearly it IS there, and it is accessible:
>> 
>> [dsullivan at athos bin]$ which hydra_pmi_proxy 
>> /global/mpich2-1.3/bin/hydra_pmi_proxy
>> [dsullivan at athos bin]$
>> This all worked this morning and little has changed since (a restart
>> maybe). Any direction would be greatly appreciated.
>> 
>> 
>> Regards,
>> 
>> David Sullivan
>> 
>> 
>> 
>> AREVA NP INC
>> 400 Donald Lynch Boulevard
>> Marlborough, MA, 01752
>> Phone: (508) 573-6721
>> Fax: (434) 382-5597
>> David.Sullivan at AREVA.com
>> 
>> The information in this e-mail is AREVA property and is intended
>> solely for the addressees. Reproduction and distribution are prohibited.
>> Thank you.
>> 
> 

_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

