[mpich-discuss] good system gone bad..

Darius Buntinas buntinas at mcs.anl.gov
Tue Dec 7 10:42:20 CST 2010


Glad to hear it.

-d
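
Once every node mounts /global again, one quick end-to-end check is to let
Hydra start a trivial job everywhere. A minimal sketch, assuming a hostfile
named "hosts" with one node name per line (the filename and process count
are placeholders):

  # Each rank just prints its host name; if this runs, hydra_pmi_proxy was
  # found and started on every node drawn from the hostfile.
  /global/mpich2-1.3/bin/mpiexec -f hosts -n 4 hostname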

On Dec 7, 2010, at 5:22 AM, SULLIVAN David (AREVA) wrote:

> Darius,
> 
> Thank you again. I looked at SOME of the NFS mounts before I posted, but
> obviously not enough of them. One of the nodes did not properly mount
> /global. Embarrassingly simple, but I am so happy that it's up and
> running I will deal with it.
> 
> Dave
> 
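A quick way to catch a missing mount like this across the whole cluster,
sketched here against a hypothetical "hosts" file listing one node per line:

  # Show each node's view of /global; a node whose NFS mount is missing
  # will report a local filesystem (or an error) instead of the NFS export.
  for node in $(cat hosts); do
    echo "== $node =="
    ssh $node df -h /global
  done
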
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius Buntinas
> Sent: Monday, December 06, 2010 9:54 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] good system gone bad..
> 
> 
> I wonder if something got messed up in the restart.  Can you try ssh'ing
> into each node like this:
>  ssh a_node ls -l /global/mpich2-1.3/bin/hydra_pmi_proxy
> 
> -d
> 
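The same per-node check can be run over a whole hostfile in one pass; a
sketch assuming a "hosts" file with one node name per line:

  # Any node that answers "No such file or directory" here is the one
  # that cannot see the NFS-mounted MPICH2 install.
  for node in $(cat hosts); do
    ssh $node ls -l /global/mpich2-1.3/bin/hydra_pmi_proxy
  done
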
> On Dec 6, 2010, at 6:30 PM, SULLIVAN David (AREVA) wrote:
> 
>> Yes, it is. The installation was built in an NFS-mounted folder. As I
>> said, this worked in the morning. After a restart it fails to run.
>> 
>> Dave
>> 
>> 
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov on behalf of Darius Buntinas
>> Sent: Mon 12/6/2010 5:53 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] good system gone bad..
>> 
>> Check that the hydra proxy is installed (and accessible) on every node.
>> 
>> -d
>> 
>> On Dec 6, 2010, at 4:27 PM, SULLIVAN David (AREVA) wrote:
>> 
>>> All,
>>> 
>>> I am trying to troubleshoot a problem with my MPICH2 install. I
>>> compiled version 1.3 with gcc and Intel Fortran. When I try to execute
>>> anything I am greeted with the following:
>>> 
>>> bash: /global/mpich2-1.3/bin/hydra_pmi_proxy: No such file or directory
>>> 
>>> It is lying though:
>>> 
>>> total 2916
>>> -rwxr-xr-x 1 root root   1656 Dec  6 16:54 bt2line
>>> -rwxr-xr-x 1 root root  10008 Dec  6 16:54 check_callstack
>>> -rwxr-xr-x 1 root root  56622 Dec  6 16:54 clog2_join
>>> -rwxr-xr-x 1 root root   1970 Dec  6 16:54 clog2print
>>> -rwxr-xr-x 1 root root  53798 Dec  6 16:54 clog2_print
>>> -rwxr-xr-x 1 root root  53799 Dec  6 16:54 clog2_repair
>>> -rwxr-xr-x 1 root root   1959 Dec  6 16:54 clog2TOslog2
>>> -rwxr-xr-x 1 root root   1960 Dec  6 16:54 clogprint
>>> -rwxr-xr-x 1 root root   1956 Dec  6 16:54 clogTOslog2
>>> -rwxr-xr-x 1 root root  84887 Dec  6 16:54 hwloc-bind
>>> -rwxr-xr-x 1 root root  83889 Dec  6 16:54 hwloc-calc
>>> -rwxr-xr-x 1 root root  79196 Dec  6 16:54 hwloc-distrib
>>> lrwxrwxrwx 1 root root      6 Dec  6 16:54 hwloc-info -> lstopo
>>> lrwxrwxrwx 1 root root      6 Dec  6 16:54 hwloc-ls -> lstopo
>>> lrwxrwxrwx 1 root root     10 Dec  6 16:54 hwloc-mask -> hwloc-calc
>>> -rwxr-xr-x 1 root root 422199 Dec  6 16:54 hydra_nameserver
>>> -rwxr-xr-x 1 root root 420869 Dec  6 16:54 hydra_persist
>>> -rwxr-xr-x 1 root root 586873 Dec  6 16:54 hydra_pmi_proxy
>>> -rwxr-xr-x 1 root root   1946 Dec  6 16:54 jumpshot
>>> -rwxr-xr-x 1 root root   1944 Dec  6 16:54 logconvertor
>>> -rwxr-xr-x 1 root root 136310 Dec  6 16:54 lstopo
>>> lrwxrwxrwx 1 root root      6 Dec  6 16:54 mpic++ -> mpicxx
>>> -rwxr-xr-x 1 root root   8974 Dec  6 16:54 mpicc
>>> -rwxr-xr-x 1 root root   8874 Dec  6 16:54 mpich2version
>>> -rwxr-xr-x 1 root root   8659 Dec  6 16:54 mpicxx
>>> lrwxrwxrwx 1 root root     13 Dec  6 16:54 mpiexec -> mpiexec.hydra
>>> -rwxr-xr-x 1 root root 748519 Dec  6 16:54 mpiexec.hydra
>>> -rwxr-xr-x 1 root root  10640 Dec  6 16:54 mpif77
>>> -rwxr-xr-x 1 root root  12615 Dec  6 16:54 mpif90
>>> lrwxrwxrwx 1 root root     13 Dec  6 16:54 mpirun -> mpiexec.hydra
>>> -rwxr-xr-x 1 root root   3430 Dec  6 16:54 parkill
>>> -rwxr-xr-x 1 root root  20427 Dec  6 16:54 plpa-info
>>> -rwxr-xr-x 1 root root  40775 Dec  6 16:54 plpa-taskset
>>> -rwxr-xr-x 1 root root   1965 Dec  6 16:54 slog2filter
>>> -rwxr-xr-x 1 root root   1983 Dec  6 16:54 slog2navigator
>>> -rwxr-xr-x 1 root root   2015 Dec  6 16:54 slog2print
>>> -rwxr-xr-x 1 root root   1974 Dec  6 16:54 slog2updater
>>> 
>>> Clearly it IS there, and it is accessible:
>>> 
>>> [dsullivan at athos bin]$ which hydra_pmi_proxy 
>>> /global/mpich2-1.3/bin/hydra_pmi_proxy
>>> [dsullivan at athos bin]$
>>> This all worked this morning and little has changed since (a restart
>>> maybe). Any direction would be greatly appreciated.
>>> 
>>> 
>>> Regards,
>>> 
>>> David Sullivan
>>> 
>>> 
>>> 
>>> AREVA NP INC
>>> 400 Donald Lynch Boulevard
>>> Marlborough, MA, 01752
>>> Phone: (508) 573-6721
>>> Fax: (434) 382-5597
>>> David.Sullivan at AREVA.com
>>> 
>>> The information in this e-mail is AREVA property and is intended
>>> solely for the addressees. Reproduction and distribution are prohibited.
>>> Thank you.
>>> 
>> 
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


