[mpich-discuss] good system gone bad..
Darius Buntinas
buntinas at mcs.anl.gov
Tue Dec 7 10:42:20 CST 2010
Glad to hear it.
-d
On Dec 7, 2010, at 5:22 AM, SULLIVAN David (AREVA) wrote:
> Darius,
>
> Thank you again. I looked at SOME of the NFS mounts before I posted, but
> obviously not enough of them. One of the nodes did not properly mount
> /global. Embarrassingly simple, but I am so happy that it's up and
> running that I will deal with it.
>
> Dave
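
For anyone hitting the same symptom, a minimal way to confirm the mount on
every node is a loop like the following; this is only a sketch, and the
hostfile name "hosts" (one node name per line) is an assumption, not part of
the original thread:

  # Check whether /global is actually mounted on each node.
  # "hosts" is an assumed file listing one node name per line.
  for h in $(cat hosts); do
      echo -n "$h: "
      ssh "$h" "mountpoint -q /global && echo '/global mounted' || echo '/global NOT mounted'"
  done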
>
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius Buntinas
> Sent: Monday, December 06, 2010 9:54 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] good system gone bad..
>
>
> I wonder if something got messed up in the restart. Can you try ssh'ing
> into each node like this:
> ssh a_node ls -l /global/mpich2-1.3/bin/hydra_pmi_proxy
>
> -d
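
A quick way to run that same check against every node at once is a small
loop; a sketch only, assuming a file named "hosts" with one node name per
line:

  # Run the suggested ls check on every node listed in "hosts"
  # (the hostfile name is an assumption for illustration).
  while read -r node; do
      echo "=== $node ==="
      ssh "$node" ls -l /global/mpich2-1.3/bin/hydra_pmi_proxy
  done < hosts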
>
> On Dec 6, 2010, at 6:30 PM, SULLIVAN David (AREVA) wrote:
>
>> Yes, it is. The installation was built in an NFS-mounted folder. As I
>> said, this worked in the morning. After a restart, it fails to run.
>>
>> Dave
>>
>>
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov on behalf of Darius Buntinas
>> Sent: Mon 12/6/2010 5:53 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] good system gone bad..
>>
>> Check that the hydra proxy is installed (and accessible) on every node.
>>
>> -d
>>
>> On Dec 6, 2010, at 4:27 PM, SULLIVAN David (AREVA) wrote:
>>
>>> All,
>>>
>>> I am trying to troubleshoot a problem with my MPICH2 install. I
>>> compiled version 1.3 with gcc and Intel Fortran. When I try to execute
>>> anything I am greeted with the following:
>>>
>>> bash: /global/mpich2-1.3/bin/hydra_pmi_proxy: No such file or directory
>>>
>>> It is lying, though:
>>>
>>> total 2916
>>> -rwxr-xr-x 1 root root   1656 Dec  6 16:54 bt2line
>>> -rwxr-xr-x 1 root root  10008 Dec  6 16:54 check_callstack
>>> -rwxr-xr-x 1 root root  56622 Dec  6 16:54 clog2_join
>>> -rwxr-xr-x 1 root root   1970 Dec  6 16:54 clog2print
>>> -rwxr-xr-x 1 root root  53798 Dec  6 16:54 clog2_print
>>> -rwxr-xr-x 1 root root  53799 Dec  6 16:54 clog2_repair
>>> -rwxr-xr-x 1 root root   1959 Dec  6 16:54 clog2TOslog2
>>> -rwxr-xr-x 1 root root   1960 Dec  6 16:54 clogprint
>>> -rwxr-xr-x 1 root root   1956 Dec  6 16:54 clogTOslog2
>>> -rwxr-xr-x 1 root root  84887 Dec  6 16:54 hwloc-bind
>>> -rwxr-xr-x 1 root root  83889 Dec  6 16:54 hwloc-calc
>>> -rwxr-xr-x 1 root root  79196 Dec  6 16:54 hwloc-distrib
>>> lrwxrwxrwx 1 root root      6 Dec  6 16:54 hwloc-info -> lstopo
>>> lrwxrwxrwx 1 root root      6 Dec  6 16:54 hwloc-ls -> lstopo
>>> lrwxrwxrwx 1 root root     10 Dec  6 16:54 hwloc-mask -> hwloc-calc
>>> -rwxr-xr-x 1 root root 422199 Dec  6 16:54 hydra_nameserver
>>> -rwxr-xr-x 1 root root 420869 Dec  6 16:54 hydra_persist
>>> -rwxr-xr-x 1 root root 586873 Dec  6 16:54 hydra_pmi_proxy
>>> -rwxr-xr-x 1 root root   1946 Dec  6 16:54 jumpshot
>>> -rwxr-xr-x 1 root root   1944 Dec  6 16:54 logconvertor
>>> -rwxr-xr-x 1 root root 136310 Dec  6 16:54 lstopo
>>> lrwxrwxrwx 1 root root      6 Dec  6 16:54 mpic++ -> mpicxx
>>> -rwxr-xr-x 1 root root   8974 Dec  6 16:54 mpicc
>>> -rwxr-xr-x 1 root root   8874 Dec  6 16:54 mpich2version
>>> -rwxr-xr-x 1 root root   8659 Dec  6 16:54 mpicxx
>>> lrwxrwxrwx 1 root root     13 Dec  6 16:54 mpiexec -> mpiexec.hydra
>>> -rwxr-xr-x 1 root root 748519 Dec  6 16:54 mpiexec.hydra
>>> -rwxr-xr-x 1 root root  10640 Dec  6 16:54 mpif77
>>> -rwxr-xr-x 1 root root  12615 Dec  6 16:54 mpif90
>>> lrwxrwxrwx 1 root root     13 Dec  6 16:54 mpirun -> mpiexec.hydra
>>> -rwxr-xr-x 1 root root   3430 Dec  6 16:54 parkill
>>> -rwxr-xr-x 1 root root  20427 Dec  6 16:54 plpa-info
>>> -rwxr-xr-x 1 root root  40775 Dec  6 16:54 plpa-taskset
>>> -rwxr-xr-x 1 root root   1965 Dec  6 16:54 slog2filter
>>> -rwxr-xr-x 1 root root   1983 Dec  6 16:54 slog2navigator
>>> -rwxr-xr-x 1 root root   2015 Dec  6 16:54 slog2print
>>> -rwxr-xr-x 1 root root   1974 Dec  6 16:54 slog2updater
>>> Clearly it IS there, and it is accessible:
>>>
>>> [dsullivan at athos bin]$ which hydra_pmi_proxy
>>> /global/mpich2-1.3/bin/hydra_pmi_proxy
>>> [dsullivan at athos bin]$
>>> This all worked this morning, and little has changed since (a restart,
>>> maybe). Any direction would be greatly appreciated.
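
One way to see which node such an error comes from is to run Hydra's mpiexec
with -verbose against a trivial program; a sketch only, where the hostfile
name "hosts" and the process count are assumptions for illustration:

  # -verbose makes Hydra print the per-node launch (ssh) commands, which
  # helps identify the node whose shell reports "No such file or directory".
  # "hosts" is an assumed hostfile; "hostname" is just a trivial test program.
  mpiexec -verbose -f hosts -n 4 hostname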
>>>
>>>
>>> Regards,
>>>
>>> David Sullivan
>>>
>>>
>>>
>>> AREVA NP INC
>>> 400 Donald Lynch Boulevard
>>> Marlborough, MA, 01752
>>> Phone: (508) 573-6721
>>> Fax: (434) 382-5597
>>> David.Sullivan at AREVA.com
>>>
>>> The information in this e-mail is AREVA property and is intended
>>> solely for the addressees. Reproduction and distribution are prohibited.
>>> Thank you.