[mpich-discuss] Problems running mpiexec on lsf

Pavan Balaji balaji at mcs.anl.gov
Tue May 8 16:53:32 CDT 2012


On 05/08/2012 12:24 PM, Farkash, Noam wrote:
> I’m having problems running mpiexec on an LSF system. (The version I’m
> using is mpich2-1.4.1.)
>
> The error I get is:
>
> "/tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy": No such file
> or directory
>
> I checked that the machines (which LSF is assigning) actually do see
> this file (I manually logged into those machines and ran ls & stat on
> that file).

Did you log into *all* the nodes and check?
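Rather than logging in by hand, a loop over the allocation can check them all. A minimal sketch, assuming LSF exports the assigned hosts in LSB_HOSTS inside a job (outside a job it falls back to the local host so the loop still runs), and assuming passwordless ssh to the remote hosts:

```shell
# Check that the Hydra proxy binary is visible and executable on every host
# in the LSF allocation. LSB_HOSTS is set by LSF inside a job; the
# $(hostname) fallback is only so the loop does something outside a job.
PROXY=/tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy
for host in ${LSB_HOSTS:-$(hostname)}; do
    if [ "$host" = "$(hostname)" ]; then
        # Local host: check directly.
        test -x "$PROXY" && echo "$host: OK" || echo "$host: MISSING"
    else
        # Remote host: check over ssh (blaunch would work here too,
        # which is what Hydra itself uses under --launcher lsf).
        ssh "$host" "test -x '$PROXY'" \
            && echo "$host: OK" || echo "$host: MISSING"
    fi
done
```

Any host that reports MISSING is one where the install tree is not mounted or not at the same path.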

>  >bsub -n 11 -R 'select[(type==RHEL4_64) && cpuf>45 &&
> (mem>10000||gb32||gb64)] rusage[mem=10000]' -P bd-ckt -q normal -o junk
> /tool/cbar/apps/in/10.1-BETA.3/arch/lib/mpiexec -n 11 list_file

There seem to be 10 cores on each node, so the first 10 processes are 
run on the first node and the 11th process on the second node.
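Hydra's default placement fills the slots on one node before spilling to the next; the mapping for this job can be sketched as follows (a sketch, assuming 10 slots per node as this allocation suggests):

```shell
# Block placement sketch: with NP ranks and SLOTS slots per node, ranks
# fill node 1 completely before any rank lands on node 2.
NP=11
SLOTS=10
for rank in $(seq 0 $((NP - 1))); do
    node=$((rank / SLOTS + 1))
    echo "rank $rank -> node $node"
done
```

With 11 ranks, ranks 0–9 land on node 1 and rank 10 on node 2, which matches the 10-vs-1 split in the proxy output.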

> [mpiexec at svbr0124] Launch arguments: /tool/lsf7/bin/blaunch -n ca2h2388
> "/tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy" --control-port
> svbr0124:50789 --debug --rmk lsf --launcher lsf --demux poll --pgid 0
> --retries 10 --proxy-id 1
>
> "/tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy": No such file
> or directory
> /tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy
> /tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy
> /tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy
> /tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy
> /tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy
> /tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy
> /tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy
> /tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy
> /tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy
> /tool/cbar/apps/in/10.1-BETA.3/arch/lib/hydra_pmi_proxy

That's 10 processes printing that they see hydra_pmi_proxy and 1 process 
printing that it doesn't.  That means that the first node has 
hydra_pmi_proxy and the second node doesn't.

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji
