[mpich-discuss] envall genv not working?

Pavan Balaji balaji at mcs.anl.gov
Tue Dec 27 01:04:38 CST 2011


Brian,

The -genv and related options pass environment variables only to the 
application, not to hydra_pmi_proxy.  However, the build should 
automatically set the rpath, so shared libraries should get loaded on 
the remote nodes as well.  In your setup, are the shared libraries 
installed at a different path on some of the machines?
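
One way to check whether the rpath is doing its job on a given node is to 
ask the loader directly. This is only a rough sketch: it assumes a Linux 
node with ldd and readelf (binutils) installed, and missing_libs and 
show_rpath are helper names made up for this example, not part of MPICH. 
It inspects /bin/sh as a stand-in; substitute the path to hydra_pmi_proxy 
when running it on a compute node.

```shell
#!/bin/sh
# List shared libraries that the dynamic loader cannot resolve for a binary.
# Prints nothing when every dependency is found.
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Print the RPATH/RUNPATH entries recorded in a binary, if any.
show_rpath() {
    readelf -d "$1" 2>/dev/null | grep -E 'RPATH|RUNPATH'
}

# Stand-in target; replace with the proxy binary on the node being checked.
missing_libs /bin/sh
show_rpath /bin/sh || echo "no rpath recorded"
```

If missing_libs prints a library name such as libimf.so, the loader on 
that node cannot find it through the rpath or its default search path.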

It would have been good if the LD_LIBRARY_PATH variable were passed to 
the proxy as well, but unfortunately, I can't think of a clean and 
portable way to do this correctly.  Internally, mpiexec launches:

% ssh <hostname>  hydra_pmi_proxy --some_parameters

In this case, if ssh doesn't pass environment variables correctly, the 
proxy will not see the shared library path.  There are some ways to pass 
this along described here: 
http://superuser.com/questions/163167/when-sshing-how-can-i-set-an-environment-variable-on-the-server-that-changes-f

But none of them seems portable, and none works without changes on the 
server side (sshd).  Also, I'd prefer a solution that works not just 
for ssh, but for all of the launchers that we support.
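
The effect can be reproduced locally without ssh. The sketch below is 
only an analogy: it uses env -i to start a child with an empty 
environment, standing in for a launcher that does not forward the 
caller's variables. The path /opt/example/lib and the variable name 
seen_by_child are made up for illustration.

```shell
#!/bin/sh
# Stand-in for the caller's environment (hypothetical path).
LD_LIBRARY_PATH=/opt/example/lib
export LD_LIBRARY_PATH

# env -i starts the child with an empty environment, mimicking a launcher
# that does not forward the caller's variables.
seen_by_child=$(env -i /bin/sh -c 'printf "%s" "${LD_LIBRARY_PATH:-unset}"')

echo "child sees: $seen_by_child"
```

The child reports the variable as unset even though the parent exported 
it, which is the situation the proxy is in when the launcher starts it 
with a fresh environment.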

  -- Pavan

On 12/13/2011 09:50 AM, Andrus, Brian Contractor wrote:
> Hello,
>
> I have been trying to get mpiexec to run across nodes and it kept
> failing with:
>
> /opt/mpich2/1.4.1p1/intel12/bin/hydra_pmi_proxy: error while loading
> shared libraries: libimf.so: cannot open shared object file: No such
> file or directory
>
> HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:184): assert
> (!closed) failed
>
> ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:74): unable to send SIGUSR1
> downstream
>
> HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback
> returned error status
>
> HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error
> waiting for event
>
> main (./ui/mpich/mpiexec.c:405): process manager error waiting for
> completion
>
> I traced this to the fact that the machines other than the one I am
> running mpiexec from do not have a proper LD_LIBRARY_PATH set.
>
> Shouldn’t that get passed to the ‘children’ when I do an mpiexec?
>
> We use modules and have different compilers here, so folks need to load
> the appropriate module to access the appropriate libraries. In this
> case, the application was compiled using the intel compiler, so the
> environment gets set using:
>
> module load compile/intel mpi/mpich2
>
> To get the application to run via mpich2 on multiple nodes, I have had
> to add that line to the .bashrc
>
> I have tried using the various genv and env options (genvall, envall,
> envlist, genvlist) to no avail. The only way I was able to successfully
> run was to set the LD_LIBRARY_PATH in my .bashrc.
>
> Am I missing something about the environment settings when using mpiexec
> or mpirun?
>
> Brian Andrus
>
> ITACS/Research Computing
>
> Naval Postgraduate School
>
> Monterey, California
>
> voice: 831-656-6238
>
>
>
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

