[mpich-discuss] envall genv not working?
Pavan Balaji
balaji at mcs.anl.gov
Tue Dec 27 01:04:38 CST 2011
Brian,
The -genv and related options pass environment variables only to the
application, not to hydra_pmi_proxy. However, the build should
automatically be setting the rpath, so shared libraries should get
loaded on the remote nodes as well. In your setup, are the shared
libraries not located in the same location on all machines?
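One way to check this on a given node is to look for unresolved entries in
the ldd output for the proxy. The following is only a minimal sketch,
assuming a Linux node with ldd and awk available; the helper name
missing_libs is made up for illustration and is not part of MPICH.

```shell
# Hypothetical helper: read ldd output on stdin and print the names of
# shared libraries that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage, run on a remote node (path taken from the error message quoted
# later in this message):
#   ldd /opt/mpich2/1.4.1p1/intel12/bin/hydra_pmi_proxy | missing_libs
# The embedded rpath, if any, can be inspected with:
#   readelf -d /opt/mpich2/1.4.1p1/intel12/bin/hydra_pmi_proxy | grep -i -E 'rpath|runpath'
```

If the helper prints nothing, the loader resolved every library on that
node without needing LD_LIBRARY_PATH.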
It would have been good if the LD_LIBRARY_PATH variable were passed to
the proxy as well, but unfortunately, I can't think of a clean and
portable way to do this correctly. Internally, mpiexec launches:
% ssh <hostname> hydra_pmi_proxy --some_parameters
In this case, if ssh doesn't pass environment variables along, the
proxy will not see the shared library path. Some ways to pass them
along are described here:
http://superuser.com/questions/163167/when-sshing-how-can-i-set-an-environment-variable-on-the-server-that-changes-f
But none of them seems both portable and free of changes on the
server side (sshd). Also, I'd prefer a solution that works not just
for ssh, but for all of the launchers that we support.
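To see what the proxy would inherit on a node, one can ask the launcher
to print the variable from a non-interactive shell. This is a rough
sketch, not something mpiexec does itself: the helper name
launcher_ld_path and the LAUNCHER variable are made up for illustration,
and it assumes the launcher is invoked as "<launcher> <host> <command>",
the way ssh is.

```shell
# Hypothetical helper: print the LD_LIBRARY_PATH that a command started by
# the launcher would see on host $1, or UNSET if the variable is empty or
# not set. LAUNCHER defaults to ssh; any command with the same
# "<launcher> <host> <command>" calling convention can be substituted.
LAUNCHER=${LAUNCHER:-ssh}

launcher_ld_path() {
    # Single quotes keep the expansion from happening locally, so the
    # variable is evaluated by the shell the launcher starts.
    "$LAUNCHER" "$1" 'echo "${LD_LIBRARY_PATH:-UNSET}"'
}

# Usage (node01 is a placeholder hostname):
#   launcher_ld_path node01
```

If this prints UNSET while the variable is set in the shell that runs
mpiexec, the launcher is not forwarding it, and the proxy will have to
find its libraries some other way (rpath, or a shell startup file on the
remote side).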
-- Pavan
On 12/13/2011 09:50 AM, Andrus, Brian Contractor wrote:
> Hello,
>
> I have been trying to get mpiexec to run across nodes and it kept
> failing with:
>
> /opt/mpich2/1.4.1p1/intel12/bin/hydra_pmi_proxy: error while loading
> shared libraries: libimf.so: cannot open shared object file: No such
> file or directory
>
> HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:184): assert
> (!closed) failed
>
> ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:74): unable to send SIGUSR1
> downstream
>
> HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback
> returned error status
>
> HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error
> waiting for event
>
> main (./ui/mpich/mpiexec.c:405): process manager error waiting for
> completion
>
> I traced this to the fact that the machines other than the one I am
> running mpiexec from do not have a proper LD_LIBRARY_PATH set.
>
> Shouldn’t that get passed to the ‘children’ when I do an mpiexec?
>
> We use modules and have different compilers here, so folks need to load
> the appropriate module to access the appropriate libraries. In this
> case, the application was compiled using the intel compiler, so the
> environment gets set using:
>
> module load compile/intel mpi/mpich2
>
> To get the application to run via mpich2 on multiple nodes, I have had
> to add that line to my .bashrc.
>
> I have tried using the various genv and env options (genvall, envall,
> envlist, genvlist) to no avail. The only way I was able to successfully
> run was to set the LD_LIBRARY_PATH in my .bashrc.
>
> Am I missing something about the environment settings when using mpiexec
> or mpirun?
>
> Brian Andrus
>
> ITACS/Research Computing
>
> Naval Postgraduate School
>
> Monterey, California
>
> voice: 831-656-6238
>
>
>
> _______________________________________________
> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji