[mpich-discuss] MPICH2 with MOSIX
Alex Margolin
alex.margolin at mail.huji.ac.il
Sun Nov 27 13:05:45 CST 2011
On 11/27/2011 07:23 PM, Pavan Balaji wrote:
> Alex,
>
> On 11/27/2011 01:21 AM, Alex Margolin wrote:
>> alex at singularity:~/huji/benchmarks/simple$ mosrun
>> ~/huji/mpich/bin/mpiexec -n 3 ./simple
>> #1 Got 0 from Fatal error in MPI_Finalize: Other MPI error, error stack:
>> MPI_Finalize(281).................: MPI_Finalize failed
>> MPI_Finalize(209).................:
>> MPID_Finalize(117)................:
>
> It looks like MOSIX is providing a bunch of nodes to MPICH2, and the
> three processes are spread out across a few nodes. Otherwise, TCP
> would not have been used at all. You can pass the -verbose flag to
> mpiexec to see what exactly is going on over there. The best bet would
> be to try to reproduce it "natively" by executing from a command-line
> over a bunch of processes.
>
Thanks for your reply.
This makes sense, but my laptop (singularity) is a standalone machine and I
didn't configure anything special - only the usual "./configure
(--prefix...); make; make install". My work is an optimization of MPI's TCP
communication: a dynamically loaded library that replaces send/recv and the
other Berkeley socket calls. So I ran with MPICH_NO_LOCAL=1, hoping this
would force TCP even when all the instances are local.
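To make that concrete, here is a minimal sketch of the kind of interposition
layer I mean; the file name, build line, and pass-through bodies are just
placeholders, and the actual optimization logic is omitted:

/* shim.c - illustrative pass-through interposition of send()/recv().
 * Build and preload (names and paths are placeholders):
 *   gcc -shared -fPIC -o libshim.so shim.c -ldl
 *   LD_PRELOAD=$PWD/libshim.so mpiexec -n 3 ./simple
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/socket.h>

static ssize_t (*real_send)(int, const void *, size_t, int);
static ssize_t (*real_recv)(int, void *, size_t, int);

ssize_t send(int fd, const void *buf, size_t len, int flags)
{
    if (!real_send)
        real_send = (ssize_t (*)(int, const void *, size_t, int))
                    dlsym(RTLD_NEXT, "send");
    /* ...optimization logic would go here... */
    return real_send(fd, buf, len, flags);
}

ssize_t recv(int fd, void *buf, size_t len, int flags)
{
    if (!real_recv)
        real_recv = (ssize_t (*)(int, void *, size_t, int))
                    dlsym(RTLD_NEXT, "recv");
    /* ...optimization logic would go here... */
    return real_recv(fd, buf, len, flags);
}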
I tried running it with mosrun both "inside" and "outside" mpiexec and got
the same result (just to be clear, I didn't run anything before this
command, so if any preparation is required, such as starting mpd
servers/agents, I didn't do it... maybe that is the reason?):
alex at singularity:~/huji/benchmarks/simple$ MPICH_NO_LOCAL=1
~/huji/mpich/bin/mpiexec -verbose -n 3 /bin/mosrun -w ./simple
host: singularity
==================================================================================================
mpiexec options:
----------------
Base path: /home/alex/huji/mpich/bin/
Launcher: (null)
Debug level: 1
Enable X: -1
Global environment:
-------------------
MPICH_NO_LOCAL=1
SSH_AGENT_PID=5465
GPG_AGENT_INFO=/tmp/keyring-it7zNW/gpg:0:1
M2=/usr/local/apache-maven-3.0.3/bin
TERM=xterm
SHELL=/bin/bash
XDG_SESSION_COOKIE=9c464babfe9bdc4b66b07aee0000000a-1322417182.663487-739938010
WINDOWID=58720262
OLDPWD=/home/alex/huji/benchmarks
GNOME_KEYRING_CONTROL=/tmp/keyring-it7zNW
GTK_MODULES=canberra-gtk-module:canberra-gtk-module
USER=alex
XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0
XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
SSH_AUTH_SOCK=/tmp/keyring-it7zNW/ssh
SESSION_MANAGER=local/singularity:@/tmp/.ICE-unix/5427,unix/singularity:/tmp/.ICE-unix/5427
USERNAME=alex
DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path
XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg
PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/default-java/bin::/usr/local/apache-maven-3.0.3/bin
MAVEN_HOME=/usr/local/apache-maven-3.0.3
DESKTOP_SESSION=ubuntu
LC_MESSAGES=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
PWD=/home/alex/huji/benchmarks/simple
JAVA_HOME=/usr/lib/jvm/default-java
GNOME_KEYRING_PID=5418
LANG=en_US.UTF-8
MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path
UBUNTU_MENUPROXY=libappmenu.so
COMPIZ_CONFIG_PROFILE=ubuntu
GDMSESSION=ubuntu
SHLVL=1
HOME=/home/alex
M2_HOME=/usr/local/apache-maven-3.0.3
LANGUAGE=en_US:en
GNOME_DESKTOP_SESSION_ID=this-is-deprecated
IBUS_ENABLE_SYNC_MODE=1
LOGNAME=alex
XDG_DATA_DIRS=/usr/share/ubuntu:/usr/share/gnome:/usr/local/share/:/usr/share/
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-steeAksbqm,guid=0a0c4494b5d2f522ea8cfa360000003d
LC_CTYPE=en_US.UTF-8
XDG_CURRENT_DESKTOP=Unity
COLORTERM=gnome-terminal
XAUTHORITY=/home/alex/.Xauthority
_=/home/alex/huji/mpich/bin/mpiexec
Hydra internal environment:
---------------------------
GFORTRAN_UNBUFFERED_PRECONNECTED=y
Proxy information:
*********************
[1] proxy: singularity (1 cores)
Exec list: /bin/mosrun (3 processes);
==================================================================================================
[mpiexec at singularity] Timeout set to -1 (-1 means infinite)
[mpiexec at singularity] Got a control port string of singularity:56971
Proxy launch args: /home/alex/huji/mpich/bin/hydra_pmi_proxy
--control-port singularity:56971 --debug --rmk user --launcher ssh
--demux poll --pgid 0 --retries 10 --proxy-id
[mpiexec at singularity] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1
Arguments being passed to proxy 0:
--version 1.4.1p1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME
--hostname singularity --global-core-map 0,1,0 --filler-process-map
0,1,0 --global-process-count 3 --auto-cleanup 1 --pmi-rank -1
--pmi-kvsname kvs_6671_0 --pmi-process-mapping (vector,(0,1,1))
--ckpoint-num -1 --global-inherited-env 46 'MPICH_NO_LOCAL=1'
'SSH_AGENT_PID=5465' 'GPG_AGENT_INFO=/tmp/keyring-it7zNW/gpg:0:1'
'M2=/usr/local/apache-maven-3.0.3/bin' 'TERM=xterm' 'SHELL=/bin/bash'
'XDG_SESSION_COOKIE=9c464babfe9bdc4b66b07aee0000000a-1322417182.663487-739938010'
'WINDOWID=58720262' 'OLDPWD=/home/alex/huji/benchmarks'
'GNOME_KEYRING_CONTROL=/tmp/keyring-it7zNW'
'GTK_MODULES=canberra-gtk-module:canberra-gtk-module' 'USER=alex'
'XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0'
'XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0'
'SSH_AUTH_SOCK=/tmp/keyring-it7zNW/ssh'
'SESSION_MANAGER=local/singularity:@/tmp/.ICE-unix/5427,unix/singularity:/tmp/.ICE-unix/5427'
'USERNAME=alex' 'DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path'
'XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg'
'PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/default-java/bin::/usr/local/apache-maven-3.0.3/bin'
'MAVEN_HOME=/usr/local/apache-maven-3.0.3' 'DESKTOP_SESSION=ubuntu'
'LC_MESSAGES=en_US.UTF-8' 'LC_COLLATE=en_US.UTF-8'
'PWD=/home/alex/huji/benchmarks/simple'
'JAVA_HOME=/usr/lib/jvm/default-java' 'GNOME_KEYRING_PID=5418'
'LANG=en_US.UTF-8'
'MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path'
'UBUNTU_MENUPROXY=libappmenu.so' 'COMPIZ_CONFIG_PROFILE=ubuntu'
'GDMSESSION=ubuntu' 'SHLVL=1' 'HOME=/home/alex'
'M2_HOME=/usr/local/apache-maven-3.0.3' 'LANGUAGE=en_US:en'
'GNOME_DESKTOP_SESSION_ID=this-is-deprecated' 'IBUS_ENABLE_SYNC_MODE=1'
'LOGNAME=alex'
'XDG_DATA_DIRS=/usr/share/ubuntu:/usr/share/gnome:/usr/local/share/:/usr/share/'
'DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-steeAksbqm,guid=0a0c4494b5d2f522ea8cfa360000003d'
'LC_CTYPE=en_US.UTF-8' 'XDG_CURRENT_DESKTOP=Unity'
'COLORTERM=gnome-terminal' 'XAUTHORITY=/home/alex/.Xauthority'
'_=/home/alex/huji/mpich/bin/mpiexec' --global-user-env 0
--global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
--proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 3
--exec-local-env 0 --exec-wdir /home/alex/huji/benchmarks/simple
--exec-args 3 /bin/mosrun -w ./simple
[mpiexec at singularity] Launch arguments:
/home/alex/huji/mpich/bin/hydra_pmi_proxy --control-port
singularity:56971 --debug --rmk user --launcher ssh --demux poll --pgid
0 --retries 10 --proxy-id 0
[proxy:0:0 at singularity] got pmi command (from 6): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at singularity] PMI response: cmd=response_to_init pmi_version=1
pmi_subversion=1 rc=0
[proxy:0:0 at singularity] got pmi command (from 6): get_maxes
[proxy:0:0 at singularity] PMI response: cmd=maxes kvsname_max=256
keylen_max=64 vallen_max=1024
[proxy:0:0 at singularity] got pmi command (from 6): get_appnum
[proxy:0:0 at singularity] PMI response: cmd=appnum appnum=0
[proxy:0:0 at singularity] got pmi command (from 6): get_my_kvsname
[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6671_0
[proxy:0:0 at singularity] got pmi command (from 6): get_my_kvsname
[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6671_0
[proxy:0:0 at singularity] got pmi command (from 6): barrier_in
[proxy:0:0 at singularity] got pmi command (from 8): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at singularity] PMI response: cmd=response_to_init pmi_version=1
pmi_subversion=1 rc=0
[proxy:0:0 at singularity] got pmi command (from 8): get_maxes
[proxy:0:0 at singularity] PMI response: cmd=maxes kvsname_max=256
keylen_max=64 vallen_max=1024
[proxy:0:0 at singularity] got pmi command (from 8): get_appnum
[proxy:0:0 at singularity] PMI response: cmd=appnum appnum=0
[proxy:0:0 at singularity] got pmi command (from 8): get_my_kvsname
[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6671_0
[proxy:0:0 at singularity] got pmi command (from 8): get_my_kvsname
[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6671_0
[proxy:0:0 at singularity] got pmi command (from 8): barrier_in
[proxy:0:0 at singularity] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at singularity] PMI response: cmd=response_to_init pmi_version=1
pmi_subversion=1 rc=0
[proxy:0:0 at singularity] got pmi command (from 0): get_maxes
[proxy:0:0 at singularity] PMI response: cmd=maxes kvsname_max=256
keylen_max=64 vallen_max=1024
[proxy:0:0 at singularity] got pmi command (from 0): get_appnum
[proxy:0:0 at singularity] PMI response: cmd=appnum appnum=0
[proxy:0:0 at singularity] got pmi command (from 0): get_my_kvsname
[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6671_0
[proxy:0:0 at singularity] got pmi command (from 0): get_my_kvsname
[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6671_0
[proxy:0:0 at singularity] got pmi command (from 0): barrier_in
[proxy:0:0 at singularity] forwarding command (cmd=barrier_in) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] got pmi command (from 0): put
kvsname=kvs_6671_0 key=P0-businesscard
value=description#singularity$port#33786$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand this command put; forwarding
upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=put
kvsname=kvs_6671_0 key=P0-businesscard
value=description#singularity$port#33786$ifname#127.0.0.1$
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=put_result rc=0
msg=success
[proxy:0:0 at singularity] we don't understand the response put_result;
forwarding downstream
[proxy:0:0 at singularity] got pmi command (from 0): barrier_in
[proxy:0:0 at singularity] got pmi command (from 6): put
kvsname=kvs_6671_0 key=P1-businesscard
value=description#singularity$port#52009$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand this command put; forwarding
upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=put
kvsname=kvs_6671_0 key=P1-businesscard
value=description#singularity$port#52009$ifname#127.0.0.1$
[mpiexec at singularity] PMI response to fd 6 pid 6: cmd=put_result rc=0
msg=success
[proxy:0:0 at singularity] we don't understand the response put_result;
forwarding downstream
[proxy:0:0 at singularity] got pmi command (from 6): barrier_in
[proxy:0:0 at singularity] got pmi command (from 8): put
kvsname=kvs_6671_0 key=P2-businesscard
value=description#singularity$port#42011$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand this command put; forwarding
upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=put
kvsname=kvs_6671_0 key=P2-businesscard
value=description#singularity$port#42011$ifname#127.0.0.1$
[mpiexec at singularity] PMI response to fd 6 pid 8: cmd=put_result rc=0
msg=success
[proxy:0:0 at singularity] we don't understand the response put_result;
forwarding downstream
[proxy:0:0 at singularity] got pmi command (from 8): barrier_in
[proxy:0:0 at singularity] forwarding command (cmd=barrier_in) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at singularity] PMI response to fd 6 pid 8: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
Started as #0Started as #Started as # out of 12 out of out of 33
3
[proxy:0:0 at singularity] got pmi command (from 0): get
kvsname=kvs_6671_0 key=P1-businesscard
[proxy:0:0 at singularity] forwarding command (cmd=get kvsname=kvs_6671_0
key=P1-businesscard) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=get
kvsname=kvs_6671_0 key=P1-businesscard
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=get_result rc=0
msg=success value=description#singularity$port#52009$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand the response get_result;
forwarding downstream
[proxy:0:0 at singularity] got pmi command (from 0): get
kvsname=kvs_6671_0 key=P2-businesscard
[proxy:0:0 at singularity] forwarding command (cmd=get kvsname=kvs_6671_0
key=P2-businesscard) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=get
kvsname=kvs_6671_0 key=P2-businesscard
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=get_result rc=0
msg=success value=description#singularity$port#42011$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand the response get_result;
forwarding downstream
#0 Got 0 from 0
[proxy:0:0 at singularity] got pmi command (from 6): get
kvsname=kvs_6671_0 key=P2-businesscard
[proxy:0:0 at singularity] forwarding command (cmd=get kvsname=kvs_6671_0
key=P2-businesscard) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=get
kvsname=kvs_6671_0 key=P2-businesscard
[mpiexec at singularity] PMI response to fd 6 pid 6: cmd=get_result rc=0
msg=success value=description#singularity$port#42011$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand the response get_result;
forwarding downstream
#1 Got 0 from 0
#1Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(281).................: MPI_Finalize failed
MPI_Finalize(209).................:
MPID_Finalize(117)................:
MPIDI_CH3U_VC_WaitForClose(383)...: an error occurred while the device
was waiting for all open connections to close
MPIDI_CH3I_Progress(402)..........:
MPID_nem_mpich2_blocking_recv(905):
MPID_nem_tcp_connpoll(1801).......: poll of socket fds failed - Invalid
argument
Got Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(281).................: MPI_Finalize failed
MPI_Finalize(209).................:
MPID_Finalize(117)................:
MPIDI_CH3U_VC_WaitForClose(383)...: an error occurred while the device
was waiting for all open connections to close
MPIDI_CH3I_Progress(402)..........:
MPID_nem_mpich2_blocking_recv(905):
MPID_nem_tcp_connpoll(1801).......: poll of socket fds failed - Invalid
argument
1alex at singularity:~/huji/benchmarks/simple$