[mpich-discuss] MPICH2 with MOSIX
Alex Margolin
alex.margolin at mail.huji.ac.il
Tue Nov 29 16:24:37 CST 2011
On 11/28/2011 03:08 AM, Pavan Balaji wrote:
> Does this problem not occur if you don't run it with mosrun and
> directly launch the application?
No, and that's the problem... I can't understand how mosrun could have
this effect (output follows).
Do you know how I can make it end successfully (or at least close
gracefully) even if there is an invalid file descriptor at some point?
I could change it locally in the polling code, but that would be hacky,
and I'd like to enable MOSIX support in the mainline (a MOSIX
modification is acceptable).
Can you tell me where this poll file descriptor could be set?
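To illustrate what I mean by changing the polling code locally, here is a
minimal standalone sketch (not Hydra's actual demux engine; wait_for_events
and fd_handler are names I made up for the example) of a poll() loop that
drops entries reporting POLLNVAL instead of treating them as fatal:

    /* Sketch only -- not Hydra's code.  Shows how a poll()-based loop
     * could tolerate a descriptor that has become invalid (e.g. closed
     * underneath us when running via mosrun): entries that come back
     * with POLLNVAL are dropped instead of aborting the run. */
    #include <poll.h>
    #include <errno.h>
    #include <stdio.h>

    typedef void (*fd_handler)(int fd);  /* hypothetical per-fd callback */

    int wait_for_events(struct pollfd *fds, int *nfds, fd_handler handle)
    {
        int ret = poll(fds, (nfds_t)*nfds, -1);
        if (ret < 0)
            return (errno == EINTR) ? 0 : -1;  /* interrupted: retry later */

        for (int i = 0; i < *nfds; i++) {
            if (fds[i].revents & POLLNVAL) {
                /* fd is no longer valid: compact the array, keep going */
                fprintf(stderr, "dropping invalid fd %d\n", fds[i].fd);
                fds[i] = fds[*nfds - 1];
                (*nfds)--;
                i--;
                continue;
            }
            if (fds[i].revents & (POLLIN | POLLHUP))
                handle(fds[i].fd);
        }
        return 0;
    }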
Regards,
Alex
Here's a run without mosrun:
alex at singularity:~/huji/benchmarks/simple$ MPICH_NO_LOCAL=1
~/huji/mpich/bin/mpiexec -n 3 ./simple
Started as #2 out of 3
Started as #0 out of Started as #31 out of 3
#0 Got 0 from 0
#1 Got 0 from 0
#1 Got 1 from 1
alex at singularity:~/huji/benchmarks/simple$ MPICH_NO_LOCAL=1
~/huji/mpich/bin/mpiexec -verbose -n 3 ./simple
host: singularity
==================================================================================================
mpiexec options:
----------------
Base path: /home/alex/huji/mpich/bin/
Launcher: (null)
Debug level: 1
Enable X: -1
Global environment:
-------------------
MPICH_NO_LOCAL=1
SSH_AGENT_PID=5616
GPG_AGENT_INFO=/tmp/keyring-LFH8Y8/gpg:0:1
M2=/usr/local/apache-maven-3.0.3/bin
TERM=xterm
SHELL=/bin/bash
XDG_SESSION_COOKIE=9c464babfe9bdc4b66b07aee0000000a-1322602836.837449-839385204
WINDOWID=60817414
OLDPWD=/home/alex
GNOME_KEYRING_CONTROL=/tmp/keyring-LFH8Y8
GTK_MODULES=canberra-gtk-module:canberra-gtk-module
USER=alex
XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0
XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
SSH_AUTH_SOCK=/tmp/keyring-LFH8Y8/ssh
SESSION_MANAGER=local/singularity:@/tmp/.ICE-unix/5578,unix/singularity:/tmp/.ICE-unix/5578
USERNAME=alex
DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path
XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg
PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/default-java/bin::/usr/local/apache-maven-3.0.3/bin
MAVEN_HOME=/usr/local/apache-maven-3.0.3
DESKTOP_SESSION=ubuntu
LC_MESSAGES=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
PWD=/home/alex/huji/benchmarks/simple
JAVA_HOME=/usr/lib/jvm/default-java
GNOME_KEYRING_PID=5569
LANG=en_US.UTF-8
MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path
UBUNTU_MENUPROXY=libappmenu.so
COMPIZ_CONFIG_PROFILE=ubuntu
GDMSESSION=ubuntu
SHLVL=1
HOME=/home/alex
M2_HOME=/usr/local/apache-maven-3.0.3
LANGUAGE=en_US:en
GNOME_DESKTOP_SESSION_ID=this-is-deprecated
IBUS_ENABLE_SYNC_MODE=1
LOGNAME=alex
XDG_DATA_DIRS=/usr/share/ubuntu:/usr/share/gnome:/usr/local/share/:/usr/share/
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-BACHSkd2lq,guid=2120de282c96b2648e0d633b0000004f
LC_CTYPE=en_US.UTF-8
XDG_CURRENT_DESKTOP=Unity
COLORTERM=gnome-terminal
XAUTHORITY=/home/alex/.Xauthority
_=/home/alex/huji/mpich/bin/mpiexec
Hydra internal environment:
---------------------------
GFORTRAN_UNBUFFERED_PRECONNECTED=y
Proxy information:
*********************
[1] proxy: singularity (1 cores)
Exec list: ./simple (3 processes);
==================================================================================================
[mpiexec at singularity] Timeout set to -1 (-1 means infinite)
[mpiexec at singularity] Got a control port string of singularity:47026
Proxy launch args: /home/alex/huji/mpich/bin/hydra_pmi_proxy
--control-port singularity:47026 --debug --rmk user --launcher ssh
--demux poll --pgid 0 --retries 10 --proxy-id
[mpiexec at singularity] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1
Arguments being passed to proxy 0:
--version 1.4.1p1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME
--hostname singularity --global-core-map 0,1,0 --filler-process-map
0,1,0 --global-process-count 3 --auto-cleanup 1 --pmi-rank -1
--pmi-kvsname kvs_6427_0 --pmi-process-mapping (vector,(0,1,1))
--ckpoint-num -1 --global-inherited-env 46 'MPICH_NO_LOCAL=1'
'SSH_AGENT_PID=5616' 'GPG_AGENT_INFO=/tmp/keyring-LFH8Y8/gpg:0:1'
'M2=/usr/local/apache-maven-3.0.3/bin' 'TERM=xterm' 'SHELL=/bin/bash'
'XDG_SESSION_COOKIE=9c464babfe9bdc4b66b07aee0000000a-1322602836.837449-839385204'
'WINDOWID=60817414' 'OLDPWD=/home/alex'
'GNOME_KEYRING_CONTROL=/tmp/keyring-LFH8Y8'
'GTK_MODULES=canberra-gtk-module:canberra-gtk-module' 'USER=alex'
'XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0'
'XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0'
'SSH_AUTH_SOCK=/tmp/keyring-LFH8Y8/ssh'
'SESSION_MANAGER=local/singularity:@/tmp/.ICE-unix/5578,unix/singularity:/tmp/.ICE-unix/5578'
'USERNAME=alex' 'DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path'
'XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg'
'PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/default-java/bin::/usr/local/apache-maven-3.0.3/bin'
'MAVEN_HOME=/usr/local/apache-maven-3.0.3' 'DESKTOP_SESSION=ubuntu'
'LC_MESSAGES=en_US.UTF-8' 'LC_COLLATE=en_US.UTF-8'
'PWD=/home/alex/huji/benchmarks/simple'
'JAVA_HOME=/usr/lib/jvm/default-java' 'GNOME_KEYRING_PID=5569'
'LANG=en_US.UTF-8'
'MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path'
'UBUNTU_MENUPROXY=libappmenu.so' 'COMPIZ_CONFIG_PROFILE=ubuntu'
'GDMSESSION=ubuntu' 'SHLVL=1' 'HOME=/home/alex'
'M2_HOME=/usr/local/apache-maven-3.0.3' 'LANGUAGE=en_US:en'
'GNOME_DESKTOP_SESSION_ID=this-is-deprecated' 'IBUS_ENABLE_SYNC_MODE=1'
'LOGNAME=alex'
'XDG_DATA_DIRS=/usr/share/ubuntu:/usr/share/gnome:/usr/local/share/:/usr/share/'
'DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-BACHSkd2lq,guid=2120de282c96b2648e0d633b0000004f'
'LC_CTYPE=en_US.UTF-8' 'XDG_CURRENT_DESKTOP=Unity'
'COLORTERM=gnome-terminal' 'XAUTHORITY=/home/alex/.Xauthority'
'_=/home/alex/huji/mpich/bin/mpiexec' --global-user-env 0
--global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
--proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 3
--exec-local-env 0 --exec-wdir /home/alex/huji/benchmarks/simple
--exec-args 1 ./simple
[mpiexec at singularity] Launch arguments:
/home/alex/huji/mpich/bin/hydra_pmi_proxy --control-port
singularity:47026 --debug --rmk user --launcher ssh --demux poll --pgid
0 --retries 10 --proxy-id 0
[proxy:0:0 at singularity] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at singularity] PMI response: cmd=response_to_init pmi_version=1
pmi_subversion=1 rc=0
[proxy:0:0 at singularity] got pmi command (from 8): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at singularity] PMI response: cmd=response_to_init pmi_version=1
pmi_subversion=1 rc=0
[proxy:0:0 at singularity] got pmi command (from 8): get_maxes
[proxy:0:0 at singularity] PMI response: cmd=maxes kvsname_max=256
keylen_max=64 vallen_max=1024
[proxy:0:0 at singularity] got pmi command (from 8): get_appnum
[proxy:0:0 at singularity] PMI response: cmd=appnum appnum=0
[proxy:0:0 at singularity] got pmi command (from 6): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at singularity] PMI response: cmd=response_to_init pmi_version=1
pmi_subversion=1 rc=0
[proxy:0:0 at singularity] got pmi command (from 0): get_maxes
[proxy:0:0 at singularity] PMI response: cmd=maxes kvsname_max=256
keylen_max=64 vallen_max=1024
[proxy:0:0 at singularity] got pmi command (from 0): get_appnum
[proxy:0:0 at singularity] PMI response: cmd=appnum appnum=0
[proxy:0:0 at singularity] got pmi command (from 0): get_my_kvsname
[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6427_0
[proxy:0:0 at singularity] got pmi command (from 0): get_my_kvsname
[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6427_0
[proxy:0:0 at singularity] got pmi command (from 0): barrier_in
[proxy:0:0 at singularity] got pmi command (from 8): get_my_kvsname
[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6427_0
[proxy:0:0 at singularity] got pmi command (from 6): get_maxes
[proxy:0:0 at singularity] PMI response: cmd=maxes kvsname_max=256
keylen_max=64 vallen_max=1024
[proxy:0:0 at singularity] got pmi command (from 6): get_appnum
[proxy:0:0 at singularity] PMI response: cmd=appnum appnum=0
[proxy:0:0 at singularity] got pmi command (from 6): get_my_kvsname
[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6427_0
[proxy:0:0 at singularity] got pmi command (from 8): get_my_kvsname
[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6427_0
[proxy:0:0 at singularity] got pmi command (from 6): get_my_kvsname
[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6427_0
[proxy:0:0 at singularity] got pmi command (from 6): barrier_in
[proxy:0:0 at singularity] got pmi command (from 8): barrier_in
[proxy:0:0 at singularity] forwarding command (cmd=barrier_in) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at singularity] PMI response to fd 6 pid 8: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] got pmi command (from 0): put
kvsname=kvs_6427_0 key=P0-businesscard
value=description#singularity$port#41963$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand this command put; forwarding
upstream
[proxy:0:0 at singularity] got pmi command (from 8): put
kvsname=kvs_6427_0 key=P2-businesscard
value=description#singularity$port#47755$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand this command put; forwarding
upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=put
kvsname=kvs_6427_0 key=P0-businesscard
value=description#singularity$port#41963$ifname#127.0.0.1$
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=put_result rc=0
msg=success
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=put
kvsname=kvs_6427_0 key=P2-businesscard
value=description#singularity$port#47755$ifname#127.0.0.1$
[mpiexec at singularity] PMI response to fd 6 pid 8: cmd=put_result rc=0
msg=success
[proxy:0:0 at singularity] we don't understand the response put_result;
forwarding downstream
[proxy:0:0 at singularity] we don't understand the response put_result;
forwarding downstream
[proxy:0:0 at singularity] got pmi command (from 8): barrier_in
[proxy:0:0 at singularity] got pmi command (from 6): put
kvsname=kvs_6427_0 key=P1-businesscard
value=description#singularity$port#53631$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand this command put; forwarding
upstream
[proxy:0:0 at singularity] got pmi command (from 0): barrier_in
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=put
kvsname=kvs_6427_0 key=P1-businesscard
value=description#singularity$port#53631$ifname#127.0.0.1$
[mpiexec at singularity] PMI response to fd 6 pid 6: cmd=put_result rc=0
msg=success
[proxy:0:0 at singularity] we don't understand the response put_result;
forwarding downstream
[proxy:0:0 at singularity] got pmi command (from 6): barrier_in
[proxy:0:0 at singularity] forwarding command (cmd=barrier_in) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at singularity] PMI response to fd 6 pid 6: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
Started as #Started as #0 out of 3[proxy:0:0 at singularity] got pmi
command (from 0): get
kvsname=kvs_6427_0 key=P1-businesscard
[proxy:0:0 at singularity] forwarding command (cmd=get kvsname=kvs_6427_0
key=P1-businesscard) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=get
kvsname=kvs_6427_0 key=P1-businesscard
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=get_result rc=0
msg=success value=description#singularity$port#53631$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand the response get_result;
forwarding downstream
2 out of 3
Started as #1 out of 3
[proxy:0:0 at singularity] got pmi command (from 0): get
kvsname=kvs_6427_0 key=P2-businesscard
[proxy:0:0 at singularity] forwarding command (cmd=get kvsname=kvs_6427_0
key=P2-businesscard) upstream
[proxy:0:0 at singularity] got pmi command (from 6): get
kvsname=kvs_6427_0 key=P2-businesscard
[proxy:0:0 at singularity] forwarding command (cmd=get kvsname=kvs_6427_0
key=P2-businesscard) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=get
kvsname=kvs_6427_0 key=P2-businesscard
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=get_result rc=0
msg=success value=description#singularity$port#47755$ifname#127.0.0.1$
#0 Got 0 from 0
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=get
kvsname=kvs_6427_0 key=P2-businesscard
[mpiexec at singularity] PMI response to fd 6 pid 6: cmd=get_result rc=0
msg=success value=description#singularity$port#47755$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand the response get_result;
forwarding downstream
[proxy:0:0 at singularity] we don't understand the response get_result;
forwarding downstream
#1 Got 0 from 0
#1 Got 1 from 1
[proxy:0:0 at singularity] got pmi command (from 0): finalize
[proxy:0:0 at singularity] PMI response: cmd=finalize_ack
[proxy:0:0 at singularity] got pmi command (from 6): finalize
[proxy:0:0 at singularity] PMI response: cmd=finalize_ack
[proxy:0:0 at singularity] got pmi command (from 8): finalize
[proxy:0:0 at singularity] PMI response: cmd=finalize_ack
alex at singularity:~/huji/benchmarks/simple$