[mpich-discuss] MPICH2 with MOSIX

Alex Margolin alex.margolin at mail.huji.ac.il
Tue Nov 29 16:24:37 CST 2011


On 11/28/2011 03:08 AM, Pavan Balaji wrote:
> Does this problem not occur if you don't run it with mosrun and 
> directly launch the application? 

No, and that's the problem... I can't understand how mosrun could
have this effect (output follows).
Do you know how I could make it end successfully (or at least close
gracefully) even if one of the file descriptors becomes invalid at
some point? I could change it locally in the polling code, but that
would be hacky, and I'd like to get MOSIX support into the mainline
(modifying MOSIX itself is also acceptable).
Can you tell me where this poll file descriptor could be set?
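
For illustration, here is the kind of tolerance I have in mind -- my
own sketch of a poll() loop that drops an invalid descriptor instead
of failing, not the actual Hydra demux code (which, as far as I can
tell, lives under tools/demux/ in the Hydra tree):

#include <poll.h>
#include <stdio.h>

/* Poll the registered descriptors once; deregister any fd that the
 * kernel reports as invalid (POLLNVAL) instead of aborting the run. */
int poll_once(struct pollfd *fds, int nfds)
{
    int i, ret = poll(fds, nfds, -1);
    if (ret < 0) {
        perror("poll");
        return -1;
    }
    for (i = 0; i < nfds; i++) {
        if (fds[i].revents & POLLNVAL) {
            /* The fd was closed behind our back (by mosrun?);
             * drop it rather than treating it as fatal. */
            fprintf(stderr, "dropping invalid fd %d\n", fds[i].fd);
            fds[i].fd = -1;   /* poll() skips negative fds */
            continue;
        }
        /* ... dispatch POLLIN/POLLOUT/POLLHUP events here ... */
    }
    return ret;
}

Would a change along these lines be acceptable upstream, or is there
a better place to handle it?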

Regards,
Alex
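
P.S. For reference, the ./simple test is essentially the following --
a stand-in I reconstructed from its output for this message, so the
actual source may differ in details. Each rank announces itself,
sends its rank number to every rank at or above it, and prints each
value it receives:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i, val;
    MPI_Request *reqs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Started as #%d out of %d\n", rank, size);

    /* Send my rank to every rank at or above me; nonblocking sends,
     * so the send to myself cannot deadlock against the Recv below. */
    reqs = malloc((size - rank) * sizeof(MPI_Request));
    for (i = rank; i < size; i++)
        MPI_Isend(&rank, 1, MPI_INT, i, 0, MPI_COMM_WORLD,
                  &reqs[i - rank]);

    /* Receive one value from every rank at or below me. */
    for (i = 0; i <= rank; i++) {
        MPI_Recv(&val, 1, MPI_INT, i, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("#%d Got %d from %d\n", rank, val, i);
    }

    MPI_Waitall(size - rank, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
    MPI_Finalize();
    return 0;
}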

Here's a run without mosrun:

alex at singularity:~/huji/benchmarks/simple$ MPICH_NO_LOCAL=1 
~/huji/mpich/bin/mpiexec -n 3 ./simple
Started as #2 out of 3
Started as #0 out of Started as #31 out of 3

#0 Got 0 from 0
#1 Got 0 from 0
#1 Got 1 from 1
alex at singularity:~/huji/benchmarks/simple$ MPICH_NO_LOCAL=1 
~/huji/mpich/bin/mpiexec -verbose -n 3 ./simple
host: singularity

==================================================================================================
mpiexec options:
----------------
   Base path: /home/alex/huji/mpich/bin/
   Launcher: (null)
   Debug level: 1
   Enable X: -1

   Global environment:
   -------------------
     MPICH_NO_LOCAL=1
     SSH_AGENT_PID=5616
     GPG_AGENT_INFO=/tmp/keyring-LFH8Y8/gpg:0:1
     M2=/usr/local/apache-maven-3.0.3/bin
     TERM=xterm
     SHELL=/bin/bash
     
XDG_SESSION_COOKIE=9c464babfe9bdc4b66b07aee0000000a-1322602836.837449-839385204
     WINDOWID=60817414
     OLDPWD=/home/alex
     GNOME_KEYRING_CONTROL=/tmp/keyring-LFH8Y8
     GTK_MODULES=canberra-gtk-module:canberra-gtk-module
     USER=alex
     XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0
     XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
     SSH_AUTH_SOCK=/tmp/keyring-LFH8Y8/ssh
     
SESSION_MANAGER=local/singularity:@/tmp/.ICE-unix/5578,unix/singularity:/tmp/.ICE-unix/5578
     USERNAME=alex
     DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path
     XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg
     
PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/default-java/bin::/usr/local/apache-maven-3.0.3/bin
     MAVEN_HOME=/usr/local/apache-maven-3.0.3
     DESKTOP_SESSION=ubuntu
     LC_MESSAGES=en_US.UTF-8
     LC_COLLATE=en_US.UTF-8
     PWD=/home/alex/huji/benchmarks/simple
     JAVA_HOME=/usr/lib/jvm/default-java
     GNOME_KEYRING_PID=5569
     LANG=en_US.UTF-8
     MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path
     UBUNTU_MENUPROXY=libappmenu.so
     COMPIZ_CONFIG_PROFILE=ubuntu
     GDMSESSION=ubuntu
     SHLVL=1
     HOME=/home/alex
     M2_HOME=/usr/local/apache-maven-3.0.3
     LANGUAGE=en_US:en
     GNOME_DESKTOP_SESSION_ID=this-is-deprecated
     IBUS_ENABLE_SYNC_MODE=1
     LOGNAME=alex
     
XDG_DATA_DIRS=/usr/share/ubuntu:/usr/share/gnome:/usr/local/share/:/usr/share/
     
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-BACHSkd2lq,guid=2120de282c96b2648e0d633b0000004f
     LC_CTYPE=en_US.UTF-8
     XDG_CURRENT_DESKTOP=Unity
     COLORTERM=gnome-terminal
     XAUTHORITY=/home/alex/.Xauthority
     _=/home/alex/huji/mpich/bin/mpiexec

   Hydra internal environment:
   ---------------------------
     GFORTRAN_UNBUFFERED_PRECONNECTED=y


     Proxy information:
     *********************
       [1] proxy: singularity (1 cores)
       Exec list: ./simple (3 processes);


==================================================================================================

[mpiexec at singularity] Timeout set to -1 (-1 means infinite)
[mpiexec at singularity] Got a control port string of singularity:47026

Proxy launch args: /home/alex/huji/mpich/bin/hydra_pmi_proxy 
--control-port singularity:47026 --debug --rmk user --launcher ssh 
--demux poll --pgid 0 --retries 10 --proxy-id

[mpiexec at singularity] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1
Arguments being passed to proxy 0:
--version 1.4.1p1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME 
--hostname singularity --global-core-map 0,1,0 --filler-process-map 
0,1,0 --global-process-count 3 --auto-cleanup 1 --pmi-rank -1 
--pmi-kvsname kvs_6427_0 --pmi-process-mapping (vector,(0,1,1)) 
--ckpoint-num -1 --global-inherited-env 46 'MPICH_NO_LOCAL=1' 
'SSH_AGENT_PID=5616' 'GPG_AGENT_INFO=/tmp/keyring-LFH8Y8/gpg:0:1' 
'M2=/usr/local/apache-maven-3.0.3/bin' 'TERM=xterm' 'SHELL=/bin/bash' 
'XDG_SESSION_COOKIE=9c464babfe9bdc4b66b07aee0000000a-1322602836.837449-839385204' 
'WINDOWID=60817414' 'OLDPWD=/home/alex' 
'GNOME_KEYRING_CONTROL=/tmp/keyring-LFH8Y8' 
'GTK_MODULES=canberra-gtk-module:canberra-gtk-module' 'USER=alex' 
'XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0' 
'XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0' 
'SSH_AUTH_SOCK=/tmp/keyring-LFH8Y8/ssh' 
'SESSION_MANAGER=local/singularity:@/tmp/.ICE-unix/5578,unix/singularity:/tmp/.ICE-unix/5578' 
'USERNAME=alex' 'DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path' 
'XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg' 
'PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/default-java/bin::/usr/local/apache-maven-3.0.3/bin' 
'MAVEN_HOME=/usr/local/apache-maven-3.0.3' 'DESKTOP_SESSION=ubuntu' 
'LC_MESSAGES=en_US.UTF-8' 'LC_COLLATE=en_US.UTF-8' 
'PWD=/home/alex/huji/benchmarks/simple' 
'JAVA_HOME=/usr/lib/jvm/default-java' 'GNOME_KEYRING_PID=5569' 
'LANG=en_US.UTF-8' 
'MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path' 
'UBUNTU_MENUPROXY=libappmenu.so' 'COMPIZ_CONFIG_PROFILE=ubuntu' 
'GDMSESSION=ubuntu' 'SHLVL=1' 'HOME=/home/alex' 
'M2_HOME=/usr/local/apache-maven-3.0.3' 'LANGUAGE=en_US:en' 
'GNOME_DESKTOP_SESSION_ID=this-is-deprecated' 'IBUS_ENABLE_SYNC_MODE=1' 
'LOGNAME=alex' 
'XDG_DATA_DIRS=/usr/share/ubuntu:/usr/share/gnome:/usr/local/share/:/usr/share/' 
'DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-BACHSkd2lq,guid=2120de282c96b2648e0d633b0000004f' 
'LC_CTYPE=en_US.UTF-8' 'XDG_CURRENT_DESKTOP=Unity' 
'COLORTERM=gnome-terminal' 'XAUTHORITY=/home/alex/.Xauthority' 
'_=/home/alex/huji/mpich/bin/mpiexec' --global-user-env 0 
--global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 
--proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 3 
--exec-local-env 0 --exec-wdir /home/alex/huji/benchmarks/simple 
--exec-args 1 ./simple

[mpiexec at singularity] Launch arguments: 
/home/alex/huji/mpich/bin/hydra_pmi_proxy --control-port 
singularity:47026 --debug --rmk user --launcher ssh --demux poll --pgid 
0 --retries 10 --proxy-id 0
[proxy:0:0 at singularity] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at singularity] PMI response: cmd=response_to_init pmi_version=1 
pmi_subversion=1 rc=0
[proxy:0:0 at singularity] got pmi command (from 8): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at singularity] PMI response: cmd=response_to_init pmi_version=1 
pmi_subversion=1 rc=0
[proxy:0:0 at singularity] got pmi command (from 8): get_maxes

[proxy:0:0 at singularity] PMI response: cmd=maxes kvsname_max=256 
keylen_max=64 vallen_max=1024
[proxy:0:0 at singularity] got pmi command (from 8): get_appnum

[proxy:0:0 at singularity] PMI response: cmd=appnum appnum=0
[proxy:0:0 at singularity] got pmi command (from 6): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at singularity] PMI response: cmd=response_to_init pmi_version=1 
pmi_subversion=1 rc=0
[proxy:0:0 at singularity] got pmi command (from 0): get_maxes

[proxy:0:0 at singularity] PMI response: cmd=maxes kvsname_max=256 
keylen_max=64 vallen_max=1024
[proxy:0:0 at singularity] got pmi command (from 0): get_appnum

[proxy:0:0 at singularity] PMI response: cmd=appnum appnum=0
[proxy:0:0 at singularity] got pmi command (from 0): get_my_kvsname

[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6427_0
[proxy:0:0 at singularity] got pmi command (from 0): get_my_kvsname

[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6427_0
[proxy:0:0 at singularity] got pmi command (from 0): barrier_in

[proxy:0:0 at singularity] got pmi command (from 8): get_my_kvsname

[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6427_0
[proxy:0:0 at singularity] got pmi command (from 6): get_maxes

[proxy:0:0 at singularity] PMI response: cmd=maxes kvsname_max=256 
keylen_max=64 vallen_max=1024
[proxy:0:0 at singularity] got pmi command (from 6): get_appnum

[proxy:0:0 at singularity] PMI response: cmd=appnum appnum=0
[proxy:0:0 at singularity] got pmi command (from 6): get_my_kvsname

[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6427_0
[proxy:0:0 at singularity] got pmi command (from 8): get_my_kvsname

[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6427_0
[proxy:0:0 at singularity] got pmi command (from 6): get_my_kvsname

[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6427_0
[proxy:0:0 at singularity] got pmi command (from 6): barrier_in

[proxy:0:0 at singularity] got pmi command (from 8): barrier_in

[proxy:0:0 at singularity] forwarding command (cmd=barrier_in) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at singularity] PMI response to fd 6 pid 8: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] got pmi command (from 0): put
kvsname=kvs_6427_0 key=P0-businesscard 
value=description#singularity$port#41963$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand this command put; forwarding 
upstream
[proxy:0:0 at singularity] got pmi command (from 8): put
kvsname=kvs_6427_0 key=P2-businesscard 
value=description#singularity$port#47755$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand this command put; forwarding 
upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=put 
kvsname=kvs_6427_0 key=P0-businesscard 
value=description#singularity$port#41963$ifname#127.0.0.1$
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=put_result rc=0 
msg=success
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=put 
kvsname=kvs_6427_0 key=P2-businesscard 
value=description#singularity$port#47755$ifname#127.0.0.1$
[mpiexec at singularity] PMI response to fd 6 pid 8: cmd=put_result rc=0 
msg=success
[proxy:0:0 at singularity] we don't understand the response put_result; 
forwarding downstream
[proxy:0:0 at singularity] we don't understand the response put_result; 
forwarding downstream
[proxy:0:0 at singularity] got pmi command (from 8): barrier_in

[proxy:0:0 at singularity] got pmi command (from 6): put
kvsname=kvs_6427_0 key=P1-businesscard 
value=description#singularity$port#53631$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand this command put; forwarding 
upstream
[proxy:0:0 at singularity] got pmi command (from 0): barrier_in

[mpiexec at singularity] [pgid: 0] got PMI command: cmd=put 
kvsname=kvs_6427_0 key=P1-businesscard 
value=description#singularity$port#53631$ifname#127.0.0.1$
[mpiexec at singularity] PMI response to fd 6 pid 6: cmd=put_result rc=0 
msg=success
[proxy:0:0 at singularity] we don't understand the response put_result; 
forwarding downstream
[proxy:0:0 at singularity] got pmi command (from 6): barrier_in

[proxy:0:0 at singularity] forwarding command (cmd=barrier_in) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at singularity] PMI response to fd 6 pid 6: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
Started as #Started as #0 out of 3[proxy:0:0 at singularity] got pmi 
command (from 0): get
kvsname=kvs_6427_0 key=P1-businesscard

[proxy:0:0 at singularity] forwarding command (cmd=get kvsname=kvs_6427_0 
key=P1-businesscard) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=get 
kvsname=kvs_6427_0 key=P1-businesscard
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=get_result rc=0 
msg=success value=description#singularity$port#53631$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand the response get_result; 
forwarding downstream
2 out of 3
Started as #1 out of 3
[proxy:0:0 at singularity] got pmi command (from 0): get
kvsname=kvs_6427_0 key=P2-businesscard
[proxy:0:0 at singularity] forwarding command (cmd=get kvsname=kvs_6427_0 
key=P2-businesscard) upstream
[proxy:0:0 at singularity] got pmi command (from 6): get
kvsname=kvs_6427_0 key=P2-businesscard
[proxy:0:0 at singularity] forwarding command (cmd=get kvsname=kvs_6427_0 
key=P2-businesscard) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=get 
kvsname=kvs_6427_0 key=P2-businesscard
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=get_result rc=0 
msg=success value=description#singularity$port#47755$ifname#127.0.0.1$
#0 Got 0 from 0
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=get 
kvsname=kvs_6427_0 key=P2-businesscard
[mpiexec at singularity] PMI response to fd 6 pid 6: cmd=get_result rc=0 
msg=success value=description#singularity$port#47755$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand the response get_result; 
forwarding downstream
[proxy:0:0 at singularity] we don't understand the response get_result; 
forwarding downstream
#1 Got 0 from 0
#1 Got 1 from 1
[proxy:0:0 at singularity] got pmi command (from 0): finalize

[proxy:0:0 at singularity] PMI response: cmd=finalize_ack
[proxy:0:0 at singularity] got pmi command (from 6): finalize

[proxy:0:0 at singularity] PMI response: cmd=finalize_ack
[proxy:0:0 at singularity] got pmi command (from 8): finalize

[proxy:0:0 at singularity] PMI response: cmd=finalize_ack
alex at singularity:~/huji/benchmarks/simple$




