[mpich-discuss] MPICH2 with MOSIX

Alex Margolin alex.margolin at mail.huji.ac.il
Sun Nov 27 13:05:45 CST 2011


On 11/27/2011 07:23 PM, Pavan Balaji wrote:
> Alex,
>
> On 11/27/2011 01:21 AM, Alex Margolin wrote:
>> alex at singularity:~/huji/benchmarks/simple$ mosrun
>> ~/huji/mpich/bin/mpiexec -n 3 ./simple
>> #1 Got 0 from Fatal error in MPI_Finalize: Other MPI error, error stack:
>> MPI_Finalize(281).................: MPI_Finalize failed
>> MPI_Finalize(209).................:
>> MPID_Finalize(117)................:
>
> It looks like MOSIX is providing a bunch of nodes to MPICH2, and the 
> three processes are spread out across a few nodes. Otherwise, TCP 
> would not have been used at all. You can pass the -verbose flag to 
> mpiexec to see what exactly is going on over there. The best bet would 
> be to try to reproduce it "natively" by executing from a command-line 
> over a bunch of processes.
>

Thanks for your reply.
This makes sense, but my laptop (singularity) is a standalone machine
and I didn't configure anything - only the usual "./configure
(--prefix...) ; make ; make install". My work is an optimization of
MPI's TCP communication (a dynamically loaded library that replaces
send/recv and the other Berkeley socket API calls - a rough sketch of
the idea is at the end of this mail), so I ran with MPICH_NO_LOCAL=1
hoping this would force TCP even when all the instances are local. I
tried running it with mosrun both "inside" and "outside" the mpiexec
and got the same result (just to be clear: I didn't run anything
before this command, so if it requires any preparation, such as
starting mpd servers/agents, I didn't do it... maybe that is the
reason?):

alex at singularity:~/huji/benchmarks/simple$ MPICH_NO_LOCAL=1 
~/huji/mpich/bin/mpiexec -verbose -n 3 /bin/mosrun -w ./simple
host: singularity

==================================================================================================
mpiexec options:
----------------
   Base path: /home/alex/huji/mpich/bin/
   Launcher: (null)
   Debug level: 1
   Enable X: -1

   Global environment:
   -------------------
     MPICH_NO_LOCAL=1
     SSH_AGENT_PID=5465
     GPG_AGENT_INFO=/tmp/keyring-it7zNW/gpg:0:1
     M2=/usr/local/apache-maven-3.0.3/bin
     TERM=xterm
     SHELL=/bin/bash
     XDG_SESSION_COOKIE=9c464babfe9bdc4b66b07aee0000000a-1322417182.663487-739938010
     WINDOWID=58720262
     OLDPWD=/home/alex/huji/benchmarks
     GNOME_KEYRING_CONTROL=/tmp/keyring-it7zNW
     GTK_MODULES=canberra-gtk-module:canberra-gtk-module
     USER=alex
     XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0
     XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
     SSH_AUTH_SOCK=/tmp/keyring-it7zNW/ssh
     SESSION_MANAGER=local/singularity:@/tmp/.ICE-unix/5427,unix/singularity:/tmp/.ICE-unix/5427
     USERNAME=alex
     DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path
     XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg
     PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/default-java/bin::/usr/local/apache-maven-3.0.3/bin
     MAVEN_HOME=/usr/local/apache-maven-3.0.3
     DESKTOP_SESSION=ubuntu
     LC_MESSAGES=en_US.UTF-8
     LC_COLLATE=en_US.UTF-8
     PWD=/home/alex/huji/benchmarks/simple
     JAVA_HOME=/usr/lib/jvm/default-java
     GNOME_KEYRING_PID=5418
     LANG=en_US.UTF-8
     MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path
     UBUNTU_MENUPROXY=libappmenu.so
     COMPIZ_CONFIG_PROFILE=ubuntu
     GDMSESSION=ubuntu
     SHLVL=1
     HOME=/home/alex
     M2_HOME=/usr/local/apache-maven-3.0.3
     LANGUAGE=en_US:en
     GNOME_DESKTOP_SESSION_ID=this-is-deprecated
     IBUS_ENABLE_SYNC_MODE=1
     LOGNAME=alex
     XDG_DATA_DIRS=/usr/share/ubuntu:/usr/share/gnome:/usr/local/share/:/usr/share/
     DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-steeAksbqm,guid=0a0c4494b5d2f522ea8cfa360000003d
     LC_CTYPE=en_US.UTF-8
     XDG_CURRENT_DESKTOP=Unity
     COLORTERM=gnome-terminal
     XAUTHORITY=/home/alex/.Xauthority
     _=/home/alex/huji/mpich/bin/mpiexec

   Hydra internal environment:
   ---------------------------
     GFORTRAN_UNBUFFERED_PRECONNECTED=y


     Proxy information:
     *********************
       [1] proxy: singularity (1 cores)
       Exec list: /bin/mosrun (3 processes);


==================================================================================================

[mpiexec at singularity] Timeout set to -1 (-1 means infinite)
[mpiexec at singularity] Got a control port string of singularity:56971

Proxy launch args: /home/alex/huji/mpich/bin/hydra_pmi_proxy 
--control-port singularity:56971 --debug --rmk user --launcher ssh 
--demux poll --pgid 0 --retries 10 --proxy-id

[mpiexec at singularity] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1
Arguments being passed to proxy 0:
--version 1.4.1p1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME 
--hostname singularity --global-core-map 0,1,0 --filler-process-map 
0,1,0 --global-process-count 3 --auto-cleanup 1 --pmi-rank -1 
--pmi-kvsname kvs_6671_0 --pmi-process-mapping (vector,(0,1,1)) 
--ckpoint-num -1 --global-inherited-env 46 'MPICH_NO_LOCAL=1' 
'SSH_AGENT_PID=5465' 'GPG_AGENT_INFO=/tmp/keyring-it7zNW/gpg:0:1' 
'M2=/usr/local/apache-maven-3.0.3/bin' 'TERM=xterm' 'SHELL=/bin/bash' 
'XDG_SESSION_COOKIE=9c464babfe9bdc4b66b07aee0000000a-1322417182.663487-739938010' 
'WINDOWID=58720262' 'OLDPWD=/home/alex/huji/benchmarks' 
'GNOME_KEYRING_CONTROL=/tmp/keyring-it7zNW' 
'GTK_MODULES=canberra-gtk-module:canberra-gtk-module' 'USER=alex' 
'XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0' 
'XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0' 
'SSH_AUTH_SOCK=/tmp/keyring-it7zNW/ssh' 
'SESSION_MANAGER=local/singularity:@/tmp/.ICE-unix/5427,unix/singularity:/tmp/.ICE-unix/5427' 
'USERNAME=alex' 'DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path' 
'XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg' 
'PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/default-java/bin::/usr/local/apache-maven-3.0.3/bin' 
'MAVEN_HOME=/usr/local/apache-maven-3.0.3' 'DESKTOP_SESSION=ubuntu' 
'LC_MESSAGES=en_US.UTF-8' 'LC_COLLATE=en_US.UTF-8' 
'PWD=/home/alex/huji/benchmarks/simple' 
'JAVA_HOME=/usr/lib/jvm/default-java' 'GNOME_KEYRING_PID=5418' 
'LANG=en_US.UTF-8' 
'MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path' 
'UBUNTU_MENUPROXY=libappmenu.so' 'COMPIZ_CONFIG_PROFILE=ubuntu' 
'GDMSESSION=ubuntu' 'SHLVL=1' 'HOME=/home/alex' 
'M2_HOME=/usr/local/apache-maven-3.0.3' 'LANGUAGE=en_US:en' 
'GNOME_DESKTOP_SESSION_ID=this-is-deprecated' 'IBUS_ENABLE_SYNC_MODE=1' 
'LOGNAME=alex' 
'XDG_DATA_DIRS=/usr/share/ubuntu:/usr/share/gnome:/usr/local/share/:/usr/share/' 
'DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-steeAksbqm,guid=0a0c4494b5d2f522ea8cfa360000003d' 
'LC_CTYPE=en_US.UTF-8' 'XDG_CURRENT_DESKTOP=Unity' 
'COLORTERM=gnome-terminal' 'XAUTHORITY=/home/alex/.Xauthority' 
'_=/home/alex/huji/mpich/bin/mpiexec' --global-user-env 0 
--global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 
--proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 3 
--exec-local-env 0 --exec-wdir /home/alex/huji/benchmarks/simple 
--exec-args 3 /bin/mosrun -w ./simple

[mpiexec at singularity] Launch arguments: 
/home/alex/huji/mpich/bin/hydra_pmi_proxy --control-port 
singularity:56971 --debug --rmk user --launcher ssh --demux poll --pgid 
0 --retries 10 --proxy-id 0
[proxy:0:0 at singularity] got pmi command (from 6): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at singularity] PMI response: cmd=response_to_init pmi_version=1 
pmi_subversion=1 rc=0
[proxy:0:0 at singularity] got pmi command (from 6): get_maxes

[proxy:0:0 at singularity] PMI response: cmd=maxes kvsname_max=256 
keylen_max=64 vallen_max=1024
[proxy:0:0 at singularity] got pmi command (from 6): get_appnum

[proxy:0:0 at singularity] PMI response: cmd=appnum appnum=0
[proxy:0:0 at singularity] got pmi command (from 6): get_my_kvsname

[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6671_0
[proxy:0:0 at singularity] got pmi command (from 6): get_my_kvsname

[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6671_0
[proxy:0:0 at singularity] got pmi command (from 6): barrier_in

[proxy:0:0 at singularity] got pmi command (from 8): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at singularity] PMI response: cmd=response_to_init pmi_version=1 
pmi_subversion=1 rc=0
[proxy:0:0 at singularity] got pmi command (from 8): get_maxes

[proxy:0:0 at singularity] PMI response: cmd=maxes kvsname_max=256 
keylen_max=64 vallen_max=1024
[proxy:0:0 at singularity] got pmi command (from 8): get_appnum

[proxy:0:0 at singularity] PMI response: cmd=appnum appnum=0
[proxy:0:0 at singularity] got pmi command (from 8): get_my_kvsname

[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6671_0
[proxy:0:0 at singularity] got pmi command (from 8): get_my_kvsname

[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6671_0
[proxy:0:0 at singularity] got pmi command (from 8): barrier_in

[proxy:0:0 at singularity] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at singularity] PMI response: cmd=response_to_init pmi_version=1 
pmi_subversion=1 rc=0
[proxy:0:0 at singularity] got pmi command (from 0): get_maxes

[proxy:0:0 at singularity] PMI response: cmd=maxes kvsname_max=256 
keylen_max=64 vallen_max=1024
[proxy:0:0 at singularity] got pmi command (from 0): get_appnum

[proxy:0:0 at singularity] PMI response: cmd=appnum appnum=0
[proxy:0:0 at singularity] got pmi command (from 0): get_my_kvsname

[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6671_0
[proxy:0:0 at singularity] got pmi command (from 0): get_my_kvsname

[proxy:0:0 at singularity] PMI response: cmd=my_kvsname kvsname=kvs_6671_0
[proxy:0:0 at singularity] got pmi command (from 0): barrier_in

[proxy:0:0 at singularity] forwarding command (cmd=barrier_in) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] got pmi command (from 0): put
kvsname=kvs_6671_0 key=P0-businesscard 
value=description#singularity$port#33786$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand this command put; forwarding 
upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=put 
kvsname=kvs_6671_0 key=P0-businesscard 
value=description#singularity$port#33786$ifname#127.0.0.1$
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=put_result rc=0 
msg=success
[proxy:0:0 at singularity] we don't understand the response put_result; 
forwarding downstream
[proxy:0:0 at singularity] got pmi command (from 0): barrier_in

[proxy:0:0 at singularity] got pmi command (from 6): put
kvsname=kvs_6671_0 key=P1-businesscard 
value=description#singularity$port#52009$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand this command put; forwarding 
upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=put 
kvsname=kvs_6671_0 key=P1-businesscard 
value=description#singularity$port#52009$ifname#127.0.0.1$
[mpiexec at singularity] PMI response to fd 6 pid 6: cmd=put_result rc=0 
msg=success
[proxy:0:0 at singularity] we don't understand the response put_result; 
forwarding downstream
[proxy:0:0 at singularity] got pmi command (from 6): barrier_in

[proxy:0:0 at singularity] got pmi command (from 8): put
kvsname=kvs_6671_0 key=P2-businesscard 
value=description#singularity$port#42011$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand this command put; forwarding 
upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=put 
kvsname=kvs_6671_0 key=P2-businesscard 
value=description#singularity$port#42011$ifname#127.0.0.1$
[mpiexec at singularity] PMI response to fd 6 pid 8: cmd=put_result rc=0 
msg=success
[proxy:0:0 at singularity] we don't understand the response put_result; 
forwarding downstream
[proxy:0:0 at singularity] got pmi command (from 8): barrier_in

[proxy:0:0 at singularity] forwarding command (cmd=barrier_in) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at singularity] PMI response to fd 6 pid 8: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
[proxy:0:0 at singularity] PMI response: cmd=barrier_out
Started as #0Started as #Started as # out of 12 out of  out of 33

3
[proxy:0:0 at singularity] got pmi command (from 0): get
kvsname=kvs_6671_0 key=P1-businesscard
[proxy:0:0 at singularity] forwarding command (cmd=get kvsname=kvs_6671_0 
key=P1-businesscard) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=get 
kvsname=kvs_6671_0 key=P1-businesscard
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=get_result rc=0 
msg=success value=description#singularity$port#52009$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand the response get_result; 
forwarding downstream
[proxy:0:0 at singularity] got pmi command (from 0): get
kvsname=kvs_6671_0 key=P2-businesscard
[proxy:0:0 at singularity] forwarding command (cmd=get kvsname=kvs_6671_0 
key=P2-businesscard) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=get 
kvsname=kvs_6671_0 key=P2-businesscard
[mpiexec at singularity] PMI response to fd 6 pid 0: cmd=get_result rc=0 
msg=success value=description#singularity$port#42011$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand the response get_result; 
forwarding downstream
#0 Got 0 from 0
[proxy:0:0 at singularity] got pmi command (from 6): get
kvsname=kvs_6671_0 key=P2-businesscard
[proxy:0:0 at singularity] forwarding command (cmd=get kvsname=kvs_6671_0 
key=P2-businesscard) upstream
[mpiexec at singularity] [pgid: 0] got PMI command: cmd=get 
kvsname=kvs_6671_0 key=P2-businesscard
[mpiexec at singularity] PMI response to fd 6 pid 6: cmd=get_result rc=0 
msg=success value=description#singularity$port#42011$ifname#127.0.0.1$
[proxy:0:0 at singularity] we don't understand the response get_result; 
forwarding downstream
#1 Got 0 from 0
#1Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(281).................: MPI_Finalize failed
MPI_Finalize(209).................:
MPID_Finalize(117)................:
MPIDI_CH3U_VC_WaitForClose(383)...: an error occurred while the device 
was waiting for all open connections to close
MPIDI_CH3I_Progress(402)..........:
MPID_nem_mpich2_blocking_recv(905):
MPID_nem_tcp_connpoll(1801).......: poll of socket fds failed - Invalid 
argument
  Got Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(281).................: MPI_Finalize failed
MPI_Finalize(209).................:
MPID_Finalize(117)................:
MPIDI_CH3U_VC_WaitForClose(383)...: an error occurred while the device 
was waiting for all open connections to close
MPIDI_CH3I_Progress(402)..........:
MPID_nem_mpich2_blocking_recv(905):
MPID_nem_tcp_connpoll(1801).......: poll of socket fds failed - Invalid 
argument
1alex at singularity:~/huji/benchmarks/simple$
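
In case "dynamically loaded to replace send/recv" above is unclear,
here is a minimal sketch of the interposition idea (illustrative
only - the file names, build line and empty hook comments are made up
for this mail, not my actual library):

/* interpose.c - preload shim over the Berkeley socket calls that
 * MPICH's TCP netmod ends up using.  Build (assumed):
 *   gcc -shared -fPIC -o libinterpose.so interpose.c -ldl
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/socket.h>

static ssize_t (*real_send)(int, const void *, size_t, int);
static ssize_t (*real_recv)(int, void *, size_t, int);

ssize_t send(int sockfd, const void *buf, size_t len, int flags)
{
    if (!real_send)
        real_send = (ssize_t (*)(int, const void *, size_t, int))
                        dlsym(RTLD_NEXT, "send");
    /* the optimization would hook in here before falling back */
    return real_send(sockfd, buf, len, flags);
}

ssize_t recv(int sockfd, void *buf, size_t len, int flags)
{
    if (!real_recv)
        real_recv = (ssize_t (*)(int, void *, size_t, int))
                        dlsym(RTLD_NEXT, "recv");
    /* likewise for the receive path */
    return real_recv(sockfd, buf, len, flags);
}

Such a shim would then be preloaded along the lines of
LD_PRELOAD=./libinterpose.so MPICH_NO_LOCAL=1 mpiexec -n 3 ./simple
(again, the .so name is just for illustration).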


