[mpich-discuss] a question about process-core binding

teng ma xiaok1981 at gmail.com
Tue Aug 2 13:23:22 CDT 2011


tma at freims:~$ mpiexec -l -n 2 -binding cpu -f ~/host_mpich env
[0] SHELL=/bin/bash
[0] SSH_CLIENT=192.168.159.239 59246 22
[0] LC_ALL=en_US.UTF-8
[0] USER=tma
[0] MAIL=/var/mail/tma
[0] PATH=/home/tma/opt/bin:/home/tma/opt/mpi/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/grid5000/code/bin
[0] PWD=/home/tma
[0] LANG=en_US.UTF-8
[0] SHLVL=1
[0] HOME=/home/tma
[0] LOGNAME=tma
[0] SSH_CONNECTION=192.168.159.239 59246 172.16.175.100 22
[0] _=/home/tma/opt/mpi/bin/mpiexec
[0] TERM=xterm
[0] OLDPWD=/home/tma/opt/mpi
[0] SSH_TTY=/dev/pts/26
[0] GFORTRAN_UNBUFFERED_PRECONNECTED=y
[0] MPICH_INTERFACE_HOSTNAME=stremi-4.reims.grid5000.fr
[0] PMI_RANK=0
[0] PMI_FD=6
[0] PMI_SIZE=2
[1] SHELL=/bin/bash
[1] SSH_CLIENT=192.168.159.239 59246 22
[1] LC_ALL=en_US.UTF-8
[1] USER=tma
[1] MAIL=/var/mail/tma
[1] PATH=/home/tma/opt/bin:/home/tma/opt/mpi/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/grid5000/code/bin
[1] PWD=/home/tma
[1] LANG=en_US.UTF-8
[1] SHLVL=1
[1] HOME=/home/tma
[1] LOGNAME=tma
[1] SSH_CONNECTION=192.168.159.239 59246 172.16.175.100 22
[1] _=/home/tma/opt/mpi/bin/mpiexec
[1] TERM=xterm
[1] OLDPWD=/home/tma/opt/mpi
[1] SSH_TTY=/dev/pts/26
[1] GFORTRAN_UNBUFFERED_PRECONNECTED=y
[1] MPICH_INTERFACE_HOSTNAME=stremi-4.reims.grid5000.fr
[1] PMI_RANK=1
[1] PMI_FD=7
[1] PMI_SIZE=2


and


tma at freims:~$ mpiexec -l -n 2 -f ~/host_mpich env
[0] SHELL=/bin/bash
[0] SSH_CLIENT=192.168.159.239 59246 22
[0] LC_ALL=en_US.UTF-8
[0] USER=tma
[0] MAIL=/var/mail/tma
[0] PATH=/home/tma/opt/bin:/home/tma/opt/mpi/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/grid5000/code/bin
[0] PWD=/home/tma
[0] LANG=en_US.UTF-8
[0] SHLVL=1
[0] HOME=/home/tma
[0] LOGNAME=tma
[0] SSH_CONNECTION=192.168.159.239 59246 172.16.175.100 22
[0] _=/home/tma/opt/mpi/bin/mpiexec
[0] TERM=xterm
[0] OLDPWD=/home/tma/opt/mpi
[0] SSH_TTY=/dev/pts/26
[0] GFORTRAN_UNBUFFERED_PRECONNECTED=y
[0] MPICH_INTERFACE_HOSTNAME=stremi-4.reims.grid5000.fr
[0] PMI_RANK=0
[0] PMI_FD=5
[0] PMI_SIZE=2
[1] SHELL=/bin/bash
[1] SSH_CLIENT=192.168.159.239 59246 22
[1] LC_ALL=en_US.UTF-8
[1] USER=tma
[1] MAIL=/var/mail/tma
[1] PATH=/home/tma/opt/bin:/home/tma/opt/mpi/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/grid5000/code/bin
[1] PWD=/home/tma
[1] LANG=en_US.UTF-8
[1] SHLVL=1
[1] HOME=/home/tma
[1] LOGNAME=tma
[1] SSH_CONNECTION=192.168.159.239 59246 172.16.175.100 22
[1] _=/home/tma/opt/mpi/bin/mpiexec
[1] TERM=xterm
[1] OLDPWD=/home/tma/opt/mpi
[1] SSH_TTY=/dev/pts/26
[1] GFORTRAN_UNBUFFERED_PRECONNECTED=y
[1] MPICH_INTERFACE_HOSTNAME=stremi-4.reims.grid5000.fr
[1] PMI_RANK=1
[1] PMI_FD=6
[1] PMI_SIZE=2
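
In case it helps, here is a minimal sketch (assuming Linux/glibc; sched_getcpu()
and the file name check_binding.c are only illustrative, not part of MPICH) that
prints where each rank actually ends up, so the placement can be compared with
and without -binding:

/* check_binding.c - print the host and CPU each rank is currently running on.
 * Build: mpicc -o check_binding check_binding.c
 * Run:   mpiexec -l -n 2 -binding cpu -f ~/host_mpich ./check_binding
 */
#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &namelen);

    /* sched_getcpu() reports the CPU this process is executing on right now;
     * with -binding cpu it should stay fixed, without binding it may move. */
    printf("rank %d of %d on %s, cpu %d\n", rank, size, host, sched_getcpu());

    MPI_Finalize();
    return 0;
}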



On Tue, Aug 2, 2011 at 1:49 PM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:

>
> Can you send us the output of the following?
>
>    mpiexec -l -n 2 -binding cpu -f ~/host_mpich env
> and
>    mpiexec -l -n 2 -f ~/host_mpich env
>
> Thanks,
> -d
>
> On Aug 2, 2011, at 12:18 PM, teng ma wrote:
>
> > If -binding is removed, there's no problem scaling to 768 processes (32
> > nodes, 24 cores/node). Without a binding parameter, what binding strategy
> > does MPICH2 use? (Does it fill all the slots on one node before moving to
> > the next, or round-robin across nodes?)
> >
> > Thanks
> > Teng
> >
> > On Tue, Aug 2, 2011 at 1:14 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> >
> > Please keep mpich-discuss cc'ed. The error below doesn't seem to be a
> > binding issue. Did you try removing the -binding option to see if it works
> > without that?
> >
> >
> > On 08/02/2011 12:12 PM, teng ma wrote:
> > Thanks for the answer. I ran into another issue with Hydra binding. When
> > the number of launched processes exceeds 408, it throws errors like the following:
> >
> >
> > I run it like this:
> > mpiexec -n 408 -binding cpu -f ~/host_mpich ./IMB-MPI1 Bcast -npmin 408
> > Fatal error in PMPI_Init_thread: Other MPI error, error stack:
> > MPIR_Init_thread(388)..............:
> > MPID_Init(139).....................: channel initialization failed
> > MPIDI_CH3_Init(38).................:
> > MPID_nem_init(234).................:
> > MPID_nem_tcp_init(99)..............:
> > MPID_nem_tcp_get_business_card(325):
> > MPIDI_Get_IP_for_iface(276)........: ioctl failed errno=19 - No such device
> > [The same error stack is repeated by each of the other failing processes; see the note on errno 19 after the quoted thread below.]
> >
> >
> > When the process count is less than 407, -binding cpu/rr looks good. If I
> > remove -binding cpu/rr and just use -f ~/host_mpich, it's still OK no
> > matter how many processes I launch. My host_mpich looks like this:
> >
> > stremi-7.reims.grid5000.fr:24
> > stremi-35.reims.grid5000.fr:24
> > stremi-28.reims.grid5000.fr:24
> > stremi-38.reims.grid5000.fr:24
> > stremi-32.reims.grid5000.fr:24
> > stremi-26.reims.grid5000.fr:24
> > stremi-22.reims.grid5000.fr:24
> > stremi-43.reims.grid5000.fr:24
> > stremi-30.reims.grid5000.fr:24
> > stremi-41.reims.grid5000.fr:24
> > stremi-4.reims.grid5000.fr:24
> > stremi-34.reims.grid5000.fr:24
> > stremi-24.reims.grid5000.fr:24
> > stremi-23.reims.grid5000.fr:24
> > stremi-20.reims.grid5000.fr:24
> > stremi-36.reims.grid5000.fr:24
> > stremi-29.reims.grid5000.fr:24
> > stremi-19.reims.grid5000.fr:24
> > stremi-42.reims.grid5000.fr:24
> > stremi-39.reims.grid5000.fr:24
> > stremi-27.reims.grid5000.fr:24
> > stremi-44.reims.grid5000.fr:24
> > stremi-37.reims.grid5000.fr:24
> > stremi-31.reims.grid5000.fr:24
> > stremi-6.reims.grid5000.fr:24
> > stremi-33.reims.grid5000.fr:24
> > stremi-3.reims.grid5000.fr:24
> > stremi-2.reims.grid5000.fr:24
> > stremi-40.reims.grid5000.fr:24
> > stremi-21.reims.grid5000.fr:24
> > stremi-5.reims.grid5000.fr:24
> > stremi-25.reims.grid5000.fr:24
> >
> >
> > MPICH2 was configured with just the default configure options.
> >
> > Thanks
> > Teng
> >
> > On Tue, Aug 2, 2011 at 12:43 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> >
> >
> >    mpiexec -binding rr
> >
> >      -- Pavan
> >
> >
> >    On 08/02/2011 11:35 AM, teng ma wrote:
> >
> >        If I want to do process-core binding like MVAPICH2's scatter mode,
> >        i.e. assign MPI ranks across the nodes in the host file, e.g.
> >        host1
> >        host2
> >        host3
> >
> >        rank 0 host 1's core 0
> >        rank 1 host 2's core 0
> >        rank 2 host 3's core 0
> >        rank 3 host 1's core 1
> >        rank 4 host 2's core 1
> >        rank 5 host 3's core 1
> >
> >        Is there any easy method in mpich2-1.4 to achieve this binding?
> >
> >        Teng Ma
> >
> >
> >
> >        _________________________________________________
> >        mpich-discuss mailing list
> >        mpich-discuss at mcs.anl.gov
> >        https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >
> >
> >    --
> >    Pavan Balaji
> >    http://www.mcs.anl.gov/~balaji
> >
> >
> >
> > --
> > Pavan Balaji
> > http://www.mcs.anl.gov/~balaji
> >
> > _______________________________________________
> > mpich-discuss mailing list
> > mpich-discuss at mcs.anl.gov
> > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
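
For reference, errno 19 in the stack traces above is ENODEV ("No such device")
on Linux, and MPIDI_Get_IP_for_iface fails while it is querying a network
interface. Below is a standalone sketch of that kind of interface lookup (the
SIOCGIFADDR call and the file name iface_lookup.c are only illustrative, not
MPICH's actual code); it fails with the same errno when the named interface
does not exist on a node:

/* iface_lookup.c - ask the kernel for the IPv4 address of a network interface.
 * Build: cc -o iface_lookup iface_lookup.c
 * Run:   ./iface_lookup eth0
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(int argc, char **argv)
{
    struct ifreq ifr;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <interface>\n", argv[0]);
        return 1;
    }

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, argv[1], IFNAMSIZ - 1);

    /* SIOCGIFADDR returns the interface's IPv4 address; a name that does not
     * exist on this node yields errno 19 (ENODEV, "No such device"). */
    if (ioctl(fd, SIOCGIFADDR, &ifr) < 0) {
        fprintf(stderr, "ioctl failed errno=%d - %s\n", errno, strerror(errno));
        close(fd);
        return 1;
    }

    printf("%s has address %s\n", argv[1],
           inet_ntoa(((struct sockaddr_in *)&ifr.ifr_addr)->sin_addr));
    close(fd);
    return 0;
}

Running the sketch on each node with the interface name in question shows
whether that interface actually exists there.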

