[mpich-discuss] a question about process-core binding

Darius Buntinas buntinas at mcs.anl.gov
Tue Aug 2 12:49:21 CDT 2011


Can you send us the output of the following?

    mpiexec -l -n 2 -binding cpu -f ~/host_mpich env
and
    mpiexec -l -n 2 -f ~/host_mpich env

Thanks,
-d

On Aug 2, 2011, at 12:18 PM, teng ma wrote:

> If -binding is removed, it scales to 768 processes (32 nodes, 24 cores/node) without any problem. Without the binding parameter, what kind of placement strategy does MPICH2 use? (Fill all slots of one node before moving to the next, or round-robin across nodes?)
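> 
> (For reference, one way to see where each rank actually lands is a small test program along these lines; this is only a sketch, and the file name check_placement.c is arbitrary:)
> 
>     /* check_placement.c: print each rank's host and CPU affinity (Linux) */
>     #define _GNU_SOURCE
>     #include <mpi.h>
>     #include <sched.h>
>     #include <stdio.h>
> 
>     int main(int argc, char **argv)
>     {
>         int rank, len, i;
>         char host[MPI_MAX_PROCESSOR_NAME];
>         cpu_set_t mask;
> 
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         MPI_Get_processor_name(host, &len);
> 
>         /* affinity mask of the calling process (pid 0 = self) */
>         CPU_ZERO(&mask);
>         sched_getaffinity(0, sizeof(mask), &mask);
> 
>         printf("rank %d on %s, allowed cores:", rank, host);
>         for (i = 0; i < CPU_SETSIZE; i++)
>             if (CPU_ISSET(i, &mask))
>                 printf(" %d", i);
>         printf("\n");
> 
>         MPI_Finalize();
>         return 0;
>     }
> 
> Compiled with mpicc and launched with and without -binding, it shows both the rank-to-node mapping and each rank's affinity mask.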
> 
> Thanks
> Teng 
> 
> On Tue, Aug 2, 2011 at 1:14 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> 
> Please keep mpich-discuss cc'ed. The error below doesn't seem to be a binding issue. Did you try removing the -binding option to see if it works without it?
> 
> 
> On 08/02/2011 12:12 PM, teng ma wrote:
> Thanks for the answer. I've run into another issue with Hydra binding. When
> the number of launched processes exceeds 408, it throws errors like the following:
> 
> 
> I run it like this:
> mpiexec -n 408 -binding cpu -f ~/host_mpich ./IMB-MPI1 Bcast -npmin 408
> Fatal error in PMPI_Init_thread: Other MPI error, error stack:
> MPIR_Init_thread(388)..............:
> MPID_Init(139).....................: channel initialization failed
> MPIDI_CH3_Init(38).................:
> MPID_nem_init(234).................:
> MPID_nem_tcp_init(99)..............:
> MPID_nem_tcp_get_business_card(325):
> MPIDI_Get_IP_for_iface(276)........: ioctl failed errno=19 - No such device
> [... the same error stack is repeated for each of the remaining failing processes ...]
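> 
> (Errno 19 is ENODEV, "No such device": the kind of error an interface lookup via ioctl(SIOCGIFADDR) returns when the named network interface does not exist on a node. A standalone sketch of such a lookup, not MPICH code; "eth0" is only a placeholder:)
> 
>     /* iface_check.c: look up the IPv4 address of a named interface */
>     #include <stdio.h>
>     #include <string.h>
>     #include <errno.h>
>     #include <unistd.h>
>     #include <sys/ioctl.h>
>     #include <sys/socket.h>
>     #include <net/if.h>
>     #include <netinet/in.h>
>     #include <arpa/inet.h>
> 
>     int main(int argc, char **argv)
>     {
>         const char *iface = (argc > 1) ? argv[1] : "eth0";
>         struct ifreq ifr;
>         int fd = socket(AF_INET, SOCK_DGRAM, 0);
> 
>         if (fd < 0) { perror("socket"); return 1; }
>         memset(&ifr, 0, sizeof(ifr));
>         strncpy(ifr.ifr_name, iface, IFNAMSIZ - 1);
> 
>         if (ioctl(fd, SIOCGIFADDR, &ifr) < 0) {
>             /* a nonexistent interface gives errno 19 (ENODEV), as in the stack above */
>             printf("ioctl failed for %s: errno=%d (%s)\n", iface, errno, strerror(errno));
>         } else {
>             struct sockaddr_in *sin = (struct sockaddr_in *) &ifr.ifr_addr;
>             printf("%s -> %s\n", iface, inet_ntoa(sin->sin_addr));
>         }
>         close(fd);
>         return 0;
>     }
> 
> Running something like this on each node, with the interface name MPICH is trying to use, would show which nodes are missing it.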
> 
> 
> When the number of processes is less than 407, -binding cpu/rr works fine. If I
> remove -binding cpu/rr and just use -f ~/host_mpich, it works no matter
> how many processes I launch. My host_mpich looks like this:
> 
> stremi-7.reims.grid5000.fr:24
> stremi-35.reims.grid5000.fr:24
> stremi-28.reims.grid5000.fr:24
> stremi-38.reims.grid5000.fr:24
> stremi-32.reims.grid5000.fr:24
> stremi-26.reims.grid5000.fr:24
> stremi-22.reims.grid5000.fr:24
> stremi-43.reims.grid5000.fr:24
> stremi-30.reims.grid5000.fr:24
> stremi-41.reims.grid5000.fr:24
> stremi-4.reims.grid5000.fr:24
> stremi-34.reims.grid5000.fr:24
> stremi-24.reims.grid5000.fr:24
> stremi-23.reims.grid5000.fr:24
> stremi-20.reims.grid5000.fr:24
> stremi-36.reims.grid5000.fr:24
> stremi-29.reims.grid5000.fr:24
> stremi-19.reims.grid5000.fr:24
> stremi-42.reims.grid5000.fr:24
> stremi-39.reims.grid5000.fr:24
> stremi-27.reims.grid5000.fr:24
> stremi-44.reims.grid5000.fr:24
> stremi-37.reims.grid5000.fr:24
> stremi-31.reims.grid5000.fr:24
> stremi-6.reims.grid5000.fr:24
> stremi-33.reims.grid5000.fr:24
> stremi-3.reims.grid5000.fr:24
> stremi-2.reims.grid5000.fr:24
> stremi-40.reims.grid5000.fr:24
> stremi-21.reims.grid5000.fr:24
> stremi-5.reims.grid5000.fr:24
> stremi-25.reims.grid5000.fr:24
> 
> 
> MPICH2 was configured with just the default configure options.
> 
> Thanks
> Teng
> 
> On Tue, Aug 2, 2011 at 12:43 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> 
> 
>    mpiexec -binding rr
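> 
>    For example, combined with a hostfile as above (the process count and ./a.out are only placeholders):
> 
>        mpiexec -binding rr -f ~/host_mpich -n 6 ./a.out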
> 
>      -- Pavan
> 
> 
>    On 08/02/2011 11:35 AM, teng ma wrote:
> 
>        If I want to do process-core binding like MVAPICH2's scatter mode, i.e.
>        assign MPI ranks round-robin across the nodes in the host file, e.g.
>        host1
>        host2
>        host3
> 
>        rank 0 -> host1, core 0
>        rank 1 -> host2, core 0
>        rank 2 -> host3, core 0
>        rank 3 -> host1, core 1
>        rank 4 -> host2, core 1
>        rank 5 -> host3, core 1
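>        (i.e., with the three hosts above, rank r would go to host number (r mod 3) + 1 and core (r div 3))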
> 
>        Is there any easy method in mpich2-1.4 to achieve this binding?
> 
>        Teng Ma
> 
> 
> 
>        _________________________________________________
>        mpich-discuss mailing list
>        mpich-discuss at mcs.anl.gov
>        https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> 
> 
>    --
>    Pavan Balaji
>    http://www.mcs.anl.gov/~balaji
> 
> 
> 
> -- 
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


