[mpich-discuss] a question about process-core binding

teng ma xiaok1981 at gmail.com
Tue Aug 2 12:18:31 CDT 2011


If -binding is removed, there is no problem scaling to 768 processes (32
nodes, 24 cores/node). Without the binding parameter, what kind of binding
strategy does mpich2 use? (Does it fill all slots of one node before moving
on to the next node, or round-robin across nodes?)
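
To make the two placements I am asking about concrete, here is a small
standalone sketch (my own illustration, not MPICH internals; the 32x24
numbers are just our cluster's) that prints where each rank would land
under "fill one node first" versus "round robin across nodes":

/* placement.c -- illustration only, not MPICH code.
 * Prints rank -> (node, core) under two strategies:
 *   block: fill all cores of a node before moving to the next node
 *   rr:    round-robin ranks across nodes, then wrap to the next core */
#include <stdio.h>

int main(void)
{
    const int nodes = 32, cores = 24;   /* 32 nodes x 24 cores = 768 slots */
    const int nprocs = nodes * cores;

    for (int rank = 0; rank < nprocs; rank++) {
        int block_node = rank / cores;  /* "fill one node first" */
        int block_core = rank % cores;
        int rr_node    = rank % nodes;  /* "round robin along nodes" */
        int rr_core    = rank / nodes;
        printf("rank %3d: block -> node %2d core %2d | rr -> node %2d core %2d\n",
               rank, block_node, block_core, rr_node, rr_core);
    }
    return 0;
}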

Thanks
Teng

On Tue, Aug 2, 2011 at 1:14 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:

>
> Please keep mpich-discuss cc'ed. The below error doesn't seem to be a
> binding issue. Did you try removing the -binding option to see if it works
> without that?
>
>
> On 08/02/2011 12:12 PM, teng ma wrote:
>
>> Thanks for the answer. I ran into another issue with Hydra binding. Once
>> the number of launched processes reaches 408, it throws errors like the
>> following.
>>
>> I run it like this:
>> mpiexec -n 408 -binding cpu -f ~/host_mpich ./IMB-MPI1 Bcast -npmin 408
>> Fatal error in PMPI_Init_thread: Other MPI error, error stack:
>> MPIR_Init_thread(388)..............:
>> MPID_Init(139).....................: channel initialization failed
>> MPIDI_CH3_Init(38).................:
>> MPID_nem_init(234).................:
>> MPID_nem_tcp_init(99)..............:
>> MPID_nem_tcp_get_business_card(325):
>> MPIDI_Get_IP_for_iface(276)........: ioctl failed errno=19 - No such device
>> (this same error stack is repeated for each failing process)
>>
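>> The call that fails, MPIDI_Get_IP_for_iface, is apparently looking up the
>> IP address of a named network interface. As a quick standalone check (my
>> own sketch, not MPICH code; "eth0" is just a placeholder name), something
>> like this prints the same "errno=19 - No such device" when the requested
>> interface does not exist on a node:
>>
>> /* ifcheck.c -- look up an interface's IPv4 address via ioctl(SIOCGIFADDR) */
>> #include <stdio.h>
>> #include <string.h>
>> #include <errno.h>
>> #include <unistd.h>
>> #include <sys/ioctl.h>
>> #include <sys/socket.h>
>> #include <net/if.h>
>> #include <netinet/in.h>
>> #include <arpa/inet.h>
>>
>> int main(int argc, char **argv)
>> {
>>     const char *iface = (argc > 1) ? argv[1] : "eth0";
>>     struct ifreq ifr;
>>     int fd = socket(AF_INET, SOCK_DGRAM, 0);
>>
>>     memset(&ifr, 0, sizeof(ifr));
>>     strncpy(ifr.ifr_name, iface, IFNAMSIZ - 1);
>>
>>     /* fails with ENODEV (19) if no interface with this name exists */
>>     if (ioctl(fd, SIOCGIFADDR, &ifr) < 0)
>>         printf("ioctl failed errno=%d - %s\n", errno, strerror(errno));
>>     else
>>         printf("%s -> %s\n", iface,
>>                inet_ntoa(((struct sockaddr_in *)&ifr.ifr_addr)->sin_addr));
>>
>>     close(fd);
>>     return 0;
>> }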
>>
>> With fewer than 407 processes, -binding cpu/rr works fine. If I remove
>> -binding cpu/rr and just use -f ~/host_mpich, it is still fine no matter
>> how many processes I launch. My host_mpich looks like this:
>>
>> stremi-7.reims.grid5000.fr:24
>> stremi-35.reims.grid5000.fr:24
>> stremi-28.reims.grid5000.fr:24
>> stremi-38.reims.grid5000.fr:24
>> stremi-32.reims.grid5000.fr:24
>> stremi-26.reims.grid5000.fr:24
>> stremi-22.reims.grid5000.fr:24
>> stremi-43.reims.grid5000.fr:24
>> stremi-30.reims.grid5000.fr:24
>> stremi-41.reims.grid5000.fr:24
>> stremi-4.reims.grid5000.fr:24
>> stremi-34.reims.grid5000.fr:24
>> stremi-24.reims.grid5000.fr:24
>> stremi-23.reims.grid5000.fr:24
>> stremi-20.reims.grid5000.fr:24
>> stremi-36.reims.grid5000.fr:24
>> stremi-29.reims.grid5000.fr:24
>> stremi-19.reims.grid5000.fr:24
>> stremi-42.reims.grid5000.fr:24
>> stremi-39.reims.grid5000.fr:24
>> stremi-27.reims.grid5000.fr:24
>> stremi-44.reims.grid5000.fr:24
>> stremi-37.reims.grid5000.fr:24
>> stremi-31.reims.grid5000.fr:24
>> stremi-6.reims.grid5000.fr:24
>> stremi-33.reims.grid5000.fr:24
>> stremi-3.reims.grid5000.fr:24
>> stremi-2.reims.grid5000.fr:24
>> stremi-40.reims.grid5000.fr:24
>> stremi-21.reims.grid5000.fr:24
>> stremi-5.reims.grid5000.fr:24
>> stremi-25.reims.grid5000.fr:24
>>
>>
>> mpich2 was built with the default configure options.
>>
>> Thanks
>> Teng
>>
>> On Tue, Aug 2, 2011 at 12:43 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>
>>
>>    mpiexec -binding rr
>>
>>      -- Pavan
>>
>>
>>    On 08/02/2011 11:35 AM, teng ma wrote:
>>
>>        Suppose I want to do process-core binding like MVAPICH2's scatter
>>        mode: assign MPI ranks round-robin across the nodes in the host file, e.g.
>>        host1
>>        host2
>>        host3
>>
>>        rank 0 host 1's core 0
>>        rank 1 host 2's core 0
>>        rank 2 host 3's core 0
>>        rank 3 host 1's core 1
>>        rank 4 host 2's core 1
>>        rank 5 host 3's core 1
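>>        (In general, with H hosts listed, rank r goes to host (r mod H),
>>        core (r div H); e.g. with the 3 hosts above, rank 4 -> host
>>        4 mod 3 = 1, i.e. host2, core 4 div 3 = 1.)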
>>
>>        Is there any easy method in mpich2-1.4 to achieve this binding?
>>
>>        Teng Ma
>>
>>
>>
>>        _______________________________________________
>>        mpich-discuss mailing list
>>        mpich-discuss at mcs.anl.gov
>>        https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>>
>>    --
>>    Pavan Balaji
>>    http://www.mcs.anl.gov/~balaji
>>
>>
>>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>