[mpich-discuss] How to configure mpiexec to use rsh as the default bootstrap?

Limin Gu lgu at penguincomputing.com
Fri Feb 18 11:02:48 CST 2011


Pavan,

Thank you!

I tried reconfiguring with --with-hydra-bss=rsh,ssh,fork,slurm, but got a
segmentation fault at run time. I'll try
--with-hydra-bss=rsh,ssh,fork,slurm,ll,lsf,sge,pbs,none,persist later.

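Below, I first ran with "-bootstrap rsh -bootstrap-exec /usr/bin/bprsh", and it
looks fine. Then I set the HYDRA_BOOTSTRAP and HYDRA_BOOTSTRAP_EXEC
environment variables and ran without the bootstrap options. The job still
runs and prints its result, but a lot of extra output also spills out: glibc
"invalid pointer" errors from hydra_pmi_proxy with backtraces and memory maps,
after which mpiexec reports that one of the processes terminated badly. Are
these normal messages that I can somehow make quiet?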

[root at flatline examples]# mpiexec -bootstrap rsh -bootstrap-exec /usr/bin/bprsh -n 12 -f machinefile  ./cpi
Process 4 of 12 is on n1
Process 8 of 12 is on n2
Process 0 of 12 is on n0
Process 5 of 12 is on n1
Process 9 of 12 is on n2
Process 1 of 12 is on n0
Process 6 of 12 is on n1
Process 10 of 12 is on n2
Process 2 of 12 is on n0
Process 7 of 12 is on n1
Process 11 of 12 is on n2
Process 3 of 12 is on n0
pi is approximately 3.1415926544231256, Error is 0.0000000008333325
wall clock time = 0.004450
[root at flatline examples]# export HYDRA_BOOTSTRAP=rsh
[root at flatline examples]# export HYDRA_BOOTSTRAP_EXEC=/usr/bin/bprsh
[root at flatline examples]# mpiexec -n 12 -f machinefile  ./cpi
Process 1 of 12 is on n0
Process 4 of 12 is on n1
Process 8 of 12 is on n2
Process 2 of 12 is on n0
Process 6 of 12 is on n1
Process 9 of 12 is on n2
Process 3 of 12 is on n0
Process 7 of 12 is on n1
Process 10 of 12 is on n2
Process 0 of 12 is on n0
Process 5 of 12 is on n1
Process 11 of 12 is on n2
pi is approximately 3.1415926544231256, Error is 0.0000000008333325
wall clock time = 0.003011
*** glibc detected *** /home/lgu/mpich2-install/bin/hydra_pmi_proxy: munmap_chunk(): invalid pointer: 0x00007fff63b6ec5d ***
*** glibc detected *** /home/lgu/mpich2-install/bin/hydra_pmi_proxy: munmap_chunk(): invalid pointer: 0x00007fffcf689c5d ***
*** glibc detected *** /home/lgu/mpich2-install/bin/hydra_pmi_proxy: munmap_chunk(): invalid pointer: 0x00007fff94651c5d ***
======= Backtrace: =========
======= Backtrace: =========
/lib64/libc.so.6(cfree+0x166)[0x3ef10729d6]
/home/lgu/mpich2-install/bin/hydra_pmi_proxy[0x41a52b]
/home/lgu/mpich2-install/bin/hydra_pmi_proxy[0x404ad5]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3ef101d994]
/home/lgu/mpich2-install/bin/hydra_pmi_proxy[0x403ab9]
======= Memory map: ========
/lib64/libc.so.6(cfree+0x166)[0x3ef10729d6]
/home/lgu/mpich2-install/bin/hydra_pmi_proxy[0x41a52b]
/home/lgu/mpich2-install/bin/hydra_pmi_proxy[0x404ad5]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3ef101d994]
/home/lgu/mpich2-install/bin/hydra_pmi_proxy[0x403ab9]
======= Memory map: ========
00400000-0043c000 r-xp 00000000 00:16 2810395    /home/lgu/mpich2-install/bin/hydra_pmi_proxy
0063c000-0063e000 rw-p 0003c000 00:16 2810395    /home/lgu/mpich2-install/bin/hydra_pmi_proxy
0063e000-00662000 rw-p 0063e000 00:00 0
0c13b000-0c15c000 rw-p 0c13b000 00:00 0    [heap]
3ef0c00000-3ef0c1c000 r-xp 00000000 00:12 3430    /lib64/ld-2.5.so
3ef0e1b000-3ef0e1c000 r--p 0001b000 00:12 3430    /lib64/ld-2.5.so
3ef0e1c000-3ef0e1d000 rw-p 0001c000 00:12 3430    /lib64/ld-2.5.so
3ef1000000-3ef114e000 r-xp 00000000 00:12 813    /lib64/libc.so.6
3ef114e000-3ef134e000 ---p 0014e000 00:12 813    /lib64/libc.so.6
3ef134e000-3ef1352000 r--p 0014e000 00:12 813    /lib64/libc.so.6
3ef1352000-3ef1353000 rw-p 00152000 00:12 813    /lib64/libc.so.6
3ef1353000-3ef1358000 rw-p 3ef1353000 00:00 0
3ef1400000-3ef1482000 r-xp 00000000 00:12 859    /lib64/libm.so.6
3ef1482000-3ef1681000 ---p 00082000 00:12 859    /lib64/libm.so.6
3ef1681000-3ef1682000 r--p 00081000 00:12 859    /lib64/libm.so.6
3ef1682000-3ef1683000 rw-p 00082000 00:12 859    /lib64/libm.so.6
3ef1800000-3ef1805000 r-xp 00000000 00:12 5939    /usr/lib64/libnuma.so.1
3ef1805000-3ef1a04000 ---p 00005000 00:12 5939    /usr/lib64/libnuma.so.1
3ef1a04000-3ef1a05000 rw-p 00004000 00:12 5939    /usr/lib64/libnuma.so.1
3ef1c00000-3ef1c16000 r-xp 00000000 00:12 1098    /lib64/libpthread.so.0
3ef1c16000-3ef1e15000 ---p 00016000 00:12 1098    /lib64/libpthread.so.0
3ef1e15000-3ef1e16000 r--p 00015000 00:12 1098    /lib64/libpthread.so.0
3ef1e16000-3ef1e17000 rw-p 00016000 00:12 1098    /lib64/libpthread.so.0
3ef1e17000-3ef1e1b000 rw-p 3ef1e17000 00:00 0
3ef2000000-3ef2014000 r-xp 00000000 00:12 6497    /usr/lib64/libz.so.1
3ef2014000-3ef2213000 ---p 00014000 00:12 6497    /usr/lib64/libz.so.1
3ef2213000-3ef2214000 rw-p 00013000 00:12 6497    /usr/lib64/libz.so.1
3ef2400000-3ef2407000 r-xp 00000000 00:12 1094    /lib64/librt.so.1
3ef2407000-3ef2607000 ---p 00007000 00:12 1094    /lib64/librt.so.1
3ef2607000-3ef2608000 r--p 00007000 00:12 1094    /lib64/librt.so.1
3ef2608000-3ef2609000 rw-p 00008000 00:12 1094    /lib64/librt.so.1
3ef3c00000-3ef3c15000 r-xp 00000000 00:12 1106    /lib64/libnsl.so.1
3ef3c15000-3ef3e14000 ---p 00015000 00:12 1106    /lib64/libnsl.so.1
3ef3e14000-3ef3e15000 r--p 00014000 00:12 1106    /lib64/libnsl.so.1
3ef3e15000-3ef3e16000 rw-p 00015000 00:12 1106    /lib64/libnsl.so.1
3ef3e16000-3ef3e18000 rw-p 3ef3e16000 00:00 0
3ef9400000-3ef9533000 r-xp 00000000 00:12 6495    /usr/lib64/libxml2.so.2
3ef9533000-3ef9733000 ---p 00133000 00:12 6495    /usr/lib64/libxml2.so.2
3ef9733000-3ef973c000 rw-p 00133000 00:12 6495    /usr/lib64/libxml2.so.2
3ef973c000-3ef973d000 rw-p 3ef973c000 00:00 0
3efbc00000-3efbc0d000 r-xp 00000000 00:12 4377    /lib64/libgcc_s.so.1
3efbc0d000-3efbe0d000 ---p 0000d000 00:12 4377    /lib64/libgcc_s.so.1
3efbe0d000-3efbe0e000 rw-p 0000d000 00:12 4377    /lib64/libgcc_s.so.1
2abbba01e000-2abbba020000 rw-p 2abbba01e000 00:00 0
2abbba020000-2abbba024000 r-xp 00000000 00:16 2647315    /home/lgu/mpich2-install/lib/libmpl.so.1.0.0
2abbba024000-2abbba223000 ---p 00004000 00:16 2647315    /home/lgu/mpich2-install/lib/libmpl.so.1.0.0
2abbba223000-2abbba224000 rw-p 00003000 00:16 2647315    /home/lgu/mpich2-install/lib/libmpl.so.1.0.0
2abbba243000-2abbba246000 rw-p 2abbba243000 00:00 0
2abbba246000-2abbba248000 r-xp 00000000 00:12 830    /lib64/libdl.so.2
2abbba248000-2abbba448000 ---p 00002000 00:12 830    /lib64/libdl.so.2
2abbba448000-2abbba449000 r--p 00002000 00:12 830    /lib64/libdl.so.2
2abbba449000-2abbba44a000 rw-p 00003000 00:12 830    /lib64/libdl.so.2
2abbba44a000-2abbba44c000 rw-p 2abbba44a000 00:00 0
2abbba44c000-2abbba481000 r--s 00000000 00:12 3970    /var/run/nscd/db52YQK7 (deleted)
7fff9463d000-7fff94652000 rw-p 7ffffffe9000 00:00 0    [stack]
7fff94693000-7fff94696000 r-xp 7fff94693000 00:00 0    [vdso]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0    [vsyscall]
======= Backtrace: =========
/lib64/libc.so.6(cfree+0x166)[0x3ef10729d6]
/home/lgu/mpich2-install/bin/hydra_pmi_proxy[0x41a52b]
/home/lgu/mpich2-install/bin/hydra_pmi_proxy[0x404ad5]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3ef101d994]
/home/lgu/mpich2-install/bin/hydra_pmi_proxy[0x403ab9]
======= Memory map: ========
00400000-0043c000 r-xp 00000000 00:16 2810395    /home/lgu/mpich2-install/bin/hydra_pmi_proxy
0063c000-0063e000 rw-p 0003c000 00:16 2810395    /home/lgu/mpich2-install/bin/hydra_pmi_proxy
0063e000-00662000 rw-p 0063e000 00:00 0
09275000-09296000 rw-p 09275000 00:00 0    [heap]
3ef0c00000-3ef0c1c000 r-xp 00000000 00:12 3477    /lib64/ld-2.5.so
3ef0e1b000-3ef0e1c000 r--p 0001b000 00:12 3477    /lib64/ld-2.5.so
3ef0e1c000-3ef0e1d000 rw-p 0001c000 00:12 3477    /lib64/ld-2.5.so
3ef1000000-3ef114e000 r-xp 00000000 00:12 813    /lib64/libc.so.6
3ef114e000-3ef134e000 ---p 0014e000 00:12 813    /lib64/libc.so.6
3ef134e000-3ef1352000 r--p 0014e000 00:12 813    /lib64/libc.so.6
3ef1352000-3ef1353000 rw-p 00152000 00:12 813    /lib64/libc.so.6
3ef1353000-3ef1358000 rw-p 3ef1353000 00:00 0
3ef1400000-3ef1482000 r-xp 00000000 00:12 859    /lib64/libm.so.6
3ef1482000-3ef1681000 ---p 00082000 00:12 859    /lib64/libm.so.6
3ef1681000-3ef1682000 r--p 00081000 00:12 859    /lib64/libm.so.6
3ef1682000-3ef1683000 rw-p 00082000 00:12 859    /lib64/libm.so.6
3ef1800000-3ef1805000 r-xp 00000000 00:12 5898    /usr/lib64/libnuma.so.1
3ef1805000-3ef1a04000 ---p 00005000 00:12 5898    /usr/lib64/libnuma.so.1
3ef1a04000-3ef1a05000 rw-p 00004000 00:12 5898    /usr/lib64/libnuma.so.1
3ef1c00000-3ef1c16000 r-xp 00000000 00:12 1115    /lib64/libpthread.so.0
3ef1c16000-3ef1e15000 ---p 00016000 00:12 1115    /lib64/libpthread.so.0
3ef1e15000-3ef1e16000 r--p 00015000 00:12 1115    /lib64/libpthread.so.0
3ef1e16000-3ef1e17000 rw-p 00016000 00:12 1115    /lib64/libpthread.so.0
3ef1e17000-3ef1e1b000 rw-p 3ef1e17000 00:00 0
3ef2000000-3ef2014000 r-xp 00000000 00:12 5896    /usr/lib64/libz.so.1
3ef2014000-3ef2213000 ---p 00014000 00:12 5896    /usr/lib64/libz.so.1
3ef2213000-3ef2214000 rw-p 00013000 00:12 5896    /usr/lib64/libz.so.1
3ef2400000-3ef2407000 r-xp 00000000 00:12 1111    /lib64/librt.so.1
3ef2407000-3ef2607000 ---p 00007000 00:12 1111    /lib64/librt.so.1
3ef2607000-3ef2608000 r--p 00007000 00:12 1111    /lib64/librt.so.1
3ef2608000-3ef2609000 rw-p 00008000 00:12 1111    /lib64/librt.so.1
3ef3c00000-3ef3c15000 r-xp 00000000 00:12 1123    /lib64/libnsl.so.1
3ef3c15000-3ef3e14000 ---p 00015000 00:12 1123    /lib64/libnsl.so.1
3ef3e14000-3ef3e15000 r--p 00014000 00:12 1123    /lib64/libnsl.so.1
3ef3e15000-3ef3e16000 rw-p 00015000 00:12 1123    /lib64/libnsl.so.1
3ef3e16000-3ef3e18000 rw-p 3ef3e16000 00:00 0
3ef9400000-3ef9533000 r-xp 00000000 00:12 5894    /usr/lib64/libxml2.so.2
3ef9533000-3ef9733000 ---p 00133000 00:12 5894    /usr/lib64/libxml2.so.2
3ef9733000-3ef973c000 rw-p 00133000 00:12 5894    /usr/lib64/libxml2.so.2
3ef973c000-3ef973d000 rw-p 3ef973c000 00:00 0
3efbc00000-3efbc0d000 r-xp 00000000 00:12 4435    /lib64/libgcc_s.so.1
3efbc0d000-3efbe0d000 ---p 0000d000 00:12 4435    /lib64/libgcc_s.so.1
3efbe0d000-3efbe0e000 rw-p 0000d000 00:12 4435    /lib64/libgcc_s.so.1
2afc5843f000-2afc58441000 rw-p 2afc5843f000 00:00 0
2afc58441000-2afc58445000 r-xp 00000000 00:16 2647315    /home/lgu/mpich2-install/lib/libmpl.so.1.0.0
2afc58445000-2afc58644000 ---p 00004000 00:16 2647315    /home/lgu/mpich2-install/lib/libmpl.so.1.0.0
2afc58644000-2afc58645000 rw-p 00003000 00:16 2647315    /home/lgu/mpich2-install/lib/libmpl.so.1.0.0
2afc58664000-2afc58667000 rw-p 2afc58664000 00:00 0
2afc58667000-2afc58669000 r-xp 00000000 00:12 830    /lib64/libdl.so.2
2afc58669000-2afc58869000 ---p 00002000 00:12 830    /lib64/libdl.so.2
2afc58869000-2afc5886a000 r--p 00002000 00:12 830    /lib64/libdl.so.2
2afc5886a000-2afc5886b000 rw-p 00003000 00:12 830    /lib64/libdl.so.2
2afc5886b000-2afc5886d000 rw-p 2afc5886b000 00:00 0
2afc5886d000-2afc588a2000 r--s 00000000 00:12 4024    /var/run/nscd/dbrOewAG (deleted)
7fffcf675000-7fffcf68a000 rw-p 7ffffffe9000 00:00 0    [stack]
7fffcf6d2000-7fffcf6d5000 r-xp 7fffcf6d2000 00:00 0    [vdso]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0    [vsyscall]
bpsh: Child process exited abnormally.
[mpiexec at flatline] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec at flatline] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): launcher returned error waiting for completion
[mpiexec at flatline] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
[mpiexec at flatline] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion
[root at flatline examples]#
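As a stopgap, here is a minimal sketch, assuming a bash-compatible shell, that avoids retyping the options on every run. Since the explicit command-line form above runs cleanly, it simply wraps that form in a shell function. The function name `mpiexec_bp` is hypothetical. The function also clears the two HYDRA_* variables for the call, because the crash in the log only appeared in the run where they were exported; that is an observation from this one log, not a confirmed cause.

```shell
# Hypothetical helper (could go in ~/.bashrc): always launch with the bprsh
# bootstrap given on the mpiexec command line rather than via HYDRA_* variables.
mpiexec_bp() {
    (
        # Subshell, so the unset below does not change the calling shell.
        # In the log above, the proxy crash was only seen with these exported.
        unset HYDRA_BOOTSTRAP HYDRA_BOOTSTRAP_EXEC
        mpiexec -bootstrap rsh -bootstrap-exec /usr/bin/bprsh "$@"
    )
}
```

Usage: `mpiexec_bp -n 12 -f machinefile ./cpi`. The function returns mpiexec's exit status, so it can be used in scripts the same way as the plain command.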



On Fri, Feb 18, 2011 at 11:47 AM, Pavan Balaji <balaji at mcs.anl.gov> wrote:

>
> Setting the HYDRA_BOOTSTRAP and HYDRA_BOOTSTRAP_EXEC environment variables
> should work. What error are you seeing?
>
> Alternatively, you can reconfigure MPICH2 with
> --with-hydra-bss=rsh,ssh,fork,slurm,ll,lsf,sge,pbs,none,persist
>
> This will reprioritize the launchers to give a higher priority to rsh.
>
>  -- Pavan
>
>
> On 02/18/2011 10:39 AM, Limin Gu wrote:
>
>> Hi,
>>
>> I have successfully built and run MPICH2 1.3.2 on our cluster. But since
>> we would rather use bprsh (an rsh-like launcher) between nodes, I have to
>> specify the bootstrap on the mpiexec command line, like this:
>>
>> mpiexec -bootstrap rsh -bootstrap-exec /usr/bin/bprsh
>>
>> It works, but is there a way to make rsh the default
>> bootstrap in some config file, so I don't have to specify it on every
>> mpiexec command?
>>
>> I have tried setting the HYDRA_BOOTSTRAP and HYDRA_BOOTSTRAP_EXEC
>> environment variables, but that didn't work.
>>
>> Thank you!
>>
>> Limin
>>
>>
>>
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>

