[mpich-discuss] Unable to run program parallely on cluster...Its running properly on single machine...
Jeff Hammond
jhammond at alcf.anl.gov
Sat May 26 16:08:13 CDT 2012
Rajeev answered your question two days ago.
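For reference, the in-place pattern Rajeev pointed you at looks like the sketch below, written with the C bindings. The `Data`, `Count`, and `Displ` names follow your quoted code, but the sizes and layout here are hypothetical. At the root, MPI_IN_PLACE replaces the send buffer (the send count and type are then ignored), and the root's own contribution must already sit at Data + Displ[root]:

```c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Hypothetical layout: 4 chars per rank, packed contiguously.
     * (Up to 16 ranks fit in these fixed-size arrays.) */
    int Count[16], Displ[16];
    char Data[64];
    for (int i = 0; i < size; i++) { Count[i] = 4; Displ[i] = 4 * i; }
    memset(Data, 'a' + rank, sizeof Data);

    if (rank == 0) {
        /* Root: its chunk is already at Data + Displ[0], so pass
         * MPI_IN_PLACE; sendcount/sendtype are ignored (MPI-2.2). */
        MPI_Gatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                    Data, Count, Displ, MPI_CHAR, 0, MPI_COMM_WORLD);
        printf("gathered: %.*s\n", 4 * size, Data);
    } else {
        /* Non-root: send buffer only; recv arguments are ignored here. */
        MPI_Gatherv(Data + Displ[rank], Count[rank], MPI_CHAR,
                    NULL, NULL, NULL, MPI_CHAR, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();  /* every rank must reach this, per Darius's note */
    return 0;
}
```

If your mpich2-1.4.1p1 build still rejects MPI_IN_PLACE in MPI_Gatherv, gathering into a receive buffer distinct from `Data` at the root avoids both that error and the aliasing error from your earlier run.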
Jeff
On Sat, May 26, 2012 at 3:53 PM, Albert Spade <albert.spade at gmail.com> wrote:
> Hi Jeff,
>
> Thanks for your reply.
> And sorry I didn't change the subject before.
> As you said, I am already using Gatherv, so why am I getting
> this error?
>
>
> MPI::COMM_WORLD.Gatherv(MPI_IN_PLACE, Count[rank],
> MPI::CHAR, (void*)(Data), Count, Displ, MPI::CHAR, 0);
> //MPI::COMM_WORLD.Gatherv((const
> void*)(Data+StartFrom[nStages-1][rank]), Count[rank], MPI::CHAR,
> (void*)(Data), Count, Displ, MPI::CHAR, 0);
>>
>>
>>
>> Message: 2
>> Date: Fri, 25 May 2012 09:46:11 -0500
>> From: Jeff Hammond <jhammond at alcf.anl.gov>
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] mpich-discuss Digest, Vol 44, Issue 36
>> Message-ID:
>>
>> <CAGKz=uKG+upZGHp-cZ5_2hv9von+KFjKO3QRdp3qcorJQh81_g at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> The error is pretty obvious in the output:
>>
>> PMPI_Gatherv(335): sendbuf cannot be MPI_IN_PLACE
>>
>> You cannot use MPI_IN_PLACE with MPI_Gather
>> (https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/163) but it is
>> allowed in MPI_Gatherv as of MPI 2.2, so I don't know why the
>> implementation does not allow this.
>>
>> Jeff
>>
>> On Fri, May 25, 2012 at 8:21 AM, Albert Spade <albert.spade at gmail.com>
>> wrote:
>> > Thanks Rajeev and Darius,
>> >
>> > I tried to use MPI_IN_PLACE but I am not getting the desired results. Can
>> > you please tell me how to make it work?
>> >
>> > This is the previous code :
>> >
>> >         //MPI::COMM_WORLD.Gatherv((const
>> > void*)(Data+StartFrom[nStages-1][rank]), Count[rank], MPI::CHAR,
>> > (void*)(Data), Count, Displ, MPI::CHAR, 0);
>> >
>> > And this is how I changed it.
>> >
>> > MPI::COMM_WORLD.Gatherv(MPI_IN_PLACE, Count[rank], MPI::CHAR,
>> > (void*)(Data), Count, Displ, MPI::CHAR, 0);
>> >
>> > Am I doing it wrong?
>> >
>> > Thanks.
>> >
>> > My output after making the above changes.
>> > ==============================
>> > [root at beowulf programs]# mpiexec -n 1 ./output
>> > Time taken for 16 elements using 1 processors = 2.81334e-05 seconds
>> > [root at beowulf programs]# mpiexec -n 2 ./output
>> > Fatal error in PMPI_Gatherv: Invalid buffer pointer, error stack:
>> > PMPI_Gatherv(398): MPI_Gatherv failed(sbuf=MPI_IN_PLACE, scount=64,
>> > MPI_CHAR, rbuf=0x879d500, rcnts=0x879d6b8, displs=0x879d6c8, MPI_CHAR,
>> > root=0, MPI_COMM_WORLD) failed
>> > PMPI_Gatherv(335): sendbuf cannot be MPI_IN_PLACE
>> >
>> >
>> > =====================================================================================
>> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> > =   EXIT CODE: 256
>> > =   CLEANING UP REMAINING PROCESSES
>> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> >
>> > =====================================================================================
>> > *** glibc detected *** mpiexec: double free or corruption (fasttop):
>> > 0x094fb038 ***
>> > ======= Backtrace: =========
>> > /lib/libc.so.6[0x7d4a31]
>> > mpiexec[0x8077b11]
>> > mpiexec[0x8053c7f]
>> > mpiexec[0x8053e73]
>> > mpiexec[0x805592a]
>> > mpiexec[0x8077186]
>> > mpiexec[0x807639e]
>> > mpiexec[0x80518f8]
>> > mpiexec[0x804ad65]
>> > /lib/libc.so.6(__libc_start_main+0xe6)[0x77cce6]
>> > mpiexec[0x804a061]
>> > ======= Memory map: ========
>> > 00547000-00548000 r-xp 00000000 00:00 0          [vdso]
>> > 0054b000-0068f000 r-xp 00000000 fd:00 939775
>> > /usr/lib/libxml2.so.2.7.6
>> > 0068f000-00694000 rw-p 00143000 fd:00 939775
>> > /usr/lib/libxml2.so.2.7.6
>> > 00694000-00695000 rw-p 00000000 00:00 0
>> > 00740000-0075e000 r-xp 00000000 fd:00 2105890    /lib/ld-2.12.so
>> > 0075e000-0075f000 r--p 0001d000 fd:00 2105890    /lib/ld-2.12.so
>> > 0075f000-00760000 rw-p 0001e000 fd:00 2105890    /lib/ld-2.12.so
>> > 00766000-008ef000 r-xp 00000000 fd:00 2105891    /lib/libc-2.12.so
>> > 008ef000-008f0000 ---p 00189000 fd:00 2105891    /lib/libc-2.12.so
>> > 008f0000-008f2000 r--p 00189000 fd:00 2105891    /lib/libc-2.12.so
>> > 008f2000-008f3000 rw-p 0018b000 fd:00 2105891    /lib/libc-2.12.so
>> > 008f3000-008f6000 rw-p 00000000 00:00 0
>> > 008f8000-008fb000 r-xp 00000000 fd:00 2105893    /lib/libdl-2.12.so
>> > 008fb000-008fc000 r--p 00002000 fd:00 2105893    /lib/libdl-2.12.so
>> > 008fc000-008fd000 rw-p 00003000 fd:00 2105893    /lib/libdl-2.12.so
>> > 008ff000-00916000 r-xp 00000000 fd:00 2105900    /lib/libpthread-2.12.so
>> > 00916000-00917000 r--p 00016000 fd:00 2105900    /lib/libpthread-2.12.so
>> > 00917000-00918000 rw-p 00017000 fd:00 2105900    /lib/libpthread-2.12.so
>> > 00918000-0091a000 rw-p 00000000 00:00 0
>> > 0091c000-0092e000 r-xp 00000000 fd:00 2105904    /lib/libz.so.1.2.3
>> > 0092e000-0092f000 r--p 00011000 fd:00 2105904    /lib/libz.so.1.2.3
>> > 0092f000-00930000 rw-p 00012000 fd:00 2105904    /lib/libz.so.1.2.3
>> > 00932000-0095a000 r-xp 00000000 fd:00 2098429    /lib/libm-2.12.so
>> > 0095a000-0095b000 r--p 00027000 fd:00 2098429    /lib/libm-2.12.so
>> > 0095b000-0095c000 rw-p 00028000 fd:00 2098429    /lib/libm-2.12.so
>> > 00bb0000-00bcd000 r-xp 00000000 fd:00 2105914
>> > /lib/libgcc_s-4.4.6-20110824.so.1
>> > 00bcd000-00bce000 rw-p 0001d000 fd:00 2105914
>> > /lib/libgcc_s-4.4.6-20110824.so.1
>> > 00c18000-00c24000 r-xp 00000000 fd:00 2098123
>> > /lib/libnss_files-2.12.so
>> > 00c24000-00c25000 r--p 0000b000 fd:00 2098123
>> > /lib/libnss_files-2.12.so
>> > 00c25000-00c26000 rw-p 0000c000 fd:00 2098123
>> > /lib/libnss_files-2.12.so
>> > 00ce9000-00d00000 r-xp 00000000 fd:00 2105929    /lib/libnsl-2.12.so
>> > 00d00000-00d01000 r--p 00016000 fd:00 2105929    /lib/libnsl-2.12.so
>> > 00d01000-00d02000 rw-p 00017000 fd:00 2105929    /lib/libnsl-2.12.so
>> > 00d02000-00d04000 rw-p 00000000 00:00 0
>> > 08048000-080a0000 r-xp 00000000 fd:00 656990
>> > /opt/mpich2-1.4.1p1/bin/bin/mpiexec.hydra
>> > 080a0000-080a1000 rw-p 00058000 fd:00 656990
>> > /opt/mpich2-1.4.1p1/bin/bin/mpiexec.hydra
>> > 080a1000-080a3000 rw-p 00000000 00:00 0
>> > 094ee000-0950f000 rw-p 00000000 00:00 0          [heap]
>> > b7893000-b7896000 rw-p 00000000 00:00 0
>> > b78a4000-b78a7000 rw-p 00000000 00:00 0
>> > bff80000-bff95000 rw-p 00000000 00:00 0          [stack]
>> > Aborted (core dumped)
>> > [root at beowulf programs]#
>> >
>> >
>> > On Tue, May 22, 2012 at 10:30 PM, <mpich-discuss-request at mcs.anl.gov>
>> > wrote:
>> >>
>> >>
>> >>
>> >> ----------------------------------------------------------------------
>> >>
>> >> Message: 1
>> >> Date: Tue, 22 May 2012 00:12:24 +0530
>> >> From: Albert Spade <albert.spade at gmail.com>
>> >> To: mpich-discuss at mcs.anl.gov
>> >> Subject: [mpich-discuss] Unable to run program parallely on cluster...
>> >>         Its running properly on single machine...
>> >> Message-ID:
>> >>
>> >> <CAP2uaQopgOwaFNfCF49gcnW9REw8CQtWGMgf0U8RyNYStTFw1A at mail.gmail.com>
>> >> Content-Type: text/plain; charset="iso-8859-1"
>> >>
>> >> Hi everybody,
>> >>
>> >> I am using mpich2-1.4.1p1 and mpiexec from hydra-1.5b1.
>> >> I have a cluster of 5 machines.
>> >> When I try to run the parallel fast Fourier transform program on a
>> >> single machine it runs correctly, but on the cluster it gives an error.
>> >> Can you please tell me why this is happening?
>> >>
>> >> Thanks.
>> >>
>> >> Here is my sample output:
>> >>
>> >>
>> >> ---------------------------------------------------------------------------------------
>> >>
>> >> [root at beowulf programs]# mpiexec -n 1 ./Radix2
>> >> Time taken for 16 elements using 1 processors = 2.7895e-05 seconds
>> >> [root at beowulf programs]#
>> >> [root at beowulf programs]# mpiexec -n 4 ./Radix2
>> >> [mpiexec at beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):
>> >> assert
>> >> (!closed) failed
>> >> [mpiexec at beowulf.master] HYDT_dmxu_poll_wait_for_event
>> >> (./tools/demux/demux_poll.c:77): callback returned error status
>> >> [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
>> >> (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
>> >> [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
>> >> manager
>> >> error waiting for completion
>> >> [root at beowulf programs]# mpiexec -n 2 ./Radix2
>> >> [mpiexec at beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):
>> >> assert
>> >> (!closed) failed
>> >> [mpiexec at beowulf.master] HYDT_dmxu_poll_wait_for_event
>> >> (./tools/demux/demux_poll.c:77): callback returned error status
>> >> [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
>> >> (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
>> >> [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
>> >> manager
>> >> error waiting for completion
>> >> [root at beowulf programs]# mpiexec -n 4 ./Radix2
>> >> [mpiexec at beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):
>> >> assert
>> >> (!closed) failed
>> >> [mpiexec at beowulf.master] HYDT_dmxu_poll_wait_for_event
>> >> (./tools/demux/demux_poll.c:77): callback returned error status
>> >> [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
>> >> (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
>> >> [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
>> >> manager
>> >> error waiting for completion
>> >> [root at beowulf programs]#
>> >>
>> >> ------------------------------
>> >>
>> >> Message: 2
>> >> Date: Tue, 22 May 2012 00:59:27 +0530
>> >> From: Albert Spade <albert.spade at gmail.com>
>> >> To: mpich-discuss at mcs.anl.gov
>> >> Subject: [mpich-discuss] Not able to run program parallely on
>> >>         cluster...
>> >> Message-ID:
>> >>
>> >> <CAP2uaQpiMV0yqHsHfsWpgAQ=_K3M_ZGxsCm-S5BPvzbxH+Z9zQ at mail.gmail.com>
>> >> Content-Type: text/plain; charset="iso-8859-1"
>> >>
>> >> This is my new error after making a few changes...
>> >> The results are quite similar... No success with the cluster...
>> >>
>> >> Sample run
>> >> --------------------------------------------------------
>> >>
>> >> [root at beowulf testing]# mpiexec -n 1 ./Radix
>> >> Time taken for 16 elements using 1 processors = 4.72069e-05 seconds
>> >> [root at beowulf testing]# mpiexec -n 2 ./Radix
>> >> Fatal error in PMPI_Gatherv: Internal MPI error!, error stack:
>> >> PMPI_Gatherv(398).....: MPI_Gatherv failed(sbuf=0x97d0500, scount=64,
>> >> MPI_CHAR, rbuf=0x97d0500, rcnts=0x97d06b8, displs=0x97d06c8, MPI_CHAR,
>> >> root=0, MPI_COMM_WORLD) failed
>> >> MPIR_Gatherv_impl(210):
>> >> MPIR_Gatherv(104).....:
>> >> MPIR_Localcopy(357)...: memcpy arguments alias each other,
>> >> dst=0x97d0500
>> >> src=0x97d0500 len=64
>> >>
>> >>
>> >>
>> >> =====================================================================================
>> >> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> >> =   EXIT CODE: 256
>> >> =   CLEANING UP REMAINING PROCESSES
>> >> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> >>
>> >>
>> >> =====================================================================================
>> >> [proxy:0:1 at beowulf.node1] HYD_pmcd_pmip_control_cmd_cb
>> >> (./pm/pmiserv/pmip_cb.c:927): assert (!closed) failed
>> >> [proxy:0:1 at beowulf.node1] HYDT_dmxu_poll_wait_for_event
>> >> (./tools/demux/demux_poll.c:77): callback returned error status
>> >> [proxy:0:1 at beowulf.node1] main (./pm/pmiserv/pmip.c:221): demux engine
>> >> error waiting for event
>> >> [mpiexec at beowulf.master] HYDT_bscu_wait_for_completion
>> >> (./tools/bootstrap/utils/bscu_wait.c:77): one of the processes
>> >> terminated
>> >> badly; aborting
>> >> [mpiexec at beowulf.master] HYDT_bsci_wait_for_completion
>> >> (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting
>> >> for
>> >> completion
>> >> [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
>> >> (./pm/pmiserv/pmiserv_pmci.c:225): launcher returned error waiting for
>> >> completion
>> >> [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
>> >> manager
>> >> error waiting for completion
>> >> [root at beowulf testing]#
>> >>
>> >> ------------------------------
>> >>
>> >> Message: 3
>> >> Date: Tue, 22 May 2012 03:36:44 +0800
>> >> From: Darius Buntinas <buntinas at mcs.anl.gov>
>> >> To: mpich-discuss at mcs.anl.gov
>> >> Subject: Re: [mpich-discuss] Unable to run program parallely on
>> >>         cluster...      Its running properly on single machine...
>> >> Message-ID: <B411B6C1-CB5A-4A1C-AEBB-71680C9AF8C5 at mcs.anl.gov>
>> >> Content-Type: text/plain; charset=us-ascii
>> >>
>> >> It may be that one of your processes is failing, but also check to make
>> >> sure every process is calling MPI_Finalize before exiting.
>> >>
>> >> -d
>> >>
>> >> On May 22, 2012, at 2:42 AM, Albert Spade wrote:
>> >>
>> >> > Hi everybody,
>> >> >
>> >> > I am using mpich2-1.4.1p1 and mpiexec from hydra-1.5b1
>> >> > I have a cluster of 5 machines.
>> >> > When I try to run the parallel fast Fourier transform program
>> >> > on a single machine it runs correctly, but on the cluster it gives
>> >> > an error. Can you please tell me why this is happening?
>> >> >
>> >> > Thanks.
>> >> >
>> >> > Here is my sample output:
>> >> >
>> >> >
>> >> > ---------------------------------------------------------------------------------------
>> >> >
>> >> > [root at beowulf programs]# mpiexec -n 1 ./Radix2
>> >> > Time taken for 16 elements using 1 processors = 2.7895e-05 seconds
>> >> > [root at beowulf programs]#
>> >> > [root at beowulf programs]# mpiexec -n 4 ./Radix2
>> >> > [mpiexec at beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):
>> >> > assert (!closed) failed
>> >> > [mpiexec at beowulf.master] HYDT_dmxu_poll_wait_for_event
>> >> > (./tools/demux/demux_poll.c:77): callback returned error status
>> >> > [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
>> >> > (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
>> >> > [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
>> >> > manager error waiting for completion
>> >> > [root at beowulf programs]# mpiexec -n 2 ./Radix2
>> >> > [mpiexec at beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):
>> >> > assert (!closed) failed
>> >> > [mpiexec at beowulf.master] HYDT_dmxu_poll_wait_for_event
>> >> > (./tools/demux/demux_poll.c:77): callback returned error status
>> >> > [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
>> >> > (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
>> >> > [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
>> >> > manager error waiting for completion
>> >> > [root at beowulf programs]# mpiexec -n 4 ./Radix2
>> >> > [mpiexec at beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):
>> >> > assert (!closed) failed
>> >> > [mpiexec at beowulf.master] HYDT_dmxu_poll_wait_for_event
>> >> > (./tools/demux/demux_poll.c:77): callback returned error status
>> >> > [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
>> >> > (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
>> >> > [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
>> >> > manager error waiting for completion
>> >> > [root at beowulf programs]#
>> >> > _______________________________________________
>> >> > mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>> >> > To manage subscription options or unsubscribe:
>> >> > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>> >>
>> >>
>> >>
>> >> ------------------------------
>> >>
>> >> Message: 4
>> >> Date: Mon, 21 May 2012 20:14:35 -0500
>> >> From: Rajeev Thakur <thakur at mcs.anl.gov>
>> >> To: mpich-discuss at mcs.anl.gov
>> >> Subject: Re: [mpich-discuss] Not able to run program parallely on
>> >>         cluster...
>> >> Message-ID: <8C80534E-3611-40D7-BBAF-F66110D25EE1 at mcs.anl.gov>
>> >> Content-Type: text/plain; charset=us-ascii
>> >>
>> >> You are passing the same buffer as the sendbuf and recvbuf to
>> >> MPI_Gatherv,
>> >> which is not allowed in MPI. Use MPI_IN_PLACE as described in the
>> >> standard.
>> >>
>> >>
>> >> On May 21, 2012, at 2:29 PM, Albert Spade wrote:
>> >>
>> >> > This is my new error after making a few changes...
>> >> > The results are quite similar... No success with the cluster...
>> >> >
>> >> > Sample run
>> >> > --------------------------------------------------------
>> >> >
>> >> > [root at beowulf testing]# mpiexec -n 1 ./Radix
>> >> > Time taken for 16 elements using 1 processors = 4.72069e-05 seconds
>> >> > [root at beowulf testing]# mpiexec -n 2 ./Radix
>> >> > Fatal error in PMPI_Gatherv: Internal MPI error!, error stack:
>> >> > PMPI_Gatherv(398).....: MPI_Gatherv failed(sbuf=0x97d0500, scount=64,
>> >> > MPI_CHAR, rbuf=0x97d0500, rcnts=0x97d06b8, displs=0x97d06c8,
>> >> > MPI_CHAR,
>> >> > root=0, MPI_COMM_WORLD) failed
>> >> > MPIR_Gatherv_impl(210):
>> >> > MPIR_Gatherv(104).....:
>> >> > MPIR_Localcopy(357)...: memcpy arguments alias each other,
>> >> > dst=0x97d0500
>> >> > src=0x97d0500 len=64
>> >> >
>> >> >
>> >> > =====================================================================================
>> >> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> >> > =   EXIT CODE: 256
>> >> > =   CLEANING UP REMAINING PROCESSES
>> >> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> >> >
>> >> >
>> >> > =====================================================================================
>> >> > [proxy:0:1 at beowulf.node1] HYD_pmcd_pmip_control_cmd_cb
>> >> > (./pm/pmiserv/pmip_cb.c:927): assert (!closed) failed
>> >> > [proxy:0:1 at beowulf.node1] HYDT_dmxu_poll_wait_for_event
>> >> > (./tools/demux/demux_poll.c:77): callback returned error status
>> >> > [proxy:0:1 at beowulf.node1] main (./pm/pmiserv/pmip.c:221): demux
>> >> > engine
>> >> > error waiting for event
>> >> > [mpiexec at beowulf.master] HYDT_bscu_wait_for_completion
>> >> > (./tools/bootstrap/utils/bscu_wait.c:77): one of the processes
>> >> > terminated
>> >> > badly; aborting
>> >> > [mpiexec at beowulf.master] HYDT_bsci_wait_for_completion
>> >> > (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error
>> >> > waiting for
>> >> > completion
>> >> > [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
>> >> > (./pm/pmiserv/pmiserv_pmci.c:225): launcher returned error waiting
>> >> > for
>> >> > completion
>> >> > [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
>> >> > manager error waiting for completion
>> >> > [root at beowulf testing]#
>> >> >
>>
>
>
--
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond