[mpich-discuss] How to use MPI_IN_PLACE properly in MPI::COMM_WORLD.Gatherv?

Albert Spade albert.spade at gmail.com
Sun May 27 07:15:17 CDT 2012


Hi Jeff,

As Rajeev said, I used MPI_IN_PLACE instead of sendbuf, but I am still not
able to get things right.
That's why I posted the code I used and the error.

I am using :
MPI::COMM_WORLD.Gatherv(MPI_IN_PLACE, Count[rank], MPI::CHAR,
(void*)(Data), Count, Displ, MPI::CHAR, 0);

instead of my previous code:
//MPI::COMM_WORLD.Gatherv((const void*)(Data+StartFrom[nStages-1][rank]),
Count[rank], MPI::CHAR, (void*)(Data), Count, Displ, MPI::CHAR, 0);

and I am still getting the error. What am I doing wrong?
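
Is the following the pattern I should be using instead? This is only a sketch
based on my reading of the MPI 2.2 standard, reusing my Data, Count, Displ and
StartFrom arrays and assuming root is rank 0 as in the calls above: only the
root passes MPI_IN_PLACE, and the root's own block has to already sit at its
final offset inside Data.

    if (rank == 0) {
        // Root gathers in place; sendcount/sendtype are ignored here, and the
        // root's contribution is assumed to already be at Data + Displ[0].
        MPI::COMM_WORLD.Gatherv(MPI_IN_PLACE, Count[rank], MPI::CHAR,
                                (void*)(Data), Count, Displ, MPI::CHAR, 0);
    } else {
        // Non-root ranks pass their real send buffer, never MPI_IN_PLACE;
        // their recvbuf/recvcounts/displs arguments are ignored anyway.
        MPI::COMM_WORLD.Gatherv((const void*)(Data+StartFrom[nStages-1][rank]),
                                Count[rank], MPI::CHAR,
                                (void*)(Data), Count, Displ, MPI::CHAR, 0);
    }

(I am guessing the "sendbuf cannot be MPI_IN_PLACE" check is what the non-root
ranks trip over when every rank passes MPI_IN_PLACE, but please correct me if
that is wrong.)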

Thanks...

On Sun, May 27, 2012 at 7:29 AM, <mpich-discuss-request at mcs.anl.gov> wrote:

> Send mpich-discuss mailing list submissions to
>        mpich-discuss at mcs.anl.gov
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> or, via email, send a message with subject or body 'help' to
>        mpich-discuss-request at mcs.anl.gov
>
> You can reach the person managing the list at
>        mpich-discuss-owner at mcs.anl.gov
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of mpich-discuss digest..."
>
>
> Today's Topics:
>
>   1. Re:  Unable to run program parallely on cluster...Its running
>      properly on single machine... (Jeff Hammond)
>   2.  Help Mpich2 (angelo pascualetti)
>   3. Re:  Problem during running a parallel processing.
>      (Sinta Kartika Maharani)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 26 May 2012 16:08:13 -0500
> From: Jeff Hammond <jhammond at alcf.anl.gov>
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Unable to run program parallely on
>        cluster...Its running properly on single machine...
> Message-ID:
>        <CAGKz=u++EaYOLCbs3Stda4MWti2bS3JD3jaoa1dm2jGU0usYkQ at mail.gmail.com
> >
> Content-Type: text/plain; charset=ISO-8859-1
>
> Rajeev answered your question two days ago.
>
> Jeff
>
> On Sat, May 26, 2012 at 3:53 PM, Albert Spade <albert.spade at gmail.com>
> wrote:
> > Hi Jeff,
> >
> > Thanks for your reply.
> > And sorry I didn't change the subject before.
> > What I mean is: as you said, I am already using Gatherv, so why am I
> > getting this error?
> >
> >
> > MPI::COMM_WORLD.Gatherv(MPI_IN_PLACE, Count[rank], MPI::CHAR,
> > (void*)(Data), Count, Displ, MPI::CHAR, 0);
> > //MPI::COMM_WORLD.Gatherv((const void*)(Data+StartFrom[nStages-1][rank]),
> > Count[rank], MPI::CHAR, (void*)(Data), Count, Displ, MPI::CHAR, 0);
> >>
> >>
> >>
> >> Message: 2
> >> Date: Fri, 25 May 2012 09:46:11 -0500
> >> From: Jeff Hammond <jhammond at alcf.anl.gov>
> >> To: mpich-discuss at mcs.anl.gov
> >> Subject: Re: [mpich-discuss] mpich-discuss Digest, Vol 44, Issue 36
> >> Message-ID:
> >>
> >> <CAGKz=uKG+upZGHp-cZ5_2hv9von+KFjKO3QRdp3qcorJQh81_g at mail.gmail.com>
> >> Content-Type: text/plain; charset=ISO-8859-1
> >>
> >> The error is pretty obvious in the output:
> >>
> >> PMPI_Gatherv(335): sendbuf cannot be MPI_IN_PLACE
> >>
> >> You cannot use MPI_IN_PLACE with MPI_Gather
> >> (https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/163) but it is
> >> allowed in MPI_Gatherv as of MPI 2.2, so I don't know why the
> >> implementation does not allow this.
> >>
> >> Jeff
> >>
> >> On Fri, May 25, 2012 at 8:21 AM, Albert Spade <albert.spade at gmail.com>
> >> wrote:
> >> > Thanks Rajeev and Darius,
> >> >
> >> > I tried to use MPI_IN_PLACE but I am not getting the desired results.
> >> > Can you please tell me how to make it work?
> >> >
> >> > This is the previous code :
> >> >
> >> > //MPI::COMM_WORLD.Gatherv((const void*)(Data+StartFrom[nStages-1][rank]),
> >> > Count[rank], MPI::CHAR, (void*)(Data), Count, Displ, MPI::CHAR, 0);
> >> >
> >> > And this is how I changed it.
> >> >
> >> > MPI::COMM_WORLD.Gatherv(MPI_IN_PLACE, Count[rank], MPI::CHAR,
> >> > (void*)(Data), Count, Displ, MPI::CHAR, 0);
> >> >
> >> > Am I doing it wrong?
> >> >
> >> > Thanks.
> >> >
> >> > My output after making above changes.
> >> > ==============================
> >> > [root at beowulf programs]# mpiexec -n 1 ./output
> >> > Time taken for 16 elements using 1 processors = 2.81334e-05 seconds
> >> > [root at beowulf programs]# mpiexec -n 2 ./output
> >> > Fatal error in PMPI_Gatherv: Invalid buffer pointer, error stack:
> >> > PMPI_Gatherv(398): MPI_Gatherv failed(sbuf=MPI_IN_PLACE, scount=64,
> >> > MPI_CHAR, rbuf=0x879d500, rcnts=0x879d6b8, displs=0x879d6c8, MPI_CHAR,
> >> > root=0, MPI_COMM_WORLD) failed
> >> > PMPI_Gatherv(335): sendbuf cannot be MPI_IN_PLACE
> >> >
> >> >
> >> >
> >> > =====================================================================================
> >> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> >> > =   EXIT CODE: 256
> >> > =   CLEANING UP REMAINING PROCESSES
> >> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >> > =====================================================================================
> >> > *** glibc detected *** mpiexec: double free or corruption (fasttop):
> >> > 0x094fb038 ***
> >> > ======= Backtrace: =========
> >> > /lib/libc.so.6[0x7d4a31]
> >> > mpiexec[0x8077b11]
> >> > mpiexec[0x8053c7f]
> >> > mpiexec[0x8053e73]
> >> > mpiexec[0x805592a]
> >> > mpiexec[0x8077186]
> >> > mpiexec[0x807639e]
> >> > mpiexec[0x80518f8]
> >> > mpiexec[0x804ad65]
> >> > /lib/libc.so.6(__libc_start_main+0xe6)[0x77cce6]
> >> > mpiexec[0x804a061]
> >> > ======= Memory map: ========
> >> > 00547000-00548000 r-xp 00000000 00:00 0          [vdso]
> >> > 0054b000-0068f000 r-xp 00000000 fd:00 939775     /usr/lib/libxml2.so.2.7.6
> >> > 0068f000-00694000 rw-p 00143000 fd:00 939775     /usr/lib/libxml2.so.2.7.6
> >> > 00694000-00695000 rw-p 00000000 00:00 0
> >> > 00740000-0075e000 r-xp 00000000 fd:00 2105890    /lib/ld-2.12.so
> >> > 0075e000-0075f000 r--p 0001d000 fd:00 2105890    /lib/ld-2.12.so
> >> > 0075f000-00760000 rw-p 0001e000 fd:00 2105890    /lib/ld-2.12.so
> >> > 00766000-008ef000 r-xp 00000000 fd:00 2105891    /lib/libc-2.12.so
> >> > 008ef000-008f0000 ---p 00189000 fd:00 2105891    /lib/libc-2.12.so
> >> > 008f0000-008f2000 r--p 00189000 fd:00 2105891    /lib/libc-2.12.so
> >> > 008f2000-008f3000 rw-p 0018b000 fd:00 2105891    /lib/libc-2.12.so
> >> > 008f3000-008f6000 rw-p 00000000 00:00 0
> >> > 008f8000-008fb000 r-xp 00000000 fd:00 2105893    /lib/libdl-2.12.so
> >> > 008fb000-008fc000 r--p 00002000 fd:00 2105893    /lib/libdl-2.12.so
> >> > 008fc000-008fd000 rw-p 00003000 fd:00 2105893    /lib/libdl-2.12.so
> >> > 008ff000-00916000 r-xp 00000000 fd:00 2105900    /lib/libpthread-2.12.so
> >> > 00916000-00917000 r--p 00016000 fd:00 2105900    /lib/libpthread-2.12.so
> >> > 00917000-00918000 rw-p 00017000 fd:00 2105900    /lib/libpthread-2.12.so
> >> > 00918000-0091a000 rw-p 00000000 00:00 0
> >> > 0091c000-0092e000 r-xp 00000000 fd:00 2105904    /lib/libz.so.1.2.3
> >> > 0092e000-0092f000 r--p 00011000 fd:00 2105904    /lib/libz.so.1.2.3
> >> > 0092f000-00930000 rw-p 00012000 fd:00 2105904    /lib/libz.so.1.2.3
> >> > 00932000-0095a000 r-xp 00000000 fd:00 2098429    /lib/libm-2.12.so
> >> > 0095a000-0095b000 r--p 00027000 fd:00 2098429    /lib/libm-2.12.so
> >> > 0095b000-0095c000 rw-p 00028000 fd:00 2098429    /lib/libm-2.12.so
> >> > 00bb0000-00bcd000 r-xp 00000000 fd:00 2105914    /lib/libgcc_s-4.4.6-20110824.so.1
> >> > 00bcd000-00bce000 rw-p 0001d000 fd:00 2105914    /lib/libgcc_s-4.4.6-20110824.so.1
> >> > 00c18000-00c24000 r-xp 00000000 fd:00 2098123    /lib/libnss_files-2.12.so
> >> > 00c24000-00c25000 r--p 0000b000 fd:00 2098123    /lib/libnss_files-2.12.so
> >> > 00c25000-00c26000 rw-p 0000c000 fd:00 2098123    /lib/libnss_files-2.12.so
> >> > 00ce9000-00d00000 r-xp 00000000 fd:00 2105929    /lib/libnsl-2.12.so
> >> > 00d00000-00d01000 r--p 00016000 fd:00 2105929    /lib/libnsl-2.12.so
> >> > 00d01000-00d02000 rw-p 00017000 fd:00 2105929    /lib/libnsl-2.12.so
> >> > 00d02000-00d04000 rw-p 00000000 00:00 0
> >> > 08048000-080a0000 r-xp 00000000 fd:00 656990     /opt/mpich2-1.4.1p1/bin/bin/mpiexec.hydra
> >> > 080a0000-080a1000 rw-p 00058000 fd:00 656990     /opt/mpich2-1.4.1p1/bin/bin/mpiexec.hydra
> >> > 080a1000-080a3000 rw-p 00000000 00:00 0
> >> > 094ee000-0950f000 rw-p 00000000 00:00 0          [heap]
> >> > b7893000-b7896000 rw-p 00000000 00:00 0
> >> > b78a4000-b78a7000 rw-p 00000000 00:00 0
> >> > bff80000-bff95000 rw-p 00000000 00:00 0          [stack]
> >> > Aborted (core dumped)
> >> > [root at beowulf programs]#
> >> >
> >> >
> >> > On Tue, May 22, 2012 at 10:30 PM, <mpich-discuss-request at mcs.anl.gov>
> >> > wrote:
> >> >>
> >> >>
> >> >> Today's Topics:
> >> >>
> >> >>   1.  Unable to run program parallely on cluster... Its running
> >> >>      properly on single machine... (Albert Spade)
> >> >>   2.  Not able to run program parallely on cluster... (Albert Spade)
> >> >>   3. Re:  Unable to run program parallely on cluster...        Its
> >> >>      running properly on single machine... (Darius Buntinas)
> >> >>   4. Re:  Not able to run program parallely on cluster...
> >> >>      (Rajeev Thakur)
> >> >>   5.  replication of mpi applications (Thomas Ropars)
> >> >>
> >> >>
> >> >>
> ----------------------------------------------------------------------
> >> >>
> >> >> Message: 1
> >> >> Date: Tue, 22 May 2012 00:12:24 +0530
> >> >> From: Albert Spade <albert.spade at gmail.com>
> >> >> To: mpich-discuss at mcs.anl.gov
> >> >> Subject: [mpich-discuss] Unable to run program parallely on
> cluster...
> >> >> ? ? ? ?Its running properly on single machine...
> >> >> Message-ID:
> >> >>
> >> >> <CAP2uaQopgOwaFNfCF49gcnW9REw8CQtWGMgf0U8RyNYStTFw1A at mail.gmail.com
> >
> >> >> Content-Type: text/plain; charset="iso-8859-1"
> >> >>
> >> >> Hi everybody,
> >> >>
> >> >> I am using mpich2-1.4.1p1 and mpiexec from hydra-1.5b1
> >> >> I have a cluster of 5 machines.
> >> >> When I try to run the program for a parallel fast Fourier transform on a
> >> >> single machine it runs correctly, but on the cluster it gives an error.
> >> >> Can you please tell me why this is happening?
> >> >>
> >> >> Thanks.
> >> >>
> >> >> Here is my sample output:
> >> >>
> >> >>
> >> >>
> ---------------------------------------------------------------------------------------
> >> >>
> >> >> [root at beowulf programs]# mpiexec -n 1 ./Radix2
> >> >> Time taken for 16 elements using 1 processors = 2.7895e-05 seconds
> >> >> [root at beowulf programs]#
> >> >> [root at beowulf programs]# mpiexec -n 4 ./Radix2
> >> >> [mpiexec at beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):
> >> >> assert
> >> >> (!closed) failed
> >> >> [mpiexec at beowulf.master] HYDT_dmxu_poll_wait_for_event
> >> >> (./tools/demux/demux_poll.c:77): callback returned error status
> >> >> [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
> >> >> (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
> >> >> [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
> >> >> manager
> >> >> error waiting for completion
> >> >> [root at beowulf programs]# mpiexec -n 2 ./Radix2
> >> >> [mpiexec at beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):
> >> >> assert
> >> >> (!closed) failed
> >> >> [mpiexec at beowulf.master] HYDT_dmxu_poll_wait_for_event
> >> >> (./tools/demux/demux_poll.c:77): callback returned error status
> >> >> [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
> >> >> (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
> >> >> [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
> >> >> manager
> >> >> error waiting for completion
> >> >> [root at beowulf programs]# mpiexec -n 4 ./Radix2
> >> >> [mpiexec at beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):
> >> >> assert
> >> >> (!closed) failed
> >> >> [mpiexec at beowulf.master] HYDT_dmxu_poll_wait_for_event
> >> >> (./tools/demux/demux_poll.c:77): callback returned error status
> >> >> [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
> >> >> (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
> >> >> [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
> >> >> manager
> >> >> error waiting for completion
> >> >> [root at beowulf programs]#
> >> >>
> >> >> ------------------------------
> >> >>
> >> >> Message: 2
> >> >> Date: Tue, 22 May 2012 00:59:27 +0530
> >> >> From: Albert Spade <albert.spade at gmail.com>
> >> >> To: mpich-discuss at mcs.anl.gov
> >> >> Subject: [mpich-discuss] Not able to run program parallely on
> >> >> ? ? ? ?cluster...
> >> >> Message-ID:
> >> >>
> >> >> <CAP2uaQpiMV0yqHsHfsWpgAQ=_K3M_ZGxsCm-S5BPvzbxH+Z9zQ at mail.gmail.com
> >
> >> >> Content-Type: text/plain; charset="iso-8859-1"
> >> >>
> >> >> This is my new error after making a few changes...
> >> >> The results are quite similar... No success with the cluster...
> >> >>
> >> >> Sample run
> >> >> --------------------------------------------------------
> >> >>
> >> >> [root at beowulf testing]# mpiexec -n 1 ./Radix
> >> >> Time taken for 16 elements using 1 processors = 4.72069e-05 seconds
> >> >> [root at beowulf testing]# mpiexec -n 2 ./Radix
> >> >> Fatal error in PMPI_Gatherv: Internal MPI error!, error stack:
> >> >> PMPI_Gatherv(398).....: MPI_Gatherv failed(sbuf=0x97d0500, scount=64,
> >> >> MPI_CHAR, rbuf=0x97d0500, rcnts=0x97d06b8, displs=0x97d06c8,
> MPI_CHAR,
> >> >> root=0, MPI_COMM_WORLD) failed
> >> >> MPIR_Gatherv_impl(210):
> >> >> MPIR_Gatherv(104).....:
> >> >> MPIR_Localcopy(357)...: memcpy arguments alias each other,
> >> >> dst=0x97d0500
> >> >> src=0x97d0500 len=64
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> =====================================================================================
> >> >> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> >> >> =   EXIT CODE: 256
> >> >> =   CLEANING UP REMAINING PROCESSES
> >> >> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >> >> =====================================================================================
> >> >> [proxy:0:1 at beowulf.node1] HYD_pmcd_pmip_control_cmd_cb
> >> >> (./pm/pmiserv/pmip_cb.c:927): assert (!closed) failed
> >> >> [proxy:0:1 at beowulf.node1] HYDT_dmxu_poll_wait_for_event
> >> >> (./tools/demux/demux_poll.c:77): callback returned error status
> >> >> [proxy:0:1 at beowulf.node1] main (./pm/pmiserv/pmip.c:221): demux
> engine
> >> >> error waiting for event
> >> >> [mpiexec at beowulf.master] HYDT_bscu_wait_for_completion
> >> >> (./tools/bootstrap/utils/bscu_wait.c:77): one of the processes
> >> >> terminated
> >> >> badly; aborting
> >> >> [mpiexec at beowulf.master] HYDT_bsci_wait_for_completion
> >> >> (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error
> waiting
> >> >> for
> >> >> completion
> >> >> [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
> >> >> (./pm/pmiserv/pmiserv_pmci.c:225): launcher returned error waiting
> for
> >> >> completion
> >> >> [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
> >> >> manager
> >> >> error waiting for completion
> >> >> [root at beowulf testing]#
> >> >>
> >> >> ------------------------------
> >> >>
> >> >> Message: 3
> >> >> Date: Tue, 22 May 2012 03:36:44 +0800
> >> >> From: Darius Buntinas <buntinas at mcs.anl.gov>
> >> >> To: mpich-discuss at mcs.anl.gov
> >> >> Subject: Re: [mpich-discuss] Unable to run program parallely on
> >> >> ? ? ? ?cluster... ? ? ?Its running properly on single machine...
> >> >> Message-ID: <B411B6C1-CB5A-4A1C-AEBB-71680C9AF8C5 at mcs.anl.gov>
> >> >> Content-Type: text/plain; charset=us-ascii
> >> >>
> >> >> It may be that one of your processes is failing, but also check to
> make
> >> >> sure every process is calling MPI_Finalize before exiting.
> >> >>
> >> >> -d
> >> >>
> >> >> On May 22, 2012, at 2:42 AM, Albert Spade wrote:
> >> >>
> >> >> > Hi everybody,
> >> >> >
> >> >> > I am using mpich2-1.4.1p1 and mpiexec from hydra-1.5b1
> >> >> > I have a cluster of 5 machines.
> >> >> > When I try to run the program for a parallel fast Fourier transform on a
> >> >> > single machine it runs correctly, but on the cluster it gives an error.
> >> >> > Can you please tell me why this is happening?
> >> >> >
> >> >> > Thanks.
> >> >> >
> >> >> > Here is my sample output:
> >> >> >
> >> >> >
> >> >> >
> ---------------------------------------------------------------------------------------
> >> >> >
> >> >> > [root at beowulf programs]# mpiexec -n 1 ./Radix2
> >> >> > Time taken for 16 elements using 1 processors = 2.7895e-05 seconds
> >> >> > [root at beowulf programs]#
> >> >> > [root at beowulf programs]# mpiexec -n 4 ./Radix2
> >> >> > [mpiexec at beowulf.master] control_cb
> (./pm/pmiserv/pmiserv_cb.c:197):
> >> >> > assert (!closed) failed
> >> >> > [mpiexec at beowulf.master] HYDT_dmxu_poll_wait_for_event
> >> >> > (./tools/demux/demux_poll.c:77): callback returned error status
> >> >> > [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
> >> >> > (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
> >> >> > [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
> >> >> > manager error waiting for completion
> >> >> > [root at beowulf programs]# mpiexec -n 2 ./Radix2
> >> >> > [mpiexec at beowulf.master] control_cb
> (./pm/pmiserv/pmiserv_cb.c:197):
> >> >> > assert (!closed) failed
> >> >> > [mpiexec at beowulf.master] HYDT_dmxu_poll_wait_for_event
> >> >> > (./tools/demux/demux_poll.c:77): callback returned error status
> >> >> > [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
> >> >> > (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
> >> >> > [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
> >> >> > manager error waiting for completion
> >> >> > [root at beowulf programs]# mpiexec -n 4 ./Radix2
> >> >> > [mpiexec at beowulf.master] control_cb
> (./pm/pmiserv/pmiserv_cb.c:197):
> >> >> > assert (!closed) failed
> >> >> > [mpiexec at beowulf.master] HYDT_dmxu_poll_wait_for_event
> >> >> > (./tools/demux/demux_poll.c:77): callback returned error status
> >> >> > [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
> >> >> > (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
> >> >> > [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
> >> >> > manager error waiting for completion
> >> >> > [root at beowulf programs]#
> >> >>
> >> >>
> >> >>
> >> >> ------------------------------
> >> >>
> >> >> Message: 4
> >> >> Date: Mon, 21 May 2012 20:14:35 -0500
> >> >> From: Rajeev Thakur <thakur at mcs.anl.gov>
> >> >> To: mpich-discuss at mcs.anl.gov
> >> >> Subject: Re: [mpich-discuss] Not able to run program parallely on
> >> >> ? ? ? ?cluster...
> >> >> Message-ID: <8C80534E-3611-40D7-BBAF-F66110D25EE1 at mcs.anl.gov>
> >> >> Content-Type: text/plain; charset=us-ascii
> >> >>
> >> >> You are passing the same buffer as the sendbuf and recvbuf to
> >> >> MPI_Gatherv,
> >> >> which is not allowed in MPI. Use MPI_IN_PLACE as described in the
> >> >> standard.
> >> >>
> >> >>
> >> >> On May 21, 2012, at 2:29 PM, Albert Spade wrote:
> >> >>
> >> >> > This is my new error after making a few changes...
> >> >> > The results are quite similar... No success with the cluster...
> >> >> >
> >> >> > Sample run
> >> >> > --------------------------------------------------------
> >> >> >
> >> >> > [root at beowulf testing]# mpiexec -n 1 ./Radix
> >> >> > Time taken for 16 elements using 1 processors = 4.72069e-05 seconds
> >> >> > [root at beowulf testing]# mpiexec -n 2 ./Radix
> >> >> > Fatal error in PMPI_Gatherv: Internal MPI error!, error stack:
> >> >> > PMPI_Gatherv(398).....: MPI_Gatherv failed(sbuf=0x97d0500,
> scount=64,
> >> >> > MPI_CHAR, rbuf=0x97d0500, rcnts=0x97d06b8, displs=0x97d06c8,
> >> >> > MPI_CHAR,
> >> >> > root=0, MPI_COMM_WORLD) failed
> >> >> > MPIR_Gatherv_impl(210):
> >> >> > MPIR_Gatherv(104).....:
> >> >> > MPIR_Localcopy(357)...: memcpy arguments alias each other,
> >> >> > dst=0x97d0500
> >> >> > src=0x97d0500 len=64
> >> >> >
> >> >> >
> >> >> >
> >> >> > =====================================================================================
> >> >> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> >> >> > =   EXIT CODE: 256
> >> >> > =   CLEANING UP REMAINING PROCESSES
> >> >> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >> >> > =====================================================================================
> >> >> > [proxy:0:1 at beowulf.node1] HYD_pmcd_pmip_control_cmd_cb
> >> >> > (./pm/pmiserv/pmip_cb.c:927): assert (!closed) failed
> >> >> > [proxy:0:1 at beowulf.node1] HYDT_dmxu_poll_wait_for_event
> >> >> > (./tools/demux/demux_poll.c:77): callback returned error status
> >> >> > [proxy:0:1 at beowulf.node1] main (./pm/pmiserv/pmip.c:221): demux
> >> >> > engine
> >> >> > error waiting for event
> >> >> > [mpiexec at beowulf.master] HYDT_bscu_wait_for_completion
> >> >> > (./tools/bootstrap/utils/bscu_wait.c:77): one of the processes
> >> >> > terminated
> >> >> > badly; aborting
> >> >> > [mpiexec at beowulf.master] HYDT_bsci_wait_for_completion
> >> >> > (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error
> >> >> > waiting for
> >> >> > completion
> >> >> > [mpiexec at beowulf.master] HYD_pmci_wait_for_completion
> >> >> > (./pm/pmiserv/pmiserv_pmci.c:225): launcher returned error waiting
> >> >> > for
> >> >> > completion
> >> >> > [mpiexec at beowulf.master] main (./ui/mpich/mpiexec.c:437): process
> >> >> > manager error waiting for completion
> >> >> > [root at beowulf testing]#
> >> >> >
> >>
> >
> >
>
>
>
> --
> Jeff Hammond
> Argonne Leadership Computing Facility
> University of Chicago Computation Institute
> jhammond at alcf.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>
>
> ------------------------------
>
> Message: 2
> Date: Sat, 26 May 2012 19:43:04 -0400
> From: angelo pascualetti <apascualetti at gmail.com>
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] Help Mpich2
> Message-ID:
>        <CAMTjRyTfSfwPu5RrU999Rm-qk3MQOcsivGrs9iXK0A7PVyb0=w at mail.gmail.com
> >
> Content-Type: text/plain; charset="iso-8859-1"
>
> Good afternoon.
> I am installing the WRF numerical model on a single computer with 2 cores
> and 2 threads, and I see that there are 3 ways to run it: Serial, OpenMP and
> Dmpar.
> Can MPICH (the Dmpar option) be run on a single computer?
> In other words, if I run the program with MPICH using the two cores, is it
> faster than with OpenMP?
> How many processes (-np) should I tell MPICH to use so that it is faster
> than OpenMP?
> I'm running as follows:
>
> export CC=icc
> export FC=ifort
> ./configure --prefix=$HOME/Mpich2
> make
> make install
> cd $HOME
> touch .mpd.conf
> chmod 600 .mpd.conf
> gedit .mpdhost and write:
> MPD_SECRETWORD=angelo1903
> mpdboot -v -n 1 --ncpus=2
> mpirun -np 2 ./run.exe
>
>
> In advance thank you very much.
>
> --
> Angelo Pascualetti A.
> Meteorologist
> Direccion General Aeronautica Civil
> Aeropuerto Cerro Moreno
> Antofagasta
>
> ------------------------------
>
> Message: 3
> Date: Sun, 27 May 2012 08:59:02 +0700
> From: Sinta Kartika Maharani <sintakm114080010 at gmail.com>
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Problem during running a parallel
>        processing.
> Message-ID:
>        <CAC66hFHVhbPKx_==T3ePsLWZ3JDZ5K_i-Wvwsui9M-eKm68EDQ at mail.gmail.com
> >
> Content-Type: text/plain; charset="iso-8859-1"
>
> I'm using the debugger in Visual C++. The error was "Unhandled
> exception at 0x00f0157e in lastproject.exe: 0x0000094 : integer
> division by zero"; the code is attached.
> The error is at "averow = NRA/numworkers;", even though I declare the
> number of processors when running it with mpiexec.
>
> 2012/5/26 Jeff Hammond <jhammond at alcf.anl.gov>:
> > You need to include the code if you want helpful responses. An
> > infinite loop of "do you call X?" for all X is not a viable support
> > solution.
> >
> > Have you run your code through valgrind and gdb to ensure it is not a
> > simple bug unrelated to MPI? Does the program run without error in
> > serial?
> >
> > Jeff
> >
> > On Fri, May 25, 2012 at 12:06 PM, Sinta Kartika Maharani
> > <sintakm114080010 at gmail.com> wrote:
> >> Yes, I do. Do I need to include the code?
> >> :)
> >>
> >> 2012/5/25 Ju JiaJia <jujj603 at gmail.com>:
> >>> Did you call MPI_Finalize?
> >>>
> >>> On Fri, May 25, 2012 at 10:00 AM, Sinta Kartika Maharani
> >>> <sintakm114080010 at gmail.com> wrote:
> >>>>
> >>>> I have some code, a matrix multiplication in MPI. When I run it, an
> >>>> error appears:
> >>>>
> >>>> job aborted
> >>>> rank: node : exit code[: error message]
> >>>> 0: sinta-PC: -1073741676: process 0 exited without calling finalize
> >>>>
> >>>> Why does this error appear?
> >
> >
> >
> > --
> > Jeff Hammond
> > Argonne Leadership Computing Facility
> > University of Chicago Computation Institute
> > jhammond at alcf.anl.gov / (630) 252-5381
> > http://www.linkedin.com/in/jeffhammond
> > https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: matrixmul.c
> Type: text/x-csrc
> Size: 4628 bytes
> Desc: not available
> URL: <
> http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120527/739ef7d6/attachment.c
> >
>
> ------------------------------
>
>
>
> End of mpich-discuss Digest, Vol 44, Issue 44
> *********************************************
>