Hi Jeff,<div><br></div><div>As Rajeev suggested, I used MPI_IN_PLACE instead of the sendbuf, but I am still not able to get it right.</div><div>That is why I posted the code I used and the error.</div><div><br></div><div>I am using:</div>
<div>MPI::COMM_WORLD.Gatherv(MPI_IN_PLACE, Count[rank], MPI::CHAR, (void*)(Data), Count, Displ, MPI::CHAR, 0);</div><div><br></div><div>instead of my previous code:<br>//MPI::COMM_WORLD.Gatherv((const void*)(Data+StartFrom[nStages-1][rank]), Count[rank], MPI::CHAR, (void*)(Data), Count, Displ, MPI::CHAR, 0); </div>
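For reference, here is the pattern the MPI 2.2 standard describes, sketched with the C bindings and reusing the names Data, Count, and Displ from the code above (the start offset stands in for StartFrom[nStages-1][rank]). The key point is that MPI_IN_PLACE is valid as the sendbuf of MPI_Gatherv only at the root; every non-root rank still passes its real send buffer:

```c
#include <mpi.h>

/* Sketch only: Data, Count, Displ, and my_start follow the names in the
 * post above. MPI_IN_PLACE may be passed as sendbuf only on the root;
 * sendcount and sendtype are then ignored there. */
void gather_in_place(char *Data, int *Count, int *Displ,
                     int my_start, int rank)
{
    if (rank == 0) {
        /* Root: its contribution is already in Data at Displ[0]. */
        MPI_Gatherv(MPI_IN_PLACE, 0, MPI_CHAR,
                    Data, Count, Displ, MPI_CHAR, 0, MPI_COMM_WORLD);
    } else {
        /* Non-root ranks send as usual; recv arguments are ignored. */
        MPI_Gatherv(Data + my_start, Count[rank], MPI_CHAR,
                    NULL, NULL, NULL, MPI_CHAR, 0, MPI_COMM_WORLD);
    }
}
```

If every rank passes MPI_IN_PLACE, as in the call quoted above, the non-root calls are erroneous, which would explain the "sendbuf cannot be MPI_IN_PLACE" failure on runs with more than one process.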
<div><br></div><div>and I am still getting the error. What am I doing wrong?<br><br>Thanks...</div><div><br><div class="gmail_quote">On Sun, May 27, 2012 at 7:29 AM, <span dir="ltr"><<a href="mailto:mpich-discuss-request@mcs.anl.gov" target="_blank">mpich-discuss-request@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send mpich-discuss mailing list submissions to<br>
<a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:mpich-discuss-request@mcs.anl.gov">mpich-discuss-request@mcs.anl.gov</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:mpich-discuss-owner@mcs.anl.gov">mpich-discuss-owner@mcs.anl.gov</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of mpich-discuss digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. Re: Unable to run program parallely on cluster...Its running<br>
properly on single machine... (Jeff Hammond)<br>
2. Help Mpich2 (angelo pascualetti)<br>
3. Re: Problem during running a parallel processing.<br>
(Sinta Kartika Maharani)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Sat, 26 May 2012 16:08:13 -0500<br>
From: Jeff Hammond <<a href="mailto:jhammond@alcf.anl.gov">jhammond@alcf.anl.gov</a>><br>
To: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
Subject: Re: [mpich-discuss] Unable to run program parallely on<br>
cluster...Its running properly on single machine...<br>
Message-ID:<br>
<CAGKz=<a href="mailto:u%2B%2BEaYOLCbs3Stda4MWti2bS3JD3jaoa1dm2jGU0usYkQ@mail.gmail.com">u++EaYOLCbs3Stda4MWti2bS3JD3jaoa1dm2jGU0usYkQ@mail.gmail.com</a>><br>
Content-Type: text/plain; charset=ISO-8859-1<br>
<br>
Rajeev answered your question two days ago.<br>
<br>
Jeff<br>
<br>
On Sat, May 26, 2012 at 3:53 PM, Albert Spade <<a href="mailto:albert.spade@gmail.com">albert.spade@gmail.com</a>> wrote:<br>
> Hi Jeff,<br>
><br>
> Thanks for your reply.<br>
> And sorry I didn't change the subject earlier.<br>
> I mean: as you said, I am already using Gatherv, so why am I getting<br>
> this error?<br>
><br>
><br>
>                 MPI::COMM_WORLD.Gatherv(MPI_IN_PLACE, Count[rank],<br>
> MPI::CHAR, (void*)(Data), Count, Displ, MPI::CHAR, 0);<br>
>                 //MPI::COMM_WORLD.Gatherv((const<br>
> void*)(Data+StartFrom[nStages-1][rank]), Count[rank], MPI::CHAR,<br>
> (void*)(Data), Count, Displ, MPI::CHAR, 0);<br>
>><br>
>><br>
>><br>
>> Message: 2<br>
>> Date: Fri, 25 May 2012 09:46:11 -0500<br>
>> From: Jeff Hammond <<a href="mailto:jhammond@alcf.anl.gov">jhammond@alcf.anl.gov</a>><br>
>> To: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
>> Subject: Re: [mpich-discuss] mpich-discuss Digest, Vol 44, Issue 36<br>
>> Message-ID:<br>
>><br>
>> ?<CAGKz=<a href="mailto:uKG%2BupZGHp-cZ5_2hv9von%2BKFjKO3QRdp3qcorJQh81_g@mail.gmail.com">uKG+upZGHp-cZ5_2hv9von+KFjKO3QRdp3qcorJQh81_g@mail.gmail.com</a>><br>
>> Content-Type: text/plain; charset=ISO-8859-1<br>
>><br>
>> The error is pretty obvious in the output:<br>
>><br>
>> PMPI_Gatherv(335): sendbuf cannot be MPI_IN_PLACE<br>
>><br>
>> You cannot use MPI_IN_PLACE with MPI_Gather<br>
>> (<a href="https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/163" target="_blank">https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/163</a>) but it is<br>
>> allowed in MPI_Gatherv as of MPI 2.2, so I don't know why the<br>
>> implementation does not allow this.<br>
>><br>
>> Jeff<br>
>><br>
>> On Fri, May 25, 2012 at 8:21 AM, Albert Spade <<a href="mailto:albert.spade@gmail.com">albert.spade@gmail.com</a>><br>
>> wrote:<br>
>> > Thanks Rajeev and Darius,<br>
>> ><br>
>> > I tried to use MPI_IN_PLACE but am not getting the desired results. Can you<br>
>> > please tell me how to make it work?<br>
>> ><br>
>> > This is the previous code :<br>
>> ><br>
>> >         //MPI::COMM_WORLD.Gatherv((const<br>
>> > void*)(Data+StartFrom[nStages-1][rank]), Count[rank], MPI::CHAR,<br>
>> > (void*)(Data), Count, Displ, MPI::CHAR, 0);<br>
>> ><br>
>> > And this is how I changed it.<br>
>> ><br>
>> > MPI::COMM_WORLD.Gatherv(MPI_IN_PLACE, Count[rank], MPI::CHAR,<br>
>> > (void*)(Data), Count, Displ, MPI::CHAR, 0);<br>
>> ><br>
>> > Am I doing it wrong?<br>
>> ><br>
>> > Thanks.<br>
>> ><br>
>> > My output after making above changes.<br>
>> > ==============================<br>
>> > [root@beowulf programs]# mpiexec -n 1 ./output<br>
>> > Time taken for 16 elements using 1 processors = 2.81334e-05 seconds<br>
>> > [root@beowulf programs]# mpiexec -n 2 ./output<br>
>> > Fatal error in PMPI_Gatherv: Invalid buffer pointer, error stack:<br>
>> > PMPI_Gatherv(398): MPI_Gatherv failed(sbuf=MPI_IN_PLACE, scount=64,<br>
>> > MPI_CHAR, rbuf=0x879d500, rcnts=0x879d6b8, displs=0x879d6c8, MPI_CHAR,<br>
>> > root=0, MPI_COMM_WORLD) failed<br>
>> > PMPI_Gatherv(335): sendbuf cannot be MPI_IN_PLACE<br>
>> ><br>
>> ><br>
>> > =====================================================================================<br>
>> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES<br>
>> > =   EXIT CODE: 256<br>
>> > =   CLEANING UP REMAINING PROCESSES<br>
>> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES<br>
>> ><br>
>> > =====================================================================================<br>
>> > *** glibc detected *** mpiexec: double free or corruption (fasttop):<br>
>> > 0x094fb038 ***<br>
>> > ======= Backtrace: =========<br>
>> > /lib/libc.so.6[0x7d4a31]<br>
>> > mpiexec[0x8077b11]<br>
>> > mpiexec[0x8053c7f]<br>
>> > mpiexec[0x8053e73]<br>
>> > mpiexec[0x805592a]<br>
>> > mpiexec[0x8077186]<br>
>> > mpiexec[0x807639e]<br>
>> > mpiexec[0x80518f8]<br>
>> > mpiexec[0x804ad65]<br>
>> > /lib/libc.so.6(__libc_start_main+0xe6)[0x77cce6]<br>
>> > mpiexec[0x804a061]<br>
>> > ======= Memory map: ========<br>
>> > 00547000-00548000 r-xp 00000000 00:00 0 ? ? ? ? ?[vdso]<br>
>> > 0054b000-0068f000 r-xp 00000000 fd:00 939775 ? ?<br>
>> > /usr/lib/libxml2.so.2.7.6<br>
>> > 0068f000-00694000 rw-p 00143000 fd:00 939775 ? ?<br>
>> > /usr/lib/libxml2.so.2.7.6<br>
>> > 00694000-00695000 rw-p 00000000 00:00 0<br>
>> > 00740000-0075e000 r-xp 00000000 fd:00 2105890 ? ?/lib/<a href="http://ld-2.12.so" target="_blank">ld-2.12.so</a><br>
>> > 0075e000-0075f000 r--p 0001d000 fd:00 2105890 ? ?/lib/<a href="http://ld-2.12.so" target="_blank">ld-2.12.so</a><br>
>> > 0075f000-00760000 rw-p 0001e000 fd:00 2105890 ? ?/lib/<a href="http://ld-2.12.so" target="_blank">ld-2.12.so</a><br>
>> > 00766000-008ef000 r-xp 00000000 fd:00 2105891 ? ?/lib/<a href="http://libc-2.12.so" target="_blank">libc-2.12.so</a><br>
>> > 008ef000-008f0000 ---p 00189000 fd:00 2105891 ? ?/lib/<a href="http://libc-2.12.so" target="_blank">libc-2.12.so</a><br>
>> > 008f0000-008f2000 r--p 00189000 fd:00 2105891 ? ?/lib/<a href="http://libc-2.12.so" target="_blank">libc-2.12.so</a><br>
>> > 008f2000-008f3000 rw-p 0018b000 fd:00 2105891 ? ?/lib/<a href="http://libc-2.12.so" target="_blank">libc-2.12.so</a><br>
>> > 008f3000-008f6000 rw-p 00000000 00:00 0<br>
>> > 008f8000-008fb000 r-xp 00000000 fd:00 2105893 ? ?/lib/<a href="http://libdl-2.12.so" target="_blank">libdl-2.12.so</a><br>
>> > 008fb000-008fc000 r--p 00002000 fd:00 2105893 ? ?/lib/<a href="http://libdl-2.12.so" target="_blank">libdl-2.12.so</a><br>
>> > 008fc000-008fd000 rw-p 00003000 fd:00 2105893 ? ?/lib/<a href="http://libdl-2.12.so" target="_blank">libdl-2.12.so</a><br>
>> > 008ff000-00916000 r-xp 00000000 fd:00 2105900 ? ?/lib/<a href="http://libpthread-2.12.so" target="_blank">libpthread-2.12.so</a><br>
>> > 00916000-00917000 r--p 00016000 fd:00 2105900 ? ?/lib/<a href="http://libpthread-2.12.so" target="_blank">libpthread-2.12.so</a><br>
>> > 00917000-00918000 rw-p 00017000 fd:00 2105900 ? ?/lib/<a href="http://libpthread-2.12.so" target="_blank">libpthread-2.12.so</a><br>
>> > 00918000-0091a000 rw-p 00000000 00:00 0<br>
>> > 0091c000-0092e000 r-xp 00000000 fd:00 2105904 ? ?/lib/libz.so.1.2.3<br>
>> > 0092e000-0092f000 r--p 00011000 fd:00 2105904 ? ?/lib/libz.so.1.2.3<br>
>> > 0092f000-00930000 rw-p 00012000 fd:00 2105904 ? ?/lib/libz.so.1.2.3<br>
>> > 00932000-0095a000 r-xp 00000000 fd:00 2098429 ? ?/lib/<a href="http://libm-2.12.so" target="_blank">libm-2.12.so</a><br>
>> > 0095a000-0095b000 r--p 00027000 fd:00 2098429 ? ?/lib/<a href="http://libm-2.12.so" target="_blank">libm-2.12.so</a><br>
>> > 0095b000-0095c000 rw-p 00028000 fd:00 2098429 ? ?/lib/<a href="http://libm-2.12.so" target="_blank">libm-2.12.so</a><br>
>> > 00bb0000-00bcd000 r-xp 00000000 fd:00 2105914<br>
>> > ?/lib/libgcc_s-4.4.6-20110824.so.1<br>
>> > 00bcd000-00bce000 rw-p 0001d000 fd:00 2105914<br>
>> > ?/lib/libgcc_s-4.4.6-20110824.so.1<br>
>> > 00c18000-00c24000 r-xp 00000000 fd:00 2098123 ?<br>
>> > ?/lib/<a href="http://libnss_files-2.12.so" target="_blank">libnss_files-2.12.so</a><br>
>> > 00c24000-00c25000 r--p 0000b000 fd:00 2098123 ?<br>
>> > ?/lib/<a href="http://libnss_files-2.12.so" target="_blank">libnss_files-2.12.so</a><br>
>> > 00c25000-00c26000 rw-p 0000c000 fd:00 2098123 ?<br>
>> > ?/lib/<a href="http://libnss_files-2.12.so" target="_blank">libnss_files-2.12.so</a><br>
>> > 00ce9000-00d00000 r-xp 00000000 fd:00 2105929 ? ?/lib/<a href="http://libnsl-2.12.so" target="_blank">libnsl-2.12.so</a><br>
>> > 00d00000-00d01000 r--p 00016000 fd:00 2105929 ? ?/lib/<a href="http://libnsl-2.12.so" target="_blank">libnsl-2.12.so</a><br>
>> > 00d01000-00d02000 rw-p 00017000 fd:00 2105929 ? ?/lib/<a href="http://libnsl-2.12.so" target="_blank">libnsl-2.12.so</a><br>
>> > 00d02000-00d04000 rw-p 00000000 00:00 0<br>
>> > 08048000-080a0000 r-xp 00000000 fd:00 656990<br>
>> > /opt/mpich2-1.4.1p1/bin/bin/mpiexec.hydra<br>
>> > 080a0000-080a1000 rw-p 00058000 fd:00 656990<br>
>> > /opt/mpich2-1.4.1p1/bin/bin/mpiexec.hydra<br>
>> > 080a1000-080a3000 rw-p 00000000 00:00 0<br>
>> > 094ee000-0950f000 rw-p 00000000 00:00 0 ? ? ? ? ?[heap]<br>
>> > b7893000-b7896000 rw-p 00000000 00:00 0<br>
>> > b78a4000-b78a7000 rw-p 00000000 00:00 0<br>
>> > bff80000-bff95000 rw-p 00000000 00:00 0 ? ? ? ? ?[stack]<br>
>> > Aborted (core dumped)<br>
>> > [root@beowulf programs]#<br>
>> ><br>
>> ><br>
>> > On Tue, May 22, 2012 at 10:30 PM, <<a href="mailto:mpich-discuss-request@mcs.anl.gov">mpich-discuss-request@mcs.anl.gov</a>><br>
>> > wrote:<br>
>> >><br>
>> >> Send mpich-discuss mailing list submissions to<br>
>> >> ? ? ? ?<a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
>> >><br>
>> >> To subscribe or unsubscribe via the World Wide Web, visit<br>
>> >> ? ? ? ?<a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
>> >> or, via email, send a message with subject or body 'help' to<br>
>> >> ? ? ? ?<a href="mailto:mpich-discuss-request@mcs.anl.gov">mpich-discuss-request@mcs.anl.gov</a><br>
>> >><br>
>> >> You can reach the person managing the list at<br>
>> >> ? ? ? ?<a href="mailto:mpich-discuss-owner@mcs.anl.gov">mpich-discuss-owner@mcs.anl.gov</a><br>
>> >><br>
>> >> When replying, please edit your Subject line so it is more specific<br>
>> >> than "Re: Contents of mpich-discuss digest..."<br>
>> >><br>
>> >><br>
>> >> Today's Topics:<br>
>> >><br>
>> >>   1. Unable to run program parallely on cluster... Its running<br>
>> >>      properly on single machine... (Albert Spade)<br>
>> >>   2. Not able to run program parallely on cluster... (Albert Spade)<br>
>> >>   3. Re: Unable to run program parallely on cluster... Its<br>
>> >>      running properly on single machine... (Darius Buntinas)<br>
>> >>   4. Re: Not able to run program parallely on cluster...<br>
>> >>      (Rajeev Thakur)<br>
>> >>   5. replication of mpi applications (Thomas Ropars)<br>
>> >><br>
>> >><br>
>> >> ----------------------------------------------------------------------<br>
>> >><br>
>> >> Message: 1<br>
>> >> Date: Tue, 22 May 2012 00:12:24 +0530<br>
>> >> From: Albert Spade <<a href="mailto:albert.spade@gmail.com">albert.spade@gmail.com</a>><br>
>> >> To: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
>> >> Subject: [mpich-discuss] Unable to run program parallely on cluster...<br>
>> >> ? ? ? ?Its running properly on single machine...<br>
>> >> Message-ID:<br>
>> >><br>
>> >> ?<<a href="mailto:CAP2uaQopgOwaFNfCF49gcnW9REw8CQtWGMgf0U8RyNYStTFw1A@mail.gmail.com">CAP2uaQopgOwaFNfCF49gcnW9REw8CQtWGMgf0U8RyNYStTFw1A@mail.gmail.com</a>><br>
>> >> Content-Type: text/plain; charset="iso-8859-1"<br>
>> >><br>
>> >> Hi everybody,<br>
>> >><br>
>> >> I am using mpich2-1.4.1p1 and mpiexec from hydra-1.5b1<br>
>> >> I have a cluster of 5 machines.<br>
>> >> When I try to run the parallel fast Fourier transform program on a<br>
>> >> single machine it runs correctly, but on a cluster it gives an error.<br>
>> >> Can you please tell me why this is happening?<br>
>> >><br>
>> >> Thanks.<br>
>> >><br>
>> >> Here is my sample output:<br>
>> >><br>
>> >><br>
>> >> ---------------------------------------------------------------------------------------<br>
>> >><br>
>> >> [root@beowulf programs]# mpiexec -n 1 ./Radix2<br>
>> >> Time taken for 16 elements using 1 processors = 2.7895e-05 seconds<br>
>> >> [root@beowulf programs]#<br>
>> >> [root@beowulf programs]# mpiexec -n 4 ./Radix2<br>
>> >> [mpiexec@beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):<br>
>> >> assert<br>
>> >> (!closed) failed<br>
>> >> [mpiexec@beowulf.master] HYDT_dmxu_poll_wait_for_event<br>
>> >> (./tools/demux/demux_poll.c:77): callback returned error status<br>
>> >> [mpiexec@beowulf.master] HYD_pmci_wait_for_completion<br>
>> >> (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event<br>
>> >> [mpiexec@beowulf.master] main (./ui/mpich/mpiexec.c:437): process<br>
>> >> manager<br>
>> >> error waiting for completion<br>
>> >> [root@beowulf programs]# mpiexec -n 2 ./Radix2<br>
>> >> [mpiexec@beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):<br>
>> >> assert<br>
>> >> (!closed) failed<br>
>> >> [mpiexec@beowulf.master] HYDT_dmxu_poll_wait_for_event<br>
>> >> (./tools/demux/demux_poll.c:77): callback returned error status<br>
>> >> [mpiexec@beowulf.master] HYD_pmci_wait_for_completion<br>
>> >> (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event<br>
>> >> [mpiexec@beowulf.master] main (./ui/mpich/mpiexec.c:437): process<br>
>> >> manager<br>
>> >> error waiting for completion<br>
>> >> [root@beowulf programs]# mpiexec -n 4 ./Radix2<br>
>> >> [mpiexec@beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):<br>
>> >> assert<br>
>> >> (!closed) failed<br>
>> >> [mpiexec@beowulf.master] HYDT_dmxu_poll_wait_for_event<br>
>> >> (./tools/demux/demux_poll.c:77): callback returned error status<br>
>> >> [mpiexec@beowulf.master] HYD_pmci_wait_for_completion<br>
>> >> (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event<br>
>> >> [mpiexec@beowulf.master] main (./ui/mpich/mpiexec.c:437): process<br>
>> >> manager<br>
>> >> error waiting for completion<br>
>> >> [root@beowulf programs]#<br>
>> >> -------------- next part --------------<br>
>> >> An HTML attachment was scrubbed...<br>
>> >> URL:<br>
>> >><br>
>> >> <<a href="http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120522/25975b06/attachment-0001.html" target="_blank">http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120522/25975b06/attachment-0001.html</a>><br>
>> >><br>
>> >> ------------------------------<br>
>> >><br>
>> >> Message: 2<br>
>> >> Date: Tue, 22 May 2012 00:59:27 +0530<br>
>> >> From: Albert Spade <<a href="mailto:albert.spade@gmail.com">albert.spade@gmail.com</a>><br>
>> >> To: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
>> >> Subject: [mpich-discuss] Not able to run program parallely on<br>
>> >> ? ? ? ?cluster...<br>
>> >> Message-ID:<br>
>> >><br>
>> >> ?<CAP2uaQpiMV0yqHsHfsWpgAQ=_<a href="mailto:K3M_ZGxsCm-S5BPvzbxH%2BZ9zQ@mail.gmail.com">K3M_ZGxsCm-S5BPvzbxH+Z9zQ@mail.gmail.com</a>><br>
>> >> Content-Type: text/plain; charset="iso-8859-1"<br>
>> >><br>
>> >> This is my new error after making a few changes...<br>
>> >> Results are quite similar... No success with the cluster...<br>
>> >><br>
>> >> Sample run<br>
>> >> --------------------------------------------------------<br>
>> >><br>
>> >> [root@beowulf testing]# mpiexec -n 1 ./Radix<br>
>> >> Time taken for 16 elements using 1 processors = 4.72069e-05 seconds<br>
>> >> [root@beowulf testing]# mpiexec -n 2 ./Radix<br>
>> >> Fatal error in PMPI_Gatherv: Internal MPI error!, error stack:<br>
>> >> PMPI_Gatherv(398).....: MPI_Gatherv failed(sbuf=0x97d0500, scount=64,<br>
>> >> MPI_CHAR, rbuf=0x97d0500, rcnts=0x97d06b8, displs=0x97d06c8, MPI_CHAR,<br>
>> >> root=0, MPI_COMM_WORLD) failed<br>
>> >> MPIR_Gatherv_impl(210):<br>
>> >> MPIR_Gatherv(104).....:<br>
>> >> MPIR_Localcopy(357)...: memcpy arguments alias each other,<br>
>> >> dst=0x97d0500<br>
>> >> src=0x97d0500 len=64<br>
>> >><br>
>> >><br>
>> >><br>
>> >> =====================================================================================<br>
>> >> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES<br>
>> >> =   EXIT CODE: 256<br>
>> >> =   CLEANING UP REMAINING PROCESSES<br>
>> >> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES<br>
>> >><br>
>> >><br>
>> >> =====================================================================================<br>
>> >> [proxy:0:1@beowulf.node1] HYD_pmcd_pmip_control_cmd_cb<br>
>> >> (./pm/pmiserv/pmip_cb.c:927): assert (!closed) failed<br>
>> >> [proxy:0:1@beowulf.node1] HYDT_dmxu_poll_wait_for_event<br>
>> >> (./tools/demux/demux_poll.c:77): callback returned error status<br>
>> >> [proxy:0:1@beowulf.node1] main (./pm/pmiserv/pmip.c:221): demux engine<br>
>> >> error waiting for event<br>
>> >> [mpiexec@beowulf.master] HYDT_bscu_wait_for_completion<br>
>> >> (./tools/bootstrap/utils/bscu_wait.c:77): one of the processes<br>
>> >> terminated<br>
>> >> badly; aborting<br>
>> >> [mpiexec@beowulf.master] HYDT_bsci_wait_for_completion<br>
>> >> (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting<br>
>> >> for<br>
>> >> completion<br>
>> >> [mpiexec@beowulf.master] HYD_pmci_wait_for_completion<br>
>> >> (./pm/pmiserv/pmiserv_pmci.c:225): launcher returned error waiting for<br>
>> >> completion<br>
>> >> [mpiexec@beowulf.master] main (./ui/mpich/mpiexec.c:437): process<br>
>> >> manager<br>
>> >> error waiting for completion<br>
>> >> [root@beowulf testing]#<br>
>> >> -------------- next part --------------<br>
>> >> An HTML attachment was scrubbed...<br>
>> >> URL:<br>
>> >><br>
>> >> <<a href="http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120522/7b1db8c0/attachment-0001.html" target="_blank">http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120522/7b1db8c0/attachment-0001.html</a>><br>
>> >><br>
>> >> ------------------------------<br>
>> >><br>
>> >> Message: 3<br>
>> >> Date: Tue, 22 May 2012 03:36:44 +0800<br>
>> >> From: Darius Buntinas <<a href="mailto:buntinas@mcs.anl.gov">buntinas@mcs.anl.gov</a>><br>
>> >> To: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
>> >> Subject: Re: [mpich-discuss] Unable to run program parallely on<br>
>> >> ? ? ? ?cluster... ? ? ?Its running properly on single machine...<br>
>> >> Message-ID: <<a href="mailto:B411B6C1-CB5A-4A1C-AEBB-71680C9AF8C5@mcs.anl.gov">B411B6C1-CB5A-4A1C-AEBB-71680C9AF8C5@mcs.anl.gov</a>><br>
>> >> Content-Type: text/plain; charset=us-ascii<br>
>> >><br>
>> >> It may be that one of your processes is failing, but also check to make<br>
>> >> sure every process is calling MPI_Finalize before exiting.<br>
>> >><br>
>> >> -d<br>
>> >><br>
>> >> On May 22, 2012, at 2:42 AM, Albert Spade wrote:<br>
>> >><br>
>> >> > Hi everybody,<br>
>> >> ><br>
>> >> > I am using mpich2-1.4.1p1 and mpiexec from hydra-1.5b1<br>
>> >> > I have a cluster of 5 machines.<br>
>> >> > When I try to run the parallel fast Fourier transform program<br>
>> >> > on a single machine it runs correctly, but on a cluster it gives an error.<br>
>> >> > Can you please tell me why this is happening?<br>
>> >> ><br>
>> >> > Thanks.<br>
>> >> ><br>
>> >> > Here is my sample output:<br>
>> >> ><br>
>> >> ><br>
>> >> > ---------------------------------------------------------------------------------------<br>
>> >> ><br>
>> >> > [root@beowulf programs]# mpiexec -n 1 ./Radix2<br>
>> >> > Time taken for 16 elements using 1 processors = 2.7895e-05 seconds<br>
>> >> > [root@beowulf programs]#<br>
>> >> > [root@beowulf programs]# mpiexec -n 4 ./Radix2<br>
>> >> > [mpiexec@beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):<br>
>> >> > assert (!closed) failed<br>
>> >> > [mpiexec@beowulf.master] HYDT_dmxu_poll_wait_for_event<br>
>> >> > (./tools/demux/demux_poll.c:77): callback returned error status<br>
>> >> > [mpiexec@beowulf.master] HYD_pmci_wait_for_completion<br>
>> >> > (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event<br>
>> >> > [mpiexec@beowulf.master] main (./ui/mpich/mpiexec.c:437): process<br>
>> >> > manager error waiting for completion<br>
>> >> > [root@beowulf programs]# mpiexec -n 2 ./Radix2<br>
>> >> > [mpiexec@beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):<br>
>> >> > assert (!closed) failed<br>
>> >> > [mpiexec@beowulf.master] HYDT_dmxu_poll_wait_for_event<br>
>> >> > (./tools/demux/demux_poll.c:77): callback returned error status<br>
>> >> > [mpiexec@beowulf.master] HYD_pmci_wait_for_completion<br>
>> >> > (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event<br>
>> >> > [mpiexec@beowulf.master] main (./ui/mpich/mpiexec.c:437): process<br>
>> >> > manager error waiting for completion<br>
>> >> > [root@beowulf programs]# mpiexec -n 4 ./Radix2<br>
>> >> > [mpiexec@beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197):<br>
>> >> > assert (!closed) failed<br>
>> >> > [mpiexec@beowulf.master] HYDT_dmxu_poll_wait_for_event<br>
>> >> > (./tools/demux/demux_poll.c:77): callback returned error status<br>
>> >> > [mpiexec@beowulf.master] HYD_pmci_wait_for_completion<br>
>> >> > (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event<br>
>> >> > [mpiexec@beowulf.master] main (./ui/mpich/mpiexec.c:437): process<br>
>> >> > manager error waiting for completion<br>
>> >> > [root@beowulf programs]#<br>
>> >> > _______________________________________________<br>
>> >> > mpich-discuss mailing list ? ? <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
>> >> > To manage subscription options or unsubscribe:<br>
>> >> > <a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
>> >><br>
>> >><br>
>> >><br>
>> >> ------------------------------<br>
>> >><br>
>> >> Message: 4<br>
>> >> Date: Mon, 21 May 2012 20:14:35 -0500<br>
>> >> From: Rajeev Thakur <<a href="mailto:thakur@mcs.anl.gov">thakur@mcs.anl.gov</a>><br>
>> >> To: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
>> >> Subject: Re: [mpich-discuss] Not able to run program parallely on<br>
>> >> ? ? ? ?cluster...<br>
>> >> Message-ID: <<a href="mailto:8C80534E-3611-40D7-BBAF-F66110D25EE1@mcs.anl.gov">8C80534E-3611-40D7-BBAF-F66110D25EE1@mcs.anl.gov</a>><br>
>> >> Content-Type: text/plain; charset=us-ascii<br>
>> >><br>
>> >> You are passing the same buffer as the sendbuf and recvbuf to<br>
>> >> MPI_Gatherv,<br>
>> >> which is not allowed in MPI. Use MPI_IN_PLACE as described in the<br>
>> >> standard.<br>
>> >><br>
>> >><br>
>> >> On May 21, 2012, at 2:29 PM, Albert Spade wrote:<br>
>> >><br>
>> >> > This is my new error after making a few changes...<br>
>> >> > Results are quite similar... No success with the cluster...<br>
>> >> ><br>
>> >> > Sample run<br>
>> >> > --------------------------------------------------------<br>
>> >> ><br>
>> >> > [root@beowulf testing]# mpiexec -n 1 ./Radix<br>
>> >> > Time taken for 16 elements using 1 processors = 4.72069e-05 seconds<br>
>> >> > [root@beowulf testing]# mpiexec -n 2 ./Radix<br>
>> >> > Fatal error in PMPI_Gatherv: Internal MPI error!, error stack:<br>
>> >> > PMPI_Gatherv(398).....: MPI_Gatherv failed(sbuf=0x97d0500, scount=64,<br>
>> >> > MPI_CHAR, rbuf=0x97d0500, rcnts=0x97d06b8, displs=0x97d06c8,<br>
>> >> > MPI_CHAR,<br>
>> >> > root=0, MPI_COMM_WORLD) failed<br>
>> >> > MPIR_Gatherv_impl(210):<br>
>> >> > MPIR_Gatherv(104).....:<br>
>> >> > MPIR_Localcopy(357)...: memcpy arguments alias each other,<br>
>> >> > dst=0x97d0500<br>
>> >> > src=0x97d0500 len=64<br>
>> >> ><br>
>> >> ><br>
>> >> > =====================================================================================<br>
>> >> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES<br>
>> >> > =   EXIT CODE: 256<br>
>> >> > =   CLEANING UP REMAINING PROCESSES<br>
>> >> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES<br>
>> >> ><br>
>> >> ><br>
>> >> > =====================================================================================<br>
>> >> > [proxy:0:1@beowulf.node1] HYD_pmcd_pmip_control_cmd_cb<br>
>> >> > (./pm/pmiserv/pmip_cb.c:927): assert (!closed) failed<br>
>> >> > [proxy:0:1@beowulf.node1] HYDT_dmxu_poll_wait_for_event<br>
>> >> > (./tools/demux/demux_poll.c:77): callback returned error status<br>
>> >> > [proxy:0:1@beowulf.node1] main (./pm/pmiserv/pmip.c:221): demux<br>
>> >> > engine<br>
>> >> > error waiting for event<br>
>> >> > [mpiexec@beowulf.master] HYDT_bscu_wait_for_completion<br>
>> >> > (./tools/bootstrap/utils/bscu_wait.c:77): one of the processes<br>
>> >> > terminated<br>
>> >> > badly; aborting<br>
>> >> > [mpiexec@beowulf.master] HYDT_bsci_wait_for_completion<br>
>> >> > (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error<br>
>> >> > waiting for<br>
>> >> > completion<br>
>> >> > [mpiexec@beowulf.master] HYD_pmci_wait_for_completion<br>
>> >> > (./pm/pmiserv/pmiserv_pmci.c:225): launcher returned error waiting<br>
>> >> > for<br>
>> >> > completion<br>
>> >> > [mpiexec@beowulf.master] main (./ui/mpich/mpiexec.c:437): process<br>
>> >> > manager error waiting for completion<br>
>> >> > [root@beowulf testing]#<br>
>> >> ><br>
>><br>
><br>
> _______________________________________________<br>
> mpich-discuss mailing list ? ? <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
> To manage subscription options or unsubscribe:<br>
> <a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
><br>
<br>
<br>
<br>
--<br>
Jeff Hammond<br>
Argonne Leadership Computing Facility<br>
University of Chicago Computation Institute<br>
<a href="mailto:jhammond@alcf.anl.gov">jhammond@alcf.anl.gov</a> / (630) 252-5381<br>
<a href="http://www.linkedin.com/in/jeffhammond" target="_blank">http://www.linkedin.com/in/jeffhammond</a><br>
<a href="https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond" target="_blank">https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond</a><br>
<br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Sat, 26 May 2012 19:43:04 -0400<br>
From: angelo pascualetti <<a href="mailto:apascualetti@gmail.com">apascualetti@gmail.com</a>><br>
To: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
Subject: [mpich-discuss] Help Mpich2<br>
Message-ID:<br>
<CAMTjRyTfSfwPu5RrU999Rm-qk3MQOcsivGrs9iXK0A7PVyb0=<a href="mailto:w@mail.gmail.com">w@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
Good afternoon.<br>
I am installing the WRF numerical model on a single computer with 2 cores<br>
and 2 threads, and I see that there are 3 ways to run it: serial, OpenMP, and<br>
dmpar.<br>
Can MPICH (the dmpar option) be run on a single computer?<br>
In other words, if I run the program with MPICH using both cores, is it<br>
faster than with OpenMP?<br>
How many processes (-np) should I tell MPICH to use to be faster than OpenMP?<br>
I'm running it as follows:<br>
<br>
export CC=icc<br>
export FC=ifort<br>
./configure --prefix=$HOME/Mpich2<br>
make<br>
make install<br>
cd $HOME<br>
touch .mpd.conf<br>
chmod 600 .mpd.conf<br>
gedit .mpdhost and write:<br>
MPD_SECRETWORD=angelo1903<br>
mpdboot -v -n 1 --ncpus=2<br>
mpirun -np 2 ./run.exe<br>
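One detail that stands out in the steps above: the secret word is written to .mpdhost, but mpd reads .mpd.conf, the file that was touched and chmod'ed. A minimal consistent sketch (the secret word here is a placeholder; note also that mpich2-1.4.x defaults to the Hydra process manager, where the mpdboot step is not needed at all):

```shell
# Write the secret word into the file mpd actually reads (~/.mpd.conf),
# not ~/.mpdhost. The value below is a placeholder.
cd "$HOME"
echo "MPD_SECRETWORD=changeme" > .mpd.conf
chmod 600 .mpd.conf            # mpd refuses group/world-readable files
# then, as in the original steps:
#   mpdboot -v -n 1 --ncpus=2
#   mpirun -np 2 ./run.exe
```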
<br>
<br>
In advance thank you very much.<br>
<br>
--<br>
Angelo Pascualetti A.<br>
Meteorólogo<br>
Direccion General Aeronautica Civil<br>
Aeropuerto Cerro Moreno<br>
Antofagasta<br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a href="http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120526/fb7557ed/attachment-0001.html" target="_blank">http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120526/fb7557ed/attachment-0001.html</a>><br>
<br>
------------------------------<br>
<br>
Message: 3<br>
Date: Sun, 27 May 2012 08:59:02 +0700<br>
From: Sinta Kartika Maharani <<a href="mailto:sintakm114080010@gmail.com">sintakm114080010@gmail.com</a>><br>
To: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
Subject: Re: [mpich-discuss] Problem during running a parallel<br>
processing.<br>
Message-ID:<br>
<CAC66hFHVhbPKx_==<a href="mailto:T3ePsLWZ3JDZ5K_i-Wvwsui9M-eKm68EDQ@mail.gmail.com">T3ePsLWZ3JDZ5K_i-Wvwsui9M-eKm68EDQ@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
I'm using the debugger in Visual C++. The error was "Unhandled<br>
exception at 0x00f0157e in lastproject.exe: 0x0000094: integer<br>
division by zero". The code is attached.<br>
The error is at "averow = NRA/numworkers;", even though I specify the<br>
number of processes when running it with mpiexec.<br>
<br>
2012/5/26 Jeff Hammond <<a href="mailto:jhammond@alcf.anl.gov">jhammond@alcf.anl.gov</a>>:<br>
> You need to include the code if you want helpful responses. ?An<br>
> infinite loop of "do you call X?" for all X is not a viable support<br>
> solution.<br>
><br>
> Have you run your code through valgrind and gdb to ensure it is not a<br>
> simple bug unrelated to MPI? ?Does the program run without error in<br>
> serial?<br>
><br>
> Jeff<br>
><br>
> On Fri, May 25, 2012 at 12:06 PM, Sinta Kartika Maharani<br>
> <<a href="mailto:sintakm114080010@gmail.com">sintakm114080010@gmail.com</a>> wrote:<br>
>> Yes I do. Do I need to include the codes?<br>
>> :)<br>
>><br>
>> 2012/5/25 Ju JiaJia <<a href="mailto:jujj603@gmail.com">jujj603@gmail.com</a>>:<br>
>>> Did you call MPI_Finalize ?<br>
>>><br>
>>> On Fri, May 25, 2012 at 10:00 AM, Sinta Kartika Maharani<br>
>>> <<a href="mailto:sintakm114080010@gmail.com">sintakm114080010@gmail.com</a>> wrote:<br>
>>>><br>
>>>> I have some code, matrix multiplication in MPI. When I run it, an error<br>
>>>> appears:<br>
>>>><br>
>>>> job aborted<br>
>>>> rank: node : exit code[: error message]<br>
>>>> 0: sinta-PC: -1073741676: process 0 exited without calling finalize<br>
>>>><br>
>>>> Why does this error appear?<br>
>>>> _______________________________________________<br>
>>>> mpich-discuss mailing list ? ? <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
>>>> To manage subscription options or unsubscribe:<br>
>>>> <a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
>>><br>
>>><br>
>>><br>
>>> _______________________________________________<br>
>>> mpich-discuss mailing list ? ? <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
>>> To manage subscription options or unsubscribe:<br>
>>> <a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
>>><br>
>> _______________________________________________<br>
>> mpich-discuss mailing list ? ? <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
>> To manage subscription options or unsubscribe:<br>
>> <a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
><br>
><br>
><br>
> --<br>
> Jeff Hammond<br>
> Argonne Leadership Computing Facility<br>
> University of Chicago Computation Institute<br>
> <a href="mailto:jhammond@alcf.anl.gov">jhammond@alcf.anl.gov</a> / (630) 252-5381<br>
> <a href="http://www.linkedin.com/in/jeffhammond" target="_blank">http://www.linkedin.com/in/jeffhammond</a><br>
> <a href="https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond" target="_blank">https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond</a><br>
> _______________________________________________<br>
> mpich-discuss mailing list ? ? <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
> To manage subscription options or unsubscribe:<br>
> <a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
-------------- next part --------------<br>
A non-text attachment was scrubbed...<br>
Name: matrixmul.c<br>
Type: text/x-csrc<br>
Size: 4628 bytes<br>
Desc: not available<br>
URL: <<a href="http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120527/739ef7d6/attachment.c" target="_blank">http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120527/739ef7d6/attachment.c</a>><br>
<br>
------------------------------<br>
<br>
_______________________________________________<br>
mpich-discuss mailing list<br>
<a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
<a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
<br>
<br>
End of mpich-discuss Digest, Vol 44, Issue 44<br>
*********************************************<br>
</blockquote></div><br></div>