[petsc-users] VecSetValues

Matthew Knepley knepley at gmail.com
Thu Nov 17 15:57:57 CST 2011


On Thu, Nov 17, 2011 at 3:54 PM, Robert Ellis <Robert.Ellis at geosoft.com> wrote:

> Hello All,
>
> I have a troubling intermittent problem with the simple
> VecSetValues/VecAssemblyBegin functions after porting a robust,
> long-working application to a cloud platform.
>
> - I have 30M doubles on rank 0.
> - I intend to assign them non-sequentially among 32 processors, ranks 1-31.
> - On rank 0 only, I use VecSetValues(x,...) to make the assignment. So far
>   everything is fine.
> - I call VecAssemblyBegin, expecting this to distribute the values
>   appropriately (see the sketch further below).
>
> Sometimes this works, but about 50% of the time I see errors, immediately
> on calling VecAssemblyBegin, of the following form:
>
> [23]PETSC ERROR: Fatal error in MPI_Allreduce: Other MPI error, error stack:
> MPI_Allreduce(919).........................: MPI_Allreduce(sbuf=0000000012DE29B0, rbuf=00000000069F6ED0, count=32, dtype=USER, op=0x98000000, comm=0x84000002) failed
> MPIR_Allreduce_impl(776)...................:
> MPIR_Allreduce_intra(220)..................:
> MPIR_Bcast_impl(1273)......................:
> MPIR_Bcast_intra(1107).....................:
> MPIR_Bcast_binomial(143)...................:
> MPIC_Recv(110).............................:
> MPIC_Wait(540).............................:
> MPIDI_CH3I_Progress(353)...................:
> MPID_nem_mpich2_blocking_recv(905).........:
> MPID_nem_newtcp_module_poll(37)............:
> MPID_nem_newtcp_module_connpoll(2655)......:
> recv_id_or_tmpvc_info_success_handler(1278): read from socket failed - No error
> --------------------- Error Message ------------------------------------
> [23]PETSC ERROR: Out of memory. This could be due to allocating
> [23]PETSC ERROR: too large an object or bleeding by not properly
> [23]PETSC ERROR: destroying unneeded objects.
> [23]PETSC ERROR: Memory allocated 0 Memory used by process 0
> [23]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [23]PETSC ERROR: Memory requested 18446744066053327000!
> [23]PETSC ERROR: ------------------------------------------------------------------------
> [23]PETSC ERROR: Petsc Release Version 3.1.0, Patch 7, Mon Dec 20 14:26:37 CST 2010
> [23]PETSC ERROR: See docs/changes/index.html for recent updates.
> [23]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [23]PETSC ERROR: See docs/in ...
>
>
> My questions are: (1) has anybody seen anything like this type of
> VecAssemblyBegin error? (2) is it likely that splitting the VecSetValues
> calls into smaller blocks would help? (3) is it likely that moving to
> mpich2 1.4p1 would help? (4) any other thoughts?
>
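A minimal sketch of the pattern described in the quoted message; the function
name, argument names, and PETSc 3.1-era error checking are illustrative
assumptions, not code from the original application:

#include <petscvec.h>

/* All values live on rank 0; every rank participates in the assembly. */
PetscErrorCode ScatterFromRank0(MPI_Comm comm, PetscInt N,
                                const PetscInt *idx,      /* global indices, rank 0 only */
                                const PetscScalar *vals,  /* values, rank 0 only */
                                Vec *x)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;

  PetscFunctionBegin;
  ierr = MPI_Comm_rank(comm, &rank);CHKERRQ(ierr);
  ierr = VecCreate(comm, x);CHKERRQ(ierr);
  ierr = VecSetSizes(*x, PETSC_DECIDE, N);CHKERRQ(ierr);
  ierr = VecSetFromOptions(*x);CHKERRQ(ierr);
  if (!rank) {
    /* Off-process entries are stashed locally until assembly. */
    ierr = VecSetValues(*x, N, idx, vals, INSERT_VALUES);CHKERRQ(ierr);
  }
  /* Collective: ships the stashed entries to their owning ranks. */
  ierr = VecAssemblyBegin(*x);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(*x);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}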

I would recommend interleaving your VecSetValues() with
VecAssemblyBegin/End() calls. It certainly sounds like you are overflowing
buffers in the MPI implementation.
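
A minimal sketch of that interleaving, assuming only rank 0 holds the data;
the chunk size, function name, and error-checking style are illustrative
assumptions rather than anything prescribed by PETSc:

#include <petscvec.h>

/* Flush the stash every `chunk` entries instead of stashing all 30M at once.
   VecAssemblyBegin/End() are collective, so every rank must make the same
   number of assembly passes even though only rank 0 sets values. */
PetscErrorCode ScatterFromRank0Chunked(Vec x, PetscInt N,
                                       const PetscInt *idx,     /* rank 0 only */
                                       const PetscScalar *vals, /* rank 0 only */
                                       PetscInt chunk)          /* e.g. 1000000 */
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  PetscInt       c, nchunks, start, n;
  MPI_Comm       comm;

  PetscFunctionBegin;
  ierr = PetscObjectGetComm((PetscObject)x, &comm);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(comm, &rank);CHKERRQ(ierr);
  nchunks = (N + chunk - 1) / chunk;
  for (c = 0; c < nchunks; c++) {
    if (!rank) {
      start = c * chunk;
      n     = PetscMin(chunk, N - start);
      ierr  = VecSetValues(x, n, idx + start, vals + start, INSERT_VALUES);CHKERRQ(ierr);
    }
    ierr = VecAssemblyBegin(x);CHKERRQ(ierr);
    ierr = VecAssemblyEnd(x);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

Each stashed entry carries an index and a value (roughly 12 bytes with 4-byte
indices and 8-byte scalars), so flushing every million entries keeps a pass to
roughly 12 MB of stashed data instead of several hundred MB for all 30M
entries at once.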

   Matt


> Thanks in advance,
>
> Rob
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

