[petsc-users] VecSetValues

Robert Ellis Robert.Ellis at geosoft.com
Thu Nov 17 15:54:55 CST 2011


Hello All,

I have a troubling intermittent problem with the simple VecSetValues/VecAssemblyBegin functions after porting a robust application, which has worked well for a long time, to a cloud platform.


* I have 30M doubles on rank0.

* I intend to assign them non-sequentially among 32 processors, ranks 1-31.

* On rank0 only, I use VecSetValues(x,...) to make the assignment. So far everything is fine.

* I call VecAssemblyBegin, expecting this to distribute the values appropriately.

Sometimes this works, but about 50% of the time I see errors, immediately on calling VecAssemblyBegin, of the following form:

                [23]PETSC ERROR: Fatal error in MPI_Allreduce: Other MPI error, error stack:
               MPI_Allreduce(919).........................: MPI_Allreduce(sbuf=0000000012DE29B0, rbuf=00000000069F6ED0, count=32, dtype=USER, op=0x98000000, comm=0x84000002) failed
                MPIR_Allreduce_impl(776)...................:
                MPIR_Allreduce_intra(220)..................:
                MPIR_Bcast_impl(1273)......................:
                MPIR_Bcast_intra(1107).....................:
                MPIR_Bcast_binomial(143)...................:
                MPIC_Recv(110).............................:
                MPIC_Wait(540).............................:
                MPIDI_CH3I_Progress(353)...................:
                MPID_nem_mpich2_blocking_recv(905).........:
                MPID_nem_newtcp_module_poll(37)............:
                MPID_nem_newtcp_module_connpoll(2655)......:
                recv_id_or_tmpvc_info_success_handler(1278): read from socket failed - No error
                --------------------- Error Message ------------------------------------
                [23]PETSC ERROR: Out of memory. This could be due to allocating
                [23]PETSC ERROR: too large an object or bleeding by not properly
                [23]PETSC ERROR: destroying unneeded objects.
                [23]PETSC ERROR: Memory allocated 0 Memory used by process 0
                [23]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
                [23]PETSC ERROR: Memory requested 18446744066053327000!
                [23]PETSC ERROR: ------------------------------------------------------------------------
                [23]PETSC ERROR: Petsc Release Version 3.1.0, Patch 7, Mon Dec 20 14:26:37 CST 2010
                [23]PETSC ERROR: See docs/changes/index.html for recent updates.
                [23]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
                [23]PETSC ERROR: See docs/in ...
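
A side note on the log above: the "Memory requested" figure sits just below 2^64, which is what a negative byte count looks like when it is printed as an unsigned 64-bit integer. The C sketch below shows that reinterpretation. `as_signed_bytes` is a hypothetical helper written for this note, not a PETSc function, and this reading is only a guess, not a confirmed diagnosis. The printed value also looks rounded, so the result should be taken as approximate.

```c
#include <stdint.h>

/* Hypothetical helper (not part of PETSc): interpret an unsigned 64-bit
   byte count as a signed two's-complement value, without relying on an
   implementation-defined out-of-range conversion. */
int64_t as_signed_bytes(uint64_t requested)
{
    if (requested <= (uint64_t)INT64_MAX) {
        /* Fits in the signed range: the value is what it appears to be. */
        return (int64_t)requested;
    }
    /* Distance below 2^64, negated: UINT64_MAX - requested is at most
       INT64_MAX here, so the cast and the negation cannot overflow. */
    return -(int64_t)(UINT64_MAX - requested) - 1;
}
```

Under that assumption, `as_signed_bytes(18446744066053327000ULL)` evaluates to -7656224616, i.e. roughly -7.7e9 bytes, which might point to a corrupted or overflowed size rather than a genuine allocation request.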

My questions are: (1) Has anybody seen this type of VecAssemblyBegin error before? (2) Is it likely that splitting the VecSetValues call into smaller blocks would help? (3) Is it likely that moving to MPICH2 1.4p1 would help? (4) Any other thoughts?
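
For (2), the kind of splitting I have in mind is sketched below in plain C. The names `chunk_fn`, `for_each_chunk` and `add_count` are hypothetical and exist only for this illustration. In the real application the callback would stand in for one VecSetValues call on rank0 covering the given block, followed by VecAssemblyBegin/VecAssemblyEnd. Since assembly is collective, I assume every rank would have to run the same number of assembly rounds, even those that set no values.

```c
#include <stddef.h>

/* Hypothetical callback type: handles entries [start, start + count).
   In a PETSc program this is where one block would be set and assembled. */
typedef void (*chunk_fn)(size_t start, size_t count, void *ctx);

/* Walk n entries in blocks of at most `chunk` entries.
   Returns the number of blocks visited (0 if chunk is 0). */
size_t for_each_chunk(size_t n, size_t chunk, chunk_fn fn, void *ctx)
{
    size_t nblocks = 0;
    size_t start = 0;

    if (chunk == 0) {
        return 0; /* a zero block size would never make progress */
    }
    while (start < n) {
        size_t remaining = n - start;
        size_t count = remaining < chunk ? remaining : chunk;
        fn(start, count, ctx);
        start += count; /* never exceeds n, so no overflow */
        ++nblocks;
    }
    return nblocks;
}

/* Example callback: accumulates the number of entries seen into *ctx. */
void add_count(size_t start, size_t count, void *ctx)
{
    (void)start; /* unused in this illustration */
    *(size_t *)ctx += count;
}
```

With 30M entries and a block size of 1M, `for_each_chunk` would visit 30 blocks, so each assembly round would move far fewer values than a single assembly of everything at once.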

Thanks in advance,
Rob