[petsc-dev] failed nightly
Lawrence Mitchell
lawrence.mitchell at imperial.ac.uk
Fri Apr 10 02:45:13 CDT 2015
(cc'ing petsc-dev as well)
> On 10 Apr 2015, at 00:51, Satish Balay <balay at mcs.anl.gov> wrote:
>
> It's likely a code bug somewhere. The MPICH build also gives a valgrind trace.
It's probable that the OMPI implementation is buggy enough not to work at all. I'm a little confused by the MPICH issue: I don't understand enough about the datatype implementation in the window SF type to know whether this is a PETSc issue or an MPICH one. I note in passing that all the ex1 tests exhibit a similar valgrind trace.
For ex2 at least, maybe the simplest option is to turn off the window test entirely. Like this:
diff --git a/src/vec/is/sf/examples/tutorials/makefile b/src/vec/is/sf/examples/tutorials/makefile
index aeaf1e4..e7774c5 100644
--- a/src/vec/is/sf/examples/tutorials/makefile
+++ b/src/vec/is/sf/examples/tutorials/makefile
@@ -86,7 +86,7 @@ runex2_window:
${RM} -f ex2.tmp
TESTEXAMPLES_C = ex1.PETSc runex1_basic runex1_2_basic runex1_3_basic runex1
- ex2.PETSc runex2_basic runex2_window ex2.rm
+ ex2.PETSc runex2_basic ex2.rm
TESTEXAMPLES_C_X =
TESTEXAMPLES_FORTRAN =
TESTEXAMPLES_FORTRAN_MPIUNI =
Lawrence
> Satish
>
> ----------
>
> balay at asterix /home/balay/petsc/src/vec/is/sf/examples/tutorials (master=)
> $ mpiexec -n 2 valgrind --tool=memcheck -q --dsymutil=yes --num-callers=40 --track-origins=yes ./ex2 -sf_type window
> PetscSF Object: 2 MPI processes
> type: window
> synchronization=FENCE sort=rank-order
> [0] Number of roots=1, leaves=2, remote ranks=2
> [0] 0 <- (0,0)
> [0] 1 <- (1,0)
> [1] Number of roots=1, leaves=2, remote ranks=2
> [1] 0 <- (1,0)
> [1] 1 <- (0,0)
> ==29265== Syscall param writev(vector[...]) points to uninitialised byte(s)
> ==29265== at 0x8F474E7: writev (in /usr/lib64/libc-2.20.so)
> ==29265== by 0x894AD87: MPL_large_writev (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29265== by 0x8941A48: MPIDU_Sock_writev (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29265== by 0x892AC7D: MPIDI_CH3_iStartMsgv (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29265== by 0x8911D08: recv_rma_msg (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29265== by 0x8913D46: MPIDI_Win_fence (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29265== by 0x88C91EB: PMPI_Win_fence (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29265== by 0x50FB025: PetscSFRestoreWindow (sfwindow.c:348)
> ==29265== by 0x50FD4BF: PetscSFBcastEnd_Window (sfwindow.c:510)
> ==29265== by 0x5123CD9: PetscSFBcastEnd (sf.c:957)
> ==29265== by 0x401CAF: main (ex2.c:81)
> ==29265== Address 0x99436ec is 108 bytes inside a block of size 208 alloc'd
> ==29265== at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==29265== by 0x890DA35: MPIDI_Get (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29265== by 0x88C406A: PMPI_Get (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29265== by 0x50FD0DA: PetscSFBcastBegin_Window (sfwindow.c:495)
> ==29265== by 0x51235B5: PetscSFBcastBegin (sf.c:924)
> ==29265== by 0x401BD3: main (ex2.c:79)
> ==29265== Uninitialised value was created by a heap allocation
> ==29265== at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==29265== by 0x890DA35: MPIDI_Get (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29265== by 0x88C406A: PMPI_Get (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29265== by 0x50FD0DA: PetscSFBcastBegin_Window (sfwindow.c:495)
> ==29265== by 0x51235B5: PetscSFBcastBegin (sf.c:924)
> ==29265== by 0x401BD3: main (ex2.c:79)
> ==29265==
> ==29266== Syscall param writev(vector[...]) points to uninitialised byte(s)
> ==29266== at 0x8F474E7: writev (in /usr/lib64/libc-2.20.so)
> ==29266== by 0x894AD87: MPL_large_writev (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29266== by 0x8941A48: MPIDU_Sock_writev (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29266== by 0x892AC7D: MPIDI_CH3_iStartMsgv (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29266== by 0x8911D08: recv_rma_msg (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29266== by 0x8913D46: MPIDI_Win_fence (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29266== by 0x88C91EB: PMPI_Win_fence (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29266== by 0x50FB025: PetscSFRestoreWindow (sfwindow.c:348)
> ==29266== by 0x50FD4BF: PetscSFBcastEnd_Window (sfwindow.c:510)
> ==29266== by 0x5123CD9: PetscSFBcastEnd (sf.c:957)
> ==29266== by 0x401CAF: main (ex2.c:81)
> ==29266== Address 0x98d16dc is 108 bytes inside a block of size 208 alloc'd
> ==29266== at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==29266== by 0x890DA35: MPIDI_Get (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29266== by 0x88C406A: PMPI_Get (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29266== by 0x50FD0DA: PetscSFBcastBegin_Window (sfwindow.c:495)
> ==29266== by 0x51235B5: PetscSFBcastBegin (sf.c:924)
> ==29266== by 0x401BD3: main (ex2.c:79)
> ==29266== Uninitialised value was created by a heap allocation
> ==29266== at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==29266== by 0x890DA35: MPIDI_Get (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29266== by 0x88C406A: PMPI_Get (in /home/balay/soft/mpich-3.1.3/lib/libmpi.so.12.0.4)
> ==29266== by 0x50FD0DA: PetscSFBcastBegin_Window (sfwindow.c:495)
> ==29266== by 0x51235B5: PetscSFBcastBegin (sf.c:924)
> ==29266== by 0x401BD3: main (ex2.c:79)
> ==29266==
> Vec Object: 2 MPI processes
> type: mpi
> Process [0]
> 0
> 1
> Process [1]
> 1
> 0
> Vec Object: 2 MPI processes
> type: mpi
> Process [0]
> 10
> 11
> Process [1]
> 11
> 10
> balay at asterix /home/balay/petsc/src/vec/is/sf/examples/tutorials (master=)
> $
>
> On Thu, 9 Apr 2015, Satish Balay wrote:
>
>> here is a better valgrind trace..
>>
>> satish
>>
>> --------
>> balay at asterix /home/balay/petsc/src/vec/is/sf/examples/tutorials (master=)
>> $ /home/balay/petsc/arch-ompi/bin/mpiexec -n 2 valgrind --tool=memcheck -q --dsymutil=yes --num-callers=40 --track-origins=yes ./ex2 -sf_type window
>> PetscSF Object: 2 MPI processes
>> type: window
>> synchronization=FENCE sort=rank-order
>> [0] Number of roots=1, leaves=2, remote ranks=2
>> [0] 0 <- (0,0)
>> [0] 1 <- (1,0)
>> [1] Number of roots=1, leaves=2, remote ranks=2
>> [1] 0 <- (1,0)
>> [1] 1 <- (0,0)
>> ==14815== Invalid write of size 2
>> ==14815== at 0x4C2E36B: memcpy@@GLIBC_2.14 (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==14815== by 0x8AFDABD: ompi_datatype_set_args (ompi_datatype_args.c:167)
>> ==14815== by 0x8AFF0F3: __ompi_datatype_create_from_args (ompi_datatype_args.c:718)
>> ==14815== by 0x8AFEC0E: __ompi_datatype_create_from_packed_description (ompi_datatype_args.c:649)
>> ==14815== by 0x8AFF5D6: ompi_datatype_create_from_packed_description (ompi_datatype_args.c:788)
>> ==14815== by 0xF727F0E: ompi_osc_base_datatype_create (osc_base_obj_convert.h:52)
>> ==14815== by 0xF728424: datatype_create (osc_rdma_data_move.c:333)
>> ==14815== by 0xF72887D: process_get (osc_rdma_data_move.c:536)
>> ==14815== by 0xF72A856: process_frag (osc_rdma_data_move.c:1593)
>> ==14815== by 0xF72AA35: ompi_osc_rdma_callback (osc_rdma_data_move.c:1656)
>> ==14815== by 0xECCF0DD: ompi_request_complete (request.h:402)
>> ==14815== by 0xECCF4EA: recv_request_pml_complete (pml_ob1_recvreq.h:181)
>> ==14815== by 0xECCFF87: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:243)
>> ==14815== by 0xE68F875: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
>> ==14815== by 0xE690D82: mca_btl_vader_component_progress (btl_vader_component.c:695)
>> ==14815== by 0x9A9E9F2: opal_progress (opal_progress.c:187)
>> ==14815== by 0xECCA70A: opal_condition_wait (condition.h:78)
>> ==14815== by 0xECCA7F4: ompi_request_wait_completion (request.h:381)
>> ==14815== by 0xECCAF69: mca_pml_ob1_recv (pml_ob1_irecv.c:109)
>> ==14815== by 0xFD8938D: ompi_coll_tuned_reduce_intra_basic_linear (coll_tuned_reduce.c:677)
>> ==14815== by 0xFD79C26: ompi_coll_tuned_reduce_intra_dec_fixed (coll_tuned_decision_fixed.c:386)
>> ==14815== by 0xF0F3B91: mca_coll_basic_reduce_scatter_block_intra (coll_basic_reduce_scatter_block.c:96)
>> ==14815== by 0xF72BC58: ompi_osc_rdma_fence (osc_rdma_active_target.c:140)
>> ==14815== by 0x8B47078: PMPI_Win_fence (pwin_fence.c:59)
>> ==14815== by 0x5106D8F: PetscSFRestoreWindow (sfwindow.c:348)
>> ==14815== by 0x51092DA: PetscSFBcastEnd_Window (sfwindow.c:510)
>> ==14815== by 0x51303D6: PetscSFBcastEnd (sf.c:957)
>> ==14815== by 0x401DD3: main (ex2.c:81)
>> ==14815== Address 0x101c3b98 is 0 bytes after a block of size 72 alloc'd
>> ==14815== at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==14815== by 0x8AFD755: ompi_datatype_set_args (ompi_datatype_args.c:123)
>> ==14815== by 0x8AFF0F3: __ompi_datatype_create_from_args (ompi_datatype_args.c:718)
>> ==14815== by 0x8AFEC0E: __ompi_datatype_create_from_packed_description (ompi_datatype_args.c:649)
>> ==14815== by 0x8AFF5D6: ompi_datatype_create_from_packed_description (ompi_datatype_args.c:788)
>> ==14815== by 0xF727F0E: ompi_osc_base_datatype_create (osc_base_obj_convert.h:52)
>> ==14815== by 0xF728424: datatype_create (osc_rdma_data_move.c:333)
>> ==14815== by 0xF72887D: process_get (osc_rdma_data_move.c:536)
>> ==14815== by 0xF72A856: process_frag (osc_rdma_data_move.c:1593)
>> ==14815== by 0xF72AA35: ompi_osc_rdma_callback (osc_rdma_data_move.c:1656)
>> ==14815== by 0xECCF0DD: ompi_request_complete (request.h:402)
>> ==14815== by 0xECCF4EA: recv_request_pml_complete (pml_ob1_recvreq.h:181)
>> ==14815== by 0xECCFF87: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:243)
>> ==14815== by 0xE68F875: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
>> ==14815== by 0xE690D82: mca_btl_vader_component_progress (btl_vader_component.c:695)
>> ==14815== by 0x9A9E9F2: opal_progress (opal_progress.c:187)
>> ==14815== by 0xECCA70A: opal_condition_wait (condition.h:78)
>> ==14815== by 0xECCA7F4: ompi_request_wait_completion (request.h:381)
>> ==14815== by 0xECCAF69: mca_pml_ob1_recv (pml_ob1_irecv.c:109)
>> ==14815== by 0xFD8938D: ompi_coll_tuned_reduce_intra_basic_linear (coll_tuned_reduce.c:677)
>> ==14815== by 0xFD79C26: ompi_coll_tuned_reduce_intra_dec_fixed (coll_tuned_decision_fixed.c:386)
>> ==14815== by 0xF0F3B91: mca_coll_basic_reduce_scatter_block_intra (coll_basic_reduce_scatter_block.c:96)
>> ==14815== by 0xF72BC58: ompi_osc_rdma_fence (osc_rdma_active_target.c:140)
>> ==14815== by 0x8B47078: PMPI_Win_fence (pwin_fence.c:59)
>> ==14815== by 0x5106D8F: PetscSFRestoreWindow (sfwindow.c:348)
>> ==14815== by 0x51092DA: PetscSFBcastEnd_Window (sfwindow.c:510)
>> ==14815== by 0x51303D6: PetscSFBcastEnd (sf.c:957)
>> ==14815== by 0x401DD3: main (ex2.c:81)
>> ==14815==
>> ==14816== Invalid write of size 2
>> ==14816== at 0x4C2E36B: memcpy@@GLIBC_2.14 (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==14816== by 0x8AFDABD: ompi_datatype_set_args (ompi_datatype_args.c:167)
>> ==14816== by 0x8AFF0F3: __ompi_datatype_create_from_args (ompi_datatype_args.c:718)
>> ==14816== by 0x8AFEC0E: __ompi_datatype_create_from_packed_description (ompi_datatype_args.c:649)
>> ==14816== by 0x8AFF5D6: ompi_datatype_create_from_packed_description (ompi_datatype_args.c:788)
>> ==14816== by 0xF727F0E: ompi_osc_base_datatype_create (osc_base_obj_convert.h:52)
>> ==14816== by 0xF728424: datatype_create (osc_rdma_data_move.c:333)
>> ==14816== by 0xF72887D: process_get (osc_rdma_data_move.c:536)
>> ==14816== by 0xF72A856: process_frag (osc_rdma_data_move.c:1593)
>> ==14816== by 0xF72AA35: ompi_osc_rdma_callback (osc_rdma_data_move.c:1656)
>> ==14816== by 0xECCF0DD: ompi_request_complete (request.h:402)
>> ==14816== by 0xECCF4EA: recv_request_pml_complete (pml_ob1_recvreq.h:181)
>> ==14816== by 0xECCFF87: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:243)
>> ==14816== by 0xE68F875: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
>> ==14816== by 0xE690D82: mca_btl_vader_component_progress (btl_vader_component.c:695)
>> ==14816== by 0x9A9E9F2: opal_progress (opal_progress.c:187)
>> ==14816== by 0xECCA70A: opal_condition_wait (condition.h:78)
>> ==14816== by 0xECCA7F4: ompi_request_wait_completion (request.h:381)
>> ==14816== by 0xECCAF69: mca_pml_ob1_recv (pml_ob1_irecv.c:109)
>> ==14816== by 0xFD8D951: ompi_coll_tuned_scatter_intra_basic_linear (coll_tuned_scatter.c:231)
>> ==14816== by 0xFD7A66D: ompi_coll_tuned_scatter_intra_dec_fixed (coll_tuned_decision_fixed.c:769)
>> ==14816== by 0xF0F3BDB: mca_coll_basic_reduce_scatter_block_intra (coll_basic_reduce_scatter_block.c:102)
>> ==14816== by 0xF72BC58: ompi_osc_rdma_fence (osc_rdma_active_target.c:140)
>> ==14816== by 0x8B47078: PMPI_Win_fence (pwin_fence.c:59)
>> ==14816== by 0x5106D8F: PetscSFRestoreWindow (sfwindow.c:348)
>> ==14816== by 0x51092DA: PetscSFBcastEnd_Window (sfwindow.c:510)
>> ==14816== by 0x51303D6: PetscSFBcastEnd (sf.c:957)
>> ==14816== by 0x401DD3: main (ex2.c:81)
>> ==14816== Address 0x101bb398 is 0 bytes after a block of size 72 alloc'd
>> ==14816== at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==14816== by 0x8AFD755: ompi_datatype_set_args (ompi_datatype_args.c:123)
>> ==14816== by 0x8AFF0F3: __ompi_datatype_create_from_args (ompi_datatype_args.c:718)
>> ==14816== by 0x8AFEC0E: __ompi_datatype_create_from_packed_description (ompi_datatype_args.c:649)
>> ==14816== by 0x8AFF5D6: ompi_datatype_create_from_packed_description (ompi_datatype_args.c:788)
>> ==14816== by 0xF727F0E: ompi_osc_base_datatype_create (osc_base_obj_convert.h:52)
>> ==14816== by 0xF728424: datatype_create (osc_rdma_data_move.c:333)
>> ==14816== by 0xF72887D: process_get (osc_rdma_data_move.c:536)
>> ==14816== by 0xF72A856: process_frag (osc_rdma_data_move.c:1593)
>> ==14816== by 0xF72AA35: ompi_osc_rdma_callback (osc_rdma_data_move.c:1656)
>> ==14816== by 0xECCF0DD: ompi_request_complete (request.h:402)
>> ==14816== by 0xECCF4EA: recv_request_pml_complete (pml_ob1_recvreq.h:181)
>> ==14816== by 0xECCFF87: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:243)
>> ==14816== by 0xE68F875: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
>> ==14816== by 0xE690D82: mca_btl_vader_component_progress (btl_vader_component.c:695)
>> ==14816== by 0x9A9E9F2: opal_progress (opal_progress.c:187)
>> ==14816== by 0xECCA70A: opal_condition_wait (condition.h:78)
>> ==14816== by 0xECCA7F4: ompi_request_wait_completion (request.h:381)
>> ==14816== by 0xECCAF69: mca_pml_ob1_recv (pml_ob1_irecv.c:109)
>> ==14816== by 0xFD8D951: ompi_coll_tuned_scatter_intra_basic_linear (coll_tuned_scatter.c:231)
>> ==14816== by 0xFD7A66D: ompi_coll_tuned_scatter_intra_dec_fixed (coll_tuned_decision_fixed.c:769)
>> ==14816== by 0xF0F3BDB: mca_coll_basic_reduce_scatter_block_intra (coll_basic_reduce_scatter_block.c:102)
>> ==14816== by 0xF72BC58: ompi_osc_rdma_fence (osc_rdma_active_target.c:140)
>> ==14816== by 0x8B47078: PMPI_Win_fence (pwin_fence.c:59)
>> ==14816== by 0x5106D8F: PetscSFRestoreWindow (sfwindow.c:348)
>> ==14816== by 0x51092DA: PetscSFBcastEnd_Window (sfwindow.c:510)
>> ==14816== by 0x51303D6: PetscSFBcastEnd (sf.c:957)
>> ==14816== by 0x401DD3: main (ex2.c:81)
>> ==14816==
>> Vec Object: 2 MPI processes
>> type: mpi
>> Process [0]
>> 0
>> 1
>> Process [1]
>> 1
>> 0
>> Vec Object: 2 MPI processes
>> type: mpi
>> Process [0]
>> 10
>> 11
>> Process [1]
>> 11
>> 10
>> balay at asterix /home/balay/petsc/src/vec/is/sf/examples/tutorials (master=)
>> $
>>
>>
>> On Thu, 9 Apr 2015, Barry Smith wrote:
>>
>>>
>>> Satish,
>>>
>>> Why are you telling me :-). Tell the person who's been pushing this stuff into PETSc and he can debug it.
>>>
>>> Barry
>>>
>>> This is why "my part" of PETSc only uses MPI 1.1 :-)
>>>
>>>
>>>
>>>> On Apr 9, 2015, at 5:48 PM, Satish Balay <balay at mcs.anl.gov> wrote:
>>>>
>>>>
>>>>
>>>> On Thu, 9 Apr 2015, Barry Smith wrote:
>>>>
>>>>>
>>>>> http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2015/04/08/examples_master_arch-linux-pkgs-opt_crank.log
>>>>
>>>>
>>>> The following test is hanging - perhaps --download-openmpi is the trigger.
>>>>
>>>>
>>>> petsc 14547 0.0 0.0 12312 1220 ? S 13:56 0:00 /bin/sh -c /sandbox/petsc/petsc.clone/arch-linux-pkgs-opt/bin/mpiexec -n 2 ./ex2 -sf_type window > ex2.tmp 2>&1; \
>>>>   /usr/bin/diff -w output/ex2_window.out ex2.tmp || printf "/sandbox/petsc/petsc.clone/src/vec/is/sf/examples/tutorials\nPossible problem with ex2_window, diffs above\n=========================================\n"; \
>>>>   /bin/rm -f -f ex2.tmp
>>>>
>>>>
>>>>
>>>> I can reproduce on my laptop [with the following trace].
>>>>
>>>> Satish
>>>>
>>>> ---------
>>>>
>>>> balay at asterix /home/balay/petsc/src/vec/is/sf/examples/tutorials (master=)
>>>> $ /home/balay/petsc/arch-ompi/bin/mpiexec -n 2 ./ex2 -sf_type window
>>>> PetscSF Object: 2 MPI processes
>>>> type: window
>>>> synchronization=FENCE sort=rank-order
>>>> [0] Number of roots=1, leaves=2, remote ranks=2
>>>> [0] 0 <- (0,0)
>>>> [0] 1 <- (1,0)
>>>> [1] Number of roots=1, leaves=2, remote ranks=2
>>>> [1] 0 <- (1,0)
>>>> [1] 1 <- (0,0)
>>>> *** Error in `./ex2': free(): invalid next size (fast): 0x0000000002395ed0 ***
>>>> [asterix:14290] *** Process received signal ***
>>>> [asterix:14290] Signal: Aborted (6)
>>>> [asterix:14290] Signal code: (-6)
>>>> ======= Backtrace: =========
>>>> /lib64/libc.so.6(+0x77d9e)[0x[asterix:14290] [ 0] /lib64/libpthread.so.0(+0x100d0)[0x7f331fac10d0]
>>>> [asterix:14290] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7f331f7288d7]
>>>> [asterix:14290] [ 2] /home/balay/petsc/arch-ompi/lib/libmpi.so.1(ompi_datatype_release_args/lib64/libc.so.6(abort+0x16a)[0x7f331f72a53a]
>>>> [asterix:14290] [ 3] /home/balay/petsc/arch-ompi/lib/libmpi.so.1(/lib64/libc.so.6(+0x77da3)[0x7f331f76bda3]
>>>> [asterix:14290] [ 4] +0x508e3)[0x7f9018c898e3]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0x11773/lib64/libc.so.6(cfree+0x5b5)[0x7f331f7779f5]
>>>> [asterix:14290] [ 5] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0x12ece)[0x7f900f73aece]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_osc_rdma.so(+0x862a)[0x7f900ecdb62a]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_osc_rdma.so(+0x8a15/home/balay/petsc/arch-ompi/lib/libmpi.so.1(ompi_datatype_release_args+0x12b)[0x7f331ff33627]
>>>> [asterix:14290] [ 6] /home/balay/petsc/arch-ompi/lib/openmpi/mca_osc_rdma.so(+0xbac7)[0x7f900ecdeac7]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0xc0de)[0x7f900f7340de]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0xc4eb)[0x7f900f7344eb]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x2ed)[0x7f900f734f88]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_btl_vader.so(+0x3876)[0x7f9014009876]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_btl_vader.so(+0x4d83)[0x7f901400ad83]
>>>> /home/balay/petsc/arch-ompi/lib/libopen-pal.so.6(opal_progress+0xa2)[0x7f9017cca9f3]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0x770b)[0x7f900f72f70b]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0x77f5)[0x7f900f72f7f5]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv+0x1c6)[0x7f900f72ff6a]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_scatter_intra_basic_linear+0x76)[0x7f900e689952]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_scatter_intra_dec_fixed+0x112)[0x7f900e67666e]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_coll_basic.so(mca_coll_basic_reduce_scatter_block_intra+0x188)[0x7f900f319bdc]
>>>> /home/balay/petsc/arch-ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_fence+0x125)[0x7f900ecdfc59]
>>>> /home/balay/petsc/arch-ompi/lib/libmpi.so.1(MPI_Win_fence+0x116)[0x7f9018cd1079]
>>>> /home/balay/petsc/arch-ompi/lib/libmpi.so.1(+0x508e3)[0x7f331ff348e3]
>>>> [asterix:14290] [ 7] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0x11773)[0x7f3316910773]
>>>> [asterix:14290] [ 8] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0x12ece)[0x7f3316911ece]
>>>> [asterix:14290] [ 9] /home/balay/petsc/arch-ompi/lib/openmpi/mca_osc_rdma.so(+0x862a)[0x7f3315eb262a]
>>>> [asterix:14290] [10] /home/balay/petsc/arch-ompi/lib/openmpi/mca_osc_rdma.so(+0x8a15)[0x7f3315eb2a15]
>>>> [asterix:14290] [11] /home/balay/petsc/arch-ompi/lib/openmpi/mca_osc_rdma.so(+0xbac7)[0x7f3315eb5ac7]
>>>> [asterix:14290] [12] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0xc0de)[0x7f331690b0de]
>>>> [asterix:14290] [13] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0xc4eb)[0x7f331690b4eb]
>>>> [asterix:14290] [14] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x2ed)[0x7f331690bf88]
>>>> [asterix:14290] [15] /home/balay/petsc/arch-ompi/lib/openmpi/mca_btl_vader.so(+0x3876)[0x7f3316f4a876]
>>>> [asterix:14290] [16] /home/balay/petsc/arch-ompi/lib/openmpi/mca_btl_vader.so(+0x4d83)[0x7f3316f4bd83]
>>>> [asterix:14290] [17] /home/balay/petsc/arch-ompi/lib/libopen-pal.so.6(opal_progress+0xa2)[0x7f331ef759f3]
>>>> [asterix:14290] [18] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0x770b)[0x7f331690670b]
>>>> [asterix:14290] [19] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0x77f5)[0x7f33169067f5]
>>>> [asterix:14290] [20] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv+0x1c6)[0x7f3316906f6a]
>>>> [asterix:14290] [21] /home/balay/petsc/arch-ompi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_basic_linear+0x1cb)[0x7f331585c38e]
>>>> [asterix:14290] [22] /home/balay/petsc/arch-ompi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_dec_fixed+0x1a6)[0x7f331584cc27]
>>>> [asterix:14290] [23] /home/balay/petsc/arch-ompi/lib/openmpi/mca_coll_basic.so(mca_coll_basic_reduce_scatter_block_intra+0x13e)[0x7f33164f0b92]
>>>> [asterix:14290] [24] /home/balay/petsc/arch-ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_fence+0x125)[0x7f3315eb6c59]
>>>> [asterix:14290] [25] /home/balay/petsc/arch-ompi/lib/libmpi.so.1(MPI_Win_fence+0x116)[0x7f331ff7c079]
>>>> [asterix:14290] [26] /home/balay/petsc/arch-ompi/lib/libpetsc.so.3.05(+0x2d1d90)[0x7f3322855d90]
>>>> [asterix:14290] [27] /home/balay/petsc/arch-ompi/lib/libpetsc.so.3.05(PetscSFBcastEnd_Window+0x218)[0x7f33228582db]
>>>> [asterix:14290] [28] /home/balay/petsc/arch-ompi/lib/libpetsc.so.3.05(PetscSFBcastEnd+0x4eb)[0x7f332287f3d7]
>>>> [asterix:14290] [29] ./ex2[0x401dd4]
>>>> [asterix:14290] *** End of error message ***
>>>> [asterix:14291] *** Process received signal ***
>>>> [asterix:14291] Signal: Aborted (6)
>>>> [asterix:14291] Signal code: (-6)
>>>> [asterix:14291] [ 0] /lib64/libpthread.so.0(+0x100d0)[0x7f90188160d0]
>>>> [asterix:14291] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7f901847d8d7]
>>>> [asterix:14291] [ 2] /lib64/libc.so.6(abort+0x16a)[0x7f901847f53a]
>>>> [asterix:14291] [ 3] /lib64/libc.so.6(+0x77da3)[0x7f90184c0da3]
>>>> [asterix:14291] [ 4] /lib64/libc.so.6(cfree+0x5b5)[0x7f90184cc9f5]
>>>> [asterix:14291] [ 5] /home/balay/petsc/arch-ompi/lib/libmpi.so.1(ompi_datatype_release_args+0x12b)[0x7f9018c88627]
>>>> [asterix:14291] [ 6] /home/balay/petsc/arch-ompi/lib/libmpi.so.1(+0x508e3)[0x7f9018c898e3]
>>>> [asterix:14291] [ 7] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0x11773)[0x7f900f739773]
>>>> [asterix:14291] [ 8] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0x12ece)[0x7f900f73aece]
>>>> [asterix:14291] [ 9] /home/balay/petsc/arch-ompi/lib/openmpi/mca_osc_rdma.so(+0x862a)[0x7f900ecdb62a]
>>>> [asterix:14291] [10] /home/balay/petsc/arch-ompi/lib/openmpi/mca_osc_rdma.so(+0x8a15)[0x7f900ecdba15]
>>>> [asterix:14291] [11] /home/balay/petsc/arch-ompi/lib/openmpi/mca_osc_rdma.so(+0xbac7)[0x7f900ecdeac7]
>>>> [asterix:14291] [12] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0xc0de)[0x7f900f7340de]
>>>> [asterix:14291] [13] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0xc4eb)[0x7f900f7344eb]
>>>> [asterix:14291] [14] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x2ed)[0x7f900f734f88]
>>>> [asterix:14291] [15] /home/balay/petsc/arch-ompi/lib/openmpi/mca_btl_vader.so(+0x3876)[0x7f9014009876]
>>>> [asterix:14291] [16] /home/balay/petsc/arch-ompi/lib/openmpi/mca_btl_vader.so(+0x4d83)[0x7f901400ad83]
>>>> [asterix:14291] [17] /home/balay/petsc/arch-ompi/lib/libopen-pal.so.6(opal_progress+0xa2)[0x7f9017cca9f3]
>>>> [asterix:14291] [18] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0x770b)[0x7f900f72f70b]
>>>> [asterix:14291] [19] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(+0x77f5)[0x7f900f72f7f5]
>>>> [asterix:14291] [20] /home/balay/petsc/arch-ompi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv+0x1c6)[0x7f900f72ff6a]
>>>> [asterix:14291] [21] /home/balay/petsc/arch-ompi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_scatter_intra_basic_linear+0x76)[0x7f900e689952]
>>>> [asterix:14291] [22] /home/balay/petsc/arch-ompi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_scatter_intra_dec_fixed+0x112)[0x7f900e67666e]
>>>> [asterix:14291] [23] /home/balay/petsc/arch-ompi/lib/openmpi/mca_coll_basic.so(mca_coll_basic_reduce_scatter_block_intra+0x188)[0x7f900f319bdc]
>>>> [asterix:14291] [24] /home/balay/petsc/arch-ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_fence+0x125)[0x7f900ecdfc59]
>>>> [asterix:14291] [25] /home/balay/petsc/arch-ompi/lib/libmpi.so.1(MPI_Win_fence+0x116)[0x7f9018cd1079]
>>>> [asterix:14291] [26] /home/balay/petsc/arch-ompi/lib/libpetsc.so.3.05(+0x2d1d90)[0x7f901b5aad90]
>>>> [asterix:14291] [27] /home/balay/petsc/arch-ompi/lib/libpetsc.so.3.05(PetscSFBcastEnd_Window+0x218)[0x7f901b5ad2db]
>>>> [asterix:14291] [28] /home/balay/petsc/arch-ompi/lib/libpetsc.so.3.05(PetscSFBcastEnd+0x4eb)[0x7f901b5d43d7]
>>>> [asterix:14291] [29] ./ex2[0x401dd4]
>>>> [asterix:14291] *** End of error message ***
>>>> --------------------------------------------------------------------------
>>>> mpiexec noticed that process rank 0 with PID 14290 on node asterix exited on signal 6 (Aborted).
>>>> --------------------------------------------------------------------------
>>>> balay at asterix /home/balay/petsc/src/vec/is/sf/examples/tutorials (master=)
>>>> $ /home/balay/petsc/arch-ompi/bin/mpiexec -n 2 valgrind --tool=memcheck -q ./ex2 -sf_type window
>>>> PetscSF Object: 2 MPI processes
>>>> type: window
>>>> synchronization=FENCE sort=rank-order
>>>> [0] Number of roots=1, leaves=2, remote ranks=2
>>>> [0] 0 <- (0,0)
>>>> [0] 1 <- (1,0)
>>>> [1] Number of roots=1, leaves=2, remote ranks=2
>>>> [1] 0 <- (1,0)
>>>> [1] 1 <- (0,0)
>>>> ==14349== Invalid write of size 2
>>>> ==14349== at 0x4C2E36B: memcpy@@GLIBC_2.14 (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>> ==14349== by 0x8AFDABD: ompi_datatype_set_args (ompi_datatype_args.c:167)
>>>> ==14349== by 0x8AFF0F3: __ompi_datatype_create_from_args (ompi_datatype_args.c:718)
>>>> ==14349== by 0x8AFEC0E: __ompi_datatype_create_from_packed_description (ompi_datatype_args.c:649)
>>>> ==14349== by 0x8AFF5D6: ompi_datatype_create_from_packed_description (ompi_datatype_args.c:788)
>>>> ==14349== by 0xF727F0E: ompi_osc_base_datatype_create (osc_base_obj_convert.h:52)
>>>> ==14349== by 0xF728424: datatype_create (osc_rdma_data_move.c:333)
>>>> ==14349== by 0xF72887D: process_get (osc_rdma_data_move.c:536)
>>>> ==14349== by 0xF72A856: process_frag (osc_rdma_data_move.c:1593)
>>>> ==14349== by 0xF72AA35: ompi_osc_rdma_callback (osc_rdma_data_move.c:1656)
>>>> ==14349== by 0xECCF0DD: ompi_request_complete (request.h:402)
>>>> ==14349== by 0xECCF4EA: recv_request_pml_complete (pml_ob1_recvreq.h:181)
>>>> ==14349== Address 0x101bf188 is 0 bytes after a block of size 72 alloc'd
>>>> ==14349== at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>> ==14349== by 0x8AFD755: ompi_datatype_set_args (ompi_datatype_args.c:123)
>>>> ==14349== by 0x8AFF0F3: __ompi_datatype_create_from_args (ompi_datatype_args.c:718)
>>>> ==14349== by 0x8AFEC0E: __ompi_datatype_create_from_packed_description (ompi_datatype_args.c:649)
>>>> ==14349== by 0x8AFF5D6: ompi_datatype_create_from_packed_description (ompi_datatype_args.c:788)
>>>> ==14349== by 0xF727F0E: ompi_osc_base_datatype_create (osc_base_obj_convert.h:52)
>>>> ==14349== by 0xF728424: datatype_create (osc_rdma_data_move.c:333)
>>>> ==14349== by 0xF72887D: process_get (osc_rdma_data_move.c:536)
>>>> ==14349== by 0xF72A856: process_frag (osc_rdma_data_move.c:1593)
>>>> ==14349== by 0xF72AA35: ompi_osc_rdma_callback (osc_rdma_data_move.c:1656)
>>>> ==14349== by 0xECCF0DD: ompi_request_complete (request.h:402)
>>>> ==14349== by 0xECCF4EA: recv_request_pml_complete (pml_ob1_recvreq.h:181)
>>>> ==14349==
>>>> ==14348== Invalid write of size 2
>>>> ==14348== at 0x4C2E36B: memcpy@@GLIBC_2.14 (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>> ==14348== by 0x8AFDABD: ompi_datatype_set_args (ompi_datatype_args.c:167)
>>>> ==14348== by 0x8AFF0F3: __ompi_datatype_create_from_args (ompi_datatype_args.c:718)
>>>> ==14348== by 0x8AFEC0E: __ompi_datatype_create_from_packed_description (ompi_datatype_args.c:649)
>>>> ==14348== by 0x8AFF5D6: ompi_datatype_create_from_packed_description (ompi_datatype_args.c:788)
>>>> ==14348== by 0xF727F0E: ompi_osc_base_datatype_create (osc_base_obj_convert.h:52)
>>>> ==14348== by 0xF728424: datatype_create (osc_rdma_data_move.c:333)
>>>> ==14348== by 0xF72887D: process_get (osc_rdma_data_move.c:536)
>>>> ==14348== by 0xF72A856: process_frag (osc_rdma_data_move.c:1593)
>>>> ==14348== by 0xF72AA35: ompi_osc_rdma_callback (osc_rdma_data_move.c:1656)
>>>> ==14348== by 0xECCF0DD: ompi_request_complete (request.h:402)
>>>> ==14348== by 0xECCF4EA: recv_request_pml_complete (pml_ob1_recvreq.h:181)
>>>> ==14348== Address 0x101c71b8 is 0 bytes after a block of size 72 alloc'd
>>>> ==14348== at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>> ==14348== by 0x8AFD755: ompi_datatype_set_args (ompi_datatype_args.c:123)
>>>> ==14348== by 0x8AFF0F3: __ompi_datatype_create_from_args (ompi_datatype_args.c:718)
>>>> ==14348== by 0x8AFEC0E: __ompi_datatype_create_from_packed_description (ompi_datatype_args.c:649)
>>>> ==14348== by 0x8AFF5D6: ompi_datatype_create_from_packed_description (ompi_datatype_args.c:788)
>>>> ==14348== by 0xF727F0E: ompi_osc_base_datatype_create (osc_base_obj_convert.h:52)
>>>> ==14348== by 0xF728424: datatype_create (osc_rdma_data_move.c:333)
>>>> ==14348== by 0xF72887D: process_get (osc_rdma_data_move.c:536)
>>>> ==14348== by 0xF72A856: process_frag (osc_rdma_data_move.c:1593)
>>>> ==14348== by 0xF72AA35: ompi_osc_rdma_callback (osc_rdma_data_move.c:1656)
>>>> ==14348== by 0xECCF0DD: ompi_request_complete (request.h:402)
>>>> ==14348== by 0xECCF4EA: recv_request_pml_complete (pml_ob1_recvreq.h:181)
>>>> ==14348==
>>>> Vec Object: 2 MPI processes
>>>> type: mpi
>>>> Process [0]
>>>> 0
>>>> 1
>>>> Process [1]
>>>> 1
>>>> 0
>>>> Vec Object: 2 MPI processes
>>>> type: mpi
>>>> Process [0]
>>>> 10
>>>> 11
>>>> Process [1]
>>>> 11
>>>> 10
>>>> balay at asterix /home/balay/petsc/src/vec/is/sf/examples/tutorials (master=)
>>>> $
>>>
>>>
>>
>>
>