[petsc-users] Problem with DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90

Matthew Knepley knepley at gmail.com
Sat Apr 19 05:48:13 CDT 2014


On Sat, Apr 19, 2014 at 4:59 AM, TAY wee-beng <zonexo at gmail.com> wrote:

> On 19/4/2014 1:17 PM, Barry Smith wrote:
>
>> On Apr 19, 2014, at 12:11 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>>  On 19/4/2014 12:10 PM, Barry Smith wrote:
>>>
>>>> On Apr 18, 2014, at 9:57 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>
>>>>  On 19/4/2014 3:53 AM, Barry Smith wrote:
>>>>>
>>>>>>    Hmm,
>>>>>>
>>>>>>        Interface DMDAVecGetArrayF90
>>>>>>          Subroutine DMDAVecGetArrayF903(da1, v,d1,ierr)
>>>>>>            USE_DM_HIDE
>>>>>>            DM_HIDE da1
>>>>>>            VEC_HIDE v
>>>>>>            PetscScalar,pointer :: d1(:,:,:)
>>>>>>            PetscErrorCode ierr
>>>>>>          End Subroutine
>>>>>>
>>>>>>     So the d1 is an F90 POINTER, but your subroutine seems to be
>>>>>> treating it as a “plain old Fortran array”?
>>>>>> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
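A minimal sketch of the distinction being pointed out here, with illustrative names and assuming a 3D DMDA: the array handed back by DMDAVecGetArrayF90 is an F90 pointer whose lower bounds follow the DMDA's own numbering, while an assumed-shape dummy argument like the one above rebases its lower bounds to 1.

      ! in the calling code: declare the argument as an F90 pointer,
      ! matching the PetscScalar,pointer :: d1(:,:,:) in the interface above
      PetscScalar, pointer :: u_array(:,:,:)   ! bounds follow the DMDA corners

      ! inside a subroutine with a "plain old Fortran array" dummy,
      !   real(8), intent(inout) :: u(:,:,:)
      ! the same data is visible, but the lower bounds are reset to 1,
      ! so indexing based on DMDAGetCorners()/DMDAGetGhostCorners() no
      ! longer applies inside that subroutine.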
>>>>>>
>>>>> Hi,
>>>
>>> So d1 is a pointer, and it's different if I declare it as a "plain old
>>> Fortran array"? I declare it as a Fortran array and it works without any
>>> problem if I only call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90
>>> with "u".
>>>
>>> But if I call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u",
>>> "v" and "w", errors start to happen. I wonder why...
>>>
>>> Also, suppose I call:
>>>
>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>
>>>     call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>
>>>     call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>
>>> u_array ....
>>>
>>> v_array .... etc
>>>
>>> Now, to restore the arrays, does the order in which they are restored matter?
>>>
>>     No, it should not matter. If it does matter, that is a sign that memory
>> has been written to incorrectly earlier in the code.
>>
>>  Hi,
>
> Hmm, I have been getting different results with different Intel compilers.
> I'm not sure if MPI plays a part, but I'm only using a single processor. In
> debug mode, things run without problem. In optimized mode, in some cases,
> the code aborts even when doing a simple initialization:
>
>
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
>     call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
>     call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
>     call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>
>     u_array = 0.d0
>
>     v_array = 0.d0
>
>     w_array = 0.d0
>
>     p_array = 0.d0
>
>
>     call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>
>
>     call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
>     call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
>     call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> The code aborts at call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
> with a segmentation error, but another version of the Intel compiler passes
> through this part without error. Since the behavior differs among compilers,
> is this a PETSc or Intel bug? Or MVAPICH or Open MPI?


We do this in a bunch of examples. Can you reproduce this different
behavior in src/dm/examples/tutorials/ex11f90.F?

   Matt
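For reference, a minimal sketch of the pattern being exercised here (da_u, u_local and u_array named as in the snippets above; the DMDA and its local vector are assumed to already exist, e.g. created as in ex11f90.F):

      PetscScalar, pointer :: u_array(:,:,:)
      PetscErrorCode ierr

      call DMDAVecGetArrayF90(da_u, u_local, u_array, ierr)
      u_array = 0.d0     ! whole-array assignment through the pointer
      call DMDAVecRestoreArrayF90(da_u, u_local, u_array, ierr)

If the same sequence works in the tutorial but not in the full code, that points at memory being corrupted elsewhere rather than at these calls.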


>  As in w, then v and u?
>>>
>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> thanks
>>>
>>>>>>     Note also that the beginning and end indices of the u,v,w are
>>>>>> different for each process; see for example
>>>>>> http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F
>>>>>> (and they do not start at 1). This is how to get the loop bounds.
>>>>>>
>>>>> Hi,
>>>>>
>>>>> In my case, I fixed the u,v,w such that their indices are the same. I
>>>>> also checked using DMDAGetCorners and DMDAGetGhostCorners. Now the problem
>>>>> lies in my subroutine treating it as a “plain old Fortran array”.
>>>>>
>>>>> If I declare them as pointers, do their indices follow the C 0-start
>>>>> convention?
>>>>>
>>>>     Not really. It is that in each process you need to access them from
>>>> the indices indicated by DMDAGetCorners() for global vectors and
>>>> DMDAGetGhostCorners() for local vectors.  So really C or Fortran doesn’t
>>>> make any difference.
>>>>
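A sketch of the loop-bounds pattern described here, assuming a 3D DMDA da_u and a pointer u_array obtained from DMDAVecGetArrayF90 on a global vector (for a local, ghosted vector the bounds would come from DMDAGetGhostCorners() instead):

      PetscInt xs, ys, zs, xm, ym, zm, i, j, k
      PetscErrorCode ierr

      ! start indices and widths of this process's piece of the grid
      call DMDAGetCorners(da_u, xs, ys, zs, xm, ym, zm, ierr)

      do k = zs, zs+zm-1
         do j = ys, ys+ym-1
            do i = xs, xs+xm-1
               u_array(i,j,k) = 0.d0    ! indices follow the DMDA, not 1-based
            end do
         end do
      end do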
>>>>
>>>>> So my problem now is that in my old MPI code, u(i,j,k) follows the
>>>>> Fortran 1-start convention. Is there some way to handle this so that I do
>>>>> not have to change my u(i,j,k) to u(i-1,j-1,k-1)?
>>>>>
>>>>    If your code wishes to access them with indices one greater than the
>>>> values returned by DMDAGetCorners() for global vectors and
>>>> DMDAGetGhostCorners() for local vectors, then you need to manually subtract
>>>> off the 1.
>>>>
>>>>    Barry
>>>>
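A sketch of the manual offset described above, assuming the corner values xs, ys, zs and widths xm, ym, zm from DMDAGetCorners(), for a loop that insists on running from 1:

      ! legacy 1-based indices i1,j1,k1 mapped back to the DMDA numbering
      do k1 = 1, zm
         do j1 = 1, ym
            do i1 = 1, xm
               u_array(xs+i1-1, ys+j1-1, zs+k1-1) = 0.d0
            end do
         end do
      end do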
>>>>  Thanks.
>>>>>
>>>>>>    Barry
>>>>>>
>>>>>> On Apr 18, 2014, at 10:58 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>>
>>>>>>  Hi,
>>>>>>>
>>>>>>> I tried to pinpoint the problem. I reduced my job size so that I can
>>>>>>> run on 1 processor. I tried using valgrind, but perhaps because I'm using
>>>>>>> the optimized version it didn't catch the error, beyond saying "Segmentation
>>>>>>> fault (core dumped)".
>>>>>>>
>>>>>>> However, by re-writing my code, I found out a few things:
>>>>>>>
>>>>>>> 1. if I write my code this way:
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>> u_array = ....
>>>>>>>
>>>>>>> v_array = ....
>>>>>>>
>>>>>>> w_array = ....
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>
>>>>>>> The code runs fine.
>>>>>>>
>>>>>>> 2. if I write my code this way:
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>> call uvw_array_change(u_array,v_array,w_array) -> this subroutine
>>>>>>> does the same modification as the above.
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr) -> error
>>>>>>>
>>>>>>> where the subroutine is:
>>>>>>>
>>>>>>> subroutine uvw_array_change(u,v,w)
>>>>>>>
>>>>>>> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>>>>>>>
>>>>>>> u ...
>>>>>>> v...
>>>>>>> w ...
>>>>>>>
>>>>>>> end subroutine uvw_array_change.
>>>>>>>
>>>>>>> The above gives an error at:
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>
>>>>>>> 3. Same as above, except I change the order of the last 3 lines to:
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>> So they are now in reversed order. Now it works.
>>>>>>>
>>>>>>> 4. Same as 2 or 3, except the subroutine is changed to :
>>>>>>>
>>>>>>> subroutine uvw_array_change(u,v,w)
>>>>>>>
>>>>>>> real(8), intent(inout) :: u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>>>>>
>>>>>>> real(8), intent(inout) :: v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>>>>>
>>>>>>> real(8), intent(inout) :: w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>>>>>
>>>>>>> u ...
>>>>>>> v...
>>>>>>> w ...
>>>>>>>
>>>>>>> end subroutine uvw_array_change.
>>>>>>>
>>>>>>> The start_indices and end_indices simply shift the 0-based indices of
>>>>>>> the C convention to the 1-based indices of the Fortran convention. This is
>>>>>>> necessary in my case because most of my code starts array counting at 1,
>>>>>>> hence the "trick".
>>>>>>>
>>>>>>> However, now no matter which order the DMDAVecRestoreArrayF90 calls are
>>>>>>> in (as in 2 or 3), an error occurs at "call
>>>>>>> DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".
>>>>>>>
>>>>>>> So did I violate something and cause memory corruption with the trick
>>>>>>> above? I can't think of any way other than the "trick" to keep using the
>>>>>>> 1-based indexing convention.
>>>>>>>
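One alternative to precomputing start_indices/end_indices, sketched here with illustrative names and assuming u, v, w share the same local (ghosted) sizes and that the PETSc Fortran include files are in scope: size the explicit-shape dummies directly from the widths reported by DMDAGetGhostCorners(), so the 1-based dummies cover exactly the same storage as the pointers.

      ! in the calling code (after DMDAVecGetArrayF90 on the local vectors):
      call DMDAGetGhostCorners(da_u, xs, ys, zs, xm, ym, zm, ierr)
      call uvw_array_change(u_array, v_array, w_array, xm, ym, zm)

      ! the subroutine, 1-based but sized to match the actual arrays:
      subroutine uvw_array_change(u, v, w, xm, ym, zm)
         PetscInt, intent(in) :: xm, ym, zm
         real(8), intent(inout) :: u(1:xm,1:ym,1:zm), v(1:xm,1:ym,1:zm), w(1:xm,1:ym,1:zm)
         u = 0.d0
         v = 0.d0
         w = 0.d0
      end subroutine uvw_array_change

If the declared extents do not match the actual local sizes exactly, writes run past the end of the array, which is one way this kind of memory corruption can arise.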
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Yours sincerely,
>>>>>>>
>>>>>>> TAY wee-beng
>>>>>>>
>>>>>>> On 15/4/2014 8:00 PM, Barry Smith wrote:
>>>>>>>
>>>>>>>>    Try running under valgrind http://www.mcs.anl.gov/petsc/
>>>>>>>> documentation/faq.html#valgrind
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 14, 2014, at 9:47 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>>>>
>>>>>>>>  Hi Barry,
>>>>>>>>>
>>>>>>>>> As I mentioned earlier, the code works fine in PETSc debug mode
>>>>>>>>> but fails in non-debug mode.
>>>>>>>>>
>>>>>>>>> I have attached my code.
>>>>>>>>>
>>>>>>>>> Thank you
>>>>>>>>>
>>>>>>>>> Yours sincerely,
>>>>>>>>>
>>>>>>>>> TAY wee-beng
>>>>>>>>>
>>>>>>>>> On 15/4/2014 2:26 AM, Barry Smith wrote:
>>>>>>>>>
>>>>>>>>>>    Please send the code that creates da_w and the declarations of
>>>>>>>>>> w_array
>>>>>>>>>>
>>>>>>>>>>    Barry
>>>>>>>>>>
>>>>>>>>>> On Apr 14, 2014, at 9:40 AM, TAY wee-beng
>>>>>>>>>> <zonexo at gmail.com>
>>>>>>>>>>   wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  Hi Barry,
>>>>>>>>>>>
>>>>>>>>>>> I'm not too sure how to do it. I'm running MPI, so I run:
>>>>>>>>>>>
>>>>>>>>>>>   mpirun -n 4 ./a.out -start_in_debugger
>>>>>>>>>>>
>>>>>>>>>>> I got the message below. Before the gdb windows appear (through X11),
>>>>>>>>>>> the program aborts.
>>>>>>>>>>>
>>>>>>>>>>> I also tried running on another cluster and it worked. I also tried
>>>>>>>>>>> the current cluster in debug mode and it worked too.
>>>>>>>>>>>
>>>>>>>>>>> mpirun -n 4 ./a.out -start_in_debugger
>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>> --------------
>>>>>>>>>>> An MPI process has executed an operation involving a call to the
>>>>>>>>>>> "fork()" system call to create a child process.  Open MPI is
>>>>>>>>>>> currently
>>>>>>>>>>> operating in a condition that could result in memory corruption
>>>>>>>>>>> or
>>>>>>>>>>> other system errors; your MPI job may hang, crash, or produce
>>>>>>>>>>> silent
>>>>>>>>>>> data corruption.  The use of fork() (or system() or other calls
>>>>>>>>>>> that
>>>>>>>>>>> create child processes) is strongly discouraged.
>>>>>>>>>>>
>>>>>>>>>>> The process that invoked fork was:
>>>>>>>>>>>
>>>>>>>>>>>    Local host:          n12-76 (PID 20235)
>>>>>>>>>>>    MPI_COMM_WORLD rank: 2
>>>>>>>>>>>
>>>>>>>>>>> If you are *absolutely sure* that your application will
>>>>>>>>>>> successfully
>>>>>>>>>>> and correctly survive a call to fork(), you may disable this
>>>>>>>>>>> warning
>>>>>>>>>>> by setting the mpi_warn_on_fork MCA parameter to 0.
>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>> --------------
>>>>>>>>>>> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on
>>>>>>>>>>> display localhost:50.0 on machine n12-76
>>>>>>>>>>> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on
>>>>>>>>>>> display localhost:50.0 on machine n12-76
>>>>>>>>>>> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on
>>>>>>>>>>> display localhost:50.0 on machine n12-76
>>>>>>>>>>> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on
>>>>>>>>>>> display localhost:50.0 on machine n12-76
>>>>>>>>>>> [n12-76:20232] 3 more processes have sent help message
>>>>>>>>>>> help-mpi-runtime.txt / mpi_init:warn-fork
>>>>>>>>>>> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0
>>>>>>>>>>> to see all help / error messages
>>>>>>>>>>>
>>>>>>>>>>> ....
>>>>>>>>>>>
>>>>>>>>>>>   1
>>>>>>>>>>> [1]PETSC ERROR: ------------------------------
>>>>>>>>>>> ------------------------------------------
>>>>>>>>>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>>>>>>>>>> Violation, probably memory access out of range
>>>>>>>>>>> [1]PETSC ERROR: Try option -start_in_debugger or
>>>>>>>>>>> -on_error_attach_debugger
>>>>>>>>>>> [1]PETSC ERROR: or see
>>>>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#
>>>>>>>>>>> valgrind[1]PETSC ERROR: or try http://valgrind.org
>>>>>>>>>>>   on GNU/linux and Apple Mac OS X to find memory corruption
>>>>>>>>>>> errors
>>>>>>>>>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile,
>>>>>>>>>>> link, and run
>>>>>>>>>>> [1]PETSC ERROR: to get more information on the crash.
>>>>>>>>>>> [1]PETSC ERROR: User provided function() line 0 in unknown
>>>>>>>>>>> directory unknown file (null)
>>>>>>>>>>> [3]PETSC ERROR: ------------------------------
>>>>>>>>>>> ------------------------------------------
>>>>>>>>>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>>>>>>>>>> Violation, probably memory access out of range
>>>>>>>>>>> [3]PETSC ERROR: Try option -start_in_debugger or
>>>>>>>>>>> -on_error_attach_debugger
>>>>>>>>>>> [3]PETSC ERROR: or see
>>>>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#
>>>>>>>>>>> valgrind[3]PETSC ERROR: or try http://valgrind.org
>>>>>>>>>>>   on GNU/linux and Apple Mac OS X to find memory corruption
>>>>>>>>>>> errors
>>>>>>>>>>> [3]PETSC ERROR: configure using --with-debugging=yes, recompile,
>>>>>>>>>>> link, and run
>>>>>>>>>>> [3]PETSC ERROR: to get more information on the crash.
>>>>>>>>>>> [3]PETSC ERROR: User provided function() line 0 in unknown
>>>>>>>>>>> directory unknown file (null)
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>> Thank you.
>>>>>>>>>>>
>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>
>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>
>>>>>>>>>>> On 14/4/2014 9:05 PM, Barry Smith wrote:
>>>>>>>>>>>
>>>>>>>>>>>>     Because I/O doesn’t always get flushed immediately, it may not
>>>>>>>>>>>> be hanging at this point. It is better to use the option
>>>>>>>>>>>> -start_in_debugger, then type cont in each debugger window; when you
>>>>>>>>>>>> think it is “hanging”, do a control-C in each debugger window and type
>>>>>>>>>>>> where to see where each process is. You can also look around in the
>>>>>>>>>>>> debugger at variables to see why it is “hanging” at that point.
>>>>>>>>>>>>
>>>>>>>>>>>>     Barry
>>>>>>>>>>>>
>>>>>>>>>>>>    These routines don’t have any parallel communication in them,
>>>>>>>>>>>> so they are unlikely to hang.
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 14, 2014, at 6:52 AM, TAY wee-beng
>>>>>>>>>>>>
>>>>>>>>>>>> <zonexo at gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> My code hangs, so I added MPI_Barrier and print statements to catch
>>>>>>>>>>>>> the bug. I found that it hangs after printing "7". Is it because I'm doing
>>>>>>>>>>>>> something wrong? I need to access the u,v,w arrays, so I use
>>>>>>>>>>>>> DMDAVecGetArrayF90. After the access, I use DMDAVecRestoreArrayF90.
>>>>>>>>>>>>>
>>>>>>>>>>>>>          call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>>>>>>>          call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0)
>>>>>>>>>>>>> print *,"3"
>>>>>>>>>>>>>          call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>>>>>>>          call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0)
>>>>>>>>>>>>> print *,"4"
>>>>>>>>>>>>>          call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>>>>>>>          call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0)
>>>>>>>>>>>>> print *,"5"
>>>>>>>>>>>>>          call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>>>>>>>>>>>>>          call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0)
>>>>>>>>>>>>> print *,"6"
>>>>>>>>>>>>>          call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)  !must be in reverse order
>>>>>>>>>>>>>          call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0)
>>>>>>>>>>>>> print *,"7"
>>>>>>>>>>>>>          call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>>>>>>>          call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0)
>>>>>>>>>>>>> print *,"8"
>>>>>>>>>>>>>          call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>>
>>>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  <code.txt>
>>>>>>>>>
>>>>>>>>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener