[petsc-users] Problem with DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90
Matthew Knepley
knepley at gmail.com
Sat Apr 19 05:48:13 CDT 2014
On Sat, Apr 19, 2014 at 4:59 AM, TAY wee-beng <zonexo at gmail.com> wrote:
> On 19/4/2014 1:17 PM, Barry Smith wrote:
>
>> On Apr 19, 2014, at 12:11 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>> On 19/4/2014 12:10 PM, Barry Smith wrote:
>>>
>>>> On Apr 18, 2014, at 9:57 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>
>>>> On 19/4/2014 3:53 AM, Barry Smith wrote:
>>>>>
>>>>>> Hmm,
>>>>>>
>>>>>> Interface DMDAVecGetArrayF90
>>>>>> Subroutine DMDAVecGetArrayF903(da1, v,d1,ierr)
>>>>>> USE_DM_HIDE
>>>>>> DM_HIDE da1
>>>>>> VEC_HIDE v
>>>>>> PetscScalar,pointer :: d1(:,:,:)
>>>>>> PetscErrorCode ierr
>>>>>> End Subroutine
>>>>>>
>>>>>> So the d1 is a F90 POINTER. But your subroutine seems to be
>>>>>> treating it as a “plain old Fortran array”?
>>>>>> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>>>>>>
>>>>> Hi,
>>>
>>> So d1 is a pointer, and it's different if I declare it as a "plain old
>>> Fortran array"? I declared it as a Fortran array, and it works without
>>> any problem if I only call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90
>>> with "u".
>>>
>>> But if I call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u",
>>> "v" and "w", errors start to happen. I wonder why...
>>>
>>> Also, supposed I call:
>>>
>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>
>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>
>>> u_array ....
>>>
>>> v_array .... etc
>>>
>>> Now, to restore the arrays, does the order in which they are restored matter?
>>>
>> No, it should not matter. If it matters, that is a sign that memory has
>> been written to incorrectly earlier in the code.
>>
>> Hi,
>
> Hmm, I have been getting different results with different Intel compilers.
> I'm not sure if MPI plays a part, but I'm only using a single processor. In
> debug mode, things run without problems. In optimized mode, in some
> cases, the code aborts even when doing simple initialization:
>
>
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
> call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>
> u_array = 0.d0
>
> v_array = 0.d0
>
> w_array = 0.d0
>
> p_array = 0.d0
>
>
> call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>
>
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> The code aborts at call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr),
> giving a segmentation error. But other versions of the Intel compiler pass
> through this part without error. Since the behavior differs among
> compilers, is this a PETSc bug or an Intel one? Or MVAPICH or OpenMPI?
We do this in a bunch of examples. Can you reproduce this different
behavior in src/dm/examples/tutorials/ex11f90.F?
Matt
> As in w, then v and u?
>>>
>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> thanks
>>>
>>>> Note also that the beginning and end indices of u,v,w are
>>>>>> different for each process; see for example
>>>>>> http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F
>>>>>> (and they do not start at 1). This is how to get the loop bounds.
>>>>>>
>>>>> Hi,
>>>>>
>>>>> In my case, I fixed the u,v,w such that their indices are the same. I
>>>>> also checked using DMDAGetCorners and DMDAGetGhostCorners. Now the problem
>>>>> lies in my subroutine treating it as a “plain old Fortran array”.
>>>>>
>>>>> If I declare them as pointers, do their indices follow the C 0-based
>>>>> convention?
>>>>>
>>>> Not really. It is that in each process you need to access them from
>>>> the indices indicated by DMDAGetCorners() for global vectors and
>>>> DMDAGetGhostCorners() for local vectors. So really C or Fortran doesn’t
>>>> make any difference.
>>>>
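
[Editor's note: a conceptual sketch in plain C, without PETSc. The names xs, ys, zs and xm, ym, zm below mirror the per-process start indices and widths that DMDAGetCorners() reports, but the helper function and all values are invented for illustration.]

```c
#include <assert.h>

/* Hypothetical helper: given per-process start indices (xs, ys, zs) and
 * widths (xm, ym, zm) -- the kind of values DMDAGetCorners() returns --
 * loop over exactly the locally owned grid points and count them. */
static long count_owned(int xs, int ys, int zs, int xm, int ym, int zm) {
    long n = 0;
    for (int k = zs; k < zs + zm; k++)
        for (int j = ys; j < ys + ym; j++)
            for (int i = xs; i < xs + xm; i++)
                n++;   /* in real code: work on the (i,j,k) entry here */
    return n;          /* total owned points = xm*ym*zm */
}
```

Each process gets different corner values, so these, not 1..N, define the loop bounds.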
>>>>
>>>> So my problem now is that in my old MPI code, u(i,j,k) follows the
>>>>> Fortran 1-based convention. Is there some way to arrange it so that I do
>>>>> not have to change my u(i,j,k) to u(i-1,j-1,k-1)?
>>>>>
>>>> If your code wishes to access them with indices plus one from the
>>>> values returned by DMDAGetCorners() for global vectors and
>>>> DMDAGetGhostCorners() for local vectors, then you need to manually
>>>> subtract off the 1.
>>>>
>>>> Barry
>>>>
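
[Editor's note: if subtracting the 1 by hand everywhere is unappealing, here is a conceptual sketch in plain C of hiding the shift behind a 1-based accessor. The macro and function names are invented; the Fortran analogue is declaring the dummy argument with explicit lower bounds.]

```c
#include <assert.h>

/* U1 gives a 1-based view of 0-based storage: U1(u0, 1) is u0[0]. */
#define U1(u0, i) ((u0)[(i) - 1])

/* Hypothetical demo: fill n entries using Fortran-style 1..n loop bounds
 * and return their sum. */
static double fill_one_based(double *u0, int n) {
    double sum = 0.0;
    for (int i = 1; i <= n; i++) {
        U1(u0, i) = (double)i;   /* writes u0[i-1] underneath */
        sum += U1(u0, i);
    }
    return sum;
}
```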
>>>> Thanks.
>>>>>
>>>>>> Barry
>>>>>>
>>>>>> On Apr 18, 2014, at 10:58 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>>
>>>>>>> I tried to pinpoint the problem. I reduced my job size and hence I
>>>>>>> can run on 1 processor. I tried using valgrind, but perhaps because I'm
>>>>>>> using the optimized version, it didn't catch the error beyond saying
>>>>>>> "Segmentation fault (core dumped)".
>>>>>>>
>>>>>>> However, by re-writing my code, I found out a few things:
>>>>>>>
>>>>>>> 1. if I write my code this way:
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>> u_array = ....
>>>>>>>
>>>>>>> v_array = ....
>>>>>>>
>>>>>>> w_array = ....
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>
>>>>>>> The code runs fine.
>>>>>>>
>>>>>>> 2. if I write my code this way:
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>> call uvw_array_change(u_array,v_array,w_array) -> this subroutine
>>>>>>> does the same modification as the above.
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr) -> error
>>>>>>>
>>>>>>> where the subroutine is:
>>>>>>>
>>>>>>> subroutine uvw_array_change(u,v,w)
>>>>>>>
>>>>>>> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>>>>>>>
>>>>>>> u ...
>>>>>>> v...
>>>>>>> w ...
>>>>>>>
>>>>>>> end subroutine uvw_array_change.
>>>>>>>
>>>>>>> The above will give an error at :
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>
>>>>>>> 3. Same as above, except I change the order of the last 3 lines to:
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>
>>>>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>> So they are now in reversed order. Now it works.
>>>>>>>
>>>>>>> 4. Same as 2 or 3, except the subroutine is changed to :
>>>>>>>
>>>>>>> subroutine uvw_array_change(u,v,w)
>>>>>>>
>>>>>>> real(8), intent(inout) :: u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>>>>>
>>>>>>> real(8), intent(inout) :: v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>>>>>
>>>>>>> real(8), intent(inout) :: w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>>>>>
>>>>>>> u ...
>>>>>>> v...
>>>>>>> w ...
>>>>>>>
>>>>>>> end subroutine uvw_array_change.
>>>>>>>
>>>>>>> The start_indices and end_indices simply shift the 0-based indices
>>>>>>> of the C convention to the 1-based indices of the Fortran convention. This is
>>>>>>> necessary in my case because most of my code starts array counting at 1,
>>>>>>> hence the "trick".
>>>>>>>
>>>>>>> However, now no matter the order of the DMDAVecRestoreArrayF90 calls (as
>>>>>>> in 2 or 3), an error occurs at "call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".
>>>>>>>
>>>>>>> So did I violate something and cause memory corruption with the trick
>>>>>>> above? But I can't think of any way other than the "trick" to keep using
>>>>>>> the 1-based indexing convention.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Yours sincerely,
>>>>>>>
>>>>>>> TAY wee-beng
>>>>>>>
>>>>>>> On 15/4/2014 8:00 PM, Barry Smith wrote:
>>>>>>>
>>>>>>>> Try running under valgrind http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 14, 2014, at 9:47 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Barry,
>>>>>>>>>
>>>>>>>>> As I mentioned earlier, the code works fine in PETSc debug mode
>>>>>>>>> but fails in non-debug mode.
>>>>>>>>>
>>>>>>>>> I have attached my code.
>>>>>>>>>
>>>>>>>>> Thank you
>>>>>>>>>
>>>>>>>>> Yours sincerely,
>>>>>>>>>
>>>>>>>>> TAY wee-beng
>>>>>>>>>
>>>>>>>>> On 15/4/2014 2:26 AM, Barry Smith wrote:
>>>>>>>>>
>>>>>>>>>> Please send the code that creates da_w and the declarations of
>>>>>>>>>> w_array
>>>>>>>>>>
>>>>>>>>>> Barry
>>>>>>>>>>
>>>>>>>>>> On Apr 14, 2014, at 9:40 AM, TAY wee-beng
>>>>>>>>>> <zonexo at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Barry,
>>>>>>>>>>>
>>>>>>>>>>> I'm not too sure how to do it. I'm running MPI. So I run:
>>>>>>>>>>>
>>>>>>>>>>> mpirun -n 4 ./a.out -start_in_debugger
>>>>>>>>>>>
>>>>>>>>>>> I got the message below. Before the gdb windows appear (through X11),
>>>>>>>>>>> the program aborts.
>>>>>>>>>>>
>>>>>>>>>>> I also tried running on another cluster and it worked. I also
>>>>>>>>>>> tried on the current cluster in debug mode and it worked too.
>>>>>>>>>>>
>>>>>>>>>>> mpirun -n 4 ./a.out -start_in_debugger
>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>> --------------
>>>>>>>>>>> An MPI process has executed an operation involving a call to the
>>>>>>>>>>> "fork()" system call to create a child process. Open MPI is
>>>>>>>>>>> currently
>>>>>>>>>>> operating in a condition that could result in memory corruption
>>>>>>>>>>> or
>>>>>>>>>>> other system errors; your MPI job may hang, crash, or produce
>>>>>>>>>>> silent
>>>>>>>>>>> data corruption. The use of fork() (or system() or other calls
>>>>>>>>>>> that
>>>>>>>>>>> create child processes) is strongly discouraged.
>>>>>>>>>>>
>>>>>>>>>>> The process that invoked fork was:
>>>>>>>>>>>
>>>>>>>>>>> Local host: n12-76 (PID 20235)
>>>>>>>>>>> MPI_COMM_WORLD rank: 2
>>>>>>>>>>>
>>>>>>>>>>> If you are *absolutely sure* that your application will
>>>>>>>>>>> successfully
>>>>>>>>>>> and correctly survive a call to fork(), you may disable this
>>>>>>>>>>> warning
>>>>>>>>>>> by setting the mpi_warn_on_fork MCA parameter to 0.
>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>> --------------
>>>>>>>>>>> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on
>>>>>>>>>>> display localhost:50.0 on machine n12-76
>>>>>>>>>>> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on
>>>>>>>>>>> display localhost:50.0 on machine n12-76
>>>>>>>>>>> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on
>>>>>>>>>>> display localhost:50.0 on machine n12-76
>>>>>>>>>>> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on
>>>>>>>>>>> display localhost:50.0 on machine n12-76
>>>>>>>>>>> [n12-76:20232] 3 more processes have sent help message
>>>>>>>>>>> help-mpi-runtime.txt / mpi_init:warn-fork
>>>>>>>>>>> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0
>>>>>>>>>>> to see all help / error messages
>>>>>>>>>>>
>>>>>>>>>>> ....
>>>>>>>>>>>
>>>>>>>>>>> 1
>>>>>>>>>>> [1]PETSC ERROR: ------------------------------
>>>>>>>>>>> ------------------------------------------
>>>>>>>>>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>>>>>>>>>> Violation, probably memory access out of range
>>>>>>>>>>> [1]PETSC ERROR: Try option -start_in_debugger or
>>>>>>>>>>> -on_error_attach_debugger
>>>>>>>>>>> [1]PETSC ERROR: or see
>>>>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>>>>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and
>>>>>>>>>>> Apple Mac OS X to find memory corruption errors
>>>>>>>>>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile,
>>>>>>>>>>> link, and run
>>>>>>>>>>> [1]PETSC ERROR: to get more information on the crash.
>>>>>>>>>>> [1]PETSC ERROR: User provided function() line 0 in unknown
>>>>>>>>>>> directory unknown file (null)
>>>>>>>>>>> [3]PETSC ERROR: ------------------------------
>>>>>>>>>>> ------------------------------------------
>>>>>>>>>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>>>>>>>>>> Violation, probably memory access out of range
>>>>>>>>>>> [3]PETSC ERROR: Try option -start_in_debugger or
>>>>>>>>>>> -on_error_attach_debugger
>>>>>>>>>>> [3]PETSC ERROR: or see
>>>>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>>>>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and
>>>>>>>>>>> Apple Mac OS X to find memory corruption errors
>>>>>>>>>>> [3]PETSC ERROR: configure using --with-debugging=yes, recompile,
>>>>>>>>>>> link, and run
>>>>>>>>>>> [3]PETSC ERROR: to get more information on the crash.
>>>>>>>>>>> [3]PETSC ERROR: User provided function() line 0 in unknown
>>>>>>>>>>> directory unknown file (null)
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>> Thank you.
>>>>>>>>>>>
>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>
>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>
>>>>>>>>>>> On 14/4/2014 9:05 PM, Barry Smith wrote:
>>>>>>>>>>>
>>>>>>>>>>> Because IO doesn’t always get flushed immediately, it may not
>>>>>>>>>>>> be hanging at this point. It is better to use the option
>>>>>>>>>>>> -start_in_debugger, then type cont in each debugger window. When you
>>>>>>>>>>>> think it is “hanging”, do a control-C in each debugger window and type
>>>>>>>>>>>> where to see where each process is. You can also look around in the
>>>>>>>>>>>> debugger at variables to see why it is “hanging” at that point.
>>>>>>>>>>>>
>>>>>>>>>>>> Barry
>>>>>>>>>>>>
>>>>>>>>>>>> These routines don’t have any parallel communication in them,
>>>>>>>>>>>> so they are unlikely to hang.
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 14, 2014, at 6:52 AM, TAY wee-beng
>>>>>>>>>>>>
>>>>>>>>>>>> <zonexo at gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>> My code hangs, and I added MPI_Barrier and print statements to catch
>>>>>>>>>>>>> the bug. I found that it hangs after printing "7". Is it because I'm doing
>>>>>>>>>>>>> something wrong? I need to access the u,v,w arrays, so I use
>>>>>>>>>>>>> DMDAVecGetArrayF90. After access, I use DMDAVecRestoreArrayF90.
>>>>>>>>>>>>>
>>>>>>>>>>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
>>>>>>>>>>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
>>>>>>>>>>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
>>>>>>>>>>>>> call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>>>>>>>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
>>>>>>>>>>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr) !must be in reverse order
>>>>>>>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
>>>>>>>>>>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
>>>>>>>>>>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>>
>>>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <code.txt>
>>>>>>>>>
>>>>>>>>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener