[petsc-users] Problem with DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90
TAY wee-beng
zonexo at gmail.com
Sat Apr 19 10:49:29 CDT 2014
On 19/4/2014 11:39 PM, Matthew Knepley wrote:
> On Sat, Apr 19, 2014 at 10:16 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> On 19/4/2014 10:55 PM, Matthew Knepley wrote:
>> On Sat, Apr 19, 2014 at 9:14 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>> On 19/4/2014 6:48 PM, Matthew Knepley wrote:
>>> On Sat, Apr 19, 2014 at 4:59 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>
>>> On 19/4/2014 1:17 PM, Barry Smith wrote:
>>>
>>> On Apr 19, 2014, at 12:11 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>
>>> On 19/4/2014 12:10 PM, Barry Smith wrote:
>>>
>>> On Apr 18, 2014, at 9:57 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>
>>> On 19/4/2014 3:53 AM, Barry Smith wrote:
>>>
>>> Hmm,
>>>
>>>    Interface DMDAVecGetArrayF90
>>>      Subroutine DMDAVecGetArrayF903(da1,v,d1,ierr)
>>>        USE_DM_HIDE
>>>        DM_HIDE da1
>>>        VEC_HIDE v
>>>        PetscScalar,pointer :: d1(:,:,:)
>>>        PetscErrorCode ierr
>>>      End Subroutine
>>>
>>> So the d1 is an F90 POINTER, but your subroutine seems to be treating it
>>> as a “plain old Fortran array”:
>>>
>>>    real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
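For my own notes, a minimal sketch of the difference as I understand it (the module and subroutine names below are hypothetical, not from this thread): an assumed-shape dummy is legal but renumbers the lower bounds to start at 1, while a pointer dummy keeps whatever bounds the caller's pointer has.

    module array_access_sketch        ! hypothetical, not from this thread
      implicit none
    contains
      subroutine work_on_assumed_shape(u)
        ! Assumed-shape dummy: inside here the lower bounds start at 1,
        ! so the (xs:xs+xm-1,...) numbering set by DMDAVecGetArrayF90 is lost.
        real(8), intent(inout) :: u(:,:,:)
        u = 0.d0
      end subroutine work_on_assumed_shape

      subroutine work_on_pointer(u)
        ! Pointer dummy: needs an explicit (e.g. module) interface, but keeps
        ! exactly the lower and upper bounds of the caller's pointer.
        real(8), pointer :: u(:,:,:)
        u = 0.d0
      end subroutine work_on_pointer
    end module array_access_sketch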
>>>
>>> Hi,
>>>
>>> So d1 is a pointer, and it's different if I declare it as a "plain old
>>> Fortran array"? Because I declared it as a Fortran array and it works
>>> without any problem if I only call DMDAVecGetArrayF90 and
>>> DMDAVecRestoreArrayF90 with "u".
>>>
>>> But if I call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u",
>>> "v" and "w", errors start to happen. I wonder why...
>>>
>>> Also, suppose I call:
>>>
>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>
>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>
>>> u_array ....
>>>
>>> v_array .... etc
>>>
>>> Now to restore the arrays, does the sequence in which they are restored
>>> matter?
>>>
>>> No it should not matter. If it matters that is a
>>> sign that memory has been written to incorrectly
>>> earlier in the code.
>>>
>>> Hi,
>>>
>>> Hmm, I have been getting different results with different Intel
>>> compilers. I'm not sure if MPI plays a part, since I'm only using a
>>> single processor. In debug mode, things run without problem. In
>>> optimized mode, in some cases, the code aborts even during simple
>>> initialization:
>>>
>>>
>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>
>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>
>>> call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>>>
>>> u_array = 0.d0
>>>
>>> v_array = 0.d0
>>>
>>> w_array = 0.d0
>>>
>>> p_array = 0.d0
>>>
>>>
>>> call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>>>
>>>
>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>
>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>
>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> The code aborts at call
>>> DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr), giving a
>>> segmentation error. But another version of the Intel compiler passes
>>> through this part without error. Since the response differs among
>>> compilers, is this a PETSc or Intel bug? Or MVAPICH or Open MPI?
>>>
>>>
>>> We do this in a bunch of examples. Can you reproduce this
>>> different behavior in src/dm/examples/tutorials/ex11f90.F?
>>
>> Hi Matt,
>>
>> Do you mean putting the above lines into ex11f90.F and testing?
>>
>>
>> It already has DMDAVecGetArray(). Just run it.
>
> Hi,
>
> It worked. The differences between my code and the example are the way
> the Fortran modules are defined, and that ex11f90 only uses global
> vectors. Does it make a difference whether global or local vectors
> are used? Because the way it accesses x1 only touches the local
> region.
>
>
> No the global/local difference should not matter.
>
> Also, before using DMDAVecGetArrayF90, DMGetGlobalVector must be
> called first, is that so? I can't find the equivalent for local
> vectors though.
>
>
> DMGetLocalVector()
Oops, I do not have DMGetLocalVector and DMRestoreLocalVector in my code.
Does it matter?
If so, when should I call them?
Thanks.
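For the record, the sequence I think is intended for local (ghosted) vectors is roughly the sketch below. The variable names are mine, u_global is assumed to be the corresponding global vector, and the usual PETSc Fortran includes/modules are assumed; please correct me if this is wrong:

    ! Sketch only: fill a local vector of da_u, access it, then return it.
    Vec                    u_local
    PetscScalar, pointer :: u_array(:,:,:)
    PetscErrorCode         ierr

    call DMGetLocalVector(da_u,u_local,ierr)
    ! fill the owned part and the ghost points from the global vector
    call DMGlobalToLocalBegin(da_u,u_global,INSERT_VALUES,u_local,ierr)
    call DMGlobalToLocalEnd(da_u,u_global,INSERT_VALUES,u_local,ierr)

    call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
    ! ... work on u_array using the bounds from DMDAGetGhostCorners ...
    call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)

    call DMRestoreLocalVector(da_u,u_local,ierr)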
>
> Matt
>
> Thanks.
>>
>> Matt
>>
>> Thanks
>>
>> Regards.
>>>
>>> Matt
>>>
>>> As in w, then v and u?
>>>
>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> thanks
>>>
>>> Note also that the beginning and end indices of u,v,w are different for
>>> each process (and they do not start at 1); see for example
>>> http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F
>>> for how to get the loop bounds.
>>>
>>> Hi,
>>>
>>> In my case, I fixed u,v,w such that their indices are the same. I also
>>> checked using DMDAGetCorners and DMDAGetGhostCorners. Now the problem
>>> lies in my subroutine treating them as “plain old Fortran arrays”.
>>>
>>> If I declare them as pointers, do their indices follow the C 0-based
>>> convention?
>>>
>>> Not really. In each process you need to access them using the indices
>>> indicated by DMDAGetCorners() for global vectors and
>>> DMDAGetGhostCorners() for local vectors. So really C or Fortran doesn’t
>>> make any difference.
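To check my understanding, a minimal sketch of those loop bounds (xs,ys,zs,xm,ym,zm and u_global are my own names; one degree of freedom assumed):

    PetscInt xs,ys,zs,xm,ym,zm

    ! For a local (ghosted) vector, use DMDAGetGhostCorners instead.
    call DMDAGetCorners(da_u,xs,ys,zs,xm,ym,zm,ierr)

    call DMDAVecGetArrayF90(da_u,u_global,u_array,ierr)
    do k = zs, zs+zm-1
       do j = ys, ys+ym-1
          do i = xs, xs+xm-1
             u_array(i,j,k) = 0.d0    ! indices follow the PETSc numbering
          end do
       end do
    end do
    call DMDAVecRestoreArrayF90(da_u,u_global,u_array,ierr)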
>>>
>>>
>>> So my problem now is that in my old MPI code, u(i,j,k) follows the
>>> Fortran 1-based convention. Is there some way to manipulate things so
>>> that I do not have to change my u(i,j,k) to u(i-1,j-1,k-1)?
>>>
>>> If your code wishes to access them with indices one higher than the
>>> values returned by DMDAGetCorners() for global vectors and
>>> DMDAGetGhostCorners() for local vectors, then you need to manually
>>> subtract off the 1.
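If I follow that, one way to write it would be the sketch below (my own loop, not from this thread; a scalar DMDA and the variables from earlier are assumed):

    call DMDAGetCorners(da_u,xs,ys,zs,xm,ym,zm,ierr)

    ! Loop counters run one above the PETSc numbering (so they start at 1 on
    ! the process that owns the start of the domain); the 1 is subtracted
    ! only where the PETSc array is indexed.
    do k = zs+1, zs+zm
       do j = ys+1, ys+ym
          do i = xs+1, xs+xm
             u_array(i-1,j-1,k-1) = 0.d0
          end do
       end do
    end do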
>>>
>>> Barry
>>>
>>> Thanks.
>>>
>>> Barry
>>>
>>> On Apr 18, 2014, at 10:58 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I tried to pinpoint the problem. I reduced my job size so that I can
>>> run on 1 processor. I tried using valgrind, but perhaps because I'm
>>> using the optimized version it didn't catch the error, beyond saying
>>> "Segmentation fault (core dumped)".
>>>
>>> However, by re-writing my code, I found out a few things:
>>> 1. if I write my code this way:
>>>
>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>
>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>
>>> u_array = ....
>>>
>>> v_array = ....
>>>
>>> w_array = ....
>>>
>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>
>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>
>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> The code runs fine.
>>>
>>> 2. if I write my code this way:
>>>
>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>
>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>
>>> call uvw_array_change(u_array,v_array,w_array) -> this subroutine does
>>> the same modification as the above.
>>>
>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>
>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>
>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr) -> error
>>>
>>> where the subroutine is:
>>>
>>> subroutine uvw_array_change(u,v,w)
>>>
>>> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>>>
>>> u ...
>>> v...
>>> w ...
>>>
>>> end subroutine uvw_array_change.
>>>
>>> The above will give an error at:
>>>
>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> 3. Same as above, except I change the order of the last 3 lines to:
>>>
>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>
>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>
>>> So they are now in reversed order. Now it works.
>>>
>>> 4. Same as 2 or 3, except the subroutine is changed to:
>>>
>>> subroutine uvw_array_change(u,v,w)
>>>
>>> real(8), intent(inout) :: u(start_indices(1):end_indices(1), &
>>>    start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>
>>> real(8), intent(inout) :: v(start_indices(1):end_indices(1), &
>>>    start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>
>>> real(8), intent(inout) :: w(start_indices(1):end_indices(1), &
>>>    start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>
>>> u ...
>>> v...
>>> w ...
>>>
>>> end subroutine uvw_array_change.
>>>
>>> The start_indices and end_indices are simply there to shift the 0-based
>>> indices of the C convention to the 1-based indices of the Fortran
>>> convention. This is necessary in my case because most of my code starts
>>> array counting at 1, hence the "trick".
>>>
>>> However, now no matter which order the DMDAVecRestoreArrayF90 calls are
>>> in (as in 2 or 3), an error occurs at "call
>>> DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".
>>>
>>> So did I violate something and cause memory corruption with the trick
>>> above? But I can't think of any way other than the "trick" to continue
>>> using the 1-based convention.
>>>
>>> Thank you.
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 15/4/2014 8:00 PM, Barry Smith wrote:
>>>
>>> Try running under valgrind:
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
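For completeness, my reading of that FAQ entry is that the run looks roughly like this ("-my_options" is a placeholder for the usual options, and a debug build is recommended so valgrind can report line numbers):

    mpirun -n 4 valgrind --tool=memcheck -q --num-callers=20 \
        --log-file=valgrind.log.%p ./a.out -my_options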
>>>
>>>
>>> On Apr 14, 2014, at 9:47 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>
>>> Hi Barry,
>>>
>>> As I mentioned earlier, the code works fine in PETSc debug mode but
>>> fails in non-debug mode.
>>>
>>> I have attached my code.
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 15/4/2014 2:26 AM, Barry Smith wrote:
>>>
>>> Please send the code that creates da_w and the declarations of w_array.
>>>
>>> Barry
>>>
>>> On Apr 14, 2014, at 9:40 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>
>>>
>>> Hi Barry,
>>>
>>> I'm not too sure how to do it. I'm running MPI, so I run:
>>>
>>> mpirun -n 4 ./a.out -start_in_debugger
>>>
>>> I got the message below. Before the gdb windows appear (through X11),
>>> the program aborts.
>>>
>>> Also, I tried running on another cluster and it worked. I also tried
>>> the current cluster in debug mode and it worked too.
>>>
>>> mpirun -n 4 ./a.out -start_in_debugger
>>> --------------------------------------------------------------------------
>>> An MPI process has executed an operation involving a call to the
>>> "fork()" system call to create a child process.  Open MPI is currently
>>> operating in a condition that could result in memory corruption or
>>> other system errors; your MPI job may hang, crash, or produce silent
>>> data corruption.  The use of fork() (or system() or other calls that
>>> create child processes) is strongly discouraged.
>>>
>>> The process that invoked fork was:
>>>
>>>   Local host:          n12-76 (PID 20235)
>>>   MPI_COMM_WORLD rank: 2
>>>
>>> If you are *absolutely sure* that your application will successfully
>>> and correctly survive a call to fork(), you may disable this warning
>>> by setting the mpi_warn_on_fork MCA parameter to 0.
>>> --------------------------------------------------------------------------
>>> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
>>> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
>>> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
>>> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
>>> [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
>>> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>
>>> ....
>>>
>>> 1
>>> [1]PETSC ERROR: ------------------------------------------------------------------------
>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>> [1]PETSC ERROR: to get more information on the crash.
>>> [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>> [3]PETSC ERROR: ------------------------------------------------------------------------
>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>> [3]PETSC ERROR: to get more information on the crash.
>>> [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>>
>>> ...
>>> Thank you.
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 14/4/2014 9:05 PM, Barry Smith wrote:
>>>
>>> Because IO doesn’t always get flushed immediately, it may not be
>>> hanging at this point. It is better to use the option
>>> -start_in_debugger, then type cont in each debugger window, and then,
>>> when you think it is “hanging”, do a control-C in each debugger window
>>> and type where to see where each process is. You can also look around
>>> in the debugger at variables to see why it is “hanging” at that point.
>>>
>>> Barry
>>>
>>> These routines don’t have any parallel communication in them so they
>>> are unlikely to hang.
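Concretely, I take this to mean something like the following (my own summary of the steps, not a verbatim recipe):

    mpirun -n 4 ./a.out -start_in_debugger
    # in each debugger window that appears:
    (gdb) cont
    # when the run seems to hang, press Ctrl-C in each window, then:
    (gdb) where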
>>>
>>> On Apr 14, 2014, at 6:52 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>
>>>
>>>
>>> Hi,
>>>
>>> My code hangs, and I added in mpi_barrier and print to catch the bug. I
>>> found that it hangs after printing "7". Is it because I'm doing
>>> something wrong? I need to access the u,v,w arrays so I use
>>> DMDAVecGetArrayF90. After access, I use DMDAVecRestoreArrayF90.
>>>
>>>
>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
>>>
>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>
>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
>>>
>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>
>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
>>>
>>> call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>>>
>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
>>>
>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)  !must be in reverse order
>>>
>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
>>>
>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>
>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
>>>
>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>> --
>>> Thank you.
>>>
>>> Yours
>>> sincerely,
>>>
>>> TAY wee-beng
>>>
>>>
>>>
>>> <code.txt>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin
>>> their experiments is infinitely more interesting than any
>>> results to which their experiments lead.
>>> -- Norbert Wiener
>>
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to
>> which their experiments lead.
>> -- Norbert Wiener
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener