[petsc-users] Problem with DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90

TAY wee-beng zonexo at gmail.com
Sat Apr 19 09:14:56 CDT 2014


On 19/4/2014 6:48 PM, Matthew Knepley wrote:
> On Sat, Apr 19, 2014 at 4:59 AM, TAY wee-beng <zonexo at gmail.com 
> <mailto:zonexo at gmail.com>> wrote:
>
>     On 19/4/2014 1:17 PM, Barry Smith wrote:
>
>         On Apr 19, 2014, at 12:11 AM, TAY wee-beng <zonexo at gmail.com
>         <mailto:zonexo at gmail.com>> wrote:
>
>             On 19/4/2014 12:10 PM, Barry Smith wrote:
>
>                 On Apr 18, 2014, at 9:57 PM, TAY wee-beng
>                 <zonexo at gmail.com <mailto:zonexo at gmail.com>> wrote:
>
>                     On 19/4/2014 3:53 AM, Barry Smith wrote:
>
>                            Hmm,
>
>                                Interface DMDAVecGetArrayF90
>                                  Subroutine DMDAVecGetArrayF903(da1,
>                         v,d1,ierr)
>                                    USE_DM_HIDE
>                                    DM_HIDE da1
>                                    VEC_HIDE v
>                                    PetscScalar,pointer :: d1(:,:,:)
>                                    PetscErrorCode ierr
>                                  End Subroutine
>
>                             So the d1 is a F90 POINTER. But your
>                         subroutine seems to be treating it as a “plain
>                         old Fortran array”?
>                         real(8), intent(inout) ::
>                         u(:,:,:),v(:,:,:),w(:,:,:)
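For reference, a minimal sketch (not from the thread) of a subroutine that keeps the F90 pointer attribute on its dummy arguments, so they match what DMDAVecGetArrayF90 hands back. The module name is illustrative, and real(8) assumes a PETSc build where PetscScalar is double-precision real:

    module uvw_ops                      ! hypothetical module name; module procedures
    contains                            ! give the caller the required explicit interface
      subroutine uvw_array_change_ptr(u,v,w)
        implicit none
        ! keep the pointer attribute, as in the DMDAVecGetArrayF90 interface above
        real(8), pointer :: u(:,:,:), v(:,:,:), w(:,:,:)
        ! the lower bounds are whatever DMDAVecGetArrayF90 set them to
        ! (the per-process corner indices), not 1
        u = 0.d0
        v = 0.d0
        w = 0.d0
      end subroutine uvw_array_change_ptr
    end module uvw_ops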
>
>             Hi,
>
>             So d1 is a pointer, and it's different if I declare it as a
>             "plain old Fortran array"? I declare it as a Fortran array
>             and it works without any problem if I only call
>             DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u".
>
>             But if I call DMDAVecGetArrayF90 and
>             DMDAVecRestoreArrayF90 with "u", "v" and "w", errors start
>             to happen. I wonder why...
>
>             Also, suppose I call:
>
>             call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
>             call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
>             call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
>             u_array ....
>
>             v_array .... etc
>
>             Now to restore the arrays, does the sequence in which they
>             are restored matter?
>
>             No, it should not matter. If it does matter, that is a sign
>         that memory has been written to incorrectly earlier in the code.
>
>     Hi,
>
>     Hmm, I have been getting different results with different Intel
>     compilers. I'm not sure if MPI plays a part, but I'm only using a
>     single processor. In debug mode, things run without problems.
>     In optimized mode, in some cases, the code aborts even during
>     simple initialization:
>
>
>     call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
>     call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
>     call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
>     call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>
>     u_array = 0.d0
>
>     v_array = 0.d0
>
>     w_array = 0.d0
>
>     p_array = 0.d0
>
>     call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>
>     call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
>     call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
>     call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
>     The code aborts at call
>     DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr), giving a
>     segmentation error. But another version of the Intel compiler
>     passes through this part without error. Since the behavior differs
>     between compilers, is this a bug in PETSc, the Intel compiler,
>     MVAPICH, or OpenMPI?
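One way to narrow down where things first go wrong is to check the error code returned by each call. A minimal sketch in plain Fortran, assuming ierr is the PetscErrorCode already declared in the calling code:

    call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
    if (ierr /= 0) then
      print *, 'DMDAVecGetArrayF90(da_u) returned ierr = ', ierr
      stop
    end if

    ! ... same check after every other Get/Restore call ...

    call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
    if (ierr /= 0) then
      print *, 'DMDAVecRestoreArrayF90(da_u) returned ierr = ', ierr
      stop
    end if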
>
>
> We do this in a bunch of examples. Can you reproduce this different 
> behavior in src/dm/examples/tutorials/ex11f90.F?

Hi Matt,

Do you mean putting the above lines into ex11f90.F and testing?

Thanks

Regards.
>
>    Matt
>
>             As in w, then v and u?
>
>             call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>             call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>             call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
>             thanks
>
>                             Note also that the beginning and ending
>                         indices of u, v, and w are different for each
>                         process; see for example
>                         http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F
>                         (and they do not start at 1). This is how to
>                         get the loop bounds.
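A minimal sketch of getting the per-process loop bounds along the lines of ex11f90.F; the variable names are illustrative:

    PetscInt xs,ys,zs,xm,ym,zm,i,j,k
    PetscErrorCode ierr

    ! owned-corner start indices (0-based) and widths for this process;
    ! use DMDAGetGhostCorners instead when working with local (ghosted) vectors
    call DMDAGetCorners(da_u,xs,ys,zs,xm,ym,zm,ierr)

    do k = zs, zs+zm-1
      do j = ys, ys+ym-1
        do i = xs, xs+xm-1
          u_array(i,j,k) = 0.d0
        end do
      end do
    end do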
>
>                     Hi,
>
>                     In my case, I fixed u, v, w such that their
>                     indices are the same. I also checked using
>                     DMDAGetCorners and DMDAGetGhostCorners. Now the
>                     problem lies in my subroutine treating them as
>                     “plain old Fortran arrays”.
>
>                     If I declare them as pointers, their indices
>                     follow the C 0-based convention, is that so?
>
>                     Not really. It is that on each process you need to
>                 access them using the indices indicated by
>                 DMDAGetCorners() for global vectors and
>                 DMDAGetGhostCorners() for local vectors.  So really C
>                 or Fortran doesn’t make any difference.
>
>
>                     So my problem now is that in my old MPI code,
>                     u(i,j,k) follows the Fortran 1-based convention. Is
>                     there some way to handle this so that I do not
>                     have to change my u(i,j,k) to u(i-1,j-1,k-1)?
>
>                    If your code wishes to access them with indices one
>                 greater than the values returned by DMDAGetCorners()
>                 for global vectors and DMDAGetGhostCorners() for local
>                 vectors, then you need to manually subtract off the 1.
>
>                    Barry
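If subtracting the 1 everywhere is too invasive, one alternative (not mentioned in this thread) is Fortran 2003 pointer bounds remapping, which gives a 1-based alias of the same data without copying. A sketch, assuming the compiler supports this feature and that PetscScalar is real(8):

    real(8), pointer :: u_array(:,:,:), u1(:,:,:)

    call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)

    ! u1 points at exactly the same storage as u_array, but with lower
    ! bounds of 1, so legacy code can keep writing u1(i,j,k) with
    ! i, j, k starting at 1
    u1(1:,1:,1:) => u_array

    u1 = 0.d0

    call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)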
>
>                     Thanks.
>
>                            Barry
>
>                         On Apr 18, 2014, at 10:58 AM, TAY wee-beng
>                         <zonexo at gmail.com <mailto:zonexo at gmail.com>>
>                         wrote:
>
>                             Hi,
>
>                             I tried to pinpoint the problem. I reduced
>                             my job size so that I can run on 1
>                             processor. I tried using valgrind, but
>                             perhaps because I'm using the optimized
>                             version, it didn't catch the error beyond
>                             saying "Segmentation fault (core dumped)".
>
>                             However, by rewriting my code, I found
>                             out a few things:
>
>                             1. if I write my code this way:
>
>                             call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
>                             call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
>                             call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
>                             u_array = ....
>
>                             v_array = ....
>
>                             w_array = ....
>
>                             call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
>                             call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
>                             call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
>                             The code runs fine.
>
>                             2. if I write my code this way:
>
>                             call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
>                             call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
>                             call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
>                             call uvw_array_change(u_array,v_array,w_array)
>                             -> this subroutine does the same
>                             modification as the above.
>
>                             call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
>                             call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
>                             call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>                             -> error
>
>                             where the subroutine is:
>
>                             subroutine uvw_array_change(u,v,w)
>
>                             real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>
>                             u ...
>                             v...
>                             w ...
>
>                             end subroutine uvw_array_change.
>
>                             The above will give an error at:
>
>                             call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
>                             3. Same as above, except I change the
>                             order of the last 3 lines to:
>
>                             call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
>                             call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
>                             call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
>                             So they are now in reversed order. Now it
>                             works.
>
>                             4. Same as 2 or 3, except the subroutine
>                             is changed to :
>
>                             subroutine uvw_array_change(u,v,w)
>
>                             real(8), intent(inout) :: u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
>                             real(8), intent(inout) :: v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
>                             real(8), intent(inout) :: w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
>                             u ...
>                             v...
>                             w ...
>
>                             end subroutine uvw_array_change.
>
>                             The start_indices and end_indices simply
>                             shift the 0-based C-convention indices to
>                             the 1-based Fortran convention. This is
>                             necessary in my case because most of my
>                             code starts array counting at 1, hence
>                             the "trick".
>
>                             However, now no matter which order the
>                             DMDAVecRestoreArrayF90 calls are in (as in
>                             2 or 3), an error occurs at "call
>                             DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".
>
>                             So did I cause memory corruption with the
>                             trick above? But I can't think of any way
>                             other than the "trick" to keep using the
>                             1-based convention.
>
>                             Thank you.
>
>                             Yours sincerely,
>
>                             TAY wee-beng
>
>                             On 15/4/2014 8:00 PM, Barry Smith wrote:
>
>                                    Try running under valgrind
>                                 http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
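The FAQ's suggestion amounts to roughly the following command, run against a --with-debugging=yes build so the reports carry useful line information; the executable name is simply the one used elsewhere in this thread:

    mpirun -n 1 valgrind --tool=memcheck -q --num-callers=20 \
           --log-file=valgrind.log.%p ./a.out -malloc off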
>
>
>                                 On Apr 14, 2014, at 9:47 PM, TAY
>                                 wee-beng <zonexo at gmail.com
>                                 <mailto:zonexo at gmail.com>> wrote:
>
>                                     Hi Barry,
>
>                                     As I mentioned earlier, the code
>                                     works fine in PETSc debug mode but
>                                     fails in non-debug mode.
>
>                                     I have attached my code.
>
>                                     Thank you
>
>                                     Yours sincerely,
>
>                                     TAY wee-beng
>
>                                     On 15/4/2014 2:26 AM, Barry Smith
>                                     wrote:
>
>                                            Please send the code that
>                                         creates da_w and the
>                                         declarations of w_array
>
>                                            Barry
>
>                                         On Apr 14, 2014, at 9:40 AM, TAY wee-beng <zonexo at gmail.com <mailto:zonexo at gmail.com>> wrote:
>
>
>                                             Hi Barry,
>
>                                             I'm not too sure how to do
>                                             it. I'm running MPI, so I run:
>
>                                             mpirun -n 4 ./a.out -start_in_debugger
>
>                                             I got the message below.
>                                             Before the gdb windows
>                                             appear (through X11), the
>                                             program aborts.
>
>                                             I also tried running on
>                                             another cluster and it
>                                             worked. I also tried the
>                                             current cluster in debug
>                                             mode and it worked too.
>
>                                             mpirun -n 4 ./a.out -start_in_debugger
>                                             --------------------------------------------------------------------------
>                                             An MPI process has executed an operation involving a call to the
>                                             "fork()" system call to create a child process.  Open MPI is currently
>                                             operating in a condition that could result in memory corruption or
>                                             other system errors; your MPI job may hang, crash, or produce silent
>                                             data corruption.  The use of fork() (or system() or other calls that
>                                             create child processes) is strongly discouraged.
>
>                                             The process that invoked fork was:
>
>                                               Local host:          n12-76 (PID 20235)
>                                               MPI_COMM_WORLD rank: 2
>
>                                             If you are *absolutely sure* that your application will successfully
>                                             and correctly survive a call to fork(), you may disable this warning
>                                             by setting the mpi_warn_on_fork MCA parameter to 0.
>                                             --------------------------------------------------------------------------
>                                             [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
>                                             [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
>                                             [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
>                                             [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
>                                             [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
>                                             [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
>                                             ....
>
>                                               1
>                                             [1]PETSC ERROR: ------------------------------------------------------------------------
>                                             [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>                                             [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>                                             [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>                                             [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>                                             [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>                                             [1]PETSC ERROR: to get more information on the crash.
>                                             [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>                                             [3]PETSC ERROR: ------------------------------------------------------------------------
>                                             [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>                                             [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>                                             [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>                                             [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>                                             [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>                                             [3]PETSC ERROR: to get more information on the crash.
>                                             [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>
>                                             ...
>                                             Thank you.
>
>                                             Yours sincerely,
>
>                                             TAY wee-beng
>
>                                             On 14/4/2014 9:05 PM, Barry Smith wrote:
>
>                                                    Because IO doesn’t always get flushed immediately, it may
>                                                 not be hanging at this point. It is better to use the option
>                                                 -start_in_debugger, then type cont in each debugger window;
>                                                 then, when you think it is “hanging”, do a control-C in each
>                                                 debugger window and type where to see where each process is.
>                                                 You can also look around in the debugger at variables to see
>                                                 why it is “hanging” at that point.
>
>                                                     Barry
>
>                                                    These routines don’t have any parallel communication in
>                                                 them, so they are unlikely to hang.
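Concretely, that debugger session looks roughly like this in each gdb window (myid is the rank variable from the code quoted below):

    (gdb) cont
      ... when the run appears to hang, press Ctrl-C in each window ...
    (gdb) where
    (gdb) print myid

where (or bt) prints the stack so you can see where each process is stuck; print lets you inspect variables such as myid.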
>
>                                                 On Apr 14, 2014, at 6:52 AM, TAY wee-beng <zonexo at gmail.com <mailto:zonexo at gmail.com>> wrote:
>
>
>
>                                                     Hi,
>
>                                                     My code hangs, and I added
>                                                     MPI_Barrier and print
>                                                     statements to catch the
>                                                     bug. I found that it hangs
>                                                     after printing "7". Is
>                                                     it because I'm doing
>                                                     something wrong? I need to
>                                                     access the u, v, w arrays,
>                                                     so I use
>                                                     DMDAVecGetArrayF90. After
>                                                     accessing them, I use
>                                                     DMDAVecRestoreArrayF90.
>
>                                                     call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>                                                     call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
>                                                     call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>                                                     call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
>                                                     call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>                                                     call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
>                                                     call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>                                                     call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
>                                                     call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)  !must be in reverse order
>                                                     call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
>                                                     call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>                                                     call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
>                                                     call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>                                                     -- 
>                                                     Thank you.
>
>                                                     Yours sincerely,
>
>                                                     TAY wee-beng
>
>
>
>                                     <code.txt>
>
>
>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener
