<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 19/4/2014 6:48 PM, Matthew Knepley
wrote:<br>
</div>
<blockquote
cite="mid:CAMYG4Gmg4=e7AaKiWAeEGxdjJ1pThZA5XFcBWhj5uOajqq4nig@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Sat, Apr 19, 2014 at 4:59 AM, TAY
wee-beng <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb">
<div class="h5">On 19/4/2014 1:17 PM, Barry Smith wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
On Apr 19, 2014, at 12:11 AM, TAY wee-beng <<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>>
wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
On 19/4/2014 12:10 PM, Barry Smith wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0
0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
On Apr 18, 2014, at 9:57 PM, TAY wee-beng <<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>>
wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
On 19/4/2014 3:53 AM, Barry Smith wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
Hmm,<br>
<br>
Interface DMDAVecGetArrayF90<br>
Subroutine DMDAVecGetArrayF903(da1,
v,d1,ierr)<br>
USE_DM_HIDE<br>
DM_HIDE da1<br>
VEC_HIDE v<br>
PetscScalar,pointer :: d1(:,:,:)<br>
PetscErrorCode ierr<br>
End Subroutine<br>
<br>
So d1 is an F90 POINTER. But your
subroutine seems to be treating it as a
“plain old Fortran array”?<br>
real(8), intent(inout) ::
u(:,:,:),v(:,:,:),w(:,:,:)<br>
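<br>
A minimal sketch of that distinction in plain Fortran, with no PETSc calls. The array a and both subroutine names are hypothetical stand-ins, and the 0-based bounds are an assumption mimicking what a pointer from DMDAVecGetArrayF90 might carry on a single process. It prints the lower bounds seen by the caller, by an assumed-shape dummy, and by a pointer dummy:<br>

```fortran
program bounds_demo
  implicit none
  ! Hypothetical stand-in for the pointer filled in by DMDAVecGetArrayF90;
  ! the 0-based bounds are an assumption for illustration only.
  real(8), pointer :: a(:,:,:)

  allocate(a(0:3, 0:2, 0:1))
  a = 0.d0

  print *, 'caller lbound:       ', lbound(a)
  call as_plain_array(a)
  call as_pointer(a)
  deallocate(a)

contains

  ! Assumed-shape dummy: extents are inherited, lower bounds restart at 1.
  subroutine as_plain_array(u)
    real(8), intent(inout) :: u(:,:,:)
    print *, 'assumed-shape lbound:', lbound(u)
  end subroutine as_plain_array

  ! Pointer dummy: the bounds of the actual pointer are preserved.
  subroutine as_pointer(u)
    real(8), pointer :: u(:,:,:)
    print *, 'pointer dummy lbound:', lbound(u)
  end subroutine as_pointer

end program bounds_demo
```

Inside the assumed-shape version, u(1,1,1) refers to the same element the caller addresses through its own lower bounds, so loop bounds taken from the caller's index range would no longer line up.<br>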
</blockquote>
</blockquote>
</blockquote>
Hi,<br>
<br>
So d1 is a pointer, and it behaves differently if I
declare it as a "plain old Fortran array"? I ask because I
declare it as a Fortran array and it works without any
problem if I only call DMDAVecGetArrayF90 and
DMDAVecRestoreArrayF90 with "u".<br>
<br>
But if I call DMDAVecGetArrayF90 and
DMDAVecRestoreArrayF90 with "u", "v" and "w",
errors start to occur. I wonder why...<br>
<br>
Also, suppose I call:<br>
<br>
call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)<br>
<br>
call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)<br>
<br>
call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)<br>
<br>
u_array ....<br>
<br>
v_array .... etc<br>
<br>
Now, to restore the arrays, does the order in which
they are restored matter?<br>
</blockquote>
No, it should not matter. If it matters, that is a
sign that memory has been written to incorrectly
earlier in the code.<br>
<br>
</blockquote>
</div>
</div>
Hi,<br>
<br>
Hmm, I have been getting different results with different
Intel compilers. I'm not sure if MPI plays a part, but I'm
only using a single processor. In debug mode, things
run without problems. In optimized mode, in some cases, the
code aborts even when doing simple initialization:
<div class="">
<br>
<br>
call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)<br>
<br>
call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)<br>
<br>
call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)<br>
<br>
</div>
call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)<br>
<br>
u_array = 0.d0<br>
<br>
v_array = 0.d0<br>
<br>
w_array = 0.d0<br>
<br>
p_array = 0.d0<br>
<br>
<br>
call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
<div class=""><br>
<br>
call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)<br>
<br>
call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)<br>
<br>
call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)<br>
<br>
</div>
The code aborts at call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr),
giving a segmentation fault. But another version of the Intel
compiler passes through this part without error. Since the
behavior differs among compilers, is this a bug in
PETSc or in the Intel compiler? Or in MVAPICH or Open MPI?</blockquote>
<div><br>
</div>
<div>We do this in a bunch of examples. Can you reproduce
this different behavior in
src/dm/examples/tutorials/ex11f90.F?</div>
</div>
</div>
</div>
</blockquote>
<br>
Hi Matt,<br>
<br>
Do you mean putting the above lines into ex11f90.F and testing them there?<br>
<br>
Thanks<br>
<br>
Regards.<br>
<blockquote
cite="mid:CAMYG4Gmg4=e7AaKiWAeEGxdjJ1pThZA5XFcBWhj5uOajqq4nig@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb">
<div class="h5">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
As in w, then v and u?<br>
<br>
call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)<br>
call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)<br>
call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)<br>
<br>
thanks<br>
<blockquote class="gmail_quote" style="margin:0 0
0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
Note also that the beginning and end
indices of u, v, and w are different for each
process (and they do not start at 1); see for example <a
moz-do-not-send="true"
href="http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F"
target="_blank">http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F</a>,
which shows how to get the loop bounds.<br>
</blockquote>
Hi,<br>
<br>
In my case, I fixed u,v,w such that their
indices are the same. I also checked using
DMDAGetCorners and DMDAGetGhostCorners. Now
the problem lies in my subroutine treating them
as “plain old Fortran arrays”.<br>
<br>
If I declare them as pointers, do their indices
follow the C convention of starting at 0?<br>
</blockquote>
Not really. In each process you
need to access them using the indices given
by DMDAGetCorners() for global vectors and
DMDAGetGhostCorners() for local vectors. So
really C or Fortran doesn’t make any difference.<br>
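<br>
A rough sketch of what looping over the owned range looks like. To keep it self-contained there are no PETSc calls: the values of xs, ys, zs, xm, ym, zm are hypothetical stand-ins for what DMDAGetCorners() would return on one process (start indices and widths), and the allocated pointer u stands in for the array from DMDAVecGetArrayF90:<br>

```fortran
program corner_loop_demo
  implicit none
  ! Hypothetical values standing in for what DMDAGetCorners would return
  ! on one process: start indices (xs,ys,zs) and widths (xm,ym,zm).
  integer, parameter :: xs = 4, ys = 0, zs = 0
  integer, parameter :: xm = 4, ym = 3, zm = 2
  real(8), pointer :: u(:,:,:)
  integer :: i, j, k

  ! Stand-in for the array obtained from DMDAVecGetArrayF90.
  allocate(u(xs:xs+xm-1, ys:ys+ym-1, zs:zs+zm-1))

  ! Loop over exactly the locally owned index range.
  do k = zs, zs + zm - 1
    do j = ys, ys + ym - 1
      do i = xs, xs + xm - 1
        u(i,j,k) = real(i + j + k, 8)
      end do
    end do
  end do

  print *, 'first owned entry:', u(xs, ys, zs)
  print *, 'last owned entry: ', u(xs+xm-1, ys+ym-1, zs+zm-1)
  deallocate(u)
end program corner_loop_demo
```

For a local (ghosted) vector the same pattern would apply with the values from DMDAGetGhostCorners() instead.<br>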
<br>
<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
So my problem now is that in my old MPI code,
u(i,j,k) follows the Fortran convention of
starting at 1. Is there some way to arrange things
so that I do not have to change my u(i,j,k)
to u(i-1,j-1,k-1)?<br>
</blockquote>
If your code wishes to access them with
indices plus one from the values returned by
DMDAGetCorners() for global vectors and
DMDAGetGhostCorners() for local vectors then you
need to manually subtract off the 1.<br>
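<br>
One possible alternative to editing every index by hand, sketched under assumptions and not something confirmed for PETSc arrays in this thread: Fortran 2003 allows a second pointer to alias the same memory with different lower bounds. The names a and a1 are hypothetical, and a is an ordinary allocated array standing in for the DMDA array:<br>

```fortran
program one_based_view
  implicit none
  real(8), pointer :: a(:,:,:)    ! stand-in for the DMDA array (assumed 0-based)
  real(8), pointer :: a1(:,:,:)   ! 1-based alias of the same memory

  allocate(a(0:3, 0:2, 0:1))
  a = 0.d0

  ! Fortran 2003 bounds-spec pointer assignment: same data, new lower bounds.
  a1(1:, 1:, 1:) => a

  a1(1,1,1) = 42.d0               ! writes the element a(0,0,0)
  if (a(0,0,0) /= 42.d0) error stop 'alias does not share storage'

  print *, 'lbound(a) :', lbound(a)
  print *, 'lbound(a1):', lbound(a1)
  print *, 'shapes equal:', all(shape(a) == shape(a1))

  nullify(a1)                     ! drop the alias; a still owns the memory
  deallocate(a)
end program one_based_view
```

The alias only relabels indices. In parallel runs the owned range still differs per process, so the shift would have to be applied consistently to the loop bounds as well.<br>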
<br>
Barry<br>
<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
Thanks.<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
Barry<br>
<br>
On Apr 18, 2014, at 10:58 AM, TAY wee-beng
<<a moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank">zonexo@gmail.com</a>>
wrote:<br>
<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
Hi,<br>
<br>
I tried to pinpoint the problem. I reduced
my job size so that I can run on 1
processor. I tried using valgrind, but
perhaps because I'm using the optimized version,
it didn't catch the error; it only reported
"Segmentation fault (core dumped)".<br>
<br>
However, by re-writing my code, I found
out a few things:<br>
<br>
1. if I write my code this way:<br>
<br>
call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)<br>
<br>
call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)<br>
<br>
call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)<br>
<br>
u_array = ....<br>
<br>
v_array = ....<br>
<br>
w_array = ....<br>
<br>
call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)<br>
<br>
call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)<br>
<br>
call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)<br>
<br>
The code runs fine.<br>
<br>
2. if I write my code this way:<br>
<br>
call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)<br>
<br>
call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)<br>
<br>
call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)<br>
<br>
call uvw_array_change(u_array,v_array,w_array)
-> this subroutine does the same
modification as the above.<br>
<br>
call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)<br>
<br>
call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)<br>
<br>
call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
-> error<br>
<br>
where the subroutine is:<br>
<br>
subroutine uvw_array_change(u,v,w)<br>
<br>
real(8), intent(inout) ::
u(:,:,:),v(:,:,:),w(:,:,:)<br>
<br>
u ...<br>
v...<br>
w ...<br>
<br>
end subroutine uvw_array_change.<br>
<br>
The above will give an error at :<br>
<br>
call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)<br>
<br>
3. Same as above, except I change the
order of the last 3 lines to:<br>
<br>
call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)<br>
<br>
call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)<br>
<br>
call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)<br>
<br>
So they are now in reversed order. Now it
works.<br>
<br>
4. Same as 2 or 3, except the subroutine
is changed to :<br>
<br>
subroutine uvw_array_change(u,v,w)<br>
<br>
real(8), intent(inout) ::
u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))<br>
<br>
real(8), intent(inout) ::
v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))<br>
<br>
real(8), intent(inout) ::
w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))<br>
<br>
u ...<br>
v...<br>
w ...<br>
<br>
end subroutine uvw_array_change.<br>
<br>
The start_indices and end_indices
simply shift the 0-based indices of the C
convention to the 1-based indices of the
Fortran convention. This is necessary in
my case because most of my code starts
array counting at 1, hence the "trick".<br>
<br>
However, now an error occurs at "call
DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)"
no matter which order of the
DMDAVecRestoreArrayF90 calls I use (as in 2 or
3).<br>
<br>
So did I cause memory
corruption with the trick above? But I
can't think of any way other than the
"trick" to continue using the 1-based index
convention.<br>
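<br>
A hedged sketch of a sanity check for the "trick", in plain Fortran without PETSc. With an explicit-shape dummy, nothing verifies the declared bounds unless runtime bounds checking is enabled, so if the declared extents were larger than the actual array (for example, bounds computed for a ghosted array applied to a non-ghosted one, which is only one conceivable cause), writes could land outside the array. The array a and the subroutine fill are hypothetical; start_indices and end_indices mirror the names used above:<br>

```fortran
program explicit_shape_guard
  implicit none
  integer :: start_indices(3), end_indices(3)
  real(8), pointer :: a(:,:,:)    ! stand-in for the DMDA array

  allocate(a(0:3, 0:2, 0:1))
  a = 0.d0

  ! Shift the assumed 0-based bounds of a to 1-based, as in the "trick".
  start_indices = lbound(a) + 1
  end_indices   = ubound(a) + 1

  ! Guard: the declared extents must equal the actual extents.
  if (any(end_indices - start_indices + 1 /= shape(a))) then
    error stop 'declared bounds do not match the actual array'
  end if

  call fill(a)
  print *, 'a(0,0,0) after fill:', a(0,0,0)
  deallocate(a)

contains

  subroutine fill(u)
    ! Explicit-shape dummy: these bounds are taken on trust at run time.
    real(8), intent(inout) :: u(start_indices(1):end_indices(1), &
                                start_indices(2):end_indices(2), &
                                start_indices(3):end_indices(3))
    u(1,1,1) = 1.d0               ! same storage as a(0,0,0)
  end subroutine fill

end program explicit_shape_guard
```

Running the same comparison against shape(u_array), shape(v_array) and shape(w_array) just before the call would show whether the three arrays really have the extents the subroutine assumes.<br>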
<br>
Thank you.<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 15/4/2014 8:00 PM, Barry Smith wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
Try running under valgrind <a
moz-do-not-send="true"
href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind"
target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a><br>
<br>
<br>
On Apr 14, 2014, at 9:47 PM, TAY
wee-beng <<a moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank">zonexo@gmail.com</a>>
wrote:<br>
<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
Hi Barry,<br>
<br>
As I mentioned earlier, the code works
fine in PETSc debug mode but fails in
non-debug mode.<br>
<br>
I have attached my code.<br>
<br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 15/4/2014 2:26 AM, Barry Smith
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
Please send the code that creates
da_w and the declarations of w_array<br>
<br>
Barry<br>
<br>
On Apr 14, 2014, at 9:40 AM, TAY
wee-beng<br>
<<a moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank">zonexo@gmail.com</a>><br>
wrote:<br>
<br>
<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
Hi Barry,<br>
<br>
I'm not too sure how to do it. I'm
running mpi. So I run:<br>
<br>
mpirun -n 4 ./a.out
-start_in_debugger<br>
<br>
I got the message below. The program
aborts before the gdb windows appear
(through X11).<br>
<br>
I also tried running on another
cluster and it worked. I also tried
debug mode on the current cluster
and it worked too.<br>
<br>
mpirun -n 4 ./a.out
-start_in_debugger<br>
--------------------------------------------------------------------------<br>
An MPI process has executed an
operation involving a call to the<br>
"fork()" system call to create a
child process. Open MPI is
currently<br>
operating in a condition that
could result in memory corruption
or<br>
other system errors; your MPI job
may hang, crash, or produce silent<br>
data corruption. The use of
fork() (or system() or other calls
that<br>
create child processes) is
strongly discouraged.<br>
<br>
The process that invoked fork was:<br>
<br>
Local host: n12-76
(PID 20235)<br>
MPI_COMM_WORLD rank: 2<br>
<br>
If you are *absolutely sure* that
your application will successfully<br>
and correctly survive a call to
fork(), you may disable this
warning<br>
by setting the mpi_warn_on_fork
MCA parameter to 0.<br>
--------------------------------------------------------------------------<br>
[2]PETSC ERROR: PETSC: Attaching
gdb to ./a.out of pid 20235 on
display localhost:50.0 on machine
n12-76<br>
[0]PETSC ERROR: PETSC: Attaching
gdb to ./a.out of pid 20233 on
display localhost:50.0 on machine
n12-76<br>
[1]PETSC ERROR: PETSC: Attaching
gdb to ./a.out of pid 20234 on
display localhost:50.0 on machine
n12-76<br>
[3]PETSC ERROR: PETSC: Attaching
gdb to ./a.out of pid 20236 on
display localhost:50.0 on machine
n12-76<br>
[n12-76:20232] 3 more processes
have sent help message
help-mpi-runtime.txt /
mpi_init:warn-fork<br>
[n12-76:20232] Set MCA parameter
"orte_base_help_aggregate" to 0 to
see all help / error messages<br>
<br>
....<br>
<br>
1<br>
[1]PETSC ERROR:
------------------------------------------------------------------------<br>
[1]PETSC ERROR: Caught signal
number 11 SEGV: Segmentation
Violation, probably memory access
out of range<br>
[1]PETSC ERROR: Try option
-start_in_debugger or
-on_error_attach_debugger<br>
[1]PETSC ERROR: or see<br>
<a moz-do-not-send="true"
href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind"
target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a>[1]PETSC
ERROR: or try <a
moz-do-not-send="true"
href="http://valgrind.org"
target="_blank">http://valgrind.org</a><br>
on GNU/linux and Apple Mac OS X
to find memory corruption errors<br>
[1]PETSC ERROR: configure using
--with-debugging=yes, recompile,
link, and run<br>
[1]PETSC ERROR: to get more
information on the crash.<br>
[1]PETSC ERROR: User provided
function() line 0 in unknown
directory unknown file (null)<br>
[3]PETSC ERROR:
------------------------------------------------------------------------<br>
[3]PETSC ERROR: Caught signal
number 11 SEGV: Segmentation
Violation, probably memory access
out of range<br>
[3]PETSC ERROR: Try option
-start_in_debugger or
-on_error_attach_debugger<br>
[3]PETSC ERROR: or see<br>
<a moz-do-not-send="true"
href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind"
target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a>[3]PETSC
ERROR: or try <a
moz-do-not-send="true"
href="http://valgrind.org"
target="_blank">http://valgrind.org</a><br>
on GNU/linux and Apple Mac OS X
to find memory corruption errors<br>
[3]PETSC ERROR: configure using
--with-debugging=yes, recompile,
link, and run<br>
[3]PETSC ERROR: to get more
information on the crash.<br>
[3]PETSC ERROR: User provided
function() line 0 in unknown
directory unknown file (null)<br>
<br>
...<br>
Thank you.<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 14/4/2014 9:05 PM, Barry Smith
wrote:<br>
<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
Because IO doesn’t always get
flushed immediately, it may not
be hanging at this point. It is
better to use the option
-start_in_debugger, then type
cont in each debugger window.
When you think it is
“hanging”, do a control C in each
debugger window and type where
to see where each process is. You
can also look around in the
debugger at variables to see why
it is “hanging” at that point.<br>
<br>
Barry<br>
<br>
These routines don’t have any
parallel communication in them,
so they are unlikely to hang.<br>
<br>
On Apr 14, 2014, at 6:52 AM, TAY
wee-beng<br>
<br>
<<a moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank">zonexo@gmail.com</a>><br>
<br>
wrote:<br>
<br>
<br>
<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
Hi,<br>
<br>
My code hangs, so I added
MPI_Barrier calls and print statements to catch
the bug. I found that it hangs
after printing "7". Am I
doing something
wrong? I need to access the
u,v,w arrays, so I use
DMDAVecGetArrayF90. After
accessing them, I use
DMDAVecRestoreArrayF90.<br>
<br>
call
DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)<br>
call
MPI_Barrier(MPI_COMM_WORLD,ierr);
if (myid==0) print *,"3"<br>
call
DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)<br>
call
MPI_Barrier(MPI_COMM_WORLD,ierr);
if (myid==0) print *,"4"<br>
call
DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)<br>
call
MPI_Barrier(MPI_COMM_WORLD,ierr);
if (myid==0) print *,"5"<br>
call
I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)<br>
call
MPI_Barrier(MPI_COMM_WORLD,ierr);
if (myid==0) print *,"6"<br>
call
DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
!must be in reverse order<br>
call
MPI_Barrier(MPI_COMM_WORLD,ierr);
if (myid==0) print *,"7"<br>
call
DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)<br>
call
MPI_Barrier(MPI_COMM_WORLD,ierr);
if (myid==0) print *,"8"<br>
call
DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)<br>
-- <br>
Thank you.<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
<br>
<br>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<code.txt><br>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
What most experimenters take for granted before they begin
their experiments is infinitely more interesting than any
results to which their experiments lead.<br>
-- Norbert Wiener
</div>
</div>
</blockquote>
<br>
</body>
</html>