[petsc-users] error running parallel on cluster
Sepideh Kavousi
skavou1 at lsu.edu
Tue Apr 17 12:50:11 CDT 2018
Mathew,
I previously use DMDATSSetIFunctionLocal(user.da,INSERT_VALUES,(DMDATSIFunctionLocal) FormFunction,&user) in my code. If I want to use your solution I can not use it because in the FormFunction definition I must use arrays, not vectors.So to solve this issue I followed two methods where none were able to solve it.
1- in first method I decided to use TSSetIFunction instead of DMDATSSetIFunctionLocal
for this means first in the main function, I use TSSetDM and my form function variables were as:
PetscErrorCode FormFunction(TS ts,PetscScalar t,Vec Y,Vec Ydot,Vec F, struct VAR_STRUCT *user) {
.
.
.
.
ierr = TSGetDM(ts,&dmda);CHKERRQ(ierr);
ierr= DMDAGetLocalInfo(dmda,&info2) ;CHKERRQ(ierr);
ierr = DMGetLocalVector(dmda,&Ydot_local);CHKERRQ(ierr);
ierr = DMGlobalToLocalBegin(dmda,Ydot,INSERT_VALUES,Ydot_local);CHKERRQ(ierr);
ierr = DMGlobalToLocalEnd(dmda,Ydot,INSERT_VALUES,Ydot_local);CHKERRQ(ierr);
.
.
.
}
But still, it does not consider vectors y,ydot,f related to dmda (problem executing DMDAVecGetArray)
2- in second method I decided to use DMTSSetIFunction
but still, FormFunction is in form of TSIFunction where we do not define dm object and I think it does not understand dm and da are connected, although I have used TSSetDM in the main function.
Can you please help me what should I do?
Regards,
Sepideh
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Monday, April 16, 2018 9:34:01 PM
To: Sepideh Kavousi
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] error running parallel on cluster
On Mon, Apr 16, 2018 at 10:22 PM, Sepideh Kavousi <skavou1 at lsu.edu<mailto:skavou1 at lsu.edu>> wrote:
I run with valgrind with the following command:
/work/skavou1/petsc/debug/bin/mpiexec -np 2 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./one.out -malloc off -ts_monitor -snes_fd_color -ts_max_snes_failures -1 -ts_type beuler -pc_type bjacobi -snes_linesearch_type nleqerr -snes_type newtontr -ksp_gmres_restart 31 -sub_pc_type ilu -snes_monitor -snes_converged_reason -ksp_monitor
Okay, I think this is coded wrong. The DMDA stuff ignores DMLocal and only chooses to ghost the input vector X and not X_t
https://bitbucket.org/petsc/petsc/annotate/be3efd428a942676a0189b3273b3c582694ff011/src/ts/utils/dmdats.c?at=master&fileviewer=file-view-default#dmdats.c-68
Unfortunately, this was done by Peter in 2012 and he is gone, so we can't blame him.
It is true that very few people use spatial derivatives of X_t, but it does make sense.
Right there, you want to add
ierr = DMGetLocalVector(dm,&Xdotloc);CHKERRQ(ierr);
ierr = DMGlobalToLocalBegin(dm,Xdot,INSERT_VALUES,Xdotloc);CHKERRQ(ierr);
ierr = DMGlobalToLocalEnd(dm,Xdot,INSERT_VALUES,Xdotloc);CHKERRQ(ierr);
ierr = DMDAVecGetArray(dm,Xdotloc,&xdot);CHKERRQ(ierr);
and it should work.
Matt
the valgringoutput is as following:
==39396== Invalid read of size 8
==39396== at 0x402679: FormFunction (one.c:212)
==39396== by 0x611AA5F: TSComputeIFunction_DMDA (dmdats.c:79)
==39396== by 0x61349BD: TSComputeIFunction (ts.c:830)
==39396== by 0x623DD1C: SNESTSFormFunction_Theta (theta.c:649)
==39396== by 0x615E599: SNESTSFormFunction (ts.c:4569)
==39396== by 0x600C84E: SNESComputeFunction (snes.c:2203)
==39396== by 0x60A34B6: SNESSolve_NEWTONTR (tr.c:105)
==39396== by 0x60246C0: SNESSolve (snes.c:4312)
==39396== by 0x6237346: TSTheta_SNESSolve (theta.c:176)
==39396== by 0x6237CA8: TSStep_Theta (theta.c:216)
==39396== by 0x6153380: TSStep (ts.c:3548)
==39396== by 0x615559D: TSSolve (ts.c:3731)
==39396== by 0x403C95: main (one.c:303)
==39396== Address 0x83f5158 is 8 bytes before a block of size 7,200 alloc'd
==39396== at 0x4A05588: memalign (vg_replace_malloc.c:727)
==39396== by 0x4D87516: PetscMallocAlign (mal.c:42)
==39396== by 0x4D88DE0: PetscMallocA (mal.c:397)
==39396== by 0x50AB230: VecGetArray2d (rvector.c:2167)
==39396== by 0x589437E: DMDAVecGetArray (dagetarray.c:73)
==39396== by 0x611A8ED: TSComputeIFunction_DMDA (dmdats.c:74)
==39396== by 0x61349BD: TSComputeIFunction (ts.c:830)
==39396== by 0x623DD1C: SNESTSFormFunction_Theta (theta.c:649)
==39396== by 0x615E599: SNESTSFormFunction (ts.c:4569)
==39396== by 0x600C84E: SNESComputeFunction (snes.c:2203)
==39396== by 0x60A34B6: SNESSolve_NEWTONTR (tr.c:105)
==39396== by 0x60246C0: SNESSolve (snes.c:4312)
==39396== by 0x6237346: TSTheta_SNESSolve (theta.c:176)
==39396== by 0x6237CA8: TSStep_Theta (theta.c:216)
==39396== by 0x6153380: TSStep (ts.c:3548)
==39396== by 0x615559D: TSSolve (ts.c:3731)
==39396== by 0x403C95: main (one.c:303)
==39396==
==39396== Invalid read of size 8
==39396== at 0x402689: FormFunction (one.c:212)
==39396== by 0x611AA5F: TSComputeIFunction_DMDA (dmdats.c:79)
==39396== by 0x61349BD: TSComputeIFunction (ts.c:830)
==39396== by 0x623DD1C: SNESTSFormFunction_Theta (theta.c:649)
==39396== by 0x615E599: SNESTSFormFunction (ts.c:4569)
==39396== by 0x600C84E: SNESComputeFunction (snes.c:2203)
==39396== by 0x60A34B6: SNESSolve_NEWTONTR (tr.c:105)
==39396== by 0x60246C0: SNESSolve (snes.c:4312)
==39396== by 0x6237346: TSTheta_SNESSolve (theta.c:176)
==39396== by 0x6237CA8: TSStep_Theta (theta.c:216)
==39396== by 0x6153380: TSStep (ts.c:3548)
==39396== by 0x615559D: TSSolve (ts.c:3731)
==39396== by 0x403C95: main (one.c:303)
==39396== Address 0x18 is not stack'd, malloc'd or (recently) free'd
==39396==
Thanks,
Sepideh
________________________________
From: Matthew Knepley <knepley at gmail.com<mailto:knepley at gmail.com>>
Sent: Monday, April 16, 2018 8:44:43 PM
To: Sepideh Kavousi
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] error running parallel on cluster
On Mon, Apr 16, 2018 at 9:39 PM, Sepideh Kavousi <skavou1 at lsu.edu<mailto:skavou1 at lsu.edu>> wrote:
this is not used forghost points . my code is :
1) You are using ghost points, since you use i-1 as an index
2) I was wrong. X_t should also be ghosted.
Run under valgrind.
Thanks,
Matt
for (j=info2->ys;j<info2->ys+info2->ym;j++){
for (i=info2->xs;i<info2->xs+info2->xm;i++){
if (i==0) {aF[j][i].U=aY[j][i+1].U-aY[j][i].U ; aF[j][i].p=aY[j][i+1].p-aY[j][i].p ;}
else if (i==info2->mx-1) {aF[j][i].U=aY[j][i-1].U-aY[j][i].U ; aF[j][i].p=aY[j][i].p-aY[j][i-1].p ;}
else if (j==0) {aF[j][i].U=aY[j+1][i].U-aY[j][i].U; aF[j][i].p=aY[j+1][i].p-aY[j][i].p ;}
else if (j==info2->my-1) {aF[j][i].U=aY[j-1][i].U-aY[j][i].U ; aF[j][i].p=aY[j][i].p-aY[j-1][i].p ;}
else {
//derivatives of U and p
Px=(aY[j][i+1].p-aY[j][i-1].p)/(2.0*user->hx);
Py=(aY[j+1][i].p-aY[j-1][i].p)/(2.0*user->hy);
Pxx=((aY[j][i+1].p+aY[j][i-1].p-2.0*aY[j][i].p)/(hx2));
Pxy=((aY[j+1][i+1].p+aY[j-1][i-1].p-aY[j+1][i-1].p-aY[j-1][i+1].p)/(4.0*hxy));
Pyy=((aY[j+1][i].p+aY[j-1][i].p-2.0*aY[j][i].p)/hy2);
Pxt=(aYdot[j][i+1].p-aYdot[j][i-1].p)/(2.0*user->hx);
Pyt=(aYdot[j+1][i].p-aYdot[j-1][i].p)/(2.0*user->hy);
if (((Px*Px)+(Py*Py))>0.0 ) { //
theta=PetscAtanReal(Py/Px);
thetax=(Px*Pxy-Py*Pxx)/((Px*Px)+(Py*Py));
thetay=(Px*Pyy-Py*Pxy)/((Px*Px)+(Py*Py));
}
else {
theta=PETSC_PI*0.5;
thetax=0.0;
thetay=0.0;
}
e=(1.0+user->epsilon*cos(4.0*theta));;
ep=-(4.0*user->epsilon*sin(4.0*theta));
epp=-(4.0*user->epsilon*4.0*cos(4.0*theta));
Ux=(aY[j][i+1].U-aY[j][i-1].U)/(2.0*user->hx);
Uy=(aY[j+1][i].U-aY[j-1][i].U)/(2.0*user->hy);
Uxx=((aY[j][i+1].U+aY[j][i-1].U-2.0*aY[j][i].U)/(hx2));
Uyy=((aY[j+1][i].U+aY[j-1][i].U-2.0*aY[j][i].U)/hy2);
U1=user->D*user->tau_0/(user->W*user->W);
U2= ((Px*Px+Py*Py)>0.0) ? user->a_p/sqrt(Px*Px+Py*Py) : 0.0 ;
//U2=0.0;
P1= (user->W*user->hy*j-user->Vp*user->tau_0*t)/user->lT;
aF[j][i].U=(U1*(-0.5*(Ux*Px+Uy*Py)+0.5*(1-aY[j][i].p)*(Uxx+Uyy))+U2*(1+(1-user->k)*aY[j][i].U)*(Pxt*Px+Pyt*Py+aYdot[j][i].p*(Pxx+Pyy))+U2*(1-user->k)*aYdot[j][i].p*(Ux*Px+Uy*Py)+0.5*(1+(1-user->k)*aY[j][i].U)*aYdot[j][i].p-0.5*(1+user->k-(1-user->k)*aY[j][i].p)*aYdot[j][i].U)*hx2;
aF[j][i].p=( e*e*(Pxx+Pyy)+2.0*e*ep*(thetax*Px+thetay*Py)+(e*epp+ep*ep)*(thetay*Px-thetax*Py)+(aY[j][i].p-aY[j][i].p*aY[j][i].p*aY[j][i].p)-user->lambda*(aY[j][i].p*aY[j][i].p-1)*(aY[j][i].p*aY[j][i].p-1)*(aY[j][i].U+P1)-(1-(1-user->k)*P1)*(1.0+4.0*user->epsilon*cos(4.0*theta))*(1.0+4.0*user->epsilon*cos(4.0*theta))*aYdot[j][i].p )*hx2 ;
}
}
Sepideh
________________________________
From: Matthew Knepley <knepley at gmail.com<mailto:knepley at gmail.com>>
Sent: Monday, April 16, 2018 8:36:37 PM
To: Sepideh Kavousi
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] error running parallel on cluster
On Mon, Apr 16, 2018 at 9:20 PM, Sepideh Kavousi <skavou1 at lsu.edu<mailto:skavou1 at lsu.edu>> wrote:
Hello,
I am solving a PDE where I have the spacial derivtive of the time derivative of a variable. In the DMDATSSetIFunctionLocal function I defined d(dY/dt)/dx as:
(aYdot[j][i+1].p-aYdot[j][i-1].p)/(2.0*user->hx)
I do not think that you get ghosted Ydot in that function.
Matt
on my workstation, it is working but on the cluster it gives an error. I can not run with debugger on the cluster but I have checked and the error is related to this part and I do not have any problem on cluster when running in series. Can you please help me about it.
the error is:
0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR: INSTEAD the line number of the start of the function
[0]PETSC ERROR: is given.
[0]PETSC ERROR: [0] TSComputeIFunction_DMDA line 63 /ddnB/work/skavou1/petsc/src/ts/utils/dmdats.c
[0]PETSC ERROR: [0] TS user implicit function line 829 /ddnB/work/skavou1/petsc/src/ts/interface/ts.c
[0]PETSC ERROR: [0] TSComputeIFunction line 815 /ddnB/work/skavou1/petsc/src/ts/interface/ts.c
[0]PETSC ERROR: [0] SNESTSFormFunction_Theta line 640 /ddnB/work/skavou1/petsc/src/ts/impls/implicit/theta/theta.c
[0]PETSC ERROR: [0] SNESTSFormFunction line 4564 /ddnB/work/skavou1/petsc/src/ts/interface/ts.c
[0]PETSC ERROR: [0] SNES user function line 2202 /ddnB/work/skavou1/petsc/src/snes/interface/snes.c
[0]PETSC ERROR: [0] SNESComputeFunction line 2187 /ddnB/work/skavou1/petsc/src/snes/interface/snes.c
[0]PETSC ERROR: [0] SNESSolve_NEWTONTR line 90 /ddnB/work/skavou1/petsc/src/snes/impls/tr/tr.c
[0]PETSC ERROR: [0] SNESSolve line 4203 /ddnB/work/skavou1/petsc/src/snes/interface/snes.c
[0]PETSC ERROR: [0] TSTheta_SNESSolve line 175 /ddnB/work/skavou1/petsc/src/ts/impls/implicit/theta/theta.c
[0]PETSC ERROR: [0] TSStep_Theta line 191 /ddnB/work/skavou1/petsc/src/ts/impls/implicit/theta/theta.c
[0]PETSC ERROR: [0] TSStep line 3526 /ddnB/work/skavou1/petsc/src/ts/interface/ts.c
[0]PETSC ERROR: [0] TSSolve line 3668 /ddnB/work/skavou1/petsc/src/ts/interface/ts.c
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.9.0, unknown
[0]PETSC ERROR: ./one.out on a debug named mike1 by skavou1 Mon Apr 16 20:11:44 2018
[0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack
[0]PETSC ERROR: #1 User provided function() line 0 in unknown file
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
==29057==
==29057== HEAP SUMMARY:
==29057== in use at exit: 560,151 bytes in 383 blocks
==29057== total heap usage: 8,734 allocs, 8,351 frees, 4,449,350 bytes allocated
==29057==
==29057== LEAK SUMMARY:
==29057== definitely lost: 0 bytes in 0 blocks
==29057== indirectly lost: 0 bytes in 0 blocks
==29057== possibly lost: 0 bytes in 0 blocks
==29057== still reachable: 560,151 bytes in 383 blocks
==29057== suppressed: 0 bytes in 0 blocks
==29057== Rerun with --leak-check=full to see details of leaked memory
==29057==
==29057== For counts of detected and suppressed errors, rerun with: -v
==29057== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 6)
Thanks,
Sepideh
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/<http://www.caam.rice.edu/~mk51/>
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/<http://www.caam.rice.edu/~mk51/>
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/<http://www.caam.rice.edu/~mk51/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20180417/1e19e201/attachment-0001.html>
More information about the petsc-users
mailing list