[petsc-users] Question about KSP, and makefile linking MPICH

Smith, Barry F. bsmith at mcs.anl.gov
Sun Apr 14 00:34:48 CDT 2019



> On Apr 14, 2019, at 12:00 AM, Yuyun Yang <yyang85 at stanford.edu> wrote:
> 
> I tried using another linear solver (via command-line options) and the binaryoutput file was produced, but nothing was produced for MUMPS Cholesky. Does that mean it didn't even do a single linear solve?

   This is only a guess, but since the program crashes in the MUMPS Cholesky case, perhaps it crashed before the file was closed, so the output was lost. 

   Anyway, send the file that was generated with the different solver (presumably it holds the same matrix as in the MUMPS case) to petsc-maint at mcs.anl.gov and we'll see if we can get MUMPS to crash on it.

  Barry


> 
> Best regards,
> Yuyun
> 
> -----Original Message-----
> From: Smith, Barry F. <bsmith at mcs.anl.gov> 
> Sent: Saturday, April 13, 2019 7:08 PM
> To: Yuyun Yang <yyang85 at stanford.edu>
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] Question about KSP, and makefile linking MPICH
> 
> 
>  It will be in the directory where the program is run. 
> 
>  Perhaps you are not calling KSPSetFromOptions()? That is where the option is checked.
> 
>   Barry
> 
> 
>> On Apr 13, 2019, at 7:24 PM, Yuyun Yang <yyang85 at stanford.edu> wrote:
>> 
>> I tried doing -ksp_view_mat binary, but I don't see the binaryoutput file being produced in my output or source file directories. Is it located somewhere else?
>> 
>> Best regards,
>> Yuyun
>> 
>> -----Original Message-----
>> From: Smith, Barry F. <bsmith at mcs.anl.gov>
>> Sent: Thursday, April 11, 2019 10:21 PM
>> To: Yuyun Yang <yyang85 at stanford.edu>
>> Cc: petsc-users at mcs.anl.gov
>> Subject: Re: [petsc-users] Question about KSP, and makefile linking MPICH
>> 
>> 
>> OK, that's still a little odd. PCSetOperators(), which is called internally by KSPSetOperators(), checks whether the matrix has changed size and generates an error if it has. Similarly, if you set a different matrix than before, it resets the computation of the preconditioner. So, in theory, your situation should never occur.
>> 
>>  Barry
>> 
>> 
>>> On Apr 12, 2019, at 12:01 AM, Yuyun Yang <yyang85 at stanford.edu> wrote:
>>> 
>>> I think this problem arose because I did not reset the KSP before 
>>> solving a different problem! It's not giving me an error anymore now 
>>> that I've added the reset, so it's all good :)
>>> 
>>> Thanks,
>>> Yuyun
>>> 
>>> From: Smith, Barry F. <bsmith at mcs.anl.gov>
>>> Sent: Thursday, April 11, 2019 9:21:11 PM
>>> To: Yuyun Yang
>>> Cc: petsc-users at mcs.anl.gov
>>> Subject: Re: [petsc-users] Question about KSP, and makefile linking MPICH
>>> 
>>> 
>>>  Ah, I just realized one other thing we can try. Run the program that crashes with -ksp_view_mat binary; this will produce a file called binaryoutput. Send that file to petsc-maint at mcs.anl.gov and we'll see if we can get MUMPS to misbehave with it too.
>>> 
>>>  Barry
>>> 
>>> 
>>> 
>>>> On Apr 11, 2019, at 11:17 PM, Yuyun Yang <yyang85 at stanford.edu> wrote:
>>>> 
>>>> Thanks Barry for the detailed answers!
>>>> 
>>>> Regarding the problem with valgrind, this is the only error produced, and if I allow it to run further, the program breaks (in a later function I get NaN for some of the values being calculated, and I've put in an assert to catch NaN results). I will take a look at it in the debugger. This is for testing; for bigger problems I won't end up using Cholesky, so it's not really a big issue.
>>>> 
>>>> Thanks again for the timely help!
>>>> Yuyun
>>>> 
>>>> From: Smith, Barry F. <bsmith at mcs.anl.gov>
>>>> Sent: Thursday, April 11, 2019 6:44:54 PM
>>>> To: Yuyun Yang
>>>> Cc: petsc-users at mcs.anl.gov
>>>> Subject: Re: [petsc-users] Question about KSP, and makefile linking MPICH
>>>> 
>>>> 
>>>> 
>>>>> On Apr 11, 2019, at 5:44 PM, Yuyun Yang via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>>> 
>>>>> Hello team,
>>>>> 
>>>>> I’d like to check if it’s ok to use the same ksp object and change its operator (the matrix A) later on in the code to solve a different problem?
>>>> 
>>>>  Do you mean call KSPSetOperators() with one matrix and then later 
>>>> call it with a different matrix? This is OK if the two matrices are 
>>>> the same size and have the same parallel layout. But if the matrices 
>>>> have different sizes or different parallel layouts, then you need to 
>>>> destroy the KSP and create a new one, or call KSPReset() in between, 
>>>> for example:
>>>> 
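>>>> KSPSetFromOptions(ksp);
>>>> KSPSetOperators(ksp,A,A);
>>>> KSPSolve(ksp,b,x);
>>>> KSPReset(ksp);               /* release work vectors, operators and PC data built for A */
>>>> KSPSetOperators(ksp,B,B);    /* B may differ from A in size or parallel layout */
>>>> KSPSolve(ksp,newb,newx);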
>>>> 
>>>>> 
>>>>> Also, I know I’ve asked this before about linking to MPICH when I 
>>>>> call mpirun, instead of using my computer’s default MPI, but I want 
>>>>> to check again. The same problem was solved on my cluster by using 
>>>>> a different CLINKER (called mpiicc) in the Makefile and a different 
>>>>> Intel compiler, which will link my compiled code with MPICH. Is 
>>>>> there a similar thing I can do on my own computer, instead of 
>>>>> having to use a very long path to locate the MPICH I configured 
>>>>> with PETSc, and then calling the executable? (I tried making 
>>>>> CLINKER = mpiicc on my own computer but that didn’t work.)
>>>> 
>>>>   Are you asking how you can avoid something like
>>>> 
>>>>     /home/me/petsc/arch-myarch/bin/mpiexec -n 2 ./mycode ?
>>>> 
>>>>  You can add /home/me/petsc/arch-myarch/bin to the beginning of 
>>>> your PATH. For example, with bash, put the following into your 
>>>> ~/.bashrc file:
>>>> 
>>>>     export PATH=/home/me/petsc/arch-myarch/bin:$PATH
>>>> 
>>>> Then, in a new shell, you can simply run
>>>> 
>>>>     mpiexec -n 2 ./mycode
>>>> 
>>>>> 
>>>>> The final question is related to valgrind. I have defined a setupKSP function to do all the solver/PC setup. It seems like there is a problem with memory allocation, but I don’t really understand why. This only happens with MUMPS Cholesky, though (running CG, AMG, etc. was fine):
>>>>> 
>>>>> ==830== Invalid read of size 8
>>>>> ==830==    at 0x6977C95: dmumps_ana_o_ (dana_aux.F:2054)
>>>>> ==830==    by 0x6913B5A: dmumps_ana_driver_ (dana_driver.F:390)
>>>>> ==830==    by 0x68C152C: dmumps_ (dmumps_driver.F:1213)
>>>>> ==830==    by 0x68BBE1C: dmumps_f77_ (dmumps_f77.F:267)
>>>>> ==830==    by 0x68BA4EB: dmumps_c (mumps_c.c:417)
>>>>> ==830==    by 0x5A070D6: MatCholeskyFactorSymbolic_MUMPS (mumps.c:1654)
>>>>> ==830==    by 0x54071F2: MatCholeskyFactorSymbolic (matrix.c:3179)
>>>>> ==830==    by 0x614AFE9: PCSetUp_Cholesky (cholesky.c:88)
>>>>> ==830==    by 0x62BA574: PCSetUp (precon.c:932)
>>>>> ==830==    by 0x640BB29: KSPSetUp (itfunc.c:391)
>>>>> ==830==    by 0x4A1192: PressureEq::setupKSP(_p_KSP*&, _p_PC*&, _p_Mat*&) (pressureEq.cpp:834)
>>>>> ==830==    by 0x4A1258: PressureEq::computeInitialSteadyStatePressure(Domain&) (pressureEq.cpp:862)
>>>>> 
>>>>> ==830==  Address 0xb8149c0 is 0 bytes after a block of size 7,872 alloc'd
>>>>> 
>>>>> ==830==    at 0x4C2FFC6: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>>> ==830==    by 0x500E7E0: PetscMallocAlign (mal.c:41)
>>>>> ==830==    by 0x59F8A16: MatConvertToTriples_seqaij_seqsbaij (mumps.c:402)
>>>>> ==830==    by 0x5A06B53: MatCholeskyFactorSymbolic_MUMPS (mumps.c:1618)
>>>>> ==830==    by 0x54071F2: MatCholeskyFactorSymbolic (matrix.c:3179)
>>>>> ==830==    by 0x614AFE9: PCSetUp_Cholesky (cholesky.c:88)
>>>>> ==830==    by 0x62BA574: PCSetUp (precon.c:932)
>>>>> ==830==    by 0x640BB29: KSPSetUp (itfunc.c:391)
>>>>> ==830==    by 0x4A1192: PressureEq::setupKSP(_p_KSP*&, _p_PC*&, _p_Mat*&) (pressureEq.cpp:834)
>>>>> ==830==    by 0x4A1258: PressureEq::computeInitialSteadyStatePressure(Domain&) (pressureEq.cpp:862)
>>>>> ==830==    by 0x49B809: PressureEq::PressureEq(Domain&) (pressureEq.cpp:62)
>>>>> ==830==    by 0x4A88E9: StrikeSlip_LinearElastic_qd::StrikeSlip_LinearElastic_qd(Domain&) (strikeSlip_linearElastic_qd.cpp:57)
>>>> 
>>>>  This is curious. The line in the MUMPS code where valgrind detects 
>>>> a problem is
>>>> 
>>>>           K = 1_8
>>>>           THEMIN = ZERO
>>>>           DO
>>>>              IF(THEMIN .NE. ZERO) EXIT
>>>>              THEMIN = abs(id%A(K))                               <<<<<<< this line
>>>>              K = K+1_8
>>>> 
>>>>  So it has a problem accessing id%A(1), the very first entry in the
>>>> numerical values of the sparse matrix. Meanwhile, it states that the
>>>> address is "0 bytes after a block of size 7,872 alloc'd" in
>>>> MatConvertToTriples_seqaij_seqsbaij (mumps.c:402), which is where
>>>> PETSc allocates the values passed to MUMPS. So it is almost as if
>>>> MUMPS never allocated any space for id%A(). I can't imagine why that
>>>> would ever happen (the problem size is super small, so it's not as if
>>>> it might have run out of memory).
>>>> 
>>>>   What happens if you allow valgrind to continue? Do you get more valgrind errors?
>>>> 
>>>>   What happens if you run it without valgrind? Does it crash at this point 
>>>> in the code? At some later point? Does it run to completion and seem 
>>>> to produce the correct answer? If it crashes, you could run it in the debugger and, when it crashes, print the values of id, id%A, etc., and see whether they look reasonable.
>>>> 
>>>>  Barry
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> Thank you!
>>>>> Yuyun
>> 
> 


