From ksi2443 at gmail.com Mon Jan 2 03:15:42 2023
From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=)
Date: Mon, 2 Jan 2023 18:15:42 +0900
Subject: [petsc-users] Question - about the 'Hint for performance tuning'
Message-ID:

Hello,

Happy new year!!

I have some questions about 'Hint for performance tuning' in the user guide
of PETSc.

1. In the 'Performance Pitfalls and Advice' section, there are 2 modes,
'debug' and 'optimized' builds. My current setup is debug mode, and I want
to switch to the optimized build mode to test performance. However, if I
configure again, does the existing debug mode disappear? Is there any way
for the 2 modes to coexist so that I can choose between them when running
the application?

2. In the guide, there are some paragraphs about the optimization level of
the compiler. To control the optimization level of the compiler, I put
'-O3' as below. Is this right?

CFLAGS = -O3
FFLAGS =
CPPFLAGS =
FPPFLAGS =

app : a1.o a2.o a3.o a4.o
	$(LINK.C) -o $@ $^ $(LDLIBS)

include ${PETSC_DIR}/lib/petsc/conf/rules
include ${PETSC_DIR}/lib/petsc/conf/test

3. In the guide, the user should put '-march=native' for using AVX2 or
AVX-512. Where should I put the '-march=native' for using AVX?

4. After reading 'Hint for performance tuning' I understood that for good
performance and scalability the user should use multiple nodes and multiple
sockets. However, before composing a cluster system, many users can only
use a desktop system.
In that case, between the Intel 13th-gen i9 and the AMD Ryzen 7950X, can
the 7950X, which has an architecture similar to a server processor, be more
suitable for PETSc? (Because the architecture of the Intel desktop CPU is
big.LITTLE.)

Thanks,
Hyung Kim
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From knepley at gmail.com Mon Jan 2 08:03:31 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 2 Jan 2023 09:03:31 -0500
Subject: [petsc-users] Question - about the 'Hint for performance tuning'
In-Reply-To:
References:
Message-ID:

On Mon, Jan 2, 2023 at 4:16 AM Hyung Kim wrote:

> Hello,
>
> Happy new year!!
>
> I have some questions about 'Hint for performance tuning' in the user
> guide of PETSc.
>
> 1. In the 'Performance Pitfalls and Advice' section, there are 2 modes,
> 'debug' and 'optimized' builds. My current setup is debug mode, and I want
> to switch to the optimized build mode to test performance. However, if I
> configure again, does the existing debug mode disappear? Is there any way
> for the 2 modes to coexist so that I can choose between them when running
> the application?
>
Suppose your current arch is named "arch-main-debug". Then you can make an
optimized version using

cd $PETSC_DIR
./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py
--with-debugging=0 --PETSC_ARCH=arch-main-opt

> 2. In the guide, there are some paragraphs about the optimization level of
> the compiler. To control the optimization level of the compiler, I put
> '-O3' as below. Is this right?
>
> CFLAGS = -O3
> FFLAGS =
> CPPFLAGS =
> FPPFLAGS =
>
> app : a1.o a2.o a3.o a4.o
>	$(LINK.C) -o $@ $^ $(LDLIBS)
>
> include ${PETSC_DIR}/lib/petsc/conf/rules
> include ${PETSC_DIR}/lib/petsc/conf/test
>
You could do this, but that only changes it for that directory. It is best
to do it by reconfiguring.

> 3. In the guide, the user should put '-march=native' for using AVX2 or
> AVX-512. Where should I put the '-march=native' for using AVX?
>
You can add --COPTFLAGS="..." with any flags you want to the configure.
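Putting the pieces above together, a minimal sketch of the whole
two-build workflow; the arch names, the flags, and the ~/myapp application
directory are only placeholders, and the application makefile is the one
shown in the first message:

# Sketch only: arch names, flags, and ~/myapp are placeholders.
cd $PETSC_DIR

# Reconfigure with the same options but debugging off, under a new arch
# name; the existing debug build in $PETSC_DIR/arch-main-debug is untouched.
./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py \
    --with-debugging=0 --COPTFLAGS="-O3 -march=native" \
    --PETSC_ARCH=arch-main-opt

# Build the PETSc libraries for the new arch.
make PETSC_DIR=$PETSC_DIR PETSC_ARCH=arch-main-opt all

# Rebuild the application against the optimized libraries; object files
# left over from the debug build must be removed first.
cd ~/myapp
rm -f *.o app
make PETSC_DIR=$PETSC_DIR PETSC_ARCH=arch-main-opt app

# Switching back to the debug build only changes PETSC_ARCH.
rm -f *.o app
make PETSC_DIR=$PETSC_DIR PETSC_ARCH=arch-main-debug app

Each configuration keeps its own libraries under $PETSC_DIR/<arch>/lib, so
the debug and optimized builds coexist and you select one per application
build through PETSC_ARCH.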
> 4. After reading 'Hint for performance tuning' I understood that for good
> performance and scalability the user should use multiple nodes and
> multiple sockets. However, before composing a cluster system, many users
> can only use a desktop system.
> In that case, between the Intel 13th-gen i9 and the AMD Ryzen 7950X, can
> the 7950X, which has an architecture similar to a server processor, be
> more suitable for PETSc? (Because the architecture of the Intel desktop
> CPU is big.LITTLE.)
>
A good guide is to run the STREAMS benchmark on the processor. PETSc
performance closely tracks that.

  Thanks,

     Matt

> Thanks,
>
> Hyung Kim
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ksi2443 at gmail.com Mon Jan 2 08:27:38 2023
From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=)
Date: Mon, 2 Jan 2023 23:27:38 +0900
Subject: [petsc-users] Question - about the 'Hint for performance tuning'
In-Reply-To:
References:
Message-ID:

There are more questions.

1. I followed your comments, but there is an error as below.
[image: image.png]
How can I fix this?

2. After changing to the optimized build, how can I set the debug mode
again?

3. Following your comments, the new makefile is as below. Is it right?

CFLAGS = -O3
FFLAGS =
CPPFLAGS =
FPPFLAGS =
COPTFLAGS = -march=native

app : a1.o a2.o a3.o a4.o
	$(LINK.C) -o $@ $^ $(LDLIBS)

include ${PETSC_DIR}/lib/petsc/conf/rules
include ${PETSC_DIR}/lib/petsc/conf/test

4. I have no such processors. Where can I find benchmark information about
STREAMS?

Thanks,
Hyung Kim

On Mon, Jan 2, 2023 at 11:03 PM Matthew Knepley wrote:

> On Mon, Jan 2, 2023 at 4:16 AM Hyung Kim wrote:
>
>> Hello,
>>
>> Happy new year!!
>>
>> I have some questions about 'Hint for performance tuning' in user guide
>> of petsc.
>>
>> 1. In the 'Performance Pitfalls and Advice' section, there are 2
>> modes 'debug' and 'optimized' builds. My current setup is debug mode. So I
>> want to change for test the performance the optimized build mode. However,
>> if I configure again, does the existing debug mode disappear? Is there any
>> way to coexist the 2 modes and use them in the run the application?
>>
> Suppose your current arch is named "arch-main-debug". Then you can make an
> optimized version using
>
> cd $PETSC_DIR
> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py
> --with-debugging=0 --PETSC_ARCH=arch-main-opt
>
>> 2. In the guide, there are some paragraphs about optimization level
>> of compiler. To control the optimization level of compiler, I put the '-O3'
>> as below. Is this right?
>>
>> CFLAGS = -O3
>> FFLAGS =
>> CPPFLAGS =
>> FPPFLAGS =
>>
>> app : a1.o a2.o a3.o a4.o
>> $(LINK.C) -o $@ $^ $(LDLIBS)
>>
>> include ${PETSC_DIR}/lib/petsc/conf/rules
>> include ${PETSC_DIR}/lib/petsc/conf/test
>>
> You could do this, but that only changes it for that directory. It is best
> to do it by reconfiguring.
>
>> 3. In the guide, user should put '-march=native' for using AVX2 or
>> AVX-512. Where should I put the '-march=native' for using AVX?
>>
> You can add --COPTFLAGS="" with any flags you want to the
> configure.
>
> 4. After read the 'Hint for performance tuning' I understood that for
>> good performance and scalability user should use the multiple node and
>> multiple socket . 
However, before composing cluster system, many users just >> can use desktop system. >> In that case, between intel 13th i9 and amd ryzen 7950x, can the 7950x, >> which has an arichitecture similar to the server processor, be more >> suitable for petsc? (Because the architecture of intel desktop cpu is >> big.little.) >> > > A good guide is to run the STREAMS benchmark on the processor. PETSc > performance closely tracks that. > > Thanks, > > Matt > > >> >> >> Thanks, >> >> Hyung Kim >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 90630 bytes Desc: not available URL: From mfadams at lbl.gov Mon Jan 2 12:17:25 2023 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 2 Jan 2023 13:17:25 -0500 Subject: [petsc-users] puzzling arkimex logic Message-ID: I am using arkimex and the logic with a failed KSP solve is puzzling. This step starts with a dt of ~.005, the linear solver fails and cuts the time step by 1/4. So far, so good. The step then works but the next time step the time step goes to ~0.006. TS seems to have forgotten that it had to cut the time step back. Perhaps that logic is missing or my parameters need work? Thanks, Mark -ts_adapt_dt_max 0.01 # (source: command line) -ts_adapt_monitor # (source: file) -ts_arkimex_type 1bee # (source: file) -ts_dt .001 # (source: command line) -ts_max_reject 10 # (source: file) -ts_max_snes_failures -1 # (source: file) -ts_max_steps 8000 # (source: command line) -ts_max_time 14 # (source: command line) -ts_monitor # (source: file) -ts_rtol 1e-6 # (source: command line) -ts_type arkimex # (source: file) Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 TSAdapt basic arkimex 0:1bee step 1 accepted t=0.001 + 2.497e-03 dt=5.404e-03 wlte=0.173 wltea= -1 wlter= -1 2 TS dt 0.00540401 time 0.00349731 0 SNES Function norm 1.358886930084e-05 Linear solve did not converge due to DIVERGED_ITS iterations 100 Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0 TSAdapt basic step 2 stage rejected (DIVERGED_LINEAR_SOLVE) t=0.00349731 + 5.404e-03 retrying with dt=1.351e-03 0 SNES Function norm 1.358886930084e-05 Linear solve converged due to CONVERGED_RTOL iterations 19 1 SNES Function norm 4.412110425362e-10 Linear solve converged due to CONVERGED_RTOL iterations 6 2 SNES Function norm 4.978968053066e-13 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 0 SNES Function norm 8.549322067920e-06 Linear solve converged due to CONVERGED_RTOL iterations 14 1 SNES Function norm 8.357075378456e-11 Linear solve converged due to CONVERGED_RTOL iterations 4 2 SNES Function norm 4.983138402512e-13 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 0 SNES Function norm 1.044832467924e-05 Linear solve converged due to CONVERGED_RTOL iterations 13 1 SNES Function norm 1.036101875301e-10 Linear solve converged due to CONVERGED_RTOL iterations 4 2 SNES Function norm 4.984888077288e-13 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 TSAdapt basic arkimex 0:1bee step 2 accepted t=0.00349731 + 1.351e-03 dt=6.305e-03 wlte=0.0372 wltea= -1 wlter= -1 3 TS dt 0.00630456 time 0.00484832 0 SNES Function 
norm 8.116559104264e-06 Linear solve did not converge due to DIVERGED_ITS iterations 100 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Jan 2 13:05:06 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 2 Jan 2023 14:05:06 -0500 Subject: [petsc-users] Question - about the 'Hint for performance tuning' In-Reply-To: References: Message-ID: > On Jan 2, 2023, at 9:27 AM, ??? wrote: > > There are more questions. > > 1. Following your comments, but there is an error as below. > > How can I fix this? The previous value of PETSC_ARCH was not arch-main-debug do ls $PETSC_DIR and locate the directory whose name begins with arch- that will tell you the previous value used for PETSC_ARCH. > > 2. After changing the optimized build, then how can I set the debug mode again? export PETSC_ARCH=arch- whatever the previous value was and then recompile your code (note you do not need to recompile PETSc just your executable). > > 3.Following your comments, the new makefile is as below. Is it right? > CFLAGS = -O3 > FFLAGS = > CPPFLAGS = > FPPFLAGS = > COPTFLAGS = -march=native > > app : a1.o a2.o a3.o a4.o > $(LINK.C) -o $@ $^ $(LDLIBS) > > include ${PETSC_DIR}/lib/petsc/conf/rules > include ${PETSC_DIR}/lib/petsc/conf/test Best not to set these values in the makefile at all because they will affect all compilers. Just set them with ./configure CCOPTFLAGS="-O3 -march=native" > > > 4. I have no such processors. Where can I find benchmark information about STREAMS? do make mpistreams in PETSC_DIR > > > Thanks, > Hyung Kim > > > > > > > > > 2023? 1? 2? (?) ?? 11:03, Matthew Knepley >?? ??: >> On Mon, Jan 2, 2023 at 4:16 AM ??? > wrote: >>> Hello, >>> >>> Happy new year!! >>> >>> >>> I have some questions about ?Hint for performance tuning? in user guide of petsc. >>> >>> >>> 1. In the ?Performance Pitfalls and Advice? section, there are 2 modes ?debug? and ?optimized builds. My current setup is debug mode. So I want to change for test the performance the optimized build mode. However, if I configure again, does the existing debug mode disappear? Is there any way to coexist the 2 modes and use them in the run the application? >>> >> Suppose your current arch is named "arch-main-debug". Then you can make an optimized version using >> >> cd $PETSC_DIR >> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py --with-debugging=0 --PETSC_ARCH=arch-main-opt >> >>> 2. In the guide, there are some paragraphs about optimization level of compiler. To control the optimization level of compiler, I put the ?-O3? as below. Is this right?? >>> >>> CFLAGS = -O3 >>> FFLAGS = >>> CPPFLAGS = >>> FPPFLAGS = >>> >>> app : a1.o a2.o a3.o a4.o >>> $(LINK.C) -o $@ $^ $(LDLIBS) >>> >>> include ${PETSC_DIR}/lib/petsc/conf/rules >>> include ${PETSC_DIR}/lib/petsc/conf/test >> >> You could dp this, but that only changes it for that directory. It is best to do it by reconfiguring. >> >>> 3. In the guide, user should put ?-march=native? for using AVX2 or AVX-512. Where should I put the ?-march=native? for using AVX? >>> >> You can add --COPTFLAGS="" with any flags you want to the configure. >> >>> 4. After read the ?Hint for performance tuning? I understood that for good performance and scalability user should use the multiple node and multiple socket . However, before composing cluster system, many users just can use desktop system. 
>>> In that case, between intel 13th i9 and amd ryzen 7950x, can the 7950x, which has an arichitecture similar to the server processor, be more suitable for petsc? (Because the architecture of intel desktop cpu is big.little.) >>> >> >> A good guide is to run the STREAMS benchmark on the processor. PETSc performance closely tracks that. >> >> Thanks, >> >> Matt >> >>> >>> Thanks, >>> >>> Hyung Kim >>> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Mon Jan 2 23:19:41 2023 From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=) Date: Tue, 3 Jan 2023 14:19:41 +0900 Subject: [petsc-users] Question - about the 'Hint for performance tuning' In-Reply-To: References: Message-ID: After reconfigure by ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py --with-debugging=0 --PETSC_ARCH=arch-main-opt , I tried to build my code. But I got an error as below [image: image.png] How can I fix this? 2023? 1? 3? (?) ?? 4:05, Barry Smith ?? ??: > > > On Jan 2, 2023, at 9:27 AM, ??? wrote: > > There are more questions. > > 1. Following your comments, but there is an error as below. > > How can I fix this? > > > The previous value of PETSC_ARCH was not arch-main-debug do ls > $PETSC_DIR and locate the directory whose name begins with arch- that will > tell you the previous value used for PETSC_ARCH. > > > 2. After changing the optimized build, then how can I set the debug mode > again? > > > export PETSC_ARCH=arch- whatever the previous value was and then > recompile your code (note you do not need to recompile PETSc just your > executable). > > > 3.Following your comments, the new makefile is as below. Is it right? > CFLAGS = -O3 > FFLAGS = > CPPFLAGS = > FPPFLAGS = > COPTFLAGS = -march=native > > > app : a1.o a2.o a3.o a4.o > $(LINK.C) -o $@ $^ $(LDLIBS) > > > include ${PETSC_DIR}/lib/petsc/conf/rules > include ${PETSC_DIR}/lib/petsc/conf/test > > > Best not to set these values in the makefile at all because they will > affect all compilers. Just set them with ./configure CCOPTFLAGS="-O3 > -march=native" > > > > 4. I have no such processors. Where can I find benchmark information about > STREAMS? > > > do make mpistreams in PETSC_DIR > > > > > Thanks, > Hyung Kim > > > > > > > > > 2023? 1? 2? (?) ?? 11:03, Matthew Knepley ?? ??: > >> On Mon, Jan 2, 2023 at 4:16 AM ??? wrote: >> >>> Hello, >>> >>> Happy new year!! >>> >>> >>> I have some questions about ?Hint for performance tuning? in user guide >>> of petsc. >>> >>> >>> 1. In the ?Performance Pitfalls and Advice? section, there are 2 >>> modes ?debug? and ?optimized builds. My current setup is debug mode. So I >>> want to change for test the performance the optimized build mode. However, >>> if I configure again, does the existing debug mode disappear? Is there any >>> way to coexist the 2 modes and use them in the run the application? >>> >> Suppose your current arch is named "arch-main-debug". Then you can make >> an optimized version using >> >> cd $PETSC_DIR >> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >> --with-debugging=0 --PETSC_ARCH=arch-main-opt >> >> >>> 2. In the guide, there are some paragraphs about optimization level >>> of compiler. To control the optimization level of compiler, I put the ?-O3? >>> as below. 
Is this right?? >>> CFLAGS = -O3 >>> FFLAGS = >>> CPPFLAGS = >>> FPPFLAGS = >>> >>> >>> app : a1.o a2.o a3.o a4.o >>> $(LINK.C) -o $@ $^ $(LDLIBS) >>> >>> >>> include ${PETSC_DIR}/lib/petsc/conf/rules >>> include ${PETSC_DIR}/lib/petsc/conf/test >>> >> >> You could dp this, but that only changes it for that directory. It is >> best to do it by reconfiguring. >> >> >>> 3. In the guide, user should put ?-march=native? for using AVX2 or >>> AVX-512. Where should I put the ?-march=native? for using AVX? >>> >> You can add --COPTFLAGS="" with any flags you want to the >> configure. >> >> 4. After read the ?Hint for performance tuning? I understood that >>> for good performance and scalability user should use the multiple node and >>> multiple socket . However, before composing cluster system, many users just >>> can use desktop system. >>> In that case, between intel 13th i9 and amd ryzen 7950x, can the 7950x, >>> which has an arichitecture similar to the server processor, be more >>> suitable for petsc? (Because the architecture of intel desktop cpu is >>> big.little.) >>> >> >> A good guide is to run the STREAMS benchmark on the processor. PETSc >> performance closely tracks that. >> >> Thanks, >> >> Matt >> >> >>> >>> >>> Thanks, >>> >>> Hyung Kim >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 320891 bytes Desc: not available URL: From ksi2443 at gmail.com Mon Jan 2 23:48:35 2023 From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=) Date: Tue, 3 Jan 2023 14:48:35 +0900 Subject: [petsc-users] Question - about the 'Hint for performance tuning' In-Reply-To: References: Message-ID: I got more questions so I resend this email with more questions. 1.After reconfigure by ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py --with-debugging=0 --PETSC_ARCH=arch-main-opt , I tried to build my code. But I got an error as below [image: image.png] How can I fix this? 2. CFLAGS = -O3 FFLAGS = CPPFLAGS = FPPFLAGS = COPTFLAGS = -march=native app : a1.o a2.o a3.o a4.o $(LINK.C) -o $@ $^ $(LDLIBS) include ${PETSC_DIR}/lib/petsc/conf/rules include ${PETSC_DIR}/lib/petsc/conf/test Upper makefile is just for build my own code not for src of petsc. In this case, also has problem as your comments? 3. I have no such processors. Where can I find benchmark information about STREAMS? I mean I don't have such various processors. So I just wondering the results of STREAMS each processors (already existing results of various processors). Can anyone have results of benchmark of various desktop systems? Thanks, Hyung Kim 2023? 1? 3? (?) ?? 2:19, ??? ?? ??: > After reconfigure by > ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py > --with-debugging=0 --PETSC_ARCH=arch-main-opt , > I tried to build my code. > > But I got an error as below > [image: image.png] > How can I fix this? > > > > > 2023? 1? 3? (?) ?? 4:05, Barry Smith ?? ??: > >> >> >> On Jan 2, 2023, at 9:27 AM, ??? wrote: >> >> There are more questions. >> >> 1. Following your comments, but there is an error as below. >> >> How can I fix this? 
>> >> >> The previous value of PETSC_ARCH was not arch-main-debug do ls >> $PETSC_DIR and locate the directory whose name begins with arch- that will >> tell you the previous value used for PETSC_ARCH. >> >> >> 2. After changing the optimized build, then how can I set the debug mode >> again? >> >> >> export PETSC_ARCH=arch- whatever the previous value was and then >> recompile your code (note you do not need to recompile PETSc just your >> executable). >> >> >> 3.Following your comments, the new makefile is as below. Is it right? >> CFLAGS = -O3 >> FFLAGS = >> CPPFLAGS = >> FPPFLAGS = >> COPTFLAGS = -march=native >> >> >> app : a1.o a2.o a3.o a4.o >> $(LINK.C) -o $@ $^ $(LDLIBS) >> >> >> include ${PETSC_DIR}/lib/petsc/conf/rules >> include ${PETSC_DIR}/lib/petsc/conf/test >> >> >> Best not to set these values in the makefile at all because they will >> affect all compilers. Just set them with ./configure CCOPTFLAGS="-O3 >> -march=native" >> >> >> >> 4. I have no such processors. Where can I find benchmark information >> about STREAMS? >> >> >> do make mpistreams in PETSC_DIR >> >> >> >> >> Thanks, >> Hyung Kim >> >> >> >> >> >> >> >> >> 2023? 1? 2? (?) ?? 11:03, Matthew Knepley ?? ??: >> >>> On Mon, Jan 2, 2023 at 4:16 AM ??? wrote: >>> >>>> Hello, >>>> >>>> Happy new year!! >>>> >>>> >>>> I have some questions about ?Hint for performance tuning? in user guide >>>> of petsc. >>>> >>>> >>>> 1. In the ?Performance Pitfalls and Advice? section, there are 2 >>>> modes ?debug? and ?optimized builds. My current setup is debug mode. So I >>>> want to change for test the performance the optimized build mode. However, >>>> if I configure again, does the existing debug mode disappear? Is there any >>>> way to coexist the 2 modes and use them in the run the application? >>>> >>> Suppose your current arch is named "arch-main-debug". Then you can make >>> an optimized version using >>> >>> cd $PETSC_DIR >>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>> --with-debugging=0 --PETSC_ARCH=arch-main-opt >>> >>> >>>> 2. In the guide, there are some paragraphs about optimization >>>> level of compiler. To control the optimization level of compiler, I put the >>>> ?-O3? as below. Is this right?? >>>> CFLAGS = -O3 >>>> FFLAGS = >>>> CPPFLAGS = >>>> FPPFLAGS = >>>> >>>> >>>> app : a1.o a2.o a3.o a4.o >>>> $(LINK.C) -o $@ $^ $(LDLIBS) >>>> >>>> >>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>> >>> >>> You could dp this, but that only changes it for that directory. It is >>> best to do it by reconfiguring. >>> >>> >>>> 3. In the guide, user should put ?-march=native? for using AVX2 or >>>> AVX-512. Where should I put the ?-march=native? for using AVX? >>>> >>> You can add --COPTFLAGS="" with any flags you want to the >>> configure. >>> >>> 4. After read the ?Hint for performance tuning? I understood that >>>> for good performance and scalability user should use the multiple node and >>>> multiple socket . However, before composing cluster system, many users just >>>> can use desktop system. >>>> In that case, between intel 13th i9 and amd ryzen 7950x, can the 7950x, >>>> which has an arichitecture similar to the server processor, be more >>>> suitable for petsc? (Because the architecture of intel desktop cpu is >>>> big.little.) >>>> >>> >>> A good guide is to run the STREAMS benchmark on the processor. PETSc >>> performance closely tracks that. 
>>> >>> Thanks, >>> >>> Matt >>> >>> >>>> >>>> >>>> Thanks, >>>> >>>> Hyung Kim >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 320891 bytes Desc: not available URL: From knepley at gmail.com Tue Jan 3 05:54:56 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 3 Jan 2023 06:54:56 -0500 Subject: [petsc-users] Question - about the 'Hint for performance tuning' In-Reply-To: References: Message-ID: On Tue, Jan 3, 2023 at 12:48 AM ??? wrote: > I got more questions so I resend this email with more questions. > > 1.After reconfigure by > ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py > --with-debugging=0 --PETSC_ARCH=arch-main-opt , > I tried to build my code. > > But I got an error as below > [image: image.png] > How can I fix this? > You reconfigured with PETSC_ARCH=arch-main-opt, but your build is using "arch-linux-c-opt". Why are you changing it? > > 2. > > CFLAGS = -O3 > FFLAGS = > CPPFLAGS = > FPPFLAGS = > COPTFLAGS = -march=native > > > app : a1.o a2.o a3.o a4.o > $(LINK.C) -o $@ $^ $(LDLIBS) > > > include ${PETSC_DIR}/lib/petsc/conf/rules > include ${PETSC_DIR}/lib/petsc/conf/test > > Upper makefile is just for build my own code not for src of petsc. > In this case, also has problem as your comments? > I do not understand your question. > > 3. I have no such processors. Where can I find benchmark information > about STREAMS? > I mean I don't have such various processors. So I just wondering the > results of STREAMS each processors (already existing results of various > processors). > Can anyone have results of benchmark of various desktop systems? > There are catalogues of STREAMS results online. Thanks, Matt > Thanks, > Hyung Kim > > > > > > 2023? 1? 3? (?) ?? 2:19, ??? ?? ??: > >> After reconfigure by >> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >> --with-debugging=0 --PETSC_ARCH=arch-main-opt , >> I tried to build my code. >> >> But I got an error as below >> [image: image.png] >> How can I fix this? >> >> >> >> >> 2023? 1? 3? (?) ?? 4:05, Barry Smith ?? ??: >> >>> >>> >>> On Jan 2, 2023, at 9:27 AM, ??? wrote: >>> >>> There are more questions. >>> >>> 1. Following your comments, but there is an error as below. >>> >>> How can I fix this? >>> >>> >>> The previous value of PETSC_ARCH was not arch-main-debug do ls >>> $PETSC_DIR and locate the directory whose name begins with arch- that will >>> tell you the previous value used for PETSC_ARCH. >>> >>> >>> 2. After changing the optimized build, then how can I set the debug mode >>> again? >>> >>> >>> export PETSC_ARCH=arch- whatever the previous value was and then >>> recompile your code (note you do not need to recompile PETSc just your >>> executable). >>> >>> >>> 3.Following your comments, the new makefile is as below. Is it right? 
>>> CFLAGS = -O3 >>> FFLAGS = >>> CPPFLAGS = >>> FPPFLAGS = >>> COPTFLAGS = -march=native >>> >>> >>> app : a1.o a2.o a3.o a4.o >>> $(LINK.C) -o $@ $^ $(LDLIBS) >>> >>> >>> include ${PETSC_DIR}/lib/petsc/conf/rules >>> include ${PETSC_DIR}/lib/petsc/conf/test >>> >>> >>> Best not to set these values in the makefile at all because they will >>> affect all compilers. Just set them with ./configure CCOPTFLAGS="-O3 >>> -march=native" >>> >>> >>> >>> 4. I have no such processors. Where can I find benchmark information >>> about STREAMS? >>> >>> >>> do make mpistreams in PETSC_DIR >>> >>> >>> >>> >>> Thanks, >>> Hyung Kim >>> >>> >>> >>> >>> >>> >>> >>> >>> 2023? 1? 2? (?) ?? 11:03, Matthew Knepley ?? ??: >>> >>>> On Mon, Jan 2, 2023 at 4:16 AM ??? wrote: >>>> >>>>> Hello, >>>>> >>>>> Happy new year!! >>>>> >>>>> >>>>> I have some questions about ?Hint for performance tuning? in user >>>>> guide of petsc. >>>>> >>>>> >>>>> 1. In the ?Performance Pitfalls and Advice? section, there are 2 >>>>> modes ?debug? and ?optimized builds. My current setup is debug mode. So I >>>>> want to change for test the performance the optimized build mode. However, >>>>> if I configure again, does the existing debug mode disappear? Is there any >>>>> way to coexist the 2 modes and use them in the run the application? >>>>> >>>> Suppose your current arch is named "arch-main-debug". Then you can make >>>> an optimized version using >>>> >>>> cd $PETSC_DIR >>>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>>> --with-debugging=0 --PETSC_ARCH=arch-main-opt >>>> >>>> >>>>> 2. In the guide, there are some paragraphs about optimization >>>>> level of compiler. To control the optimization level of compiler, I put the >>>>> ?-O3? as below. Is this right?? >>>>> CFLAGS = -O3 >>>>> FFLAGS = >>>>> CPPFLAGS = >>>>> FPPFLAGS = >>>>> >>>>> >>>>> app : a1.o a2.o a3.o a4.o >>>>> $(LINK.C) -o $@ $^ $(LDLIBS) >>>>> >>>>> >>>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>>> >>>> >>>> You could dp this, but that only changes it for that directory. It is >>>> best to do it by reconfiguring. >>>> >>>> >>>>> 3. In the guide, user should put ?-march=native? for using AVX2 >>>>> or AVX-512. Where should I put the ?-march=native? for using AVX? >>>>> >>>> You can add --COPTFLAGS="" with any flags you want to the >>>> configure. >>>> >>>> 4. After read the ?Hint for performance tuning? I understood that >>>>> for good performance and scalability user should use the multiple node and >>>>> multiple socket . However, before composing cluster system, many users just >>>>> can use desktop system. >>>>> In that case, between intel 13th i9 and amd ryzen 7950x, can the >>>>> 7950x, which has an arichitecture similar to the server processor, be more >>>>> suitable for petsc? (Because the architecture of intel desktop cpu is >>>>> big.little.) >>>>> >>>> >>>> A good guide is to run the STREAMS benchmark on the processor. PETSc >>>> performance closely tracks that. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Hyung Kim >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. 
>>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 320891 bytes Desc: not available URL: From ksi2443 at gmail.com Tue Jan 3 06:22:42 2023 From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=) Date: Tue, 3 Jan 2023 21:22:42 +0900 Subject: [petsc-users] Question - about the 'Hint for performance tuning' In-Reply-To: References: Message-ID: 2023? 1? 3? (?) ?? 8:55, Matthew Knepley ?? ??: > On Tue, Jan 3, 2023 at 12:48 AM ??? wrote: > >> I got more questions so I resend this email with more questions. >> >> 1.After reconfigure by >> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >> --with-debugging=0 --PETSC_ARCH=arch-main-opt , >> I tried to build my code. >> >> But I got an error as below >> [image: image.png] >> How can I fix this? >> > > You reconfigured with PETSC_ARCH=arch-main-opt, but your build is using > "arch-linux-c-opt". Why are you changing it? > At first Matthew told me use './arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py --with-debugging=0 --PETSC_ARCH=arch-main-opt' for build optimized. However, I got an error the error is as below. [image: image.png] So Barry told me : "The previous value of PETSC_ARCH was not arch-main-debug do ls $PETSC_DIR and locate the directory whose name begins with arch- that will tell you the previous value used for PETSC_ARCH." So I used "./arch-linux-c-debug/lib/petsc/conf/reconfigure-arch-linux-c-debug.py --with-debugging=0 --PETSC_ARCH=arch-linux-c-opt" That command worked. However I got an error when I build my code and the error is as below. [image: image.png] > >> >> 2. >> >> CFLAGS = -O3 >> FFLAGS = >> CPPFLAGS = >> FPPFLAGS = >> COPTFLAGS = -march=native >> >> >> app : a1.o a2.o a3.o a4.o >> $(LINK.C) -o $@ $^ $(LDLIBS) >> >> >> include ${PETSC_DIR}/lib/petsc/conf/rules >> include ${PETSC_DIR}/lib/petsc/conf/test >> >> Upper makefile is just for build my own code not for src of petsc. >> In this case, also has problem as your comments? >> > > I do not understand your question. > My questions is : When building my own code without reconfigure of PETSc, I wonder if there will be performance improvement by building with the high optimization level of compiler even if the makefile is configured as above. > > >> >> 3. I have no such processors. Where can I find benchmark information >> about STREAMS? >> I mean I don't have such various processors. So I just wondering the >> results of STREAMS each processors (already existing results of various >> processors). >> Can anyone have results of benchmark of various desktop systems? >> > > There are catalogues of STREAMS results online. > Sorry, I can't find proper catalogues. What's the keyword for googling or could you please give a link of catalogues? Thanks, Hyung Kim > > Thanks, > > Matt > > >> Thanks, >> Hyung Kim >> >> >> >> >> >> 2023? 1? 3? (?) ?? 2:19, ??? ?? ??: >> >>> After reconfigure by >>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>> --with-debugging=0 --PETSC_ARCH=arch-main-opt , >>> I tried to build my code. 
>>> >>> But I got an error as below >>> [image: image.png] >>> How can I fix this? >>> >>> >>> >>> >>> 2023? 1? 3? (?) ?? 4:05, Barry Smith ?? ??: >>> >>>> >>>> >>>> On Jan 2, 2023, at 9:27 AM, ??? wrote: >>>> >>>> There are more questions. >>>> >>>> 1. Following your comments, but there is an error as below. >>>> >>>> How can I fix this? >>>> >>>> >>>> The previous value of PETSC_ARCH was not arch-main-debug do ls >>>> $PETSC_DIR and locate the directory whose name begins with arch- that will >>>> tell you the previous value used for PETSC_ARCH. >>>> >>>> >>>> 2. After changing the optimized build, then how can I set the debug >>>> mode again? >>>> >>>> >>>> export PETSC_ARCH=arch- whatever the previous value was and then >>>> recompile your code (note you do not need to recompile PETSc just your >>>> executable). >>>> >>>> >>>> 3.Following your comments, the new makefile is as below. Is it right? >>>> CFLAGS = -O3 >>>> FFLAGS = >>>> CPPFLAGS = >>>> FPPFLAGS = >>>> COPTFLAGS = -march=native >>>> >>>> >>>> app : a1.o a2.o a3.o a4.o >>>> $(LINK.C) -o $@ $^ $(LDLIBS) >>>> >>>> >>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>> >>>> >>>> Best not to set these values in the makefile at all because they will >>>> affect all compilers. Just set them with ./configure CCOPTFLAGS="-O3 >>>> -march=native" >>>> >>>> >>>> >>>> 4. I have no such processors. Where can I find benchmark information >>>> about STREAMS? >>>> >>>> >>>> do make mpistreams in PETSC_DIR >>>> >>>> >>>> >>>> >>>> Thanks, >>>> Hyung Kim >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> 2023? 1? 2? (?) ?? 11:03, Matthew Knepley ?? ??: >>>> >>>>> On Mon, Jan 2, 2023 at 4:16 AM ??? wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> Happy new year!! >>>>>> >>>>>> >>>>>> I have some questions about ?Hint for performance tuning? in user >>>>>> guide of petsc. >>>>>> >>>>>> >>>>>> 1. In the ?Performance Pitfalls and Advice? section, there are 2 >>>>>> modes ?debug? and ?optimized builds. My current setup is debug mode. So I >>>>>> want to change for test the performance the optimized build mode. However, >>>>>> if I configure again, does the existing debug mode disappear? Is there any >>>>>> way to coexist the 2 modes and use them in the run the application? >>>>>> >>>>> Suppose your current arch is named "arch-main-debug". Then you can >>>>> make an optimized version using >>>>> >>>>> cd $PETSC_DIR >>>>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>>>> --with-debugging=0 --PETSC_ARCH=arch-main-opt >>>>> >>>>> >>>>>> 2. In the guide, there are some paragraphs about optimization >>>>>> level of compiler. To control the optimization level of compiler, I put the >>>>>> ?-O3? as below. Is this right?? >>>>>> CFLAGS = -O3 >>>>>> FFLAGS = >>>>>> CPPFLAGS = >>>>>> FPPFLAGS = >>>>>> >>>>>> >>>>>> app : a1.o a2.o a3.o a4.o >>>>>> $(LINK.C) -o $@ $^ $(LDLIBS) >>>>>> >>>>>> >>>>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>>>> >>>>> >>>>> You could dp this, but that only changes it for that directory. It is >>>>> best to do it by reconfiguring. >>>>> >>>>> >>>>>> 3. In the guide, user should put ?-march=native? for using AVX2 >>>>>> or AVX-512. Where should I put the ?-march=native? for using AVX? >>>>>> >>>>> You can add --COPTFLAGS="" with any flags you want to the >>>>> configure. >>>>> >>>>> 4. After read the ?Hint for performance tuning? 
I understood that >>>>>> for good performance and scalability user should use the multiple node and >>>>>> multiple socket . However, before composing cluster system, many users just >>>>>> can use desktop system. >>>>>> In that case, between intel 13th i9 and amd ryzen 7950x, can the >>>>>> 7950x, which has an arichitecture similar to the server processor, be more >>>>>> suitable for petsc? (Because the architecture of intel desktop cpu is >>>>>> big.little.) >>>>>> >>>>> >>>>> A good guide is to run the STREAMS benchmark on the processor. PETSc >>>>> performance closely tracks that. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Hyung Kim >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 320891 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 75744 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 192554 bytes Desc: not available URL: From knepley at gmail.com Tue Jan 3 06:45:56 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 3 Jan 2023 07:45:56 -0500 Subject: [petsc-users] Question - about the 'Hint for performance tuning' In-Reply-To: References: Message-ID: On Tue, Jan 3, 2023 at 7:22 AM ??? wrote: > > > 2023? 1? 3? (?) ?? 8:55, Matthew Knepley ?? ??: > >> On Tue, Jan 3, 2023 at 12:48 AM ??? wrote: >> >>> I got more questions so I resend this email with more questions. >>> >>> 1.After reconfigure by >>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>> --with-debugging=0 --PETSC_ARCH=arch-main-opt , >>> I tried to build my code. >>> >>> But I got an error as below >>> [image: image.png] >>> How can I fix this? >>> >> >> You reconfigured with PETSC_ARCH=arch-main-opt, but your build is using >> "arch-linux-c-opt". Why are you changing it? >> > > At first Matthew told me use > './arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py > --with-debugging=0 --PETSC_ARCH=arch-main-opt' for build optimized. > However, I got an error the error is as below. > No, I did not say that. Here is the quote: Suppose your current arch is named "arch-main-debug". But your arch was not arch-imain-debug. Instead it was arch-linux-c-debug, which you used. [image: image.png] > So Barry told me : "The previous value of PETSC_ARCH was not > arch-main-debug do ls $PETSC_DIR and locate the directory whose name begins > with arch- that will tell you the previous value used for PETSC_ARCH." > So I used > "./arch-linux-c-debug/lib/petsc/conf/reconfigure-arch-linux-c-debug.py > --with-debugging=0 --PETSC_ARCH=arch-linux-c-opt" > That command worked. However I got an error when I build my code and the > error is as below. 
> Your build is saying that the library does not exist. After configuration, did you build the library? cd $PETSC_DIR PETSC_ARCH=arch-linux-c-opt make all > [image: image.png] > > >> >>> >>> 2. >>> >>> CFLAGS = -O3 >>> FFLAGS = >>> CPPFLAGS = >>> FPPFLAGS = >>> COPTFLAGS = -march=native >>> >>> >>> app : a1.o a2.o a3.o a4.o >>> $(LINK.C) -o $@ $^ $(LDLIBS) >>> >>> >>> include ${PETSC_DIR}/lib/petsc/conf/rules >>> include ${PETSC_DIR}/lib/petsc/conf/test >>> >>> Upper makefile is just for build my own code not for src of petsc. >>> In this case, also has problem as your comments? >>> >> >> I do not understand your question. >> > > My questions is : > When building my own code without reconfigure of PETSc, I wonder if there > will be performance improvement by building with the high optimization > level of compiler even if the makefile is configured as above. > If significant time is spent in your code rather than PETSc, it is possible. > >> >>> >>> 3. I have no such processors. Where can I find benchmark information >>> about STREAMS? >>> I mean I don't have such various processors. So I just wondering the >>> results of STREAMS each processors (already existing results of various >>> processors). >>> Can anyone have results of benchmark of various desktop systems? >>> >> >> There are catalogues of STREAMS results online. >> > > Sorry, I can't find proper catalogues. What's the keyword for googling or > could you please give a link of catalogues? > https://www.cs.virginia.edu/stream/ Thanks, Matt > Thanks, > Hyung Kim > > > >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Hyung Kim >>> >>> >>> >>> >>> >>> 2023? 1? 3? (?) ?? 2:19, ??? ?? ??: >>> >>>> After reconfigure by >>>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>>> --with-debugging=0 --PETSC_ARCH=arch-main-opt , >>>> I tried to build my code. >>>> >>>> But I got an error as below >>>> [image: image.png] >>>> How can I fix this? >>>> >>>> >>>> >>>> >>>> 2023? 1? 3? (?) ?? 4:05, Barry Smith ?? ??: >>>> >>>>> >>>>> >>>>> On Jan 2, 2023, at 9:27 AM, ??? wrote: >>>>> >>>>> There are more questions. >>>>> >>>>> 1. Following your comments, but there is an error as below. >>>>> >>>>> How can I fix this? >>>>> >>>>> >>>>> The previous value of PETSC_ARCH was not arch-main-debug do ls >>>>> $PETSC_DIR and locate the directory whose name begins with arch- that will >>>>> tell you the previous value used for PETSC_ARCH. >>>>> >>>>> >>>>> 2. After changing the optimized build, then how can I set the debug >>>>> mode again? >>>>> >>>>> >>>>> export PETSC_ARCH=arch- whatever the previous value was and then >>>>> recompile your code (note you do not need to recompile PETSc just your >>>>> executable). >>>>> >>>>> >>>>> 3.Following your comments, the new makefile is as below. Is it right? >>>>> CFLAGS = -O3 >>>>> FFLAGS = >>>>> CPPFLAGS = >>>>> FPPFLAGS = >>>>> COPTFLAGS = -march=native >>>>> >>>>> >>>>> app : a1.o a2.o a3.o a4.o >>>>> $(LINK.C) -o $@ $^ $(LDLIBS) >>>>> >>>>> >>>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>>> >>>>> >>>>> Best not to set these values in the makefile at all because they will >>>>> affect all compilers. Just set them with ./configure CCOPTFLAGS="-O3 >>>>> -march=native" >>>>> >>>>> >>>>> >>>>> 4. I have no such processors. Where can I find benchmark information >>>>> about STREAMS? >>>>> >>>>> >>>>> do make mpistreams in PETSC_DIR >>>>> >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> Hyung Kim >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> 2023? 1? 2? (?) 
?? 11:03, Matthew Knepley ?? ??: >>>>> >>>>>> On Mon, Jan 2, 2023 at 4:16 AM ??? wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> Happy new year!! >>>>>>> >>>>>>> >>>>>>> I have some questions about ?Hint for performance tuning? in user >>>>>>> guide of petsc. >>>>>>> >>>>>>> >>>>>>> 1. In the ?Performance Pitfalls and Advice? section, there are >>>>>>> 2 modes ?debug? and ?optimized builds. My current setup is debug mode. So I >>>>>>> want to change for test the performance the optimized build mode. However, >>>>>>> if I configure again, does the existing debug mode disappear? Is there any >>>>>>> way to coexist the 2 modes and use them in the run the application? >>>>>>> >>>>>> Suppose your current arch is named "arch-main-debug". Then you can >>>>>> make an optimized version using >>>>>> >>>>>> cd $PETSC_DIR >>>>>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>>>>> --with-debugging=0 --PETSC_ARCH=arch-main-opt >>>>>> >>>>>> >>>>>>> 2. In the guide, there are some paragraphs about optimization >>>>>>> level of compiler. To control the optimization level of compiler, I put the >>>>>>> ?-O3? as below. Is this right?? >>>>>>> CFLAGS = -O3 >>>>>>> FFLAGS = >>>>>>> CPPFLAGS = >>>>>>> FPPFLAGS = >>>>>>> >>>>>>> >>>>>>> app : a1.o a2.o a3.o a4.o >>>>>>> $(LINK.C) -o $@ $^ $(LDLIBS) >>>>>>> >>>>>>> >>>>>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>>>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>>>>> >>>>>> >>>>>> You could dp this, but that only changes it for that directory. It is >>>>>> best to do it by reconfiguring. >>>>>> >>>>>> >>>>>>> 3. In the guide, user should put ?-march=native? for using AVX2 >>>>>>> or AVX-512. Where should I put the ?-march=native? for using AVX? >>>>>>> >>>>>> You can add --COPTFLAGS="" with any flags you want to the >>>>>> configure. >>>>>> >>>>>> 4. After read the ?Hint for performance tuning? I understood >>>>>>> that for good performance and scalability user should use the multiple node >>>>>>> and multiple socket . However, before composing cluster system, many users >>>>>>> just can use desktop system. >>>>>>> In that case, between intel 13th i9 and amd ryzen 7950x, can the >>>>>>> 7950x, which has an arichitecture similar to the server processor, be more >>>>>>> suitable for petsc? (Because the architecture of intel desktop cpu is >>>>>>> big.little.) >>>>>>> >>>>>> >>>>>> A good guide is to run the STREAMS benchmark on the processor. PETSc >>>>>> performance closely tracks that. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Hyung Kim >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 320891 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 75744 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 192554 bytes Desc: not available URL: From ksi2443 at gmail.com Tue Jan 3 09:09:03 2023 From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=) Date: Wed, 4 Jan 2023 00:09:03 +0900 Subject: [petsc-users] Question - about the 'Hint for performance tuning' In-Reply-To: References: Message-ID: 2023? 1? 3? (?) ?? 9:46, Matthew Knepley ?? ??: > On Tue, Jan 3, 2023 at 7:22 AM ??? wrote: > >> >> >> 2023? 1? 3? (?) ?? 8:55, Matthew Knepley ?? ??: >> >>> On Tue, Jan 3, 2023 at 12:48 AM ??? wrote: >>> >>>> I got more questions so I resend this email with more questions. >>>> >>>> 1.After reconfigure by >>>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>>> --with-debugging=0 --PETSC_ARCH=arch-main-opt , >>>> I tried to build my code. >>>> >>>> But I got an error as below >>>> [image: image.png] >>>> How can I fix this? >>>> >>> >>> You reconfigured with PETSC_ARCH=arch-main-opt, but your build is using >>> "arch-linux-c-opt". Why are you changing it? >>> >> >> At first Matthew told me use >> './arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >> --with-debugging=0 --PETSC_ARCH=arch-main-opt' for build optimized. >> However, I got an error the error is as below. >> > > No, I did not say that. Here is the quote: > > Suppose your current arch is named "arch-main-debug". > > But your arch was not arch-imain-debug. Instead it was arch-linux-c-debug, > which you used. > > [image: image.png] >> So Barry told me : "The previous value of PETSC_ARCH was not >> arch-main-debug do ls $PETSC_DIR and locate the directory whose name begins >> with arch- that will tell you the previous value used for PETSC_ARCH." >> So I used >> "./arch-linux-c-debug/lib/petsc/conf/reconfigure-arch-linux-c-debug.py >> --with-debugging=0 --PETSC_ARCH=arch-linux-c-opt" >> That command worked. However I got an error when I build my code and the >> error is as below. >> > > Your build is saying that the library does not exist. After configuration, > did you build the library? > > cd $PETSC_DIR > PETSC_ARCH=arch-linux-c-opt make all > You are Right I didn't do build. So following your comments, I did build. But, there are some errors when build my own code. It didn't happen when in debug mode, but the following error occurred. [image: image.png] What could be the reason and how to fix it? > >> [image: image.png] >> >> >>> >>>> >>>> 2. >>>> >>>> CFLAGS = -O3 >>>> FFLAGS = >>>> CPPFLAGS = >>>> FPPFLAGS = >>>> COPTFLAGS = -march=native >>>> >>>> >>>> app : a1.o a2.o a3.o a4.o >>>> $(LINK.C) -o $@ $^ $(LDLIBS) >>>> >>>> >>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>> >>>> Upper makefile is just for build my own code not for src of petsc. >>>> In this case, also has problem as your comments? >>>> >>> >>> I do not understand your question. >>> >> >> My questions is : >> When building my own code without reconfigure of PETSc, I wonder if there >> will be performance improvement by building with the high optimization >> level of compiler even if the makefile is configured as above. >> > > If significant time is spent in your code rather than PETSc, it is > possible. > > >> >>> >>>> >>>> 3. I have no such processors. 
Where can I find benchmark information >>>> about STREAMS? >>>> I mean I don't have such various processors. So I just wondering the >>>> results of STREAMS each processors (already existing results of various >>>> processors). >>>> Can anyone have results of benchmark of various desktop systems? >>>> >>> >>> There are catalogues of STREAMS results online. >>> >> >> Sorry, I can't find proper catalogues. What's the keyword for googling or >> could you please give a link of catalogues? >> > > https://www.cs.virginia.edu/stream/ > > Thanks, > > Matt > > >> Thanks, >> Hyung Kim >> >> >> >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Hyung Kim >>>> >>>> >>>> >>>> >>>> >>>> 2023? 1? 3? (?) ?? 2:19, ??? ?? ??: >>>> >>>>> After reconfigure by >>>>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>>>> --with-debugging=0 --PETSC_ARCH=arch-main-opt , >>>>> I tried to build my code. >>>>> >>>>> But I got an error as below >>>>> [image: image.png] >>>>> How can I fix this? >>>>> >>>>> >>>>> >>>>> >>>>> 2023? 1? 3? (?) ?? 4:05, Barry Smith ?? ??: >>>>> >>>>>> >>>>>> >>>>>> On Jan 2, 2023, at 9:27 AM, ??? wrote: >>>>>> >>>>>> There are more questions. >>>>>> >>>>>> 1. Following your comments, but there is an error as below. >>>>>> >>>>>> How can I fix this? >>>>>> >>>>>> >>>>>> The previous value of PETSC_ARCH was not arch-main-debug do ls >>>>>> $PETSC_DIR and locate the directory whose name begins with arch- that will >>>>>> tell you the previous value used for PETSC_ARCH. >>>>>> >>>>>> >>>>>> 2. After changing the optimized build, then how can I set the debug >>>>>> mode again? >>>>>> >>>>>> >>>>>> export PETSC_ARCH=arch- whatever the previous value was and then >>>>>> recompile your code (note you do not need to recompile PETSc just your >>>>>> executable). >>>>>> >>>>>> >>>>>> 3.Following your comments, the new makefile is as below. Is it right? >>>>>> CFLAGS = -O3 >>>>>> FFLAGS = >>>>>> CPPFLAGS = >>>>>> FPPFLAGS = >>>>>> COPTFLAGS = -march=native >>>>>> >>>>>> >>>>>> app : a1.o a2.o a3.o a4.o >>>>>> $(LINK.C) -o $@ $^ $(LDLIBS) >>>>>> >>>>>> >>>>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>>>> >>>>>> >>>>>> Best not to set these values in the makefile at all because they will >>>>>> affect all compilers. Just set them with ./configure CCOPTFLAGS="-O3 >>>>>> -march=native" >>>>>> >>>>>> >>>>>> >>>>>> 4. I have no such processors. Where can I find benchmark information >>>>>> about STREAMS? >>>>>> >>>>>> >>>>>> do make mpistreams in PETSC_DIR >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Hyung Kim >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> 2023? 1? 2? (?) ?? 11:03, Matthew Knepley ?? ??: >>>>>> >>>>>>> On Mon, Jan 2, 2023 at 4:16 AM ??? wrote: >>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> Happy new year!! >>>>>>>> >>>>>>>> >>>>>>>> I have some questions about ?Hint for performance tuning? in user >>>>>>>> guide of petsc. >>>>>>>> >>>>>>>> >>>>>>>> 1. In the ?Performance Pitfalls and Advice? section, there are >>>>>>>> 2 modes ?debug? and ?optimized builds. My current setup is debug mode. So I >>>>>>>> want to change for test the performance the optimized build mode. However, >>>>>>>> if I configure again, does the existing debug mode disappear? Is there any >>>>>>>> way to coexist the 2 modes and use them in the run the application? >>>>>>>> >>>>>>> Suppose your current arch is named "arch-main-debug". 
Then you can >>>>>>> make an optimized version using >>>>>>> >>>>>>> cd $PETSC_DIR >>>>>>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>>>>>> --with-debugging=0 --PETSC_ARCH=arch-main-opt >>>>>>> >>>>>>> >>>>>>>> 2. In the guide, there are some paragraphs about optimization >>>>>>>> level of compiler. To control the optimization level of compiler, I put the >>>>>>>> ?-O3? as below. Is this right?? >>>>>>>> CFLAGS = -O3 >>>>>>>> FFLAGS = >>>>>>>> CPPFLAGS = >>>>>>>> FPPFLAGS = >>>>>>>> >>>>>>>> >>>>>>>> app : a1.o a2.o a3.o a4.o >>>>>>>> $(LINK.C) -o $@ $^ $(LDLIBS) >>>>>>>> >>>>>>>> >>>>>>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>>>>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>>>>>> >>>>>>> >>>>>>> You could dp this, but that only changes it for that directory. It >>>>>>> is best to do it by reconfiguring. >>>>>>> >>>>>>> >>>>>>>> 3. In the guide, user should put ?-march=native? for using >>>>>>>> AVX2 or AVX-512. Where should I put the ?-march=native? for using AVX? >>>>>>>> >>>>>>> You can add --COPTFLAGS="" with any flags you want to >>>>>>> the configure. >>>>>>> >>>>>>> 4. After read the ?Hint for performance tuning? I understood >>>>>>>> that for good performance and scalability user should use the multiple node >>>>>>>> and multiple socket . However, before composing cluster system, many users >>>>>>>> just can use desktop system. >>>>>>>> In that case, between intel 13th i9 and amd ryzen 7950x, can the >>>>>>>> 7950x, which has an arichitecture similar to the server processor, be more >>>>>>>> suitable for petsc? (Because the architecture of intel desktop cpu is >>>>>>>> big.little.) >>>>>>>> >>>>>>> >>>>>>> A good guide is to run the STREAMS benchmark on the processor. PETSc >>>>>>> performance closely tracks that. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Hyung Kim >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 320891 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 75744 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 192554 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 42231 bytes Desc: not available URL: From knepley at gmail.com Tue Jan 3 09:18:33 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 3 Jan 2023 10:18:33 -0500 Subject: [petsc-users] Question - about the 'Hint for performance tuning' In-Reply-To: References: Message-ID: On Tue, Jan 3, 2023 at 10:09 AM ??? wrote: > 2023? 1? 3? (?) ?? 9:46, Matthew Knepley ?? ??: > >> On Tue, Jan 3, 2023 at 7:22 AM ??? wrote: >> >>> >>> >>> 2023? 1? 3? (?) ?? 8:55, Matthew Knepley ?? ??: >>> >>>> On Tue, Jan 3, 2023 at 12:48 AM ??? wrote: >>>> >>>>> I got more questions so I resend this email with more questions. >>>>> >>>>> 1.After reconfigure by >>>>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>>>> --with-debugging=0 --PETSC_ARCH=arch-main-opt , >>>>> I tried to build my code. >>>>> >>>>> But I got an error as below >>>>> [image: image.png] >>>>> How can I fix this? >>>>> >>>> >>>> You reconfigured with PETSC_ARCH=arch-main-opt, but your build is using >>>> "arch-linux-c-opt". Why are you changing it? >>>> >>> >>> At first Matthew told me use >>> './arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>> --with-debugging=0 --PETSC_ARCH=arch-main-opt' for build optimized. >>> However, I got an error the error is as below. >>> >> >> No, I did not say that. Here is the quote: >> >> Suppose your current arch is named "arch-main-debug". >> >> But your arch was not arch-imain-debug. Instead it was >> arch-linux-c-debug, which you used. >> >> [image: image.png] >>> So Barry told me : "The previous value of PETSC_ARCH was not >>> arch-main-debug do ls $PETSC_DIR and locate the directory whose name begins >>> with arch- that will tell you the previous value used for PETSC_ARCH." >>> So I used >>> "./arch-linux-c-debug/lib/petsc/conf/reconfigure-arch-linux-c-debug.py >>> --with-debugging=0 --PETSC_ARCH=arch-linux-c-opt" >>> That command worked. However I got an error when I build my code and the >>> error is as below. >>> >> >> Your build is saying that the library does not exist. After >> configuration, did you build the library? >> >> cd $PETSC_DIR >> PETSC_ARCH=arch-linux-c-opt make all >> > > You are Right I didn't do build. So following your comments, I did build. > But, there are some errors when build my own code. > It didn't happen when in debug mode, but the following error occurred. > [image: image.png] > What could be the reason and how to fix it? > It sounds like you may have had an old *.o file which you compiled with the debug architecture. Remove all those and rebuild your project. Thanks, Matt > >> >>> [image: image.png] >>> >>> >>>> >>>>> >>>>> 2. >>>>> >>>>> CFLAGS = -O3 >>>>> FFLAGS = >>>>> CPPFLAGS = >>>>> FPPFLAGS = >>>>> COPTFLAGS = -march=native >>>>> >>>>> >>>>> app : a1.o a2.o a3.o a4.o >>>>> $(LINK.C) -o $@ $^ $(LDLIBS) >>>>> >>>>> >>>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>>> >>>>> Upper makefile is just for build my own code not for src of petsc. >>>>> In this case, also has problem as your comments? >>>>> >>>> >>>> I do not understand your question. >>>> >>> >>> My questions is : >>> When building my own code without reconfigure of PETSc, I wonder if >>> there will be performance improvement by building with the high >>> optimization level of compiler even if the makefile is configured as above. >>> >> >> If significant time is spent in your code rather than PETSc, it is >> possible. >> >> >>> >>>> >>>>> >>>>> 3. I have no such processors. 
Where can I find benchmark information >>>>> about STREAMS? >>>>> I mean I don't have such various processors. So I just wondering the >>>>> results of STREAMS each processors (already existing results of various >>>>> processors). >>>>> Can anyone have results of benchmark of various desktop systems? >>>>> >>>> >>>> There are catalogues of STREAMS results online. >>>> >>> >>> Sorry, I can't find proper catalogues. What's the keyword for googling >>> or could you please give a link of catalogues? >>> >> >> https://www.cs.virginia.edu/stream/ >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Hyung Kim >>> >>> >>> >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>> Hyung Kim >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> 2023? 1? 3? (?) ?? 2:19, ??? ?? ??: >>>>> >>>>>> After reconfigure by >>>>>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>>>>> --with-debugging=0 --PETSC_ARCH=arch-main-opt , >>>>>> I tried to build my code. >>>>>> >>>>>> But I got an error as below >>>>>> [image: image.png] >>>>>> How can I fix this? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> 2023? 1? 3? (?) ?? 4:05, Barry Smith ?? ??: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Jan 2, 2023, at 9:27 AM, ??? wrote: >>>>>>> >>>>>>> There are more questions. >>>>>>> >>>>>>> 1. Following your comments, but there is an error as below. >>>>>>> >>>>>>> How can I fix this? >>>>>>> >>>>>>> >>>>>>> The previous value of PETSC_ARCH was not arch-main-debug do ls >>>>>>> $PETSC_DIR and locate the directory whose name begins with arch- that will >>>>>>> tell you the previous value used for PETSC_ARCH. >>>>>>> >>>>>>> >>>>>>> 2. After changing the optimized build, then how can I set the debug >>>>>>> mode again? >>>>>>> >>>>>>> >>>>>>> export PETSC_ARCH=arch- whatever the previous value was and then >>>>>>> recompile your code (note you do not need to recompile PETSc just your >>>>>>> executable). >>>>>>> >>>>>>> >>>>>>> 3.Following your comments, the new makefile is as below. Is it right? >>>>>>> CFLAGS = -O3 >>>>>>> FFLAGS = >>>>>>> CPPFLAGS = >>>>>>> FPPFLAGS = >>>>>>> COPTFLAGS = -march=native >>>>>>> >>>>>>> >>>>>>> app : a1.o a2.o a3.o a4.o >>>>>>> $(LINK.C) -o $@ $^ $(LDLIBS) >>>>>>> >>>>>>> >>>>>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>>>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>>>>> >>>>>>> >>>>>>> Best not to set these values in the makefile at all because they >>>>>>> will affect all compilers. Just set them with ./configure CCOPTFLAGS="-O3 >>>>>>> -march=native" >>>>>>> >>>>>>> >>>>>>> >>>>>>> 4. I have no such processors. Where can I find benchmark information >>>>>>> about STREAMS? >>>>>>> >>>>>>> >>>>>>> do make mpistreams in PETSC_DIR >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Hyung Kim >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2023? 1? 2? (?) ?? 11:03, Matthew Knepley ?? ??: >>>>>>> >>>>>>>> On Mon, Jan 2, 2023 at 4:16 AM ??? wrote: >>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> Happy new year!! >>>>>>>>> >>>>>>>>> >>>>>>>>> I have some questions about ?Hint for performance tuning? in user >>>>>>>>> guide of petsc. >>>>>>>>> >>>>>>>>> >>>>>>>>> 1. In the ?Performance Pitfalls and Advice? section, there >>>>>>>>> are 2 modes ?debug? and ?optimized builds. My current setup is debug mode. >>>>>>>>> So I want to change for test the performance the optimized build mode. >>>>>>>>> However, if I configure again, does the existing debug mode disappear? Is >>>>>>>>> there any way to coexist the 2 modes and use them in the run the >>>>>>>>> application? 
>>>>>>>>> >>>>>>>> Suppose your current arch is named "arch-main-debug". Then you can >>>>>>>> make an optimized version using >>>>>>>> >>>>>>>> cd $PETSC_DIR >>>>>>>> ./arch-main-debug/lib/petsc/conf/reconfigure-arch-main-debug.py >>>>>>>> --with-debugging=0 --PETSC_ARCH=arch-main-opt >>>>>>>> >>>>>>>> >>>>>>>>> 2. In the guide, there are some paragraphs about optimization >>>>>>>>> level of compiler. To control the optimization level of compiler, I put the >>>>>>>>> ?-O3? as below. Is this right?? >>>>>>>>> CFLAGS = -O3 >>>>>>>>> FFLAGS = >>>>>>>>> CPPFLAGS = >>>>>>>>> FPPFLAGS = >>>>>>>>> >>>>>>>>> >>>>>>>>> app : a1.o a2.o a3.o a4.o >>>>>>>>> $(LINK.C) -o $@ $^ $(LDLIBS) >>>>>>>>> >>>>>>>>> >>>>>>>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>>>>>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>>>>>>> >>>>>>>> >>>>>>>> You could dp this, but that only changes it for that directory. It >>>>>>>> is best to do it by reconfiguring. >>>>>>>> >>>>>>>> >>>>>>>>> 3. In the guide, user should put ?-march=native? for using >>>>>>>>> AVX2 or AVX-512. Where should I put the ?-march=native? for using AVX? >>>>>>>>> >>>>>>>> You can add --COPTFLAGS="" with any flags you want to >>>>>>>> the configure. >>>>>>>> >>>>>>>> 4. After read the ?Hint for performance tuning? I understood >>>>>>>>> that for good performance and scalability user should use the multiple node >>>>>>>>> and multiple socket . However, before composing cluster system, many users >>>>>>>>> just can use desktop system. >>>>>>>>> In that case, between intel 13th i9 and amd ryzen 7950x, can the >>>>>>>>> 7950x, which has an arichitecture similar to the server processor, be more >>>>>>>>> suitable for petsc? (Because the architecture of intel desktop cpu is >>>>>>>>> big.little.) >>>>>>>>> >>>>>>>> >>>>>>>> A good guide is to run the STREAMS benchmark on the processor. >>>>>>>> PETSc performance closely tracks that. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Hyung Kim >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 320891 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 75744 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 192554 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 42231 bytes Desc: not available URL: From venugovh at mail.uc.edu Tue Jan 3 15:58:16 2023 From: venugovh at mail.uc.edu (Venugopal, Vysakh (venugovh)) Date: Tue, 3 Jan 2023 21:58:16 +0000 Subject: [petsc-users] Sequential to Parallel vector using VecScatter Message-ID: Hi, Suppose I have a vector 'V' of global size m divided into 2 processes (making local size m/2). This vector V is derived from a DM object using DMCreateGlobalVector. I am using VecScatterCreateToAll to get a vector V_SEQ. Is there a way to distribute the V_SEQ to V (where each V has a local size of m/2)? I am happy to explain if my question is not clear. Thank you! --- Vysakh Venugopal Ph.D. Candidate Department of Mechanical Engineering University of Cincinnati, Cincinnati, OH 45221-0072 -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Jan 3 17:24:03 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 3 Jan 2023 17:24:03 -0600 Subject: [petsc-users] Sequential to Parallel vector using VecScatter In-Reply-To: References: Message-ID: On Tue, Jan 3, 2023 at 4:01 PM Venugopal, Vysakh (venugovh) via petsc-users wrote: > Hi, > > > > Suppose I have a vector ?V? of global size m divided into 2 processes > (making local size m/2). This vector V is derived from a DM object using > DMCreateGlobalVector. > > > > I am using VecScatterCreateToAll to get a vector V_SEQ. > V_SEQ on each process has a size m > > > Is there a way to distribute the V_SEQ to V (where each V has a local size > of m/2)? > I assume you want to sum V_SEQ into V, since each entry of V will receive two contributions from the two V_SEQ. If that's the case, you could use VecScatterBegin/End(toall, V_SEQ, V, ADD_VALUES, SCATTER_REVERSE), where 'toall' is the VecScatter you created with VecScatterCreateToAll. > > > I am happy to explain if my question is not clear. Thank you! > > > > --- > > Vysakh Venugopal > > Ph.D. Candidate > > Department of Mechanical Engineering > > University of Cincinnati, Cincinnati, OH 45221-0072 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From venugovh at mail.uc.edu Wed Jan 4 09:47:41 2023 From: venugovh at mail.uc.edu (Venugopal, Vysakh (venugovh)) Date: Wed, 4 Jan 2023 15:47:41 +0000 Subject: [petsc-users] Getting global indices of vector distributed among different processes. Message-ID: Hello, Is there a way to get the global indices from a vector created from DMCreateGlobalVector? Example: If global vector V (of size 10) has indices {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and they are divided into 2 processes. Is there a way to get information such as (process 1: {0,1,2,3,4}, process 2: {5,6,7,8,9})? The reason I need this information is that I need to query the values of a different vector Q of size 10 and place those values in V. Example: Q(1) --- V(1) @ process 1, Q(7) - V(7) @ process 2, etc.. If there are smarter ways to do this, I am happy to pursue that. Thank you, Vysakh V. --- Vysakh Venugopal Ph.D. Candidate Department of Mechanical Engineering University of Cincinnati, Cincinnati, OH 45221-0072 -------------- next part -------------- An HTML attachment was scrubbed... 
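A minimal sketch of the reverse-scatter idea from the VecScatterCreateToAll exchange above (an illustration, not code from the thread; the function name GatherAndSum and the VecZeroEntries call before the reverse scatter are assumptions added here, and V is assumed to come from DMCreateGlobalVector as in the question):

    #include <petscdm.h>
    #include <petscvec.h>

    PetscErrorCode GatherAndSum(DM dm)
    {
      Vec        V, V_SEQ;
      VecScatter toall;

      PetscFunctionBeginUser;
      PetscCall(DMCreateGlobalVector(dm, &V));
      PetscCall(VecScatterCreateToAll(V, &toall, &V_SEQ));
      /* forward: every rank receives a full-length sequential copy of V */
      PetscCall(VecScatterBegin(toall, V, V_SEQ, INSERT_VALUES, SCATTER_FORWARD));
      PetscCall(VecScatterEnd(toall, V, V_SEQ, INSERT_VALUES, SCATTER_FORWARD));

      /* ... each rank modifies its copy of V_SEQ here ... */

      /* reverse: push V_SEQ back into the distributed V; with ADD_VALUES each
         entry of V sums the contributions from every rank's V_SEQ, so V is
         zeroed first */
      PetscCall(VecZeroEntries(V));
      PetscCall(VecScatterBegin(toall, V_SEQ, V, ADD_VALUES, SCATTER_REVERSE));
      PetscCall(VecScatterEnd(toall, V_SEQ, V, ADD_VALUES, SCATTER_REVERSE));

      PetscCall(VecScatterDestroy(&toall));
      PetscCall(VecDestroy(&V_SEQ));
      PetscCall(VecDestroy(&V));
      PetscFunctionReturn(0);
    }

If every rank holds an identical V_SEQ, the ADD_VALUES result is the sum over all ranks, so scale V afterwards (e.g. with VecScale) if a plain copy rather than a sum is what is wanted.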
URL: From knepley at gmail.com Wed Jan 4 09:52:04 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 4 Jan 2023 10:52:04 -0500 Subject: [petsc-users] Getting global indices of vector distributed among different processes. In-Reply-To: References: Message-ID: On Wed, Jan 4, 2023 at 10:48 AM Venugopal, Vysakh (venugovh) via petsc-users wrote: > Hello, > > > > Is there a way to get the global indices from a vector created from > DMCreateGlobalVector? Example: > > > > If global vector V (of size 10) has indices {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} > and they are divided into 2 processes. Is there a way to get information > such as (process 1: {0,1,2,3,4}, process 2: {5,6,7,8,9})? > https://petsc.org/main/docs/manualpages/Vec/VecGetOwnershipRange/ Thanks, Matt > The reason I need this information is that I need to query the values of a > different vector Q of size 10 and place those values in V. Example: Q(1) > --- V(1) @ process 1, Q(7) ? V(7) @ process 2, etc.. If there are smarter > ways to do this, I am happy to pursue that. > > > > Thank you, > > > > Vysakh V. > > > > --- > > Vysakh Venugopal > > Ph.D. Candidate > > Department of Mechanical Engineering > > University of Cincinnati, Cincinnati, OH 45221-0072 > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Wed Jan 4 11:26:50 2023 From: hongzhang at anl.gov (Zhang, Hong) Date: Wed, 4 Jan 2023 17:26:50 +0000 Subject: [petsc-users] puzzling arkimex logic In-Reply-To: References: Message-ID: <8B653214-6E1F-4A13-962D-78B39BBDFA70@anl.gov> Hi Mark, You might want to try -ts_adapt_time_step_increase_delay to delay increasing the time step after it has been decreased due to a failed solve. Hong (Mr.) > On Jan 2, 2023, at 12:17 PM, Mark Adams wrote: > > I am using arkimex and the logic with a failed KSP solve is puzzling. This step starts with a dt of ~.005, the linear solver fails and cuts the time step by 1/4. So far, so good. The step then works but the next time step the time step goes to ~0.006. > TS seems to have forgotten that it had to cut the time step back. > Perhaps that logic is missing or my parameters need work? 
> > Thanks, > Mark > > -ts_adapt_dt_max 0.01 # (source: command line) > -ts_adapt_monitor # (source: file) > -ts_arkimex_type 1bee # (source: file) > -ts_dt .001 # (source: command line) > -ts_max_reject 10 # (source: file) > -ts_max_snes_failures -1 # (source: file) > -ts_max_steps 8000 # (source: command line) > -ts_max_time 14 # (source: command line) > -ts_monitor # (source: file) > -ts_rtol 1e-6 # (source: command line) > -ts_type arkimex # (source: file) > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 > TSAdapt basic arkimex 0:1bee step 1 accepted t=0.001 + 2.497e-03 dt=5.404e-03 wlte=0.173 wltea= -1 wlter= > -1 > 2 TS dt 0.00540401 time 0.00349731 > 0 SNES Function norm 1.358886930084e-05 > Linear solve did not converge due to DIVERGED_ITS iterations 100 > Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0 > TSAdapt basic step 2 stage rejected (DIVERGED_LINEAR_SOLVE) t=0.00349731 + 5.404e-03 retrying with dt=1.351e-03 > 0 SNES Function norm 1.358886930084e-05 > Linear solve converged due to CONVERGED_RTOL iterations 19 > 1 SNES Function norm 4.412110425362e-10 > Linear solve converged due to CONVERGED_RTOL iterations 6 > 2 SNES Function norm 4.978968053066e-13 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 > 0 SNES Function norm 8.549322067920e-06 > Linear solve converged due to CONVERGED_RTOL iterations 14 > 1 SNES Function norm 8.357075378456e-11 > Linear solve converged due to CONVERGED_RTOL iterations 4 > 2 SNES Function norm 4.983138402512e-13 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 > 0 SNES Function norm 1.044832467924e-05 > Linear solve converged due to CONVERGED_RTOL iterations 13 > 1 SNES Function norm 1.036101875301e-10 > Linear solve converged due to CONVERGED_RTOL iterations 4 > 2 SNES Function norm 4.984888077288e-13 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 > TSAdapt basic arkimex 0:1bee step 2 accepted t=0.00349731 + 1.351e-03 dt=6.305e-03 wlte=0.0372 wltea= -1 wlter= > -1 > 3 TS dt 0.00630456 time 0.00484832 > 0 SNES Function norm 8.116559104264e-06 > Linear solve did not converge due to DIVERGED_ITS iterations 100 From jed at jedbrown.org Wed Jan 4 12:12:36 2023 From: jed at jedbrown.org (Jed Brown) Date: Wed, 04 Jan 2023 11:12:36 -0700 Subject: [petsc-users] puzzling arkimex logic In-Reply-To: <8B653214-6E1F-4A13-962D-78B39BBDFA70@anl.gov> References: <8B653214-6E1F-4A13-962D-78B39BBDFA70@anl.gov> Message-ID: <87lemicgob.fsf@jedbrown.org> This default probably shouldn't be zero, and probably lengthening steps should be more gentle after a recent failure. But Mark, please let us know if what's there works for you. "Zhang, Hong via petsc-users" writes: > Hi Mark, > > You might want to try -ts_adapt_time_step_increase_delay to delay increasing the time step after it has been decreased due to a failed solve. > > Hong (Mr.) > >> On Jan 2, 2023, at 12:17 PM, Mark Adams wrote: >> >> I am using arkimex and the logic with a failed KSP solve is puzzling. This step starts with a dt of ~.005, the linear solver fails and cuts the time step by 1/4. So far, so good. The step then works but the next time step the time step goes to ~0.006. >> TS seems to have forgotten that it had to cut the time step back. >> Perhaps that logic is missing or my parameters need work? 
>> >> Thanks, >> Mark >> >> -ts_adapt_dt_max 0.01 # (source: command line) >> -ts_adapt_monitor # (source: file) >> -ts_arkimex_type 1bee # (source: file) >> -ts_dt .001 # (source: command line) >> -ts_max_reject 10 # (source: file) >> -ts_max_snes_failures -1 # (source: file) >> -ts_max_steps 8000 # (source: command line) >> -ts_max_time 14 # (source: command line) >> -ts_monitor # (source: file) >> -ts_rtol 1e-6 # (source: command line) >> -ts_type arkimex # (source: file) >> >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 >> TSAdapt basic arkimex 0:1bee step 1 accepted t=0.001 + 2.497e-03 dt=5.404e-03 wlte=0.173 wltea= -1 wlter= >> -1 >> 2 TS dt 0.00540401 time 0.00349731 >> 0 SNES Function norm 1.358886930084e-05 >> Linear solve did not converge due to DIVERGED_ITS iterations 100 >> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0 >> TSAdapt basic step 2 stage rejected (DIVERGED_LINEAR_SOLVE) t=0.00349731 + 5.404e-03 retrying with dt=1.351e-03 >> 0 SNES Function norm 1.358886930084e-05 >> Linear solve converged due to CONVERGED_RTOL iterations 19 >> 1 SNES Function norm 4.412110425362e-10 >> Linear solve converged due to CONVERGED_RTOL iterations 6 >> 2 SNES Function norm 4.978968053066e-13 >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 >> 0 SNES Function norm 8.549322067920e-06 >> Linear solve converged due to CONVERGED_RTOL iterations 14 >> 1 SNES Function norm 8.357075378456e-11 >> Linear solve converged due to CONVERGED_RTOL iterations 4 >> 2 SNES Function norm 4.983138402512e-13 >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 >> 0 SNES Function norm 1.044832467924e-05 >> Linear solve converged due to CONVERGED_RTOL iterations 13 >> 1 SNES Function norm 1.036101875301e-10 >> Linear solve converged due to CONVERGED_RTOL iterations 4 >> 2 SNES Function norm 4.984888077288e-13 >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 >> TSAdapt basic arkimex 0:1bee step 2 accepted t=0.00349731 + 1.351e-03 dt=6.305e-03 wlte=0.0372 wltea= -1 wlter= >> -1 >> 3 TS dt 0.00630456 time 0.00484832 >> 0 SNES Function norm 8.116559104264e-06 >> Linear solve did not converge due to DIVERGED_ITS iterations 100 From mfadams at lbl.gov Wed Jan 4 13:07:41 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 4 Jan 2023 14:07:41 -0500 Subject: [petsc-users] puzzling arkimex logic In-Reply-To: <87lemicgob.fsf@jedbrown.org> References: <8B653214-6E1F-4A13-962D-78B39BBDFA70@anl.gov> <87lemicgob.fsf@jedbrown.org> Message-ID: Thanks, it is working fine. Mark On Wed, Jan 4, 2023 at 1:12 PM Jed Brown wrote: > This default probably shouldn't be zero, and probably lengthening steps > should be more gentle after a recent failure. But Mark, please let us know > if what's there works for you. > > "Zhang, Hong via petsc-users" writes: > > > Hi Mark, > > > > You might want to try -ts_adapt_time_step_increase_delay to delay > increasing the time step after it has been decreased due to a failed solve. > > > > Hong (Mr.) > > > >> On Jan 2, 2023, at 12:17 PM, Mark Adams wrote: > >> > >> I am using arkimex and the logic with a failed KSP solve is puzzling. > This step starts with a dt of ~.005, the linear solver fails and cuts the > time step by 1/4. So far, so good. The step then works but the next time > step the time step goes to ~0.006. > >> TS seems to have forgotten that it had to cut the time step back. > >> Perhaps that logic is missing or my parameters need work? 
> >> > >> Thanks, > >> Mark > >> > >> -ts_adapt_dt_max 0.01 # (source: command line) > >> -ts_adapt_monitor # (source: file) > >> -ts_arkimex_type 1bee # (source: file) > >> -ts_dt .001 # (source: command line) > >> -ts_max_reject 10 # (source: file) > >> -ts_max_snes_failures -1 # (source: file) > >> -ts_max_steps 8000 # (source: command line) > >> -ts_max_time 14 # (source: command line) > >> -ts_monitor # (source: file) > >> -ts_rtol 1e-6 # (source: command line) > >> -ts_type arkimex # (source: file) > >> > >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 > >> TSAdapt basic arkimex 0:1bee step 1 accepted t=0.001 + > 2.497e-03 dt=5.404e-03 wlte=0.173 wltea= -1 wlter= > >> -1 > >> 2 TS dt 0.00540401 time 0.00349731 > >> 0 SNES Function norm 1.358886930084e-05 > >> Linear solve did not converge due to DIVERGED_ITS iterations 100 > >> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE > iterations 0 > >> TSAdapt basic step 2 stage rejected (DIVERGED_LINEAR_SOLVE) > t=0.00349731 + 5.404e-03 retrying with dt=1.351e-03 > >> 0 SNES Function norm 1.358886930084e-05 > >> Linear solve converged due to CONVERGED_RTOL iterations 19 > >> 1 SNES Function norm 4.412110425362e-10 > >> Linear solve converged due to CONVERGED_RTOL iterations 6 > >> 2 SNES Function norm 4.978968053066e-13 > >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 > >> 0 SNES Function norm 8.549322067920e-06 > >> Linear solve converged due to CONVERGED_RTOL iterations 14 > >> 1 SNES Function norm 8.357075378456e-11 > >> Linear solve converged due to CONVERGED_RTOL iterations 4 > >> 2 SNES Function norm 4.983138402512e-13 > >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 > >> 0 SNES Function norm 1.044832467924e-05 > >> Linear solve converged due to CONVERGED_RTOL iterations 13 > >> 1 SNES Function norm 1.036101875301e-10 > >> Linear solve converged due to CONVERGED_RTOL iterations 4 > >> 2 SNES Function norm 4.984888077288e-13 > >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2 > >> TSAdapt basic arkimex 0:1bee step 2 accepted t=0.00349731 + > 1.351e-03 dt=6.305e-03 wlte=0.0372 wltea= -1 wlter= > >> -1 > >> 3 TS dt 0.00630456 time 0.00484832 > >> 0 SNES Function norm 8.116559104264e-06 > >> Linear solve did not converge due to DIVERGED_ITS iterations 100 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Jan 4 14:22:26 2023 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 4 Jan 2023 15:22:26 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse Message-ID: I have a sparse matrix constructed in non-petsc code using a standard CSR representation where I compute the Jacobian to be used in an implicit TS context. In the CPU world I call MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr, colidxptr, valptr, Jac); which as I understand it -- (1) never copies/allocates that information, and the matrix Jac is just a non-owning view into the already allocated CSR, (2) I can write directly into the original data structures and the Mat just "knows" about it, although it still needs a call to MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this works great with GAMG. I have the same CSR representation filled in GPU data allocated with cudaMalloc and filled on-device. Is there an equivalent Mat constructor for GPU arrays, or some other way to avoid unnecessary copies? 
Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Wed Jan 4 17:02:20 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 4 Jan 2023 17:02:20 -0600 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ... The real problem I think is to deal with multiple MPI ranks. Providing the split arrays for petsc MATMPIAIJ is not easy and thus is discouraged for users to do so. A workaround is to let petsc build the matrix and allocate the memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up. We recently added routines to support matrix assembly on GPUs, see if MatSetValuesCOO helps --Junchao Zhang On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry wrote: > I have a sparse matrix constructed in non-petsc code using a standard CSR > representation where I compute the Jacobian to be used in an implicit TS > context. In the CPU world I call > > MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr, > colidxptr, valptr, Jac); > > which as I understand it -- (1) never copies/allocates that information, > and the matrix Jac is just a non-owning view into the already allocated > CSR, (2) I can write directly into the original data structures and the Mat > just "knows" about it, although it still needs a call to > MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this > works great with GAMG. > > I have the same CSR representation filled in GPU data allocated with > cudaMalloc and filled on-device. Is there an equivalent Mat constructor for > GPU arrays, or some other way to avoid unnecessary copies? > > Thanks, > Mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Jan 4 17:19:06 2023 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 4 Jan 2023 18:19:06 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: > > Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we > would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD > GPUs, ... Wouldn't one function suffice? Assuming these are contiguous arrays in CSR format, they're just raw device pointers in all cases. On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang wrote: > No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for GPUs. > Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would > need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ... > > The real problem I think is to deal with multiple MPI ranks. Providing the > split arrays for petsc MATMPIAIJ is not easy and thus is discouraged for > users to do so. > > A workaround is to let petsc build the matrix and allocate the memory, > then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up. > > We recently added routines to support matrix assembly on GPUs, see if > MatSetValuesCOO > helps > > --Junchao Zhang > > > On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry wrote: > >> I have a sparse matrix constructed in non-petsc code using a standard CSR >> representation where I compute the Jacobian to be used in an implicit TS >> context. 
In the CPU world I call >> >> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr, >> colidxptr, valptr, Jac); >> >> which as I understand it -- (1) never copies/allocates that information, >> and the matrix Jac is just a non-owning view into the already allocated >> CSR, (2) I can write directly into the original data structures and the Mat >> just "knows" about it, although it still needs a call to >> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >> works great with GAMG. >> >> I have the same CSR representation filled in GPU data allocated with >> cudaMalloc and filled on-device. Is there an equivalent Mat constructor for >> GPU arrays, or some other way to avoid unnecessary copies? >> >> Thanks, >> Mark >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Wed Jan 4 17:27:08 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 4 Jan 2023 17:27:08 -0600 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry wrote: > Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >> GPUs, ... > > > Wouldn't one function suffice? Assuming these are contiguous arrays in CSR > format, they're just raw device pointers in all cases. > But we need to know what device it is (to dispatch to either petsc-CUDA or petsc-HIP backend) > > On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang > wrote: > >> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for GPUs. >> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would >> need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ... >> >> The real problem I think is to deal with multiple MPI ranks. Providing >> the split arrays for petsc MATMPIAIJ is not easy and thus is discouraged >> for users to do so. >> >> A workaround is to let petsc build the matrix and allocate the memory, >> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up. >> >> We recently added routines to support matrix assembly on GPUs, see if >> MatSetValuesCOO >> helps >> >> --Junchao Zhang >> >> >> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry wrote: >> >>> I have a sparse matrix constructed in non-petsc code using a standard >>> CSR representation where I compute the Jacobian to be used in an implicit >>> TS context. In the CPU world I call >>> >>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr, >>> colidxptr, valptr, Jac); >>> >>> which as I understand it -- (1) never copies/allocates that information, >>> and the matrix Jac is just a non-owning view into the already allocated >>> CSR, (2) I can write directly into the original data structures and the Mat >>> just "knows" about it, although it still needs a call to >>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>> works great with GAMG. >>> >>> I have the same CSR representation filled in GPU data allocated with >>> cudaMalloc and filled on-device. Is there an equivalent Mat constructor for >>> GPU arrays, or some other way to avoid unnecessary copies? >>> >>> Thanks, >>> Mark >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
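To make the MatSetValuesCOO suggestion above concrete, here is a rough sketch (not code from the thread; the function name, argument names, and the choice of MATAIJCUSPARSE are illustrative, and my understanding is that the values array passed to MatSetValuesCOO may be device memory when the matrix type is a CUSPARSE one):

    #include <petscmat.h>

    /* coo_i/coo_j: host arrays with the row/column index of every nonzero;
       d_vals: the nnz values, possibly a cudaMalloc'ed device pointer. */
    PetscErrorCode BuildJacobianCOO(MPI_Comm comm, PetscInt nrows, PetscInt ncols,
                                    PetscCount nnz, PetscInt coo_i[], PetscInt coo_j[],
                                    const PetscScalar *d_vals, Mat *Jac)
    {
      PetscFunctionBeginUser;
      PetscCall(MatCreate(comm, Jac));
      PetscCall(MatSetSizes(*Jac, nrows, ncols, PETSC_DECIDE, PETSC_DECIDE));
      PetscCall(MatSetType(*Jac, MATAIJCUSPARSE));
      /* one-time symbolic setup from the sparsity pattern */
      PetscCall(MatSetPreallocationCOO(*Jac, nnz, coo_i, coo_j));
      /* numeric fill; repeat this call whenever the Jacobian values change */
      PetscCall(MatSetValuesCOO(*Jac, d_vals, INSERT_VALUES));
      PetscFunctionReturn(0);
    }

Compared with MatCreateSeqAIJWithArrays, the CSR row-pointer array has to be expanded to one row index per nonzero to form coo_i, but the pattern setup happens once and only the MatSetValuesCOO call is repeated inside the TS loop.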
URL: From junchao.zhang at gmail.com Wed Jan 4 17:49:07 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 4 Jan 2023 17:49:07 -0600 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry wrote: > Oh, is the device backend not known at compile time? > Currently it is known at compile time. Or multiple backends can be alive at once? > Some petsc developers (Jed and Barry) want to support this, but we are incapable now. > > On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang > wrote: > >> >> >> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry wrote: >> >>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>> GPUs, ... >>> >>> >>> Wouldn't one function suffice? Assuming these are contiguous arrays in >>> CSR format, they're just raw device pointers in all cases. >>> >> But we need to know what device it is (to dispatch to either petsc-CUDA >> or petsc-HIP backend) >> >> >>> >>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang >>> wrote: >>> >>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for >>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>> GPUs, ... >>>> >>>> The real problem I think is to deal with multiple MPI ranks. Providing >>>> the split arrays for petsc MATMPIAIJ is not easy and thus is discouraged >>>> for users to do so. >>>> >>>> A workaround is to let petsc build the matrix and allocate the memory, >>>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up. >>>> >>>> We recently added routines to support matrix assembly on GPUs, see if >>>> MatSetValuesCOO >>>> helps >>>> >>>> --Junchao Zhang >>>> >>>> >>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry wrote: >>>> >>>>> I have a sparse matrix constructed in non-petsc code using a standard >>>>> CSR representation where I compute the Jacobian to be used in an implicit >>>>> TS context. In the CPU world I call >>>>> >>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr, >>>>> colidxptr, valptr, Jac); >>>>> >>>>> which as I understand it -- (1) never copies/allocates that >>>>> information, and the matrix Jac is just a non-owning view into the already >>>>> allocated CSR, (2) I can write directly into the original data structures >>>>> and the Mat just "knows" about it, although it still needs a call to >>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>>>> works great with GAMG. >>>>> >>>>> I have the same CSR representation filled in GPU data allocated with >>>>> cudaMalloc and filled on-device. Is there an equivalent Mat constructor for >>>>> GPU arrays, or some other way to avoid unnecessary copies? >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jan 4 18:02:01 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 4 Jan 2023 19:02:01 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang wrote: > > On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry wrote: > >> Oh, is the device backend not known at compile time? >> > Currently it is known at compile time. > Are you sure? I don't think it is known at compile time. 
Thanks, Matt > Or multiple backends can be alive at once? >> > > Some petsc developers (Jed and Barry) want to support this, but we are > incapable now. > > >> >> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang >> wrote: >> >>> >>> >>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry wrote: >>> >>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>> GPUs, ... >>>> >>>> >>>> Wouldn't one function suffice? Assuming these are contiguous arrays in >>>> CSR format, they're just raw device pointers in all cases. >>>> >>> But we need to know what device it is (to dispatch to either petsc-CUDA >>> or petsc-HIP backend) >>> >>> >>>> >>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang >>>> wrote: >>>> >>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for >>>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>> GPUs, ... >>>>> >>>>> The real problem I think is to deal with multiple MPI ranks. Providing >>>>> the split arrays for petsc MATMPIAIJ is not easy and thus is discouraged >>>>> for users to do so. >>>>> >>>>> A workaround is to let petsc build the matrix and allocate the memory, >>>>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up. >>>>> >>>>> We recently added routines to support matrix assembly on GPUs, see if >>>>> MatSetValuesCOO >>>>> >>>>> helps >>>>> >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry wrote: >>>>> >>>>>> I have a sparse matrix constructed in non-petsc code using a standard >>>>>> CSR representation where I compute the Jacobian to be used in an implicit >>>>>> TS context. In the CPU world I call >>>>>> >>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr, >>>>>> colidxptr, valptr, Jac); >>>>>> >>>>>> which as I understand it -- (1) never copies/allocates that >>>>>> information, and the matrix Jac is just a non-owning view into the already >>>>>> allocated CSR, (2) I can write directly into the original data structures >>>>>> and the Mat just "knows" about it, although it still needs a call to >>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>>>>> works great with GAMG. >>>>>> >>>>>> I have the same CSR representation filled in GPU data allocated with >>>>>> cudaMalloc and filled on-device. Is there an equivalent Mat constructor for >>>>>> GPU arrays, or some other way to avoid unnecessary copies? >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Wed Jan 4 18:09:02 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 4 Jan 2023 18:09:02 -0600 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley wrote: > On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang > wrote: > >> >> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry wrote: >> >>> Oh, is the device backend not known at compile time? >>> >> Currently it is known at compile time. >> > > Are you sure? I don't think it is known at compile time. 
> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both > > Thanks, > > Matt > > >> Or multiple backends can be alive at once? >>> >> >> Some petsc developers (Jed and Barry) want to support this, but we are >> incapable now. >> >> >>> >>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang >>> wrote: >>> >>>> >>>> >>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry wrote: >>>> >>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>> GPUs, ... >>>>> >>>>> >>>>> Wouldn't one function suffice? Assuming these are contiguous arrays in >>>>> CSR format, they're just raw device pointers in all cases. >>>>> >>>> But we need to know what device it is (to dispatch to either petsc-CUDA >>>> or petsc-HIP backend) >>>> >>>> >>>>> >>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang >>>>> wrote: >>>>> >>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for >>>>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>> GPUs, ... >>>>>> >>>>>> The real problem I think is to deal with multiple MPI ranks. >>>>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is >>>>>> discouraged for users to do so. >>>>>> >>>>>> A workaround is to let petsc build the matrix and allocate the >>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill >>>>>> it up. >>>>>> >>>>>> We recently added routines to support matrix assembly on GPUs, see if >>>>>> MatSetValuesCOO >>>>>> >>>>>> helps >>>>>> >>>>>> --Junchao Zhang >>>>>> >>>>>> >>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry wrote: >>>>>> >>>>>>> I have a sparse matrix constructed in non-petsc code using a >>>>>>> standard CSR representation where I compute the Jacobian to be used in an >>>>>>> implicit TS context. In the CPU world I call >>>>>>> >>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr, >>>>>>> colidxptr, valptr, Jac); >>>>>>> >>>>>>> which as I understand it -- (1) never copies/allocates that >>>>>>> information, and the matrix Jac is just a non-owning view into the already >>>>>>> allocated CSR, (2) I can write directly into the original data structures >>>>>>> and the Mat just "knows" about it, although it still needs a call to >>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>>>>>> works great with GAMG. >>>>>>> >>>>>>> I have the same CSR representation filled in GPU data allocated with >>>>>>> cudaMalloc and filled on-device. Is there an equivalent Mat constructor for >>>>>>> GPU arrays, or some other way to avoid unnecessary copies? >>>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
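As an illustration of what "known at compile time" means for application code (a sketch based on the description above, not code from the thread; whether a single build can ever enable both backends is exactly what is being debated here): petscconf.h defines at most one of PETSC_HAVE_CUDA or PETSC_HAVE_HIP for a given build, so user code can pick the device matrix type with the preprocessor:

    #include <petscmat.h>

    /* Returns the AIJ matrix type matching the device backend PETSc was
       built with; falls back to plain host AIJ for a CPU-only build. */
    static const char *DeviceAIJType(void)
    {
    #if defined(PETSC_HAVE_CUDA)
      return MATAIJCUSPARSE;
    #elif defined(PETSC_HAVE_HIP)
      return MATAIJHIPSPARSE;
    #else
      return MATAIJ;
    #endif
    }

A caller would then do something like MatSetType(A, DeviceAIJType()), or simply rely on MatSetFromOptions() and -mat_type on the command line.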
URL: From knepley at gmail.com Wed Jan 4 18:17:04 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 4 Jan 2023 19:17:04 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: On Wed, Jan 4, 2023 at 7:09 PM Junchao Zhang wrote: > On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley wrote: > >> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang >> wrote: >> >>> >>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry wrote: >>> >>>> Oh, is the device backend not known at compile time? >>>> >>> Currently it is known at compile time. >>> >> >> Are you sure? I don't think it is known at compile time. >> > We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both > Where is the logic for that in the code? This seems like a crazy design. Thanks, Matt > Thanks, >> >> Matt >> >> >>> Or multiple backends can be alive at once? >>>> >>> >>> Some petsc developers (Jed and Barry) want to support this, but we are >>> incapable now. >>> >>> >>>> >>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang >>>> wrote: >>>> >>>>> >>>>> >>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry wrote: >>>>> >>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>> GPUs, ... >>>>>> >>>>>> >>>>>> Wouldn't one function suffice? Assuming these are contiguous arrays >>>>>> in CSR format, they're just raw device pointers in all cases. >>>>>> >>>>> But we need to know what device it is (to dispatch to either >>>>> petsc-CUDA or petsc-HIP backend) >>>>> >>>>> >>>>>> >>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang >>>>>> wrote: >>>>>> >>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for >>>>>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>> GPUs, ... >>>>>>> >>>>>>> The real problem I think is to deal with multiple MPI ranks. >>>>>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is >>>>>>> discouraged for users to do so. >>>>>>> >>>>>>> A workaround is to let petsc build the matrix and allocate the >>>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill >>>>>>> it up. >>>>>>> >>>>>>> We recently added routines to support matrix assembly on GPUs, see if >>>>>>> MatSetValuesCOO >>>>>>> >>>>>>> helps >>>>>>> >>>>>>> --Junchao Zhang >>>>>>> >>>>>>> >>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry wrote: >>>>>>> >>>>>>>> I have a sparse matrix constructed in non-petsc code using a >>>>>>>> standard CSR representation where I compute the Jacobian to be used in an >>>>>>>> implicit TS context. In the CPU world I call >>>>>>>> >>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, >>>>>>>> rowidxptr, colidxptr, valptr, Jac); >>>>>>>> >>>>>>>> which as I understand it -- (1) never copies/allocates that >>>>>>>> information, and the matrix Jac is just a non-owning view into the already >>>>>>>> allocated CSR, (2) I can write directly into the original data structures >>>>>>>> and the Mat just "knows" about it, although it still needs a call to >>>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>>>>>>> works great with GAMG. >>>>>>>> >>>>>>>> I have the same CSR representation filled in GPU data allocated >>>>>>>> with cudaMalloc and filled on-device. 
Is there an equivalent Mat >>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mark >>>>>>>> >>>>>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Wed Jan 4 18:22:34 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 4 Jan 2023 18:22:34 -0600 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: We don't have a machine for us to test with both "--with-cuda --with-hip" --Junchao Zhang On Wed, Jan 4, 2023 at 6:17 PM Matthew Knepley wrote: > On Wed, Jan 4, 2023 at 7:09 PM Junchao Zhang > wrote: > >> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley wrote: >> >>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang >>> wrote: >>> >>>> >>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry wrote: >>>> >>>>> Oh, is the device backend not known at compile time? >>>>> >>>> Currently it is known at compile time. >>>> >>> >>> Are you sure? I don't think it is known at compile time. >>> >> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both >> > > Where is the logic for that in the code? This seems like a crazy design. > > Thanks, > > Matt > > >> Thanks, >>> >>> Matt >>> >>> >>>> Or multiple backends can be alive at once? >>>>> >>>> >>>> Some petsc developers (Jed and Barry) want to support this, but we are >>>> incapable now. >>>> >>>> >>>>> >>>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry wrote: >>>>>> >>>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then >>>>>>>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>>> GPUs, ... >>>>>>> >>>>>>> >>>>>>> Wouldn't one function suffice? Assuming these are contiguous arrays >>>>>>> in CSR format, they're just raw device pointers in all cases. >>>>>>> >>>>>> But we need to know what device it is (to dispatch to either >>>>>> petsc-CUDA or petsc-HIP backend) >>>>>> >>>>>> >>>>>>> >>>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang < >>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>> >>>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for >>>>>>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>>> GPUs, ... >>>>>>>> >>>>>>>> The real problem I think is to deal with multiple MPI ranks. >>>>>>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is >>>>>>>> discouraged for users to do so. >>>>>>>> >>>>>>>> A workaround is to let petsc build the matrix and allocate the >>>>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill >>>>>>>> it up. 
>>>>>>>> >>>>>>>> We recently added routines to support matrix assembly on GPUs, see >>>>>>>> if MatSetValuesCOO >>>>>>>> >>>>>>>> helps >>>>>>>> >>>>>>>> --Junchao Zhang >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry wrote: >>>>>>>> >>>>>>>>> I have a sparse matrix constructed in non-petsc code using a >>>>>>>>> standard CSR representation where I compute the Jacobian to be used in an >>>>>>>>> implicit TS context. In the CPU world I call >>>>>>>>> >>>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, >>>>>>>>> rowidxptr, colidxptr, valptr, Jac); >>>>>>>>> >>>>>>>>> which as I understand it -- (1) never copies/allocates that >>>>>>>>> information, and the matrix Jac is just a non-owning view into the already >>>>>>>>> allocated CSR, (2) I can write directly into the original data structures >>>>>>>>> and the Mat just "knows" about it, although it still needs a call to >>>>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>>>>>>>> works great with GAMG. >>>>>>>>> >>>>>>>>> I have the same CSR representation filled in GPU data allocated >>>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat >>>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Mark >>>>>>>>> >>>>>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jan 4 18:27:21 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 4 Jan 2023 19:27:21 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: On Wed, Jan 4, 2023 at 7:22 PM Junchao Zhang wrote: > We don't have a machine for us to test with both "--with-cuda --with-hip" > Yes, but your answer suggested that the structure of the code prevented this combination. Thanks, Matt > --Junchao Zhang > > > On Wed, Jan 4, 2023 at 6:17 PM Matthew Knepley wrote: > >> On Wed, Jan 4, 2023 at 7:09 PM Junchao Zhang >> wrote: >> >>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley >>> wrote: >>> >>>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang >>>> wrote: >>>> >>>>> >>>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry wrote: >>>>> >>>>>> Oh, is the device backend not known at compile time? >>>>>> >>>>> Currently it is known at compile time. >>>>> >>>> >>>> Are you sure? I don't think it is known at compile time. >>>> >>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both >>> >> >> Where is the logic for that in the code? This seems like a crazy design. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Or multiple backends can be alive at once? >>>>>> >>>>> >>>>> Some petsc developers (Jed and Barry) want to support this, but we are >>>>> incapable now. 
>>>>> >>>>> >>>>>> >>>>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry wrote: >>>>>>> >>>>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then >>>>>>>>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>>>> GPUs, ... >>>>>>>> >>>>>>>> >>>>>>>> Wouldn't one function suffice? Assuming these are contiguous arrays >>>>>>>> in CSR format, they're just raw device pointers in all cases. >>>>>>>> >>>>>>> But we need to know what device it is (to dispatch to either >>>>>>> petsc-CUDA or petsc-HIP backend) >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang < >>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>> >>>>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for >>>>>>>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>>>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>>>> GPUs, ... >>>>>>>>> >>>>>>>>> The real problem I think is to deal with multiple MPI ranks. >>>>>>>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is >>>>>>>>> discouraged for users to do so. >>>>>>>>> >>>>>>>>> A workaround is to let petsc build the matrix and allocate the >>>>>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill >>>>>>>>> it up. >>>>>>>>> >>>>>>>>> We recently added routines to support matrix assembly on GPUs, see >>>>>>>>> if MatSetValuesCOO >>>>>>>>> >>>>>>>>> helps >>>>>>>>> >>>>>>>>> --Junchao Zhang >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I have a sparse matrix constructed in non-petsc code using a >>>>>>>>>> standard CSR representation where I compute the Jacobian to be used in an >>>>>>>>>> implicit TS context. In the CPU world I call >>>>>>>>>> >>>>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, >>>>>>>>>> rowidxptr, colidxptr, valptr, Jac); >>>>>>>>>> >>>>>>>>>> which as I understand it -- (1) never copies/allocates that >>>>>>>>>> information, and the matrix Jac is just a non-owning view into the already >>>>>>>>>> allocated CSR, (2) I can write directly into the original data structures >>>>>>>>>> and the Mat just "knows" about it, although it still needs a call to >>>>>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>>>>>>>>> works great with GAMG. >>>>>>>>>> >>>>>>>>>> I have the same CSR representation filled in GPU data allocated >>>>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat >>>>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Mark >>>>>>>>>> >>>>>>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Jan 4 18:38:04 2023 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 4 Jan 2023 19:38:04 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: You have my condolences if you have to support all those things simultaneously. On Wed, Jan 4, 2023, 7:27 PM Matthew Knepley wrote: > On Wed, Jan 4, 2023 at 7:22 PM Junchao Zhang > wrote: > >> We don't have a machine for us to test with both "--with-cuda --with-hip" >> > > Yes, but your answer suggested that the structure of the code prevented > this combination. > > Thanks, > > Matt > > >> --Junchao Zhang >> >> >> On Wed, Jan 4, 2023 at 6:17 PM Matthew Knepley wrote: >> >>> On Wed, Jan 4, 2023 at 7:09 PM Junchao Zhang >>> wrote: >>> >>>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley >>>> wrote: >>>> >>>>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang >>>>> wrote: >>>>> >>>>>> >>>>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry wrote: >>>>>> >>>>>>> Oh, is the device backend not known at compile time? >>>>>>> >>>>>> Currently it is known at compile time. >>>>>> >>>>> >>>>> Are you sure? I don't think it is known at compile time. >>>>> >>>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not >>>> both >>>> >>> >>> Where is the logic for that in the code? This seems like a crazy design. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Or multiple backends can be alive at once? >>>>>>> >>>>>> >>>>>> Some petsc developers (Jed and Barry) want to support this, but we >>>>>> are incapable now. >>>>>> >>>>>> >>>>>>> >>>>>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry wrote: >>>>>>>> >>>>>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then >>>>>>>>>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>>>>> GPUs, ... >>>>>>>>> >>>>>>>>> >>>>>>>>> Wouldn't one function suffice? Assuming these are contiguous >>>>>>>>> arrays in CSR format, they're just raw device pointers in all cases. >>>>>>>>> >>>>>>>> But we need to know what device it is (to dispatch to either >>>>>>>> petsc-CUDA or petsc-HIP backend) >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang < >>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() >>>>>>>>>> for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but >>>>>>>>>> then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on >>>>>>>>>> AMD GPUs, ... >>>>>>>>>> >>>>>>>>>> The real problem I think is to deal with multiple MPI ranks. >>>>>>>>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is >>>>>>>>>> discouraged for users to do so. >>>>>>>>>> >>>>>>>>>> A workaround is to let petsc build the matrix and allocate the >>>>>>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill >>>>>>>>>> it up. 
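If the application already owns a CSR structure like the one in the original post, the COO index arrays for MatSetPreallocationCOO can be generated once by expanding the CSR row pointer. A rough sketch, assuming host-accessible copies of the rowidxptr/colidxptr arrays named in the original message (nrows and Jac are likewise taken from that context):

/* Expand CSR row offsets into explicit COO row indices. This is done once,
   since the sparsity pattern is fixed; afterwards only MatSetValuesCOO is
   needed each time the Jacobian values change. */
PetscInt *coo_i, *coo_j;
PetscInt  nnz = rowidxptr[nrows];

PetscCall(PetscMalloc2(nnz, &coo_i, nnz, &coo_j));
for (PetscInt row = 0; row < nrows; row++) {
  for (PetscInt k = rowidxptr[row]; k < rowidxptr[row + 1]; k++) {
    coo_i[k] = row;          /* every entry of this row gets row index "row" */
    coo_j[k] = colidxptr[k]; /* column indices are identical to the CSR ones */
  }
}
PetscCall(MatSetPreallocationCOO(Jac, nnz, coo_i, coo_j));
PetscCall(PetscFree2(coo_i, coo_j));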
>>>>>>>>>> >>>>>>>>>> We recently added routines to support matrix assembly on GPUs, >>>>>>>>>> see if MatSetValuesCOO >>>>>>>>>> >>>>>>>>>> helps >>>>>>>>>> >>>>>>>>>> --Junchao Zhang >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> I have a sparse matrix constructed in non-petsc code using a >>>>>>>>>>> standard CSR representation where I compute the Jacobian to be used in an >>>>>>>>>>> implicit TS context. In the CPU world I call >>>>>>>>>>> >>>>>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, >>>>>>>>>>> rowidxptr, colidxptr, valptr, Jac); >>>>>>>>>>> >>>>>>>>>>> which as I understand it -- (1) never copies/allocates that >>>>>>>>>>> information, and the matrix Jac is just a non-owning view into the already >>>>>>>>>>> allocated CSR, (2) I can write directly into the original data structures >>>>>>>>>>> and the Mat just "knows" about it, although it still needs a call to >>>>>>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>>>>>>>>>> works great with GAMG. >>>>>>>>>>> >>>>>>>>>>> I have the same CSR representation filled in GPU data allocated >>>>>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat >>>>>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Mark >>>>>>>>>>> >>>>>>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Jan 5 04:24:42 2023 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 5 Jan 2023 05:24:42 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: Support of HIP and CUDA hardware together would be crazy, but supporting (Kokkos) OpenMP and a device backend would require something that looks like two device back-ends and long term CPUs (eg, Grace) may not go away wrt kernels. PETSc does not support OMP well, of course, but that support could grow if that is where hardware and applications go. On Wed, Jan 4, 2023 at 7:38 PM Mark Lohry wrote: > You have my condolences if you have to support all those things > simultaneously. > > On Wed, Jan 4, 2023, 7:27 PM Matthew Knepley wrote: > >> On Wed, Jan 4, 2023 at 7:22 PM Junchao Zhang >> wrote: >> >>> We don't have a machine for us to test with both "--with-cuda --with-hip" >>> >> >> Yes, but your answer suggested that the structure of the code prevented >> this combination. 
>> >> Thanks, >> >> Matt >> >> >>> --Junchao Zhang >>> >>> >>> On Wed, Jan 4, 2023 at 6:17 PM Matthew Knepley >>> wrote: >>> >>>> On Wed, Jan 4, 2023 at 7:09 PM Junchao Zhang >>>> wrote: >>>> >>>>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry wrote: >>>>>>> >>>>>>>> Oh, is the device backend not known at compile time? >>>>>>>> >>>>>>> Currently it is known at compile time. >>>>>>> >>>>>> >>>>>> Are you sure? I don't think it is known at compile time. >>>>>> >>>>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not >>>>> both >>>>> >>>> >>>> Where is the logic for that in the code? This seems like a crazy design. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Or multiple backends can be alive at once? >>>>>>>> >>>>>>> >>>>>>> Some petsc developers (Jed and Barry) want to support this, but we >>>>>>> are incapable now. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but >>>>>>>>>>> then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on >>>>>>>>>>> AMD GPUs, ... >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Wouldn't one function suffice? Assuming these are contiguous >>>>>>>>>> arrays in CSR format, they're just raw device pointers in all cases. >>>>>>>>>> >>>>>>>>> But we need to know what device it is (to dispatch to either >>>>>>>>> petsc-CUDA or petsc-HIP backend) >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang < >>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() >>>>>>>>>>> for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but >>>>>>>>>>> then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on >>>>>>>>>>> AMD GPUs, ... >>>>>>>>>>> >>>>>>>>>>> The real problem I think is to deal with multiple MPI ranks. >>>>>>>>>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is >>>>>>>>>>> discouraged for users to do so. >>>>>>>>>>> >>>>>>>>>>> A workaround is to let petsc build the matrix and allocate the >>>>>>>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill >>>>>>>>>>> it up. >>>>>>>>>>> >>>>>>>>>>> We recently added routines to support matrix assembly on GPUs, >>>>>>>>>>> see if MatSetValuesCOO >>>>>>>>>>> >>>>>>>>>>> helps >>>>>>>>>>> >>>>>>>>>>> --Junchao Zhang >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> I have a sparse matrix constructed in non-petsc code using a >>>>>>>>>>>> standard CSR representation where I compute the Jacobian to be used in an >>>>>>>>>>>> implicit TS context. 
In the CPU world I call >>>>>>>>>>>> >>>>>>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, >>>>>>>>>>>> rowidxptr, colidxptr, valptr, Jac); >>>>>>>>>>>> >>>>>>>>>>>> which as I understand it -- (1) never copies/allocates that >>>>>>>>>>>> information, and the matrix Jac is just a non-owning view into the already >>>>>>>>>>>> allocated CSR, (2) I can write directly into the original data structures >>>>>>>>>>>> and the Mat just "knows" about it, although it still needs a call to >>>>>>>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>>>>>>>>>>> works great with GAMG. >>>>>>>>>>>> >>>>>>>>>>>> I have the same CSR representation filled in GPU data allocated >>>>>>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat >>>>>>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Mark >>>>>>>>>>>> >>>>>>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacob.fai at gmail.com Thu Jan 5 04:46:58 2023 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Thu, 5 Jan 2023 10:46:58 +0000 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From mlohry at gmail.com Thu Jan 5 09:39:27 2023 From: mlohry at gmail.com (Mark Lohry) Date: Thu, 5 Jan 2023 10:39:27 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: > > > A workaround is to let petsc build the matrix and allocate the memory, > then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up. > Junchao, looking at the code for this it seems to only return a pointer to the value array, but not pointers to the column and row index arrays, is that right? On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch wrote: > We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both > > > CUPM works with both enabled simultaneously, I don?t think there are any > direct restrictions for it. Vec at least was fully usable with both cuda > and hip (though untested) last time I checked. > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > > On Jan 5, 2023, at 00:09, Junchao Zhang wrote: > > ? > > > > On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley wrote: > >> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang >> wrote: >> >>> >>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry wrote: >>> >>>> Oh, is the device backend not known at compile time? >>>> >>> Currently it is known at compile time. >>> >> >> Are you sure? I don't think it is known at compile time. 
>> > We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both > > >> >> Thanks, >> >> Matt >> >> >>> Or multiple backends can be alive at once? >>>> >>> >>> Some petsc developers (Jed and Barry) want to support this, but we are >>> incapable now. >>> >>> >>>> >>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang >>>> wrote: >>>> >>>>> >>>>> >>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry wrote: >>>>> >>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>> GPUs, ... >>>>>> >>>>>> >>>>>> Wouldn't one function suffice? Assuming these are contiguous arrays >>>>>> in CSR format, they're just raw device pointers in all cases. >>>>>> >>>>> But we need to know what device it is (to dispatch to either >>>>> petsc-CUDA or petsc-HIP backend) >>>>> >>>>> >>>>>> >>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang >>>>>> wrote: >>>>>> >>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for >>>>>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>> GPUs, ... >>>>>>> >>>>>>> The real problem I think is to deal with multiple MPI ranks. >>>>>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is >>>>>>> discouraged for users to do so. >>>>>>> >>>>>>> A workaround is to let petsc build the matrix and allocate the >>>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill >>>>>>> it up. >>>>>>> >>>>>>> We recently added routines to support matrix assembly on GPUs, see if >>>>>>> MatSetValuesCOO >>>>>>> >>>>>>> helps >>>>>>> >>>>>>> --Junchao Zhang >>>>>>> >>>>>>> >>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry wrote: >>>>>>> >>>>>>>> I have a sparse matrix constructed in non-petsc code using a >>>>>>>> standard CSR representation where I compute the Jacobian to be used in an >>>>>>>> implicit TS context. In the CPU world I call >>>>>>>> >>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, >>>>>>>> rowidxptr, colidxptr, valptr, Jac); >>>>>>>> >>>>>>>> which as I understand it -- (1) never copies/allocates that >>>>>>>> information, and the matrix Jac is just a non-owning view into the already >>>>>>>> allocated CSR, (2) I can write directly into the original data structures >>>>>>>> and the Mat just "knows" about it, although it still needs a call to >>>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>>>>>>> works great with GAMG. >>>>>>>> >>>>>>>> I have the same CSR representation filled in GPU data allocated >>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat >>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mark >>>>>>>> >>>>>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
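The compile-time selection being debated above comes down to the macros PETSc's configure writes into petscconf.h; a minimal sketch of how code can branch on them (the reporting function is a placeholder, and whether both macros may be live in a single build is exactly the open question in this exchange):

#include <petscsys.h> /* pulls in petscconf.h, which defines the PETSC_HAVE_* macros */

static void report_backends(void)
{
#if defined(PETSC_HAVE_CUDA)
  (void)PetscPrintf(PETSC_COMM_SELF, "built with CUDA support\n");
#endif
#if defined(PETSC_HAVE_HIP)
  (void)PetscPrintf(PETSC_COMM_SELF, "built with HIP support\n");
#endif
}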
URL: From junchao.zhang at gmail.com Thu Jan 5 10:06:04 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 5 Jan 2023 10:06:04 -0600 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry wrote: > >> A workaround is to let petsc build the matrix and allocate the memory, >> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up. >> > > Junchao, looking at the code for this it seems to only return a pointer to > the value array, but not pointers to the column and row index arrays, is > that right? > Yes, that is correct. I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype), which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if A is a device matrix, otherwise (a,i, j) on host and mtype = PETSC_MEMTYPE_HOST. We currently have similar things like VecGetArrayAndMemType(Vec,PetscScalar**,PetscMemType*), and I am adding MatDenseGetArrayAndMemType(Mat,PetscScalar**,PetscMemType*). It looks like you need (a, i, j) for assembly, but the above function only works for an assembled matrix. > > > On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch > wrote: > >> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both >> >> >> CUPM works with both enabled simultaneously, I don?t think there are any >> direct restrictions for it. Vec at least was fully usable with both cuda >> and hip (though untested) last time I checked. >> >> Best regards, >> >> Jacob Faibussowitsch >> (Jacob Fai - booss - oh - vitch) >> >> On Jan 5, 2023, at 00:09, Junchao Zhang wrote: >> >> ? >> >> >> >> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley wrote: >> >>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang >>> wrote: >>> >>>> >>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry wrote: >>>> >>>>> Oh, is the device backend not known at compile time? >>>>> >>>> Currently it is known at compile time. >>>> >>> >>> Are you sure? I don't think it is known at compile time. >>> >> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both >> >> >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Or multiple backends can be alive at once? >>>>> >>>> >>>> Some petsc developers (Jed and Barry) want to support this, but we are >>>> incapable now. >>>> >>>> >>>>> >>>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry wrote: >>>>>> >>>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then >>>>>>>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>>> GPUs, ... >>>>>>> >>>>>>> >>>>>>> Wouldn't one function suffice? Assuming these are contiguous arrays >>>>>>> in CSR format, they're just raw device pointers in all cases. >>>>>>> >>>>>> But we need to know what device it is (to dispatch to either >>>>>> petsc-CUDA or petsc-HIP backend) >>>>>> >>>>>> >>>>>>> >>>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang < >>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>> >>>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for >>>>>>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>>> GPUs, ... >>>>>>>> >>>>>>>> The real problem I think is to deal with multiple MPI ranks. 
>>>>>>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is >>>>>>>> discouraged for users to do so. >>>>>>>> >>>>>>>> A workaround is to let petsc build the matrix and allocate the >>>>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill >>>>>>>> it up. >>>>>>>> >>>>>>>> We recently added routines to support matrix assembly on GPUs, see >>>>>>>> if MatSetValuesCOO >>>>>>>> >>>>>>>> helps >>>>>>>> >>>>>>>> --Junchao Zhang >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry wrote: >>>>>>>> >>>>>>>>> I have a sparse matrix constructed in non-petsc code using a >>>>>>>>> standard CSR representation where I compute the Jacobian to be used in an >>>>>>>>> implicit TS context. In the CPU world I call >>>>>>>>> >>>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, >>>>>>>>> rowidxptr, colidxptr, valptr, Jac); >>>>>>>>> >>>>>>>>> which as I understand it -- (1) never copies/allocates that >>>>>>>>> information, and the matrix Jac is just a non-owning view into the already >>>>>>>>> allocated CSR, (2) I can write directly into the original data structures >>>>>>>>> and the Mat just "knows" about it, although it still needs a call to >>>>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>>>>>>>> works great with GAMG. >>>>>>>>> >>>>>>>>> I have the same CSR representation filled in GPU data allocated >>>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat >>>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Mark >>>>>>>>> >>>>>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Thu Jan 5 10:23:56 2023 From: mlohry at gmail.com (Mark Lohry) Date: Thu, 5 Jan 2023 11:23:56 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: I am thinking something like MatSeqAIJGetArrayAndMemType > Isn't the "MemType" of the matrix an invariant on creation? e.g. a user shouldn't care what memtype a pointer is for, just that if a device matrix was created it returns device pointers, if a host matrix was created it returns host pointers. Now that I'm looking at those docs I see MatSeqAIJGetCSRAndMemType , isn't this what I'm looking for? If I call MatCreateSeqAIJCUSPARSE it will cudaMalloc the csr arrays, and then MatSeqAIJGetCSRAndMemType will return me those raw device pointers? On Thu, Jan 5, 2023 at 11:06 AM Junchao Zhang wrote: > > > On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry wrote: > >> >>> A workaround is to let petsc build the matrix and allocate the memory, >>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up. >>> >> >> Junchao, looking at the code for this it seems to only return a pointer >> to the value array, but not pointers to the column and row index arrays, is >> that right? >> > Yes, that is correct. > I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const > PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype), > which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if > A is a device matrix, otherwise (a,i, j) on host and mtype = > PETSC_MEMTYPE_HOST. 
> We currently have similar things like > VecGetArrayAndMemType(Vec,PetscScalar**,PetscMemType*), and I am adding > MatDenseGetArrayAndMemType(Mat,PetscScalar**,PetscMemType*). > > It looks like you need (a, i, j) for assembly, but the above function only > works for an assembled matrix. > > >> >> >> On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch >> wrote: >> >>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both >>> >>> >>> CUPM works with both enabled simultaneously, I don?t think there are any >>> direct restrictions for it. Vec at least was fully usable with both cuda >>> and hip (though untested) last time I checked. >>> >>> Best regards, >>> >>> Jacob Faibussowitsch >>> (Jacob Fai - booss - oh - vitch) >>> >>> On Jan 5, 2023, at 00:09, Junchao Zhang wrote: >>> >>> ? >>> >>> >>> >>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley >>> wrote: >>> >>>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang >>>> wrote: >>>> >>>>> >>>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry wrote: >>>>> >>>>>> Oh, is the device backend not known at compile time? >>>>>> >>>>> Currently it is known at compile time. >>>>> >>>> >>>> Are you sure? I don't think it is known at compile time. >>>> >>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both >>> >>> >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Or multiple backends can be alive at once? >>>>>> >>>>> >>>>> Some petsc developers (Jed and Barry) want to support this, but we are >>>>> incapable now. >>>>> >>>>> >>>>>> >>>>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry wrote: >>>>>>> >>>>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then >>>>>>>>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>>>> GPUs, ... >>>>>>>> >>>>>>>> >>>>>>>> Wouldn't one function suffice? Assuming these are contiguous arrays >>>>>>>> in CSR format, they're just raw device pointers in all cases. >>>>>>>> >>>>>>> But we need to know what device it is (to dispatch to either >>>>>>> petsc-CUDA or petsc-HIP backend) >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang < >>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>> >>>>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for >>>>>>>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we >>>>>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>>>> GPUs, ... >>>>>>>>> >>>>>>>>> The real problem I think is to deal with multiple MPI ranks. >>>>>>>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is >>>>>>>>> discouraged for users to do so. >>>>>>>>> >>>>>>>>> A workaround is to let petsc build the matrix and allocate the >>>>>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill >>>>>>>>> it up. >>>>>>>>> >>>>>>>>> We recently added routines to support matrix assembly on GPUs, see >>>>>>>>> if MatSetValuesCOO >>>>>>>>> >>>>>>>>> helps >>>>>>>>> >>>>>>>>> --Junchao Zhang >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I have a sparse matrix constructed in non-petsc code using a >>>>>>>>>> standard CSR representation where I compute the Jacobian to be used in an >>>>>>>>>> implicit TS context. 
In the CPU world I call >>>>>>>>>> >>>>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, >>>>>>>>>> rowidxptr, colidxptr, valptr, Jac); >>>>>>>>>> >>>>>>>>>> which as I understand it -- (1) never copies/allocates that >>>>>>>>>> information, and the matrix Jac is just a non-owning view into the already >>>>>>>>>> allocated CSR, (2) I can write directly into the original data structures >>>>>>>>>> and the Mat just "knows" about it, although it still needs a call to >>>>>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>>>>>>>>> works great with GAMG. >>>>>>>>>> >>>>>>>>>> I have the same CSR representation filled in GPU data allocated >>>>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat >>>>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Mark >>>>>>>>>> >>>>>>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajaramillopalma at gmail.com Thu Jan 5 10:35:38 2023 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Thu, 5 Jan 2023 09:35:38 -0700 Subject: [petsc-users] error when trying to compile with HPDDM Message-ID: Dear developers, I'm trying to compile petsc together with the HPDDM library. A series on errors appeared: /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp: In static member function ?static constexpr __float128 std::numeric_limits<__float128>::min()?: /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp:54:57: error: unable to find numeric literal operator ?operator""Q? 54 | static constexpr __float128 min() noexcept { return FLT128_MIN; } I'm attaching the log files to this email. Could you please help me with this? bests regards Alfredo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: text/x-log Size: 131071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 1308843 bytes Desc: not available URL: From jacob.fai at gmail.com Thu Jan 5 10:37:07 2023 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Thu, 5 Jan 2023 16:37:07 +0000 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: <245D8BAF-43C0-4139-BA3E-F2BD0CA66A19@gmail.com> An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Jan 5 10:38:18 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 5 Jan 2023 10:38:18 -0600 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: On Thu, Jan 5, 2023 at 10:24 AM Mark Lohry wrote: > > I am thinking something like MatSeqAIJGetArrayAndMemType >> > > Isn't the "MemType" of the matrix an invariant on creation? e.g. a user > shouldn't care what memtype a pointer is for, just that if a device matrix > was created it returns device pointers, if a host matrix was created it > returns host pointers. > > Now that I'm looking at those docs I see MatSeqAIJGetCSRAndMemType > , > isn't this what I'm looking for? 
If I call MatCreateSeqAIJCUSPARSE it will > cudaMalloc the csr arrays, and then MatSeqAIJGetCSRAndMemType will return > me those raw device pointers? > > Yeah, I forgot I added it :). On "a user shouldn't care what memtype a pointer is": yes if you can, otherwise you can use mtype to differentiate your code path. > > > > > On Thu, Jan 5, 2023 at 11:06 AM Junchao Zhang > wrote: > >> >> >> On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry wrote: >> >>> >>>> A workaround is to let petsc build the matrix and allocate the memory, >>>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up. >>>> >>> >>> Junchao, looking at the code for this it seems to only return a pointer >>> to the value array, but not pointers to the column and row index arrays, is >>> that right? >>> >> Yes, that is correct. >> I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const >> PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype), >> which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if >> A is a device matrix, otherwise (a,i, j) on host and mtype = >> PETSC_MEMTYPE_HOST. >> We currently have similar things like >> VecGetArrayAndMemType(Vec,PetscScalar**,PetscMemType*), and I am adding >> MatDenseGetArrayAndMemType(Mat,PetscScalar**,PetscMemType*). >> >> It looks like you need (a, i, j) for assembly, but the above function >> only works for an assembled matrix. >> >> >>> >>> >>> On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch >>> wrote: >>> >>>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not >>>> both >>>> >>>> >>>> CUPM works with both enabled simultaneously, I don?t think there are >>>> any direct restrictions for it. Vec at least was fully usable with both >>>> cuda and hip (though untested) last time I checked. >>>> >>>> Best regards, >>>> >>>> Jacob Faibussowitsch >>>> (Jacob Fai - booss - oh - vitch) >>>> >>>> On Jan 5, 2023, at 00:09, Junchao Zhang >>>> wrote: >>>> >>>> ? >>>> >>>> >>>> >>>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley >>>> wrote: >>>> >>>>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang >>>>> wrote: >>>>> >>>>>> >>>>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry wrote: >>>>>> >>>>>>> Oh, is the device backend not known at compile time? >>>>>>> >>>>>> Currently it is known at compile time. >>>>>> >>>>> >>>>> Are you sure? I don't think it is known at compile time. >>>>> >>>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not >>>> both >>>> >>>> >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Or multiple backends can be alive at once? >>>>>>> >>>>>> >>>>>> Some petsc developers (Jed and Barry) want to support this, but we >>>>>> are incapable now. >>>>>> >>>>>> >>>>>>> >>>>>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry wrote: >>>>>>>> >>>>>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then >>>>>>>>>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD >>>>>>>>>> GPUs, ... >>>>>>>>> >>>>>>>>> >>>>>>>>> Wouldn't one function suffice? Assuming these are contiguous >>>>>>>>> arrays in CSR format, they're just raw device pointers in all cases. 
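Putting this exchange together, the workaround of letting PETSc allocate the matrix and then filling the CSR arrays in place might look roughly like the sketch below. The signature of MatSeqAIJGetCSRAndMemType follows what is described in this thread and should be checked against the man page; nrows, ncols, and nnz_per_row are placeholders, and the device kernel that writes the values is left to the application.

/* Let PETSc allocate the AIJCUSPARSE matrix, then grab the raw CSR arrays
   (device pointers in that case) and fill the values in place. The matrix
   must already be assembled with its final sparsity pattern. */
Mat             A;
const PetscInt *ia, *ja;   /* CSR row offsets and column indices */
PetscScalar    *va;        /* CSR values                         */
PetscMemType    mtype;

PetscCall(MatCreateSeqAIJCUSPARSE(PETSC_COMM_SELF, nrows, ncols, 0, nnz_per_row, &A));
/* ... insert the sparsity pattern once, then MatAssemblyBegin/MatAssemblyEnd ... */
PetscCall(MatSeqAIJGetCSRAndMemType(A, &ia, &ja, &va, &mtype));
if (PetscMemTypeDevice(mtype)) {
  /* ia/ja/va are device pointers here: launch the application's CUDA kernel on va */
}
/* Whether writes through va need a follow-up call to mark the matrix as
   modified is not settled in this thread; consult the man page. */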
>>>>>>>>> >>>>>>>> But we need to know what device it is (to dispatch to either >>>>>>>> petsc-CUDA or petsc-HIP backend) >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang < >>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() >>>>>>>>>> for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but >>>>>>>>>> then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on >>>>>>>>>> AMD GPUs, ... >>>>>>>>>> >>>>>>>>>> The real problem I think is to deal with multiple MPI ranks. >>>>>>>>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is >>>>>>>>>> discouraged for users to do so. >>>>>>>>>> >>>>>>>>>> A workaround is to let petsc build the matrix and allocate the >>>>>>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill >>>>>>>>>> it up. >>>>>>>>>> >>>>>>>>>> We recently added routines to support matrix assembly on GPUs, >>>>>>>>>> see if MatSetValuesCOO >>>>>>>>>> >>>>>>>>>> helps >>>>>>>>>> >>>>>>>>>> --Junchao Zhang >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> I have a sparse matrix constructed in non-petsc code using a >>>>>>>>>>> standard CSR representation where I compute the Jacobian to be used in an >>>>>>>>>>> implicit TS context. In the CPU world I call >>>>>>>>>>> >>>>>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, >>>>>>>>>>> rowidxptr, colidxptr, valptr, Jac); >>>>>>>>>>> >>>>>>>>>>> which as I understand it -- (1) never copies/allocates that >>>>>>>>>>> information, and the matrix Jac is just a non-owning view into the already >>>>>>>>>>> allocated CSR, (2) I can write directly into the original data structures >>>>>>>>>>> and the Mat just "knows" about it, although it still needs a call to >>>>>>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this >>>>>>>>>>> works great with GAMG. >>>>>>>>>>> >>>>>>>>>>> I have the same CSR representation filled in GPU data allocated >>>>>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat >>>>>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Mark >>>>>>>>>>> >>>>>>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajaramillopalma at gmail.com Thu Jan 5 11:25:08 2023 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Thu, 5 Jan 2023 10:25:08 -0700 Subject: [petsc-users] setting a vector with VecSetValue versus VecSetValues Message-ID: dear PETSc developers, I have a code where I copy an array to a distributed petsc vector with the next lines: 1 for (int i = 0; i < ndof_local; i++) { 2 PetscInt gl_row = (PetscInt)(i)+rstart; 3 PetscScalar val = (PetscScalar)u[i]; 4 VecSetValues(x,1,&gl_row,&val,INSERT_VALUES); 5 } // for (int i = 0; i < ndof_local; i++) { // PetscInt gl_row = (PetscInt)(i); // PetscScalar val = (PetscScalar)u[i]; // VecSetValue(x,gl_row,val,INSERT_VALUES); // } VecAssemblyBegin(x); VecAssemblyEnd(x); This works as expected. 
If, instead of using lines 1-5, I use the lines where VecSetValue is used with local indices, then the vector is null on all the processes but rank 0, and the piece of information at rank zero is incorrect. What could I be doing wrong? bests regards Alfredo -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Jan 5 11:37:49 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 5 Jan 2023 11:37:49 -0600 Subject: [petsc-users] setting a vector with VecSetValue versus VecSetValues In-Reply-To: References: Message-ID: VecSetValue() also needs global indices, so you need PetscInt gl_row = ( PetscInt)(i)+rstart; --Junchao Zhang On Thu, Jan 5, 2023 at 11:25 AM Alfredo Jaramillo wrote: > dear PETSc developers, > > I have a code where I copy an array to a distributed petsc vector with the > next lines: > > 1 for (int i = 0; i < ndof_local; i++) { > 2 PetscInt gl_row = (PetscInt)(i)+rstart; > 3 PetscScalar val = (PetscScalar)u[i]; > 4 VecSetValues(x,1,&gl_row,&val,INSERT_VALUES); > 5 } > > // for (int i = 0; i < ndof_local; i++) { > // PetscInt gl_row = (PetscInt)(i); > // PetscScalar val = (PetscScalar)u[i]; > // VecSetValue(x,gl_row,val,INSERT_VALUES); > // } > > VecAssemblyBegin(x); > VecAssemblyEnd(x); > > This works as expected. If, instead of using lines 1-5, I use the lines > where VecSetValue is used with local indices, then the vector is null on > all the processes but rank 0, and the piece of information at rank zero is > incorrect. > > What could I be doing wrong? > > bests regards > Alfredo > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajaramillopalma at gmail.com Thu Jan 5 11:41:26 2023 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Thu, 5 Jan 2023 10:41:26 -0700 Subject: [petsc-users] setting a vector with VecSetValue versus VecSetValues In-Reply-To: References: Message-ID: omg... for some reason I was thinking it takes local indices. Yes.. modifying that line the code works well. thank you! Alfredo On Thu, Jan 5, 2023 at 10:38 AM Junchao Zhang wrote: > VecSetValue() also needs global indices, so you need PetscInt gl_row = ( > PetscInt)(i)+rstart; > > --Junchao Zhang > > > On Thu, Jan 5, 2023 at 11:25 AM Alfredo Jaramillo < > ajaramillopalma at gmail.com> wrote: > >> dear PETSc developers, >> >> I have a code where I copy an array to a distributed petsc vector with >> the next lines: >> >> 1 for (int i = 0; i < ndof_local; i++) { >> 2 PetscInt gl_row = (PetscInt)(i)+rstart; >> 3 PetscScalar val = (PetscScalar)u[i]; >> 4 VecSetValues(x,1,&gl_row,&val,INSERT_VALUES); >> 5 } >> >> // for (int i = 0; i < ndof_local; i++) { >> // PetscInt gl_row = (PetscInt)(i); >> // PetscScalar val = (PetscScalar)u[i]; >> // VecSetValue(x,gl_row,val,INSERT_VALUES); >> // } >> >> VecAssemblyBegin(x); >> VecAssemblyEnd(x); >> >> This works as expected. If, instead of using lines 1-5, I use the lines >> where VecSetValue is used with local indices, then the vector is null on >> all the processes but rank 0, and the piece of information at rank zero is >> incorrect. >> >> What could I be doing wrong? >> >> bests regards >> Alfredo >> > -------------- next part -------------- An HTML attachment was scrubbed... 
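For later readers, the resolution of this thread in one self-contained form (a sketch; x, u, and ndof_local are the names from the question above):

/* VecSetValue/VecSetValues take GLOBAL indices: get this process's offset
   from the ownership range and add it to the local loop index. */
PetscInt rstart, rend;

PetscCall(VecGetOwnershipRange(x, &rstart, &rend));
for (PetscInt i = 0; i < ndof_local; i++) {
  PetscCall(VecSetValue(x, rstart + i, (PetscScalar)u[i], INSERT_VALUES));
}
PetscCall(VecAssemblyBegin(x));
PetscCall(VecAssemblyEnd(x));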
URL: From knepley at gmail.com Thu Jan 5 12:06:05 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 5 Jan 2023 13:06:05 -0500 Subject: [petsc-users] error when trying to compile with HPDDM In-Reply-To: References: Message-ID: On Thu, Jan 5, 2023 at 11:36 AM Alfredo Jaramillo wrote: > Dear developers, > I'm trying to compile petsc together with the HPDDM library. A series on > errors appeared: > > /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp: > In static member function ?static constexpr __float128 > std::numeric_limits<__float128>::min()?: > /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp:54:57: > error: unable to find numeric literal operator ?operator""Q? > 54 | static constexpr __float128 min() noexcept { return > FLT128_MIN; } > > I'm attaching the log files to this email. > Pierre, It looks like we may need to test for FLT_MIN and FLT_MAX in configure since it looks like Alfredo's headers do not have them. Is this correct? Thanks, Matt > Could you please help me with this? > > bests regards > Alfredo > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Thu Jan 5 13:39:14 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Thu, 5 Jan 2023 20:39:14 +0100 Subject: [petsc-users] error when trying to compile with HPDDM In-Reply-To: References: Message-ID: > On 5 Jan 2023, at 7:06 PM, Matthew Knepley wrote: > > On Thu, Jan 5, 2023 at 11:36 AM Alfredo Jaramillo > wrote: >> Dear developers, >> I'm trying to compile petsc together with the HPDDM library. A series on errors appeared: >> >> /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp: In static member function ?static constexpr __float128 std::numeric_limits<__float128>::min()?: >> /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp:54:57: error: unable to find numeric literal operator ?operator""Q? >> 54 | static constexpr __float128 min() noexcept { return FLT128_MIN; } >> >> I'm attaching the log files to this email. > > Pierre, > > It looks like we may need to test for FLT_MIN and FLT_MAX in configure since it looks like Alfredo's headers do not have them. > Is this correct? We could do that, but I bet this is a side effect of the fact that Alfredo is using --with-cxx-dialect=C++11. Alfredo, did you got that flag from someone else?s configure, or do you know what that flag is doing? - If yes, do you really need to stick to -std=c++11? - If no, please look at https://gitlab.com/petsc/petsc/-/issues/1284#note_1173803107 and consider removing that flag, or at least changing the option to --with-cxx-dialect=11. If compilation still fails, please send the up-to-date configure.log/make.log Thanks, Pierre > Thanks, > > Matt > >> Could you please help me with this? >> >> bests regards >> Alfredo > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
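Background on why the dialect flag matters here (an inference from the error text, not something stated verbatim in the thread): FLT128_MIN from quadmath.h expands to a literal carrying the GNU-specific Q suffix, which a strict -std=c++11 compilation rejects while -std=gnu++11 (or adding -fext-numeric-literals) accepts. A minimal reproduction:

/* repro.cpp -- compiled with:  g++ -std=c++11 repro.cpp
   fails with: error: unable to find numeric literal operator 'operator""Q'
   while:      g++ -std=gnu++11 repro.cpp   compiles cleanly. */
#include <quadmath.h>

int main(void)
{
  __float128 smallest = FLT128_MIN; /* the macro's literal ends in the Q suffix */
  return smallest > 0 ? 0 : 1;
}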
URL: From ajaramillopalma at gmail.com Thu Jan 5 14:06:36 2023 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Thu, 5 Jan 2023 13:06:36 -0700 Subject: [petsc-users] error when trying to compile with HPDDM In-Reply-To: References: Message-ID: Hi Pierre, no, I don't really need that flag. I removed it and the installation process went well. I just noticed a "minor" detail when building SLEPc: "gmake[3]: warning: jobserver unavailable: using -j1. Add '+' to parent make rule" so the compilation of that library went slow. Thanks, Alfredo On Thu, Jan 5, 2023 at 12:39 PM Pierre Jolivet wrote: > > > On 5 Jan 2023, at 7:06 PM, Matthew Knepley wrote: > > On Thu, Jan 5, 2023 at 11:36 AM Alfredo Jaramillo < > ajaramillopalma at gmail.com> wrote: > >> Dear developers, >> I'm trying to compile petsc together with the HPDDM library. A series on >> errors appeared: >> >> /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp: >> In static member function ?static constexpr __float128 >> std::numeric_limits<__float128>::min()?: >> /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp:54:57: >> error: unable to find numeric literal operator ?operator""Q? >> 54 | static constexpr __float128 min() noexcept { return >> FLT128_MIN; } >> >> I'm attaching the log files to this email. >> > > Pierre, > > It looks like we may need to test for FLT_MIN and FLT_MAX in configure > since it looks like Alfredo's headers do not have them. > Is this correct? > > > We could do that, but I bet this is a side effect of the fact that Alfredo > is using --with-cxx-dialect=C++11. > Alfredo, did you got that flag from someone else?s configure, or do you > know what that flag is doing? > - If yes, do you really need to stick to -std=c++11? > - If no, please look at > https://gitlab.com/petsc/petsc/-/issues/1284#note_1173803107 and consider > removing that flag, or at least changing the option to > --with-cxx-dialect=11. If compilation still fails, please send the > up-to-date configure.log/make.log > > Thanks, > Pierre > > Thanks, > > Matt > > >> Could you please help me with this? >> >> bests regards >> Alfredo >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Jan 5 14:42:01 2023 From: jed at jedbrown.org (Jed Brown) Date: Thu, 05 Jan 2023 13:42:01 -0700 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: Message-ID: <87eds8hfxi.fsf@jedbrown.org> Mark Adams writes: > Support of HIP and CUDA hardware together would be crazy, I don't think it's remotely crazy. libCEED supports both together and it's very convenient when testing on a development machine that has one of each brand GPU and simplifies binary distribution for us and every package that uses us. Every day I wish PETSc could build with both simultaneously, but everyone tells me it's silly. 
From mlohry at gmail.com Thu Jan 5 14:42:34 2023 From: mlohry at gmail.com (Mark Lohry) Date: Thu, 5 Jan 2023 15:42:34 -0500 Subject: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported Message-ID: I'm trying to compile the cuda example ./config/examples/arch-ci-linux-cuda-double-64idx.py --with-cudac=/usr/local/cuda-11.5/bin/nvcc and running make test passes the test ok diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy but the eager variant fails, pasted below. I get a similar error running my client code, pasted after. There when running with -info, it seems that some lazy initialization happens first, and i also call VecCreateSeqCuda which seems to have no issue. Any idea? This happens to be with an -sm 3.5 device if it matters, otherwise it's a recent cuda compiler+driver. petsc test code output: not ok sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager # Error code: 97 # [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- # [0]PETSC ERROR: GPU error # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. # [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 # [0]PETSC ERROR: ../ex1 on a named lancer by mlohry Thu Jan 5 15:22:33 2023 # [0]PETSC ERROR: Configure options --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2 --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-cuda=1 --with-precision=double --with-clanguage=c --with-cudac=/usr/local/cuda-11.5/bin/nvcc PETSC_ARCH=arch-ci-linux-cuda-double-64idx # [0]PETSC ERROR: #1 CUPMAwareMPI_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194 # [0]PETSC ERROR: #2 initialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71 # [0]PETSC ERROR: #3 init_device_id_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290 # [0]PETSC ERROR: #4 getDevice() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99 # [0]PETSC ERROR: #5 PetscDeviceCreate() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104 # [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375 # [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499 # [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634 # [0]PETSC ERROR: #9 PetscInitialize_Common() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001 # [0]PETSC ERROR: #10 PetscInitialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267 # [0]PETSC ERROR: #11 main() at 
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12 # [0]PETSC ERROR: PETSc Option Table entries: # [0]PETSC ERROR: -default_device_type host # [0]PETSC ERROR: -device_enable eager # [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- solver code output: [0] PetscDetermineInitialFPTrap(): Floating point trapping is off by default 0 [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType host available, initializing [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice host initialized, default device id 0, view FALSE, init type lazy [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType cuda available, initializing [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice cuda initialized, default device id 0, view FALSE, init type lazy [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType hip not available [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType sycl not available [0] PetscInitialize_Common(): PETSc successfully started: number of processors = 1 [0] PetscGetHostName(): Rejecting domainname, likely is NIS lancer.(none) [0] PetscInitialize_Common(): Running on machine: lancer # [Info] Petsc initialization complete. # [Trace] Timing: Starting solver... # [Info] RNG initial conditions have mean 0.000004, renormalizing. # [Trace] Timing: PetscTimeIntegrator initialization... # [Trace] Timing: Allocating Petsc CUDA arrays... [0] PetscCommDuplicate(): Duplicating a communicator 2 3 max tags = 100000000 [0] configure(): Configured device 0 [0] PetscCommDuplicate(): Using internal PETSc communicator 2 3 # [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439 seconds. [0] PetscCommDuplicate(): Using internal PETSc communicator 2 3 [0] PetscCommDuplicate(): Duplicating a communicator 1 4 max tags = 100000000 [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 [0] DMGetDMTS(): Creating new DMTS [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 [0] DMGetDMSNES(): Creating new DMSNES [0] DMGetDMSNESWrite(): Copying DMSNES due to write # [Info] Initializing petsc with ode23 integrator # [Trace] Timing: PetscTimeIntegrator initialization finished in 0.016754 seconds. [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 [0] PetscDeviceContextSetupGlobalContext_Private(): Initializing global PetscDeviceContext with device type cuda [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: GPU error [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 [0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu Jan 5 15:39:14 2023 [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++ --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0 --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/ --download-hwloc=1 [0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255 [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/ cupmcontext.cu:10 [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244 [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259 [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52 [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84 [0]PETSC ERROR: #7 PetscDeviceContextGetCurrentContextAssertType_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371 [0]PETSC ERROR: #8 PetscCUBLASGetHandle() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/ cupmcontext.cu:23 [0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/ veccuda2.cu:261 [0]PETSC ERROR: #10 VecMAXPY() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221 [0]PETSC ERROR: #11 TSStep_RK() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814 [0]PETSC ERROR: #12 TSStep() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424 [0]PETSC ERROR: #13 TSSolve() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814 -------------- next part -------------- An HTML attachment was scrubbed... URL: From fellypao at yahoo.com.br Thu Jan 5 14:50:35 2023 From: fellypao at yahoo.com.br (Fellype) Date: Thu, 5 Jan 2023 20:50:35 +0000 (UTC) Subject: [petsc-users] How to install in /usr/lib64 instead of /usr/lib? References: <678527414.8071249.1672951835023.ref@mail.yahoo.com> Message-ID: <678527414.8071249.1672951835023@mail.yahoo.com> Hi, I'm building petsc from sources on a 64-bit Slackware Linux and I would like to know how to install the libraries in /usr/lib64 instead of /usr/lib. Is it possible? I've not found an option like --libdir=DIR to pass to ./configure. Regards, Fellype -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Thu Jan 5 15:00:44 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 5 Jan 2023 15:00:44 -0600 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: <87eds8hfxi.fsf@jedbrown.org> References: <87eds8hfxi.fsf@jedbrown.org> Message-ID: On Thu, Jan 5, 2023 at 2:42 PM Jed Brown wrote: > Mark Adams writes: > > > Support of HIP and CUDA hardware together would be crazy, > > I don't think it's remotely crazy. libCEED supports both together and it's > very convenient when testing on a development machine that has one of each > brand GPU and simplifies binary distribution for us and every package that > uses us. Every day I wish PETSc could build with both simultaneously, but > everyone tells me it's silly. > So an executable supports both GPUs, but a running instance supports one or both at the same time? -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Jan 5 15:00:59 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 5 Jan 2023 15:00:59 -0600 (CST) Subject: [petsc-users] How to install in /usr/lib64 instead of /usr/lib? In-Reply-To: <678527414.8071249.1672951835023@mail.yahoo.com> References: <678527414.8071249.1672951835023.ref@mail.yahoo.com> <678527414.8071249.1672951835023@mail.yahoo.com> Message-ID: <4ab0b0a8-27de-a014-4e9e-35e4539b7c78@mcs.anl.gov> For now - perhaps the following patch... Satish --- diff --git a/config/install.py b/config/install.py index 017bb736542..00f857f939e 100755 --- a/config/install.py +++ b/config/install.py @@ -76,9 +76,9 @@ class Installer(script.Script): self.archBinDir = os.path.join(self.rootDir, self.arch, 'bin') self.archLibDir = os.path.join(self.rootDir, self.arch, 'lib') self.destIncludeDir = os.path.join(self.destDir, 'include') - self.destConfDir = os.path.join(self.destDir, 'lib','petsc','conf') - self.destLibDir = os.path.join(self.destDir, 'lib') - self.destBinDir = os.path.join(self.destDir, 'lib','petsc','bin') + self.destConfDir = os.path.join(self.destDir, 'lib64','petsc','conf') + self.destLibDir = os.path.join(self.destDir, 'lib64') + self.destBinDir = os.path.join(self.destDir, 'lib64','petsc','bin') self.installIncludeDir = os.path.join(self.installDir, 'include') self.installBinDir = os.path.join(self.installDir, 'lib','petsc','bin') self.rootShareDir = os.path.join(self.rootDir, 'share') On Thu, 5 Jan 2023, Fellype via petsc-users wrote: > Hi, > I'm building petsc from sources on a 64-bit Slackware Linux and I would like to know how to install the libraries in /usr/lib64 instead of /usr/lib. Is it possible? I've not found an option like --libdir=DIR to pass to ./configure. > > Regards, > Fellype From mlohry at gmail.com Thu Jan 5 16:30:24 2023 From: mlohry at gmail.com (Mark Lohry) Date: Thu, 5 Jan 2023 17:30:24 -0500 Subject: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported In-Reply-To: References: Message-ID: I'm seeing the same thing on latest main with a different machine and -sm52 card, cuda 11.8. make check fails with the below, where the indicated line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(&mempool, static_cast(device->deviceId))); in the initialize function. 
Running check examples to verify correct installation Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes 2,17c2,46 < 0 SNES Function norm 2.391552133017e-01 < 0 KSP Residual norm 2.928487269734e-01 < 1 KSP Residual norm 1.876489580142e-02 < 2 KSP Residual norm 3.291394847944e-03 < 3 KSP Residual norm 2.456493072124e-04 < 4 KSP Residual norm 1.161647147715e-05 < 5 KSP Residual norm 1.285648407621e-06 < 1 SNES Function norm 6.846805706142e-05 < 0 KSP Residual norm 2.292783790384e-05 < 1 KSP Residual norm 2.100673631699e-06 < 2 KSP Residual norm 2.121341386147e-07 < 3 KSP Residual norm 2.455932678957e-08 < 4 KSP Residual norm 1.753095730744e-09 < 5 KSP Residual norm 7.489214418904e-11 < 2 SNES Function norm 2.103908447865e-10 < Number of SNES iterations = 2 --- > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: GPU error > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 source: command line > [0]PETSC ERROR: Option left: name:-nox (no value) source: environment > [0]PETSC ERROR: Option left: name:-nox_warning (no value) source: environment > [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10 source: command line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb GIT Date: 2023-01-05 17:22:48 +0000 > [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry Thu Jan 5 17:25:17 2023 > [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1 > [0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249 > [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/ cupmcontext.cu:10 > [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247 > [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260 > [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52 > [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84 > [0]PETSC ERROR: #7 GetHandleDispatch_() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499 > [0]PETSC ERROR: #8 create() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069 > [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10 > [0]PETSC ERROR: #10 VecSetType() at /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89 > [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31 > [0]PETSC ERROR: #12 DMCreateGlobalVector() at /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023 > [0]PETSC ERROR: #13 main() at ex19.c:149 On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry wrote: > I'm trying to 
compile the cuda example > > ./config/examples/arch-ci-linux-cuda-double-64idx.py > --with-cudac=/usr/local/cuda-11.5/bin/nvcc > > and running make test passes the test ok > diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy > but the eager variant fails, pasted below. > > I get a similar error running my client code, pasted after. There when > running with -info, it seems that some lazy initialization happens first, > and i also call VecCreateSeqCuda which seems to have no issue. > > Any idea? This happens to be with an -sm 3.5 device if it matters, > otherwise it's a recent cuda compiler+driver. > > > petsc test code output: > > > > not ok > sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager # > Error code: 97 > # [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > # [0]PETSC ERROR: GPU error > # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not > supported > # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > # [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 > # [0]PETSC ERROR: ../ex1 on a named lancer by mlohry Thu Jan 5 15:22:33 > 2023 > # [0]PETSC ERROR: Configure options > --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2 > --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g > -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 > --with-cuda=1 --with-precision=double --with-clanguage=c > --with-cudac=/usr/local/cuda-11.5/bin/nvcc > PETSC_ARCH=arch-ci-linux-cuda-double-64idx > # [0]PETSC ERROR: #1 CUPMAwareMPI_() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194 > # [0]PETSC ERROR: #2 initialize() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71 > # [0]PETSC ERROR: #3 init_device_id_() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290 > # [0]PETSC ERROR: #4 getDevice() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99 > # [0]PETSC ERROR: #5 PetscDeviceCreate() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104 > # [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375 > # [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499 > # [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634 > # [0]PETSC ERROR: #9 PetscInitialize_Common() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001 > # [0]PETSC ERROR: #10 PetscInitialize() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267 > # [0]PETSC ERROR: #11 main() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12 > # [0]PETSC ERROR: PETSc Option Table entries: > # [0]PETSC ERROR: -default_device_type 
host > # [0]PETSC ERROR: -device_enable eager > # [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > > > > > > solver code output: > > > > [0] PetscDetermineInitialFPTrap(): Floating point trapping is off by > default 0 > [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType > host available, initializing > [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice host > initialized, default device id 0, view FALSE, init type lazy > [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType > cuda available, initializing > [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice cuda > initialized, default device id 0, view FALSE, init type lazy > [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType > hip not available > [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType > sycl not available > [0] PetscInitialize_Common(): PETSc successfully started: number of > processors = 1 > [0] PetscGetHostName(): Rejecting domainname, likely is NIS > lancer.(none) > [0] PetscInitialize_Common(): Running on machine: lancer > # [Info] Petsc initialization complete. > # [Trace] Timing: Starting solver... > # [Info] RNG initial conditions have mean 0.000004, renormalizing. > # [Trace] Timing: PetscTimeIntegrator initialization... > # [Trace] Timing: Allocating Petsc CUDA arrays... > [0] PetscCommDuplicate(): Duplicating a communicator 2 3 max tags = > 100000000 > [0] configure(): Configured device 0 > [0] PetscCommDuplicate(): Using internal PETSc communicator 2 3 > # [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439 > seconds. > [0] PetscCommDuplicate(): Using internal PETSc communicator 2 3 > [0] PetscCommDuplicate(): Duplicating a communicator 1 4 max tags = > 100000000 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 > [0] DMGetDMTS(): Creating new DMTS > [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 > [0] DMGetDMSNES(): Creating new DMSNES > [0] DMGetDMSNESWrite(): Copying DMSNES due to write > # [Info] Initializing petsc with ode23 integrator > # [Trace] Timing: PetscTimeIntegrator initialization finished in 0.016754 > seconds. > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 > [0] PetscDeviceContextSetupGlobalContext_Private(): Initializing > global PetscDeviceContext with device type cuda > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: GPU error > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not > supported > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu Jan > 5 15:39:14 2023 > [0]PETSC ERROR: Configure options > PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc > PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++ > --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS > COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0 > --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc > --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/ > --download-hwloc=1 > [0]PETSC ERROR: #1 initialize() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255 > [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/ > cupmcontext.cu:10 > [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244 > [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259 > [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52 > [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84 > [0]PETSC ERROR: #7 > PetscDeviceContextGetCurrentContextAssertType_Internal() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371 > [0]PETSC ERROR: #8 PetscCUBLASGetHandle() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/ > cupmcontext.cu:23 > [0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/ > veccuda2.cu:261 > [0]PETSC ERROR: #10 VecMAXPY() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221 > [0]PETSC ERROR: #11 TSStep_RK() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814 > [0]PETSC ERROR: #12 TSStep() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424 > [0]PETSC ERROR: #13 TSSolve() at > /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 5 17:01:50 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 5 Jan 2023 18:01:50 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: <87eds8hfxi.fsf@jedbrown.org> References: <87eds8hfxi.fsf@jedbrown.org> Message-ID: On Thu, Jan 5, 2023 at 3:42 PM Jed Brown wrote: > Mark Adams writes: > > > Support of HIP and CUDA hardware together would be crazy, > > I don't think it's remotely crazy. 
libCEED supports both together and it's > very convenient when testing on a development machine that has one of each > brand GPU and simplifies binary distribution for us and every package that > uses us. Every day I wish PETSc could build with both simultaneously, but > everyone tells me it's silly. > This is how I always understood our plan. I think it is crazy _not_ to do this. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Jan 5 17:29:50 2023 From: jed at jedbrown.org (Jed Brown) Date: Thu, 05 Jan 2023 16:29:50 -0700 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: <87eds8hfxi.fsf@jedbrown.org> Message-ID: <87bknch85t.fsf@jedbrown.org> Junchao Zhang writes: >> I don't think it's remotely crazy. libCEED supports both together and it's >> very convenient when testing on a development machine that has one of each >> brand GPU and simplifies binary distribution for us and every package that >> uses us. Every day I wish PETSc could build with both simultaneously, but >> everyone tells me it's silly. >> > > So an executable supports both GPUs, but a running instance supports one > or both at the same time? I personally only have reason to instantiate one at a time within a given executable, though libCEED supports both instantiated at the same time. From junchao.zhang at gmail.com Thu Jan 5 17:37:42 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 5 Jan 2023 17:37:42 -0600 Subject: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported In-Reply-To: References: Message-ID: Jacob, is it because the cuda arch is too old? --Junchao Zhang On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry wrote: > I'm seeing the same thing on latest main with a different machine and > -sm52 card, cuda 11.8. make check fails with the below, where the indicated > line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(&mempool, > static_cast(device->deviceId))); in the initialize function. > > > Running check examples to verify correct installation > Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug > C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process > C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes > 2,17c2,46 > < 0 SNES Function norm 2.391552133017e-01 > < 0 KSP Residual norm 2.928487269734e-01 > < 1 KSP Residual norm 1.876489580142e-02 > < 2 KSP Residual norm 3.291394847944e-03 > < 3 KSP Residual norm 2.456493072124e-04 > < 4 KSP Residual norm 1.161647147715e-05 > < 5 KSP Residual norm 1.285648407621e-06 > < 1 SNES Function norm 6.846805706142e-05 > < 0 KSP Residual norm 2.292783790384e-05 > < 1 KSP Residual norm 2.100673631699e-06 > < 2 KSP Residual norm 2.121341386147e-07 > < 3 KSP Residual norm 2.455932678957e-08 > < 4 KSP Residual norm 1.753095730744e-09 > < 5 KSP Residual norm 7.489214418904e-11 > < 2 SNES Function norm 2.103908447865e-10 > < Number of SNES iterations = 2 > --- > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: GPU error > > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not > supported > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! 
> Could be the program crashed before they were used or a spelling mistake, > etc! > > [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 source: > command line > > [0]PETSC ERROR: Option left: name:-nox (no value) source: environment > > [0]PETSC ERROR: Option left: name:-nox_warning (no value) source: > environment > > [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10 > source: command line > > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb > GIT Date: 2023-01-05 17:22:48 +0000 > > [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry > Thu Jan 5 17:25:17 2023 > > [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1 > > [0]PETSC ERROR: #1 initialize() at > /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249 > > [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at > /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/ > cupmcontext.cu:10 > > [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at > /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247 > > [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() > at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260 > > [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at > /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52 > > [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at > /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84 > > [0]PETSC ERROR: #7 GetHandleDispatch_() at > /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499 > > [0]PETSC ERROR: #8 create() at > /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069 > > [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at > /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10 > > [0]PETSC ERROR: #10 VecSetType() at > /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89 > > [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at > /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31 > > [0]PETSC ERROR: #12 DMCreateGlobalVector() at > /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023 > > [0]PETSC ERROR: #13 main() at ex19.c:149 > > > On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry wrote: > >> I'm trying to compile the cuda example >> >> ./config/examples/arch-ci-linux-cuda-double-64idx.py >> --with-cudac=/usr/local/cuda-11.5/bin/nvcc >> >> and running make test passes the test ok >> diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy >> but the eager variant fails, pasted below. >> >> I get a similar error running my client code, pasted after. There when >> running with -info, it seems that some lazy initialization happens first, >> and i also call VecCreateSeqCuda which seems to have no issue. >> >> Any idea? This happens to be with an -sm 3.5 device if it matters, >> otherwise it's a recent cuda compiler+driver. >> >> >> petsc test code output: >> >> >> >> not ok >> sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager # >> Error code: 97 >> # [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> # [0]PETSC ERROR: GPU error >> # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not >> supported >> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble >> shooting. 
>> # [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 >> # [0]PETSC ERROR: ../ex1 on a named lancer by mlohry Thu Jan 5 15:22:33 >> 2023 >> # [0]PETSC ERROR: Configure options >> --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2 >> --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g >> -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 >> --with-cuda=1 --with-precision=double --with-clanguage=c >> --with-cudac=/usr/local/cuda-11.5/bin/nvcc >> PETSC_ARCH=arch-ci-linux-cuda-double-64idx >> # [0]PETSC ERROR: #1 CUPMAwareMPI_() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194 >> # [0]PETSC ERROR: #2 initialize() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71 >> # [0]PETSC ERROR: #3 init_device_id_() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290 >> # [0]PETSC ERROR: #4 getDevice() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99 >> # [0]PETSC ERROR: #5 PetscDeviceCreate() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104 >> # [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375 >> # [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499 >> # [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634 >> # [0]PETSC ERROR: #9 PetscInitialize_Common() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001 >> # [0]PETSC ERROR: #10 PetscInitialize() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267 >> # [0]PETSC ERROR: #11 main() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12 >> # [0]PETSC ERROR: PETSc Option Table entries: >> # [0]PETSC ERROR: -default_device_type host >> # [0]PETSC ERROR: -device_enable eager >> # [0]PETSC ERROR: ----------------End of Error Message -------send entire >> error message to petsc-maint at mcs.anl.gov---------- >> >> >> >> >> >> solver code output: >> >> >> >> [0] PetscDetermineInitialFPTrap(): Floating point trapping is off >> by default 0 >> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType >> host available, initializing >> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice >> host initialized, default device id 0, view FALSE, init type lazy >> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType >> cuda available, initializing >> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice >> cuda initialized, default device id 0, view FALSE, init type lazy >> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType >> hip not available >> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType >> sycl not available >> [0] PetscInitialize_Common(): PETSc successfully started: number of >> 
processors = 1 >> [0] PetscGetHostName(): Rejecting domainname, likely is NIS >> lancer.(none) >> [0] PetscInitialize_Common(): Running on machine: lancer >> # [Info] Petsc initialization complete. >> # [Trace] Timing: Starting solver... >> # [Info] RNG initial conditions have mean 0.000004, renormalizing. >> # [Trace] Timing: PetscTimeIntegrator initialization... >> # [Trace] Timing: Allocating Petsc CUDA arrays... >> [0] PetscCommDuplicate(): Duplicating a communicator 2 3 max tags = >> 100000000 >> [0] configure(): Configured device 0 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 2 3 >> # [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439 >> seconds. >> [0] PetscCommDuplicate(): Using internal PETSc communicator 2 3 >> [0] PetscCommDuplicate(): Duplicating a communicator 1 4 max tags = >> 100000000 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >> [0] DMGetDMTS(): Creating new DMTS >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >> [0] DMGetDMSNES(): Creating new DMSNES >> [0] DMGetDMSNESWrite(): Copying DMSNES due to write >> # [Info] Initializing petsc with ode23 integrator >> # [Trace] Timing: PetscTimeIntegrator initialization finished in 0.016754 >> seconds. >> >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >> [0] PetscDeviceContextSetupGlobalContext_Private(): Initializing >> global PetscDeviceContext with device type cuda >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: GPU error >> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not >> supported >> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
>> [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 >> [0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu >> Jan 5 15:39:14 2023 >> [0]PETSC ERROR: Configure options >> PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc >> PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++ >> --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS >> COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0 >> --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc >> --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/ >> --download-hwloc=1 >> [0]PETSC ERROR: #1 initialize() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255 >> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/ >> cupmcontext.cu:10 >> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244 >> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() >> at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259 >> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52 >> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84 >> [0]PETSC ERROR: #7 >> PetscDeviceContextGetCurrentContextAssertType_Internal() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371 >> [0]PETSC ERROR: #8 PetscCUBLASGetHandle() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/ >> cupmcontext.cu:23 >> [0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/ >> veccuda2.cu:261 >> [0]PETSC ERROR: #10 VecMAXPY() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221 >> [0]PETSC ERROR: #11 TSStep_RK() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814 >> [0]PETSC ERROR: #12 TSStep() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424 >> [0]PETSC ERROR: #13 TSSolve() at >> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814 >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Jan 5 22:21:36 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 5 Jan 2023 23:21:36 -0500 Subject: [petsc-users] setting a vector with VecSetValue versus VecSetValues In-Reply-To: References: Message-ID: <56303D01-ED75-4F03-A452-A3C8C54A08E5@petsc.dev> Note there is also VecSetValuesLocal() that takes ghosted local indices (ghosted local indices are different from your meaning of "local indices"). 
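A minimal sketch of that pattern, assuming x is a parallel Vec with nlocal owned entries, u[] holds the local values, and gidx[] (length nlocal+nghost) lists the global index of each owned entry followed by the ghosts; these names are only illustrative:

  ISLocalToGlobalMapping ltog;
  /* gidx[] is filled by the application: owned entries first, then ghosts */
  ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD,1,nlocal+nghost,gidx,PETSC_COPY_VALUES,&ltog);
  VecSetLocalToGlobalMapping(x,ltog);
  ISLocalToGlobalMappingDestroy(&ltog);

  for (PetscInt i = 0; i < nlocal; i++) {
    PetscScalar val = (PetscScalar)u[i];
    VecSetValuesLocal(x,1,&i,&val,INSERT_VALUES); /* i is a ghosted local index, not a global one */
  }
  VecAssemblyBegin(x);
  VecAssemblyEnd(x);

Once the mapping is attached, off-process (ghost) entries can also be set through their local indices; they are communicated during the assembly calls.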
See https://petsc.org/release/docs/manualpages/Vec/VecSetValuesLocal/ https://petsc.org/release/docs/manualpages/Vec/VecSetLocalToGlobalMapping/ Barry > On Jan 5, 2023, at 12:41 PM, Alfredo Jaramillo wrote: > > omg... for some reason I was thinking it takes local indices. Yes.. modifying that line the code works well. > > thank you! > Alfredo > > On Thu, Jan 5, 2023 at 10:38 AM Junchao Zhang > wrote: >> VecSetValue() also needs global indices, so you need PetscInt gl_row = (PetscInt)(i)+rstart; >> >> --Junchao Zhang >> >> >> On Thu, Jan 5, 2023 at 11:25 AM Alfredo Jaramillo > wrote: >>> dear PETSc developers, >>> >>> I have a code where I copy an array to a distributed petsc vector with the next lines: >>> >>> 1 for (int i = 0; i < ndof_local; i++) { >>> 2 PetscInt gl_row = (PetscInt)(i)+rstart; >>> 3 PetscScalar val = (PetscScalar)u[i]; >>> 4 VecSetValues(x,1,&gl_row,&val,INSERT_VALUES); >>> 5 } >>> >>> // for (int i = 0; i < ndof_local; i++) { >>> // PetscInt gl_row = (PetscInt)(i); >>> // PetscScalar val = (PetscScalar)u[i]; >>> // VecSetValue(x,gl_row,val,INSERT_VALUES); >>> // } >>> >>> VecAssemblyBegin(x); >>> VecAssemblyEnd(x); >>> >>> This works as expected. If, instead of using lines 1-5, I use the lines where VecSetValue is used with local indices, then the vector is null on all the processes but rank 0, and the piece of information at rank zero is incorrect. >>> >>> What could I be doing wrong? >>> >>> bests regards >>> Alfredo -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Jan 5 22:31:53 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 5 Jan 2023 23:31:53 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: <87eds8hfxi.fsf@jedbrown.org> References: <87eds8hfxi.fsf@jedbrown.org> Message-ID: <141D5FC1-1BF9-4809-B67B-0726759E273A@petsc.dev> > On Jan 5, 2023, at 3:42 PM, Jed Brown wrote: > > Mark Adams writes: > >> Support of HIP and CUDA hardware together would be crazy, > > I don't think it's remotely crazy. libCEED supports both together and it's very convenient when testing on a development machine that has one of each brand GPU and simplifies binary distribution for us and every package that uses us. Every day I wish PETSc could build with both simultaneously, but everyone tells me it's silly. Not everyone at all; just a subset of everyone. Junchao is really the hold-out :-) I just don't care about "binary packages" :-); I think they are an archaic and bad way of thinking about code distribution (but yes the alternatives need lots of work to make them flawless, but I think that is where the work should go in the packaging world.) I go further and think one should be able to automatically use a CUDA vector on a HIP device as well, it is not hard in theory but requires thinking about how we handle classes and subclasses a little to make it straightforward; or perhaps Jacob has fixed that also? 
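To make the single-binary picture concrete, here is a minimal sketch of runtime backend selection as it exists today: one executable, with the Vec implementation picked from the options database, e.g. -vec_type cuda or -vec_type hip. Each choice assumes the corresponding backend was configured into the build, which is exactly the limitation being debated in this thread.

  #include <petscvec.h>

  int main(int argc, char **argv)
  {
    Vec x;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(VecCreate(PETSC_COMM_WORLD, &x));
    PetscCall(VecSetSizes(x, PETSC_DECIDE, 100));
    PetscCall(VecSetFromOptions(x)); /* -vec_type standard | cuda | hip | kokkos selects the implementation */
    PetscCall(VecSet(x, 1.0));
    PetscCall(VecScale(x, 2.0));     /* runs on whichever backend was selected at run time */
    PetscCall(VecDestroy(&x));
    PetscCall(PetscFinalize());
    return 0;
  }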
From junchao.zhang at gmail.com Thu Jan 5 22:50:48 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 5 Jan 2023 22:50:48 -0600 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: <141D5FC1-1BF9-4809-B67B-0726759E273A@petsc.dev> References: <87eds8hfxi.fsf@jedbrown.org> <141D5FC1-1BF9-4809-B67B-0726759E273A@petsc.dev> Message-ID: On Thu, Jan 5, 2023 at 10:32 PM Barry Smith wrote: > > > > On Jan 5, 2023, at 3:42 PM, Jed Brown wrote: > > > > Mark Adams writes: > > > >> Support of HIP and CUDA hardware together would be crazy, > > > > I don't think it's remotely crazy. libCEED supports both together and > it's very convenient when testing on a development machine that has one of > each brand GPU and simplifies binary distribution for us and every package > that uses us. Every day I wish PETSc could build with both simultaneously, > but everyone tells me it's silly. > > Not everyone at all; just a subset of everyone. Junchao is really the > hold-out :-) > I am not, instead I think we should try (I fully agree it can ease binary distribution). But satish needs to install such a machine first :) There are issues out of our control if we want to mix GPUs in execution. For example, how to do VexAXPY on a cuda vector and a hip vector? Shall we do it on the host? Also, there are no gpu-aware MPI implementations supporting messages between cuda memory and hip memory. > > I just don't care about "binary packages" :-); I think they are an > archaic and bad way of thinking about code distribution (but yes the > alternatives need lots of work to make them flawless, but I think that is > where the work should go in the packaging world.) > > I go further and think one should be able to automatically use a CUDA > vector on a HIP device as well, it is not hard in theory but requires > thinking about how we handle classes and subclasses a little to make it > straightforward; or perhaps Jacob has fixed that also? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Jan 5 23:27:06 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 6 Jan 2023 00:27:06 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: <87eds8hfxi.fsf@jedbrown.org> <141D5FC1-1BF9-4809-B67B-0726759E273A@petsc.dev> Message-ID: <7130D391-34D2-4459-99D9-E09F0D20A987@petsc.dev> So Jed's "everyone" now consists of "no one" and Jed can stop complaining that "everyone" thinks it is a bad idea. > On Jan 5, 2023, at 11:50 PM, Junchao Zhang wrote: > > > > > On Thu, Jan 5, 2023 at 10:32 PM Barry Smith > wrote: >> >> >> > On Jan 5, 2023, at 3:42 PM, Jed Brown > wrote: >> > >> > Mark Adams > writes: >> > >> >> Support of HIP and CUDA hardware together would be crazy, >> > >> > I don't think it's remotely crazy. libCEED supports both together and it's very convenient when testing on a development machine that has one of each brand GPU and simplifies binary distribution for us and every package that uses us. Every day I wish PETSc could build with both simultaneously, but everyone tells me it's silly. >> >> Not everyone at all; just a subset of everyone. Junchao is really the hold-out :-) > I am not, instead I think we should try (I fully agree it can ease binary distribution). But satish needs to install such a machine first :) > There are issues out of our control if we want to mix GPUs in execution. For example, how to do VexAXPY on a cuda vector and a hip vector? Shall we do it on the host? 
Also, there are no gpu-aware MPI implementations supporting messages between cuda memory and hip memory. >> >> I just don't care about "binary packages" :-); I think they are an archaic and bad way of thinking about code distribution (but yes the alternatives need lots of work to make them flawless, but I think that is where the work should go in the packaging world.) >> >> I go further and think one should be able to automatically use a CUDA vector on a HIP device as well, it is not hard in theory but requires thinking about how we handle classes and subclasses a little to make it straightforward; or perhaps Jacob has fixed that also? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Fri Jan 6 01:27:56 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Fri, 6 Jan 2023 02:27:56 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets Message-ID: Hi Petsc Users, I'm working with a dmplex system with a subsampled mesh distributed with an overlap of 1. I'm encountering unusual situations when using VecGetOwnershipRange to adjust the offset received from a global section. The logic of the following code is first to get the offset needed to index a global vector while still being able to check if it is an overlapped cell and skip if needed while counting the owned cells. call DMGetGlobalSection(dmplex,section,ierr) call VecGetArrayF90(stateVec,stateVecV,ierr) call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) do i = c0, (c1-1) call PetscSectionGetOffset(section,i,offset,ierr) write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart if(offset<0) then cycle endif offset=offset-oStart plexcells=plexcells+1 stateVecV(offset)= enddo I'm noticing some very weird results that I've appended below. The GetOffset documentation notes that a negative offset indicates an unowned point (which I use to cycle). However, the offset subtraction with oStart will yield an illegal index for the Vector access. I see that on the documentation for GetOwnershipRange, it notes that this may be "ill-defined" but I wanted to see if this is type of ill-defined I can expect or there is just something terribly wrong with my PetscSection.(both the Vec and Section were produced from DMPlexDistributeField so should by definition have synchronized section information) I was wondering if there is a possible output and/or the best way to index the vector. I'm thinking of subtracting the offset of cell 0 perhaps? 
on rank 0 cell 0 offset 0 oStart 0 0 cell 1 offset 55 oStart 0 55 cell 2 offset 110 oStart 0 110 cell 3 offset 165 oStart 0 165 cell 4 offset 220 oStart 0 220 cell 5 offset 275 oStart 0 275 cell 6 offset 330 oStart 0 330 cell 7 offset 385 oStart 0 385 cell 8 offset 440 oStart 0 440 cell 9 offset 495 oStart 0 495 cell 10 offset 550 oStart 0 550 cell 11 offset 605 oStart 0 605 cell 12 offset 660 oStart 0 660 cell 13 offset 715 oStart 0 715 and on rank one cell 0 offset 2475 oStart 2640 -165 cell 1 offset 2530 oStart 2640 -110 cell 2 offset 2585 oStart 2640 -55 cell 3 offset 2640 oStart 2640 0 cell 4 offset 2695 oStart 2640 55 cell 5 offset 2750 oStart 2640 110 cell 6 offset 2805 oStart 2640 165 cell 7 offset 2860 oStart 2640 220 cell 8 offset 2915 oStart 2640 275 cell 9 offset 2970 oStart 2640 330 cell 10 offset 3025 oStart 2640 385 cell 11 offset 3080 oStart 2640 440 cell 12 offset 3135 oStart 2640 495 cell 13 offset 3190 oStart 2640 550 cell 14 offset 3245 oStart 2640 605 cell 15 offset -771 oStart 2640 -3411 Sincerely Nicholas -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Jan 6 03:09:18 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 6 Jan 2023 10:09:18 +0100 Subject: [petsc-users] error when trying to compile with HPDDM In-Reply-To: References: Message-ID: <2CDE8F62-87E2-4C15-AF85-E4DC1E2DD9F7@dsic.upv.es> This happens because you have typed 'make -j128'. If you just do 'make', PETSc will choose a reasonable value (-j59 in your case). Satish: do we want to support this use case? Then a possible fix is: diff --git a/config/BuildSystem/config/packages/slepc.py b/config/BuildSystem/config/packages/slepc.py index b7c80930750..2688403f908 100644 --- a/config/BuildSystem/config/packages/slepc.py +++ b/config/BuildSystem/config/packages/slepc.py @@ -63,7 +63,7 @@ class Configure(config.package.Package): self.addMakeRule('slepcbuild','', \ ['@echo "*** Building SLEPc ***"',\ '@${RM} ${PETSC_ARCH}/lib/petsc/conf/slepc.errorflg',\ - '@(cd '+self.packageDir+' && \\\n\ + '+@(cd '+self.packageDir+' && \\\n\ '+carg+self.python.pyexe+' ./configure --prefix='+prefix+' '+configargs+' && \\\n\ '+barg+'${OMAKE} '+barg+') || \\\n\ (echo "**************************ERROR*************************************" && \\\n\ > El 5 ene 2023, a las 21:06, Alfredo Jaramillo escribi?: > > Hi Pierre, no, I don't really need that flag. I removed it and the installation process went well. I just noticed a "minor" detail when building SLEPc: > > "gmake[3]: warning: jobserver unavailable: using -j1. Add '+' to parent make rule" > > so the compilation of that library went slow. > > Thanks, > Alfredo From mlohry at gmail.com Fri Jan 6 08:17:35 2023 From: mlohry at gmail.com (Mark Lohry) Date: Fri, 6 Jan 2023 09:17:35 -0500 Subject: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported In-Reply-To: References: Message-ID: It built+ran fine on a different system with an sm75 arch. Is there a documented minimum version if that indeed is the cause? One minor hiccup FYI -- compilation of hypre fails with cuda toolkit 12, due to cusprase removing csrsv2Info_t (although it's still referenced in their docs...) in favor of bsrsv2Info_t. Rolling back to cuda toolkit 11.8 worked. On Thu, Jan 5, 2023 at 6:37 PM Junchao Zhang wrote: > Jacob, is it because the cuda arch is too old? 
> > --Junchao Zhang > > > On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry wrote: > >> I'm seeing the same thing on latest main with a different machine and >> -sm52 card, cuda 11.8. make check fails with the below, where the indicated >> line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(&mempool, >> static_cast(device->deviceId))); in the initialize function. >> >> >> Running check examples to verify correct installation >> Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug >> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process >> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI >> processes >> 2,17c2,46 >> < 0 SNES Function norm 2.391552133017e-01 >> < 0 KSP Residual norm 2.928487269734e-01 >> < 1 KSP Residual norm 1.876489580142e-02 >> < 2 KSP Residual norm 3.291394847944e-03 >> < 3 KSP Residual norm 2.456493072124e-04 >> < 4 KSP Residual norm 1.161647147715e-05 >> < 5 KSP Residual norm 1.285648407621e-06 >> < 1 SNES Function norm 6.846805706142e-05 >> < 0 KSP Residual norm 2.292783790384e-05 >> < 1 KSP Residual norm 2.100673631699e-06 >> < 2 KSP Residual norm 2.121341386147e-07 >> < 3 KSP Residual norm 2.455932678957e-08 >> < 4 KSP Residual norm 1.753095730744e-09 >> < 5 KSP Residual norm 7.489214418904e-11 >> < 2 SNES Function norm 2.103908447865e-10 >> < Number of SNES iterations = 2 >> --- >> > [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> > [0]PETSC ERROR: GPU error >> > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not >> supported >> > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! >> Could be the program crashed before they were used or a spelling mistake, >> etc! >> > [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 >> source: command line >> > [0]PETSC ERROR: Option left: name:-nox (no value) source: environment >> > [0]PETSC ERROR: Option left: name:-nox_warning (no value) source: >> environment >> > [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10 >> source: command line >> > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble >> shooting. 
>> > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb >> GIT Date: 2023-01-05 17:22:48 +0000 >> > [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry >> Thu Jan 5 17:25:17 2023 >> > [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1 >> > [0]PETSC ERROR: #1 initialize() at >> /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249 >> > [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at >> /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/ >> cupmcontext.cu:10 >> > [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at >> /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247 >> > [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() >> at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260 >> > [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at >> /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52 >> > [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at >> /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84 >> > [0]PETSC ERROR: #7 GetHandleDispatch_() at >> /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499 >> > [0]PETSC ERROR: #8 create() at >> /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069 >> > [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at >> /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10 >> > [0]PETSC ERROR: #10 VecSetType() at >> /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89 >> > [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at >> /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31 >> > [0]PETSC ERROR: #12 DMCreateGlobalVector() at >> /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023 >> > [0]PETSC ERROR: #13 main() at ex19.c:149 >> >> >> On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry wrote: >> >>> I'm trying to compile the cuda example >>> >>> ./config/examples/arch-ci-linux-cuda-double-64idx.py >>> --with-cudac=/usr/local/cuda-11.5/bin/nvcc >>> >>> and running make test passes the test ok >>> diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy >>> but the eager variant fails, pasted below. >>> >>> I get a similar error running my client code, pasted after. There when >>> running with -info, it seems that some lazy initialization happens first, >>> and i also call VecCreateSeqCuda which seems to have no issue. >>> >>> Any idea? This happens to be with an -sm 3.5 device if it matters, >>> otherwise it's a recent cuda compiler+driver. >>> >>> >>> petsc test code output: >>> >>> >>> >>> not ok >>> sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager # >>> Error code: 97 >>> # [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> # [0]PETSC ERROR: GPU error >>> # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not >>> supported >>> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble >>> shooting. 
>>> # [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 >>> # [0]PETSC ERROR: ../ex1 on a named lancer by mlohry Thu Jan 5 >>> 15:22:33 2023 >>> # [0]PETSC ERROR: Configure options >>> --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2 >>> --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g >>> -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 >>> --with-cuda=1 --with-precision=double --with-clanguage=c >>> --with-cudac=/usr/local/cuda-11.5/bin/nvcc >>> PETSC_ARCH=arch-ci-linux-cuda-double-64idx >>> # [0]PETSC ERROR: #1 CUPMAwareMPI_() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194 >>> # [0]PETSC ERROR: #2 initialize() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71 >>> # [0]PETSC ERROR: #3 init_device_id_() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290 >>> # [0]PETSC ERROR: #4 getDevice() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99 >>> # [0]PETSC ERROR: #5 PetscDeviceCreate() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104 >>> # [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375 >>> # [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499 >>> # [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634 >>> # [0]PETSC ERROR: #9 PetscInitialize_Common() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001 >>> # [0]PETSC ERROR: #10 PetscInitialize() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267 >>> # [0]PETSC ERROR: #11 main() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12 >>> # [0]PETSC ERROR: PETSc Option Table entries: >>> # [0]PETSC ERROR: -default_device_type host >>> # [0]PETSC ERROR: -device_enable eager >>> # [0]PETSC ERROR: ----------------End of Error Message -------send >>> entire error message to petsc-maint at mcs.anl.gov---------- >>> >>> >>> >>> >>> >>> solver code output: >>> >>> >>> >>> [0] PetscDetermineInitialFPTrap(): Floating point trapping is off >>> by default 0 >>> [0] PetscDeviceInitializeTypeFromOptions_Private(): >>> PetscDeviceType host available, initializing >>> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice >>> host initialized, default device id 0, view FALSE, init type lazy >>> [0] PetscDeviceInitializeTypeFromOptions_Private(): >>> PetscDeviceType cuda available, initializing >>> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice >>> cuda initialized, default device id 0, view FALSE, init type lazy >>> [0] PetscDeviceInitializeTypeFromOptions_Private(): >>> PetscDeviceType hip not available >>> [0] PetscDeviceInitializeTypeFromOptions_Private(): >>> PetscDeviceType sycl not available >>> [0] 
PetscInitialize_Common(): PETSc successfully started: number >>> of processors = 1 >>> [0] PetscGetHostName(): Rejecting domainname, likely is NIS >>> lancer.(none) >>> [0] PetscInitialize_Common(): Running on machine: lancer >>> # [Info] Petsc initialization complete. >>> # [Trace] Timing: Starting solver... >>> # [Info] RNG initial conditions have mean 0.000004, renormalizing. >>> # [Trace] Timing: PetscTimeIntegrator initialization... >>> # [Trace] Timing: Allocating Petsc CUDA arrays... >>> [0] PetscCommDuplicate(): Duplicating a communicator 2 3 max tags >>> = 100000000 >>> [0] configure(): Configured device 0 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 2 3 >>> # [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439 >>> seconds. >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 2 3 >>> [0] PetscCommDuplicate(): Duplicating a communicator 1 4 max tags >>> = 100000000 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >>> [0] DMGetDMTS(): Creating new DMTS >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >>> [0] DMGetDMSNES(): Creating new DMSNES >>> [0] DMGetDMSNESWrite(): Copying DMSNES due to write >>> # [Info] Initializing petsc with ode23 integrator >>> # [Trace] Timing: PetscTimeIntegrator initialization finished in >>> 0.016754 seconds. >>> >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >>> [0] PetscDeviceContextSetupGlobalContext_Private(): >>> Initializing global PetscDeviceContext with device type cuda >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: GPU error >>> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not >>> supported >>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
>>> [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 >>> [0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu >>> Jan 5 15:39:14 2023 >>> [0]PETSC ERROR: Configure options >>> PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc >>> PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++ >>> --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS >>> COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0 >>> --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc >>> --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/ >>> --download-hwloc=1 >>> [0]PETSC ERROR: #1 initialize() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255 >>> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/ >>> cupmcontext.cu:10 >>> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244 >>> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() >>> at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259 >>> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52 >>> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84 >>> [0]PETSC ERROR: #7 >>> PetscDeviceContextGetCurrentContextAssertType_Internal() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371 >>> [0]PETSC ERROR: #8 PetscCUBLASGetHandle() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/ >>> cupmcontext.cu:23 >>> [0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/ >>> veccuda2.cu:261 >>> [0]PETSC ERROR: #10 VecMAXPY() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221 >>> [0]PETSC ERROR: #11 TSStep_RK() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814 >>> [0]PETSC ERROR: #12 TSStep() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424 >>> [0]PETSC ERROR: #13 TSSolve() at >>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814 >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 6 08:20:18 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 6 Jan 2023 09:20:18 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Petsc Users, > > I'm working with a dmplex system with a subsampled mesh distributed with > an overlap of 1. 
> > I'm encountering unusual situations when using VecGetOwnershipRange to > adjust the offset received from a global section. The logic of the > following code is first to get the offset needed to index a global vector > while still being able to check if it is an overlapped cell and skip if > needed while counting the owned cells. > > > call DMGetGlobalSection(dmplex,section,ierr) > call VecGetArrayF90(stateVec,stateVecV,ierr) > call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) > do i = c0, (c1-1) > > call PetscSectionGetOffset(section,i,offset,ierr) > write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart > > if(offset<0) then > cycle > endif > offset=offset-oStart > plexcells=plexcells+1 > stateVecV(offset)= enddo > > I'm noticing some very weird results that I've appended below. The > GetOffset documentation notes that a negative offset indicates an unowned > point (which I use to cycle). However, the offset subtraction with oStart > will yield an illegal index for the Vector access. I see that on the > documentation for GetOwnershipRange, it notes that this may be > "ill-defined" but I wanted to see if this is type of ill-defined I can > expect or there is just something terribly wrong with my PetscSection.(both > the Vec and Section were produced from DMPlexDistributeField so should by > definition have synchronized section information) I was wondering if there > is a possible output and/or the best way to index the vector. I'm thinking > of subtracting the offset of cell 0 perhaps? > Can you show your vector sizes? Are you sure it is not the fact that F90 arrays use 1-based indices, but these are 0-based offsets? Thanks, Matt > on rank 0 > > cell 0 offset 0 oStart 0 0 > cell 1 offset 55 oStart 0 55 > cell 2 offset 110 oStart 0 110 > cell 3 offset 165 oStart 0 165 > cell 4 offset 220 oStart 0 220 > cell 5 offset 275 oStart 0 275 > cell 6 offset 330 oStart 0 330 > cell 7 offset 385 oStart 0 385 > cell 8 offset 440 oStart 0 440 > cell 9 offset 495 oStart 0 495 > cell 10 offset 550 oStart 0 550 > cell 11 offset 605 oStart 0 605 > cell 12 offset 660 oStart 0 660 > cell 13 offset 715 oStart 0 715 > > and on rank one > cell 0 offset 2475 oStart 2640 -165 > cell 1 offset 2530 oStart 2640 -110 > cell 2 offset 2585 oStart 2640 -55 > cell 3 offset 2640 oStart 2640 0 > cell 4 offset 2695 oStart 2640 55 > cell 5 offset 2750 oStart 2640 110 > cell 6 offset 2805 oStart 2640 165 > cell 7 offset 2860 oStart 2640 220 > cell 8 offset 2915 oStart 2640 275 > cell 9 offset 2970 oStart 2640 330 > cell 10 offset 3025 oStart 2640 385 > cell 11 offset 3080 oStart 2640 440 > cell 12 offset 3135 oStart 2640 495 > cell 13 offset 3190 oStart 2640 550 > cell 14 offset 3245 oStart 2640 605 > cell 15 offset -771 oStart 2640 -3411 > > > Sincerely > Nicholas > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Fri Jan 6 08:36:42 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Fri, 6 Jan 2023 09:36:42 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: Hi Matt I made a typo on the line statVecV(offset) = in my example, I agree. 
(I wrote that offhand since the actual assignment is much larger) I should be statVecV(offset+1) = so I'm confident it's not a 1 0 indexing thing. My question is more related to what is happening in the offsets. c0 and c1 are pulled using DMplexgetheight stratum, so they are zero-indexed (which is why I loop from c0 to (c1-1)). For the size inquiries. on processor 0 Petsc VecGetSize(stateVec) 5390 size(stateVecV) 2640 on processor 1 Petsc VecGetSize 5390 size(stateVecV) 2750 It's quite weird to me that processor one can have a positive offset that is less than its starting ownership index (in the initial email output). Thanks for the assistance Nicholas On Fri, Jan 6, 2023 at 9:20 AM Matthew Knepley wrote: > On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Petsc Users, >> >> I'm working with a dmplex system with a subsampled mesh distributed with >> an overlap of 1. >> >> I'm encountering unusual situations when using VecGetOwnershipRange to >> adjust the offset received from a global section. The logic of the >> following code is first to get the offset needed to index a global vector >> while still being able to check if it is an overlapped cell and skip if >> needed while counting the owned cells. >> > > >> >> call DMGetGlobalSection(dmplex,section,ierr) >> call VecGetArrayF90(stateVec,stateVecV,ierr) >> call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) >> do i = c0, (c1-1) >> >> call PetscSectionGetOffset(section,i,offset,ierr) >> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart >> >> if(offset<0) then >> cycle >> endif >> offset=offset-oStart >> plexcells=plexcells+1 >> stateVecV(offset)= enddo >> >> I'm noticing some very weird results that I've appended below. The >> GetOffset documentation notes that a negative offset indicates an unowned >> point (which I use to cycle). However, the offset subtraction with oStart >> will yield an illegal index for the Vector access. I see that on the >> documentation for GetOwnershipRange, it notes that this may be >> "ill-defined" but I wanted to see if this is type of ill-defined I can >> expect or there is just something terribly wrong with my PetscSection.(both >> the Vec and Section were produced from DMPlexDistributeField so should by >> definition have synchronized section information) I was wondering if there >> is a possible output and/or the best way to index the vector. I'm thinking >> of subtracting the offset of cell 0 perhaps? >> > > Can you show your vector sizes? Are you sure it is not the fact that F90 > arrays use 1-based indices, but these are 0-based offsets? 
> > Thanks, > > Matt > > >> on rank 0 >> >> cell 0 offset 0 oStart 0 0 >> cell 1 offset 55 oStart 0 55 >> cell 2 offset 110 oStart 0 110 >> cell 3 offset 165 oStart 0 165 >> cell 4 offset 220 oStart 0 220 >> cell 5 offset 275 oStart 0 275 >> cell 6 offset 330 oStart 0 330 >> cell 7 offset 385 oStart 0 385 >> cell 8 offset 440 oStart 0 440 >> cell 9 offset 495 oStart 0 495 >> cell 10 offset 550 oStart 0 550 >> cell 11 offset 605 oStart 0 605 >> cell 12 offset 660 oStart 0 660 >> cell 13 offset 715 oStart 0 715 >> >> and on rank one >> cell 0 offset 2475 oStart 2640 -165 >> cell 1 offset 2530 oStart 2640 -110 >> cell 2 offset 2585 oStart 2640 -55 >> cell 3 offset 2640 oStart 2640 0 >> cell 4 offset 2695 oStart 2640 55 >> cell 5 offset 2750 oStart 2640 110 >> cell 6 offset 2805 oStart 2640 165 >> cell 7 offset 2860 oStart 2640 220 >> cell 8 offset 2915 oStart 2640 275 >> cell 9 offset 2970 oStart 2640 330 >> cell 10 offset 3025 oStart 2640 385 >> cell 11 offset 3080 oStart 2640 440 >> cell 12 offset 3135 oStart 2640 495 >> cell 13 offset 3190 oStart 2640 550 >> cell 14 offset 3245 oStart 2640 605 >> cell 15 offset -771 oStart 2640 -3411 >> >> >> Sincerely >> Nicholas >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 6 08:50:51 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 6 Jan 2023 09:50:51 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: On Fri, Jan 6, 2023 at 9:37 AM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Matt > > I made a typo on the line statVecV(offset) = in my > example, I agree. (I wrote that offhand since the actual assignment is much > larger) I should be statVecV(offset+1) = so I'm confident it's > not a 1 0 indexing thing. > > My question is more related to what is happening in the offsets. c0 and c1 > are pulled using DMplexgetheight stratum, so they are zero-indexed (which > is why I loop from c0 to (c1-1)). > > For the size inquiries. on processor 0 > Petsc VecGetSize(stateVec) 5390 > I need to see VecGetLocalSize() Matt > size(stateVecV) 2640 > > on processor 1 > Petsc VecGetSize 5390 > size(stateVecV) 2750 > > It's quite weird to me that processor one can have a positive offset that > is less than its starting ownership index (in the initial email output). > > Thanks for the assistance > Nicholas > > > On Fri, Jan 6, 2023 at 9:20 AM Matthew Knepley wrote: > >> On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Hi Petsc Users, >>> >>> I'm working with a dmplex system with a subsampled mesh distributed with >>> an overlap of 1. >>> >>> I'm encountering unusual situations when using VecGetOwnershipRange to >>> adjust the offset received from a global section. The logic of the >>> following code is first to get the offset needed to index a global vector >>> while still being able to check if it is an overlapped cell and skip if >>> needed while counting the owned cells. 
>>> >> >> >>> >>> call DMGetGlobalSection(dmplex,section,ierr) >>> call VecGetArrayF90(stateVec,stateVecV,ierr) >>> call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) >>> do i = c0, (c1-1) >>> >>> call PetscSectionGetOffset(section,i,offset,ierr) >>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart >>> >>> if(offset<0) then >>> cycle >>> endif >>> offset=offset-oStart >>> plexcells=plexcells+1 >>> stateVecV(offset)= enddo >>> >>> I'm noticing some very weird results that I've appended below. The >>> GetOffset documentation notes that a negative offset indicates an unowned >>> point (which I use to cycle). However, the offset subtraction with oStart >>> will yield an illegal index for the Vector access. I see that on the >>> documentation for GetOwnershipRange, it notes that this may be >>> "ill-defined" but I wanted to see if this is type of ill-defined I can >>> expect or there is just something terribly wrong with my PetscSection.(both >>> the Vec and Section were produced from DMPlexDistributeField so should by >>> definition have synchronized section information) I was wondering if there >>> is a possible output and/or the best way to index the vector. I'm thinking >>> of subtracting the offset of cell 0 perhaps? >>> >> >> Can you show your vector sizes? Are you sure it is not the fact that F90 >> arrays use 1-based indices, but these are 0-based offsets? >> >> Thanks, >> >> Matt >> >> >>> on rank 0 >>> >>> cell 0 offset 0 oStart 0 0 >>> cell 1 offset 55 oStart 0 55 >>> cell 2 offset 110 oStart 0 110 >>> cell 3 offset 165 oStart 0 165 >>> cell 4 offset 220 oStart 0 220 >>> cell 5 offset 275 oStart 0 275 >>> cell 6 offset 330 oStart 0 330 >>> cell 7 offset 385 oStart 0 385 >>> cell 8 offset 440 oStart 0 440 >>> cell 9 offset 495 oStart 0 495 >>> cell 10 offset 550 oStart 0 550 >>> cell 11 offset 605 oStart 0 605 >>> cell 12 offset 660 oStart 0 660 >>> cell 13 offset 715 oStart 0 715 >>> >>> and on rank one >>> cell 0 offset 2475 oStart 2640 -165 >>> cell 1 offset 2530 oStart 2640 -110 >>> cell 2 offset 2585 oStart 2640 -55 >>> cell 3 offset 2640 oStart 2640 0 >>> cell 4 offset 2695 oStart 2640 55 >>> cell 5 offset 2750 oStart 2640 110 >>> cell 6 offset 2805 oStart 2640 165 >>> cell 7 offset 2860 oStart 2640 220 >>> cell 8 offset 2915 oStart 2640 275 >>> cell 9 offset 2970 oStart 2640 330 >>> cell 10 offset 3025 oStart 2640 385 >>> cell 11 offset 3080 oStart 2640 440 >>> cell 12 offset 3135 oStart 2640 495 >>> cell 13 offset 3190 oStart 2640 550 >>> cell 14 offset 3245 oStart 2640 605 >>> cell 15 offset -771 oStart 2640 -3411 >>> >>> >>> Sincerely >>> Nicholas >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
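Since the question at this point is whether the global-section offsets are consistent with the parallel layout of stateVec, a small per-rank check that prints VecGetLocalSize and VecGetOwnershipRange next to each cell's dof count and offset makes that comparison direct. What follows is a sketch only, not code from this thread: the subroutine name and variable names are invented, and dm/stateVec stand in for the objects used in the snippet quoted above.

subroutine CheckSectionLayout(dm, stateVec, ierr)
#include <petsc/finclude/petscdmplex.h>
#include <petsc/finclude/petscvec.h>
  use petscdmplex
  use petscvec
  implicit none
  DM             :: dm
  Vec            :: stateVec
  PetscErrorCode :: ierr
  PetscSection   :: section
  PetscInt       :: c0, c1, i, dof, offset, oStart, oEnd, nLocal
  PetscInt, parameter :: height = 0

  ! Vec layout on this rank
  call VecGetLocalSize(stateVec, nLocal, ierr)
  call VecGetOwnershipRange(stateVec, oStart, oEnd, ierr)
  write(*,*) 'local size', nLocal, 'ownership range', oStart, oEnd

  ! Global-section layout for the cells on this rank
  call DMGetGlobalSection(dm, section, ierr)
  call DMPlexGetHeightStratum(dm, height, c0, c1, ierr)
  do i = c0, c1 - 1
    call PetscSectionGetDof(section, i, dof, ierr)
    call PetscSectionGetOffset(section, i, offset, ierr)
    ! A negative dof or offset marks a point owned by another rank.
    ! For a Vec created from this same section (e.g. with
    ! DMCreateGlobalVector), owned points would normally satisfy
    ! oStart <= offset and offset + dof <= oEnd.
    write(*,*) 'cell', i, 'dof', dof, 'offset', offset
  end do
end subroutine CheckSectionLayout

If the owned offsets fall outside [oStart, oEnd), that usually means the Vec being indexed and the section being queried do not describe the same layout, which is what the rest of this thread is trying to pin down.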
URL: From mlohry at gmail.com Fri Jan 6 08:55:33 2023 From: mlohry at gmail.com (Mark Lohry) Date: Fri, 6 Jan 2023 09:55:33 -0500 Subject: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported In-Reply-To: References: Message-ID: These cards do indeed not support cudaDeviceGetMemPool -- cudaDeviceGetAttribute on cudaDevAttrMemoryPoolsSupported return false, meaning it doesn't support cudaMallocAsync, so the first point of failure is the call to cudaDeviceGetMemPool in the initialization. Would a workaround be to replace the cudaMallocAsync call to cudaMalloc and skip the mempool or is that a bad idea? On Fri, Jan 6, 2023 at 9:17 AM Mark Lohry wrote: > It built+ran fine on a different system with an sm75 arch. Is there a > documented minimum version if that indeed is the cause? > > One minor hiccup FYI -- compilation of hypre fails with cuda toolkit 12, > due to cusprase removing csrsv2Info_t (although it's still referenced in > their docs...) in favor of bsrsv2Info_t. Rolling back to cuda toolkit 11.8 > worked. > > > On Thu, Jan 5, 2023 at 6:37 PM Junchao Zhang > wrote: > >> Jacob, is it because the cuda arch is too old? >> >> --Junchao Zhang >> >> >> On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry wrote: >> >>> I'm seeing the same thing on latest main with a different machine and >>> -sm52 card, cuda 11.8. make check fails with the below, where the indicated >>> line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(&mempool, >>> static_cast(device->deviceId))); in the initialize function. >>> >>> >>> Running check examples to verify correct installation >>> Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug >>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process >>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI >>> processes >>> 2,17c2,46 >>> < 0 SNES Function norm 2.391552133017e-01 >>> < 0 KSP Residual norm 2.928487269734e-01 >>> < 1 KSP Residual norm 1.876489580142e-02 >>> < 2 KSP Residual norm 3.291394847944e-03 >>> < 3 KSP Residual norm 2.456493072124e-04 >>> < 4 KSP Residual norm 1.161647147715e-05 >>> < 5 KSP Residual norm 1.285648407621e-06 >>> < 1 SNES Function norm 6.846805706142e-05 >>> < 0 KSP Residual norm 2.292783790384e-05 >>> < 1 KSP Residual norm 2.100673631699e-06 >>> < 2 KSP Residual norm 2.121341386147e-07 >>> < 3 KSP Residual norm 2.455932678957e-08 >>> < 4 KSP Residual norm 1.753095730744e-09 >>> < 5 KSP Residual norm 7.489214418904e-11 >>> < 2 SNES Function norm 2.103908447865e-10 >>> < Number of SNES iterations = 2 >>> --- >>> > [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> > [0]PETSC ERROR: GPU error >>> > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not >>> supported >>> > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! >>> Could be the program crashed before they were used or a spelling mistake, >>> etc! >>> > [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 >>> source: command line >>> > [0]PETSC ERROR: Option left: name:-nox (no value) source: environment >>> > [0]PETSC ERROR: Option left: name:-nox_warning (no value) source: >>> environment >>> > [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10 >>> source: command line >>> > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble >>> shooting. 
>>> > [0]PETSC ERROR: Petsc Development GIT revision: >>> v3.18.3-352-g91c56366cb GIT Date: 2023-01-05 17:22:48 +0000 >>> > [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry >>> Thu Jan 5 17:25:17 2023 >>> > [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1 >>> > [0]PETSC ERROR: #1 initialize() at >>> /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249 >>> > [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at >>> /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/ >>> cupmcontext.cu:10 >>> > [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at >>> /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247 >>> > [0]PETSC ERROR: #4 >>> PetscDeviceContextSetDefaultDeviceForType_Internal() at >>> /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260 >>> > [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at >>> /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52 >>> > [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at >>> /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84 >>> > [0]PETSC ERROR: #7 GetHandleDispatch_() at >>> /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499 >>> > [0]PETSC ERROR: #8 create() at >>> /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069 >>> > [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at >>> /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10 >>> > [0]PETSC ERROR: #10 VecSetType() at >>> /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89 >>> > [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at >>> /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31 >>> > [0]PETSC ERROR: #12 DMCreateGlobalVector() at >>> /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023 >>> > [0]PETSC ERROR: #13 main() at ex19.c:149 >>> >>> >>> On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry wrote: >>> >>>> I'm trying to compile the cuda example >>>> >>>> ./config/examples/arch-ci-linux-cuda-double-64idx.py >>>> --with-cudac=/usr/local/cuda-11.5/bin/nvcc >>>> >>>> and running make test passes the test ok >>>> diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy >>>> but the eager variant fails, pasted below. >>>> >>>> I get a similar error running my client code, pasted after. There when >>>> running with -info, it seems that some lazy initialization happens first, >>>> and i also call VecCreateSeqCuda which seems to have no issue. >>>> >>>> Any idea? This happens to be with an -sm 3.5 device if it matters, >>>> otherwise it's a recent cuda compiler+driver. >>>> >>>> >>>> petsc test code output: >>>> >>>> >>>> >>>> not ok >>>> sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager # >>>> Error code: 97 >>>> # [0]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> # [0]PETSC ERROR: GPU error >>>> # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation >>>> not supported >>>> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble >>>> shooting. 
>>>> # [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 >>>> # [0]PETSC ERROR: ../ex1 on a named lancer by mlohry Thu Jan 5 >>>> 15:22:33 2023 >>>> # [0]PETSC ERROR: Configure options >>>> --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2 >>>> --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g >>>> -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 >>>> --with-cuda=1 --with-precision=double --with-clanguage=c >>>> --with-cudac=/usr/local/cuda-11.5/bin/nvcc >>>> PETSC_ARCH=arch-ci-linux-cuda-double-64idx >>>> # [0]PETSC ERROR: #1 CUPMAwareMPI_() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194 >>>> # [0]PETSC ERROR: #2 initialize() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71 >>>> # [0]PETSC ERROR: #3 init_device_id_() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290 >>>> # [0]PETSC ERROR: #4 getDevice() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99 >>>> # [0]PETSC ERROR: #5 PetscDeviceCreate() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104 >>>> # [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375 >>>> # [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499 >>>> # [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634 >>>> # [0]PETSC ERROR: #9 PetscInitialize_Common() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001 >>>> # [0]PETSC ERROR: #10 PetscInitialize() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267 >>>> # [0]PETSC ERROR: #11 main() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12 >>>> # [0]PETSC ERROR: PETSc Option Table entries: >>>> # [0]PETSC ERROR: -default_device_type host >>>> # [0]PETSC ERROR: -device_enable eager >>>> # [0]PETSC ERROR: ----------------End of Error Message -------send >>>> entire error message to petsc-maint at mcs.anl.gov---------- >>>> >>>> >>>> >>>> >>>> >>>> solver code output: >>>> >>>> >>>> >>>> [0] PetscDetermineInitialFPTrap(): Floating point trapping is off >>>> by default 0 >>>> [0] PetscDeviceInitializeTypeFromOptions_Private(): >>>> PetscDeviceType host available, initializing >>>> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice >>>> host initialized, default device id 0, view FALSE, init type lazy >>>> [0] PetscDeviceInitializeTypeFromOptions_Private(): >>>> PetscDeviceType cuda available, initializing >>>> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice >>>> cuda initialized, default device id 0, view FALSE, init type lazy >>>> [0] PetscDeviceInitializeTypeFromOptions_Private(): >>>> PetscDeviceType hip not available >>>> [0] 
PetscDeviceInitializeTypeFromOptions_Private(): >>>> PetscDeviceType sycl not available >>>> [0] PetscInitialize_Common(): PETSc successfully started: number >>>> of processors = 1 >>>> [0] PetscGetHostName(): Rejecting domainname, likely is NIS >>>> lancer.(none) >>>> [0] PetscInitialize_Common(): Running on machine: lancer >>>> # [Info] Petsc initialization complete. >>>> # [Trace] Timing: Starting solver... >>>> # [Info] RNG initial conditions have mean 0.000004, renormalizing. >>>> # [Trace] Timing: PetscTimeIntegrator initialization... >>>> # [Trace] Timing: Allocating Petsc CUDA arrays... >>>> [0] PetscCommDuplicate(): Duplicating a communicator 2 3 max tags >>>> = 100000000 >>>> [0] configure(): Configured device 0 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 2 3 >>>> # [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439 >>>> seconds. >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 2 3 >>>> [0] PetscCommDuplicate(): Duplicating a communicator 1 4 max tags >>>> = 100000000 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >>>> [0] DMGetDMTS(): Creating new DMTS >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >>>> [0] DMGetDMSNES(): Creating new DMSNES >>>> [0] DMGetDMSNESWrite(): Copying DMSNES due to write >>>> # [Info] Initializing petsc with ode23 integrator >>>> # [Trace] Timing: PetscTimeIntegrator initialization finished in >>>> 0.016754 seconds. >>>> >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1 4 >>>> [0] PetscDeviceContextSetupGlobalContext_Private(): >>>> Initializing global PetscDeviceContext with device type cuda >>>> [0]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [0]PETSC ERROR: GPU error >>>> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not >>>> supported >>>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble >>>> shooting. 
>>>> [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 >>>> [0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu >>>> Jan 5 15:39:14 2023 >>>> [0]PETSC ERROR: Configure options >>>> PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc >>>> PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++ >>>> --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS >>>> COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0 >>>> --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc >>>> --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/ >>>> --download-hwloc=1 >>>> [0]PETSC ERROR: #1 initialize() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255 >>>> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/ >>>> cupmcontext.cu:10 >>>> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244 >>>> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() >>>> at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259 >>>> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52 >>>> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84 >>>> [0]PETSC ERROR: #7 >>>> PetscDeviceContextGetCurrentContextAssertType_Internal() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371 >>>> [0]PETSC ERROR: #8 PetscCUBLASGetHandle() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/ >>>> cupmcontext.cu:23 >>>> [0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/ >>>> veccuda2.cu:261 >>>> [0]PETSC ERROR: #10 VecMAXPY() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221 >>>> [0]PETSC ERROR: #11 TSStep_RK() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814 >>>> [0]PETSC ERROR: #12 TSStep() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424 >>>> [0]PETSC ERROR: #13 TSSolve() at >>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814 >>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Fri Jan 6 08:56:22 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Fri, 6 Jan 2023 09:56:22 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: Apologies. If it helps, there is one cell of overlap in this small test case for a 2D mesh that is 1 cell in height and a number of cells in length. . 
process 0 Petsc VecGetLocalSize 2750 size(stateVecV) 2750 process 1 Petsc VecGetLocalSize 2640 size(stateVecV) 2640 On Fri, Jan 6, 2023 at 9:51 AM Matthew Knepley wrote: > On Fri, Jan 6, 2023 at 9:37 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Matt >> >> I made a typo on the line statVecV(offset) = in my >> example, I agree. (I wrote that offhand since the actual assignment is much >> larger) I should be statVecV(offset+1) = so I'm confident it's >> not a 1 0 indexing thing. >> >> My question is more related to what is happening in the offsets. c0 and >> c1 are pulled using DMplexgetheight stratum, so they are zero-indexed >> (which is why I loop from c0 to (c1-1)). >> >> For the size inquiries. on processor 0 >> Petsc VecGetSize(stateVec) 5390 >> > > I need to see VecGetLocalSize() > > Matt > > >> size(stateVecV) 2640 >> >> on processor 1 >> Petsc VecGetSize 5390 >> size(stateVecV) 2750 >> >> It's quite weird to me that processor one can have a positive offset that >> is less than its starting ownership index (in the initial email output). >> >> Thanks for the assistance >> Nicholas >> >> >> On Fri, Jan 6, 2023 at 9:20 AM Matthew Knepley wrote: >> >>> On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> Hi Petsc Users, >>>> >>>> I'm working with a dmplex system with a subsampled mesh distributed >>>> with an overlap of 1. >>>> >>>> I'm encountering unusual situations when using VecGetOwnershipRange to >>>> adjust the offset received from a global section. The logic of the >>>> following code is first to get the offset needed to index a global vector >>>> while still being able to check if it is an overlapped cell and skip if >>>> needed while counting the owned cells. >>>> >>> >>> >>>> >>>> call DMGetGlobalSection(dmplex,section,ierr) >>>> call VecGetArrayF90(stateVec,stateVecV,ierr) >>>> call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) >>>> do i = c0, (c1-1) >>>> >>>> call PetscSectionGetOffset(section,i,offset,ierr) >>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart >>>> >>>> if(offset<0) then >>>> cycle >>>> endif >>>> offset=offset-oStart >>>> plexcells=plexcells+1 >>>> stateVecV(offset)= enddo >>>> >>>> I'm noticing some very weird results that I've appended below. The >>>> GetOffset documentation notes that a negative offset indicates an unowned >>>> point (which I use to cycle). However, the offset subtraction with oStart >>>> will yield an illegal index for the Vector access. I see that on the >>>> documentation for GetOwnershipRange, it notes that this may be >>>> "ill-defined" but I wanted to see if this is type of ill-defined I can >>>> expect or there is just something terribly wrong with my PetscSection.(both >>>> the Vec and Section were produced from DMPlexDistributeField so should by >>>> definition have synchronized section information) I was wondering if there >>>> is a possible output and/or the best way to index the vector. I'm thinking >>>> of subtracting the offset of cell 0 perhaps? >>>> >>> >>> Can you show your vector sizes? Are you sure it is not the fact that F90 >>> arrays use 1-based indices, but these are 0-based offsets? 
>>> >>> Thanks, >>> >>> Matt >>> >>> >>>> on rank 0 >>>> >>>> cell 0 offset 0 oStart 0 0 >>>> cell 1 offset 55 oStart 0 55 >>>> cell 2 offset 110 oStart 0 110 >>>> cell 3 offset 165 oStart 0 165 >>>> cell 4 offset 220 oStart 0 220 >>>> cell 5 offset 275 oStart 0 275 >>>> cell 6 offset 330 oStart 0 330 >>>> cell 7 offset 385 oStart 0 385 >>>> cell 8 offset 440 oStart 0 440 >>>> cell 9 offset 495 oStart 0 495 >>>> cell 10 offset 550 oStart 0 550 >>>> cell 11 offset 605 oStart 0 605 >>>> cell 12 offset 660 oStart 0 660 >>>> cell 13 offset 715 oStart 0 715 >>>> >>>> and on rank one >>>> cell 0 offset 2475 oStart 2640 -165 >>>> cell 1 offset 2530 oStart 2640 -110 >>>> cell 2 offset 2585 oStart 2640 -55 >>>> cell 3 offset 2640 oStart 2640 0 >>>> cell 4 offset 2695 oStart 2640 55 >>>> cell 5 offset 2750 oStart 2640 110 >>>> cell 6 offset 2805 oStart 2640 165 >>>> cell 7 offset 2860 oStart 2640 220 >>>> cell 8 offset 2915 oStart 2640 275 >>>> cell 9 offset 2970 oStart 2640 330 >>>> cell 10 offset 3025 oStart 2640 385 >>>> cell 11 offset 3080 oStart 2640 440 >>>> cell 12 offset 3135 oStart 2640 495 >>>> cell 13 offset 3190 oStart 2640 550 >>>> cell 14 offset 3245 oStart 2640 605 >>>> cell 15 offset -771 oStart 2640 -3411 >>>> >>>> >>>> Sincerely >>>> Nicholas >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Fri Jan 6 09:02:53 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Fri, 6 Jan 2023 10:02:53 -0500 Subject: [petsc-users] Petsc DMLabel Fortran Stub request Message-ID: Hi Petsc Users I am trying to use the sequence of call DMLabelPropagateBegin(synchLabel,sf,ierr) call DMLabelPropagatePush(synchLabel,sf,PETSC_NULL_OPTIONS,PETSC_NULL_INTEGER,ierr) call DMLabelPropagateEnd(synchLabel,sf, ierr) in fortran. I apologize if I messed something up, it appears as if the DMLabelPropagatePush command doesn't have an appropriate Fortran interface as I get an undefined reference when it is called. I would appreciate any assistance. As a side note in practice, what is the proper Fortran NULL pointer to use for void arguments? I used an integer one temporarily to get to the undefined reference error but I assume it doesn't matter? Sincerely Nicholas -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Jan 6 09:04:39 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 6 Jan 2023 10:04:39 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: On Fri, Jan 6, 2023 at 9:56 AM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Apologies. If it helps, there is one cell of overlap in this small test > case for a 2D mesh that is 1 cell in height and a number of cells in > length. . > > process 0 > Petsc VecGetLocalSize 2750 > size(stateVecV) 2750 > > process 1 > Petsc VecGetLocalSize 2640 > size(stateVecV) 2640 > The offsets shown below are well-within these sizes. I do not understand the problem. Thanks, Matt > On Fri, Jan 6, 2023 at 9:51 AM Matthew Knepley wrote: > >> On Fri, Jan 6, 2023 at 9:37 AM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Hi Matt >>> >>> I made a typo on the line statVecV(offset) = in my >>> example, I agree. (I wrote that offhand since the actual assignment is much >>> larger) I should be statVecV(offset+1) = so I'm confident it's >>> not a 1 0 indexing thing. >>> >>> My question is more related to what is happening in the offsets. c0 and >>> c1 are pulled using DMplexgetheight stratum, so they are zero-indexed >>> (which is why I loop from c0 to (c1-1)). >>> >>> For the size inquiries. on processor 0 >>> Petsc VecGetSize(stateVec) 5390 >>> >> >> I need to see VecGetLocalSize() >> >> Matt >> >> >>> size(stateVecV) 2640 >>> >>> on processor 1 >>> Petsc VecGetSize 5390 >>> size(stateVecV) 2750 >>> >>> It's quite weird to me that processor one can have a positive offset >>> that is less than its starting ownership index (in the initial email >>> output). >>> >>> Thanks for the assistance >>> Nicholas >>> >>> >>> On Fri, Jan 6, 2023 at 9:20 AM Matthew Knepley >>> wrote: >>> >>>> On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < >>>> narnoldm at umich.edu> wrote: >>>> >>>>> Hi Petsc Users, >>>>> >>>>> I'm working with a dmplex system with a subsampled mesh distributed >>>>> with an overlap of 1. >>>>> >>>>> I'm encountering unusual situations when using VecGetOwnershipRange to >>>>> adjust the offset received from a global section. The logic of the >>>>> following code is first to get the offset needed to index a global vector >>>>> while still being able to check if it is an overlapped cell and skip if >>>>> needed while counting the owned cells. >>>>> >>>> >>>> >>>>> >>>>> call DMGetGlobalSection(dmplex,section,ierr) >>>>> call VecGetArrayF90(stateVec,stateVecV,ierr) >>>>> call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) >>>>> do i = c0, (c1-1) >>>>> >>>>> call PetscSectionGetOffset(section,i,offset,ierr) >>>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart >>>>> >>>>> if(offset<0) then >>>>> cycle >>>>> endif >>>>> offset=offset-oStart >>>>> plexcells=plexcells+1 >>>>> stateVecV(offset)= enddo >>>>> >>>>> I'm noticing some very weird results that I've appended below. The >>>>> GetOffset documentation notes that a negative offset indicates an unowned >>>>> point (which I use to cycle). However, the offset subtraction with oStart >>>>> will yield an illegal index for the Vector access. 
I see that on the >>>>> documentation for GetOwnershipRange, it notes that this may be >>>>> "ill-defined" but I wanted to see if this is type of ill-defined I can >>>>> expect or there is just something terribly wrong with my PetscSection.(both >>>>> the Vec and Section were produced from DMPlexDistributeField so should by >>>>> definition have synchronized section information) I was wondering if there >>>>> is a possible output and/or the best way to index the vector. I'm thinking >>>>> of subtracting the offset of cell 0 perhaps? >>>>> >>>> >>>> Can you show your vector sizes? Are you sure it is not the fact that >>>> F90 arrays use 1-based indices, but these are 0-based offsets? >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> on rank 0 >>>>> >>>>> cell 0 offset 0 oStart 0 0 >>>>> cell 1 offset 55 oStart 0 55 >>>>> cell 2 offset 110 oStart 0 110 >>>>> cell 3 offset 165 oStart 0 165 >>>>> cell 4 offset 220 oStart 0 220 >>>>> cell 5 offset 275 oStart 0 275 >>>>> cell 6 offset 330 oStart 0 330 >>>>> cell 7 offset 385 oStart 0 385 >>>>> cell 8 offset 440 oStart 0 440 >>>>> cell 9 offset 495 oStart 0 495 >>>>> cell 10 offset 550 oStart 0 550 >>>>> cell 11 offset 605 oStart 0 605 >>>>> cell 12 offset 660 oStart 0 660 >>>>> cell 13 offset 715 oStart 0 715 >>>>> >>>>> and on rank one >>>>> cell 0 offset 2475 oStart 2640 -165 >>>>> cell 1 offset 2530 oStart 2640 -110 >>>>> cell 2 offset 2585 oStart 2640 -55 >>>>> cell 3 offset 2640 oStart 2640 0 >>>>> cell 4 offset 2695 oStart 2640 55 >>>>> cell 5 offset 2750 oStart 2640 110 >>>>> cell 6 offset 2805 oStart 2640 165 >>>>> cell 7 offset 2860 oStart 2640 220 >>>>> cell 8 offset 2915 oStart 2640 275 >>>>> cell 9 offset 2970 oStart 2640 330 >>>>> cell 10 offset 3025 oStart 2640 385 >>>>> cell 11 offset 3080 oStart 2640 440 >>>>> cell 12 offset 3135 oStart 2640 495 >>>>> cell 13 offset 3190 oStart 2640 550 >>>>> cell 14 offset 3245 oStart 2640 605 >>>>> cell 15 offset -771 oStart 2640 -3411 >>>>> >>>>> >>>>> Sincerely >>>>> Nicholas >>>>> >>>>> -- >>>>> Nicholas Arnold-Medabalimi >>>>> >>>>> Ph.D. Candidate >>>>> Computational Aeroscience Lab >>>>> University of Michigan >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Fri Jan 6 09:09:58 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Fri, 6 Jan 2023 10:09:58 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: Hi Matt I apologize for any lack of clarity in the initial email. 
looking at the initial output on rank 1 write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart cell 0 offset 2475 oStart 2640 -165 cell 1 offset 2530 oStart 2640 -110 cell 2 offset 2585 oStart 2640 -55 cell 3 offset 2640 oStart 2640 0 ..... cell 15 offset -771 oStart 2640 -3411 cell 15 provides a negative offset because it is the overlap cell (that is unowned) The remained of cells are all owned. However, the first 3 cells (0,1,2) return an offset that is less than the starting ownership range. I would expect cell 0 to start at offset 2640 at minimum. Sincerely Nicholas On Fri, Jan 6, 2023 at 10:05 AM Matthew Knepley wrote: > On Fri, Jan 6, 2023 at 9:56 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Apologies. If it helps, there is one cell of overlap in this small test >> case for a 2D mesh that is 1 cell in height and a number of cells in >> length. . >> >> process 0 >> Petsc VecGetLocalSize 2750 >> size(stateVecV) 2750 >> >> process 1 >> Petsc VecGetLocalSize 2640 >> size(stateVecV) 2640 >> > > The offsets shown below are well-within these sizes. I do not understand > the problem. > > Thanks, > > Matt > > >> On Fri, Jan 6, 2023 at 9:51 AM Matthew Knepley wrote: >> >>> On Fri, Jan 6, 2023 at 9:37 AM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> Hi Matt >>>> >>>> I made a typo on the line statVecV(offset) = in my >>>> example, I agree. (I wrote that offhand since the actual assignment is much >>>> larger) I should be statVecV(offset+1) = so I'm confident it's >>>> not a 1 0 indexing thing. >>>> >>>> My question is more related to what is happening in the offsets. c0 and >>>> c1 are pulled using DMplexgetheight stratum, so they are zero-indexed >>>> (which is why I loop from c0 to (c1-1)). >>>> >>>> For the size inquiries. on processor 0 >>>> Petsc VecGetSize(stateVec) 5390 >>>> >>> >>> I need to see VecGetLocalSize() >>> >>> Matt >>> >>> >>>> size(stateVecV) 2640 >>>> >>>> on processor 1 >>>> Petsc VecGetSize 5390 >>>> size(stateVecV) 2750 >>>> >>>> It's quite weird to me that processor one can have a positive offset >>>> that is less than its starting ownership index (in the initial email >>>> output). >>>> >>>> Thanks for the assistance >>>> Nicholas >>>> >>>> >>>> On Fri, Jan 6, 2023 at 9:20 AM Matthew Knepley >>>> wrote: >>>> >>>>> On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < >>>>> narnoldm at umich.edu> wrote: >>>>> >>>>>> Hi Petsc Users, >>>>>> >>>>>> I'm working with a dmplex system with a subsampled mesh distributed >>>>>> with an overlap of 1. >>>>>> >>>>>> I'm encountering unusual situations when using VecGetOwnershipRange >>>>>> to adjust the offset received from a global section. The logic of the >>>>>> following code is first to get the offset needed to index a global vector >>>>>> while still being able to check if it is an overlapped cell and skip if >>>>>> needed while counting the owned cells. >>>>>> >>>>> >>>>> >>>>>> >>>>>> call DMGetGlobalSection(dmplex,section,ierr) >>>>>> call VecGetArrayF90(stateVec,stateVecV,ierr) >>>>>> call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) >>>>>> do i = c0, (c1-1) >>>>>> >>>>>> call PetscSectionGetOffset(section,i,offset,ierr) >>>>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset- >>>>>> oStart >>>>>> >>>>>> if(offset<0) then >>>>>> cycle >>>>>> endif >>>>>> offset=offset-oStart >>>>>> plexcells=plexcells+1 >>>>>> stateVecV(offset)= enddo >>>>>> >>>>>> I'm noticing some very weird results that I've appended below. 
The >>>>>> GetOffset documentation notes that a negative offset indicates an unowned >>>>>> point (which I use to cycle). However, the offset subtraction with oStart >>>>>> will yield an illegal index for the Vector access. I see that on the >>>>>> documentation for GetOwnershipRange, it notes that this may be >>>>>> "ill-defined" but I wanted to see if this is type of ill-defined I can >>>>>> expect or there is just something terribly wrong with my PetscSection.(both >>>>>> the Vec and Section were produced from DMPlexDistributeField so should by >>>>>> definition have synchronized section information) I was wondering if there >>>>>> is a possible output and/or the best way to index the vector. I'm thinking >>>>>> of subtracting the offset of cell 0 perhaps? >>>>>> >>>>> >>>>> Can you show your vector sizes? Are you sure it is not the fact that >>>>> F90 arrays use 1-based indices, but these are 0-based offsets? >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> on rank 0 >>>>>> >>>>>> cell 0 offset 0 oStart 0 0 >>>>>> cell 1 offset 55 oStart 0 55 >>>>>> cell 2 offset 110 oStart 0 110 >>>>>> cell 3 offset 165 oStart 0 165 >>>>>> cell 4 offset 220 oStart 0 220 >>>>>> cell 5 offset 275 oStart 0 275 >>>>>> cell 6 offset 330 oStart 0 330 >>>>>> cell 7 offset 385 oStart 0 385 >>>>>> cell 8 offset 440 oStart 0 440 >>>>>> cell 9 offset 495 oStart 0 495 >>>>>> cell 10 offset 550 oStart 0 550 >>>>>> cell 11 offset 605 oStart 0 605 >>>>>> cell 12 offset 660 oStart 0 660 >>>>>> cell 13 offset 715 oStart 0 715 >>>>>> >>>>>> and on rank one >>>>>> cell 0 offset 2475 oStart 2640 -165 >>>>>> cell 1 offset 2530 oStart 2640 -110 >>>>>> cell 2 offset 2585 oStart 2640 -55 >>>>>> cell 3 offset 2640 oStart 2640 0 >>>>>> cell 4 offset 2695 oStart 2640 55 >>>>>> cell 5 offset 2750 oStart 2640 110 >>>>>> cell 6 offset 2805 oStart 2640 165 >>>>>> cell 7 offset 2860 oStart 2640 220 >>>>>> cell 8 offset 2915 oStart 2640 275 >>>>>> cell 9 offset 2970 oStart 2640 330 >>>>>> cell 10 offset 3025 oStart 2640 385 >>>>>> cell 11 offset 3080 oStart 2640 440 >>>>>> cell 12 offset 3135 oStart 2640 495 >>>>>> cell 13 offset 3190 oStart 2640 550 >>>>>> cell 14 offset 3245 oStart 2640 605 >>>>>> cell 15 offset -771 oStart 2640 -3411 >>>>>> >>>>>> >>>>>> Sincerely >>>>>> Nicholas >>>>>> >>>>>> -- >>>>>> Nicholas Arnold-Medabalimi >>>>>> >>>>>> Ph.D. Candidate >>>>>> Computational Aeroscience Lab >>>>>> University of Michigan >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Nicholas Arnold-Medabalimi Ph.D. 
Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 6 09:23:25 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 6 Jan 2023 10:23:25 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: On Fri, Jan 6, 2023 at 10:10 AM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Matt > > I apologize for any lack of clarity in the initial email. > > looking at the initial output on rank 1 > write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart > cell 0 offset 2475 oStart 2640 -165 > cell 1 offset 2530 oStart 2640 -110 > cell 2 offset 2585 oStart 2640 -55 > cell 3 offset 2640 oStart 2640 0 > ..... > cell 15 offset -771 oStart 2640 -3411 > > > cell 15 provides a negative offset because it is the overlap cell (that is > unowned) > The remained of cells are all owned. However, the first 3 cells (0,1,2) > return an offset that is less than the starting ownership range. I would > expect cell 0 to start at offset 2640 at minimum. > Send the output for this section call PetscSectionView(section, PETSC_VIEWER_STDOUT_WORLD); Thanks, Matt > Sincerely > Nicholas > > > > > On Fri, Jan 6, 2023 at 10:05 AM Matthew Knepley wrote: > >> On Fri, Jan 6, 2023 at 9:56 AM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Apologies. If it helps, there is one cell of overlap in this small test >>> case for a 2D mesh that is 1 cell in height and a number of cells in >>> length. . >>> >>> process 0 >>> Petsc VecGetLocalSize 2750 >>> size(stateVecV) 2750 >>> >>> process 1 >>> Petsc VecGetLocalSize 2640 >>> size(stateVecV) 2640 >>> >> >> The offsets shown below are well-within these sizes. I do not understand >> the problem. >> >> Thanks, >> >> Matt >> >> >>> On Fri, Jan 6, 2023 at 9:51 AM Matthew Knepley >>> wrote: >>> >>>> On Fri, Jan 6, 2023 at 9:37 AM Nicholas Arnold-Medabalimi < >>>> narnoldm at umich.edu> wrote: >>>> >>>>> Hi Matt >>>>> >>>>> I made a typo on the line statVecV(offset) = in my >>>>> example, I agree. (I wrote that offhand since the actual assignment is much >>>>> larger) I should be statVecV(offset+1) = so I'm confident it's >>>>> not a 1 0 indexing thing. >>>>> >>>>> My question is more related to what is happening in the offsets. c0 >>>>> and c1 are pulled using DMplexgetheight stratum, so they are zero-indexed >>>>> (which is why I loop from c0 to (c1-1)). >>>>> >>>>> For the size inquiries. on processor 0 >>>>> Petsc VecGetSize(stateVec) 5390 >>>>> >>>> >>>> I need to see VecGetLocalSize() >>>> >>>> Matt >>>> >>>> >>>>> size(stateVecV) 2640 >>>>> >>>>> on processor 1 >>>>> Petsc VecGetSize 5390 >>>>> size(stateVecV) 2750 >>>>> >>>>> It's quite weird to me that processor one can have a positive offset >>>>> that is less than its starting ownership index (in the initial email >>>>> output). >>>>> >>>>> Thanks for the assistance >>>>> Nicholas >>>>> >>>>> >>>>> On Fri, Jan 6, 2023 at 9:20 AM Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < >>>>>> narnoldm at umich.edu> wrote: >>>>>> >>>>>>> Hi Petsc Users, >>>>>>> >>>>>>> I'm working with a dmplex system with a subsampled mesh distributed >>>>>>> with an overlap of 1. >>>>>>> >>>>>>> I'm encountering unusual situations when using VecGetOwnershipRange >>>>>>> to adjust the offset received from a global section. 
The logic of the >>>>>>> following code is first to get the offset needed to index a global vector >>>>>>> while still being able to check if it is an overlapped cell and skip if >>>>>>> needed while counting the owned cells. >>>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> call DMGetGlobalSection(dmplex,section,ierr) >>>>>>> call VecGetArrayF90(stateVec,stateVecV,ierr) >>>>>>> call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) >>>>>>> do i = c0, (c1-1) >>>>>>> >>>>>>> call PetscSectionGetOffset(section,i,offset,ierr) >>>>>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset- >>>>>>> oStart >>>>>>> >>>>>>> if(offset<0) then >>>>>>> cycle >>>>>>> endif >>>>>>> offset=offset-oStart >>>>>>> plexcells=plexcells+1 >>>>>>> stateVecV(offset)= enddo >>>>>>> >>>>>>> I'm noticing some very weird results that I've appended below. The >>>>>>> GetOffset documentation notes that a negative offset indicates an unowned >>>>>>> point (which I use to cycle). However, the offset subtraction with oStart >>>>>>> will yield an illegal index for the Vector access. I see that on the >>>>>>> documentation for GetOwnershipRange, it notes that this may be >>>>>>> "ill-defined" but I wanted to see if this is type of ill-defined I can >>>>>>> expect or there is just something terribly wrong with my PetscSection.(both >>>>>>> the Vec and Section were produced from DMPlexDistributeField so should by >>>>>>> definition have synchronized section information) I was wondering if there >>>>>>> is a possible output and/or the best way to index the vector. I'm thinking >>>>>>> of subtracting the offset of cell 0 perhaps? >>>>>>> >>>>>> >>>>>> Can you show your vector sizes? Are you sure it is not the fact that >>>>>> F90 arrays use 1-based indices, but these are 0-based offsets? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> on rank 0 >>>>>>> >>>>>>> cell 0 offset 0 oStart 0 0 >>>>>>> cell 1 offset 55 oStart 0 55 >>>>>>> cell 2 offset 110 oStart 0 110 >>>>>>> cell 3 offset 165 oStart 0 165 >>>>>>> cell 4 offset 220 oStart 0 220 >>>>>>> cell 5 offset 275 oStart 0 275 >>>>>>> cell 6 offset 330 oStart 0 330 >>>>>>> cell 7 offset 385 oStart 0 385 >>>>>>> cell 8 offset 440 oStart 0 440 >>>>>>> cell 9 offset 495 oStart 0 495 >>>>>>> cell 10 offset 550 oStart 0 550 >>>>>>> cell 11 offset 605 oStart 0 605 >>>>>>> cell 12 offset 660 oStart 0 660 >>>>>>> cell 13 offset 715 oStart 0 715 >>>>>>> >>>>>>> and on rank one >>>>>>> cell 0 offset 2475 oStart 2640 -165 >>>>>>> cell 1 offset 2530 oStart 2640 -110 >>>>>>> cell 2 offset 2585 oStart 2640 -55 >>>>>>> cell 3 offset 2640 oStart 2640 0 >>>>>>> cell 4 offset 2695 oStart 2640 55 >>>>>>> cell 5 offset 2750 oStart 2640 110 >>>>>>> cell 6 offset 2805 oStart 2640 165 >>>>>>> cell 7 offset 2860 oStart 2640 220 >>>>>>> cell 8 offset 2915 oStart 2640 275 >>>>>>> cell 9 offset 2970 oStart 2640 330 >>>>>>> cell 10 offset 3025 oStart 2640 385 >>>>>>> cell 11 offset 3080 oStart 2640 440 >>>>>>> cell 12 offset 3135 oStart 2640 495 >>>>>>> cell 13 offset 3190 oStart 2640 550 >>>>>>> cell 14 offset 3245 oStart 2640 605 >>>>>>> cell 15 offset -771 oStart 2640 -3411 >>>>>>> >>>>>>> >>>>>>> Sincerely >>>>>>> Nicholas >>>>>>> >>>>>>> -- >>>>>>> Nicholas Arnold-Medabalimi >>>>>>> >>>>>>> Ph.D. Candidate >>>>>>> Computational Aeroscience Lab >>>>>>> University of Michigan >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. 
>>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Nicholas Arnold-Medabalimi >>>>> >>>>> Ph.D. Candidate >>>>> Computational Aeroscience Lab >>>>> University of Michigan >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Fri Jan 6 09:41:07 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Fri, 6 Jan 2023 10:41:07 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: Hi Matt I appreciate the help. The section view is quite extensive because each cell has 55 dofs located at the cells and on certain faces. I've appended the first of these which corresponds with the output in the first email, to save space. The following 54 are exactly the same but offset incremented by 1. (or negative 1 for negative offsets) Thanks for your time Nicholas On Fri, Jan 6, 2023 at 10:23 AM Matthew Knepley wrote: > On Fri, Jan 6, 2023 at 10:10 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Matt >> >> I apologize for any lack of clarity in the initial email. >> >> looking at the initial output on rank 1 >> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart >> cell 0 offset 2475 oStart 2640 -165 >> cell 1 offset 2530 oStart 2640 -110 >> cell 2 offset 2585 oStart 2640 -55 >> cell 3 offset 2640 oStart 2640 0 >> ..... >> cell 15 offset -771 oStart 2640 -3411 >> >> >> cell 15 provides a negative offset because it is the overlap cell (that >> is unowned) >> The remained of cells are all owned. However, the first 3 cells (0,1,2) >> return an offset that is less than the starting ownership range. I would >> expect cell 0 to start at offset 2640 at minimum. >> > > Send the output for this section > > call PetscSectionView(section, PETSC_VIEWER_STDOUT_WORLD); > > Thanks, > > Matt > > >> Sincerely >> Nicholas >> >> >> >> >> On Fri, Jan 6, 2023 at 10:05 AM Matthew Knepley >> wrote: >> >>> On Fri, Jan 6, 2023 at 9:56 AM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> Apologies. If it helps, there is one cell of overlap in this small test >>>> case for a 2D mesh that is 1 cell in height and a number of cells in >>>> length. . >>>> >>>> process 0 >>>> Petsc VecGetLocalSize 2750 >>>> size(stateVecV) 2750 >>>> >>>> process 1 >>>> Petsc VecGetLocalSize 2640 >>>> size(stateVecV) 2640 >>>> >>> >>> The offsets shown below are well-within these sizes. I do not understand >>> the problem. 
>>> >>> Thanks, >>> >>> Matt >>> >>> >>>> On Fri, Jan 6, 2023 at 9:51 AM Matthew Knepley >>>> wrote: >>>> >>>>> On Fri, Jan 6, 2023 at 9:37 AM Nicholas Arnold-Medabalimi < >>>>> narnoldm at umich.edu> wrote: >>>>> >>>>>> Hi Matt >>>>>> >>>>>> I made a typo on the line statVecV(offset) = in my >>>>>> example, I agree. (I wrote that offhand since the actual assignment is much >>>>>> larger) I should be statVecV(offset+1) = so I'm confident it's >>>>>> not a 1 0 indexing thing. >>>>>> >>>>>> My question is more related to what is happening in the offsets. c0 >>>>>> and c1 are pulled using DMplexgetheight stratum, so they are zero-indexed >>>>>> (which is why I loop from c0 to (c1-1)). >>>>>> >>>>>> For the size inquiries. on processor 0 >>>>>> Petsc VecGetSize(stateVec) 5390 >>>>>> >>>>> >>>>> I need to see VecGetLocalSize() >>>>> >>>>> Matt >>>>> >>>>> >>>>>> size(stateVecV) 2640 >>>>>> >>>>>> on processor 1 >>>>>> Petsc VecGetSize 5390 >>>>>> size(stateVecV) 2750 >>>>>> >>>>>> It's quite weird to me that processor one can have a positive offset >>>>>> that is less than its starting ownership index (in the initial email >>>>>> output). >>>>>> >>>>>> Thanks for the assistance >>>>>> Nicholas >>>>>> >>>>>> >>>>>> On Fri, Jan 6, 2023 at 9:20 AM Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < >>>>>>> narnoldm at umich.edu> wrote: >>>>>>> >>>>>>>> Hi Petsc Users, >>>>>>>> >>>>>>>> I'm working with a dmplex system with a subsampled mesh distributed >>>>>>>> with an overlap of 1. >>>>>>>> >>>>>>>> I'm encountering unusual situations when using VecGetOwnershipRange >>>>>>>> to adjust the offset received from a global section. The logic of the >>>>>>>> following code is first to get the offset needed to index a global vector >>>>>>>> while still being able to check if it is an overlapped cell and skip if >>>>>>>> needed while counting the owned cells. >>>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> call DMGetGlobalSection(dmplex,section,ierr) >>>>>>>> call VecGetArrayF90(stateVec,stateVecV,ierr) >>>>>>>> call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) >>>>>>>> do i = c0, (c1-1) >>>>>>>> >>>>>>>> call PetscSectionGetOffset(section,i,offset,ierr) >>>>>>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset- >>>>>>>> oStart >>>>>>>> >>>>>>>> if(offset<0) then >>>>>>>> cycle >>>>>>>> endif >>>>>>>> offset=offset-oStart >>>>>>>> plexcells=plexcells+1 >>>>>>>> stateVecV(offset)= enddo >>>>>>>> >>>>>>>> I'm noticing some very weird results that I've appended below. The >>>>>>>> GetOffset documentation notes that a negative offset indicates an unowned >>>>>>>> point (which I use to cycle). However, the offset subtraction with oStart >>>>>>>> will yield an illegal index for the Vector access. I see that on the >>>>>>>> documentation for GetOwnershipRange, it notes that this may be >>>>>>>> "ill-defined" but I wanted to see if this is type of ill-defined I can >>>>>>>> expect or there is just something terribly wrong with my PetscSection.(both >>>>>>>> the Vec and Section were produced from DMPlexDistributeField so should by >>>>>>>> definition have synchronized section information) I was wondering if there >>>>>>>> is a possible output and/or the best way to index the vector. I'm thinking >>>>>>>> of subtracting the offset of cell 0 perhaps? >>>>>>>> >>>>>>> >>>>>>> Can you show your vector sizes? Are you sure it is not the fact that >>>>>>> F90 arrays use 1-based indices, but these are 0-based offsets? 
>>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> on rank 0 >>>>>>>> >>>>>>>> cell 0 offset 0 oStart 0 0 >>>>>>>> cell 1 offset 55 oStart 0 55 >>>>>>>> cell 2 offset 110 oStart 0 110 >>>>>>>> cell 3 offset 165 oStart 0 165 >>>>>>>> cell 4 offset 220 oStart 0 220 >>>>>>>> cell 5 offset 275 oStart 0 275 >>>>>>>> cell 6 offset 330 oStart 0 330 >>>>>>>> cell 7 offset 385 oStart 0 385 >>>>>>>> cell 8 offset 440 oStart 0 440 >>>>>>>> cell 9 offset 495 oStart 0 495 >>>>>>>> cell 10 offset 550 oStart 0 550 >>>>>>>> cell 11 offset 605 oStart 0 605 >>>>>>>> cell 12 offset 660 oStart 0 660 >>>>>>>> cell 13 offset 715 oStart 0 715 >>>>>>>> >>>>>>>> and on rank one >>>>>>>> cell 0 offset 2475 oStart 2640 -165 >>>>>>>> cell 1 offset 2530 oStart 2640 -110 >>>>>>>> cell 2 offset 2585 oStart 2640 -55 >>>>>>>> cell 3 offset 2640 oStart 2640 0 >>>>>>>> cell 4 offset 2695 oStart 2640 55 >>>>>>>> cell 5 offset 2750 oStart 2640 110 >>>>>>>> cell 6 offset 2805 oStart 2640 165 >>>>>>>> cell 7 offset 2860 oStart 2640 220 >>>>>>>> cell 8 offset 2915 oStart 2640 275 >>>>>>>> cell 9 offset 2970 oStart 2640 330 >>>>>>>> cell 10 offset 3025 oStart 2640 385 >>>>>>>> cell 11 offset 3080 oStart 2640 440 >>>>>>>> cell 12 offset 3135 oStart 2640 495 >>>>>>>> cell 13 offset 3190 oStart 2640 550 >>>>>>>> cell 14 offset 3245 oStart 2640 605 >>>>>>>> cell 15 offset -771 oStart 2640 -3411 >>>>>>>> >>>>>>>> >>>>>>>> Sincerely >>>>>>>> Nicholas >>>>>>>> >>>>>>>> -- >>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>> >>>>>>>> Ph.D. Candidate >>>>>>>> Computational Aeroscience Lab >>>>>>>> University of Michigan >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Nicholas Arnold-Medabalimi >>>>>> >>>>>> Ph.D. Candidate >>>>>> Computational Aeroscience Lab >>>>>> University of Michigan >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- PetscSection Object: 2 MPI processes type not yet set 55 fields field 0 with 1 components Process 0: ( 0) dim 1 offset 0 ( 1) dim 1 offset 55 ( 2) dim 1 offset 110 ( 3) dim 1 offset 165 ( 4) dim 1 offset 220 ( 5) dim 1 offset 275 ( 6) dim 1 offset 330 ( 7) dim 1 offset 385 ( 8) dim 1 offset 440 ( 9) dim 1 offset 495 ( 10) dim 1 offset 550 ( 11) dim 1 offset 605 ( 12) dim 1 offset 660 ( 13) dim 1 offset 715 ( 14) dim 1 offset 770 ( 15) dim -2 offset -2641 ( 16) dim 0 offset 825 ( 17) dim 0 offset 825 ( 18) dim 0 offset 825 ( 19) dim 0 offset 825 ( 20) dim 0 offset 825 ( 21) dim 0 offset 825 ( 22) dim 0 offset 825 ( 23) dim 0 offset 825 ( 24) dim 0 offset 825 ( 25) dim 0 offset 825 ( 26) dim 0 offset 825 ( 27) dim 0 offset 825 ( 28) dim 0 offset 825 ( 29) dim 0 offset 825 ( 30) dim 0 offset 825 ( 31) dim 0 offset 825 ( 32) dim 0 offset 825 ( 33) dim 0 offset 825 ( 34) dim 0 offset 825 ( 35) dim 0 offset 825 ( 36) dim 0 offset 825 ( 37) dim 0 offset 825 ( 38) dim 0 offset 825 ( 39) dim 0 offset 825 ( 40) dim 0 offset 825 ( 41) dim 0 offset 825 ( 42) dim 0 offset 825 ( 43) dim 0 offset 825 ( 44) dim 0 offset 825 ( 45) dim 0 offset 825 ( 46) dim -1 offset -3301 ( 47) dim -1 offset -3301 ( 48) dim -1 offset -3301 ( 49) dim -1 offset -3301 ( 50) dim 0 offset 825 ( 51) dim 1 offset 825 ( 52) dim 1 offset 880 ( 53) dim 0 offset 935 ( 54) dim 1 offset 935 ( 55) dim 0 offset 990 ( 56) dim 1 offset 990 ( 57) dim 1 offset 1045 ( 58) dim 1 offset 1100 ( 59) dim 0 offset 1155 ( 60) dim 1 offset 1155 ( 61) dim 0 offset 1210 ( 62) dim 1 offset 1210 ( 63) dim 1 offset 1265 ( 64) dim 1 offset 1320 ( 65) dim 0 offset 1375 ( 66) dim 1 offset 1375 ( 67) dim 0 offset 1430 ( 68) dim 1 offset 1430 ( 69) dim 1 offset 1485 ( 70) dim 1 offset 1540 ( 71) dim 0 offset 1595 ( 72) dim 1 offset 1595 ( 73) dim 0 offset 1650 ( 74) dim 1 offset 1650 ( 75) dim 1 offset 1705 ( 76) dim 1 offset 1760 ( 77) dim 0 offset 1815 ( 78) dim 1 offset 1815 ( 79) dim 0 offset 1870 ( 80) dim 1 offset 1870 ( 81) dim 1 offset 1925 ( 82) dim 1 offset 1980 ( 83) dim 0 offset 2035 ( 84) dim 1 offset 2035 ( 85) dim 0 offset 2090 ( 86) dim 1 offset 2090 ( 87) dim 1 offset 2145 ( 88) dim 1 offset 2200 ( 89) dim 0 offset 2255 ( 90) dim 1 offset 2255 ( 91) dim 0 offset 2310 ( 92) dim 1 offset 2310 ( 93) dim 1 offset 2365 ( 94) dim 1 offset 2420 ( 95) dim -1 offset -3686 ( 96) dim -2 offset -3686 ( 97) dim -1 offset -3741 ( 98) dim -2 offset -3741 Process 1: ( 0) dim 1 offset 2475 ( 1) dim 1 offset 2530 ( 2) dim 1 offset 2585 ( 3) dim 1 offset 2640 ( 4) dim 1 offset 2695 ( 5) dim 1 offset 2750 ( 6) dim 1 offset 2805 ( 7) dim 1 offset 2860 ( 8) dim 1 offset 2915 ( 9) dim 1 offset 2970 ( 10) dim 1 offset 3025 ( 11) dim 1 offset 3080 ( 12) dim 1 offset 3135 ( 13) dim 1 offset 3190 ( 14) dim 1 offset 3245 ( 15) dim -2 offset -771 ( 16) dim 0 offset 3300 ( 17) dim 0 offset 3300 ( 18) dim 0 offset 3300 ( 19) dim 0 offset 3300 ( 20) dim 0 offset 3300 ( 21) dim 0 offset 3300 ( 22) dim 0 offset 3300 ( 23) dim 0 offset 3300 ( 24) dim 0 offset 3300 ( 25) dim 0 offset 3300 ( 26) dim 0 offset 3300 ( 27) dim 0 offset 3300 ( 28) dim 0 offset 3300 ( 29) dim 0 offset 3300 ( 30) dim 0 offset 3300 ( 31) dim 0 offset 3300 ( 32) dim 0 offset 3300 ( 33) dim 0 offset 3300 ( 34) dim 0 offset 3300 ( 35) dim 0 offset 3300 ( 36) dim 0 offset 3300 ( 37) dim 0 offset 3300 ( 38) dim 0 offset 3300 ( 39) dim 0 offset 3300 ( 40) dim 0 offset 3300 ( 41) dim 0 offset 3300 ( 42) dim 0 offset 3300 ( 43) dim 0 offset 3300 ( 44) dim 0 offset 
3300 ( 45) dim 0 offset 3300 ( 46) dim 0 offset 3300 ( 47) dim 0 offset 3300 ( 48) dim 0 offset 3300 ( 49) dim 0 offset 3300 ( 50) dim 0 offset 3300 ( 51) dim 0 offset 3300 ( 52) dim 0 offset 3300 ( 53) dim 0 offset 3300 ( 54) dim -1 offset -826 ( 55) dim -1 offset -826 ( 56) dim 0 offset 3300 ( 57) dim 1 offset 3300 ( 58) dim 1 offset 3355 ( 59) dim 1 offset 3410 ( 60) dim 1 offset 3465 ( 61) dim 1 offset 3520 ( 62) dim 0 offset 3575 ( 63) dim 1 offset 3575 ( 64) dim 0 offset 3630 ( 65) dim 1 offset 3630 ( 66) dim 0 offset 3685 ( 67) dim 1 offset 3685 ( 68) dim 0 offset 3740 ( 69) dim 1 offset 3740 ( 70) dim 1 offset 3795 ( 71) dim 1 offset 3850 ( 72) dim 0 offset 3905 ( 73) dim 1 offset 3905 ( 74) dim 0 offset 3960 ( 75) dim 1 offset 3960 ( 76) dim 0 offset 4015 ( 77) dim 1 offset 4015 ( 78) dim 0 offset 4070 ( 79) dim 1 offset 4070 ( 80) dim 1 offset 4125 ( 81) dim 1 offset 4180 ( 82) dim 0 offset 4235 ( 83) dim 1 offset 4235 ( 84) dim 0 offset 4290 ( 85) dim 1 offset 4290 ( 86) dim 1 offset 4345 ( 87) dim 1 offset 4400 ( 88) dim 0 offset 4455 ( 89) dim 1 offset 4455 ( 90) dim 0 offset 4510 ( 91) dim 1 offset 4510 ( 92) dim 1 offset 4565 ( 93) dim 1 offset 4620 ( 94) dim 0 offset 4675 ( 95) dim 0 offset 4675 ( 96) dim 1 offset 4675 ( 97) dim 1 offset 4730 ( 98) dim 0 offset 4785 ( 99) dim 1 offset 4785 ( 100) dim 0 offset 4840 ( 101) dim 1 offset 4840 ( 102) dim 1 offset 4895 ( 103) dim 1 offset 4950 ( 104) dim 1 offset 5005 ( 105) dim -1 offset -2311 ( 106) dim -2 offset -2366 ( 107) dim -2 offset -2421 From edoardo.centofanti01 at universitadipavia.it Fri Jan 6 09:43:20 2023 From: edoardo.centofanti01 at universitadipavia.it (Edoardo Centofanti) Date: Fri, 6 Jan 2023 16:43:20 +0100 Subject: [petsc-users] Definition of threshold(s) in gamg and boomerAMG Message-ID: Hi PETSc users, I was looking for the exact definitions of the threshold parameter (-pc_gamg_threshold) for gamg and of the strong threshold (-pc_hypre_boomeramg_strong_threshold) for Hypre BoomerAMG. My curiosity comes from the fact that the suggested parameters (apparently acting on the same aspect of the algorithms) differ of around one order of magnitude (if I remember right from the PETSc manual, the gamg threshold for 3D problems is around 0.05, while for boomerAMG ranges from 0.25 to 0.5). Thank you in advance, Edoardo -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Fri Jan 6 09:49:25 2023 From: danyang.su at gmail.com (Danyang Su) Date: Fri, 06 Jan 2023 07:49:25 -0800 Subject: [petsc-users] Error running configure on HDF5 in PETSc-3.18.3 In-Reply-To: <6209E74F-0437-4719-B382-49B269AE2FE6@gmail.com> References: <6209E74F-0437-4719-B382-49B269AE2FE6@gmail.com> Message-ID: <5A5A77E9-2AF3-4286-A518-9E45A548C2B8@gmail.com> Hi All, I get ?Error running configure on HDF5? in PETSc-3.18.3 on MacOS, but no problem on Ubuntu. Attached is the configuration log file. ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes --with-debugging=0 --download-cmake --with-hdf5-fortran-bindings Any idea on this? Thanks, Danyang -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: application/octet-stream Size: 6610560 bytes Desc: not available URL: From pierre at joliv.et Fri Jan 6 09:58:51 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Fri, 6 Jan 2023 16:58:51 +0100 Subject: [petsc-users] Error running configure on HDF5 in PETSc-3.18.3 In-Reply-To: <5A5A77E9-2AF3-4286-A518-9E45A548C2B8@gmail.com> References: <6209E74F-0437-4719-B382-49B269AE2FE6@gmail.com> <5A5A77E9-2AF3-4286-A518-9E45A548C2B8@gmail.com> Message-ID: <11AEF1F7-3C63-48D8-A7C6-7CA449668575@joliv.et> > On 6 Jan 2023, at 4:49 PM, Danyang Su wrote: > > Hi All, > > I get ?Error running configure on HDF5? in PETSc-3.18.3 on MacOS, but no problem on Ubuntu. Attached is the configuration log file. > > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes --with-debugging=0 --download-cmake --with-hdf5-fortran-bindings > > Any idea on this? Could you try to reconfigure in a shell without conda being activated? You have PATH=/Users/danyangsu/Soft/Anaconda3/bin:/Users/danyangsu/Soft/Anaconda3/condabin:[?] which typically results in a broken configuration. Thanks, Pierre > Thanks, > > Danyang -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 6 09:59:08 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 6 Jan 2023 10:59:08 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: On Fri, Jan 6, 2023 at 10:41 AM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Matt > > I appreciate the help. The section view is quite extensive because each > cell has 55 dofs located at the cells and on certain faces. I've appended > the first of these which corresponds with the output in the first email, to > save space. The following 54 are exactly the same but offset incremented by > 1. (or negative 1 for negative offsets) > Okay, from the output it is clear that this vector does not match your global section. Did you get stateVec by calling DMCreateGlobalVector()? Thanks, Matt > Thanks for your time > Nicholas > > On Fri, Jan 6, 2023 at 10:23 AM Matthew Knepley wrote: > >> On Fri, Jan 6, 2023 at 10:10 AM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Hi Matt >>> >>> I apologize for any lack of clarity in the initial email. >>> >>> looking at the initial output on rank 1 >>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart >>> cell 0 offset 2475 oStart 2640 -165 >>> cell 1 offset 2530 oStart 2640 -110 >>> cell 2 offset 2585 oStart 2640 -55 >>> cell 3 offset 2640 oStart 2640 0 >>> ..... >>> cell 15 offset -771 oStart 2640 -3411 >>> >>> >>> cell 15 provides a negative offset because it is the overlap cell (that >>> is unowned) >>> The remained of cells are all owned. However, the first 3 cells (0,1,2) >>> return an offset that is less than the starting ownership range. I would >>> expect cell 0 to start at offset 2640 at minimum. >>> >> >> Send the output for this section >> >> call PetscSectionView(section, PETSC_VIEWER_STDOUT_WORLD); >> >> Thanks, >> >> Matt >> >> >>> Sincerely >>> Nicholas >>> >>> >>> >>> >>> On Fri, Jan 6, 2023 at 10:05 AM Matthew Knepley >>> wrote: >>> >>>> On Fri, Jan 6, 2023 at 9:56 AM Nicholas Arnold-Medabalimi < >>>> narnoldm at umich.edu> wrote: >>>> >>>>> Apologies. 
If it helps, there is one cell of overlap in this small >>>>> test case for a 2D mesh that is 1 cell in height and a number of cells in >>>>> length. . >>>>> >>>>> process 0 >>>>> Petsc VecGetLocalSize 2750 >>>>> size(stateVecV) 2750 >>>>> >>>>> process 1 >>>>> Petsc VecGetLocalSize 2640 >>>>> size(stateVecV) 2640 >>>>> >>>> >>>> The offsets shown below are well-within these sizes. I do not >>>> understand the problem. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> On Fri, Jan 6, 2023 at 9:51 AM Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Fri, Jan 6, 2023 at 9:37 AM Nicholas Arnold-Medabalimi < >>>>>> narnoldm at umich.edu> wrote: >>>>>> >>>>>>> Hi Matt >>>>>>> >>>>>>> I made a typo on the line statVecV(offset) = in >>>>>>> my example, I agree. (I wrote that offhand since the actual assignment is >>>>>>> much larger) I should be statVecV(offset+1) = so I'm confident >>>>>>> it's not a 1 0 indexing thing. >>>>>>> >>>>>>> My question is more related to what is happening in the offsets. c0 >>>>>>> and c1 are pulled using DMplexgetheight stratum, so they are zero-indexed >>>>>>> (which is why I loop from c0 to (c1-1)). >>>>>>> >>>>>>> For the size inquiries. on processor 0 >>>>>>> Petsc VecGetSize(stateVec) 5390 >>>>>>> >>>>>> >>>>>> I need to see VecGetLocalSize() >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> size(stateVecV) 2640 >>>>>>> >>>>>>> on processor 1 >>>>>>> Petsc VecGetSize 5390 >>>>>>> size(stateVecV) 2750 >>>>>>> >>>>>>> It's quite weird to me that processor one can have a positive offset >>>>>>> that is less than its starting ownership index (in the initial email >>>>>>> output). >>>>>>> >>>>>>> Thanks for the assistance >>>>>>> Nicholas >>>>>>> >>>>>>> >>>>>>> On Fri, Jan 6, 2023 at 9:20 AM Matthew Knepley >>>>>>> wrote: >>>>>>> >>>>>>>> On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < >>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>> >>>>>>>>> Hi Petsc Users, >>>>>>>>> >>>>>>>>> I'm working with a dmplex system with a subsampled mesh >>>>>>>>> distributed with an overlap of 1. >>>>>>>>> >>>>>>>>> I'm encountering unusual situations when using >>>>>>>>> VecGetOwnershipRange to adjust the offset received from a global section. >>>>>>>>> The logic of the following code is first to get the offset needed to index >>>>>>>>> a global vector while still being able to check if it is an overlapped cell >>>>>>>>> and skip if needed while counting the owned cells. >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> call DMGetGlobalSection(dmplex,section,ierr) >>>>>>>>> call VecGetArrayF90(stateVec,stateVecV,ierr) >>>>>>>>> call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) >>>>>>>>> do i = c0, (c1-1) >>>>>>>>> >>>>>>>>> call PetscSectionGetOffset(section,i,offset,ierr) >>>>>>>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset- >>>>>>>>> oStart >>>>>>>>> >>>>>>>>> if(offset<0) then >>>>>>>>> cycle >>>>>>>>> endif >>>>>>>>> offset=offset-oStart >>>>>>>>> plexcells=plexcells+1 >>>>>>>>> stateVecV(offset)= enddo >>>>>>>>> >>>>>>>>> I'm noticing some very weird results that I've appended below. The >>>>>>>>> GetOffset documentation notes that a negative offset indicates an unowned >>>>>>>>> point (which I use to cycle). However, the offset subtraction with oStart >>>>>>>>> will yield an illegal index for the Vector access. 
I see that on the >>>>>>>>> documentation for GetOwnershipRange, it notes that this may be >>>>>>>>> "ill-defined" but I wanted to see if this is type of ill-defined I can >>>>>>>>> expect or there is just something terribly wrong with my PetscSection.(both >>>>>>>>> the Vec and Section were produced from DMPlexDistributeField so should by >>>>>>>>> definition have synchronized section information) I was wondering if there >>>>>>>>> is a possible output and/or the best way to index the vector. I'm thinking >>>>>>>>> of subtracting the offset of cell 0 perhaps? >>>>>>>>> >>>>>>>> >>>>>>>> Can you show your vector sizes? Are you sure it is not the fact >>>>>>>> that F90 arrays use 1-based indices, but these are 0-based offsets? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> on rank 0 >>>>>>>>> >>>>>>>>> cell 0 offset 0 oStart 0 0 >>>>>>>>> cell 1 offset 55 oStart 0 55 >>>>>>>>> cell 2 offset 110 oStart 0 110 >>>>>>>>> cell 3 offset 165 oStart 0 165 >>>>>>>>> cell 4 offset 220 oStart 0 220 >>>>>>>>> cell 5 offset 275 oStart 0 275 >>>>>>>>> cell 6 offset 330 oStart 0 330 >>>>>>>>> cell 7 offset 385 oStart 0 385 >>>>>>>>> cell 8 offset 440 oStart 0 440 >>>>>>>>> cell 9 offset 495 oStart 0 495 >>>>>>>>> cell 10 offset 550 oStart 0 550 >>>>>>>>> cell 11 offset 605 oStart 0 605 >>>>>>>>> cell 12 offset 660 oStart 0 660 >>>>>>>>> cell 13 offset 715 oStart 0 715 >>>>>>>>> >>>>>>>>> and on rank one >>>>>>>>> cell 0 offset 2475 oStart 2640 -165 >>>>>>>>> cell 1 offset 2530 oStart 2640 -110 >>>>>>>>> cell 2 offset 2585 oStart 2640 -55 >>>>>>>>> cell 3 offset 2640 oStart 2640 0 >>>>>>>>> cell 4 offset 2695 oStart 2640 55 >>>>>>>>> cell 5 offset 2750 oStart 2640 110 >>>>>>>>> cell 6 offset 2805 oStart 2640 165 >>>>>>>>> cell 7 offset 2860 oStart 2640 220 >>>>>>>>> cell 8 offset 2915 oStart 2640 275 >>>>>>>>> cell 9 offset 2970 oStart 2640 330 >>>>>>>>> cell 10 offset 3025 oStart 2640 385 >>>>>>>>> cell 11 offset 3080 oStart 2640 440 >>>>>>>>> cell 12 offset 3135 oStart 2640 495 >>>>>>>>> cell 13 offset 3190 oStart 2640 550 >>>>>>>>> cell 14 offset 3245 oStart 2640 605 >>>>>>>>> cell 15 offset -771 oStart 2640 -3411 >>>>>>>>> >>>>>>>>> >>>>>>>>> Sincerely >>>>>>>>> Nicholas >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>> >>>>>>>>> Ph.D. Candidate >>>>>>>>> Computational Aeroscience Lab >>>>>>>>> University of Michigan >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nicholas Arnold-Medabalimi >>>>>>> >>>>>>> Ph.D. Candidate >>>>>>> Computational Aeroscience Lab >>>>>>> University of Michigan >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Nicholas Arnold-Medabalimi >>>>> >>>>> Ph.D. Candidate >>>>> Computational Aeroscience Lab >>>>> University of Michigan >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. 
>>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Fri Jan 6 10:31:59 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Fri, 6 Jan 2023 11:31:59 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: Hi Matt This was generated using the DMPlexDistributeField which we discussed a while back. Everything seemed to be working fine when I only had cells dofs but I recently added face dofs, which seems to have caused some issues. Whats weird is that I'm feeding the same distribution SF and inputting a vector and section that are consistent to the DMPlexDistributeField so I'd expect the vectors and section output to be consistent. I'll take a closer look at that. Thanks Nicholas On Fri, Jan 6, 2023 at 10:59 AM Matthew Knepley wrote: > On Fri, Jan 6, 2023 at 10:41 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Matt >> >> I appreciate the help. The section view is quite extensive because each >> cell has 55 dofs located at the cells and on certain faces. I've appended >> the first of these which corresponds with the output in the first email, to >> save space. The following 54 are exactly the same but offset incremented by >> 1. (or negative 1 for negative offsets) >> > > Okay, from the output it is clear that this vector does not match your > global section. Did you get stateVec by calling DMCreateGlobalVector()? > > Thanks, > > Matt > > >> Thanks for your time >> Nicholas >> >> On Fri, Jan 6, 2023 at 10:23 AM Matthew Knepley >> wrote: >> >>> On Fri, Jan 6, 2023 at 10:10 AM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> Hi Matt >>>> >>>> I apologize for any lack of clarity in the initial email. >>>> >>>> looking at the initial output on rank 1 >>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart >>>> cell 0 offset 2475 oStart 2640 -165 >>>> cell 1 offset 2530 oStart 2640 -110 >>>> cell 2 offset 2585 oStart 2640 -55 >>>> cell 3 offset 2640 oStart 2640 0 >>>> ..... >>>> cell 15 offset -771 oStart 2640 -3411 >>>> >>>> >>>> cell 15 provides a negative offset because it is the overlap cell (that >>>> is unowned) >>>> The remained of cells are all owned. However, the first 3 cells (0,1,2) >>>> return an offset that is less than the starting ownership range. I would >>>> expect cell 0 to start at offset 2640 at minimum. 
>>>> >>> >>> Send the output for this section >>> >>> call PetscSectionView(section, PETSC_VIEWER_STDOUT_WORLD); >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Sincerely >>>> Nicholas >>>> >>>> >>>> >>>> >>>> On Fri, Jan 6, 2023 at 10:05 AM Matthew Knepley >>>> wrote: >>>> >>>>> On Fri, Jan 6, 2023 at 9:56 AM Nicholas Arnold-Medabalimi < >>>>> narnoldm at umich.edu> wrote: >>>>> >>>>>> Apologies. If it helps, there is one cell of overlap in this small >>>>>> test case for a 2D mesh that is 1 cell in height and a number of cells in >>>>>> length. . >>>>>> >>>>>> process 0 >>>>>> Petsc VecGetLocalSize 2750 >>>>>> size(stateVecV) 2750 >>>>>> >>>>>> process 1 >>>>>> Petsc VecGetLocalSize 2640 >>>>>> size(stateVecV) 2640 >>>>>> >>>>> >>>>> The offsets shown below are well-within these sizes. I do not >>>>> understand the problem. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> On Fri, Jan 6, 2023 at 9:51 AM Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Fri, Jan 6, 2023 at 9:37 AM Nicholas Arnold-Medabalimi < >>>>>>> narnoldm at umich.edu> wrote: >>>>>>> >>>>>>>> Hi Matt >>>>>>>> >>>>>>>> I made a typo on the line statVecV(offset) = in >>>>>>>> my example, I agree. (I wrote that offhand since the actual assignment is >>>>>>>> much larger) I should be statVecV(offset+1) = so I'm confident >>>>>>>> it's not a 1 0 indexing thing. >>>>>>>> >>>>>>>> My question is more related to what is happening in the offsets. c0 >>>>>>>> and c1 are pulled using DMplexgetheight stratum, so they are zero-indexed >>>>>>>> (which is why I loop from c0 to (c1-1)). >>>>>>>> >>>>>>>> For the size inquiries. on processor 0 >>>>>>>> Petsc VecGetSize(stateVec) 5390 >>>>>>>> >>>>>>> >>>>>>> I need to see VecGetLocalSize() >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> size(stateVecV) 2640 >>>>>>>> >>>>>>>> on processor 1 >>>>>>>> Petsc VecGetSize 5390 >>>>>>>> size(stateVecV) 2750 >>>>>>>> >>>>>>>> It's quite weird to me that processor one can have a positive >>>>>>>> offset that is less than its starting ownership index (in the initial email >>>>>>>> output). >>>>>>>> >>>>>>>> Thanks for the assistance >>>>>>>> Nicholas >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jan 6, 2023 at 9:20 AM Matthew Knepley >>>>>>>> wrote: >>>>>>>> >>>>>>>>> On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < >>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>> >>>>>>>>>> Hi Petsc Users, >>>>>>>>>> >>>>>>>>>> I'm working with a dmplex system with a subsampled mesh >>>>>>>>>> distributed with an overlap of 1. >>>>>>>>>> >>>>>>>>>> I'm encountering unusual situations when using >>>>>>>>>> VecGetOwnershipRange to adjust the offset received from a global section. >>>>>>>>>> The logic of the following code is first to get the offset needed to index >>>>>>>>>> a global vector while still being able to check if it is an overlapped cell >>>>>>>>>> and skip if needed while counting the owned cells. >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> call DMGetGlobalSection(dmplex,section,ierr) >>>>>>>>>> call VecGetArrayF90(stateVec,stateVecV,ierr) >>>>>>>>>> call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) >>>>>>>>>> do i = c0, (c1-1) >>>>>>>>>> >>>>>>>>>> call PetscSectionGetOffset(section,i,offset,ierr) >>>>>>>>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset- >>>>>>>>>> oStart >>>>>>>>>> >>>>>>>>>> if(offset<0) then >>>>>>>>>> cycle >>>>>>>>>> endif >>>>>>>>>> offset=offset-oStart >>>>>>>>>> plexcells=plexcells+1 >>>>>>>>>> stateVecV(offset)= enddo >>>>>>>>>> >>>>>>>>>> I'm noticing some very weird results that I've appended below. 
>>>>>>>>>> The GetOffset documentation notes that a negative offset indicates an >>>>>>>>>> unowned point (which I use to cycle). However, the offset subtraction with >>>>>>>>>> oStart will yield an illegal index for the Vector access. I see that on the >>>>>>>>>> documentation for GetOwnershipRange, it notes that this may be >>>>>>>>>> "ill-defined" but I wanted to see if this is type of ill-defined I can >>>>>>>>>> expect or there is just something terribly wrong with my PetscSection.(both >>>>>>>>>> the Vec and Section were produced from DMPlexDistributeField so should by >>>>>>>>>> definition have synchronized section information) I was wondering if there >>>>>>>>>> is a possible output and/or the best way to index the vector. I'm thinking >>>>>>>>>> of subtracting the offset of cell 0 perhaps? >>>>>>>>>> >>>>>>>>> >>>>>>>>> Can you show your vector sizes? Are you sure it is not the fact >>>>>>>>> that F90 arrays use 1-based indices, but these are 0-based offsets? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Matt >>>>>>>>> >>>>>>>>> >>>>>>>>>> on rank 0 >>>>>>>>>> >>>>>>>>>> cell 0 offset 0 oStart 0 >>>>>>>>>> 0 >>>>>>>>>> cell 1 offset 55 oStart 0 >>>>>>>>>> 55 >>>>>>>>>> cell 2 offset 110 oStart 0 >>>>>>>>>> 110 >>>>>>>>>> cell 3 offset 165 oStart 0 >>>>>>>>>> 165 >>>>>>>>>> cell 4 offset 220 oStart 0 >>>>>>>>>> 220 >>>>>>>>>> cell 5 offset 275 oStart 0 >>>>>>>>>> 275 >>>>>>>>>> cell 6 offset 330 oStart 0 >>>>>>>>>> 330 >>>>>>>>>> cell 7 offset 385 oStart 0 >>>>>>>>>> 385 >>>>>>>>>> cell 8 offset 440 oStart 0 >>>>>>>>>> 440 >>>>>>>>>> cell 9 offset 495 oStart 0 >>>>>>>>>> 495 >>>>>>>>>> cell 10 offset 550 oStart 0 >>>>>>>>>> 550 >>>>>>>>>> cell 11 offset 605 oStart 0 >>>>>>>>>> 605 >>>>>>>>>> cell 12 offset 660 oStart 0 >>>>>>>>>> 660 >>>>>>>>>> cell 13 offset 715 oStart 0 >>>>>>>>>> 715 >>>>>>>>>> >>>>>>>>>> and on rank one >>>>>>>>>> cell 0 offset 2475 oStart 2640 -165 >>>>>>>>>> cell 1 offset 2530 oStart 2640 >>>>>>>>>> -110 >>>>>>>>>> cell 2 offset 2585 oStart 2640 >>>>>>>>>> -55 >>>>>>>>>> cell 3 offset 2640 oStart 2640 >>>>>>>>>> 0 >>>>>>>>>> cell 4 offset 2695 oStart 2640 >>>>>>>>>> 55 >>>>>>>>>> cell 5 offset 2750 oStart 2640 >>>>>>>>>> 110 >>>>>>>>>> cell 6 offset 2805 oStart 2640 >>>>>>>>>> 165 >>>>>>>>>> cell 7 offset 2860 oStart 2640 >>>>>>>>>> 220 >>>>>>>>>> cell 8 offset 2915 oStart 2640 >>>>>>>>>> 275 >>>>>>>>>> cell 9 offset 2970 oStart 2640 >>>>>>>>>> 330 >>>>>>>>>> cell 10 offset 3025 oStart 2640 >>>>>>>>>> 385 >>>>>>>>>> cell 11 offset 3080 oStart 2640 >>>>>>>>>> 440 >>>>>>>>>> cell 12 offset 3135 oStart 2640 >>>>>>>>>> 495 >>>>>>>>>> cell 13 offset 3190 oStart 2640 >>>>>>>>>> 550 >>>>>>>>>> cell 14 offset 3245 oStart 2640 >>>>>>>>>> 605 >>>>>>>>>> cell 15 offset -771 oStart 2640 >>>>>>>>>> -3411 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Sincerely >>>>>>>>>> Nicholas >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>> >>>>>>>>>> Ph.D. Candidate >>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>> University of Michigan >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>> experiments lead. >>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>> >>>>>>>> Ph.D. 
Candidate >>>>>>>> Computational Aeroscience Lab >>>>>>>> University of Michigan >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Nicholas Arnold-Medabalimi >>>>>> >>>>>> Ph.D. Candidate >>>>>> Computational Aeroscience Lab >>>>>> University of Michigan >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 6 10:37:03 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 6 Jan 2023 11:37:03 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: On Fri, Jan 6, 2023 at 11:32 AM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Matt > > This was generated using the DMPlexDistributeField which we discussed a > while back. Everything seemed to be working fine when I only had cells dofs > but I recently added face dofs, which seems to have caused some issues. > Whats weird is that I'm feeding the same distribution SF and inputting a > vector and section that are consistent to the DMPlexDistributeField so I'd > expect the vectors and section output to be consistent. I'll take a closer > look at that. > The way I use DistributeField(), for example to distribute coordinates, is that I give it the migrationSF, the local coordinate section, and the local coordinate vector. It gives me back the new local coordinate section (which I set into the distributed DM), and the local coordinate vector. It seems like you are doing something else. Maybe the documentation is misleading. Thanks, Matt > Thanks > Nicholas > > On Fri, Jan 6, 2023 at 10:59 AM Matthew Knepley wrote: > >> On Fri, Jan 6, 2023 at 10:41 AM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Hi Matt >>> >>> I appreciate the help. The section view is quite extensive because each >>> cell has 55 dofs located at the cells and on certain faces. I've appended >>> the first of these which corresponds with the output in the first email, to >>> save space. The following 54 are exactly the same but offset incremented by >>> 1. 
(or negative 1 for negative offsets) >>> >> >> Okay, from the output it is clear that this vector does not match your >> global section. Did you get stateVec by calling DMCreateGlobalVector()? >> >> Thanks, >> >> Matt >> >> >>> Thanks for your time >>> Nicholas >>> >>> On Fri, Jan 6, 2023 at 10:23 AM Matthew Knepley >>> wrote: >>> >>>> On Fri, Jan 6, 2023 at 10:10 AM Nicholas Arnold-Medabalimi < >>>> narnoldm at umich.edu> wrote: >>>> >>>>> Hi Matt >>>>> >>>>> I apologize for any lack of clarity in the initial email. >>>>> >>>>> looking at the initial output on rank 1 >>>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart >>>>> cell 0 offset 2475 oStart 2640 -165 >>>>> cell 1 offset 2530 oStart 2640 -110 >>>>> cell 2 offset 2585 oStart 2640 -55 >>>>> cell 3 offset 2640 oStart 2640 0 >>>>> ..... >>>>> cell 15 offset -771 oStart 2640 -3411 >>>>> >>>>> >>>>> cell 15 provides a negative offset because it is the overlap cell >>>>> (that is unowned) >>>>> The remained of cells are all owned. However, the first 3 cells >>>>> (0,1,2) return an offset that is less than the starting ownership range. I >>>>> would expect cell 0 to start at offset 2640 at minimum. >>>>> >>>> >>>> Send the output for this section >>>> >>>> call PetscSectionView(section, PETSC_VIEWER_STDOUT_WORLD); >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Sincerely >>>>> Nicholas >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, Jan 6, 2023 at 10:05 AM Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Fri, Jan 6, 2023 at 9:56 AM Nicholas Arnold-Medabalimi < >>>>>> narnoldm at umich.edu> wrote: >>>>>> >>>>>>> Apologies. If it helps, there is one cell of overlap in this small >>>>>>> test case for a 2D mesh that is 1 cell in height and a number of cells in >>>>>>> length. . >>>>>>> >>>>>>> process 0 >>>>>>> Petsc VecGetLocalSize 2750 >>>>>>> size(stateVecV) 2750 >>>>>>> >>>>>>> process 1 >>>>>>> Petsc VecGetLocalSize 2640 >>>>>>> size(stateVecV) 2640 >>>>>>> >>>>>> >>>>>> The offsets shown below are well-within these sizes. I do not >>>>>> understand the problem. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> On Fri, Jan 6, 2023 at 9:51 AM Matthew Knepley >>>>>>> wrote: >>>>>>> >>>>>>>> On Fri, Jan 6, 2023 at 9:37 AM Nicholas Arnold-Medabalimi < >>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>> >>>>>>>>> Hi Matt >>>>>>>>> >>>>>>>>> I made a typo on the line statVecV(offset) = in >>>>>>>>> my example, I agree. (I wrote that offhand since the actual assignment is >>>>>>>>> much larger) I should be statVecV(offset+1) = so I'm confident >>>>>>>>> it's not a 1 0 indexing thing. >>>>>>>>> >>>>>>>>> My question is more related to what is happening in the offsets. >>>>>>>>> c0 and c1 are pulled using DMplexgetheight stratum, so they are >>>>>>>>> zero-indexed (which is why I loop from c0 to (c1-1)). >>>>>>>>> >>>>>>>>> For the size inquiries. on processor 0 >>>>>>>>> Petsc VecGetSize(stateVec) 5390 >>>>>>>>> >>>>>>>> >>>>>>>> I need to see VecGetLocalSize() >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> size(stateVecV) 2640 >>>>>>>>> >>>>>>>>> on processor 1 >>>>>>>>> Petsc VecGetSize 5390 >>>>>>>>> size(stateVecV) 2750 >>>>>>>>> >>>>>>>>> It's quite weird to me that processor one can have a positive >>>>>>>>> offset that is less than its starting ownership index (in the initial email >>>>>>>>> output). 
>>>>>>>>> >>>>>>>>> Thanks for the assistance >>>>>>>>> Nicholas >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Jan 6, 2023 at 9:20 AM Matthew Knepley >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < >>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Petsc Users, >>>>>>>>>>> >>>>>>>>>>> I'm working with a dmplex system with a subsampled mesh >>>>>>>>>>> distributed with an overlap of 1. >>>>>>>>>>> >>>>>>>>>>> I'm encountering unusual situations when using >>>>>>>>>>> VecGetOwnershipRange to adjust the offset received from a global section. >>>>>>>>>>> The logic of the following code is first to get the offset needed to index >>>>>>>>>>> a global vector while still being able to check if it is an overlapped cell >>>>>>>>>>> and skip if needed while counting the owned cells. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> call DMGetGlobalSection(dmplex,section,ierr) >>>>>>>>>>> call VecGetArrayF90(stateVec,stateVecV,ierr) >>>>>>>>>>> call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) >>>>>>>>>>> do i = c0, (c1-1) >>>>>>>>>>> >>>>>>>>>>> call PetscSectionGetOffset(section,i,offset,ierr) >>>>>>>>>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset- >>>>>>>>>>> oStart >>>>>>>>>>> >>>>>>>>>>> if(offset<0) then >>>>>>>>>>> cycle >>>>>>>>>>> endif >>>>>>>>>>> offset=offset-oStart >>>>>>>>>>> plexcells=plexcells+1 >>>>>>>>>>> stateVecV(offset)= enddo >>>>>>>>>>> >>>>>>>>>>> I'm noticing some very weird results that I've appended below. >>>>>>>>>>> The GetOffset documentation notes that a negative offset indicates an >>>>>>>>>>> unowned point (which I use to cycle). However, the offset subtraction with >>>>>>>>>>> oStart will yield an illegal index for the Vector access. I see that on the >>>>>>>>>>> documentation for GetOwnershipRange, it notes that this may be >>>>>>>>>>> "ill-defined" but I wanted to see if this is type of ill-defined I can >>>>>>>>>>> expect or there is just something terribly wrong with my PetscSection.(both >>>>>>>>>>> the Vec and Section were produced from DMPlexDistributeField so should by >>>>>>>>>>> definition have synchronized section information) I was wondering if there >>>>>>>>>>> is a possible output and/or the best way to index the vector. I'm thinking >>>>>>>>>>> of subtracting the offset of cell 0 perhaps? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Can you show your vector sizes? Are you sure it is not the fact >>>>>>>>>> that F90 arrays use 1-based indices, but these are 0-based offsets? 
>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> on rank 0 >>>>>>>>>>> >>>>>>>>>>> cell 0 offset 0 oStart 0 >>>>>>>>>>> 0 >>>>>>>>>>> cell 1 offset 55 oStart 0 >>>>>>>>>>> 55 >>>>>>>>>>> cell 2 offset 110 oStart 0 >>>>>>>>>>> 110 >>>>>>>>>>> cell 3 offset 165 oStart 0 >>>>>>>>>>> 165 >>>>>>>>>>> cell 4 offset 220 oStart 0 >>>>>>>>>>> 220 >>>>>>>>>>> cell 5 offset 275 oStart 0 >>>>>>>>>>> 275 >>>>>>>>>>> cell 6 offset 330 oStart 0 >>>>>>>>>>> 330 >>>>>>>>>>> cell 7 offset 385 oStart 0 >>>>>>>>>>> 385 >>>>>>>>>>> cell 8 offset 440 oStart 0 >>>>>>>>>>> 440 >>>>>>>>>>> cell 9 offset 495 oStart 0 >>>>>>>>>>> 495 >>>>>>>>>>> cell 10 offset 550 oStart 0 >>>>>>>>>>> 550 >>>>>>>>>>> cell 11 offset 605 oStart 0 >>>>>>>>>>> 605 >>>>>>>>>>> cell 12 offset 660 oStart 0 >>>>>>>>>>> 660 >>>>>>>>>>> cell 13 offset 715 oStart 0 >>>>>>>>>>> 715 >>>>>>>>>>> >>>>>>>>>>> and on rank one >>>>>>>>>>> cell 0 offset 2475 oStart 2640 >>>>>>>>>>> -165 >>>>>>>>>>> cell 1 offset 2530 oStart 2640 >>>>>>>>>>> -110 >>>>>>>>>>> cell 2 offset 2585 oStart 2640 >>>>>>>>>>> -55 >>>>>>>>>>> cell 3 offset 2640 oStart 2640 >>>>>>>>>>> 0 >>>>>>>>>>> cell 4 offset 2695 oStart 2640 >>>>>>>>>>> 55 >>>>>>>>>>> cell 5 offset 2750 oStart 2640 >>>>>>>>>>> 110 >>>>>>>>>>> cell 6 offset 2805 oStart 2640 >>>>>>>>>>> 165 >>>>>>>>>>> cell 7 offset 2860 oStart 2640 >>>>>>>>>>> 220 >>>>>>>>>>> cell 8 offset 2915 oStart 2640 >>>>>>>>>>> 275 >>>>>>>>>>> cell 9 offset 2970 oStart 2640 >>>>>>>>>>> 330 >>>>>>>>>>> cell 10 offset 3025 oStart 2640 >>>>>>>>>>> 385 >>>>>>>>>>> cell 11 offset 3080 oStart 2640 >>>>>>>>>>> 440 >>>>>>>>>>> cell 12 offset 3135 oStart 2640 >>>>>>>>>>> 495 >>>>>>>>>>> cell 13 offset 3190 oStart 2640 >>>>>>>>>>> 550 >>>>>>>>>>> cell 14 offset 3245 oStart 2640 >>>>>>>>>>> 605 >>>>>>>>>>> cell 15 offset -771 oStart 2640 >>>>>>>>>>> -3411 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Sincerely >>>>>>>>>>> Nicholas >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>> >>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>> University of Michigan >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>> experiments lead. >>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>> >>>>>>>>> Ph.D. Candidate >>>>>>>>> Computational Aeroscience Lab >>>>>>>>> University of Michigan >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nicholas Arnold-Medabalimi >>>>>>> >>>>>>> Ph.D. Candidate >>>>>>> Computational Aeroscience Lab >>>>>>> University of Michigan >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Nicholas Arnold-Medabalimi >>>>> >>>>> Ph.D. 
Candidate >>>>> Computational Aeroscience Lab >>>>> University of Michigan >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Fri Jan 6 11:17:51 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Fri, 6 Jan 2023 12:17:51 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: I am doing that as well (although not for vertex dofs). And I had it working quite well for purely cell-associated DOFs. But I realized later that I also wanted to transmit some DOFs associated with faces so I suspect I'm messing something up there. Something we discussed back on 12/26 (email subject:Getting a vector from a DM to output VTK) was associating a vector generated this way with the DM so it can be more easily visualized using vtk files is VecSetOperation(state_dist, VECOP_VIEW, (void(*)(void))VecView_Plex); If there is an easy way to get this working in Fortran I would very much appreciate it as it was very helpful when I was debugging in C. Thanks Nicholas On Fri, Jan 6, 2023 at 11:37 AM Matthew Knepley wrote: > On Fri, Jan 6, 2023 at 11:32 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Matt >> >> This was generated using the DMPlexDistributeField which we discussed a >> while back. Everything seemed to be working fine when I only had cells dofs >> but I recently added face dofs, which seems to have caused some issues. >> Whats weird is that I'm feeding the same distribution SF and inputting a >> vector and section that are consistent to the DMPlexDistributeField so I'd >> expect the vectors and section output to be consistent. I'll take a closer >> look at that. >> > > The way I use DistributeField(), for example to distribute coordinates, is > that I give it the migrationSF, the local coordinate section, and the local > coordinate vector. It gives me back the new local coordinate section (which > I set into the distributed DM), and the local coordinate vector. It seems > like you are doing something else. Maybe the documentation is misleading. > > Thanks, > > Matt > > >> Thanks >> Nicholas >> >> On Fri, Jan 6, 2023 at 10:59 AM Matthew Knepley >> wrote: >> >>> On Fri, Jan 6, 2023 at 10:41 AM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> Hi Matt >>>> >>>> I appreciate the help. The section view is quite extensive because each >>>> cell has 55 dofs located at the cells and on certain faces. 
I've appended >>>> the first of these which corresponds with the output in the first email, to >>>> save space. The following 54 are exactly the same but offset incremented by >>>> 1. (or negative 1 for negative offsets) >>>> >>> >>> Okay, from the output it is clear that this vector does not match your >>> global section. Did you get stateVec by calling DMCreateGlobalVector()? >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks for your time >>>> Nicholas >>>> >>>> On Fri, Jan 6, 2023 at 10:23 AM Matthew Knepley >>>> wrote: >>>> >>>>> On Fri, Jan 6, 2023 at 10:10 AM Nicholas Arnold-Medabalimi < >>>>> narnoldm at umich.edu> wrote: >>>>> >>>>>> Hi Matt >>>>>> >>>>>> I apologize for any lack of clarity in the initial email. >>>>>> >>>>>> looking at the initial output on rank 1 >>>>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart >>>>>> cell 0 offset 2475 oStart 2640 -165 >>>>>> cell 1 offset 2530 oStart 2640 -110 >>>>>> cell 2 offset 2585 oStart 2640 -55 >>>>>> cell 3 offset 2640 oStart 2640 0 >>>>>> ..... >>>>>> cell 15 offset -771 oStart 2640 -3411 >>>>>> >>>>>> >>>>>> cell 15 provides a negative offset because it is the overlap cell >>>>>> (that is unowned) >>>>>> The remained of cells are all owned. However, the first 3 cells >>>>>> (0,1,2) return an offset that is less than the starting ownership range. I >>>>>> would expect cell 0 to start at offset 2640 at minimum. >>>>>> >>>>> >>>>> Send the output for this section >>>>> >>>>> call PetscSectionView(section, PETSC_VIEWER_STDOUT_WORLD); >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Sincerely >>>>>> Nicholas >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Jan 6, 2023 at 10:05 AM Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Fri, Jan 6, 2023 at 9:56 AM Nicholas Arnold-Medabalimi < >>>>>>> narnoldm at umich.edu> wrote: >>>>>>> >>>>>>>> Apologies. If it helps, there is one cell of overlap in this small >>>>>>>> test case for a 2D mesh that is 1 cell in height and a number of cells in >>>>>>>> length. . >>>>>>>> >>>>>>>> process 0 >>>>>>>> Petsc VecGetLocalSize 2750 >>>>>>>> size(stateVecV) 2750 >>>>>>>> >>>>>>>> process 1 >>>>>>>> Petsc VecGetLocalSize 2640 >>>>>>>> size(stateVecV) 2640 >>>>>>>> >>>>>>> >>>>>>> The offsets shown below are well-within these sizes. I do not >>>>>>> understand the problem. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> On Fri, Jan 6, 2023 at 9:51 AM Matthew Knepley >>>>>>>> wrote: >>>>>>>> >>>>>>>>> On Fri, Jan 6, 2023 at 9:37 AM Nicholas Arnold-Medabalimi < >>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>> >>>>>>>>>> Hi Matt >>>>>>>>>> >>>>>>>>>> I made a typo on the line statVecV(offset) = >>>>>>>>>> in my example, I agree. (I wrote that offhand since the actual assignment >>>>>>>>>> is much larger) I should be statVecV(offset+1) = so I'm >>>>>>>>>> confident it's not a 1 0 indexing thing. >>>>>>>>>> >>>>>>>>>> My question is more related to what is happening in the offsets. >>>>>>>>>> c0 and c1 are pulled using DMplexgetheight stratum, so they are >>>>>>>>>> zero-indexed (which is why I loop from c0 to (c1-1)). >>>>>>>>>> >>>>>>>>>> For the size inquiries. 
on processor 0 >>>>>>>>>> Petsc VecGetSize(stateVec) 5390 >>>>>>>>>> >>>>>>>>> >>>>>>>>> I need to see VecGetLocalSize() >>>>>>>>> >>>>>>>>> Matt >>>>>>>>> >>>>>>>>> >>>>>>>>>> size(stateVecV) 2640 >>>>>>>>>> >>>>>>>>>> on processor 1 >>>>>>>>>> Petsc VecGetSize 5390 >>>>>>>>>> size(stateVecV) 2750 >>>>>>>>>> >>>>>>>>>> It's quite weird to me that processor one can have a positive >>>>>>>>>> offset that is less than its starting ownership index (in the initial email >>>>>>>>>> output). >>>>>>>>>> >>>>>>>>>> Thanks for the assistance >>>>>>>>>> Nicholas >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Jan 6, 2023 at 9:20 AM Matthew Knepley >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < >>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Petsc Users, >>>>>>>>>>>> >>>>>>>>>>>> I'm working with a dmplex system with a subsampled mesh >>>>>>>>>>>> distributed with an overlap of 1. >>>>>>>>>>>> >>>>>>>>>>>> I'm encountering unusual situations when using >>>>>>>>>>>> VecGetOwnershipRange to adjust the offset received from a global section. >>>>>>>>>>>> The logic of the following code is first to get the offset needed to index >>>>>>>>>>>> a global vector while still being able to check if it is an overlapped cell >>>>>>>>>>>> and skip if needed while counting the owned cells. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> call DMGetGlobalSection(dmplex,section,ierr) >>>>>>>>>>>> call VecGetArrayF90(stateVec,stateVecV,ierr) >>>>>>>>>>>> call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) >>>>>>>>>>>> do i = c0, (c1-1) >>>>>>>>>>>> >>>>>>>>>>>> call PetscSectionGetOffset(section,i,offset,ierr) >>>>>>>>>>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset >>>>>>>>>>>> -oStart >>>>>>>>>>>> >>>>>>>>>>>> if(offset<0) then >>>>>>>>>>>> cycle >>>>>>>>>>>> endif >>>>>>>>>>>> offset=offset-oStart >>>>>>>>>>>> plexcells=plexcells+1 >>>>>>>>>>>> stateVecV(offset)= enddo >>>>>>>>>>>> >>>>>>>>>>>> I'm noticing some very weird results that I've appended below. >>>>>>>>>>>> The GetOffset documentation notes that a negative offset indicates an >>>>>>>>>>>> unowned point (which I use to cycle). However, the offset subtraction with >>>>>>>>>>>> oStart will yield an illegal index for the Vector access. I see that on the >>>>>>>>>>>> documentation for GetOwnershipRange, it notes that this may be >>>>>>>>>>>> "ill-defined" but I wanted to see if this is type of ill-defined I can >>>>>>>>>>>> expect or there is just something terribly wrong with my PetscSection.(both >>>>>>>>>>>> the Vec and Section were produced from DMPlexDistributeField so should by >>>>>>>>>>>> definition have synchronized section information) I was wondering if there >>>>>>>>>>>> is a possible output and/or the best way to index the vector. I'm thinking >>>>>>>>>>>> of subtracting the offset of cell 0 perhaps? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Can you show your vector sizes? Are you sure it is not the fact >>>>>>>>>>> that F90 arrays use 1-based indices, but these are 0-based offsets? 
>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Matt >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> on rank 0 >>>>>>>>>>>> >>>>>>>>>>>> cell 0 offset 0 oStart 0 >>>>>>>>>>>> 0 >>>>>>>>>>>> cell 1 offset 55 oStart 0 >>>>>>>>>>>> 55 >>>>>>>>>>>> cell 2 offset 110 oStart 0 >>>>>>>>>>>> 110 >>>>>>>>>>>> cell 3 offset 165 oStart 0 >>>>>>>>>>>> 165 >>>>>>>>>>>> cell 4 offset 220 oStart 0 >>>>>>>>>>>> 220 >>>>>>>>>>>> cell 5 offset 275 oStart 0 >>>>>>>>>>>> 275 >>>>>>>>>>>> cell 6 offset 330 oStart 0 >>>>>>>>>>>> 330 >>>>>>>>>>>> cell 7 offset 385 oStart 0 >>>>>>>>>>>> 385 >>>>>>>>>>>> cell 8 offset 440 oStart 0 >>>>>>>>>>>> 440 >>>>>>>>>>>> cell 9 offset 495 oStart 0 >>>>>>>>>>>> 495 >>>>>>>>>>>> cell 10 offset 550 oStart 0 >>>>>>>>>>>> 550 >>>>>>>>>>>> cell 11 offset 605 oStart 0 >>>>>>>>>>>> 605 >>>>>>>>>>>> cell 12 offset 660 oStart 0 >>>>>>>>>>>> 660 >>>>>>>>>>>> cell 13 offset 715 oStart 0 >>>>>>>>>>>> 715 >>>>>>>>>>>> >>>>>>>>>>>> and on rank one >>>>>>>>>>>> cell 0 offset 2475 oStart 2640 >>>>>>>>>>>> -165 >>>>>>>>>>>> cell 1 offset 2530 oStart 2640 >>>>>>>>>>>> -110 >>>>>>>>>>>> cell 2 offset 2585 oStart 2640 >>>>>>>>>>>> -55 >>>>>>>>>>>> cell 3 offset 2640 oStart 2640 >>>>>>>>>>>> 0 >>>>>>>>>>>> cell 4 offset 2695 oStart 2640 >>>>>>>>>>>> 55 >>>>>>>>>>>> cell 5 offset 2750 oStart 2640 >>>>>>>>>>>> 110 >>>>>>>>>>>> cell 6 offset 2805 oStart 2640 >>>>>>>>>>>> 165 >>>>>>>>>>>> cell 7 offset 2860 oStart 2640 >>>>>>>>>>>> 220 >>>>>>>>>>>> cell 8 offset 2915 oStart 2640 >>>>>>>>>>>> 275 >>>>>>>>>>>> cell 9 offset 2970 oStart 2640 >>>>>>>>>>>> 330 >>>>>>>>>>>> cell 10 offset 3025 oStart 2640 >>>>>>>>>>>> 385 >>>>>>>>>>>> cell 11 offset 3080 oStart 2640 >>>>>>>>>>>> 440 >>>>>>>>>>>> cell 12 offset 3135 oStart 2640 >>>>>>>>>>>> 495 >>>>>>>>>>>> cell 13 offset 3190 oStart 2640 >>>>>>>>>>>> 550 >>>>>>>>>>>> cell 14 offset 3245 oStart 2640 >>>>>>>>>>>> 605 >>>>>>>>>>>> cell 15 offset -771 oStart 2640 >>>>>>>>>>>> -3411 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Sincerely >>>>>>>>>>>> Nicholas >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>> >>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>> University of Michigan >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>>> experiments lead. >>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>> >>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>> >>>>>>>>>> Ph.D. Candidate >>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>> University of Michigan >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>> experiments lead. >>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>> >>>>>>>> Ph.D. Candidate >>>>>>>> Computational Aeroscience Lab >>>>>>>> University of Michigan >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. 
>>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Nicholas Arnold-Medabalimi >>>>>> >>>>>> Ph.D. Candidate >>>>>> Computational Aeroscience Lab >>>>>> University of Michigan >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Jan 6 11:32:31 2023 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 6 Jan 2023 12:32:31 -0500 Subject: [petsc-users] Definition of threshold(s) in gamg and boomerAMG In-Reply-To: References: Message-ID: These thresholds are for completely different coursening algorithms. GAMG just drops edges below the threshold and hypre (classical AMG) does something completely different. Mark On Fri, Jan 6, 2023 at 10:43 AM Edoardo Centofanti < edoardo.centofanti01 at universitadipavia.it> wrote: > Hi PETSc users, > > I was looking for the exact definitions of the threshold parameter > (-pc_gamg_threshold) for gamg and of the strong threshold > (-pc_hypre_boomeramg_strong_threshold) for Hypre BoomerAMG. My curiosity > comes from the fact that the suggested parameters (apparently acting on the > same aspect of the algorithms) differ of around one order of magnitude (if > I remember right from the PETSc manual, the gamg threshold for 3D problems > is around 0.05, while for boomerAMG ranges from 0.25 to 0.5). > > Thank you in advance, > Edoardo > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacob.fai at gmail.com Fri Jan 6 12:23:04 2023 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Fri, 6 Jan 2023 13:23:04 -0500 Subject: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported In-Reply-To: References: Message-ID: <5284CBD3-099E-4A05-83B0-FCCEFE91A817@gmail.com> An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 6 14:50:56 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 6 Jan 2023 15:50:56 -0500 Subject: [petsc-users] Vec Ownership ranges with Global Section Offsets In-Reply-To: References: Message-ID: On Fri, Jan 6, 2023 at 12:18 PM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > I am doing that as well (although not for vertex dofs). And I had it > working quite well for purely cell-associated DOFs. 
But I realized later > that I also wanted to transmit some DOFs associated with faces so I suspect > I'm messing something up there. > > Something we discussed back on 12/26 (email subject:Getting a vector from > a DM to output VTK) was associating a vector generated this way with the > DM so it can be more easily visualized using vtk files is > > VecSetOperation(state_dist, VECOP_VIEW, (void(*)(void))VecView_Plex); > > If there is an easy way to get this working in Fortran I would very much > appreciate it as it was very helpful when I was debugging in C. > This is a workaround for Fortran (the problem is that Fortran cannot easily refer to the function pointers). You can 1) Set the new section into the distributed DM 2) Make a new global or local vector 3) Call VecCopy() to move the data The new vector will have the right viewer. Thanks, Matt > Thanks > Nicholas > > On Fri, Jan 6, 2023 at 11:37 AM Matthew Knepley wrote: > >> On Fri, Jan 6, 2023 at 11:32 AM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Hi Matt >>> >>> This was generated using the DMPlexDistributeField which we discussed a >>> while back. Everything seemed to be working fine when I only had cells dofs >>> but I recently added face dofs, which seems to have caused some issues. >>> Whats weird is that I'm feeding the same distribution SF and inputting a >>> vector and section that are consistent to the DMPlexDistributeField so I'd >>> expect the vectors and section output to be consistent. I'll take a closer >>> look at that. >>> >> >> The way I use DistributeField(), for example to distribute coordinates, >> is that I give it the migrationSF, the local coordinate section, and the >> local coordinate vector. It gives me back the new local coordinate section >> (which I set into the distributed DM), and the local coordinate vector. It >> seems like you are doing something else. Maybe the documentation is >> misleading. >> >> Thanks, >> >> Matt >> >> >>> Thanks >>> Nicholas >>> >>> On Fri, Jan 6, 2023 at 10:59 AM Matthew Knepley >>> wrote: >>> >>>> On Fri, Jan 6, 2023 at 10:41 AM Nicholas Arnold-Medabalimi < >>>> narnoldm at umich.edu> wrote: >>>> >>>>> Hi Matt >>>>> >>>>> I appreciate the help. The section view is quite extensive because >>>>> each cell has 55 dofs located at the cells and on certain faces. I've >>>>> appended the first of these which corresponds with the output in the first >>>>> email, to save space. The following 54 are exactly the same but offset >>>>> incremented by 1. (or negative 1 for negative offsets) >>>>> >>>> >>>> Okay, from the output it is clear that this vector does not match your >>>> global section. Did you get stateVec by calling DMCreateGlobalVector()? >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks for your time >>>>> Nicholas >>>>> >>>>> On Fri, Jan 6, 2023 at 10:23 AM Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Fri, Jan 6, 2023 at 10:10 AM Nicholas Arnold-Medabalimi < >>>>>> narnoldm at umich.edu> wrote: >>>>>> >>>>>>> Hi Matt >>>>>>> >>>>>>> I apologize for any lack of clarity in the initial email. >>>>>>> >>>>>>> looking at the initial output on rank 1 >>>>>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, offset-oStart >>>>>>> cell 0 offset 2475 oStart 2640 -165 >>>>>>> cell 1 offset 2530 oStart 2640 -110 >>>>>>> cell 2 offset 2585 oStart 2640 -55 >>>>>>> cell 3 offset 2640 oStart 2640 0 >>>>>>> ..... 
>>>>>>> cell 15 offset -771 oStart 2640 -3411 >>>>>>> >>>>>>> >>>>>>> cell 15 provides a negative offset because it is the overlap cell >>>>>>> (that is unowned) >>>>>>> The remained of cells are all owned. However, the first 3 cells >>>>>>> (0,1,2) return an offset that is less than the starting ownership range. I >>>>>>> would expect cell 0 to start at offset 2640 at minimum. >>>>>>> >>>>>> >>>>>> Send the output for this section >>>>>> >>>>>> call PetscSectionView(section, PETSC_VIEWER_STDOUT_WORLD); >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Sincerely >>>>>>> Nicholas >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Jan 6, 2023 at 10:05 AM Matthew Knepley >>>>>>> wrote: >>>>>>> >>>>>>>> On Fri, Jan 6, 2023 at 9:56 AM Nicholas Arnold-Medabalimi < >>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>> >>>>>>>>> Apologies. If it helps, there is one cell of overlap in this small >>>>>>>>> test case for a 2D mesh that is 1 cell in height and a number of cells in >>>>>>>>> length. . >>>>>>>>> >>>>>>>>> process 0 >>>>>>>>> Petsc VecGetLocalSize 2750 >>>>>>>>> size(stateVecV) 2750 >>>>>>>>> >>>>>>>>> process 1 >>>>>>>>> Petsc VecGetLocalSize 2640 >>>>>>>>> size(stateVecV) 2640 >>>>>>>>> >>>>>>>> >>>>>>>> The offsets shown below are well-within these sizes. I do not >>>>>>>> understand the problem. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> On Fri, Jan 6, 2023 at 9:51 AM Matthew Knepley >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> On Fri, Jan 6, 2023 at 9:37 AM Nicholas Arnold-Medabalimi < >>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Matt >>>>>>>>>>> >>>>>>>>>>> I made a typo on the line statVecV(offset) = >>>>>>>>>>> in my example, I agree. (I wrote that offhand since the actual assignment >>>>>>>>>>> is much larger) I should be statVecV(offset+1) = so I'm >>>>>>>>>>> confident it's not a 1 0 indexing thing. >>>>>>>>>>> >>>>>>>>>>> My question is more related to what is happening in the offsets. >>>>>>>>>>> c0 and c1 are pulled using DMplexgetheight stratum, so they are >>>>>>>>>>> zero-indexed (which is why I loop from c0 to (c1-1)). >>>>>>>>>>> >>>>>>>>>>> For the size inquiries. on processor 0 >>>>>>>>>>> Petsc VecGetSize(stateVec) 5390 >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I need to see VecGetLocalSize() >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> size(stateVecV) 2640 >>>>>>>>>>> >>>>>>>>>>> on processor 1 >>>>>>>>>>> Petsc VecGetSize 5390 >>>>>>>>>>> size(stateVecV) 2750 >>>>>>>>>>> >>>>>>>>>>> It's quite weird to me that processor one can have a positive >>>>>>>>>>> offset that is less than its starting ownership index (in the initial email >>>>>>>>>>> output). >>>>>>>>>>> >>>>>>>>>>> Thanks for the assistance >>>>>>>>>>> Nicholas >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Jan 6, 2023 at 9:20 AM Matthew Knepley < >>>>>>>>>>> knepley at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> On Fri, Jan 6, 2023 at 2:28 AM Nicholas Arnold-Medabalimi < >>>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Petsc Users, >>>>>>>>>>>>> >>>>>>>>>>>>> I'm working with a dmplex system with a subsampled mesh >>>>>>>>>>>>> distributed with an overlap of 1. >>>>>>>>>>>>> >>>>>>>>>>>>> I'm encountering unusual situations when using >>>>>>>>>>>>> VecGetOwnershipRange to adjust the offset received from a global section. 
>>>>>>>>>>>>> The logic of the following code is first to get the offset needed to index >>>>>>>>>>>>> a global vector while still being able to check if it is an overlapped cell >>>>>>>>>>>>> and skip if needed while counting the owned cells. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> call DMGetGlobalSection(dmplex,section,ierr) >>>>>>>>>>>>> call VecGetArrayF90(stateVec,stateVecV,ierr) >>>>>>>>>>>>> call VecGetOwnershipRange(stateVec,oStart,oEnd,ierr) >>>>>>>>>>>>> do i = c0, (c1-1) >>>>>>>>>>>>> >>>>>>>>>>>>> call PetscSectionGetOffset(section,i,offset,ierr) >>>>>>>>>>>>> write(*,*) "cell",i,"offset",offset,'oStart',oStart, >>>>>>>>>>>>> offset-oStart >>>>>>>>>>>>> >>>>>>>>>>>>> if(offset<0) then >>>>>>>>>>>>> cycle >>>>>>>>>>>>> endif >>>>>>>>>>>>> offset=offset-oStart >>>>>>>>>>>>> plexcells=plexcells+1 >>>>>>>>>>>>> stateVecV(offset)= enddo >>>>>>>>>>>>> >>>>>>>>>>>>> I'm noticing some very weird results that I've appended below. >>>>>>>>>>>>> The GetOffset documentation notes that a negative offset indicates an >>>>>>>>>>>>> unowned point (which I use to cycle). However, the offset subtraction with >>>>>>>>>>>>> oStart will yield an illegal index for the Vector access. I see that on the >>>>>>>>>>>>> documentation for GetOwnershipRange, it notes that this may be >>>>>>>>>>>>> "ill-defined" but I wanted to see if this is type of ill-defined I can >>>>>>>>>>>>> expect or there is just something terribly wrong with my PetscSection.(both >>>>>>>>>>>>> the Vec and Section were produced from DMPlexDistributeField so should by >>>>>>>>>>>>> definition have synchronized section information) I was wondering if there >>>>>>>>>>>>> is a possible output and/or the best way to index the vector. I'm thinking >>>>>>>>>>>>> of subtracting the offset of cell 0 perhaps? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Can you show your vector sizes? Are you sure it is not the fact >>>>>>>>>>>> that F90 arrays use 1-based indices, but these are 0-based offsets? 
>>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Matt >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> on rank 0 >>>>>>>>>>>>> >>>>>>>>>>>>> cell 0 offset 0 oStart 0 >>>>>>>>>>>>> 0 >>>>>>>>>>>>> cell 1 offset 55 oStart 0 >>>>>>>>>>>>> 55 >>>>>>>>>>>>> cell 2 offset 110 oStart 0 >>>>>>>>>>>>> 110 >>>>>>>>>>>>> cell 3 offset 165 oStart 0 >>>>>>>>>>>>> 165 >>>>>>>>>>>>> cell 4 offset 220 oStart 0 >>>>>>>>>>>>> 220 >>>>>>>>>>>>> cell 5 offset 275 oStart 0 >>>>>>>>>>>>> 275 >>>>>>>>>>>>> cell 6 offset 330 oStart 0 >>>>>>>>>>>>> 330 >>>>>>>>>>>>> cell 7 offset 385 oStart 0 >>>>>>>>>>>>> 385 >>>>>>>>>>>>> cell 8 offset 440 oStart 0 >>>>>>>>>>>>> 440 >>>>>>>>>>>>> cell 9 offset 495 oStart 0 >>>>>>>>>>>>> 495 >>>>>>>>>>>>> cell 10 offset 550 oStart 0 >>>>>>>>>>>>> 550 >>>>>>>>>>>>> cell 11 offset 605 oStart 0 >>>>>>>>>>>>> 605 >>>>>>>>>>>>> cell 12 offset 660 oStart 0 >>>>>>>>>>>>> 660 >>>>>>>>>>>>> cell 13 offset 715 oStart 0 >>>>>>>>>>>>> 715 >>>>>>>>>>>>> >>>>>>>>>>>>> and on rank one >>>>>>>>>>>>> cell 0 offset 2475 oStart 2640 >>>>>>>>>>>>> -165 >>>>>>>>>>>>> cell 1 offset 2530 oStart 2640 >>>>>>>>>>>>> -110 >>>>>>>>>>>>> cell 2 offset 2585 oStart 2640 >>>>>>>>>>>>> -55 >>>>>>>>>>>>> cell 3 offset 2640 oStart 2640 >>>>>>>>>>>>> 0 >>>>>>>>>>>>> cell 4 offset 2695 oStart 2640 >>>>>>>>>>>>> 55 >>>>>>>>>>>>> cell 5 offset 2750 oStart 2640 >>>>>>>>>>>>> 110 >>>>>>>>>>>>> cell 6 offset 2805 oStart 2640 >>>>>>>>>>>>> 165 >>>>>>>>>>>>> cell 7 offset 2860 oStart 2640 >>>>>>>>>>>>> 220 >>>>>>>>>>>>> cell 8 offset 2915 oStart 2640 >>>>>>>>>>>>> 275 >>>>>>>>>>>>> cell 9 offset 2970 oStart 2640 >>>>>>>>>>>>> 330 >>>>>>>>>>>>> cell 10 offset 3025 oStart 2640 >>>>>>>>>>>>> 385 >>>>>>>>>>>>> cell 11 offset 3080 oStart 2640 >>>>>>>>>>>>> 440 >>>>>>>>>>>>> cell 12 offset 3135 oStart 2640 >>>>>>>>>>>>> 495 >>>>>>>>>>>>> cell 13 offset 3190 oStart 2640 >>>>>>>>>>>>> 550 >>>>>>>>>>>>> cell 14 offset 3245 oStart 2640 >>>>>>>>>>>>> 605 >>>>>>>>>>>>> cell 15 offset -771 oStart 2640 >>>>>>>>>>>>> -3411 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely >>>>>>>>>>>>> Nicholas >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>>> >>>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>>> University of Michigan >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> What most experimenters take for granted before they begin >>>>>>>>>>>> their experiments is infinitely more interesting than any results to which >>>>>>>>>>>> their experiments lead. >>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>> >>>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>> >>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>> University of Michigan >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>> experiments lead. >>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>> >>>>>>>>> Ph.D. 
Candidate >>>>>>>>> Computational Aeroscience Lab >>>>>>>>> University of Michigan >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nicholas Arnold-Medabalimi >>>>>>> >>>>>>> Ph.D. Candidate >>>>>>> Computational Aeroscience Lab >>>>>>> University of Michigan >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Nicholas Arnold-Medabalimi >>>>> >>>>> Ph.D. Candidate >>>>> Computational Aeroscience Lab >>>>> University of Michigan >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Fri Jan 6 16:14:38 2023 From: danyang.su at gmail.com (Danyang Su) Date: Fri, 06 Jan 2023 14:14:38 -0800 Subject: [petsc-users] Error running configure on HDF5 in PETSc-3.18.3 In-Reply-To: <11AEF1F7-3C63-48D8-A7C6-7CA449668575@joliv.et> References: <6209E74F-0437-4719-B382-49B269AE2FE6@gmail.com> <5A5A77E9-2AF3-4286-A518-9E45A548C2B8@gmail.com> <11AEF1F7-3C63-48D8-A7C6-7CA449668575@joliv.et> Message-ID: Hi Pierre, I have tried to exclude Conda related environment variables but it does not work. Instead, if I include ?--download-hdf5=yes? but exclude ?--with-hdf5-fortran-bindings? in the configuration, PETSc can be configured and installed without problem, even with Conda related environment activated. However, since my code requires fortran interface to HDF5, I do need ?--with-hdf5-fortran-bindings?, otherwise, my code cannot be compiled. Any other suggestions? Thanks, Danyang From: Pierre Jolivet Date: Friday, January 6, 2023 at 7:59 AM To: Danyang Su Cc: Subject: Re: [petsc-users] Error running configure on HDF5 in PETSc-3.18.3 On 6 Jan 2023, at 4:49 PM, Danyang Su wrote: Hi All, I get ?Error running configure on HDF5? in PETSc-3.18.3 on MacOS, but no problem on Ubuntu. Attached is the configuration log file. 
./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes --with-debugging=0 --download-cmake --with-hdf5-fortran-bindings Any idea on this? Could you try to reconfigure in a shell without conda being activated? You have PATH=/Users/danyangsu/Soft/Anaconda3/bin:/Users/danyangsu/Soft/Anaconda3/condabin:[?] which typically results in a broken configuration. Thanks, Pierre Thanks, Danyang -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Jan 6 16:21:33 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 6 Jan 2023 17:21:33 -0500 Subject: [petsc-users] Error running configure on HDF5 in PETSc-3.18.3 In-Reply-To: References: <6209E74F-0437-4719-B382-49B269AE2FE6@gmail.com> <5A5A77E9-2AF3-4286-A518-9E45A548C2B8@gmail.com> <11AEF1F7-3C63-48D8-A7C6-7CA449668575@joliv.et> Message-ID: Please email your latest configure.log (with no Conda stuff) to petsc-maint at mcs.anl.gov The configuration of HDF5 (done by HDF5) is objecting to some particular aspect of your current Fortran compiler, we need to figure out the exact objection. Barry > On Jan 6, 2023, at 5:14 PM, Danyang Su wrote: > > Hi Pierre, > > I have tried to exclude Conda related environment variables but it does not work. Instead, if I include ?--download-hdf5=yes? but exclude ?--with-hdf5-fortran-bindings? in the configuration, PETSc can be configured and installed without problem, even with Conda related environment activated. However, since my code requires fortran interface to HDF5, I do need ?--with-hdf5-fortran-bindings?, otherwise, my code cannot be compiled. > > Any other suggestions? > > Thanks, > > Danyang > > From: Pierre Jolivet > > Date: Friday, January 6, 2023 at 7:59 AM > To: Danyang Su > > Cc: > > Subject: Re: [petsc-users] Error running configure on HDF5 in PETSc-3.18.3 > > > > >> On 6 Jan 2023, at 4:49 PM, Danyang Su > wrote: >> >> Hi All, >> >> I get ?Error running configure on HDF5? in PETSc-3.18.3 on MacOS, but no problem on Ubuntu. Attached is the configuration log file. >> >> ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes --with-debugging=0 --download-cmake --with-hdf5-fortran-bindings >> >> Any idea on this? > > Could you try to reconfigure in a shell without conda being activated? > You have PATH=/Users/danyangsu/Soft/Anaconda3/bin:/Users/danyangsu/Soft/Anaconda3/condabin:[?] which typically results in a broken configuration. > > Thanks, > Pierre > > >> Thanks, >> >> Danyang -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Fri Jan 6 16:23:56 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 6 Jan 2023 16:23:56 -0600 (CST) Subject: [petsc-users] Error running configure on HDF5 in PETSc-3.18.3 In-Reply-To: References: <6209E74F-0437-4719-B382-49B269AE2FE6@gmail.com> <5A5A77E9-2AF3-4286-A518-9E45A548C2B8@gmail.com> <11AEF1F7-3C63-48D8-A7C6-7CA449668575@joliv.et> Message-ID: <82e62a5e-2a30-35be-cdc9-be4fbcff3489@mcs.anl.gov> Likely your installed gfortran is incompatible with hdf5 >>>> Executing: gfortran --version stdout: GNU Fortran (GCC) 8.2.0 <<<< We generally use brew gfortran - and that works with hdf5 aswell balay at ypro ~ % gfortran --version GNU Fortran (Homebrew GCC 11.2.0_1) 11.2.0 Satish On Fri, 6 Jan 2023, Danyang Su wrote: > Hi Pierre, > > > > I have tried to exclude Conda related environment variables but it does not work. Instead, if I include ?--download-hdf5=yes? but exclude ?--with-hdf5-fortran-bindings? in the configuration, PETSc can be configured and installed without problem, even with Conda related environment activated. However, since my code requires fortran interface to HDF5, I do need ?--with-hdf5-fortran-bindings?, otherwise, my code cannot be compiled. > > > > Any other suggestions? > > > > Thanks, > > > > Danyang > > > > From: Pierre Jolivet > Date: Friday, January 6, 2023 at 7:59 AM > To: Danyang Su > Cc: > Subject: Re: [petsc-users] Error running configure on HDF5 in PETSc-3.18.3 > > > > > > > > On 6 Jan 2023, at 4:49 PM, Danyang Su wrote: > > > > Hi All, > > > > I get ?Error running configure on HDF5? in PETSc-3.18.3 on MacOS, but no problem on Ubuntu. Attached is the configuration log file. > > > > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes --with-debugging=0 --download-cmake --with-hdf5-fortran-bindings > > > > Any idea on this? > > > > Could you try to reconfigure in a shell without conda being activated? > > You have PATH=/Users/danyangsu/Soft/Anaconda3/bin:/Users/danyangsu/Soft/Anaconda3/condabin:[?] which typically results in a broken configuration. > > > > Thanks, > > Pierre > > > > Thanks, > > > > Danyang > > > > > From venugovh at mail.uc.edu Fri Jan 6 17:21:53 2023 From: venugovh at mail.uc.edu (Venugopal, Vysakh (venugovh)) Date: Fri, 6 Jan 2023 23:21:53 +0000 Subject: [petsc-users] Getting global indices of vector distributed among different processes. In-Reply-To: References: Message-ID: Thank you, Matthew! From: Matthew Knepley Sent: Wednesday, January 4, 2023 10:52 AM To: Venugopal, Vysakh (venugovh) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Getting global indices of vector distributed among different processes. External Email: Use Caution On Wed, Jan 4, 2023 at 10:48 AM Venugopal, Vysakh (venugovh) via petsc-users > wrote: Hello, Is there a way to get the global indices from a vector created from DMCreateGlobalVector? Example: If global vector V (of size 10) has indices {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and they are divided into 2 processes. Is there a way to get information such as (process 1: {0,1,2,3,4}, process 2: {5,6,7,8,9})? https://petsc.org/main/docs/manualpages/Vec/VecGetOwnershipRange/ Thanks, Matt The reason I need this information is that I need to query the values of a different vector Q of size 10 and place those values in V. 
Example: Q(1) --- V(1) @ process 1, Q(7) - V(7) @ process 2, etc.. If there are smarter ways to do this, I am happy to pursue that. Thank you, Vysakh V. --- Vysakh Venugopal Ph.D. Candidate Department of Mechanical Engineering University of Cincinnati, Cincinnati, OH 45221-0072 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From venugovh at mail.uc.edu Fri Jan 6 17:22:40 2023 From: venugovh at mail.uc.edu (Venugopal, Vysakh (venugovh)) Date: Fri, 6 Jan 2023 23:22:40 +0000 Subject: [petsc-users] Getting correct local size using VecScatterCreateToAll Message-ID: Hello, I have created a global vector V using DMCreateGlobalVector of size m. For n processes, the local size of V is m/n. Subsequently, I am using VecScatterCreateToAll to get a sequential copy of the V, let's call it V_seq of local size m. It passes through a function and outputs the vector V_seq_hat (of local size m). Is there a way for me to substitute the values of V_seq_hat (at the correct indices) to the original V (with local size m/n)? When I use VecScatterCreateToAll (with SCATTER_REVERSE), I am getting a vector V with new values from V_seq_hat but with local size m. This is causing issues in the rest of my code where the size of V needs to be m/n. Any help would be really appreciated! Thanks, --- Vysakh Venugopal Ph.D. Candidate Department of Mechanical Engineering University of Cincinnati, Cincinnati, OH 45221-0072 -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Fri Jan 6 17:31:14 2023 From: danyang.su at gmail.com (Danyang Su) Date: Fri, 06 Jan 2023 15:31:14 -0800 Subject: [petsc-users] Error running configure on HDF5 in PETSc-3.18.3 In-Reply-To: <82e62a5e-2a30-35be-cdc9-be4fbcff3489@mcs.anl.gov> References: <6209E74F-0437-4719-B382-49B269AE2FE6@gmail.com> <5A5A77E9-2AF3-4286-A518-9E45A548C2B8@gmail.com> <11AEF1F7-3C63-48D8-A7C6-7CA449668575@joliv.et> <82e62a5e-2a30-35be-cdc9-be4fbcff3489@mcs.anl.gov> Message-ID: <1A23D02E-6F85-465D-B02E-24AA72252206@gmail.com> Hi All, Problem is resolved by Homebrew Gfortran. I use GNU Fortran (GCC) 8.2.0 before. Conda does not cause the problem. Thanks, Danyang ?On 2023-01-06, 2:24 PM, "Satish Balay" > wrote: Likely your installed gfortran is incompatible with hdf5 >>>> Executing: gfortran --version stdout: GNU Fortran (GCC) 8.2.0 <<<< We generally use brew gfortran - and that works with hdf5 aswell balay at ypro ~ % gfortran --version GNU Fortran (Homebrew GCC 11.2.0_1) 11.2.0 Satish On Fri, 6 Jan 2023, Danyang Su wrote: > Hi Pierre, > > > > I have tried to exclude Conda related environment variables but it does not work. Instead, if I include ?--download-hdf5=yes? but exclude ?--with-hdf5-fortran-bindings? in the configuration, PETSc can be configured and installed without problem, even with Conda related environment activated. However, since my code requires fortran interface to HDF5, I do need ?--with-hdf5-fortran-bindings?, otherwise, my code cannot be compiled. > > > > Any other suggestions? 
> > > > Thanks, > > > > Danyang > > > > From: Pierre Jolivet > > Date: Friday, January 6, 2023 at 7:59 AM > To: Danyang Su > > Cc: > > Subject: Re: [petsc-users] Error running configure on HDF5 in PETSc-3.18.3 > > > > > > > > On 6 Jan 2023, at 4:49 PM, Danyang Su > wrote: > > > > Hi All, > > > > I get ?Error running configure on HDF5? in PETSc-3.18.3 on MacOS, but no problem on Ubuntu. Attached is the configuration log file. > > > > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes --with-debugging=0 --download-cmake --with-hdf5-fortran-bindings > > > > Any idea on this? > > > > Could you try to reconfigure in a shell without conda being activated? > > You have PATH=/Users/danyangsu/Soft/Anaconda3/bin:/Users/danyangsu/Soft/Anaconda3/condabin:[?] which typically results in a broken configuration. > > > > Thanks, > > Pierre > > > > Thanks, > > > > Danyang > > > > > From knepley at gmail.com Fri Jan 6 18:21:13 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 6 Jan 2023 19:21:13 -0500 Subject: [petsc-users] Getting correct local size using VecScatterCreateToAll In-Reply-To: References: Message-ID: On Fri, Jan 6, 2023 at 6:22 PM Venugopal, Vysakh (venugovh) via petsc-users wrote: > Hello, > > > > I have created a global vector V using DMCreateGlobalVector of size m. For > n processes, the local size of V is m/n. > > > > Subsequently, I am using VecScatterCreateToAll to get a sequential copy of > the V, let's call it V_seq of local size m. It passes through a function > and outputs the vector V_seq_hat (of local size m). > > > > Is there a way for me to substitute the values of V_seq_hat (at the > correct indices) to the original V (with local size m/n)? > > > > When I use VecScatterCreateToAll (with SCATTER_REVERSE), I am getting a > vector V with new values from V_seq_hat but with local size m. This is > causing issues in the rest of my code where the size of V needs to be m/n. > You do not create a new scatter. You use the same scatter you got in the first place, but with SCATTER_REVERSE. Thanks, Matt > Any help would be really appreciated! > > > > Thanks, > > > > --- > > Vysakh Venugopal > > Ph.D. Candidate > > Department of Mechanical Engineering > > University of Cincinnati, Cincinnati, OH 45221-0072 > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Fri Jan 6 19:35:33 2023 From: mlohry at gmail.com (Mark Lohry) Date: Fri, 6 Jan 2023 20:35:33 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: <7130D391-34D2-4459-99D9-E09F0D20A987@petsc.dev> References: <87eds8hfxi.fsf@jedbrown.org> <141D5FC1-1BF9-4809-B67B-0726759E273A@petsc.dev> <7130D391-34D2-4459-99D9-E09F0D20A987@petsc.dev> Message-ID: Well, I think it's a moderately crazy idea unless it's less painful to implement than I'm thinking. Is there a use case for a mixed device system where one petsc executable might be addressing both a HIP and CUDA device beyond some frankenstein test system somebody cooked up? In all my code I implicitly assume I have either have one host with one device or one host with zero devices. 
I guess you can support these weird scenarios, but why? Life is hard enough supporting one device compiler with one host compiler. Many thanks Junchao -- with combinations of SetPreallocation I was able to grab allocated pointers out of petsc. Now I have all the jacobian construction on device with no copies. On Fri, Jan 6, 2023 at 12:27 AM Barry Smith wrote: > > So Jed's "everyone" now consists of "no one" and Jed can stop > complaining that "everyone" thinks it is a bad idea. > > > > On Jan 5, 2023, at 11:50 PM, Junchao Zhang > wrote: > > > > > On Thu, Jan 5, 2023 at 10:32 PM Barry Smith wrote: > >> >> >> > On Jan 5, 2023, at 3:42 PM, Jed Brown wrote: >> > >> > Mark Adams writes: >> > >> >> Support of HIP and CUDA hardware together would be crazy, >> > >> > I don't think it's remotely crazy. libCEED supports both together and >> it's very convenient when testing on a development machine that has one of >> each brand GPU and simplifies binary distribution for us and every package >> that uses us. Every day I wish PETSc could build with both simultaneously, >> but everyone tells me it's silly. >> >> Not everyone at all; just a subset of everyone. Junchao is really the >> hold-out :-) >> > I am not, instead I think we should try (I fully agree it can ease binary > distribution). But satish needs to install such a machine first :) > There are issues out of our control if we want to mix GPUs in execution. > For example, how to do VexAXPY on a cuda vector and a hip vector? Shall we > do it on the host? Also, there are no gpu-aware MPI implementations > supporting messages between cuda memory and hip memory. > >> >> I just don't care about "binary packages" :-); I think they are an >> archaic and bad way of thinking about code distribution (but yes the >> alternatives need lots of work to make them flawless, but I think that is >> where the work should go in the packaging world.) >> >> I go further and think one should be able to automatically use a CUDA >> vector on a HIP device as well, it is not hard in theory but requires >> thinking about how we handle classes and subclasses a little to make it >> straightforward; or perhaps Jacob has fixed that also? > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Jan 6 20:44:39 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 6 Jan 2023 20:44:39 -0600 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: <87eds8hfxi.fsf@jedbrown.org> <141D5FC1-1BF9-4809-B67B-0726759E273A@petsc.dev> <7130D391-34D2-4459-99D9-E09F0D20A987@petsc.dev> Message-ID: On Fri, Jan 6, 2023 at 7:35 PM Mark Lohry wrote: > Well, I think it's a moderately crazy idea unless it's less painful to > implement than I'm thinking. Is there a use case for a mixed device system > where one petsc executable might be addressing both a HIP and CUDA device > beyond some frankenstein test system somebody cooked up? In all my code I > implicitly assume I have either have one host with one device or one host > with zero devices. I guess you can support these weird scenarios, but why? > Life is hard enough supporting one device compiler with one host compiler. > > Many thanks Junchao -- with combinations of SetPreallocation I was able to > grab allocated pointers out of petsc. Now I have all the jacobian > construction on device with no copies. > Hi, Mark, could you say a few words about how you assemble matrices on GPUs? 
We ported MatSetValues like routines to GPUs but did not continue this approach since we have to resolve data races between GPU threads. > > On Fri, Jan 6, 2023 at 12:27 AM Barry Smith wrote: > >> >> So Jed's "everyone" now consists of "no one" and Jed can stop >> complaining that "everyone" thinks it is a bad idea. >> >> >> >> On Jan 5, 2023, at 11:50 PM, Junchao Zhang >> wrote: >> >> >> >> >> On Thu, Jan 5, 2023 at 10:32 PM Barry Smith wrote: >> >>> >>> >>> > On Jan 5, 2023, at 3:42 PM, Jed Brown wrote: >>> > >>> > Mark Adams writes: >>> > >>> >> Support of HIP and CUDA hardware together would be crazy, >>> > >>> > I don't think it's remotely crazy. libCEED supports both together and >>> it's very convenient when testing on a development machine that has one of >>> each brand GPU and simplifies binary distribution for us and every package >>> that uses us. Every day I wish PETSc could build with both simultaneously, >>> but everyone tells me it's silly. >>> >>> Not everyone at all; just a subset of everyone. Junchao is really the >>> hold-out :-) >>> >> I am not, instead I think we should try (I fully agree it can ease binary >> distribution). But satish needs to install such a machine first :) >> There are issues out of our control if we want to mix GPUs in execution. >> For example, how to do VexAXPY on a cuda vector and a hip vector? Shall we >> do it on the host? Also, there are no gpu-aware MPI implementations >> supporting messages between cuda memory and hip memory. >> >>> >>> I just don't care about "binary packages" :-); I think they are an >>> archaic and bad way of thinking about code distribution (but yes the >>> alternatives need lots of work to make them flawless, but I think that is >>> where the work should go in the packaging world.) >>> >>> I go further and think one should be able to automatically use a CUDA >>> vector on a HIP device as well, it is not hard in theory but requires >>> thinking about how we handle classes and subclasses a little to make it >>> straightforward; or perhaps Jacob has fixed that also? >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sat Jan 7 00:22:30 2023 From: jed at jedbrown.org (Jed Brown) Date: Fri, 06 Jan 2023 23:22:30 -0700 Subject: [petsc-users] How to install in /usr/lib64 instead of /usr/lib? In-Reply-To: <4ab0b0a8-27de-a014-4e9e-35e4539b7c78@mcs.anl.gov> References: <678527414.8071249.1672951835023.ref@mail.yahoo.com> <678527414.8071249.1672951835023@mail.yahoo.com> <4ab0b0a8-27de-a014-4e9e-35e4539b7c78@mcs.anl.gov> Message-ID: <87zgauq2xl.fsf@jedbrown.org> The make convention would be to respond to `libdir`, which is probably the simplest if we can defer that choice until install time. It probably needs to be known at build time, thus should go in configure. https://www.gnu.org/software/make/manual/html_node/Directory-Variables.html Satish Balay via petsc-users writes: > For now - perhaps the following patch... 
> > Satish > > --- > > diff --git a/config/install.py b/config/install.py > index 017bb736542..00f857f939e 100755 > --- a/config/install.py > +++ b/config/install.py > @@ -76,9 +76,9 @@ class Installer(script.Script): > self.archBinDir = os.path.join(self.rootDir, self.arch, 'bin') > self.archLibDir = os.path.join(self.rootDir, self.arch, 'lib') > self.destIncludeDir = os.path.join(self.destDir, 'include') > - self.destConfDir = os.path.join(self.destDir, 'lib','petsc','conf') > - self.destLibDir = os.path.join(self.destDir, 'lib') > - self.destBinDir = os.path.join(self.destDir, 'lib','petsc','bin') > + self.destConfDir = os.path.join(self.destDir, 'lib64','petsc','conf') > + self.destLibDir = os.path.join(self.destDir, 'lib64') > + self.destBinDir = os.path.join(self.destDir, 'lib64','petsc','bin') > self.installIncludeDir = os.path.join(self.installDir, 'include') > self.installBinDir = os.path.join(self.installDir, 'lib','petsc','bin') > self.rootShareDir = os.path.join(self.rootDir, 'share') > > On Thu, 5 Jan 2023, Fellype via petsc-users wrote: > >> Hi, >> I'm building petsc from sources on a 64-bit Slackware Linux and I would like to know how to install the libraries in /usr/lib64 instead of /usr/lib. Is it possible? I've not found an option like --libdir=DIR to pass to ./configure. >> >> Regards, >> Fellype From mlohry at gmail.com Sat Jan 7 06:15:20 2023 From: mlohry at gmail.com (Mark Lohry) Date: Sat, 7 Jan 2023 07:15:20 -0500 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: <87eds8hfxi.fsf@jedbrown.org> <141D5FC1-1BF9-4809-B67B-0726759E273A@petsc.dev> <7130D391-34D2-4459-99D9-E09F0D20A987@petsc.dev> Message-ID: I've worked on a few different codes doing matrix assembly on GPU independently of petsc. In all instances to plug into petsc all I need are the device CSR pointers and some guarantee they don't move around (my first try without setpreallocation on CPU I saw the value array pointer move after the first solve). It would also be nice to have a guarantee there aren't any unnecessary copies since memory constraints are always a concern. Here I call MatCreateSeqAIJCUSPARSE MatSeqAIJSetPreallocationCSR (filled using a preexisting CSR on host using the correct index arrays and zeros for values) MatSeqAIJGetCSRAndMemType (grab the allocated device CSR pointers and use those directly) Then in the Jacobian evaluation routine I fill that CSR directly with no calls to MatSetValues, just MatAssemblyBegin(J,MAT_FINAL_ASSEMBLY); MatAssemblyEnd(J,MAT_FINAL_ASSEMBLY); after to put it in the correct state. In this code to fill the CSR coefficients, each GPU thread gets one row and fills it. No race conditions to contend with. Technically I'm duplicating some computations (a given dof could fill its own row and column) but this is much faster than the linear solver anyway. Other mesh based codes did GPU assembly using either coloring or mutexes, but still just need the CSR value array to fill. On Fri, Jan 6, 2023, 9:44 PM Junchao Zhang wrote: > > > > On Fri, Jan 6, 2023 at 7:35 PM Mark Lohry wrote: > >> Well, I think it's a moderately crazy idea unless it's less painful to >> implement than I'm thinking. Is there a use case for a mixed device system >> where one petsc executable might be addressing both a HIP and CUDA device >> beyond some frankenstein test system somebody cooked up? In all my code I >> implicitly assume I have either have one host with one device or one host >> with zero devices. 
I guess you can support these weird scenarios, but why? >> Life is hard enough supporting one device compiler with one host compiler. >> >> Many thanks Junchao -- with combinations of SetPreallocation I was able >> to grab allocated pointers out of petsc. Now I have all the jacobian >> construction on device with no copies. >> > Hi, Mark, could you say a few words about how you assemble matrices on > GPUs? We ported MatSetValues like routines to GPUs but did not continue > this approach since we have to resolve data races between GPU threads. > > >> >> On Fri, Jan 6, 2023 at 12:27 AM Barry Smith wrote: >> >>> >>> So Jed's "everyone" now consists of "no one" and Jed can stop >>> complaining that "everyone" thinks it is a bad idea. >>> >>> >>> >>> On Jan 5, 2023, at 11:50 PM, Junchao Zhang >>> wrote: >>> >>> >>> >>> >>> On Thu, Jan 5, 2023 at 10:32 PM Barry Smith wrote: >>> >>>> >>>> >>>> > On Jan 5, 2023, at 3:42 PM, Jed Brown wrote: >>>> > >>>> > Mark Adams writes: >>>> > >>>> >> Support of HIP and CUDA hardware together would be crazy, >>>> > >>>> > I don't think it's remotely crazy. libCEED supports both together and >>>> it's very convenient when testing on a development machine that has one of >>>> each brand GPU and simplifies binary distribution for us and every package >>>> that uses us. Every day I wish PETSc could build with both simultaneously, >>>> but everyone tells me it's silly. >>>> >>>> Not everyone at all; just a subset of everyone. Junchao is really the >>>> hold-out :-) >>>> >>> I am not, instead I think we should try (I fully agree it can ease >>> binary distribution). But satish needs to install such a machine first :) >>> There are issues out of our control if we want to mix GPUs in >>> execution. For example, how to do VexAXPY on a cuda vector and a hip >>> vector? Shall we do it on the host? Also, there are no gpu-aware MPI >>> implementations supporting messages between cuda memory and hip memory. >>> >>>> >>>> I just don't care about "binary packages" :-); I think they are an >>>> archaic and bad way of thinking about code distribution (but yes the >>>> alternatives need lots of work to make them flawless, but I think that is >>>> where the work should go in the packaging world.) >>>> >>>> I go further and think one should be able to automatically use a >>>> CUDA vector on a HIP device as well, it is not hard in theory but requires >>>> thinking about how we handle classes and subclasses a little to make it >>>> straightforward; or perhaps Jacob has fixed that also? >>> >>> >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat Jan 7 10:16:46 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 7 Jan 2023 10:16:46 -0600 (CST) Subject: [petsc-users] Error running configure on HDF5 in PETSc-3.18.3 In-Reply-To: <1A23D02E-6F85-465D-B02E-24AA72252206@gmail.com> References: <6209E74F-0437-4719-B382-49B269AE2FE6@gmail.com> <5A5A77E9-2AF3-4286-A518-9E45A548C2B8@gmail.com> <11AEF1F7-3C63-48D8-A7C6-7CA449668575@joliv.et> <82e62a5e-2a30-35be-cdc9-be4fbcff3489@mcs.anl.gov> <1A23D02E-6F85-465D-B02E-24AA72252206@gmail.com> Message-ID: <829f1180-695f-cb24-8f46-828099c7ba47@mcs.anl.gov> Glad it worked! Thanks for the update! Satish On Fri, 6 Jan 2023, Danyang Su wrote: > Hi All, > > Problem is resolved by Homebrew Gfortran. I use GNU Fortran (GCC) 8.2.0 before. Conda does not cause the problem. 
> > Thanks, > > Danyang > > ?On 2023-01-06, 2:24 PM, "Satish Balay" > wrote: > > > Likely your installed gfortran is incompatible with hdf5 > > > >>>> > Executing: gfortran --version > stdout: > GNU Fortran (GCC) 8.2.0 > <<<< > > > We generally use brew gfortran - and that works with hdf5 aswell > > > balay at ypro ~ % gfortran --version > GNU Fortran (Homebrew GCC 11.2.0_1) 11.2.0 > > > Satish > > > On Fri, 6 Jan 2023, Danyang Su wrote: > > > > Hi Pierre, > > > > > > > > I have tried to exclude Conda related environment variables but it does not work. Instead, if I include ?--download-hdf5=yes? but exclude ?--with-hdf5-fortran-bindings? in the configuration, PETSc can be configured and installed without problem, even with Conda related environment activated. However, since my code requires fortran interface to HDF5, I do need ?--with-hdf5-fortran-bindings?, otherwise, my code cannot be compiled. > > > > > > > > Any other suggestions? > > > > > > > > Thanks, > > > > > > > > Danyang > > > > > > > > From: Pierre Jolivet > > > Date: Friday, January 6, 2023 at 7:59 AM > > To: Danyang Su > > > Cc: > > > Subject: Re: [petsc-users] Error running configure on HDF5 in PETSc-3.18.3 > > > > > > > > > > > > > > > > On 6 Jan 2023, at 4:49 PM, Danyang Su > wrote: > > > > > > > > Hi All, > > > > > > > > I get ?Error running configure on HDF5? in PETSc-3.18.3 on MacOS, but no problem on Ubuntu. Attached is the configuration log file. > > > > > > > > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes --with-debugging=0 --download-cmake --with-hdf5-fortran-bindings > > > > > > > > Any idea on this? > > > > > > > > Could you try to reconfigure in a shell without conda being activated? > > > > You have PATH=/Users/danyangsu/Soft/Anaconda3/bin:/Users/danyangsu/Soft/Anaconda3/condabin:[?] which typically results in a broken configuration. > > > > > > > > Thanks, > > > > Pierre > > > > > > > > Thanks, > > > > > > > > Danyang > > > > > > > > > > > > > > From junchao.zhang at gmail.com Sat Jan 7 10:39:23 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Sat, 7 Jan 2023 10:39:23 -0600 Subject: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse In-Reply-To: References: <87eds8hfxi.fsf@jedbrown.org> <141D5FC1-1BF9-4809-B67B-0726759E273A@petsc.dev> <7130D391-34D2-4459-99D9-E09F0D20A987@petsc.dev> Message-ID: I see. Thanks a lot. --Junchao Zhang On Sat, Jan 7, 2023 at 6:15 AM Mark Lohry wrote: > I've worked on a few different codes doing matrix assembly on GPU > independently of petsc. In all instances to plug into petsc all I need are > the device CSR pointers and some guarantee they don't move around (my first > try without setpreallocation on CPU I saw the value array pointer move > after the first solve). It would also be nice to have a guarantee there > aren't any unnecessary copies since memory constraints are always a concern. 
> > Here I call > MatCreateSeqAIJCUSPARSE > MatSeqAIJSetPreallocationCSR (filled using a preexisting CSR on host using > the correct index arrays and zeros for values) > MatSeqAIJGetCSRAndMemType (grab the allocated device CSR pointers and use > those directly) > > Then in the Jacobian evaluation routine I fill that CSR directly with no > calls to MatSetValues, just > > MatAssemblyBegin(J,MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(J,MAT_FINAL_ASSEMBLY); > > after to put it in the correct state. > > In this code to fill the CSR coefficients, each GPU thread gets one row > and fills it. No race conditions to contend with. Technically I'm > duplicating some computations (a given dof could fill its own row and > column) but this is much faster than the linear solver anyway. > > Other mesh based codes did GPU assembly using either coloring or mutexes, > but still just need the CSR value array to fill. > > > On Fri, Jan 6, 2023, 9:44 PM Junchao Zhang > wrote: > >> >> >> >> On Fri, Jan 6, 2023 at 7:35 PM Mark Lohry wrote: >> >>> Well, I think it's a moderately crazy idea unless it's less painful to >>> implement than I'm thinking. Is there a use case for a mixed device system >>> where one petsc executable might be addressing both a HIP and CUDA device >>> beyond some frankenstein test system somebody cooked up? In all my code I >>> implicitly assume I have either have one host with one device or one host >>> with zero devices. I guess you can support these weird scenarios, but why? >>> Life is hard enough supporting one device compiler with one host compiler. >>> >>> Many thanks Junchao -- with combinations of SetPreallocation I was able >>> to grab allocated pointers out of petsc. Now I have all the jacobian >>> construction on device with no copies. >>> >> Hi, Mark, could you say a few words about how you assemble matrices on >> GPUs? We ported MatSetValues like routines to GPUs but did not continue >> this approach since we have to resolve data races between GPU threads. >> >> >>> >>> On Fri, Jan 6, 2023 at 12:27 AM Barry Smith wrote: >>> >>>> >>>> So Jed's "everyone" now consists of "no one" and Jed can stop >>>> complaining that "everyone" thinks it is a bad idea. >>>> >>>> >>>> >>>> On Jan 5, 2023, at 11:50 PM, Junchao Zhang >>>> wrote: >>>> >>>> >>>> >>>> >>>> On Thu, Jan 5, 2023 at 10:32 PM Barry Smith wrote: >>>> >>>>> >>>>> >>>>> > On Jan 5, 2023, at 3:42 PM, Jed Brown wrote: >>>>> > >>>>> > Mark Adams writes: >>>>> > >>>>> >> Support of HIP and CUDA hardware together would be crazy, >>>>> > >>>>> > I don't think it's remotely crazy. libCEED supports both together >>>>> and it's very convenient when testing on a development machine that has one >>>>> of each brand GPU and simplifies binary distribution for us and every >>>>> package that uses us. Every day I wish PETSc could build with both >>>>> simultaneously, but everyone tells me it's silly. >>>>> >>>>> Not everyone at all; just a subset of everyone. Junchao is really >>>>> the hold-out :-) >>>>> >>>> I am not, instead I think we should try (I fully agree it can ease >>>> binary distribution). But satish needs to install such a machine first :) >>>> There are issues out of our control if we want to mix GPUs in >>>> execution. For example, how to do VexAXPY on a cuda vector and a hip >>>> vector? Shall we do it on the host? Also, there are no gpu-aware MPI >>>> implementations supporting messages between cuda memory and hip memory. 
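As an aside, a rough C sketch of the assembly flow Mark describes above; the routine name BuildDeviceJacobian and the host CSR arrays rowptr/colind are illustrative only, the value-filling kernel is left as a comment, and comm is expected to be a single-rank communicator such as PETSC_COMM_SELF. Only the PETSc calls named in the message are used:

#include <petscmat.h>

PetscErrorCode BuildDeviceJacobian(MPI_Comm comm, PetscInt n, const PetscInt *rowptr, const PetscInt *colind, Mat *J)
{
  const PetscInt *i_d, *j_d;
  PetscScalar    *a_d;
  PetscMemType    mtype;

  PetscFunctionBeginUser;
  PetscCall(MatCreate(comm, J));
  PetscCall(MatSetSizes(*J, n, n, n, n));
  PetscCall(MatSetType(*J, MATSEQAIJCUSPARSE));
  /* Fix the sparsity pattern from the existing host CSR; NULL values give zeros. */
  PetscCall(MatSeqAIJSetPreallocationCSR(*J, rowptr, colind, NULL));
  /* Grab the CSR arrays; mtype reports whether a_d lives in device memory. */
  PetscCall(MatSeqAIJGetCSRAndMemType(*J, &i_d, &j_d, &a_d, &mtype));
  /* ... launch the user's kernel here: one GPU thread per row writes a_d in place ... */
  PetscCall(MatAssemblyBegin(*J, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(*J, MAT_FINAL_ASSEMBLY));
  PetscFunctionReturn(0);
}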
>>>> >>>>> >>>>> I just don't care about "binary packages" :-); I think they are an >>>>> archaic and bad way of thinking about code distribution (but yes the >>>>> alternatives need lots of work to make them flawless, but I think that is >>>>> where the work should go in the packaging world.) >>>>> >>>>> I go further and think one should be able to automatically use a >>>>> CUDA vector on a HIP device as well, it is not hard in theory but requires >>>>> thinking about how we handle classes and subclasses a little to make it >>>>> straightforward; or perhaps Jacob has fixed that also? >>>> >>>> >>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Mon Jan 9 22:41:09 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Mon, 9 Jan 2023 23:41:09 -0500 Subject: [petsc-users] PetscSF Fortran interface In-Reply-To: References: Message-ID: Hi Junchao Thanks again for your help in November. I've been using the your merge request branch quite heavily. Would it be possible to add a petscsfcreatesectionsf interface as well? I'm trying to write it myself using your commits as a guide but I have been struggling with handling the section parameter properly. Sincerely Nicholas On Sat, Nov 19, 2022 at 9:44 PM Junchao Zhang wrote: > > > > On Sat, Nov 19, 2022 at 8:05 PM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi >> >> Thanks, this is awesome. Thanks for the very prompt fix. Just one >> question: will the array outputs on the fortran side copies (and need to be >> deallocated) or direct access to the dmplex? >> > Direct access to internal data; no need to deallocate > > >> >> Sincerely >> Nicholas >> >> On Sat, Nov 19, 2022 at 8:21 PM Junchao Zhang >> wrote: >> >>> Hi, Nicholas, >>> See this MR, https://gitlab.com/petsc/petsc/-/merge_requests/5860 >>> It is in testing, but you can try branch >>> jczhang/add-petscsf-fortran to see if it works for you. >>> >>> Thanks. >>> --Junchao Zhang >>> >>> On Sat, Nov 19, 2022 at 4:16 PM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> Hi Junchao >>>> >>>> Thanks. I was wondering if there is any update on this. I may write a >>>> small interface for those two routines myself in the interim but I'd >>>> appreciate any insight you have. >>>> >>>> Sincerely >>>> Nicholas >>>> >>>> On Wed, Nov 16, 2022 at 10:39 PM Junchao Zhang >>>> wrote: >>>> >>>>> Hi, Nicholas, >>>>> I will have a look and get back to you. >>>>> Thanks. >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi < >>>>> narnoldm at umich.edu> wrote: >>>>> >>>>>> Hi Petsc Users >>>>>> >>>>>> I'm in the process of adding some Petsc for mesh management into an >>>>>> existing Fortran Solver. It has been relatively straightforward so far but >>>>>> I am running into an issue with using PetscSF routines. Some like the >>>>>> PetscSFGetGraph work no problem but a few of my routines require the use of >>>>>> PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to be in >>>>>> the fortran interface and I just get a linking error. I also don't seem to >>>>>> see a PetscSF file in the finclude. Any clarification or assistance would >>>>>> be appreciated. >>>>>> >>>>>> >>>>>> Sincerely >>>>>> Nicholas >>>>>> >>>>>> -- >>>>>> Nicholas Arnold-Medabalimi >>>>>> >>>>>> Ph.D. Candidate >>>>>> Computational Aeroscience Lab >>>>>> University of Michigan >>>>>> >>>>> >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. 
Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Tue Jan 10 09:30:56 2023 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Tue, 10 Jan 2023 15:30:56 +0000 Subject: [petsc-users] Eliminating rows and columns which are zeros Message-ID: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Jan 10 10:04:28 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Jan 2023 11:04:28 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: Message-ID: https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users wrote: > > Hello, > > I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. > > For instance say N=7 > > 0 0 0 0 0 0 0 > 0 1 -1 0 0 0 0 > 0 -1 2 0 0 0 -1 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 -1 0 0 0 1 > > I would like to reduce it to a 3x3 > > 1 -1 0 > -1 2 -1 > 0 -1 1 > > I do know the size N. > > Q1) How do I do it? > Q2) Is it better to eliminate them as it would save a lot of memory? > Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? > > Thank you. > > Karthik. > This email and any attachments are intended solely for the use of the named recipients. 
If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Tue Jan 10 10:25:03 2023 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Tue, 10 Jan 2023 16:25:03 +0000 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: Message-ID: Thank you Barry. This is great! I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using MatZeroRowsColumnsIS(A, isout, 1, x, b); While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? Kind regards, Karthik. From: Barry Smith Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... 
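A minimal sketch in C of the workflow discussed above, using the same placeholder names (A, isout, x, b) for objects the application has already created and assembled; PCREDISTRIBUTE itself needs no code changes and is selected purely from the options database:

   /* requires #include <petscksp.h> */
   KSP ksp;

   /* Impose the Dirichlet conditions: zero the rows and columns listed in
      isout, put 1.0 on their diagonals, and update b from the boundary
      values already stored in x. */
   PetscCall(MatZeroRowsColumnsIS(A, isout, 1.0, x, b));

   PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
   PetscCall(KSPSetOperators(ksp, A, A));
   PetscCall(KSPSetFromOptions(ksp));   /* run with -pc_type redistribute */
   PetscCall(KSPSolve(ksp, b, x));
   /* x comes back in the original global numbering, including the rows that
      PCREDISTRIBUTE eliminated internally, so it can be indexed exactly as
      before the solve. */
   PetscCall(KSPDestroy(&ksp));

The solver that PCREDISTRIBUTE applies to the reduced system can be controlled through the redistribute_ options prefix, e.g. -redistribute_ksp_type cg -redistribute_pc_type bjacobi.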
URL: From narnoldm at umich.edu Tue Jan 10 11:09:48 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Tue, 10 Jan 2023 12:09:48 -0500 Subject: [petsc-users] PetscSF Fortran interface In-Reply-To: References: Message-ID: Hi Junchao I think I'm almost there, but I could use some insight into how to use the PETSC_F90_2PTR_PROTO and F90Array1dAccess for the remoteOffset parameter input so if another function comes up, I can add it myself without wasting your time. I am very grateful for your help and time. Sincerely Nicholas On Tue, Jan 10, 2023 at 10:55 AM Junchao Zhang wrote: > Hi, Nicholas, > I am not a fortran guy, but I will try to add petscsfcreatesectionsf. > > Thanks. > --Junchao Zhang > > > On Tue, Jan 10, 2023 at 12:50 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> I think it should be something like this, but I'm not very fluent in >> Fortran C interop syntax. Any advice would be appreciated. Thanks >> >> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection * >> rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >> { >> >> int * remoteOffsets; >> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) & >> remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; >> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, &remoteOffsets,* >> leafSection,*sectionSF);if (*ierr) return; >> >> } >> >> On Mon, Jan 9, 2023 at 11:41 PM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Hi Junchao >>> >>> Thanks again for your help in November. I've been using the your merge >>> request branch quite heavily. Would it be possible to add a >>> petscsfcreatesectionsf interface as well? >>> I'm trying to write it myself using your commits as a guide but I have >>> been struggling with handling the section parameter properly. >>> >>> Sincerely >>> Nicholas >>> >>> On Sat, Nov 19, 2022 at 9:44 PM Junchao Zhang >>> wrote: >>> >>>> >>>> >>>> >>>> On Sat, Nov 19, 2022 at 8:05 PM Nicholas Arnold-Medabalimi < >>>> narnoldm at umich.edu> wrote: >>>> >>>>> Hi >>>>> >>>>> Thanks, this is awesome. Thanks for the very prompt fix. Just one >>>>> question: will the array outputs on the fortran side copies (and need to be >>>>> deallocated) or direct access to the dmplex? >>>>> >>>> Direct access to internal data; no need to deallocate >>>> >>>> >>>>> >>>>> Sincerely >>>>> Nicholas >>>>> >>>>> On Sat, Nov 19, 2022 at 8:21 PM Junchao Zhang >>>>> wrote: >>>>> >>>>>> Hi, Nicholas, >>>>>> See this MR, https://gitlab.com/petsc/petsc/-/merge_requests/5860 >>>>>> It is in testing, but you can try branch >>>>>> jczhang/add-petscsf-fortran to see if it works for you. >>>>>> >>>>>> Thanks. >>>>>> --Junchao Zhang >>>>>> >>>>>> On Sat, Nov 19, 2022 at 4:16 PM Nicholas Arnold-Medabalimi < >>>>>> narnoldm at umich.edu> wrote: >>>>>> >>>>>>> Hi Junchao >>>>>>> >>>>>>> Thanks. I was wondering if there is any update on this. I may write >>>>>>> a small interface for those two routines myself in the interim but I'd >>>>>>> appreciate any insight you have. >>>>>>> >>>>>>> Sincerely >>>>>>> Nicholas >>>>>>> >>>>>>> On Wed, Nov 16, 2022 at 10:39 PM Junchao Zhang < >>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>> >>>>>>>> Hi, Nicholas, >>>>>>>> I will have a look and get back to you. >>>>>>>> Thanks. 
>>>>>>>> --Junchao Zhang >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi < >>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>> >>>>>>>>> Hi Petsc Users >>>>>>>>> >>>>>>>>> I'm in the process of adding some Petsc for mesh management into >>>>>>>>> an existing Fortran Solver. It has been relatively straightforward so far >>>>>>>>> but I am running into an issue with using PetscSF routines. Some like the >>>>>>>>> PetscSFGetGraph work no problem but a few of my routines require the use of >>>>>>>>> PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to be in >>>>>>>>> the fortran interface and I just get a linking error. I also don't seem to >>>>>>>>> see a PetscSF file in the finclude. Any clarification or assistance would >>>>>>>>> be appreciated. >>>>>>>>> >>>>>>>>> >>>>>>>>> Sincerely >>>>>>>>> Nicholas >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>> >>>>>>>>> Ph.D. Candidate >>>>>>>>> Computational Aeroscience Lab >>>>>>>>> University of Michigan >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nicholas Arnold-Medabalimi >>>>>>> >>>>>>> Ph.D. Candidate >>>>>>> Computational Aeroscience Lab >>>>>>> University of Michigan >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Nicholas Arnold-Medabalimi >>>>> >>>>> Ph.D. Candidate >>>>> Computational Aeroscience Lab >>>>> University of Michigan >>>>> >>>> >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Tue Jan 10 11:32:12 2023 From: mlohry at gmail.com (Mark Lohry) Date: Tue, 10 Jan 2023 12:32:12 -0500 Subject: [petsc-users] GPU implementation of serial smoothers Message-ID: I'm running GAMG with CUDA, and I'm wondering how the nominally serial smoother algorithms are implemented on GPU? Specifically SOR/GS and ILU(0) -- in e.g. AMGx these are applied by first creating a coloring, and the smoother passes are done color by color. Is this how it's done in petsc AMG? Tangential, AMGx and OpenFOAM offer something called "DILU", diagonal ILU. Is there an equivalent in petsc? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Jan 10 11:37:48 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 10 Jan 2023 11:37:48 -0600 Subject: [petsc-users] PetscSF Fortran interface In-Reply-To: References: Message-ID: Hi, Nicholas, Could you make a merge request to PETSc and then our Fortran experts can comment on your MR? Thanks. --Junchao Zhang On Tue, Jan 10, 2023 at 11:10 AM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Junchao > > I think I'm almost there, but I could use some insight into how to use the > PETSC_F90_2PTR_PROTO and F90Array1dAccess for the remoteOffset parameter > input so if another function comes up, I can add it myself without wasting > your time. > I am very grateful for your help and time. > > Sincerely > Nicholas > > On Tue, Jan 10, 2023 at 10:55 AM Junchao Zhang > wrote: > >> Hi, Nicholas, >> I am not a fortran guy, but I will try to add petscsfcreatesectionsf. >> >> Thanks. 
>> --Junchao Zhang >> >> >> On Tue, Jan 10, 2023 at 12:50 AM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> I think it should be something like this, but I'm not very fluent in >>> Fortran C interop syntax. Any advice would be appreciated. Thanks >>> >>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection * >>> rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >>> { >>> >>> int * remoteOffsets; >>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) & >>> remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; >>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, &remoteOffsets,* >>> leafSection,*sectionSF);if (*ierr) return; >>> >>> } >>> >>> On Mon, Jan 9, 2023 at 11:41 PM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> Hi Junchao >>>> >>>> Thanks again for your help in November. I've been using the your merge >>>> request branch quite heavily. Would it be possible to add a >>>> petscsfcreatesectionsf interface as well? >>>> I'm trying to write it myself using your commits as a guide but I have >>>> been struggling with handling the section parameter properly. >>>> >>>> Sincerely >>>> Nicholas >>>> >>>> On Sat, Nov 19, 2022 at 9:44 PM Junchao Zhang >>>> wrote: >>>> >>>>> >>>>> >>>>> >>>>> On Sat, Nov 19, 2022 at 8:05 PM Nicholas Arnold-Medabalimi < >>>>> narnoldm at umich.edu> wrote: >>>>> >>>>>> Hi >>>>>> >>>>>> Thanks, this is awesome. Thanks for the very prompt fix. Just one >>>>>> question: will the array outputs on the fortran side copies (and need to be >>>>>> deallocated) or direct access to the dmplex? >>>>>> >>>>> Direct access to internal data; no need to deallocate >>>>> >>>>> >>>>>> >>>>>> Sincerely >>>>>> Nicholas >>>>>> >>>>>> On Sat, Nov 19, 2022 at 8:21 PM Junchao Zhang < >>>>>> junchao.zhang at gmail.com> wrote: >>>>>> >>>>>>> Hi, Nicholas, >>>>>>> See this MR, https://gitlab.com/petsc/petsc/-/merge_requests/5860 >>>>>>> It is in testing, but you can try branch >>>>>>> jczhang/add-petscsf-fortran to see if it works for you. >>>>>>> >>>>>>> Thanks. >>>>>>> --Junchao Zhang >>>>>>> >>>>>>> On Sat, Nov 19, 2022 at 4:16 PM Nicholas Arnold-Medabalimi < >>>>>>> narnoldm at umich.edu> wrote: >>>>>>> >>>>>>>> Hi Junchao >>>>>>>> >>>>>>>> Thanks. I was wondering if there is any update on this. I may write >>>>>>>> a small interface for those two routines myself in the interim but I'd >>>>>>>> appreciate any insight you have. >>>>>>>> >>>>>>>> Sincerely >>>>>>>> Nicholas >>>>>>>> >>>>>>>> On Wed, Nov 16, 2022 at 10:39 PM Junchao Zhang < >>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi, Nicholas, >>>>>>>>> I will have a look and get back to you. >>>>>>>>> Thanks. >>>>>>>>> --Junchao Zhang >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi < >>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>> >>>>>>>>>> Hi Petsc Users >>>>>>>>>> >>>>>>>>>> I'm in the process of adding some Petsc for mesh management into >>>>>>>>>> an existing Fortran Solver. It has been relatively straightforward so far >>>>>>>>>> but I am running into an issue with using PetscSF routines. Some like the >>>>>>>>>> PetscSFGetGraph work no problem but a few of my routines require the use of >>>>>>>>>> PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to be in >>>>>>>>>> the fortran interface and I just get a linking error. 
I also don't seem to >>>>>>>>>> see a PetscSF file in the finclude. Any clarification or assistance would >>>>>>>>>> be appreciated. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Sincerely >>>>>>>>>> Nicholas >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>> >>>>>>>>>> Ph.D. Candidate >>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>> University of Michigan >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>> >>>>>>>> Ph.D. Candidate >>>>>>>> Computational Aeroscience Lab >>>>>>>> University of Michigan >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Nicholas Arnold-Medabalimi >>>>>> >>>>>> Ph.D. Candidate >>>>>> Computational Aeroscience Lab >>>>>> University of Michigan >>>>>> >>>>> >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Tue Jan 10 11:44:36 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Tue, 10 Jan 2023 12:44:36 -0500 Subject: [petsc-users] PetscSF Fortran interface In-Reply-To: References: Message-ID: Er to be honest I still can't get my stub to compile properly, and I don't know how to go about making a merge request. But here is what I am attempting right now. Let me know how best to proceed Its not exactly clear to me how to setup up the remote offset properly. in src/vec/is/sf/interface/ftn-custom/zsf.c PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection *rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) { int * remoteOffsets; *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) &remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; *ierr = PetscSFCreateSectionSF(*sf,*rootSection, &remoteOffsets,*leafSection,*sectionSF);if (*ierr) return; } This is the sticking point. Sincerely Nicholas On Tue, Jan 10, 2023 at 12:38 PM Junchao Zhang wrote: > Hi, Nicholas, > Could you make a merge request to PETSc and then our Fortran experts can > comment on your MR? > Thanks. > > --Junchao Zhang > > > On Tue, Jan 10, 2023 at 11:10 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Junchao >> >> I think I'm almost there, but I could use some insight into how to use >> the PETSC_F90_2PTR_PROTO and F90Array1dAccess for the remoteOffset >> parameter input so if another function comes up, I can add it myself >> without wasting your time. >> I am very grateful for your help and time. >> >> Sincerely >> Nicholas >> >> On Tue, Jan 10, 2023 at 10:55 AM Junchao Zhang >> wrote: >> >>> Hi, Nicholas, >>> I am not a fortran guy, but I will try to add petscsfcreatesectionsf. >>> >>> Thanks. >>> --Junchao Zhang >>> >>> >>> On Tue, Jan 10, 2023 at 12:50 AM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> I think it should be something like this, but I'm not very fluent in >>>> Fortran C interop syntax. Any advice would be appreciated. 
Thanks >>>> >>>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection * >>>> rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >>>> { >>>> >>>> int * remoteOffsets; >>>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) & >>>> remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; >>>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, &remoteOffsets,* >>>> leafSection,*sectionSF);if (*ierr) return; >>>> >>>> } >>>> >>>> On Mon, Jan 9, 2023 at 11:41 PM Nicholas Arnold-Medabalimi < >>>> narnoldm at umich.edu> wrote: >>>> >>>>> Hi Junchao >>>>> >>>>> Thanks again for your help in November. I've been using the your merge >>>>> request branch quite heavily. Would it be possible to add a >>>>> petscsfcreatesectionsf interface as well? >>>>> I'm trying to write it myself using your commits as a guide but I have >>>>> been struggling with handling the section parameter properly. >>>>> >>>>> Sincerely >>>>> Nicholas >>>>> >>>>> On Sat, Nov 19, 2022 at 9:44 PM Junchao Zhang >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Sat, Nov 19, 2022 at 8:05 PM Nicholas Arnold-Medabalimi < >>>>>> narnoldm at umich.edu> wrote: >>>>>> >>>>>>> Hi >>>>>>> >>>>>>> Thanks, this is awesome. Thanks for the very prompt fix. Just one >>>>>>> question: will the array outputs on the fortran side copies (and need to be >>>>>>> deallocated) or direct access to the dmplex? >>>>>>> >>>>>> Direct access to internal data; no need to deallocate >>>>>> >>>>>> >>>>>>> >>>>>>> Sincerely >>>>>>> Nicholas >>>>>>> >>>>>>> On Sat, Nov 19, 2022 at 8:21 PM Junchao Zhang < >>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>> >>>>>>>> Hi, Nicholas, >>>>>>>> See this MR, https://gitlab.com/petsc/petsc/-/merge_requests/5860 >>>>>>>> It is in testing, but you can try branch >>>>>>>> jczhang/add-petscsf-fortran to see if it works for you. >>>>>>>> >>>>>>>> Thanks. >>>>>>>> --Junchao Zhang >>>>>>>> >>>>>>>> On Sat, Nov 19, 2022 at 4:16 PM Nicholas Arnold-Medabalimi < >>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>> >>>>>>>>> Hi Junchao >>>>>>>>> >>>>>>>>> Thanks. I was wondering if there is any update on this. I may >>>>>>>>> write a small interface for those two routines myself in the interim but >>>>>>>>> I'd appreciate any insight you have. >>>>>>>>> >>>>>>>>> Sincerely >>>>>>>>> Nicholas >>>>>>>>> >>>>>>>>> On Wed, Nov 16, 2022 at 10:39 PM Junchao Zhang < >>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi, Nicholas, >>>>>>>>>> I will have a look and get back to you. >>>>>>>>>> Thanks. >>>>>>>>>> --Junchao Zhang >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi < >>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Petsc Users >>>>>>>>>>> >>>>>>>>>>> I'm in the process of adding some Petsc for mesh management into >>>>>>>>>>> an existing Fortran Solver. It has been relatively straightforward so far >>>>>>>>>>> but I am running into an issue with using PetscSF routines. Some like the >>>>>>>>>>> PetscSFGetGraph work no problem but a few of my routines require the use of >>>>>>>>>>> PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to be in >>>>>>>>>>> the fortran interface and I just get a linking error. I also don't seem to >>>>>>>>>>> see a PetscSF file in the finclude. Any clarification or assistance would >>>>>>>>>>> be appreciated. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Sincerely >>>>>>>>>>> Nicholas >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>> >>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>> University of Michigan >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>> >>>>>>>>> Ph.D. Candidate >>>>>>>>> Computational Aeroscience Lab >>>>>>>>> University of Michigan >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nicholas Arnold-Medabalimi >>>>>>> >>>>>>> Ph.D. Candidate >>>>>>> Computational Aeroscience Lab >>>>>>> University of Michigan >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Nicholas Arnold-Medabalimi >>>>> >>>>> Ph.D. Candidate >>>>> Computational Aeroscience Lab >>>>> University of Michigan >>>>> >>>> >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Jan 10 11:56:28 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 10 Jan 2023 10:56:28 -0700 Subject: [petsc-users] GPU implementation of serial smoothers In-Reply-To: References: Message-ID: <87eds2nuib.fsf@jedbrown.org> Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if the node size is not uniform). The are good choices for scale-resolving CFD on GPUs. Mark Lohry writes: > I'm running GAMG with CUDA, and I'm wondering how the nominally serial > smoother algorithms are implemented on GPU? Specifically SOR/GS and ILU(0) > -- in e.g. AMGx these are applied by first creating a coloring, and the > smoother passes are done color by color. Is this how it's done in petsc AMG? > > Tangential, AMGx and OpenFOAM offer something called "DILU", diagonal ILU. > Is there an equivalent in petsc? > > Thanks, > Mark From bsmith at petsc.dev Tue Jan 10 12:50:25 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Jan 2023 13:50:25 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: Message-ID: Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. Barry > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI wrote: > > Thank you Barry. This is great! > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? > > Kind regards, > Karthik. > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 16:04 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute > > > It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. 
> > Barry > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: > > Hello, > > I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. > > For instance say N=7 > > 0 0 0 0 0 0 0 > 0 1 -1 0 0 0 0 > 0 -1 2 0 0 0 -1 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 -1 0 0 0 1 > > I would like to reduce it to a 3x3 > > 1 -1 0 > -1 2 -1 > 0 -1 1 > > I do know the size N. > > Q1) How do I do it? > Q2) Is it better to eliminate them as it would save a lot of memory? > Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? > > Thank you. > > Karthik. > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Jan 10 12:52:06 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Jan 2023 13:52:06 -0500 Subject: [petsc-users] GPU implementation of serial smoothers In-Reply-To: <87eds2nuib.fsf@jedbrown.org> References: <87eds2nuib.fsf@jedbrown.org> Message-ID: We don't have colored smoothers currently in PETSc. > On Jan 10, 2023, at 12:56 PM, Jed Brown wrote: > > Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if the node size is not uniform). The are good choices for scale-resolving CFD on GPUs. > > Mark Lohry writes: > >> I'm running GAMG with CUDA, and I'm wondering how the nominally serial >> smoother algorithms are implemented on GPU? Specifically SOR/GS and ILU(0) >> -- in e.g. AMGx these are applied by first creating a coloring, and the >> smoother passes are done color by color. Is this how it's done in petsc AMG? >> >> Tangential, AMGx and OpenFOAM offer something called "DILU", diagonal ILU. >> Is there an equivalent in petsc? >> >> Thanks, >> Mark From mlohry at gmail.com Tue Jan 10 13:19:42 2023 From: mlohry at gmail.com (Mark Lohry) Date: Tue, 10 Jan 2023 14:19:42 -0500 Subject: [petsc-users] GPU implementation of serial smoothers In-Reply-To: References: <87eds2nuib.fsf@jedbrown.org> Message-ID: > > Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if > the node size is not uniform). The are good choices for scale-resolving CFD > on GPUs. > I was hoping you'd know :) pbjacobi is underperforming ilu by a pretty wide margin on some of the systems i'm looking at. We don't have colored smoothers currently in PETSc. > So what happens under the hood when I run -mg_levels_pc_type sor on GPU? Are you actually decomposing the matrix into lower and computing updates with matrix multiplications? Or is it just the standard serial algorithm with thread safety ignored? 
On Tue, Jan 10, 2023 at 1:52 PM Barry Smith wrote: > > We don't have colored smoothers currently in PETSc. > > > On Jan 10, 2023, at 12:56 PM, Jed Brown wrote: > > > > Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi > if the node size is not uniform). The are good choices for scale-resolving > CFD on GPUs. > > > > Mark Lohry writes: > > > >> I'm running GAMG with CUDA, and I'm wondering how the nominally serial > >> smoother algorithms are implemented on GPU? Specifically SOR/GS and > ILU(0) > >> -- in e.g. AMGx these are applied by first creating a coloring, and the > >> smoother passes are done color by color. Is this how it's done in petsc > AMG? > >> > >> Tangential, AMGx and OpenFOAM offer something called "DILU", diagonal > ILU. > >> Is there an equivalent in petsc? > >> > >> Thanks, > >> Mark > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Jan 10 13:23:21 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Jan 2023 14:23:21 -0500 Subject: [petsc-users] GPU implementation of serial smoothers In-Reply-To: References: <87eds2nuib.fsf@jedbrown.org> Message-ID: <42DC0F2B-577F-4D61-9F5E-E1E2E0E55E5B@petsc.dev> > On Jan 10, 2023, at 2:19 PM, Mark Lohry wrote: > >> Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if the node size is not uniform). The are good choices for scale-resolving CFD on GPUs. > > I was hoping you'd know :) pbjacobi is underperforming ilu by a pretty wide margin on some of the systems i'm looking at. > >> We don't have colored smoothers currently in PETSc. > > So what happens under the hood when I run -mg_levels_pc_type sor on GPU? Are you actually decomposing the matrix into lower and computing updates with matrix multiplications? Or is it just the standard serial algorithm with thread safety ignored? It is running the regular SOR on the CPU and needs to copy up the vector and copy down the result. > > On Tue, Jan 10, 2023 at 1:52 PM Barry Smith > wrote: >> >> We don't have colored smoothers currently in PETSc. >> >> > On Jan 10, 2023, at 12:56 PM, Jed Brown > wrote: >> > >> > Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if the node size is not uniform). The are good choices for scale-resolving CFD on GPUs. >> > >> > Mark Lohry > writes: >> > >> >> I'm running GAMG with CUDA, and I'm wondering how the nominally serial >> >> smoother algorithms are implemented on GPU? Specifically SOR/GS and ILU(0) >> >> -- in e.g. AMGx these are applied by first creating a coloring, and the >> >> smoother passes are done color by color. Is this how it's done in petsc AMG? >> >> >> >> Tangential, AMGx and OpenFOAM offer something called "DILU", diagonal ILU. >> >> Is there an equivalent in petsc? >> >> >> >> Thanks, >> >> Mark >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Tue Jan 10 13:36:39 2023 From: mlohry at gmail.com (Mark Lohry) Date: Tue, 10 Jan 2023 14:36:39 -0500 Subject: [petsc-users] GPU implementation of serial smoothers In-Reply-To: <42DC0F2B-577F-4D61-9F5E-E1E2E0E55E5B@petsc.dev> References: <87eds2nuib.fsf@jedbrown.org> <42DC0F2B-577F-4D61-9F5E-E1E2E0E55E5B@petsc.dev> Message-ID: Well that's suboptimal. What are my options for 100% GPU solves with no host transfers? On Tue, Jan 10, 2023, 2:23 PM Barry Smith wrote: > > > On Jan 10, 2023, at 2:19 PM, Mark Lohry wrote: > > Is DILU a point-block method? 
We have -pc_type pbjacobi (and vpbjacobi if >> the node size is not uniform). The are good choices for scale-resolving CFD >> on GPUs. >> > > I was hoping you'd know :) pbjacobi is underperforming ilu by a pretty > wide margin on some of the systems i'm looking at. > > We don't have colored smoothers currently in PETSc. >> > > So what happens under the hood when I run -mg_levels_pc_type sor on GPU? > Are you actually decomposing the matrix into lower and computing updates > with matrix multiplications? Or is it just the standard serial algorithm > with thread safety ignored? > > > It is running the regular SOR on the CPU and needs to copy up the vector > and copy down the result. > > > On Tue, Jan 10, 2023 at 1:52 PM Barry Smith wrote: > >> >> We don't have colored smoothers currently in PETSc. >> >> > On Jan 10, 2023, at 12:56 PM, Jed Brown wrote: >> > >> > Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi >> if the node size is not uniform). The are good choices for scale-resolving >> CFD on GPUs. >> > >> > Mark Lohry writes: >> > >> >> I'm running GAMG with CUDA, and I'm wondering how the nominally serial >> >> smoother algorithms are implemented on GPU? Specifically SOR/GS and >> ILU(0) >> >> -- in e.g. AMGx these are applied by first creating a coloring, and the >> >> smoother passes are done color by color. Is this how it's done in >> petsc AMG? >> >> >> >> Tangential, AMGx and OpenFOAM offer something called "DILU", diagonal >> ILU. >> >> Is there an equivalent in petsc? >> >> >> >> Thanks, >> >> Mark >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Jan 10 13:47:51 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 10 Jan 2023 12:47:51 -0700 Subject: [petsc-users] GPU implementation of serial smoothers In-Reply-To: References: <87eds2nuib.fsf@jedbrown.org> <42DC0F2B-577F-4D61-9F5E-E1E2E0E55E5B@petsc.dev> Message-ID: <87v8lemas8.fsf@jedbrown.org> The joy of GPUs. You can use sparse triangular kernels like ILU (provided by cuBLAS), but they are so mindbogglingly slow that you'll go back to the drawing board and try to use a multigrid method of some sort with polynomial/point-block smoothing. BTW, on unstructured grids, coloring requires a lot of colors and thus many times more bandwidth (due to multiple passes) than the operator itself. Mark Lohry writes: > Well that's suboptimal. What are my options for 100% GPU solves with no > host transfers? > > On Tue, Jan 10, 2023, 2:23 PM Barry Smith wrote: > >> >> >> On Jan 10, 2023, at 2:19 PM, Mark Lohry wrote: >> >> Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if >>> the node size is not uniform). The are good choices for scale-resolving CFD >>> on GPUs. >>> >> >> I was hoping you'd know :) pbjacobi is underperforming ilu by a pretty >> wide margin on some of the systems i'm looking at. >> >> We don't have colored smoothers currently in PETSc. >>> >> >> So what happens under the hood when I run -mg_levels_pc_type sor on GPU? >> Are you actually decomposing the matrix into lower and computing updates >> with matrix multiplications? Or is it just the standard serial algorithm >> with thread safety ignored? >> >> >> It is running the regular SOR on the CPU and needs to copy up the vector >> and copy down the result. >> >> >> On Tue, Jan 10, 2023 at 1:52 PM Barry Smith wrote: >> >>> >>> We don't have colored smoothers currently in PETSc. 
>>> >>> > On Jan 10, 2023, at 12:56 PM, Jed Brown wrote: >>> > >>> > Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi >>> if the node size is not uniform). The are good choices for scale-resolving >>> CFD on GPUs. >>> > >>> > Mark Lohry writes: >>> > >>> >> I'm running GAMG with CUDA, and I'm wondering how the nominally serial >>> >> smoother algorithms are implemented on GPU? Specifically SOR/GS and >>> ILU(0) >>> >> -- in e.g. AMGx these are applied by first creating a coloring, and the >>> >> smoother passes are done color by color. Is this how it's done in >>> petsc AMG? >>> >> >>> >> Tangential, AMGx and OpenFOAM offer something called "DILU", diagonal >>> ILU. >>> >> Is there an equivalent in petsc? >>> >> >>> >> Thanks, >>> >> Mark >>> >>> >> From mlohry at gmail.com Tue Jan 10 13:54:23 2023 From: mlohry at gmail.com (Mark Lohry) Date: Tue, 10 Jan 2023 14:54:23 -0500 Subject: [petsc-users] GPU implementation of serial smoothers In-Reply-To: <87v8lemas8.fsf@jedbrown.org> References: <87eds2nuib.fsf@jedbrown.org> <42DC0F2B-577F-4D61-9F5E-E1E2E0E55E5B@petsc.dev> <87v8lemas8.fsf@jedbrown.org> Message-ID: > > BTW, on unstructured grids, coloring requires a lot of colors and thus > many times more bandwidth (due to multiple passes) than the operator itself. I've noticed -- in AMGx the multicolor GS was generally dramatically slower than jacobi because of lots of colors with few elements. You can use sparse triangular kernels like ILU (provided by cuBLAS), but > they are so mindbogglingly slow that you'll go back to the drawing board > and try to use a multigrid method of some sort with polynomial/point-block > smoothing. > I definitely need multigrid. I was under the impression that GAMG was relatively cuda-complete, is that not the case? What functionality works fully on GPU and what doesn't, without any host transfers (aside from what's needed for MPI)? If I use -ksp-pc_type gamg -mg_levels_pc_type pbjacobi -mg_levels_ksp_type richardson is that fully on device, but -mg_levels_pc_type ilu or -mg_levels_pc_type sor require transfers? On Tue, Jan 10, 2023 at 2:47 PM Jed Brown wrote: > The joy of GPUs. You can use sparse triangular kernels like ILU (provided > by cuBLAS), but they are so mindbogglingly slow that you'll go back to the > drawing board and try to use a multigrid method of some sort with > polynomial/point-block smoothing. > > BTW, on unstructured grids, coloring requires a lot of colors and thus > many times more bandwidth (due to multiple passes) than the operator itself. > > Mark Lohry writes: > > > Well that's suboptimal. What are my options for 100% GPU solves with no > > host transfers? > > > > On Tue, Jan 10, 2023, 2:23 PM Barry Smith wrote: > > > >> > >> > >> On Jan 10, 2023, at 2:19 PM, Mark Lohry wrote: > >> > >> Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi > if > >>> the node size is not uniform). The are good choices for > scale-resolving CFD > >>> on GPUs. > >>> > >> > >> I was hoping you'd know :) pbjacobi is underperforming ilu by a pretty > >> wide margin on some of the systems i'm looking at. > >> > >> We don't have colored smoothers currently in PETSc. > >>> > >> > >> So what happens under the hood when I run -mg_levels_pc_type sor on GPU? > >> Are you actually decomposing the matrix into lower and computing updates > >> with matrix multiplications? Or is it just the standard serial algorithm > >> with thread safety ignored? 
> >> > >> > >> It is running the regular SOR on the CPU and needs to copy up the > vector > >> and copy down the result. > >> > >> > >> On Tue, Jan 10, 2023 at 1:52 PM Barry Smith wrote: > >> > >>> > >>> We don't have colored smoothers currently in PETSc. > >>> > >>> > On Jan 10, 2023, at 12:56 PM, Jed Brown wrote: > >>> > > >>> > Is DILU a point-block method? We have -pc_type pbjacobi (and > vpbjacobi > >>> if the node size is not uniform). The are good choices for > scale-resolving > >>> CFD on GPUs. > >>> > > >>> > Mark Lohry writes: > >>> > > >>> >> I'm running GAMG with CUDA, and I'm wondering how the nominally > serial > >>> >> smoother algorithms are implemented on GPU? Specifically SOR/GS and > >>> ILU(0) > >>> >> -- in e.g. AMGx these are applied by first creating a coloring, and > the > >>> >> smoother passes are done color by color. Is this how it's done in > >>> petsc AMG? > >>> >> > >>> >> Tangential, AMGx and OpenFOAM offer something called "DILU", > diagonal > >>> ILU. > >>> >> Is there an equivalent in petsc? > >>> >> > >>> >> Thanks, > >>> >> Mark > >>> > >>> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Jan 10 14:03:38 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 10 Jan 2023 13:03:38 -0700 Subject: [petsc-users] GPU implementation of serial smoothers In-Reply-To: References: <87eds2nuib.fsf@jedbrown.org> <42DC0F2B-577F-4D61-9F5E-E1E2E0E55E5B@petsc.dev> <87v8lemas8.fsf@jedbrown.org> Message-ID: <87mt6qma1x.fsf@jedbrown.org> Mark Lohry writes: > I definitely need multigrid. I was under the impression that GAMG was > relatively cuda-complete, is that not the case? What functionality works > fully on GPU and what doesn't, without any host transfers (aside from > what's needed for MPI)? > > If I use -ksp-pc_type gamg -mg_levels_pc_type pbjacobi -mg_levels_ksp_type > richardson is that fully on device, but -mg_levels_pc_type ilu or > -mg_levels_pc_type sor require transfers? You can do `-mg_levels_pc_type ilu`, but it'll be extremely slow (like 20x slower than an operator apply). One can use Krylov smoothers, though that's more synchronization. Automatic construction of operator-dependent multistage smoothers for linear multigrid (because Chebyshev only works for problems that have eigenvalues near the real axis) is something I've wanted to develop for at least a decade, but time is always short. I might put some effort into p-MG with such smoothers this year as we add DDES to our scale-resolving compressible solver. From mlohry at gmail.com Tue Jan 10 14:31:41 2023 From: mlohry at gmail.com (Mark Lohry) Date: Tue, 10 Jan 2023 15:31:41 -0500 Subject: [petsc-users] GPU implementation of serial smoothers In-Reply-To: <87mt6qma1x.fsf@jedbrown.org> References: <87eds2nuib.fsf@jedbrown.org> <42DC0F2B-577F-4D61-9F5E-E1E2E0E55E5B@petsc.dev> <87v8lemas8.fsf@jedbrown.org> <87mt6qma1x.fsf@jedbrown.org> Message-ID: So what are people using for GAMG configs on GPU? I was hoping petsc today would be performance competitive with AMGx but it sounds like that's not the case? On Tue, Jan 10, 2023 at 3:03 PM Jed Brown wrote: > Mark Lohry writes: > > > I definitely need multigrid. I was under the impression that GAMG was > > relatively cuda-complete, is that not the case? What functionality works > > fully on GPU and what doesn't, without any host transfers (aside from > > what's needed for MPI)? 
> > > > If I use -ksp-pc_type gamg -mg_levels_pc_type pbjacobi > -mg_levels_ksp_type > > richardson is that fully on device, but -mg_levels_pc_type ilu or > > -mg_levels_pc_type sor require transfers? > > You can do `-mg_levels_pc_type ilu`, but it'll be extremely slow (like 20x > slower than an operator apply). One can use Krylov smoothers, though that's > more synchronization. Automatic construction of operator-dependent > multistage smoothers for linear multigrid (because Chebyshev only works for > problems that have eigenvalues near the real axis) is something I've wanted > to develop for at least a decade, but time is always short. I might put > some effort into p-MG with such smoothers this year as we add DDES to our > scale-resolving compressible solver. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Jan 10 14:44:47 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Jan 2023 15:44:47 -0500 Subject: [petsc-users] GPU implementation of serial smoothers In-Reply-To: References: <87eds2nuib.fsf@jedbrown.org> <42DC0F2B-577F-4D61-9F5E-E1E2E0E55E5B@petsc.dev> <87v8lemas8.fsf@jedbrown.org> <87mt6qma1x.fsf@jedbrown.org> Message-ID: <8BFC6EE6-E57B-4236-A9CF-683162CC7ABF@petsc.dev> The default is some kind of Jacobi plus Chebyshev, for a certain class of problems, it is quite good. > On Jan 10, 2023, at 3:31 PM, Mark Lohry wrote: > > So what are people using for GAMG configs on GPU? I was hoping petsc today would be performance competitive with AMGx but it sounds like that's not the case? > > On Tue, Jan 10, 2023 at 3:03 PM Jed Brown > wrote: >> Mark Lohry > writes: >> >> > I definitely need multigrid. I was under the impression that GAMG was >> > relatively cuda-complete, is that not the case? What functionality works >> > fully on GPU and what doesn't, without any host transfers (aside from >> > what's needed for MPI)? >> > >> > If I use -ksp-pc_type gamg -mg_levels_pc_type pbjacobi -mg_levels_ksp_type >> > richardson is that fully on device, but -mg_levels_pc_type ilu or >> > -mg_levels_pc_type sor require transfers? >> >> You can do `-mg_levels_pc_type ilu`, but it'll be extremely slow (like 20x slower than an operator apply). One can use Krylov smoothers, though that's more synchronization. Automatic construction of operator-dependent multistage smoothers for linear multigrid (because Chebyshev only works for problems that have eigenvalues near the real axis) is something I've wanted to develop for at least a decade, but time is always short. I might put some effort into p-MG with such smoothers this year as we add DDES to our scale-resolving compressible solver. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Tue Jan 10 14:59:50 2023 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 10 Jan 2023 23:59:50 +0300 Subject: [petsc-users] GPU implementation of serial smoothers In-Reply-To: <8BFC6EE6-E57B-4236-A9CF-683162CC7ABF@petsc.dev> References: <87eds2nuib.fsf@jedbrown.org> <42DC0F2B-577F-4D61-9F5E-E1E2E0E55E5B@petsc.dev> <87v8lemas8.fsf@jedbrown.org> <87mt6qma1x.fsf@jedbrown.org> <8BFC6EE6-E57B-4236-A9CF-683162CC7ABF@petsc.dev> Message-ID: DILU in openfoam is our block Jacobi ilu subdomain solvers On Tue, Jan 10, 2023, 23:45 Barry Smith wrote: > > The default is some kind of Jacobi plus Chebyshev, for a certain class > of problems, it is quite good. 
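Spelled out as options, that default corresponds roughly to the following (illustrative only, since the exact defaults change between PETSc versions); swapping in -mg_levels_pc_type pbjacobi gives the point-block variant mentioned earlier in the thread for coupled systems:

-pc_type gamg
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi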
> > > > On Jan 10, 2023, at 3:31 PM, Mark Lohry wrote: > > So what are people using for GAMG configs on GPU? I was hoping petsc today > would be performance competitive with AMGx but it sounds like that's not > the case? > > On Tue, Jan 10, 2023 at 3:03 PM Jed Brown wrote: > >> Mark Lohry writes: >> >> > I definitely need multigrid. I was under the impression that GAMG was >> > relatively cuda-complete, is that not the case? What functionality works >> > fully on GPU and what doesn't, without any host transfers (aside from >> > what's needed for MPI)? >> > >> > If I use -ksp-pc_type gamg -mg_levels_pc_type pbjacobi >> -mg_levels_ksp_type >> > richardson is that fully on device, but -mg_levels_pc_type ilu or >> > -mg_levels_pc_type sor require transfers? >> >> You can do `-mg_levels_pc_type ilu`, but it'll be extremely slow (like >> 20x slower than an operator apply). One can use Krylov smoothers, though >> that's more synchronization. Automatic construction of operator-dependent >> multistage smoothers for linear multigrid (because Chebyshev only works for >> problems that have eigenvalues near the real axis) is something I've wanted >> to develop for at least a decade, but time is always short. I might put >> some effort into p-MG with such smoothers this year as we add DDES to our >> scale-resolving compressible solver. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangc20 at rpi.edu Tue Jan 10 15:03:58 2023 From: zhangc20 at rpi.edu (Zhang, Chonglin) Date: Tue, 10 Jan 2023 21:03:58 +0000 Subject: [petsc-users] [EXTERNAL] GPU implementation of serial smoothers In-Reply-To: References: <87eds2nuib.fsf@jedbrown.org> <42DC0F2B-577F-4D61-9F5E-E1E2E0E55E5B@petsc.dev> <87v8lemas8.fsf@jedbrown.org> <87mt6qma1x.fsf@jedbrown.org> Message-ID: <93B671CD-8672-487E-B700-40AB9988E3B9@rpi.edu> I am using the following in my Poisson solver running on GPU, which were suggested by Barry and Mark (Dr. Mark Adams). -ksp_type cg -pc_type gamg -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi On Jan 10, 2023, at 3:31 PM, Mark Lohry wrote: So what are people using for GAMG configs on GPU? I was hoping petsc today would be performance competitive with AMGx but it sounds like that's not the case? On Tue, Jan 10, 2023 at 3:03 PM Jed Brown > wrote: Mark Lohry > writes: > I definitely need multigrid. I was under the impression that GAMG was > relatively cuda-complete, is that not the case? What functionality works > fully on GPU and what doesn't, without any host transfers (aside from > what's needed for MPI)? > > If I use -ksp-pc_type gamg -mg_levels_pc_type pbjacobi -mg_levels_ksp_type > richardson is that fully on device, but -mg_levels_pc_type ilu or > -mg_levels_pc_type sor require transfers? You can do `-mg_levels_pc_type ilu`, but it'll be extremely slow (like 20x slower than an operator apply). One can use Krylov smoothers, though that's more synchronization. Automatic construction of operator-dependent multistage smoothers for linear multigrid (because Chebyshev only works for problems that have eigenvalues near the real axis) is something I've wanted to develop for at least a decade, but time is always short. I might put some effort into p-MG with such smoothers this year as we add DDES to our scale-resolving compressible solver. -------------- next part -------------- An HTML attachment was scrubbed... 
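Pulling the pieces of this thread together, a run that keeps the solve on the GPU with the options listed above might be launched along these lines (this assumes a CUDA-enabled PETSc build, an application that creates its Vec/Mat through VecSetFromOptions/MatSetFromOptions so the type options take effect, and a placeholder executable name):

mpiexec -n 4 ./app -vec_type cuda -mat_type aijcusparse \
    -ksp_type cg -pc_type gamg \
    -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi \
    -log_view

The -log_view summary is a quick way to check how much of the work actually ran on the device and how much data moved between host and GPU.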
URL: From junchao.zhang at gmail.com Tue Jan 10 15:42:33 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 10 Jan 2023 15:42:33 -0600 Subject: [petsc-users] PetscSF Fortran interface In-Reply-To: References: Message-ID: Hi, Nicholas, It seems we have implemented it, but with another name, PetscSFCreateSectionSFF90, see https://gitlab.com/petsc/petsc/-/merge_requests/5386 Try it to see if it works! --Junchao Zhang On Tue, Jan 10, 2023 at 11:45 AM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Er to be honest I still can't get my stub to compile properly, and I don't > know how to go about making a merge request. But here is what I am > attempting right now. Let me know how best to proceed > > > Its not exactly clear to me how to setup up the remote offset properly. > > in src/vec/is/sf/interface/ftn-custom/zsf.c > > PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection > *rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, > PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) > { > > int * remoteOffsets; > *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) > &remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; > *ierr = PetscSFCreateSectionSF(*sf,*rootSection, > &remoteOffsets,*leafSection,*sectionSF);if (*ierr) return; > > } > > This is the sticking point. > > Sincerely > Nicholas > > > On Tue, Jan 10, 2023 at 12:38 PM Junchao Zhang > wrote: > >> Hi, Nicholas, >> Could you make a merge request to PETSc and then our Fortran experts >> can comment on your MR? >> Thanks. >> >> --Junchao Zhang >> >> >> On Tue, Jan 10, 2023 at 11:10 AM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Hi Junchao >>> >>> I think I'm almost there, but I could use some insight into how to use >>> the PETSC_F90_2PTR_PROTO and F90Array1dAccess for the remoteOffset >>> parameter input so if another function comes up, I can add it myself >>> without wasting your time. >>> I am very grateful for your help and time. >>> >>> Sincerely >>> Nicholas >>> >>> On Tue, Jan 10, 2023 at 10:55 AM Junchao Zhang >>> wrote: >>> >>>> Hi, Nicholas, >>>> I am not a fortran guy, but I will try to add petscsfcreatesectionsf. >>>> >>>> Thanks. >>>> --Junchao Zhang >>>> >>>> >>>> On Tue, Jan 10, 2023 at 12:50 AM Nicholas Arnold-Medabalimi < >>>> narnoldm at umich.edu> wrote: >>>> >>>>> I think it should be something like this, but I'm not very fluent in >>>>> Fortran C interop syntax. Any advice would be appreciated. Thanks >>>>> >>>>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection * >>>>> rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>>>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >>>>> { >>>>> >>>>> int * remoteOffsets; >>>>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) & >>>>> remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; >>>>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, &remoteOffsets,* >>>>> leafSection,*sectionSF);if (*ierr) return; >>>>> >>>>> } >>>>> >>>>> On Mon, Jan 9, 2023 at 11:41 PM Nicholas Arnold-Medabalimi < >>>>> narnoldm at umich.edu> wrote: >>>>> >>>>>> Hi Junchao >>>>>> >>>>>> Thanks again for your help in November. I've been using the your >>>>>> merge request branch quite heavily. Would it be possible to add a >>>>>> petscsfcreatesectionsf interface as well? 
>>>>>> I'm trying to write it myself using your commits as a guide but I >>>>>> have been struggling with handling the section parameter properly. >>>>>> >>>>>> Sincerely >>>>>> Nicholas >>>>>> >>>>>> On Sat, Nov 19, 2022 at 9:44 PM Junchao Zhang < >>>>>> junchao.zhang at gmail.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Sat, Nov 19, 2022 at 8:05 PM Nicholas Arnold-Medabalimi < >>>>>>> narnoldm at umich.edu> wrote: >>>>>>> >>>>>>>> Hi >>>>>>>> >>>>>>>> Thanks, this is awesome. Thanks for the very prompt fix. Just one >>>>>>>> question: will the array outputs on the fortran side copies (and need to be >>>>>>>> deallocated) or direct access to the dmplex? >>>>>>>> >>>>>>> Direct access to internal data; no need to deallocate >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Sincerely >>>>>>>> Nicholas >>>>>>>> >>>>>>>> On Sat, Nov 19, 2022 at 8:21 PM Junchao Zhang < >>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi, Nicholas, >>>>>>>>> See this MR, >>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/5860 >>>>>>>>> It is in testing, but you can try branch >>>>>>>>> jczhang/add-petscsf-fortran to see if it works for you. >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> --Junchao Zhang >>>>>>>>> >>>>>>>>> On Sat, Nov 19, 2022 at 4:16 PM Nicholas Arnold-Medabalimi < >>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>> >>>>>>>>>> Hi Junchao >>>>>>>>>> >>>>>>>>>> Thanks. I was wondering if there is any update on this. I may >>>>>>>>>> write a small interface for those two routines myself in the interim but >>>>>>>>>> I'd appreciate any insight you have. >>>>>>>>>> >>>>>>>>>> Sincerely >>>>>>>>>> Nicholas >>>>>>>>>> >>>>>>>>>> On Wed, Nov 16, 2022 at 10:39 PM Junchao Zhang < >>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>> I will have a look and get back to you. >>>>>>>>>>> Thanks. >>>>>>>>>>> --Junchao Zhang >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Petsc Users >>>>>>>>>>>> >>>>>>>>>>>> I'm in the process of adding some Petsc for mesh management >>>>>>>>>>>> into an existing Fortran Solver. It has been relatively straightforward so >>>>>>>>>>>> far but I am running into an issue with using PetscSF routines. Some like >>>>>>>>>>>> the PetscSFGetGraph work no problem but a few of my routines require the >>>>>>>>>>>> use of PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to >>>>>>>>>>>> be in the fortran interface and I just get a linking error. I also don't >>>>>>>>>>>> seem to see a PetscSF file in the finclude. Any clarification or assistance >>>>>>>>>>>> would be appreciated. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Sincerely >>>>>>>>>>>> Nicholas >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>> >>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>> University of Michigan >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>> >>>>>>>>>> Ph.D. Candidate >>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>> University of Michigan >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>> >>>>>>>> Ph.D. Candidate >>>>>>>> Computational Aeroscience Lab >>>>>>>> University of Michigan >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Nicholas Arnold-Medabalimi >>>>>> >>>>>> Ph.D. 
Candidate >>>>>> Computational Aeroscience Lab >>>>>> University of Michigan >>>>>> >>>>> >>>>> >>>>> -- >>>>> Nicholas Arnold-Medabalimi >>>>> >>>>> Ph.D. Candidate >>>>> Computational Aeroscience Lab >>>>> University of Michigan >>>>> >>>> >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Tue Jan 10 15:58:36 2023 From: mlohry at gmail.com (Mark Lohry) Date: Tue, 10 Jan 2023 16:58:36 -0500 Subject: [petsc-users] [EXTERNAL] GPU implementation of serial smoothers In-Reply-To: <93B671CD-8672-487E-B700-40AB9988E3B9@rpi.edu> References: <87eds2nuib.fsf@jedbrown.org> <42DC0F2B-577F-4D61-9F5E-E1E2E0E55E5B@petsc.dev> <87v8lemas8.fsf@jedbrown.org> <87mt6qma1x.fsf@jedbrown.org> <93B671CD-8672-487E-B700-40AB9988E3B9@rpi.edu> Message-ID: Thanks Stefano and Chonglin! DILU in openfoam is our block Jacobi ilu subdomain solvers > are you saying that -pc_type gang -mg_levels_pc_type -mg_levels_ksp_type richardson gives you something exactly equivalent to DILU? On Tue, Jan 10, 2023 at 4:04 PM Zhang, Chonglin wrote: > I am using the following in my Poisson solver running on GPU, which were > suggested by Barry and Mark (Dr. Mark Adams). > -ksp_type cg > -pc_type gamg > -mg_levels_ksp_type chebyshev > -mg_levels_pc_type jacobi > > > On Jan 10, 2023, at 3:31 PM, Mark Lohry wrote: > > So what are people using for GAMG configs on GPU? I was hoping petsc today > would be performance competitive with AMGx but it sounds like that's not > the case? > > On Tue, Jan 10, 2023 at 3:03 PM Jed Brown wrote: > >> Mark Lohry writes: >> >> > I definitely need multigrid. I was under the impression that GAMG was >> > relatively cuda-complete, is that not the case? What functionality works >> > fully on GPU and what doesn't, without any host transfers (aside from >> > what's needed for MPI)? >> > >> > If I use -ksp-pc_type gamg -mg_levels_pc_type pbjacobi >> -mg_levels_ksp_type >> > richardson is that fully on device, but -mg_levels_pc_type ilu or >> > -mg_levels_pc_type sor require transfers? >> >> You can do `-mg_levels_pc_type ilu`, but it'll be extremely slow (like >> 20x slower than an operator apply). One can use Krylov smoothers, though >> that's more synchronization. Automatic construction of operator-dependent >> multistage smoothers for linear multigrid (because Chebyshev only works for >> problems that have eigenvalues near the real axis) is something I've wanted >> to develop for at least a decade, but time is always short. I might put >> some effort into p-MG with such smoothers this year as we add DDES to our >> scale-resolving compressible solver. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Wed Jan 11 00:58:02 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Wed, 11 Jan 2023 01:58:02 -0500 Subject: [petsc-users] PetscSF Fortran interface In-Reply-To: References: Message-ID: Hi Junchao Apologies for not seeing that. Usually, the fortran90-specific functions have notes on the original C version, and I also can't see PetscSFCreateSectionSFF90 on the function list on the doc site. Thanks so much, and I saw your notes on the merge request. I don't suppose PetscSFReduceBegin and End are likewise hidden somewhere. 
I'm moving between distributions, and I can go forward with PetscSFBcastBegin, but I also need to go backward with Reduce. I feel like this is a one-to-one change from Bcast to Reduce, and I've added the relevant lines in src/vec/is/sf/interface/ftn-custom/zsf.c and src/vec/f90-mod/petscvec.h90 and it compiles fine, but I'm still getting a linking error for the Reduce routines. I need some input on what I'm missing here. I hope I didn't miss that this routine exists elsewhere. I've attached the two files, but it's not an ideal way to transmit changes. If I get some instructions on contributing, I can make a merge request for the changes if they are helpful. Thanks Nicholas On Tue, Jan 10, 2023 at 4:42 PM Junchao Zhang wrote: > Hi, Nicholas, > It seems we have implemented it, but with another name, > PetscSFCreateSectionSFF90, see > https://gitlab.com/petsc/petsc/-/merge_requests/5386 > Try it to see if it works! > > --Junchao Zhang > > > On Tue, Jan 10, 2023 at 11:45 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Er to be honest I still can't get my stub to compile properly, and I >> don't know how to go about making a merge request. But here is what I am >> attempting right now. Let me know how best to proceed >> >> >> Its not exactly clear to me how to setup up the remote offset properly. >> >> in src/vec/is/sf/interface/ftn-custom/zsf.c >> >> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection >> *rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >> { >> >> int * remoteOffsets; >> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) >> &remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; >> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, >> &remoteOffsets,*leafSection,*sectionSF);if (*ierr) return; >> >> } >> >> This is the sticking point. >> >> Sincerely >> Nicholas >> >> >> On Tue, Jan 10, 2023 at 12:38 PM Junchao Zhang >> wrote: >> >>> Hi, Nicholas, >>> Could you make a merge request to PETSc and then our Fortran experts >>> can comment on your MR? >>> Thanks. >>> >>> --Junchao Zhang >>> >>> >>> On Tue, Jan 10, 2023 at 11:10 AM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> Hi Junchao >>>> >>>> I think I'm almost there, but I could use some insight into how to use >>>> the PETSC_F90_2PTR_PROTO and F90Array1dAccess for the remoteOffset >>>> parameter input so if another function comes up, I can add it myself >>>> without wasting your time. >>>> I am very grateful for your help and time. >>>> >>>> Sincerely >>>> Nicholas >>>> >>>> On Tue, Jan 10, 2023 at 10:55 AM Junchao Zhang >>>> wrote: >>>> >>>>> Hi, Nicholas, >>>>> I am not a fortran guy, but I will try to add >>>>> petscsfcreatesectionsf. >>>>> >>>>> Thanks. >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Tue, Jan 10, 2023 at 12:50 AM Nicholas Arnold-Medabalimi < >>>>> narnoldm at umich.edu> wrote: >>>>> >>>>>> I think it should be something like this, but I'm not very fluent in >>>>>> Fortran C interop syntax. Any advice would be appreciated. 
Thanks >>>>>> >>>>>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection * >>>>>> rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>>>>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >>>>>> { >>>>>> >>>>>> int * remoteOffsets; >>>>>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) & >>>>>> remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return >>>>>> ; >>>>>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, &remoteOffsets,* >>>>>> leafSection,*sectionSF);if (*ierr) return; >>>>>> >>>>>> } >>>>>> >>>>>> On Mon, Jan 9, 2023 at 11:41 PM Nicholas Arnold-Medabalimi < >>>>>> narnoldm at umich.edu> wrote: >>>>>> >>>>>>> Hi Junchao >>>>>>> >>>>>>> Thanks again for your help in November. I've been using the your >>>>>>> merge request branch quite heavily. Would it be possible to add a >>>>>>> petscsfcreatesectionsf interface as well? >>>>>>> I'm trying to write it myself using your commits as a guide but I >>>>>>> have been struggling with handling the section parameter properly. >>>>>>> >>>>>>> Sincerely >>>>>>> Nicholas >>>>>>> >>>>>>> On Sat, Nov 19, 2022 at 9:44 PM Junchao Zhang < >>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Nov 19, 2022 at 8:05 PM Nicholas Arnold-Medabalimi < >>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>> >>>>>>>>> Hi >>>>>>>>> >>>>>>>>> Thanks, this is awesome. Thanks for the very prompt fix. Just one >>>>>>>>> question: will the array outputs on the fortran side copies (and need to be >>>>>>>>> deallocated) or direct access to the dmplex? >>>>>>>>> >>>>>>>> Direct access to internal data; no need to deallocate >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Sincerely >>>>>>>>> Nicholas >>>>>>>>> >>>>>>>>> On Sat, Nov 19, 2022 at 8:21 PM Junchao Zhang < >>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi, Nicholas, >>>>>>>>>> See this MR, >>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/5860 >>>>>>>>>> It is in testing, but you can try branch >>>>>>>>>> jczhang/add-petscsf-fortran to see if it works for you. >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> --Junchao Zhang >>>>>>>>>> >>>>>>>>>> On Sat, Nov 19, 2022 at 4:16 PM Nicholas Arnold-Medabalimi < >>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Junchao >>>>>>>>>>> >>>>>>>>>>> Thanks. I was wondering if there is any update on this. I may >>>>>>>>>>> write a small interface for those two routines myself in the interim but >>>>>>>>>>> I'd appreciate any insight you have. >>>>>>>>>>> >>>>>>>>>>> Sincerely >>>>>>>>>>> Nicholas >>>>>>>>>>> >>>>>>>>>>> On Wed, Nov 16, 2022 at 10:39 PM Junchao Zhang < >>>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>>> I will have a look and get back to you. >>>>>>>>>>>> Thanks. >>>>>>>>>>>> --Junchao Zhang >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Petsc Users >>>>>>>>>>>>> >>>>>>>>>>>>> I'm in the process of adding some Petsc for mesh management >>>>>>>>>>>>> into an existing Fortran Solver. It has been relatively straightforward so >>>>>>>>>>>>> far but I am running into an issue with using PetscSF routines. 
Some like >>>>>>>>>>>>> the PetscSFGetGraph work no problem but a few of my routines require the >>>>>>>>>>>>> use of PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to >>>>>>>>>>>>> be in the fortran interface and I just get a linking error. I also don't >>>>>>>>>>>>> seem to see a PetscSF file in the finclude. Any clarification or assistance >>>>>>>>>>>>> would be appreciated. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely >>>>>>>>>>>>> Nicholas >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>>> >>>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>>> University of Michigan >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>> >>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>> University of Michigan >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>> >>>>>>>>> Ph.D. Candidate >>>>>>>>> Computational Aeroscience Lab >>>>>>>>> University of Michigan >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nicholas Arnold-Medabalimi >>>>>>> >>>>>>> Ph.D. Candidate >>>>>>> Computational Aeroscience Lab >>>>>>> University of Michigan >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Nicholas Arnold-Medabalimi >>>>>> >>>>>> Ph.D. Candidate >>>>>> Computational Aeroscience Lab >>>>>> University of Michigan >>>>>> >>>>> >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... 
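The first attachment below is the zsf.c I modified. Besides the Bcast/Reduce additions it already carries wrappers for PetscSFGetRootRanks and PetscSFGetLeafRanks, which were the original sticking point further up this thread. For reference, a minimal sketch of the C-side pattern those wrappers sit on top of (the helper name and the print loop are illustrative only and are not taken from the attached file):

#include <petscsf.h>

/* Sketch: list the communication pattern an SF encodes. All output arrays are
   borrowed from the SF (matching the "direct access, no deallocation" note
   above) and must not be freed by the caller. */
static PetscErrorCode InspectSFRanks(PetscSF sf)
{
  PetscMPIInt        nranks, niranks;
  const PetscMPIInt *ranks, *iranks;
  const PetscInt    *roffset, *rmine, *rremote, *ioffset, *irootloc;

  PetscFunctionBeginUser;
  /* ranks[]: processes owning the roots my leaves reference; rmine[]/rremote[]
     give the local leaf and remote root indices, grouped by roffset[] */
  PetscCall(PetscSFGetRootRanks(sf, &nranks, &ranks, &roffset, &rmine, &rremote));
  /* iranks[]: processes whose leaves reference my roots; irootloc[] lists the
     local roots they touch, grouped by ioffset[] */
  PetscCall(PetscSFGetLeafRanks(sf, &niranks, &iranks, &ioffset, &irootloc));
  for (PetscMPIInt i = 0; i < nranks; i++) {
    PetscCall(PetscPrintf(PETSC_COMM_SELF, "%" PetscInt_FMT " leaves reference roots on rank %d\n", roffset[i + 1] - roffset[i], ranks[i]));
  }
  PetscFunctionReturn(0);
}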
URL: -------------- next part -------------- #include #include #if defined(PETSC_HAVE_FORTRAN_CAPS) #define petscsfview_ PETSCSFVIEW #define petscsfgetgraph_ PETSCSFGETGRAPH #define petscsfbcastbegin_ PETSCSFBCASTBEGIN #define petscsfbcastend_ PETSCSFBCASTEND #define petscsfreducebegin_ PETSCSFREDUCEBEGIN #define petscsfreduceend_ PETSCSFREDUCEEND #define f90arraysfnodecreate_ F90ARRAYSFNODECREATE #define petscsfviewfromoptions_ PETSCSFVIEWFROMOPTIONS #define petscsfdestroy_ PETSCSFDESTROY #define petscsfsetgraph_ PETSCSFSETGRAPH #define petscsfgetleafranks_ PETSCSFGETLEAFRANKS #define petscsfgetrootranks_ PETSCSFGETROOTRANKS #elif !defined(PETSC_HAVE_FORTRAN_UNDERSCORE) #define petscsfgetgraph_ petscsfgetgraph #define petscsfview_ petscsfview #define petscsfbcastbegin_ petscsfbcastbegin #define petscsfbcastend_ petscsfbcastend #define petscsfreducebegin_ petscsfreducebegin #define petscsfreduceend_ petscsfreduceend #define f90arraysfnodecreate_ f90arraysfnodecreate #define petscsfviewfromoptions_ petscsfviewfromoptions #define petscsfdestroy_ petscsfdestroy #define petscsfsetgraph_ petscsfsetgraph #define petscsfgetleafranks_ petscsfgetleafranks #define petscsfgetrootranks_ petscsfgetrootranks #endif PETSC_EXTERN void f90arraysfnodecreate_(const PetscInt *,PetscInt *,void * PETSC_F90_2PTR_PROTO_NOVAR); PETSC_EXTERN void petscsfsetgraph_(PetscSF *sf,PetscInt *nroots,PetscInt *nleaves, PetscInt *ilocal,PetscCopyMode *localmode, PetscSFNode *iremote,PetscCopyMode *remotemode, int *ierr) { if (ilocal == PETSC_NULL_INTEGER_Fortran) ilocal = NULL; *ierr = PetscSFSetGraph(*sf,*nroots,*nleaves,ilocal,*localmode,iremote,*remotemode); } PETSC_EXTERN void petscsfview_(PetscSF *sf, PetscViewer *vin, PetscErrorCode *ierr) { PetscViewer v; PetscPatchDefaultViewers_Fortran(vin, v); *ierr = PetscSFView(*sf, v); } PETSC_EXTERN void petscsfgetgraph_(PetscSF *sf,PetscInt *nroots,PetscInt *nleaves, F90Array1d *ailocal, F90Array1d *airemote, PetscErrorCode *ierr PETSC_F90_2PTR_PROTO(pilocal) PETSC_F90_2PTR_PROTO(piremote)) { const PetscInt *ilocal; const PetscSFNode *iremote; PetscInt nl; *ierr = PetscSFGetGraph(*sf,nroots,nleaves,&ilocal,&iremote);if (*ierr) return; nl = *nleaves; if (!ilocal) nl = 0; *ierr = F90Array1dCreate((void*)ilocal,MPIU_INT,1,nl, ailocal PETSC_F90_2PTR_PARAM(pilocal)); /* this creates a memory leak */ f90arraysfnodecreate_((PetscInt*)iremote,nleaves, airemote PETSC_F90_2PTR_PARAM(piremote)); } PETSC_EXTERN void petscsfgetleafranks_(PetscSF *sf, PetscInt *niranks, F90Array1d *airanks, F90Array1d *aioffset, F90Array1d *airootloc, PetscErrorCode *ierr PETSC_F90_2PTR_PROTO(piranks) PETSC_F90_2PTR_PROTO(pioffset) PETSC_F90_2PTR_PROTO(pirootloc)) { const PetscMPIInt *iranks; const PetscInt *ioffset; const PetscInt *irootloc; *ierr = PetscSFGetLeafRanks(*sf, niranks, &iranks, &ioffset, &irootloc);if (*ierr) return; *ierr = F90Array1dCreate((void *)iranks, MPI_INT, 1, *niranks, airanks PETSC_F90_2PTR_PARAM(piranks));if (*ierr) return; *ierr = F90Array1dCreate((void*)ioffset, MPIU_INT, 1, *niranks+1, aioffset PETSC_F90_2PTR_PARAM(pioffset));if (*ierr) return; *ierr = F90Array1dCreate((void *)irootloc, MPIU_INT, 1, ioffset[*niranks], airootloc PETSC_F90_2PTR_PARAM(pirootloc));if (*ierr) return; } PETSC_EXTERN void petscsfgetrootranks_(PetscSF *sf, PetscInt *nranks, F90Array1d *aranks, F90Array1d *aroffset, F90Array1d *armine, F90Array1d *arremote, PetscErrorCode *ierr PETSC_F90_2PTR_PROTO(pranks) PETSC_F90_2PTR_PROTO(proffset) PETSC_F90_2PTR_PROTO(prmine) PETSC_F90_2PTR_PROTO(prremote)) { 
const PetscMPIInt *ranks; const PetscInt *roffset; const PetscInt *rmine; const PetscInt *rremote; *ierr = PetscSFGetRootRanks(*sf, nranks, &ranks, &roffset, &rmine, &rremote);if (*ierr) return; *ierr = F90Array1dCreate((void*)ranks, MPI_INT, 1, *nranks, aranks PETSC_F90_2PTR_PARAM(pranks));if (*ierr) return; *ierr = F90Array1dCreate((void*)roffset, MPIU_INT, 1, *nranks+1, aroffset PETSC_F90_2PTR_PARAM(proffset));if (*ierr) return; *ierr = F90Array1dCreate((void*)rmine, MPIU_INT, 1, roffset[*nranks], armine PETSC_F90_2PTR_PARAM(prmine));if (*ierr) return; *ierr = F90Array1dCreate((void*)rremote, MPIU_INT, 1, roffset[*nranks], arremote PETSC_F90_2PTR_PARAM(prremote));if (*ierr) return; } #if defined(PETSC_HAVE_F90_ASSUMED_TYPE_NOT_PTR) PETSC_EXTERN void petscsfbcastbegin_(PetscSF *sf, MPI_Fint *unit, const void *rptr, void *lptr, MPI_Fint *op, PetscErrorCode *ierr) { MPI_Datatype dtype; MPI_Op cop = MPI_Op_f2c(*op); *ierr = PetscMPIFortranDatatypeToC(*unit,&dtype);if (*ierr) return; *ierr = PetscSFBcastBegin(*sf, dtype, rptr, lptr, cop); } PETSC_EXTERN void petscsfbcastend_(PetscSF *sf, MPI_Fint *unit, const void *rptr, void *lptr, MPI_Fint *op, PetscErrorCode *ierr) { MPI_Datatype dtype; MPI_Op cop = MPI_Op_f2c(*op); *ierr = PetscMPIFortranDatatypeToC(*unit,&dtype);if (*ierr) return; *ierr = PetscSFBcastEnd(*sf, dtype, rptr, lptr, cop); } PETSC_EXTERN void petscsfreducebegin_(PetscSF *sf, MPI_Fint *unit, const void *lptr, void *rptr, MPI_Fint *op, PetscErrorCode *ierr) { MPI_Datatype dtype; MPI_Op cop = MPI_Op_f2c(*op); *ierr = PetscMPIFortranDatatypeToC(*unit,&dtype);if (*ierr) return; *ierr = PetscSFReduceBegin(*sf, dtype, lptr, rptr, cop); } PETSC_EXTERN void petscsfreduceend_(PetscSF *sf, MPI_Fint *unit, const void *lptr, void *rptr, MPI_Fint *op, PetscErrorCode *ierr) { MPI_Datatype dtype; MPI_Op cop = MPI_Op_f2c(*op); *ierr = PetscMPIFortranDatatypeToC(*unit,&dtype);if (*ierr) return; *ierr = PetscSFReduceEnd(*sf, dtype, lptr, rptr, cop); } #else PETSC_EXTERN void petscsfbcastbegin_(PetscSF *sf, MPI_Fint *unit,F90Array1d *rptr, F90Array1d *lptr, MPI_Fint *op, PetscErrorCode *ierr PETSC_F90_2PTR_PROTO(rptrd) PETSC_F90_2PTR_PROTO(lptrd)) { MPI_Datatype dtype; const void *rootdata; void *leafdata; MPI_Op cop = MPI_Op_f2c(*op); *ierr = PetscMPIFortranDatatypeToC(*unit,&dtype);if (*ierr) return; *ierr = F90Array1dAccess(rptr, dtype, (void**) &rootdata PETSC_F90_2PTR_PARAM(rptrd));if (*ierr) return; *ierr = F90Array1dAccess(lptr, dtype, (void**) &leafdata PETSC_F90_2PTR_PARAM(lptrd));if (*ierr) return; *ierr = PetscSFBcastBegin(*sf, dtype, rootdata, leafdata, cop); } PETSC_EXTERN void petscsfbcastend_(PetscSF *sf, MPI_Fint *unit,F90Array1d *rptr, F90Array1d *lptr, MPI_Fint *op, PetscErrorCode *ierr PETSC_F90_2PTR_PROTO(rptrd) PETSC_F90_2PTR_PROTO(lptrd)) { MPI_Datatype dtype; const void *rootdata; void *leafdata; MPI_Op cop = MPI_Op_f2c(*op); *ierr = PetscMPIFortranDatatypeToC(*unit,&dtype);if (*ierr) return; *ierr = F90Array1dAccess(rptr, dtype, (void**) &rootdata PETSC_F90_2PTR_PARAM(rptrd));if (*ierr) return; *ierr = F90Array1dAccess(lptr, dtype, (void**) &leafdata PETSC_F90_2PTR_PARAM(lptrd));if (*ierr) return; *ierr = PetscSFBcastEnd(*sf, dtype, rootdata, leafdata, cop); } PETSC_EXTERN void petscsfreducebegin_(PetscSF *sf, MPI_Fint *unit,F90Array1d *lptr, F90Array1d *rptr, MPI_Fint *op, PetscErrorCode *ierr PETSC_F90_2PTR_PROTO(lptrd) PETSC_F90_2PTR_PROTO(rptrd)) { MPI_Datatype dtype; const void *rootdata; void *leafdata; MPI_Op cop = MPI_Op_f2c(*op); *ierr = 
PetscMPIFortranDatatypeToC(*unit,&dtype);if (*ierr) return; *ierr = F90Array1dAccess(rptr, dtype, (void**) &rootdata PETSC_F90_2PTR_PARAM(rptrd));if (*ierr) return; *ierr = F90Array1dAccess(lptr, dtype, (void**) &leafdata PETSC_F90_2PTR_PARAM(lptrd));if (*ierr) return; *ierr = PetscSFReduceBegin(*sf, dtype, rootdata, leafdata, cop); } PETSC_EXTERN void petscsfreduceend_(PetscSF *sf, MPI_Fint *unit,F90Array1d *lptr, F90Array1d *rptr, MPI_Fint *op, PetscErrorCode *ierr PETSC_F90_2PTR_PROTO(lptrd) PETSC_F90_2PTR_PROTO(rptrd)) { MPI_Datatype dtype; const void *rootdata; void *leafdata; MPI_Op cop = MPI_Op_f2c(*op); *ierr = PetscMPIFortranDatatypeToC(*unit,&dtype);if (*ierr) return; *ierr = F90Array1dAccess(rptr, dtype, (void**) &rootdata PETSC_F90_2PTR_PARAM(rptrd));if (*ierr) return; *ierr = F90Array1dAccess(lptr, dtype, (void**) &leafdata PETSC_F90_2PTR_PARAM(lptrd));if (*ierr) return; *ierr = PetscSFReduceEnd(*sf, dtype, rootdata, leafdata, cop); } PETSC_EXTERN void petscsfviewfromoptions_(PetscSF *ao,PetscObject obj,char* type,PetscErrorCode *ierr,PETSC_FORTRAN_CHARLEN_T len) { char *t; FIXCHAR(type,len,t); CHKFORTRANNULLOBJECT(obj); *ierr = PetscSFViewFromOptions(*ao,obj,t);if (*ierr) return; FREECHAR(type,t); } PETSC_EXTERN void petscsfdestroy_(PetscSF *x,int *ierr) { PETSC_FORTRAN_OBJECT_F_DESTROYED_TO_C_NULL(x); *ierr = PetscSFDestroy(x); if (*ierr) return; PETSC_FORTRAN_OBJECT_C_NULL_TO_F_DESTROYED(x); } #endif -------------- next part -------------- A non-text attachment was scrubbed... Name: petscvec.h90 Type: application/octet-stream Size: 11554 bytes Desc: not available URL: From zhan2355 at purdue.edu Wed Jan 11 04:03:12 2023 From: zhan2355 at purdue.edu (Sijie Zhang) Date: Wed, 11 Jan 2023 10:03:12 +0000 Subject: [petsc-users] PETSC install Message-ID: Hi, When I try to install petsc on my workstation, I got the following error. Can you help me with that? Thank you and best regards. Sijie ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Running check examples to verify correct installation Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and PETSC_ARCH=arch-linux-c-debug Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). lid velocity = 0.0016, prandtl # = 1., grashof # = 1. Number of SNES iterations = 2 Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). lid velocity = 0.0016, prandtl # = 1., grashof # = 1. 
Number of SNES iterations = 2 *******************Error detected during compile or link!******************* See https://petsc.org/release/faq/ /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f ********************************************************* /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include -I/opt/intel/oneapi/mkl/2023.0.0/include -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). Number of SNES iterations = 3 Completed test examples Error while running make check gmake[1]: *** [makefile:149: check] Error 1 make: *** [GNUmakefile:17: check] Error 2 -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 1062712 bytes Desc: configure.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: text/x-log Size: 118435 bytes Desc: make.log URL: From bsmith at petsc.dev Wed Jan 11 07:22:09 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 11 Jan 2023 08:22:09 -0500 Subject: [petsc-users] PETSC install In-Reply-To: References: Message-ID: https://petsc.org/release/faq/#what-does-the-message-hwloc-linux-ignoring-pci-device-with-non-16bit-domain-mean > On Jan 11, 2023, at 5:03 AM, Sijie Zhang wrote: > > Hi, > > When I try to install petsc on my workstation, I got the following error. Can you help me with that? > > Thank you and best regards. > > Sijie > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Running check examples to verify correct installation > Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and PETSC_ARCH=arch-linux-c-debug > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. 
> Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > *******************Error detected during compile or link!******************* > See https://petsc.org/release/faq/ > /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f > ********************************************************* > /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include -I/opt/intel/oneapi/mkl/2023.0.0/include -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > Number of SNES iterations = 3 > Completed test examples > Error while running make check > gmake[1]: *** [makefile:149: check] Error 1 > make: *** [GNUmakefile:17: check] Error 2 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Wed Jan 11 08:16:01 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 11 Jan 2023 08:16:01 -0600 Subject: [petsc-users] PetscSF Fortran interface In-Reply-To: References: Message-ID: Hi, Nicholas, See https://petsc.org/release/developers/contributing/#starting-a-new-feature-branch on how to contribute. You will need to create your own fork of petsc, then create a feature branch and your code, and then ask for a merge request to merge your branch to the petsc repo. Once you have the MR, we can figure out why it could not compile. Thanks for the contribution! --Junchao Zhang On Wed, Jan 11, 2023 at 12:58 AM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Junchao > > Apologies for not seeing that. Usually, the fortran90-specific functions > have notes on the original C version, and I also can't > see PetscSFCreateSectionSFF90 on the function list on the doc site. Thanks > so much, and I saw your notes on the merge request. > > I don't suppose PetscSFReduceBegin and End are likewise hidden somewhere. > I'm moving between distributions, and I can go forward with > PetscSFBcastBegin, but I also need to go backward with Reduce. > > I feel like this is a one-to-one change from Bcast to Reduce, and I've > added the relevant lines in src/vec/is/sf/interface/ftn-custom/zsf.c and > src/vec/f90-mod/petscvec.h90 and it compiles fine, but I'm still getting a > linking error for the Reduce routines. > > I need some input on what I'm missing here. I hope I didn't miss that this > routine exists elsewhere. > > I've attached the two files, but it's not an ideal way to transmit > changes. > > If I get some instructions on contributing, I can make a merge request for > the changes if they are helpful. > > > Thanks > > Nicholas > > On Tue, Jan 10, 2023 at 4:42 PM Junchao Zhang > wrote: > >> Hi, Nicholas, >> It seems we have implemented it, but with another name, >> PetscSFCreateSectionSFF90, see >> https://gitlab.com/petsc/petsc/-/merge_requests/5386 >> Try it to see if it works! >> >> --Junchao Zhang >> >> >> On Tue, Jan 10, 2023 at 11:45 AM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Er to be honest I still can't get my stub to compile properly, and I >>> don't know how to go about making a merge request. But here is what I am >>> attempting right now. Let me know how best to proceed >>> >>> >>> Its not exactly clear to me how to setup up the remote offset properly. >>> >>> in src/vec/is/sf/interface/ftn-custom/zsf.c >>> >>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection >>> *rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >>> { >>> >>> int * remoteOffsets; >>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) >>> &remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; >>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, >>> &remoteOffsets,*leafSection,*sectionSF);if (*ierr) return; >>> >>> } >>> >>> This is the sticking point. >>> >>> Sincerely >>> Nicholas >>> >>> >>> On Tue, Jan 10, 2023 at 12:38 PM Junchao Zhang >>> wrote: >>> >>>> Hi, Nicholas, >>>> Could you make a merge request to PETSc and then our Fortran experts >>>> can comment on your MR? >>>> Thanks. 
>>>> >>>> --Junchao Zhang >>>> >>>> >>>> On Tue, Jan 10, 2023 at 11:10 AM Nicholas Arnold-Medabalimi < >>>> narnoldm at umich.edu> wrote: >>>> >>>>> Hi Junchao >>>>> >>>>> I think I'm almost there, but I could use some insight into how to use >>>>> the PETSC_F90_2PTR_PROTO and F90Array1dAccess for the remoteOffset >>>>> parameter input so if another function comes up, I can add it myself >>>>> without wasting your time. >>>>> I am very grateful for your help and time. >>>>> >>>>> Sincerely >>>>> Nicholas >>>>> >>>>> On Tue, Jan 10, 2023 at 10:55 AM Junchao Zhang < >>>>> junchao.zhang at gmail.com> wrote: >>>>> >>>>>> Hi, Nicholas, >>>>>> I am not a fortran guy, but I will try to add >>>>>> petscsfcreatesectionsf. >>>>>> >>>>>> Thanks. >>>>>> --Junchao Zhang >>>>>> >>>>>> >>>>>> On Tue, Jan 10, 2023 at 12:50 AM Nicholas Arnold-Medabalimi < >>>>>> narnoldm at umich.edu> wrote: >>>>>> >>>>>>> I think it should be something like this, but I'm not very fluent in >>>>>>> Fortran C interop syntax. Any advice would be appreciated. Thanks >>>>>>> >>>>>>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection * >>>>>>> rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>>>>>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >>>>>>> { >>>>>>> >>>>>>> int * remoteOffsets; >>>>>>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) & >>>>>>> remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) >>>>>>> return; >>>>>>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, &remoteOffsets,* >>>>>>> leafSection,*sectionSF);if (*ierr) return; >>>>>>> >>>>>>> } >>>>>>> >>>>>>> On Mon, Jan 9, 2023 at 11:41 PM Nicholas Arnold-Medabalimi < >>>>>>> narnoldm at umich.edu> wrote: >>>>>>> >>>>>>>> Hi Junchao >>>>>>>> >>>>>>>> Thanks again for your help in November. I've been using the your >>>>>>>> merge request branch quite heavily. Would it be possible to add a >>>>>>>> petscsfcreatesectionsf interface as well? >>>>>>>> I'm trying to write it myself using your commits as a guide but I >>>>>>>> have been struggling with handling the section parameter properly. >>>>>>>> >>>>>>>> Sincerely >>>>>>>> Nicholas >>>>>>>> >>>>>>>> On Sat, Nov 19, 2022 at 9:44 PM Junchao Zhang < >>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sat, Nov 19, 2022 at 8:05 PM Nicholas Arnold-Medabalimi < >>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>> >>>>>>>>>> Hi >>>>>>>>>> >>>>>>>>>> Thanks, this is awesome. Thanks for the very prompt fix. Just one >>>>>>>>>> question: will the array outputs on the fortran side copies (and need to be >>>>>>>>>> deallocated) or direct access to the dmplex? >>>>>>>>>> >>>>>>>>> Direct access to internal data; no need to deallocate >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Sincerely >>>>>>>>>> Nicholas >>>>>>>>>> >>>>>>>>>> On Sat, Nov 19, 2022 at 8:21 PM Junchao Zhang < >>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>> See this MR, >>>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/5860 >>>>>>>>>>> It is in testing, but you can try branch >>>>>>>>>>> jczhang/add-petscsf-fortran to see if it works for you. >>>>>>>>>>> >>>>>>>>>>> Thanks. >>>>>>>>>>> --Junchao Zhang >>>>>>>>>>> >>>>>>>>>>> On Sat, Nov 19, 2022 at 4:16 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Junchao >>>>>>>>>>>> >>>>>>>>>>>> Thanks. I was wondering if there is any update on this. 
I may >>>>>>>>>>>> write a small interface for those two routines myself in the interim but >>>>>>>>>>>> I'd appreciate any insight you have. >>>>>>>>>>>> >>>>>>>>>>>> Sincerely >>>>>>>>>>>> Nicholas >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Nov 16, 2022 at 10:39 PM Junchao Zhang < >>>>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>>>> I will have a look and get back to you. >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> --Junchao Zhang >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Petsc Users >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm in the process of adding some Petsc for mesh management >>>>>>>>>>>>>> into an existing Fortran Solver. It has been relatively straightforward so >>>>>>>>>>>>>> far but I am running into an issue with using PetscSF routines. Some like >>>>>>>>>>>>>> the PetscSFGetGraph work no problem but a few of my routines require the >>>>>>>>>>>>>> use of PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to >>>>>>>>>>>>>> be in the fortran interface and I just get a linking error. I also don't >>>>>>>>>>>>>> seem to see a PetscSF file in the finclude. Any clarification or assistance >>>>>>>>>>>>>> would be appreciated. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely >>>>>>>>>>>>>> Nicholas >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>>>> University of Michigan >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>> >>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>> University of Michigan >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>> >>>>>>>>>> Ph.D. Candidate >>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>> University of Michigan >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>> >>>>>>>> Ph.D. Candidate >>>>>>>> Computational Aeroscience Lab >>>>>>>> University of Michigan >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nicholas Arnold-Medabalimi >>>>>>> >>>>>>> Ph.D. Candidate >>>>>>> Computational Aeroscience Lab >>>>>>> University of Michigan >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Nicholas Arnold-Medabalimi >>>>> >>>>> Ph.D. Candidate >>>>> Computational Aeroscience Lab >>>>> University of Michigan >>>>> >>>> >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Wed Jan 11 10:29:58 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Wed, 11 Jan 2023 11:29:58 -0500 Subject: [petsc-users] PetscSF Fortran interface In-Reply-To: References: Message-ID: Hi Junchao I hope I didn't make making any errors; I created the merge request following those instructions and commented @ you. https://gitlab.com/petsc/petsc/-/merge_requests/5969 Sincerely Nicholas On Wed, Jan 11, 2023 at 9:16 AM Junchao Zhang wrote: > Hi, Nicholas, > See > https://petsc.org/release/developers/contributing/#starting-a-new-feature-branch > on how to contribute. 
> > You will need to create your own fork of petsc, then create a feature > branch and your code, and then ask for a merge request to merge your branch > to the petsc repo. > > Once you have the MR, we can figure out why it could not compile. > > Thanks for the contribution! > > --Junchao Zhang > > > On Wed, Jan 11, 2023 at 12:58 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Junchao >> >> Apologies for not seeing that. Usually, the fortran90-specific functions >> have notes on the original C version, and I also can't >> see PetscSFCreateSectionSFF90 on the function list on the doc site. Thanks >> so much, and I saw your notes on the merge request. >> >> I don't suppose PetscSFReduceBegin and End are likewise hidden somewhere. >> I'm moving between distributions, and I can go forward with >> PetscSFBcastBegin, but I also need to go backward with Reduce. >> >> I feel like this is a one-to-one change from Bcast to Reduce, and I've >> added the relevant lines in src/vec/is/sf/interface/ftn-custom/zsf.c and >> src/vec/f90-mod/petscvec.h90 and it compiles fine, but I'm still getting a >> linking error for the Reduce routines. >> >> I need some input on what I'm missing here. I hope I didn't miss that >> this routine exists elsewhere. >> >> I've attached the two files, but it's not an ideal way to transmit >> changes. >> >> If I get some instructions on contributing, I can make a merge request >> for the changes if they are helpful. >> >> >> Thanks >> >> Nicholas >> >> On Tue, Jan 10, 2023 at 4:42 PM Junchao Zhang >> wrote: >> >>> Hi, Nicholas, >>> It seems we have implemented it, but with another name, >>> PetscSFCreateSectionSFF90, see >>> https://gitlab.com/petsc/petsc/-/merge_requests/5386 >>> Try it to see if it works! >>> >>> --Junchao Zhang >>> >>> >>> On Tue, Jan 10, 2023 at 11:45 AM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> Er to be honest I still can't get my stub to compile properly, and I >>>> don't know how to go about making a merge request. But here is what I am >>>> attempting right now. Let me know how best to proceed >>>> >>>> >>>> Its not exactly clear to me how to setup up the remote offset properly. >>>> >>>> in src/vec/is/sf/interface/ftn-custom/zsf.c >>>> >>>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection >>>> *rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >>>> { >>>> >>>> int * remoteOffsets; >>>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) >>>> &remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; >>>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, >>>> &remoteOffsets,*leafSection,*sectionSF);if (*ierr) return; >>>> >>>> } >>>> >>>> This is the sticking point. >>>> >>>> Sincerely >>>> Nicholas >>>> >>>> >>>> On Tue, Jan 10, 2023 at 12:38 PM Junchao Zhang >>>> wrote: >>>> >>>>> Hi, Nicholas, >>>>> Could you make a merge request to PETSc and then our Fortran experts >>>>> can comment on your MR? >>>>> Thanks. >>>>> >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Tue, Jan 10, 2023 at 11:10 AM Nicholas Arnold-Medabalimi < >>>>> narnoldm at umich.edu> wrote: >>>>> >>>>>> Hi Junchao >>>>>> >>>>>> I think I'm almost there, but I could use some insight into how to >>>>>> use the PETSC_F90_2PTR_PROTO and F90Array1dAccess for the remoteOffset >>>>>> parameter input so if another function comes up, I can add it myself >>>>>> without wasting your time. 
>>>>>> I am very grateful for your help and time. >>>>>> >>>>>> Sincerely >>>>>> Nicholas >>>>>> >>>>>> On Tue, Jan 10, 2023 at 10:55 AM Junchao Zhang < >>>>>> junchao.zhang at gmail.com> wrote: >>>>>> >>>>>>> Hi, Nicholas, >>>>>>> I am not a fortran guy, but I will try to add >>>>>>> petscsfcreatesectionsf. >>>>>>> >>>>>>> Thanks. >>>>>>> --Junchao Zhang >>>>>>> >>>>>>> >>>>>>> On Tue, Jan 10, 2023 at 12:50 AM Nicholas Arnold-Medabalimi < >>>>>>> narnoldm at umich.edu> wrote: >>>>>>> >>>>>>>> I think it should be something like this, but I'm not very fluent >>>>>>>> in Fortran C interop syntax. Any advice would be appreciated. Thanks >>>>>>>> >>>>>>>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection >>>>>>>> *rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>>>>>>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO >>>>>>>> (remoteoffsetsd)) >>>>>>>> { >>>>>>>> >>>>>>>> int * remoteOffsets; >>>>>>>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) & >>>>>>>> remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) >>>>>>>> return; >>>>>>>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, &remoteOffsets,* >>>>>>>> leafSection,*sectionSF);if (*ierr) return; >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> On Mon, Jan 9, 2023 at 11:41 PM Nicholas Arnold-Medabalimi < >>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>> >>>>>>>>> Hi Junchao >>>>>>>>> >>>>>>>>> Thanks again for your help in November. I've been using the your >>>>>>>>> merge request branch quite heavily. Would it be possible to add a >>>>>>>>> petscsfcreatesectionsf interface as well? >>>>>>>>> I'm trying to write it myself using your commits as a guide but I >>>>>>>>> have been struggling with handling the section parameter properly. >>>>>>>>> >>>>>>>>> Sincerely >>>>>>>>> Nicholas >>>>>>>>> >>>>>>>>> On Sat, Nov 19, 2022 at 9:44 PM Junchao Zhang < >>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sat, Nov 19, 2022 at 8:05 PM Nicholas Arnold-Medabalimi < >>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>> >>>>>>>>>>> Hi >>>>>>>>>>> >>>>>>>>>>> Thanks, this is awesome. Thanks for the very prompt fix. Just >>>>>>>>>>> one question: will the array outputs on the fortran side copies (and need >>>>>>>>>>> to be deallocated) or direct access to the dmplex? >>>>>>>>>>> >>>>>>>>>> Direct access to internal data; no need to deallocate >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Sincerely >>>>>>>>>>> Nicholas >>>>>>>>>>> >>>>>>>>>>> On Sat, Nov 19, 2022 at 8:21 PM Junchao Zhang < >>>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>>> See this MR, >>>>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/5860 >>>>>>>>>>>> It is in testing, but you can try branch >>>>>>>>>>>> jczhang/add-petscsf-fortran to see if it works for you. >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> --Junchao Zhang >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Nov 19, 2022 at 4:16 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Junchao >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. I was wondering if there is any update on this. I may >>>>>>>>>>>>> write a small interface for those two routines myself in the interim but >>>>>>>>>>>>> I'd appreciate any insight you have. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely >>>>>>>>>>>>> Nicholas >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Nov 16, 2022 at 10:39 PM Junchao Zhang < >>>>>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>>>>> I will have a look and get back to you. >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> --Junchao Zhang >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Petsc Users >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm in the process of adding some Petsc for mesh management >>>>>>>>>>>>>>> into an existing Fortran Solver. It has been relatively straightforward so >>>>>>>>>>>>>>> far but I am running into an issue with using PetscSF routines. Some like >>>>>>>>>>>>>>> the PetscSFGetGraph work no problem but a few of my routines require the >>>>>>>>>>>>>>> use of PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to >>>>>>>>>>>>>>> be in the fortran interface and I just get a linking error. I also don't >>>>>>>>>>>>>>> seem to see a PetscSF file in the finclude. Any clarification or assistance >>>>>>>>>>>>>>> would be appreciated. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sincerely >>>>>>>>>>>>>>> Nicholas >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>>>>> University of Michigan >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>>> >>>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>>> University of Michigan >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>> >>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>> University of Michigan >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>> >>>>>>>>> Ph.D. Candidate >>>>>>>>> Computational Aeroscience Lab >>>>>>>>> University of Michigan >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>> >>>>>>>> Ph.D. Candidate >>>>>>>> Computational Aeroscience Lab >>>>>>>> University of Michigan >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Nicholas Arnold-Medabalimi >>>>>> >>>>>> Ph.D. Candidate >>>>>> Computational Aeroscience Lab >>>>>> University of Michigan >>>>>> >>>>> >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhan2355 at purdue.edu Wed Jan 11 17:51:29 2023 From: zhan2355 at purdue.edu (Sijie Zhang) Date: Wed, 11 Jan 2023 23:51:29 +0000 Subject: [petsc-users] PETSC install In-Reply-To: References: Message-ID: Hi, I tried that but it's showing the same error. Can you help me to take a look at that? Thanks. 
Sijie +++++++++++++++++++++++++++++++++++++++++++++++++++++ Running check examples to verify correct installation Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and PETSC_ARCH=arch-linux-c-debug Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). lid velocity = 0.0016, prandtl # = 1., grashof # = 1. Number of SNES iterations = 2 Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). lid velocity = 0.0016, prandtl # = 1., grashof # = 1. Number of SNES iterations = 2 *******************Error detected during compile or link!******************* See https://petsc.org/release/faq/ /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f ********************************************************* /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include -I/opt/intel/oneapi/mkl/2023.0.0/include -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). 
Number of SNES iterations = 3 Completed test examples Error while running make check gmake[1]: *** [makefile:149: check] Error 1 make: *** [GNUmakefile:17: check] Error 2 ________________________________________ From: Barry Smith Sent: Wednesday, January 11, 2023 8:22 AM To: Sijie Zhang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PETSC install ---- External Email: Use caution with attachments, links, or sharing data ---- https://petsc.org/release/faq/#what-does-the-message-hwloc-linux-ignoring-pci-device-with-non-16bit-domain-mean On Jan 11, 2023, at 5:03 AM, Sijie Zhang wrote: Hi, When I try to install petsc on my workstation, I got the following error. Can you help me with that? Thank you and best regards. Sijie ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Running check examples to verify correct installation Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and PETSC_ARCH=arch-linux-c-debug Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). lid velocity = 0.0016, prandtl # = 1., grashof # = 1. Number of SNES iterations = 2 Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). lid velocity = 0.0016, prandtl # = 1., grashof # = 1. 
Number of SNES iterations = 2 *******************Error detected during compile or link!******************* See https://petsc.org/release/faq/ /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f ********************************************************* /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include -I/opt/intel/oneapi/mkl/2023.0.0/include -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). Number of SNES iterations = 3 Completed test examples Error while running make check gmake[1]: *** [makefile:149: check] Error 1 make: *** [GNUmakefile:17: check] Error 2 -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 1062771 bytes Desc: configure.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: text/x-log Size: 118445 bytes Desc: make.log URL: From bsmith at petsc.dev Wed Jan 11 18:33:26 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 11 Jan 2023 19:33:26 -0500 Subject: [petsc-users] PETSC install In-Reply-To: References: Message-ID: <639C683F-C7F0-4A95-B0C0-91AEA3158DB1@petsc.dev> Did you do exactly: export HWLOC_HIDE_ERRORS=2 make check ? > On Jan 11, 2023, at 6:51 PM, Sijie Zhang wrote: > > Hi, > > I tried that but it's showing the same error. Can you help me to take a look at that? > > Thanks. > > Sijie > > +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Running check examples to verify correct installation > Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and PETSC_ARCH=arch-linux-c-debug > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. 
> Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > *******************Error detected during compile or link!******************* > See https://petsc.org/release/faq/ > /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f > ********************************************************* > /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include -I/opt/intel/oneapi/mkl/2023.0.0/include -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). 
> Number of SNES iterations = 3 > Completed test examples > Error while running make check > gmake[1]: *** [makefile:149: check] Error 1 > make: *** [GNUmakefile:17: check] Error 2 > > ________________________________________ > From: Barry Smith > Sent: Wednesday, January 11, 2023 8:22 AM > To: Sijie Zhang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] PETSC install > > ---- External Email: Use caution with attachments, links, or sharing data ---- > > > https://petsc.org/release/faq/#what-does-the-message-hwloc-linux-ignoring-pci-device-with-non-16bit-domain-mean > > On Jan 11, 2023, at 5:03 AM, Sijie Zhang wrote: > > Hi, > > When I try to install petsc on my workstation, I got the following error. Can you help me with that? > > Thank you and best regards. > > Sijie > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Running check examples to verify correct installation > Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and PETSC_ARCH=arch-linux-c-debug > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. 
> Number of SNES iterations = 2 > *******************Error detected during compile or link!******************* > See https://petsc.org/release/faq/ > /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f > ********************************************************* > /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include -I/opt/intel/oneapi/mkl/2023.0.0/include -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > Number of SNES iterations = 3 > Completed test examples > Error while running make check > gmake[1]: *** [makefile:149: check] Error 1 > make: *** [GNUmakefile:17: check] Error 2 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jan 11 18:40:32 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Jan 2023 14:40:32 -1000 Subject: [petsc-users] PETSC install In-Reply-To: <639C683F-C7F0-4A95-B0C0-91AEA3158DB1@petsc.dev> References: <639C683F-C7F0-4A95-B0C0-91AEA3158DB1@petsc.dev> Message-ID: On Wed, Jan 11, 2023 at 2:33 PM Barry Smith wrote: > > Did you do exactly: > > export HWLOC_HIDE_ERRORS=2 > > make check > > Also, what shell are you using? The command above is for bash, but if you use csh it is different. Thanks, Matt > ? > > > > On Jan 11, 2023, at 6:51 PM, Sijie Zhang wrote: > > Hi, > > I tried that but it's showing the same error. Can you help me to take a > look at that? > > Thanks. > > Sijie > > +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Running check examples to verify correct installation > Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and > PETSC_ARCH=arch-linux-c-debug > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. 
> Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really > needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really > needed). > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really > needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > *******************Error detected during compile or > link!******************* > See https://petsc.org/release/faq/ > /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f > ********************************************************* > /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall > -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch > -Wno-unused-dummy-argument -g -O0 > -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include > -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include > -I/opt/intel/oneapi/mkl/2023.0.0/include > -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 > -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib > -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib > -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 > -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 > -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release > -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release > -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib > -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 > -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core > -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread > -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f > f951: Warning: Nonexistent include directory > ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? > [-Wmissing-include-dirs] > f951: Warning: Nonexistent include directory > ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI > process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really > needed). 
> Number of SNES iterations = 3 > Completed test examples > Error while running make check > gmake[1]: *** [makefile:149: check] Error 1 > make: *** [GNUmakefile:17: check] Error 2 > > ________________________________________ > From: Barry Smith > Sent: Wednesday, January 11, 2023 8:22 AM > To: Sijie Zhang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] PETSC install > > ---- External Email: Use caution with attachments, links, or sharing data > ---- > > > > https://petsc.org/release/faq/#what-does-the-message-hwloc-linux-ignoring-pci-device-with-non-16bit-domain-mean > > On Jan 11, 2023, at 5:03 AM, Sijie Zhang wrote: > > Hi, > > When I try to install petsc on my workstation, I got the following error. > Can you help me with that? > > Thank you and best regards. > > Sijie > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Running check examples to verify correct installation > Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and > PETSC_ARCH=arch-linux-c-debug > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really > needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really > needed). > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really > needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. 
> Number of SNES iterations = 2 > *******************Error detected during compile or > link!******************* > See https://petsc.org/release/faq/ > /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f > ********************************************************* > /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall > -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch > -Wno-unused-dummy-argument -g -O0 > -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include > -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include > -I/opt/intel/oneapi/mkl/2023.0.0/include > -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 > -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib > -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib > -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 > -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 > -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release > -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release > -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib > -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 > -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core > -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread > -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f > f951: Warning: Nonexistent include directory > ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? > [-Wmissing-include-dirs] > f951: Warning: Nonexistent include directory > ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI > process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really > needed). > Number of SNES iterations = 3 > Completed test examples > Error while running make check > gmake[1]: *** [makefile:149: check] Error 1 > make: *** [GNUmakefile:17: check] Error 2 > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shapero at uw.edu Wed Jan 11 18:42:54 2023 From: shapero at uw.edu (Daniel R. Shapero) Date: Wed, 11 Jan 2023 16:42:54 -0800 Subject: [petsc-users] coordinate degrees of freedom for 2nd-order gmsh mesh Message-ID: Hi all -- I'm trying to read in 2nd-order / piecewise quadratic meshes that are generated by gmsh and I don't understand how the coordinates are stored in the plex. I've been discussing this with Matt Knepley here as it pertains to Firedrake but I think this is more an issue at the PETSc level. This code uses gmsh to generate a 2nd-order mesh of the unit disk, read it into a DMPlex, print out the number of cells in each depth stratum, and finally print a view of the coordinate DM's section. The resulting mesh has 64 triangles, 104 edges, and 41 vertices. 
For 2nd-order meshes, I'd expected there to be 2 degrees of freedom at each node and 2 at each edge. The output is: ``` Depth strata: [(64, 105), (105, 209), (0, 64)] PetscSection Object: 1 MPI process type not yet set 1 fields field 0 with 2 components Process 0: ( 0) dim 12 offset 0 ( 1) dim 12 offset 12 ( 2) dim 12 offset 24 ... ( 62) dim 12 offset 744 ( 63) dim 12 offset 756 ( 64) dim 0 offset 768 ( 65) dim 0 offset 768 ... ( 207) dim 0 offset 768 ( 208) dim 0 offset 768 PetscSectionSym Object: 1 MPI process type: label Label 'depth' Symmetry for stratum value 0 (0 dofs per point): no symmetries Symmetry for stratum value 1 (0 dofs per point): no symmetries Symmetry for stratum value 2 (12 dofs per point): Orientation range: [-3, 3) Symmetry for stratum value -1 (0 dofs per point): no symmetries ``` The output suggests that there are 12 degrees of freedom in each triangle. That would mean the coordinate field is discontinuous across cell boundaries. Can someone explain what's going on? I tried reading the .msh file but it's totally inscrutable to me. I'm happy to RTFSC if someone points me in the right direction. Matt tells me that the coordinate field should only be discontinuous if the mesh is periodic, but this mesh shouldn't be periodic. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jan 11 18:54:42 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Jan 2023 14:54:42 -1000 Subject: [petsc-users] coordinate degrees of freedom for 2nd-order gmsh mesh In-Reply-To: References: Message-ID: Can you send the .msh file? I still have not installed Gmsh :) Thanks, Matt On Wed, Jan 11, 2023 at 2:43 PM Daniel R. Shapero wrote: > Hi all -- I'm trying to read in 2nd-order / piecewise quadratic meshes > that are generated by gmsh and I don't understand how the coordinates are > stored in the plex. I've been discussing this with Matt Knepley here > as it pertains > to Firedrake but I think this is more an issue at the PETSc level. > > This code > > uses gmsh to generate a 2nd-order mesh of the unit disk, read it into a > DMPlex, print out the number of cells in each depth stratum, and finally > print a view of the coordinate DM's section. The resulting mesh has 64 > triangles, 104 edges, and 41 vertices. For 2nd-order meshes, I'd expected > there to be 2 degrees of freedom at each node and 2 at each edge. The > output is: > > ``` > Depth strata: [(64, 105), (105, 209), (0, 64)] > > PetscSection Object: 1 MPI process > type not yet set > 1 fields > field 0 with 2 components > Process 0: > ( 0) dim 12 offset 0 > ( 1) dim 12 offset 12 > ( 2) dim 12 offset 24 > ... > ( 62) dim 12 offset 744 > ( 63) dim 12 offset 756 > ( 64) dim 0 offset 768 > ( 65) dim 0 offset 768 > ... > ( 207) dim 0 offset 768 > ( 208) dim 0 offset 768 > PetscSectionSym Object: 1 MPI process > type: label > Label 'depth' > Symmetry for stratum value 0 (0 dofs per point): no symmetries > Symmetry for stratum value 1 (0 dofs per point): no symmetries > Symmetry for stratum value 2 (12 dofs per point): > Orientation range: [-3, 3) > Symmetry for stratum value -1 (0 dofs per point): no symmetries > ``` > > The output suggests that there are 12 degrees of freedom in each triangle. > That would mean the coordinate field is discontinuous across cell > boundaries. Can someone explain what's going on? I tried reading the .msh > file but it's totally inscrutable to me. I'm happy to RTFSC if someone > points me in the right direction. 
Matt tells me that the coordinate field > should only be discontinuous if the mesh is periodic, but this mesh > shouldn't be periodic. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shapero at uw.edu Wed Jan 11 19:33:24 2023 From: shapero at uw.edu (Daniel R. Shapero) Date: Wed, 11 Jan 2023 17:33:24 -0800 Subject: [petsc-users] coordinate degrees of freedom for 2nd-order gmsh mesh In-Reply-To: References: Message-ID: Sorry either your mail system or mine prevented me from attaching the file, so I put it on pastebin: https://pastebin.com/awFpc1Js On Wed, Jan 11, 2023 at 4:54 PM Matthew Knepley wrote: > Can you send the .msh file? I still have not installed Gmsh :) > > Thanks, > > Matt > > On Wed, Jan 11, 2023 at 2:43 PM Daniel R. Shapero wrote: > >> Hi all -- I'm trying to read in 2nd-order / piecewise quadratic meshes >> that are generated by gmsh and I don't understand how the coordinates are >> stored in the plex. I've been discussing this with Matt Knepley here >> >> as it pertains to Firedrake but I think this is more an issue at the PETSc >> level. >> >> This code >> >> uses gmsh to generate a 2nd-order mesh of the unit disk, read it into a >> DMPlex, print out the number of cells in each depth stratum, and finally >> print a view of the coordinate DM's section. The resulting mesh has 64 >> triangles, 104 edges, and 41 vertices. For 2nd-order meshes, I'd expected >> there to be 2 degrees of freedom at each node and 2 at each edge. The >> output is: >> >> ``` >> Depth strata: [(64, 105), (105, 209), (0, 64)] >> >> PetscSection Object: 1 MPI process >> type not yet set >> 1 fields >> field 0 with 2 components >> Process 0: >> ( 0) dim 12 offset 0 >> ( 1) dim 12 offset 12 >> ( 2) dim 12 offset 24 >> ... >> ( 62) dim 12 offset 744 >> ( 63) dim 12 offset 756 >> ( 64) dim 0 offset 768 >> ( 65) dim 0 offset 768 >> ... >> ( 207) dim 0 offset 768 >> ( 208) dim 0 offset 768 >> PetscSectionSym Object: 1 MPI process >> type: label >> Label 'depth' >> Symmetry for stratum value 0 (0 dofs per point): no symmetries >> Symmetry for stratum value 1 (0 dofs per point): no symmetries >> Symmetry for stratum value 2 (12 dofs per point): >> Orientation range: [-3, 3) >> Symmetry for stratum value -1 (0 dofs per point): no symmetries >> ``` >> >> The output suggests that there are 12 degrees of freedom in each >> triangle. That would mean the coordinate field is discontinuous across cell >> boundaries. Can someone explain what's going on? I tried reading the .msh >> file but it's totally inscrutable to me. I'm happy to RTFSC if someone >> points me in the right direction. Matt tells me that the coordinate field >> should only be discontinuous if the mesh is periodic, but this mesh >> shouldn't be periodic. >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zhan2355 at purdue.edu Wed Jan 11 22:52:27 2023 From: zhan2355 at purdue.edu (Sijie Zhang) Date: Thu, 12 Jan 2023 04:52:27 +0000 Subject: [petsc-users] PETSC install In-Reply-To: References: <639C683F-C7F0-4A95-B0C0-91AEA3158DB1@petsc.dev> Message-ID: Yes, I followed the exact instructions. I?m using bash. I put the environmental variable in the .bashrc file. This only happens to my intel i129700 workstation. Is it because of the hardware? Thanks. Sijie Sent from Mail for Windows From: Matthew Knepley Sent: Wednesday, January 11, 2023 7:40 PM To: Barry Smith Cc: Sijie Zhang; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PETSC install ---- External Email: Use caution with attachments, links, or sharing data ---- On Wed, Jan 11, 2023 at 2:33 PM Barry Smith > wrote: Did you do exactly: export HWLOC_HIDE_ERRORS=2 make check Also, what shell are you using? The command above is for bash, but if you use csh it is different. Thanks, Matt ? On Jan 11, 2023, at 6:51 PM, Sijie Zhang > wrote: Hi, I tried that but it's showing the same error. Can you help me to take a look at that? Thanks. Sijie +++++++++++++++++++++++++++++++++++++++++++++++++++++ Running check examples to verify correct installation Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and PETSC_ARCH=arch-linux-c-debug Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). lid velocity = 0.0016, prandtl # = 1., grashof # = 1. Number of SNES iterations = 2 Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). lid velocity = 0.0016, prandtl # = 1., grashof # = 1. 
Number of SNES iterations = 2 *******************Error detected during compile or link!******************* See https://petsc.org/release/faq/ /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f ********************************************************* /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include -I/opt/intel/oneapi/mkl/2023.0.0/include -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). Number of SNES iterations = 3 Completed test examples Error while running make check gmake[1]: *** [makefile:149: check] Error 1 make: *** [GNUmakefile:17: check] Error 2 ________________________________________ From: Barry Smith > Sent: Wednesday, January 11, 2023 8:22 AM To: Sijie Zhang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PETSC install ---- External Email: Use caution with attachments, links, or sharing data ---- https://petsc.org/release/faq/#what-does-the-message-hwloc-linux-ignoring-pci-device-with-non-16bit-domain-mean On Jan 11, 2023, at 5:03 AM, Sijie Zhang > wrote: Hi, When I try to install petsc on my workstation, I got the following error. Can you help me with that? Thank you and best regards. Sijie ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Running check examples to verify correct installation Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and PETSC_ARCH=arch-linux-c-debug Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). lid velocity = 0.0016, prandtl # = 1., grashof # = 1. 
Number of SNES iterations = 2 Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). lid velocity = 0.0016, prandtl # = 1., grashof # = 1. Number of SNES iterations = 2 *******************Error detected during compile or link!******************* See https://petsc.org/release/faq/ /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f ********************************************************* /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include -I/opt/intel/oneapi/mkl/2023.0.0/include -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process See https://petsc.org/release/faq/ hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). Number of SNES iterations = 3 Completed test examples Error while running make check gmake[1]: *** [makefile:149: check] Error 1 make: *** [GNUmakefile:17: check] Error 2 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Jan 11 23:41:26 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 11 Jan 2023 23:41:26 -0600 (CST) Subject: [petsc-users] PETSC install In-Reply-To: References: <639C683F-C7F0-4A95-B0C0-91AEA3158DB1@petsc.dev> Message-ID: <9cafb5cc-4f35-5e9e-7e41-e8e0eb6603d9@mcs.anl.gov> The examples appear to run [with extra warnings]. 
You could assume you have a working install of petsc and start using it [but you will continue to get these messages].. Wrt the warnings - You have 2 sets of messages. 1. hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed). 2. > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] The env variable HWLOC_HIDE_ERRORS=2 should fix the first one. Are you still getting it? The second one is due to spurious stuff from your MPI wrappers: >>> Executing: /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -c -o /tmp/petsc-v6nugevd/config.setCompilers/conftest.o -I/tmp/petsc-v6nugevd/config.setCompilers /tmp/petsc-v6nugevd/config.setCompilers/conftest.F90 Possible ERROR while running compiler:exit code 0 stderr: f951: Warning: Nonexistent include directory ???I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0??? [-Wmissing-include-dirs] f951: Warning: Nonexistent include directory ???I_MPI_SUBSTITUTE_INSTALLDIR/include??? [-Wmissing-include-dirs] <<<< Also: >>>>> Executing: /opt/intel/oneapi/mpi/2021.8.0/bin/mpiifort -c -o /tmp/petsc-v6nugevd/config.setCompilers/conftest.o -I/tmp/petsc-v6nugevd/config.setCompilers /tmp/petsc-v6nugevd/config.setCompilers/conftest.F90 Possible ERROR while running compiler: exit code 127 stderr: /opt/intel/oneapi/mpi/2021.8.0/bin/mpiifort: 1: eval: ifort: not found <<<<<< So it would be good if you can fix your Intel compilers and Intel-MPI [to not give the above warnings/errors] - before installing/using PETSc with it. Or - just use gnu compilers (with mpich) - unless you have a really good reason to use Intel compilers/MPI Satish On Thu, 12 Jan 2023, Sijie Zhang wrote: > Yes, I followed the exact instructions. I?m using bash. I put the environmental variable in the .bashrc file. This only happens to my intel i129700 workstation. Is it because of the hardware? > > Thanks. > > Sijie > > Sent from Mail for Windows > > From: Matthew Knepley > Sent: Wednesday, January 11, 2023 7:40 PM > To: Barry Smith > Cc: Sijie Zhang; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] PETSC install > > ---- External Email: Use caution with attachments, links, or sharing data ---- > > On Wed, Jan 11, 2023 at 2:33 PM Barry Smith > wrote: > > Did you do exactly: > > > export HWLOC_HIDE_ERRORS=2 > > make check > > Also, what shell are you using? The command above is for bash, but if you use csh it is different. > > Thanks, > > Matt > > > ? > > > > > On Jan 11, 2023, at 6:51 PM, Sijie Zhang > wrote: > > Hi, > > I tried that but it's showing the same error. Can you help me to take a look at that? > > Thanks. > > Sijie > > +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Running check examples to verify correct installation > Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and PETSC_ARCH=arch-linux-c-debug > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. 
> Number of SNES iterations = 2 > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > *******************Error detected during compile or link!******************* > See https://petsc.org/release/faq/ > /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f > ********************************************************* > /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include -I/opt/intel/oneapi/mkl/2023.0.0/include -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -L/home/zhangsijie1995/Documents/Pack age/pets c-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > Number of SNES iterations = 3 > Completed test examples > Error while running make check > gmake[1]: *** [makefile:149: check] Error 1 > make: *** [GNUmakefile:17: check] Error 2 > > ________________________________________ > From: Barry Smith > > Sent: Wednesday, January 11, 2023 8:22 AM > To: Sijie Zhang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] PETSC install > > ---- External Email: Use caution with attachments, links, or sharing data ---- > > > https://petsc.org/release/faq/#what-does-the-message-hwloc-linux-ignoring-pci-device-with-non-16bit-domain-mean > > On Jan 11, 2023, at 5:03 AM, Sijie Zhang > wrote: > > Hi, > > When I try to install petsc on my workstation, I got the following error. Can you help me with that? > > Thank you and best regards. 
> > Sijie > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Running check examples to verify correct installation > Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and PETSC_ARCH=arch-linux-c-debug > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > *******************Error detected during compile or link!******************* > See https://petsc.org/release/faq/ > /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f > ********************************************************* > /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include -I/opt/intel/oneapi/mkl/2023.0.0/include -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -L/home/zhangsijie1995/Documents/Pack age/pets c-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). 
> Number of SNES iterations = 3 > Completed test examples > Error while running make check > gmake[1]: *** [makefile:149: check] Error 1 > make: *** [GNUmakefile:17: check] Error 2 > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > From bsmith at petsc.dev Thu Jan 12 09:29:27 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 12 Jan 2023 10:29:27 -0500 Subject: [petsc-users] PETSC install In-Reply-To: References: <639C683F-C7F0-4A95-B0C0-91AEA3158DB1@petsc.dev> Message-ID: <2B420B2C-BE86-4145-B125-1D43C6629036@petsc.dev> If you put the variable in the .bashrc file then you mush source ~/.bashrc before running the make check Barry > On Jan 11, 2023, at 11:52 PM, Sijie Zhang wrote: > > Yes, I followed the exact instructions. I?m using bash. I put the environmental variable in the .bashrc file. This only happens to my intel i129700 workstation. Is it because of the hardware? > > Thanks. > > Sijie > > Sent from Mail for Windows > > From: Matthew Knepley > Sent: Wednesday, January 11, 2023 7:40 PM > To: Barry Smith > Cc: Sijie Zhang ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] PETSC install > > ---- External Email: Use caution with attachments, links, or sharing data ---- > > On Wed, Jan 11, 2023 at 2:33 PM Barry Smith > wrote: > > Did you do exactly: > > export HWLOC_HIDE_ERRORS=2 > make check > > Also, what shell are you using? The command above is for bash, but if you use csh it is different. > > Thanks, > > Matt > > ? > > > > > On Jan 11, 2023, at 6:51 PM, Sijie Zhang > wrote: > > Hi, > > I tried that but it's showing the same error. Can you help me to take a look at that? > > Thanks. > > Sijie > > +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Running check examples to verify correct installation > Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and PETSC_ARCH=arch-linux-c-debug > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. 
> Number of SNES iterations = 2 > *******************Error detected during compile or link!******************* > See https://petsc.org/release/faq/ > /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f > ********************************************************* > /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include -I/opt/intel/oneapi/mkl/2023.0.0/include -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > Number of SNES iterations = 3 > Completed test examples > Error while running make check > gmake[1]: *** [makefile:149: check] Error 1 > make: *** [GNUmakefile:17: check] Error 2 > > ________________________________________ > From: Barry Smith > > Sent: Wednesday, January 11, 2023 8:22 AM > To: Sijie Zhang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] PETSC install > > ---- External Email: Use caution with attachments, links, or sharing data ---- > > > https://petsc.org/release/faq/#what-does-the-message-hwloc-linux-ignoring-pci-device-with-non-16bit-domain-mean > > On Jan 11, 2023, at 5:03 AM, Sijie Zhang > wrote: > > Hi, > > When I try to install petsc on my workstation, I got the following error. Can you help me with that? > > Thank you and best regards. > > Sijie > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Running check examples to verify correct installation > Using PETSC_DIR=/home/zhangsijie1995/Documents/Package/petsc-3.18.3 and PETSC_ARCH=arch-linux-c-debug > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). 
> lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > Number of SNES iterations = 2 > *******************Error detected during compile or link!******************* > See https://petsc.org/release/faq/ > /home/zhangsijie1995/Documents/Package/petsc-3.18.3/src/snes/tutorials ex5f > ********************************************************* > /opt/intel/oneapi/mpi/2021.8.0/bin/mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/include -I/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/include -I/opt/intel/oneapi/mkl/2023.0.0/include -I/opt/intel/oneapi/mpi/2021.8.0/include ex5f.F90 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/arch-linux-c-debug/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -L/opt/intel/oneapi/mkl/2023.0.0/lib/intel64 -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib/release -Wl,-rpath,/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -L/home/zhangsijie1995/Documents/Package/petsc-3.18.3/I_MPI_SUBSTITUTE_INSTALLDIR/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex5f > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include/gfortran/11.1.0? [-Wmissing-include-dirs] > f951: Warning: Nonexistent include directory ?I_MPI_SUBSTITUTE_INSTALLDIR/include? [-Wmissing-include-dirs] > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process > See https://petsc.org/release/faq/ > hwloc/linux: Ignoring PCI device with non-16bit domain. > Pass --enable-32bits-pci-domain to configure to support such devices > (warning: it would break the library ABI, don't enable unless really needed). > Number of SNES iterations = 3 > Completed test examples > Error while running make check > gmake[1]: *** [makefile:149: check] Error 1 > make: *** [GNUmakefile:17: check] Error 2 > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matteo.semplice at uninsubria.it Thu Jan 12 11:14:57 2023 From: matteo.semplice at uninsubria.it (Matteo Semplice) Date: Thu, 12 Jan 2023 18:14:57 +0100 Subject: [petsc-users] locate DMSwarm particles with respect to a background DMDA mesh In-Reply-To: References: <8a853d0e-b856-5dc2-5439-d25911d672e4@uninsubria.it> <35ebfa58-eed5-39fa-8b3d-918ff9d7e633@uninsubria.it> <52622e96-4dfe-9105-a5a2-8d60d2fc90a8@uninsubria.it> <869ea7bc-52fd-c876-4278-37a8e829af8a@uninsubria.it> <08423cfc-4301-6b76-a791-b5c642198ecf@uninsubria.it> Message-ID: <859b8c86-70d6-46a6-bdc2-0d54e565281e@uninsubria.it> Il 23/12/22 17:14, Matthew Knepley ha scritto: > On Thu, Dec 22, 2022 at 3:08 PM Matteo Semplice > wrote: > > > Il 22/12/22 20:06, Dave May ha scritto: >> >> >> On Thu 22. Dec 2022 at 10:27, Matteo Semplice >> wrote: >> >> Dear Dave and Matt, >> >> ??? I am really dealing with two different use cases in a >> code that will compute a levelset function passing through a >> large set of points. If I had DMSwarmSetMigrateType() and if >> it were safe to switch the migration mode back and forth in >> the same swarm, this would cover all my use cases here. Is it >> safe to add it back to petsc? Details below if you are curious. >> >> 1) During preprocessing I am loading a point cloud from disk >> (in whatever order it comes) and need to send the particles >> to the right ranks. Since the background DM is a DMDA I can >> easily figure out the destination rank. This would be covered >> by your suggestion not to attach the DM, except that later I >> need to locate these points with respect to the background >> cells in order to initialize data on the Vecs associated to >> the DMDA. >> >> 2) Then I need to implement a semilagrangian time evolution >> scheme. For this I'd like to send particles around at the >> "foot of characteristic", collect data there and then send >> them back to the originating point. The first migration would >> be based on particle coordinates >> (DMSwarmMigrate_DMNeighborScatter and the restriction to only >> neighbouring ranks is perfect), while for the second move it >> would be easier to just send them back to the originating >> rank, which I can easily store in an Int field in the swarm. >> Thus at each timestep I'd need to swap migrate types in this >> swarm (DMScatter for moving them to the feet and BASIC to >> send them back). >> >> >> When you use BASIC, you would have to explicitly call the point >> location routine from your code as BASIC does not interact with >> the DM. >> >> Based on what I see in the code, switching ?migrate modes between >> basic and dmneighbourscatter should be safe. >> >> If you are fine calling the point location from your side then >> what you propose should work. > > If I understood the code correctly, BASIC will just migrate > particles sending them to what is stored in DMSwarmField_rank, > right? That'd be easy since I can create a SWARM with all the data > I need and an extra int field (say "original_rank") and copy those > values into DMSwarmField_rank before calling migrate for the > "going back" step. After this backward migration I do not need to > locate particles again (e.g. I do not need DMSwarmSortGetAccess > after the BASIC migration, but only after the DMNeighborScatter one). > > Thus having back DMSwarmSetMigrateType() should be enough for me. > > Hi Matteo, > > I have done this in > > https://gitlab.com/petsc/petsc/-/merge_requests/5941 > > > I also hope to get the fix for your DMDA issue in there. Hi. 
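As a concrete reference for the back-and-forth pattern described above (neighbour-scatter the particles to the feet of the characteristics, then return every particle to the rank it started from), a rough C sketch of one step could look like the following. It assumes the DMSwarmSetMigrateType() restored in the merge request above, a user-registered PETSC_INT field named "original_rank", and the migrate-type and built-in field names from petscdmswarm.h; it is a sketch of the idea, not tested code.

```c
#include <petscdmswarm.h>

/* Rough sketch of one semi-Lagrangian update for a swarm 'sw' attached to a DMDA,
   assuming DMSwarmSetMigrateType() is available (merge request above).
   "original_rank" is an illustrative user field, registered elsewhere with
   DMSwarmRegisterPetscDatatypeField(sw, "original_rank", 1, PETSC_INT). */
static PetscErrorCode SemiLagrangianStep(DM sw)
{
  PetscInt   *orig, *owner, n, p;
  PetscMPIInt rank;

  PetscFunctionBeginUser;
  PetscCallMPI(MPI_Comm_rank(PetscObjectComm((PetscObject)sw), &rank));

  /* remember, on every particle, the rank it belongs to before the forward move */
  PetscCall(DMSwarmGetLocalSize(sw, &n));
  PetscCall(DMSwarmGetField(sw, "original_rank", NULL, NULL, (void **)&orig));
  for (p = 0; p < n; ++p) orig[p] = rank;
  PetscCall(DMSwarmRestoreField(sw, "original_rank", NULL, NULL, (void **)&orig));

  /* forward move: scatter to neighbouring ranks based on the particle coordinates */
  PetscCall(DMSwarmSetMigrateType(sw, DMSWARM_MIGRATE_DMCELLNSCATTER));
  PetscCall(DMSwarmMigrate(sw, PETSC_TRUE));

  /* ... evaluate/collect data at the feet of the characteristics here ... */

  /* backward move: BASIC ships each particle to whatever rank is stored in the
     built-in DMSwarmField_rank field, so copy the saved owner into it */
  PetscCall(DMSwarmGetLocalSize(sw, &n));
  PetscCall(DMSwarmGetField(sw, "original_rank", NULL, NULL, (void **)&orig));
  PetscCall(DMSwarmGetField(sw, DMSwarmField_rank, NULL, NULL, (void **)&owner));
  for (p = 0; p < n; ++p) owner[p] = orig[p];
  PetscCall(DMSwarmRestoreField(sw, DMSwarmField_rank, NULL, NULL, (void **)&owner));
  PetscCall(DMSwarmRestoreField(sw, "original_rank", NULL, NULL, (void **)&orig));
  PetscCall(DMSwarmSetMigrateType(sw, DMSWARM_MIGRATE_BASIC));
  PetscCall(DMSwarmMigrate(sw, PETSC_TRUE));
  PetscFunctionReturn(0);
}
```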
I have finally got round to testing the updates and, using the main branch, my issues are fixed. Only, I have noticed that, after a DMSwarmMigrate_DMNeighborScatter, the field DMSwarmField_rank has the same content as the field DMSwarmPICField_cellid. It does not affect me, but it seems a little strange and might surprise users... In the long term, a word in the docs about the names/content of the fields that are automatically created in a swarm would be helpful. Thanks! ??? Matteo -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Thu Jan 12 13:27:44 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Thu, 12 Jan 2023 14:27:44 -0500 Subject: [petsc-users] PetscSF Fortran interface In-Reply-To: References: Message-ID: Hi Junchao Going back to this merge request. I'm not sure I follow exactly the usage described in the commit history. I have a c prototype of what I am trying to do which is PetscSectionCreate(PETSC_COMM_WORLD, &leafSection); PetscSFDistributeSection(redistributionSF, filteredSection_local, &remoteOffsets, leafSection); PetscSFCreateSectionSF(redistributionSF, filteredSection_local, remoteOffsets, leafSection, &redistributionSF_dof); But something seems unclear with the usage in fortran around the remoteoffsets. Do I have to insert the CreateRemoteOffsetsF90 like so? Any clarification would be greatly appreciated. call PetscSFDistributeSectionF90(distributionSF, section_filt_l, remoteoffsets, leafSection, ierr) call PetscSFCreateRemoteOffsetsf90(distributionSF, section_filt_l, leafSection, remoteoffsets, ierr ) call PetscSFCreateSectionSFF90(distributionSF, section_filt_l, remoteoffsets, leafSection, distributionSF_dof, ierr) Sincerely Nicholas On Tue, Jan 10, 2023 at 4:42 PM Junchao Zhang wrote: > Hi, Nicholas, > It seems we have implemented it, but with another name, > PetscSFCreateSectionSFF90, see > https://gitlab.com/petsc/petsc/-/merge_requests/5386 > Try it to see if it works! > > --Junchao Zhang > > > On Tue, Jan 10, 2023 at 11:45 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Er to be honest I still can't get my stub to compile properly, and I >> don't know how to go about making a merge request. But here is what I am >> attempting right now. Let me know how best to proceed >> >> >> Its not exactly clear to me how to setup up the remote offset properly. >> >> in src/vec/is/sf/interface/ftn-custom/zsf.c >> >> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection >> *rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >> { >> >> int * remoteOffsets; >> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) >> &remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; >> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, >> &remoteOffsets,*leafSection,*sectionSF);if (*ierr) return; >> >> } >> >> This is the sticking point. >> >> Sincerely >> Nicholas >> >> >> On Tue, Jan 10, 2023 at 12:38 PM Junchao Zhang >> wrote: >> >>> Hi, Nicholas, >>> Could you make a merge request to PETSc and then our Fortran experts >>> can comment on your MR? >>> Thanks. 
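For reference, the C-side version of the prototype at the top of this message usually goes like the sketch below (the SF and section names are illustrative). PetscSFDistributeSection() hands back the remote offsets, and those are exactly what PetscSFCreateSectionSF() consumes, so on this path a separate PetscSFCreateRemoteOffsets() call should not be necessary; whether the F90 wrappers in the merge request mirror this one-to-one is something the Fortran experts would need to confirm.

```c
#include <petscsf.h>
#include <petscsection.h>

/* Rough C-side sketch of the redistribution pattern quoted above; names illustrative.
   The leaf section is returned so the caller can interpret the redistributed data. */
static PetscErrorCode BuildSectionSF(PetscSF redistributionSF, PetscSection rootSection,
                                     PetscSection *leafSection, PetscSF *sfDof)
{
  PetscInt *remoteOffsets = NULL;

  PetscFunctionBeginUser;
  PetscCall(PetscSectionCreate(PetscObjectComm((PetscObject)redistributionSF), leafSection));
  /* fills the leaf section and returns the matching remote offsets */
  PetscCall(PetscSFDistributeSection(redistributionSF, rootSection, &remoteOffsets, *leafSection));
  /* the offsets from DistributeSection feed straight into the dof-level SF */
  PetscCall(PetscSFCreateSectionSF(redistributionSF, rootSection, remoteOffsets, *leafSection, sfDof));
  PetscCall(PetscFree(remoteOffsets)); /* only needed while building the dof-level SF */
  PetscFunctionReturn(0);
}
```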
>>> >>> --Junchao Zhang >>> >>> >>> On Tue, Jan 10, 2023 at 11:10 AM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> Hi Junchao >>>> >>>> I think I'm almost there, but I could use some insight into how to use >>>> the PETSC_F90_2PTR_PROTO and F90Array1dAccess for the remoteOffset >>>> parameter input so if another function comes up, I can add it myself >>>> without wasting your time. >>>> I am very grateful for your help and time. >>>> >>>> Sincerely >>>> Nicholas >>>> >>>> On Tue, Jan 10, 2023 at 10:55 AM Junchao Zhang >>>> wrote: >>>> >>>>> Hi, Nicholas, >>>>> I am not a fortran guy, but I will try to add >>>>> petscsfcreatesectionsf. >>>>> >>>>> Thanks. >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Tue, Jan 10, 2023 at 12:50 AM Nicholas Arnold-Medabalimi < >>>>> narnoldm at umich.edu> wrote: >>>>> >>>>>> I think it should be something like this, but I'm not very fluent in >>>>>> Fortran C interop syntax. Any advice would be appreciated. Thanks >>>>>> >>>>>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection * >>>>>> rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>>>>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >>>>>> { >>>>>> >>>>>> int * remoteOffsets; >>>>>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) & >>>>>> remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return >>>>>> ; >>>>>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, &remoteOffsets,* >>>>>> leafSection,*sectionSF);if (*ierr) return; >>>>>> >>>>>> } >>>>>> >>>>>> On Mon, Jan 9, 2023 at 11:41 PM Nicholas Arnold-Medabalimi < >>>>>> narnoldm at umich.edu> wrote: >>>>>> >>>>>>> Hi Junchao >>>>>>> >>>>>>> Thanks again for your help in November. I've been using the your >>>>>>> merge request branch quite heavily. Would it be possible to add a >>>>>>> petscsfcreatesectionsf interface as well? >>>>>>> I'm trying to write it myself using your commits as a guide but I >>>>>>> have been struggling with handling the section parameter properly. >>>>>>> >>>>>>> Sincerely >>>>>>> Nicholas >>>>>>> >>>>>>> On Sat, Nov 19, 2022 at 9:44 PM Junchao Zhang < >>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Nov 19, 2022 at 8:05 PM Nicholas Arnold-Medabalimi < >>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>> >>>>>>>>> Hi >>>>>>>>> >>>>>>>>> Thanks, this is awesome. Thanks for the very prompt fix. Just one >>>>>>>>> question: will the array outputs on the fortran side copies (and need to be >>>>>>>>> deallocated) or direct access to the dmplex? >>>>>>>>> >>>>>>>> Direct access to internal data; no need to deallocate >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Sincerely >>>>>>>>> Nicholas >>>>>>>>> >>>>>>>>> On Sat, Nov 19, 2022 at 8:21 PM Junchao Zhang < >>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi, Nicholas, >>>>>>>>>> See this MR, >>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/5860 >>>>>>>>>> It is in testing, but you can try branch >>>>>>>>>> jczhang/add-petscsf-fortran to see if it works for you. >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> --Junchao Zhang >>>>>>>>>> >>>>>>>>>> On Sat, Nov 19, 2022 at 4:16 PM Nicholas Arnold-Medabalimi < >>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Junchao >>>>>>>>>>> >>>>>>>>>>> Thanks. I was wondering if there is any update on this. I may >>>>>>>>>>> write a small interface for those two routines myself in the interim but >>>>>>>>>>> I'd appreciate any insight you have. 
>>>>>>>>>>> >>>>>>>>>>> Sincerely >>>>>>>>>>> Nicholas >>>>>>>>>>> >>>>>>>>>>> On Wed, Nov 16, 2022 at 10:39 PM Junchao Zhang < >>>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>>> I will have a look and get back to you. >>>>>>>>>>>> Thanks. >>>>>>>>>>>> --Junchao Zhang >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Petsc Users >>>>>>>>>>>>> >>>>>>>>>>>>> I'm in the process of adding some Petsc for mesh management >>>>>>>>>>>>> into an existing Fortran Solver. It has been relatively straightforward so >>>>>>>>>>>>> far but I am running into an issue with using PetscSF routines. Some like >>>>>>>>>>>>> the PetscSFGetGraph work no problem but a few of my routines require the >>>>>>>>>>>>> use of PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to >>>>>>>>>>>>> be in the fortran interface and I just get a linking error. I also don't >>>>>>>>>>>>> seem to see a PetscSF file in the finclude. Any clarification or assistance >>>>>>>>>>>>> would be appreciated. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely >>>>>>>>>>>>> Nicholas >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>>> >>>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>>> University of Michigan >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>> >>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>> University of Michigan >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>> >>>>>>>>> Ph.D. Candidate >>>>>>>>> Computational Aeroscience Lab >>>>>>>>> University of Michigan >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nicholas Arnold-Medabalimi >>>>>>> >>>>>>> Ph.D. Candidate >>>>>>> Computational Aeroscience Lab >>>>>>> University of Michigan >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Nicholas Arnold-Medabalimi >>>>>> >>>>>> Ph.D. Candidate >>>>>> Computational Aeroscience Lab >>>>>> University of Michigan >>>>>> >>>>> >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Jan 12 17:33:04 2023 From: jed at jedbrown.org (Jed Brown) Date: Thu, 12 Jan 2023 16:33:04 -0700 Subject: [petsc-users] coordinate degrees of freedom for 2nd-order gmsh mesh In-Reply-To: References: Message-ID: <87fscfpbv3.fsf@jedbrown.org> It's confusing, but this line makes high order simplices always read as discontinuous coordinate spaces. I would love if someone would revisit that, perhaps also using DMPlexSetIsoperiodicFaceSF(), which should simplify the code and avoid the confusing cell coordinates pattern. Sadly, I don't have time to dive in. https://gitlab.com/petsc/petsc/-/commit/066ea43f7f75752f012be6cd06b6107ebe84cc6d#3616cad8148970af5b97293c49492ff893e25b59_1552_1724 "Daniel R. 
Shapero" writes: > Sorry either your mail system or mine prevented me from attaching the file, > so I put it on pastebin: > https://pastebin.com/awFpc1Js > > On Wed, Jan 11, 2023 at 4:54 PM Matthew Knepley wrote: > >> Can you send the .msh file? I still have not installed Gmsh :) >> >> Thanks, >> >> Matt >> >> On Wed, Jan 11, 2023 at 2:43 PM Daniel R. Shapero wrote: >> >>> Hi all -- I'm trying to read in 2nd-order / piecewise quadratic meshes >>> that are generated by gmsh and I don't understand how the coordinates are >>> stored in the plex. I've been discussing this with Matt Knepley here >>> >>> as it pertains to Firedrake but I think this is more an issue at the PETSc >>> level. >>> >>> This code >>> >>> uses gmsh to generate a 2nd-order mesh of the unit disk, read it into a >>> DMPlex, print out the number of cells in each depth stratum, and finally >>> print a view of the coordinate DM's section. The resulting mesh has 64 >>> triangles, 104 edges, and 41 vertices. For 2nd-order meshes, I'd expected >>> there to be 2 degrees of freedom at each node and 2 at each edge. The >>> output is: >>> >>> ``` >>> Depth strata: [(64, 105), (105, 209), (0, 64)] >>> >>> PetscSection Object: 1 MPI process >>> type not yet set >>> 1 fields >>> field 0 with 2 components >>> Process 0: >>> ( 0) dim 12 offset 0 >>> ( 1) dim 12 offset 12 >>> ( 2) dim 12 offset 24 >>> ... >>> ( 62) dim 12 offset 744 >>> ( 63) dim 12 offset 756 >>> ( 64) dim 0 offset 768 >>> ( 65) dim 0 offset 768 >>> ... >>> ( 207) dim 0 offset 768 >>> ( 208) dim 0 offset 768 >>> PetscSectionSym Object: 1 MPI process >>> type: label >>> Label 'depth' >>> Symmetry for stratum value 0 (0 dofs per point): no symmetries >>> Symmetry for stratum value 1 (0 dofs per point): no symmetries >>> Symmetry for stratum value 2 (12 dofs per point): >>> Orientation range: [-3, 3) >>> Symmetry for stratum value -1 (0 dofs per point): no symmetries >>> ``` >>> >>> The output suggests that there are 12 degrees of freedom in each >>> triangle. That would mean the coordinate field is discontinuous across cell >>> boundaries. Can someone explain what's going on? I tried reading the .msh >>> file but it's totally inscrutable to me. I'm happy to RTFSC if someone >>> points me in the right direction. Matt tells me that the coordinate field >>> should only be discontinuous if the mesh is periodic, but this mesh >>> shouldn't be periodic. >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> From knepley at gmail.com Thu Jan 12 18:13:43 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 12 Jan 2023 14:13:43 -1000 Subject: [petsc-users] coordinate degrees of freedom for 2nd-order gmsh mesh In-Reply-To: <87fscfpbv3.fsf@jedbrown.org> References: <87fscfpbv3.fsf@jedbrown.org> Message-ID: On Thu, Jan 12, 2023 at 1:33 PM Jed Brown wrote: > It's confusing, but this line makes high order simplices always read as > discontinuous coordinate spaces. I would love if someone would revisit > that, perhaps also using DMPlexSetIsoperiodicFaceSF(), Perhaps as a switch, but there is no way I am getting rid of the current periodicity. As we have discussed before, breaking the topological relation is a non-starter for me. It does look like higher order Gmsh does read as DG. We can just project that to CG for non-periodic stuff. 
Thanks, Matt which should simplify the code and avoid the confusing cell coordinates > pattern. Sadly, I don't have time to dive in. > > > https://gitlab.com/petsc/petsc/-/commit/066ea43f7f75752f012be6cd06b6107ebe84cc6d#3616cad8148970af5b97293c49492ff893e25b59_1552_1724 > > "Daniel R. Shapero" writes: > > > Sorry either your mail system or mine prevented me from attaching the > file, > > so I put it on pastebin: > > https://pastebin.com/awFpc1Js > > > > On Wed, Jan 11, 2023 at 4:54 PM Matthew Knepley > wrote: > > > >> Can you send the .msh file? I still have not installed Gmsh :) > >> > >> Thanks, > >> > >> Matt > >> > >> On Wed, Jan 11, 2023 at 2:43 PM Daniel R. Shapero > wrote: > >> > >>> Hi all -- I'm trying to read in 2nd-order / piecewise quadratic meshes > >>> that are generated by gmsh and I don't understand how the coordinates > are > >>> stored in the plex. I've been discussing this with Matt Knepley here > >>> < > https://urldefense.com/v3/__https://github.com/firedrakeproject/firedrake/issues/982__;!!K-Hz7m0Vt54!hL9WLR51ieyHFZx8N9AjhDwJCRpvmQto9CL1XOTkkAxFfUbtsabHuBDOATnWyP6lQszhA2gOStva7A$ > > > >>> as it pertains to Firedrake but I think this is more an issue at the > PETSc > >>> level. > >>> > >>> This code > >>> < > https://urldefense.com/v3/__https://gist.github.com/danshapero/a140daaf951ba58c48285ec29f5973cc__;!!K-Hz7m0Vt54!hL9WLR51ieyHFZx8N9AjhDwJCRpvmQto9CL1XOTkkAxFfUbtsabHuBDOATnWyP6lQszhA2hho2eD1g$ > > > >>> uses gmsh to generate a 2nd-order mesh of the unit disk, read it into a > >>> DMPlex, print out the number of cells in each depth stratum, and > finally > >>> print a view of the coordinate DM's section. The resulting mesh has 64 > >>> triangles, 104 edges, and 41 vertices. For 2nd-order meshes, I'd > expected > >>> there to be 2 degrees of freedom at each node and 2 at each edge. The > >>> output is: > >>> > >>> ``` > >>> Depth strata: [(64, 105), (105, 209), (0, 64)] > >>> > >>> PetscSection Object: 1 MPI process > >>> type not yet set > >>> 1 fields > >>> field 0 with 2 components > >>> Process 0: > >>> ( 0) dim 12 offset 0 > >>> ( 1) dim 12 offset 12 > >>> ( 2) dim 12 offset 24 > >>> ... > >>> ( 62) dim 12 offset 744 > >>> ( 63) dim 12 offset 756 > >>> ( 64) dim 0 offset 768 > >>> ( 65) dim 0 offset 768 > >>> ... > >>> ( 207) dim 0 offset 768 > >>> ( 208) dim 0 offset 768 > >>> PetscSectionSym Object: 1 MPI process > >>> type: label > >>> Label 'depth' > >>> Symmetry for stratum value 0 (0 dofs per point): no symmetries > >>> Symmetry for stratum value 1 (0 dofs per point): no symmetries > >>> Symmetry for stratum value 2 (12 dofs per point): > >>> Orientation range: [-3, 3) > >>> Symmetry for stratum value -1 (0 dofs per point): no symmetries > >>> ``` > >>> > >>> The output suggests that there are 12 degrees of freedom in each > >>> triangle. That would mean the coordinate field is discontinuous across > cell > >>> boundaries. Can someone explain what's going on? I tried reading the > .msh > >>> file but it's totally inscrutable to me. I'm happy to RTFSC if someone > >>> points me in the right direction. Matt tells me that the coordinate > field > >>> should only be discontinuous if the mesh is periodic, but this mesh > >>> shouldn't be periodic. > >>> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which > their > >> experiments lead. 
> >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > >> < > https://urldefense.com/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!K-Hz7m0Vt54!hL9WLR51ieyHFZx8N9AjhDwJCRpvmQto9CL1XOTkkAxFfUbtsabHuBDOATnWyP6lQszhA2go23tjRg$ > > > >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at mcmaster.ca Thu Jan 12 19:57:57 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Fri, 13 Jan 2023 01:57:57 +0000 Subject: [petsc-users] coordinate degrees of freedom for 2nd-order gmsh mesh In-Reply-To: References: <87fscfpbv3.fsf@jedbrown.org> Message-ID: <9CFC1892-7D2F-4E6A-A46D-A9433D195CAE@mcmaster.ca> An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 12 19:59:05 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 12 Jan 2023 15:59:05 -1000 Subject: [petsc-users] coordinate degrees of freedom for 2nd-order gmsh mesh In-Reply-To: <9CFC1892-7D2F-4E6A-A46D-A9433D195CAE@mcmaster.ca> References: <87fscfpbv3.fsf@jedbrown.org> <9CFC1892-7D2F-4E6A-A46D-A9433D195CAE@mcmaster.ca> Message-ID: On Thu, Jan 12, 2023 at 3:57 PM Blaise Bourdin wrote: > Out of curiosity, what is the rationale for _reading_ high order gmsh > meshes? > Is it so that one can write data back in native gmsh format? > So we can use meshes that other people design I think. Matt > Regards, > Blaise > > > On Jan 12, 2023, at 7:13 PM, Matthew Knepley wrote: > > On Thu, Jan 12, 2023 at 1:33 PM Jed Brown wrote: > >> It's confusing, but this line makes high order simplices always read as >> discontinuous coordinate spaces. I would love if someone would revisit >> that, perhaps also using DMPlexSetIsoperiodicFaceSF(), > > > Perhaps as a switch, but there is no way I am getting rid of the current > periodicity. As we have discussed before, breaking the topological relation > is a non-starter for me. > > It does look like higher order Gmsh does read as DG. We can just project > that to CG for non-periodic stuff. > > Thanks, > > Matt > > which should simplify the code and avoid the confusing cell coordinates >> pattern. Sadly, I don't have time to dive in. >> >> >> https://gitlab.com/petsc/petsc/-/commit/066ea43f7f75752f012be6cd06b6107ebe84cc6d#3616cad8148970af5b97293c49492ff893e25b59_1552_1724 >> >> "Daniel R. Shapero" writes: >> >> > Sorry either your mail system or mine prevented me from attaching the >> file, >> > so I put it on pastebin: >> > https://pastebin.com/awFpc1Js >> > >> > On Wed, Jan 11, 2023 at 4:54 PM Matthew Knepley >> wrote: >> > >> >> Can you send the .msh file? I still have not installed Gmsh :) >> >> >> >> Thanks, >> >> >> >> Matt >> >> >> >> On Wed, Jan 11, 2023 at 2:43 PM Daniel R. Shapero >> wrote: >> >> >> >>> Hi all -- I'm trying to read in 2nd-order / piecewise quadratic meshes >> >>> that are generated by gmsh and I don't understand how the coordinates >> are >> >>> stored in the plex. I've been discussing this with Matt Knepley here >> >>> < >> https://urldefense.com/v3/__https://github.com/firedrakeproject/firedrake/issues/982__;!!K-Hz7m0Vt54!hL9WLR51ieyHFZx8N9AjhDwJCRpvmQto9CL1XOTkkAxFfUbtsabHuBDOATnWyP6lQszhA2gOStva7A$ >> > >> >>> as it pertains to Firedrake but I think this is more an issue at the >> PETSc >> >>> level. 
>> >>> >> >>> This code >> >>> < >> https://urldefense.com/v3/__https://gist.github.com/danshapero/a140daaf951ba58c48285ec29f5973cc__;!!K-Hz7m0Vt54!hL9WLR51ieyHFZx8N9AjhDwJCRpvmQto9CL1XOTkkAxFfUbtsabHuBDOATnWyP6lQszhA2hho2eD1g$ >> > >> >>> uses gmsh to generate a 2nd-order mesh of the unit disk, read it into >> a >> >>> DMPlex, print out the number of cells in each depth stratum, and >> finally >> >>> print a view of the coordinate DM's section. The resulting mesh has 64 >> >>> triangles, 104 edges, and 41 vertices. For 2nd-order meshes, I'd >> expected >> >>> there to be 2 degrees of freedom at each node and 2 at each edge. The >> >>> output is: >> >>> >> >>> ``` >> >>> Depth strata: [(64, 105), (105, 209), (0, 64)] >> >>> >> >>> PetscSection Object: 1 MPI process >> >>> type not yet set >> >>> 1 fields >> >>> field 0 with 2 components >> >>> Process 0: >> >>> ( 0) dim 12 offset 0 >> >>> ( 1) dim 12 offset 12 >> >>> ( 2) dim 12 offset 24 >> >>> ... >> >>> ( 62) dim 12 offset 744 >> >>> ( 63) dim 12 offset 756 >> >>> ( 64) dim 0 offset 768 >> >>> ( 65) dim 0 offset 768 >> >>> ... >> >>> ( 207) dim 0 offset 768 >> >>> ( 208) dim 0 offset 768 >> >>> PetscSectionSym Object: 1 MPI process >> >>> type: label >> >>> Label 'depth' >> >>> Symmetry for stratum value 0 (0 dofs per point): no symmetries >> >>> Symmetry for stratum value 1 (0 dofs per point): no symmetries >> >>> Symmetry for stratum value 2 (12 dofs per point): >> >>> Orientation range: [-3, 3) >> >>> Symmetry for stratum value -1 (0 dofs per point): no symmetries >> >>> ``` >> >>> >> >>> The output suggests that there are 12 degrees of freedom in each >> >>> triangle. That would mean the coordinate field is discontinuous >> across cell >> >>> boundaries. Can someone explain what's going on? I tried reading the >> .msh >> >>> file but it's totally inscrutable to me. I'm happy to RTFSC if someone >> >>> points me in the right direction. Matt tells me that the coordinate >> field >> >>> should only be discontinuous if the mesh is periodic, but this mesh >> >>> shouldn't be periodic. >> >>> >> >> >> >> >> >> -- >> >> What most experimenters take for granted before they begin their >> >> experiments is infinitely more interesting than any results to which >> their >> >> experiments lead. >> >> -- Norbert Wiener >> >> >> >> https://www.cse.buffalo.edu/~knepley/ >> >> < >> https://urldefense.com/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!K-Hz7m0Vt54!hL9WLR51ieyHFZx8N9AjhDwJCRpvmQto9CL1XOTkkAxFfUbtsabHuBDOATnWyP6lQszhA2go23tjRg$ >> > >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid > Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
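For reference, a minimal self-contained sketch of the kind of check discussed in this thread: read a second-order .msh file into a DMPlex, count cells and vertices, and view the coordinate section, whose per-cell dof counts (2 components x 6 quadratic nodes = 12 per triangle, as in the output quoted above) show the discontinuous coordinate layout. The file name "disk.msh" is a placeholder, not the actual mesh from the thread, and the error handling assumes a recent PETSc with PetscCall():

#include <petscdmplex.h>

int main(int argc, char **argv)
{
  DM           dm;
  PetscSection csec;
  PetscInt     cStart, cEnd, vStart, vEnd;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* "disk.msh" stands in for a 2nd-order mesh written by gmsh */
  PetscCall(DMPlexCreateGmshFromFile(PETSC_COMM_WORLD, "disk.msh", PETSC_TRUE, &dm));
  PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd)); /* cells    */
  PetscCall(DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd));  /* vertices */
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "cells %" PetscInt_FMT "  vertices %" PetscInt_FMT "\n", cEnd - cStart, vEnd - vStart));
  /* As discussed above, quadratic simplex coordinates currently come in as a
     discontinuous space, so the dofs sit on the cells (12 per triangle in 2D). */
  PetscCall(DMGetCoordinateSection(dm, &csec));
  PetscCall(PetscSectionView(csec, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(DMDestroy(&dm));
  PetscCall(PetscFinalize());
  return 0;
}

Viewing the section this way should reproduce the "dim 12" entries per cell quoted earlier in the thread.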
URL: From dave.mayhem23 at gmail.com Thu Jan 12 20:10:38 2023 From: dave.mayhem23 at gmail.com (Dave May) Date: Thu, 12 Jan 2023 18:10:38 -0800 Subject: [petsc-users] coordinate degrees of freedom for 2nd-order gmsh mesh In-Reply-To: <9CFC1892-7D2F-4E6A-A46D-A9433D195CAE@mcmaster.ca> References: <87fscfpbv3.fsf@jedbrown.org> <9CFC1892-7D2F-4E6A-A46D-A9433D195CAE@mcmaster.ca> Message-ID: On Thu 12. Jan 2023 at 17:58, Blaise Bourdin wrote: > Out of curiosity, what is the rationale for _reading_ high order gmsh > meshes? > GMSH can use a CAD engine like OpenCascade. This provides geometric representations via things like BSplines. Such geometric representation are not exposed to the users application code, nor are they embedded in any mesh format GMSH emits. The next best thing is to use a high order representation of the mesh geometry and project the CAD geometry (say a BSpline) into this higher order function space. The projection of the geometry is a quantity that can be described with the .msh format. Is it so that one can write data back in native gmsh format? > No. Cheers, Dave > > Regards, > Blaise > > > On Jan 12, 2023, at 7:13 PM, Matthew Knepley wrote: > > On Thu, Jan 12, 2023 at 1:33 PM Jed Brown wrote: > >> It's confusing, but this line makes high order simplices always read as >> discontinuous coordinate spaces. I would love if someone would revisit >> that, perhaps also using DMPlexSetIsoperiodicFaceSF(), > > > Perhaps as a switch, but there is no way I am getting rid of the current > periodicity. As we have discussed before, breaking the topological relation > is a non-starter for me. > > It does look like higher order Gmsh does read as DG. We can just project > that to CG for non-periodic stuff. > > Thanks, > > Matt > > which should simplify the code and avoid the confusing cell coordinates >> pattern. Sadly, I don't have time to dive in. >> >> >> https://gitlab.com/petsc/petsc/-/commit/066ea43f7f75752f012be6cd06b6107ebe84cc6d#3616cad8148970af5b97293c49492ff893e25b59_1552_1724 >> >> "Daniel R. Shapero" writes: >> >> > Sorry either your mail system or mine prevented me from attaching the >> file, >> > so I put it on pastebin: >> > https://pastebin.com/awFpc1Js >> > >> > On Wed, Jan 11, 2023 at 4:54 PM Matthew Knepley >> wrote: >> > >> >> Can you send the .msh file? I still have not installed Gmsh :) >> >> >> >> Thanks, >> >> >> >> Matt >> >> >> >> On Wed, Jan 11, 2023 at 2:43 PM Daniel R. Shapero >> wrote: >> >> >> >>> Hi all -- I'm trying to read in 2nd-order / piecewise quadratic meshes >> >>> that are generated by gmsh and I don't understand how the coordinates >> are >> >>> stored in the plex. I've been discussing this with Matt Knepley here >> >>> < >> https://urldefense.com/v3/__https://github.com/firedrakeproject/firedrake/issues/982__;!!K-Hz7m0Vt54!hL9WLR51ieyHFZx8N9AjhDwJCRpvmQto9CL1XOTkkAxFfUbtsabHuBDOATnWyP6lQszhA2gOStva7A$ >> > >> >>> as it pertains to Firedrake but I think this is more an issue at the >> PETSc >> >>> level. >> >>> >> >>> This code >> >>> < >> https://urldefense.com/v3/__https://gist.github.com/danshapero/a140daaf951ba58c48285ec29f5973cc__;!!K-Hz7m0Vt54!hL9WLR51ieyHFZx8N9AjhDwJCRpvmQto9CL1XOTkkAxFfUbtsabHuBDOATnWyP6lQszhA2hho2eD1g$ >> > >> >>> uses gmsh to generate a 2nd-order mesh of the unit disk, read it into >> a >> >>> DMPlex, print out the number of cells in each depth stratum, and >> finally >> >>> print a view of the coordinate DM's section. The resulting mesh has 64 >> >>> triangles, 104 edges, and 41 vertices. 
For 2nd-order meshes, I'd >> expected >> >>> there to be 2 degrees of freedom at each node and 2 at each edge. The >> >>> output is: >> >>> >> >>> ``` >> >>> Depth strata: [(64, 105), (105, 209), (0, 64)] >> >>> >> >>> PetscSection Object: 1 MPI process >> >>> type not yet set >> >>> 1 fields >> >>> field 0 with 2 components >> >>> Process 0: >> >>> ( 0) dim 12 offset 0 >> >>> ( 1) dim 12 offset 12 >> >>> ( 2) dim 12 offset 24 >> >>> ... >> >>> ( 62) dim 12 offset 744 >> >>> ( 63) dim 12 offset 756 >> >>> ( 64) dim 0 offset 768 >> >>> ( 65) dim 0 offset 768 >> >>> ... >> >>> ( 207) dim 0 offset 768 >> >>> ( 208) dim 0 offset 768 >> >>> PetscSectionSym Object: 1 MPI process >> >>> type: label >> >>> Label 'depth' >> >>> Symmetry for stratum value 0 (0 dofs per point): no symmetries >> >>> Symmetry for stratum value 1 (0 dofs per point): no symmetries >> >>> Symmetry for stratum value 2 (12 dofs per point): >> >>> Orientation range: [-3, 3) >> >>> Symmetry for stratum value -1 (0 dofs per point): no symmetries >> >>> ``` >> >>> >> >>> The output suggests that there are 12 degrees of freedom in each >> >>> triangle. That would mean the coordinate field is discontinuous >> across cell >> >>> boundaries. Can someone explain what's going on? I tried reading the >> .msh >> >>> file but it's totally inscrutable to me. I'm happy to RTFSC if someone >> >>> points me in the right direction. Matt tells me that the coordinate >> field >> >>> should only be discontinuous if the mesh is periodic, but this mesh >> >>> shouldn't be periodic. >> >>> >> >> >> >> >> >> -- >> >> What most experimenters take for granted before they begin their >> >> experiments is infinitely more interesting than any results to which >> their >> >> experiments lead. >> >> -- Norbert Wiener >> >> >> >> https://www.cse.buffalo.edu/~knepley/ >> >> < >> https://urldefense.com/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!K-Hz7m0Vt54!hL9WLR51ieyHFZx8N9AjhDwJCRpvmQto9CL1XOTkkAxFfUbtsabHuBDOATnWyP6lQszhA2go23tjRg$ >> > >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid > Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, > > Ontario L8S 4K1, Canada > > > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 12 20:14:34 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 12 Jan 2023 16:14:34 -1000 Subject: [petsc-users] coordinate degrees of freedom for 2nd-order gmsh mesh In-Reply-To: References: <87fscfpbv3.fsf@jedbrown.org> <9CFC1892-7D2F-4E6A-A46D-A9433D195CAE@mcmaster.ca> Message-ID: On Thu, Jan 12, 2023 at 4:10 PM Dave May wrote: > On Thu 12. Jan 2023 at 17:58, Blaise Bourdin wrote: > >> Out of curiosity, what is the rationale for _reading_ high order gmsh >> meshes? >> > > GMSH can use a CAD engine like OpenCascade. This provides geometric > representations via things like BSplines. Such geometric representation are > not exposed to the users application code, nor are they embedded in any > mesh format GMSH emits. 
The next best thing is to use a high order > representation of the mesh geometry and project the CAD geometry (say a > BSpline) into this higher order function space. The projection of the > geometry is a quantity that can be described with the .msh format. > Note that PETSc can directly read CAD files now and mesh over them. > Is it so that one can write data back in native gmsh format? >> > > No. > > Cheers, > Dave > >> > > > >> Regards, >> Blaise >> >> >> On Jan 12, 2023, at 7:13 PM, Matthew Knepley wrote: >> >> On Thu, Jan 12, 2023 at 1:33 PM Jed Brown wrote: >> >>> It's confusing, but this line makes high order simplices always read as >>> discontinuous coordinate spaces. I would love if someone would revisit >>> that, perhaps also using DMPlexSetIsoperiodicFaceSF(), >> >> >> Perhaps as a switch, but there is no way I am getting rid of the current >> periodicity. As we have discussed before, breaking the topological relation >> is a non-starter for me. >> >> It does look like higher order Gmsh does read as DG. We can just project >> that to CG for non-periodic stuff. >> >> Thanks, >> >> Matt >> >> which should simplify the code and avoid the confusing cell coordinates >>> pattern. Sadly, I don't have time to dive in. >>> >>> >>> https://gitlab.com/petsc/petsc/-/commit/066ea43f7f75752f012be6cd06b6107ebe84cc6d#3616cad8148970af5b97293c49492ff893e25b59_1552_1724 >>> >>> "Daniel R. Shapero" writes: >>> >>> > Sorry either your mail system or mine prevented me from attaching the >>> file, >>> > so I put it on pastebin: >>> > https://pastebin.com/awFpc1Js >>> > >>> > On Wed, Jan 11, 2023 at 4:54 PM Matthew Knepley >>> wrote: >>> > >>> >> Can you send the .msh file? I still have not installed Gmsh :) >>> >> >>> >> Thanks, >>> >> >>> >> Matt >>> >> >>> >> On Wed, Jan 11, 2023 at 2:43 PM Daniel R. Shapero >>> wrote: >>> >> >>> >>> Hi all -- I'm trying to read in 2nd-order / piecewise quadratic >>> meshes >>> >>> that are generated by gmsh and I don't understand how the >>> coordinates are >>> >>> stored in the plex. I've been discussing this with Matt Knepley here >>> >>> < >>> https://urldefense.com/v3/__https://github.com/firedrakeproject/firedrake/issues/982__;!!K-Hz7m0Vt54!hL9WLR51ieyHFZx8N9AjhDwJCRpvmQto9CL1XOTkkAxFfUbtsabHuBDOATnWyP6lQszhA2gOStva7A$ >>> > >>> >>> as it pertains to Firedrake but I think this is more an issue at the >>> PETSc >>> >>> level. >>> >>> >>> >>> This code >>> >>> < >>> https://urldefense.com/v3/__https://gist.github.com/danshapero/a140daaf951ba58c48285ec29f5973cc__;!!K-Hz7m0Vt54!hL9WLR51ieyHFZx8N9AjhDwJCRpvmQto9CL1XOTkkAxFfUbtsabHuBDOATnWyP6lQszhA2hho2eD1g$ >>> > >>> >>> uses gmsh to generate a 2nd-order mesh of the unit disk, read it >>> into a >>> >>> DMPlex, print out the number of cells in each depth stratum, and >>> finally >>> >>> print a view of the coordinate DM's section. The resulting mesh has >>> 64 >>> >>> triangles, 104 edges, and 41 vertices. For 2nd-order meshes, I'd >>> expected >>> >>> there to be 2 degrees of freedom at each node and 2 at each edge. The >>> >>> output is: >>> >>> >>> >>> ``` >>> >>> Depth strata: [(64, 105), (105, 209), (0, 64)] >>> >>> >>> >>> PetscSection Object: 1 MPI process >>> >>> type not yet set >>> >>> 1 fields >>> >>> field 0 with 2 components >>> >>> Process 0: >>> >>> ( 0) dim 12 offset 0 >>> >>> ( 1) dim 12 offset 12 >>> >>> ( 2) dim 12 offset 24 >>> >>> ... >>> >>> ( 62) dim 12 offset 744 >>> >>> ( 63) dim 12 offset 756 >>> >>> ( 64) dim 0 offset 768 >>> >>> ( 65) dim 0 offset 768 >>> >>> ... 
>>> >>> ( 207) dim 0 offset 768 >>> >>> ( 208) dim 0 offset 768 >>> >>> PetscSectionSym Object: 1 MPI process >>> >>> type: label >>> >>> Label 'depth' >>> >>> Symmetry for stratum value 0 (0 dofs per point): no symmetries >>> >>> Symmetry for stratum value 1 (0 dofs per point): no symmetries >>> >>> Symmetry for stratum value 2 (12 dofs per point): >>> >>> Orientation range: [-3, 3) >>> >>> Symmetry for stratum value -1 (0 dofs per point): no symmetries >>> >>> ``` >>> >>> >>> >>> The output suggests that there are 12 degrees of freedom in each >>> >>> triangle. That would mean the coordinate field is discontinuous >>> across cell >>> >>> boundaries. Can someone explain what's going on? I tried reading the >>> .msh >>> >>> file but it's totally inscrutable to me. I'm happy to RTFSC if >>> someone >>> >>> points me in the right direction. Matt tells me that the coordinate >>> field >>> >>> should only be discontinuous if the mesh is periodic, but this mesh >>> >>> shouldn't be periodic. >>> >>> >>> >> >>> >> >>> >> -- >>> >> What most experimenters take for granted before they begin their >>> >> experiments is infinitely more interesting than any results to which >>> their >>> >> experiments lead. >>> >> -- Norbert Wiener >>> >> >>> >> https://www.cse.buffalo.edu/~knepley/ >>> >> < >>> https://urldefense.com/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!K-Hz7m0Vt54!hL9WLR51ieyHFZx8N9AjhDwJCRpvmQto9CL1XOTkkAxFfUbtsabHuBDOATnWyP6lQszhA2go23tjRg$ >>> > >>> >> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> ? >> Canada Research Chair in Mathematical and Computational Aspects of Solid >> Mechanics (Tier 1) >> Professor, Department of Mathematics & Statistics >> Hamilton Hall room 409A, McMaster University >> 1280 Main Street West, Hamilton, >> >> Ontario L8S 4K1, Canada >> >> >> https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Jan 12 20:29:42 2023 From: jed at jedbrown.org (Jed Brown) Date: Thu, 12 Jan 2023 19:29:42 -0700 Subject: [petsc-users] coordinate degrees of freedom for 2nd-order gmsh mesh In-Reply-To: References: <87fscfpbv3.fsf@jedbrown.org> <9CFC1892-7D2F-4E6A-A46D-A9433D195CAE@mcmaster.ca> Message-ID: <877cxrp3op.fsf@jedbrown.org> Dave May writes: > On Thu 12. Jan 2023 at 17:58, Blaise Bourdin wrote: > >> Out of curiosity, what is the rationale for _reading_ high order gmsh >> meshes? >> > > GMSH can use a CAD engine like OpenCascade. This provides geometric > representations via things like BSplines. Such geometric representation are > not exposed to the users application code, nor are they embedded in any > mesh format GMSH emits. The next best thing is to use a high order > representation of the mesh geometry and project the CAD geometry (say a > BSpline) into this higher order function space. The projection of the > geometry is a quantity that can be described with the .msh format. Adding to this, efficient methods for volumes with concave surfaces *must* use at least quadratic geometry. 
See Figure 5, where "p-refinement with linear geometry" causes anti-convergence (due the spurious stress singularities from the linear geometry, visible in Figure 4) while p-refinement with quadratic geometry is vastly more efficient despite the physical stress singularities that prevent exponential convergence. https://arxiv.org/pdf/2204.01722.pdf From junchao.zhang at gmail.com Thu Jan 12 22:55:43 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 12 Jan 2023 22:55:43 -0600 Subject: [petsc-users] PetscSF Fortran interface In-Reply-To: References: Message-ID: On Thu, Jan 12, 2023 at 1:28 PM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Junchao > > Going back to this merge request. I'm not sure I follow exactly the usage > described in the commit history. I have a c prototype of what I am trying > to do which is > > PetscSectionCreate(PETSC_COMM_WORLD, &leafSection); > PetscSFDistributeSection(redistributionSF, filteredSection_local, &remoteOffsets, > leafSection); > PetscSFCreateSectionSF(redistributionSF, filteredSection_local, > remoteOffsets, leafSection, &redistributionSF_dof); > > But something seems unclear with the usage in fortran around the > remoteoffsets. Do I have to insert the CreateRemoteOffsetsF90 like so? Any > clarification would be greatly appreciated. > > call PetscSFDistributeSectionF90(distributionSF, section_filt_l, > remoteoffsets, leafSection, ierr) > call PetscSFCreateRemoteOffsetsf90(distributionSF, section_filt_l, > leafSection, remoteoffsets, ierr ) > call PetscSFCreateSectionSFF90(distributionSF, section_filt_l, > remoteoffsets, leafSection, distributionSF_dof, ierr) > > Hi, Nicholas, Reading through comments at https://gitlab.com/petsc/petsc/-/merge_requests/5386#note_1022942470, I feel it should look like PetscInt, pointer :: remoteOffsets(:) call PetscSFDistributeSectionF90(distributionSF, section_filt_l, remoteoffsets, leafSection, ierr) // allocate remoteoffsets call PetscSFCreateSectionSFF90(distributionSF, section_filt_l, remoteoffsets, leafSection, distributionSF_dof, ierr) call PetscIntArray1dDestroyF90(remoteOffsets,ierr) // free remoteoffsets when not needed Could you try it? Sincerely > Nicholas > > On Tue, Jan 10, 2023 at 4:42 PM Junchao Zhang > wrote: > >> Hi, Nicholas, >> It seems we have implemented it, but with another name, >> PetscSFCreateSectionSFF90, see >> https://gitlab.com/petsc/petsc/-/merge_requests/5386 >> Try it to see if it works! >> >> --Junchao Zhang >> >> >> On Tue, Jan 10, 2023 at 11:45 AM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Er to be honest I still can't get my stub to compile properly, and I >>> don't know how to go about making a merge request. But here is what I am >>> attempting right now. Let me know how best to proceed >>> >>> >>> Its not exactly clear to me how to setup up the remote offset properly. >>> >>> in src/vec/is/sf/interface/ftn-custom/zsf.c >>> >>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection >>> *rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >>> { >>> >>> int * remoteOffsets; >>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) >>> &remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; >>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, >>> &remoteOffsets,*leafSection,*sectionSF);if (*ierr) return; >>> >>> } >>> >>> This is the sticking point. 
>>> >>> Sincerely >>> Nicholas >>> >>> >>> On Tue, Jan 10, 2023 at 12:38 PM Junchao Zhang >>> wrote: >>> >>>> Hi, Nicholas, >>>> Could you make a merge request to PETSc and then our Fortran experts >>>> can comment on your MR? >>>> Thanks. >>>> >>>> --Junchao Zhang >>>> >>>> >>>> On Tue, Jan 10, 2023 at 11:10 AM Nicholas Arnold-Medabalimi < >>>> narnoldm at umich.edu> wrote: >>>> >>>>> Hi Junchao >>>>> >>>>> I think I'm almost there, but I could use some insight into how to use >>>>> the PETSC_F90_2PTR_PROTO and F90Array1dAccess for the remoteOffset >>>>> parameter input so if another function comes up, I can add it myself >>>>> without wasting your time. >>>>> I am very grateful for your help and time. >>>>> >>>>> Sincerely >>>>> Nicholas >>>>> >>>>> On Tue, Jan 10, 2023 at 10:55 AM Junchao Zhang < >>>>> junchao.zhang at gmail.com> wrote: >>>>> >>>>>> Hi, Nicholas, >>>>>> I am not a fortran guy, but I will try to add >>>>>> petscsfcreatesectionsf. >>>>>> >>>>>> Thanks. >>>>>> --Junchao Zhang >>>>>> >>>>>> >>>>>> On Tue, Jan 10, 2023 at 12:50 AM Nicholas Arnold-Medabalimi < >>>>>> narnoldm at umich.edu> wrote: >>>>>> >>>>>>> I think it should be something like this, but I'm not very fluent in >>>>>>> Fortran C interop syntax. Any advice would be appreciated. Thanks >>>>>>> >>>>>>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection * >>>>>>> rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>>>>>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >>>>>>> { >>>>>>> >>>>>>> int * remoteOffsets; >>>>>>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) & >>>>>>> remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) >>>>>>> return; >>>>>>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, &remoteOffsets,* >>>>>>> leafSection,*sectionSF);if (*ierr) return; >>>>>>> >>>>>>> } >>>>>>> >>>>>>> On Mon, Jan 9, 2023 at 11:41 PM Nicholas Arnold-Medabalimi < >>>>>>> narnoldm at umich.edu> wrote: >>>>>>> >>>>>>>> Hi Junchao >>>>>>>> >>>>>>>> Thanks again for your help in November. I've been using the your >>>>>>>> merge request branch quite heavily. Would it be possible to add a >>>>>>>> petscsfcreatesectionsf interface as well? >>>>>>>> I'm trying to write it myself using your commits as a guide but I >>>>>>>> have been struggling with handling the section parameter properly. >>>>>>>> >>>>>>>> Sincerely >>>>>>>> Nicholas >>>>>>>> >>>>>>>> On Sat, Nov 19, 2022 at 9:44 PM Junchao Zhang < >>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sat, Nov 19, 2022 at 8:05 PM Nicholas Arnold-Medabalimi < >>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>> >>>>>>>>>> Hi >>>>>>>>>> >>>>>>>>>> Thanks, this is awesome. Thanks for the very prompt fix. Just one >>>>>>>>>> question: will the array outputs on the fortran side copies (and need to be >>>>>>>>>> deallocated) or direct access to the dmplex? >>>>>>>>>> >>>>>>>>> Direct access to internal data; no need to deallocate >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Sincerely >>>>>>>>>> Nicholas >>>>>>>>>> >>>>>>>>>> On Sat, Nov 19, 2022 at 8:21 PM Junchao Zhang < >>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>> See this MR, >>>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/5860 >>>>>>>>>>> It is in testing, but you can try branch >>>>>>>>>>> jczhang/add-petscsf-fortran to see if it works for you. >>>>>>>>>>> >>>>>>>>>>> Thanks. 
>>>>>>>>>>> --Junchao Zhang >>>>>>>>>>> >>>>>>>>>>> On Sat, Nov 19, 2022 at 4:16 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Junchao >>>>>>>>>>>> >>>>>>>>>>>> Thanks. I was wondering if there is any update on this. I may >>>>>>>>>>>> write a small interface for those two routines myself in the interim but >>>>>>>>>>>> I'd appreciate any insight you have. >>>>>>>>>>>> >>>>>>>>>>>> Sincerely >>>>>>>>>>>> Nicholas >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Nov 16, 2022 at 10:39 PM Junchao Zhang < >>>>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>>>> I will have a look and get back to you. >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> --Junchao Zhang >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Petsc Users >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm in the process of adding some Petsc for mesh management >>>>>>>>>>>>>> into an existing Fortran Solver. It has been relatively straightforward so >>>>>>>>>>>>>> far but I am running into an issue with using PetscSF routines. Some like >>>>>>>>>>>>>> the PetscSFGetGraph work no problem but a few of my routines require the >>>>>>>>>>>>>> use of PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to >>>>>>>>>>>>>> be in the fortran interface and I just get a linking error. I also don't >>>>>>>>>>>>>> seem to see a PetscSF file in the finclude. Any clarification or assistance >>>>>>>>>>>>>> would be appreciated. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely >>>>>>>>>>>>>> Nicholas >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>>>> University of Michigan >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>> >>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>> University of Michigan >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>> >>>>>>>>>> Ph.D. Candidate >>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>> University of Michigan >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>> >>>>>>>> Ph.D. Candidate >>>>>>>> Computational Aeroscience Lab >>>>>>>> University of Michigan >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nicholas Arnold-Medabalimi >>>>>>> >>>>>>> Ph.D. Candidate >>>>>>> Computational Aeroscience Lab >>>>>>> University of Michigan >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Nicholas Arnold-Medabalimi >>>>> >>>>> Ph.D. Candidate >>>>> Computational Aeroscience Lab >>>>> University of Michigan >>>>> >>>> >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Thu Jan 12 23:10:16 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Fri, 13 Jan 2023 00:10:16 -0500 Subject: [petsc-users] PetscSF Fortran interface In-Reply-To: References: Message-ID: That is what I tried initially however, I get a segmentation fault. 
I can confirm it's due to the remote offsets because if I try and output remoteoffsets between the Distribute Section and Create Section it throws the same segmentation fault. Thanks for the help Nicholas On Thu, Jan 12, 2023 at 11:56 PM Junchao Zhang wrote: > > > On Thu, Jan 12, 2023 at 1:28 PM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Junchao >> >> Going back to this merge request. I'm not sure I follow exactly the usage >> described in the commit history. I have a c prototype of what I am trying >> to do which is >> >> PetscSectionCreate(PETSC_COMM_WORLD, &leafSection); >> PetscSFDistributeSection(redistributionSF, filteredSection_local, &remoteOffsets, >> leafSection); >> PetscSFCreateSectionSF(redistributionSF, filteredSection_local, >> remoteOffsets, leafSection, &redistributionSF_dof); >> >> But something seems unclear with the usage in fortran around the >> remoteoffsets. Do I have to insert the CreateRemoteOffsetsF90 like so? Any >> clarification would be greatly appreciated. >> >> call PetscSFDistributeSectionF90(distributionSF, section_filt_l, >> remoteoffsets, leafSection, ierr) >> call PetscSFCreateRemoteOffsetsf90(distributionSF, section_filt_l, >> leafSection, remoteoffsets, ierr ) >> call PetscSFCreateSectionSFF90(distributionSF, section_filt_l, >> remoteoffsets, leafSection, distributionSF_dof, ierr) >> >> Hi, Nicholas, > Reading through comments at > https://gitlab.com/petsc/petsc/-/merge_requests/5386#note_1022942470, I > feel it should look like > > PetscInt, pointer :: remoteOffsets(:) > call PetscSFDistributeSectionF90(distributionSF, section_filt_l, > remoteoffsets, leafSection, ierr) // allocate remoteoffsets > call PetscSFCreateSectionSFF90(distributionSF, section_filt_l, > remoteoffsets, leafSection, distributionSF_dof, ierr) > call PetscIntArray1dDestroyF90(remoteOffsets,ierr) // free remoteoffsets > when not needed > > Could you try it? > > > Sincerely >> Nicholas >> >> On Tue, Jan 10, 2023 at 4:42 PM Junchao Zhang >> wrote: >> >>> Hi, Nicholas, >>> It seems we have implemented it, but with another name, >>> PetscSFCreateSectionSFF90, see >>> https://gitlab.com/petsc/petsc/-/merge_requests/5386 >>> Try it to see if it works! >>> >>> --Junchao Zhang >>> >>> >>> On Tue, Jan 10, 2023 at 11:45 AM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> Er to be honest I still can't get my stub to compile properly, and I >>>> don't know how to go about making a merge request. But here is what I am >>>> attempting right now. Let me know how best to proceed >>>> >>>> >>>> Its not exactly clear to me how to setup up the remote offset properly. >>>> >>>> in src/vec/is/sf/interface/ftn-custom/zsf.c >>>> >>>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection >>>> *rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >>>> { >>>> >>>> int * remoteOffsets; >>>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) >>>> &remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; >>>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, >>>> &remoteOffsets,*leafSection,*sectionSF);if (*ierr) return; >>>> >>>> } >>>> >>>> This is the sticking point. >>>> >>>> Sincerely >>>> Nicholas >>>> >>>> >>>> On Tue, Jan 10, 2023 at 12:38 PM Junchao Zhang >>>> wrote: >>>> >>>>> Hi, Nicholas, >>>>> Could you make a merge request to PETSc and then our Fortran experts >>>>> can comment on your MR? >>>>> Thanks. 
>>>>> >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Tue, Jan 10, 2023 at 11:10 AM Nicholas Arnold-Medabalimi < >>>>> narnoldm at umich.edu> wrote: >>>>> >>>>>> Hi Junchao >>>>>> >>>>>> I think I'm almost there, but I could use some insight into how to >>>>>> use the PETSC_F90_2PTR_PROTO and F90Array1dAccess for the remoteOffset >>>>>> parameter input so if another function comes up, I can add it myself >>>>>> without wasting your time. >>>>>> I am very grateful for your help and time. >>>>>> >>>>>> Sincerely >>>>>> Nicholas >>>>>> >>>>>> On Tue, Jan 10, 2023 at 10:55 AM Junchao Zhang < >>>>>> junchao.zhang at gmail.com> wrote: >>>>>> >>>>>>> Hi, Nicholas, >>>>>>> I am not a fortran guy, but I will try to add >>>>>>> petscsfcreatesectionsf. >>>>>>> >>>>>>> Thanks. >>>>>>> --Junchao Zhang >>>>>>> >>>>>>> >>>>>>> On Tue, Jan 10, 2023 at 12:50 AM Nicholas Arnold-Medabalimi < >>>>>>> narnoldm at umich.edu> wrote: >>>>>>> >>>>>>>> I think it should be something like this, but I'm not very fluent >>>>>>>> in Fortran C interop syntax. Any advice would be appreciated. Thanks >>>>>>>> >>>>>>>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection >>>>>>>> *rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>>>>>>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO >>>>>>>> (remoteoffsetsd)) >>>>>>>> { >>>>>>>> >>>>>>>> int * remoteOffsets; >>>>>>>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) & >>>>>>>> remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) >>>>>>>> return; >>>>>>>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, &remoteOffsets,* >>>>>>>> leafSection,*sectionSF);if (*ierr) return; >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> On Mon, Jan 9, 2023 at 11:41 PM Nicholas Arnold-Medabalimi < >>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>> >>>>>>>>> Hi Junchao >>>>>>>>> >>>>>>>>> Thanks again for your help in November. I've been using the your >>>>>>>>> merge request branch quite heavily. Would it be possible to add a >>>>>>>>> petscsfcreatesectionsf interface as well? >>>>>>>>> I'm trying to write it myself using your commits as a guide but I >>>>>>>>> have been struggling with handling the section parameter properly. >>>>>>>>> >>>>>>>>> Sincerely >>>>>>>>> Nicholas >>>>>>>>> >>>>>>>>> On Sat, Nov 19, 2022 at 9:44 PM Junchao Zhang < >>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sat, Nov 19, 2022 at 8:05 PM Nicholas Arnold-Medabalimi < >>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>> >>>>>>>>>>> Hi >>>>>>>>>>> >>>>>>>>>>> Thanks, this is awesome. Thanks for the very prompt fix. Just >>>>>>>>>>> one question: will the array outputs on the fortran side copies (and need >>>>>>>>>>> to be deallocated) or direct access to the dmplex? >>>>>>>>>>> >>>>>>>>>> Direct access to internal data; no need to deallocate >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Sincerely >>>>>>>>>>> Nicholas >>>>>>>>>>> >>>>>>>>>>> On Sat, Nov 19, 2022 at 8:21 PM Junchao Zhang < >>>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>>> See this MR, >>>>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/5860 >>>>>>>>>>>> It is in testing, but you can try branch >>>>>>>>>>>> jczhang/add-petscsf-fortran to see if it works for you. >>>>>>>>>>>> >>>>>>>>>>>> Thanks. 
>>>>>>>>>>>> --Junchao Zhang >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Nov 19, 2022 at 4:16 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Junchao >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. I was wondering if there is any update on this. I may >>>>>>>>>>>>> write a small interface for those two routines myself in the interim but >>>>>>>>>>>>> I'd appreciate any insight you have. >>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely >>>>>>>>>>>>> Nicholas >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Nov 16, 2022 at 10:39 PM Junchao Zhang < >>>>>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>>>>> I will have a look and get back to you. >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> --Junchao Zhang >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Petsc Users >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm in the process of adding some Petsc for mesh management >>>>>>>>>>>>>>> into an existing Fortran Solver. It has been relatively straightforward so >>>>>>>>>>>>>>> far but I am running into an issue with using PetscSF routines. Some like >>>>>>>>>>>>>>> the PetscSFGetGraph work no problem but a few of my routines require the >>>>>>>>>>>>>>> use of PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to >>>>>>>>>>>>>>> be in the fortran interface and I just get a linking error. I also don't >>>>>>>>>>>>>>> seem to see a PetscSF file in the finclude. Any clarification or assistance >>>>>>>>>>>>>>> would be appreciated. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sincerely >>>>>>>>>>>>>>> Nicholas >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>>>>> University of Michigan >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>>> >>>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>>> University of Michigan >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>> >>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>> University of Michigan >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>> >>>>>>>>> Ph.D. Candidate >>>>>>>>> Computational Aeroscience Lab >>>>>>>>> University of Michigan >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>> >>>>>>>> Ph.D. Candidate >>>>>>>> Computational Aeroscience Lab >>>>>>>> University of Michigan >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Nicholas Arnold-Medabalimi >>>>>> >>>>>> Ph.D. Candidate >>>>>> Computational Aeroscience Lab >>>>>> University of Michigan >>>>>> >>>>> >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Thu Jan 12 23:29:45 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 12 Jan 2023 23:29:45 -0600 Subject: [petsc-users] PetscSF Fortran interface In-Reply-To: References: Message-ID: How about this? PetscInt, pointer :: remoteOffsets(:) call PetscSFCreateRemoteOffsetsf90(distributionSF, section_filt_l, leafSection, remoteoffsets, ierr ) call PetscSFCreateSectionSFF90(distributionSF, section_filt_l, remoteoffsets, leafSection, distributionSF_dof, ierr) call PetscIntArray1dDestroyF90(remoteOffsets,ierr) // free remoteoffsets when not needed On Thu, Jan 12, 2023 at 11:11 PM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > That is what I tried initially however, I get a segmentation fault. I can > confirm it's due to the remote offsets because if I try and output > remoteoffsets between the Distribute Section and Create Section it throws > the same segmentation fault. > > Thanks for the help > Nicholas > > On Thu, Jan 12, 2023 at 11:56 PM Junchao Zhang > wrote: > >> >> >> On Thu, Jan 12, 2023 at 1:28 PM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Hi Junchao >>> >>> Going back to this merge request. I'm not sure I follow exactly the >>> usage described in the commit history. I have a c prototype of what I am >>> trying to do which is >>> >>> PetscSectionCreate(PETSC_COMM_WORLD, &leafSection); >>> PetscSFDistributeSection(redistributionSF, filteredSection_local, &remoteOffsets, >>> leafSection); >>> PetscSFCreateSectionSF(redistributionSF, filteredSection_local, >>> remoteOffsets, leafSection, &redistributionSF_dof); >>> >>> But something seems unclear with the usage in fortran around the >>> remoteoffsets. Do I have to insert the CreateRemoteOffsetsF90 like so? Any >>> clarification would be greatly appreciated. >>> >>> call PetscSFDistributeSectionF90(distributionSF, section_filt_l, >>> remoteoffsets, leafSection, ierr) >>> call PetscSFCreateRemoteOffsetsf90(distributionSF, section_filt_l, >>> leafSection, remoteoffsets, ierr ) >>> call PetscSFCreateSectionSFF90(distributionSF, section_filt_l, >>> remoteoffsets, leafSection, distributionSF_dof, ierr) >>> >>> Hi, Nicholas, >> Reading through comments at >> https://gitlab.com/petsc/petsc/-/merge_requests/5386#note_1022942470, I >> feel it should look like >> >> PetscInt, pointer :: remoteOffsets(:) >> call PetscSFDistributeSectionF90(distributionSF, section_filt_l, >> remoteoffsets, leafSection, ierr) // allocate remoteoffsets >> call PetscSFCreateSectionSFF90(distributionSF, section_filt_l, >> remoteoffsets, leafSection, distributionSF_dof, ierr) >> call PetscIntArray1dDestroyF90(remoteOffsets,ierr) // free remoteoffsets >> when not needed >> >> Could you try it? >> >> >> Sincerely >>> Nicholas >>> >>> On Tue, Jan 10, 2023 at 4:42 PM Junchao Zhang >>> wrote: >>> >>>> Hi, Nicholas, >>>> It seems we have implemented it, but with another name, >>>> PetscSFCreateSectionSFF90, see >>>> https://gitlab.com/petsc/petsc/-/merge_requests/5386 >>>> Try it to see if it works! >>>> >>>> --Junchao Zhang >>>> >>>> >>>> On Tue, Jan 10, 2023 at 11:45 AM Nicholas Arnold-Medabalimi < >>>> narnoldm at umich.edu> wrote: >>>> >>>>> Er to be honest I still can't get my stub to compile properly, and I >>>>> don't know how to go about making a merge request. But here is what I am >>>>> attempting right now. Let me know how best to proceed >>>>> >>>>> >>>>> Its not exactly clear to me how to setup up the remote offset >>>>> properly. 
>>>>> >>>>> in src/vec/is/sf/interface/ftn-custom/zsf.c >>>>> >>>>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection >>>>> *rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, >>>>> PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd)) >>>>> { >>>>> >>>>> int * remoteOffsets; >>>>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) >>>>> &remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) return; >>>>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, >>>>> &remoteOffsets,*leafSection,*sectionSF);if (*ierr) return; >>>>> >>>>> } >>>>> >>>>> This is the sticking point. >>>>> >>>>> Sincerely >>>>> Nicholas >>>>> >>>>> >>>>> On Tue, Jan 10, 2023 at 12:38 PM Junchao Zhang < >>>>> junchao.zhang at gmail.com> wrote: >>>>> >>>>>> Hi, Nicholas, >>>>>> Could you make a merge request to PETSc and then our Fortran >>>>>> experts can comment on your MR? >>>>>> Thanks. >>>>>> >>>>>> --Junchao Zhang >>>>>> >>>>>> >>>>>> On Tue, Jan 10, 2023 at 11:10 AM Nicholas Arnold-Medabalimi < >>>>>> narnoldm at umich.edu> wrote: >>>>>> >>>>>>> Hi Junchao >>>>>>> >>>>>>> I think I'm almost there, but I could use some insight into how to >>>>>>> use the PETSC_F90_2PTR_PROTO and F90Array1dAccess for the remoteOffset >>>>>>> parameter input so if another function comes up, I can add it myself >>>>>>> without wasting your time. >>>>>>> I am very grateful for your help and time. >>>>>>> >>>>>>> Sincerely >>>>>>> Nicholas >>>>>>> >>>>>>> On Tue, Jan 10, 2023 at 10:55 AM Junchao Zhang < >>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>> >>>>>>>> Hi, Nicholas, >>>>>>>> I am not a fortran guy, but I will try to add >>>>>>>> petscsfcreatesectionsf. >>>>>>>> >>>>>>>> Thanks. >>>>>>>> --Junchao Zhang >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jan 10, 2023 at 12:50 AM Nicholas Arnold-Medabalimi < >>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>> >>>>>>>>> I think it should be something like this, but I'm not very fluent >>>>>>>>> in Fortran C interop syntax. Any advice would be appreciated. Thanks >>>>>>>>> >>>>>>>>> PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection >>>>>>>>> *rootSection, F90Array1d *aremoteOffsets, PetscSection * >>>>>>>>> leafSection, PetscSF *sectionSF, int * ierr PETSC_F90_2PTR_PROTO >>>>>>>>> (remoteoffsetsd)) >>>>>>>>> { >>>>>>>>> >>>>>>>>> int * remoteOffsets; >>>>>>>>> *ierr = F90Array1dAccess(aremoteOffsets, PETSC_INT, (void**) & >>>>>>>>> remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));if (*ierr) >>>>>>>>> return; >>>>>>>>> *ierr = PetscSFCreateSectionSF(*sf,*rootSection, &remoteOffsets, >>>>>>>>> *leafSection,*sectionSF);if (*ierr) return; >>>>>>>>> >>>>>>>>> } >>>>>>>>> >>>>>>>>> On Mon, Jan 9, 2023 at 11:41 PM Nicholas Arnold-Medabalimi < >>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>> >>>>>>>>>> Hi Junchao >>>>>>>>>> >>>>>>>>>> Thanks again for your help in November. I've been using the your >>>>>>>>>> merge request branch quite heavily. Would it be possible to add a >>>>>>>>>> petscsfcreatesectionsf interface as well? >>>>>>>>>> I'm trying to write it myself using your commits as a guide but I >>>>>>>>>> have been struggling with handling the section parameter properly. 
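Aside on the draft C binding quoted above, not part of the quoted messages: in the working C prototype earlier in the thread, PetscSFCreateSectionSF() receives the offsets array itself and the address of the output PetscSF, whereas the draft wrapper passes &remoteOffsets and *sectionSF, and declares the offsets as int rather than PetscInt. A possible adjustment is sketched below; it is untested, the MPIU_INT datatype argument is an assumption following the pattern of other PETSc Fortran wrappers, and the binding that was eventually merged as PetscSFCreateSectionSFF90 may differ.

PETSC_EXTERN void petscsfcreatesectionsf(PetscSF *sf, PetscSection *rootSection, F90Array1d *aremoteOffsets, PetscSection *leafSection, PetscSF *sectionSF, PetscErrorCode *ierr PETSC_F90_2PTR_PROTO(remoteoffsetsd))
{
  PetscInt *remoteOffsets;

  /* pull the raw pointer out of the Fortran array descriptor */
  *ierr = F90Array1dAccess(aremoteOffsets, MPIU_INT, (void **)&remoteOffsets PETSC_F90_2PTR_PARAM(remoteoffsetsd));
  if (*ierr) return;
  /* pass the offsets array itself, and the address into which the new SF handle is written */
  *ierr = PetscSFCreateSectionSF(*sf, *rootSection, remoteOffsets, *leafSection, sectionSF);
}
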
>>>>>>>>>> >>>>>>>>>> Sincerely >>>>>>>>>> Nicholas >>>>>>>>>> >>>>>>>>>> On Sat, Nov 19, 2022 at 9:44 PM Junchao Zhang < >>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sat, Nov 19, 2022 at 8:05 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi >>>>>>>>>>>> >>>>>>>>>>>> Thanks, this is awesome. Thanks for the very prompt fix. Just >>>>>>>>>>>> one question: will the array outputs on the fortran side copies (and need >>>>>>>>>>>> to be deallocated) or direct access to the dmplex? >>>>>>>>>>>> >>>>>>>>>>> Direct access to internal data; no need to deallocate >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Sincerely >>>>>>>>>>>> Nicholas >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Nov 19, 2022 at 8:21 PM Junchao Zhang < >>>>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>>>> See this MR, >>>>>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/5860 >>>>>>>>>>>>> It is in testing, but you can try branch >>>>>>>>>>>>> jczhang/add-petscsf-fortran to see if it works for you. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> --Junchao Zhang >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Nov 19, 2022 at 4:16 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Junchao >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. I was wondering if there is any update on this. I may >>>>>>>>>>>>>> write a small interface for those two routines myself in the interim but >>>>>>>>>>>>>> I'd appreciate any insight you have. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely >>>>>>>>>>>>>> Nicholas >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Nov 16, 2022 at 10:39 PM Junchao Zhang < >>>>>>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, Nicholas, >>>>>>>>>>>>>>> I will have a look and get back to you. >>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>> --Junchao Zhang >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi < >>>>>>>>>>>>>>> narnoldm at umich.edu> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Petsc Users >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm in the process of adding some Petsc for mesh management >>>>>>>>>>>>>>>> into an existing Fortran Solver. It has been relatively straightforward so >>>>>>>>>>>>>>>> far but I am running into an issue with using PetscSF routines. Some like >>>>>>>>>>>>>>>> the PetscSFGetGraph work no problem but a few of my routines require the >>>>>>>>>>>>>>>> use of PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to >>>>>>>>>>>>>>>> be in the fortran interface and I just get a linking error. I also don't >>>>>>>>>>>>>>>> seem to see a PetscSF file in the finclude. Any clarification or assistance >>>>>>>>>>>>>>>> would be appreciated. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sincerely >>>>>>>>>>>>>>>> Nicholas >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>>>>>> University of Michigan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ph.D. Candidate >>>>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>>>> University of Michigan >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>>>> >>>>>>>>>>>> Ph.D. 
Candidate >>>>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>>>> University of Michigan >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>>> >>>>>>>>>> Ph.D. Candidate >>>>>>>>>> Computational Aeroscience Lab >>>>>>>>>> University of Michigan >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nicholas Arnold-Medabalimi >>>>>>>>> >>>>>>>>> Ph.D. Candidate >>>>>>>>> Computational Aeroscience Lab >>>>>>>>> University of Michigan >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nicholas Arnold-Medabalimi >>>>>>> >>>>>>> Ph.D. Candidate >>>>>>> Computational Aeroscience Lab >>>>>>> University of Michigan >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Nicholas Arnold-Medabalimi >>>>> >>>>> Ph.D. Candidate >>>>> Computational Aeroscience Lab >>>>> University of Michigan >>>>> >>>> >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbuerkle at web.de Fri Jan 13 01:49:19 2023 From: mbuerkle at web.de (Marius Buerkle) Date: Fri, 13 Jan 2023 08:49:19 +0100 Subject: [petsc-users] MatConvert changes distribution of local rows Message-ID: An HTML attachment was scrubbed... URL: From pierre at joliv.et Fri Jan 13 01:58:26 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Fri, 13 Jan 2023 08:58:26 +0100 Subject: [petsc-users] MatConvert changes distribution of local rows In-Reply-To: References: Message-ID: <5D47E796-E87B-458B-9DB0-3B46C9BCBF9E@joliv.et> > On 13 Jan 2023, at 8:49 AM, Marius Buerkle wrote: > > Hi, > > I have a matrix A for which I defined the number of local rows per process manually using MatSetSizes. When I use MatConvert to change the matrix type it changes the number of local rows (to what one would get if MatSetSize is called with PETSC_DECIDE for number of local rows), which causes problems when doing MatVec producs and stuff like that. Is there any way to preserve the the number of local rows when using MatConvert? This is most likely a bug, it?s not handled properly in some MatConvert() implementations. Could you please share either the matrix types or a minimal working example? Thanks, Pierre > > Best, > Marius -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbuerkle at web.de Fri Jan 13 02:18:41 2023 From: mbuerkle at web.de (Marius Buerkle) Date: Fri, 13 Jan 2023 09:18:41 +0100 Subject: [petsc-users] MatConvert changes distribution of local rows In-Reply-To: <5D47E796-E87B-458B-9DB0-3B46C9BCBF9E@joliv.et> References: <5D47E796-E87B-458B-9DB0-3B46C9BCBF9E@joliv.et> Message-ID: An HTML attachment was scrubbed... URL: From pierre at joliv.et Fri Jan 13 02:25:35 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Fri, 13 Jan 2023 09:25:35 +0100 Subject: [petsc-users] MatConvert changes distribution of local rows In-Reply-To: References: <5D47E796-E87B-458B-9DB0-3B46C9BCBF9E@joliv.et> Message-ID: <0F9051DD-DEB9-429A-9884-EE784593F676@joliv.et> > On 13 Jan 2023, at 9:18 AM, Marius Buerkle wrote: > > Matrix types is from MATMPIDENSE to MATSCALAPACK, OK, that?s not possible, because PETSc and ScaLAPACK use different distributions for dense matrices. > but I think it happens also for other matrix types IIRC. Which one? Thanks, Pierre > > Gesendet: Freitag, 13. 
Januar 2023 um 16:58 Uhr > Von: "Pierre Jolivet" > An: "Marius Buerkle" > Cc: petsc-users at mcs.anl.gov > Betreff: Re: [petsc-users] MatConvert changes distribution of local rows > > On 13 Jan 2023, at 8:49 AM, Marius Buerkle wrote: > > Hi, > > I have a matrix A for which I defined the number of local rows per process manually using MatSetSizes. When I use MatConvert to change the matrix type it changes the number of local rows (to what one would get if MatSetSize is called with PETSC_DECIDE for number of local rows), which causes problems when doing MatVec producs and stuff like that. Is there any way to preserve the the number of local rows when using MatConvert? > > This is most likely a bug, it?s not handled properly in some MatConvert() implementations. > Could you please share either the matrix types or a minimal working example? > > Thanks, > Pierre > > > Best, > Marius -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbuerkle at web.de Fri Jan 13 02:56:19 2023 From: mbuerkle at web.de (Marius Buerkle) Date: Fri, 13 Jan 2023 09:56:19 +0100 Subject: [petsc-users] MatConvert changes distribution of local rows In-Reply-To: <0F9051DD-DEB9-429A-9884-EE784593F676@joliv.et> References: <5D47E796-E87B-458B-9DB0-3B46C9BCBF9E@joliv.et> <0F9051DD-DEB9-429A-9884-EE784593F676@joliv.et> Message-ID: An HTML attachment was scrubbed... URL: From Eric.Chamberland at giref.ulaval.ca Fri Jan 13 08:23:37 2023 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Fri, 13 Jan 2023 15:23:37 +0100 Subject: [petsc-users] Retreiving a PetscObject Message-ID: Hi, Is it possible with PETSc's API to query an objet by it's name? Is there a "global" database with all PETScObject created? Thanks, Eric -- Eric Chamberland, ing., M. Ing Professionnel de recherche GIREF/Universit? Laval (418) 656-2131 poste 41 22 42 From knepley at gmail.com Fri Jan 13 08:39:58 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 13 Jan 2023 04:39:58 -1000 Subject: [petsc-users] Retreiving a PetscObject In-Reply-To: References: Message-ID: On Fri, Jan 13, 2023 at 4:24 AM Eric Chamberland < Eric.Chamberland at giref.ulaval.ca> wrote: > Hi, > > Is it possible with PETSc's API to query an objet by it's name? > > Is there a "global" database with all PETScObject created? > No, we do not store that information. What would the use case be? Maybe there is another way to do it. Thanks, Matt > Thanks, > > Eric > > -- > Eric Chamberland, ing., M. Ing > Professionnel de recherche > GIREF/Universit? Laval > (418) 656-2131 poste 41 22 42 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Eric.Chamberland at giref.ulaval.ca Fri Jan 13 09:58:45 2023 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Fri, 13 Jan 2023 16:58:45 +0100 Subject: [petsc-users] Retreiving a PetscObject In-Reply-To: References: Message-ID: <9e6a46a0-07ea-1b42-5f00-40307d97b1cf@giref.ulaval.ca> ok, here is what I want to do: I am writing a brand new PC which needs some other PETSc objects like a Vec (for unit partition). We have a text based user file format in which the user can create arbitrary Petsc's Vec or Mat. 
Then, I would like the user to give the prefix of these Vec or Mat as a parameter for the new PC that we would like to configure completely via the "petsc options database". So in our PCApply_XXX we would like to retrieve some specific options which will simply give the Vec prefix to retreive. Thanks, Eric On 2023-01-13 15:39, Matthew Knepley wrote: > On Fri, Jan 13, 2023 at 4:24 AM Eric Chamberland > wrote: > > Hi, > > Is it possible with PETSc's API to query an objet by it's name? > > Is there a "global" database with all PETScObject created? > > > No, we do not store that information. What would the use case be? > Maybe there is another way to do it. > > ? Thanks, > > ? ? Matt > > Thanks, > > Eric > > -- > Eric Chamberland, ing., M. Ing > Professionnel de recherche > GIREF/Universit? Laval > (418) 656-2131 poste 41 22 42 > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -- Eric Chamberland, ing., M. Ing Professionnel de recherche GIREF/Universit? Laval (418) 656-2131 poste 41 22 42 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 13 10:16:38 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 13 Jan 2023 06:16:38 -1000 Subject: [petsc-users] Retreiving a PetscObject In-Reply-To: <9e6a46a0-07ea-1b42-5f00-40307d97b1cf@giref.ulaval.ca> References: <9e6a46a0-07ea-1b42-5f00-40307d97b1cf@giref.ulaval.ca> Message-ID: On Fri, Jan 13, 2023 at 5:58 AM Eric Chamberland < Eric.Chamberland at giref.ulaval.ca> wrote: > ok, here is what I want to do: > > I am writing a brand new PC which needs some other PETSc objects like a > Vec (for unit partition). > > We have a text based user file format in which the user can create > arbitrary Petsc's Vec or Mat. > > Then, I would like the user to give the prefix of these Vec or Mat as a > parameter for the new PC that we would like to configure completely via the > "petsc options database". > > So in our PCApply_XXX we would like to retrieve some specific options > which will simply give the Vec prefix to retreive. > > Okay, so a named vector is created when options are processed, and you want to use that vector by name inside your new PC. 1. You could add that vector to a list you manage at creation time. This means that you are in charge of that list. 2. DM has named vectors ( https://petsc.org/release/docs/manualpages/DM/DMGetNamedGlobalVector/) but not matrices You could probably just extract the DM code to manage your objects. Thanks, Matt > Thanks, > > Eric > > > On 2023-01-13 15:39, Matthew Knepley wrote: > > On Fri, Jan 13, 2023 at 4:24 AM Eric Chamberland < > Eric.Chamberland at giref.ulaval.ca> wrote: > >> Hi, >> >> Is it possible with PETSc's API to query an objet by it's name? >> >> Is there a "global" database with all PETScObject created? >> > > No, we do not store that information. What would the use case be? Maybe > there is another way to do it. > > Thanks, > > Matt > > >> Thanks, >> >> Eric >> >> -- >> Eric Chamberland, ing., M. Ing >> Professionnel de recherche >> GIREF/Universit? Laval >> (418) 656-2131 poste 41 22 42 >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > Eric Chamberland, ing., M. Ing > Professionnel de recherche > GIREF/Universit? Laval > (418) 656-2131 poste 41 22 42 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexis.marboeuf at hotmail.fr Fri Jan 13 15:21:46 2023 From: alexis.marboeuf at hotmail.fr (Alexis Marboeuf) Date: Fri, 13 Jan 2023 21:21:46 +0000 Subject: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) Message-ID: Hi all, In a variational approach of brittle fracture setting, I try to solve a bound constraint minimization problem using TAO. I checkout on the main branch of Petsc. Minimization with respect to the bounded variable (damage) is achieved through the Bounded Newton Trust Region (TAOBNTR). All other TAO parameters are set by default. On a Linux machine, I get the following error with a 4 processors run: [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Nonconforming object sizes [3]PETSC ERROR: Preconditioner number of local rows 1122 does not equal input vector size 1161 [3]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [3]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 GIT Date: 2023-01-04 13:37:04 +0000 [3]PETSC ERROR: /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 --download-mumps=1 --download-chaco=1 --download-exodusii=1 --download-hypre=1 --download-ml=1 --download-triangle --download-scalapack=1 --download-superlu=1 --download-sowing=1 --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 --download-zlib=1 --with-cmake=1 --with-debugging=0 --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c [3]PETSC ERROR: #1 PCApply() at /1/home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 [3]PETSC ERROR: #2 KSP_PCApply() at /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 [3]PETSC ERROR: #3 KSPCGSolve_STCG() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 [3]PETSC ERROR: #4 KSPSolve_Private() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 [3]PETSC ERROR: #5 KSPSolve() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 [3]PETSC ERROR: #6 TaoBNKComputeStep() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 [3]PETSC ERROR: #7 TaoSolve_BNTR() at 
/1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 [3]PETSC ERROR: #8 TaoSolve() at /1/home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 [2]PETSC ERROR: Nonconforming object sizes [2]PETSC ERROR: Preconditioner number of local rows 1229 does not equal input vector size 1254 [2]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [2]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 GIT Date: 2023-01-04 13:37:04 +0000 [2]PETSC ERROR: /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 [2]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 --download-mumps=1 --download-chaco=1 --download-exodusii=1 --download-hypre=1 --download-ml=1 --download-triangle --download-scalapack=1 --download-superlu=1 --download-sowing=1 --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 --download-zlib=1 --with-cmake=1 --with-debugging=0 --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c [2]PETSC ERROR: #1 PCApply() at /1/home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 [2]PETSC ERROR: #2 KSP_PCApply() at /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 [2]PETSC ERROR: #3 KSPCGSolve_STCG() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 [2]PETSC ERROR: #4 KSPSolve_Private() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 [2]PETSC ERROR: #5 KSPSolve() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 [2]PETSC ERROR: #6 TaoBNKComputeStep() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 [2]PETSC ERROR: #7 TaoSolve_BNTR() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 [2]PETSC ERROR: #8 TaoSolve() at /1/home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 [3]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 [2]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 slurmstepd: error: *** STEP 5034.0 ON bb01 CANCELLED AT 2023-01-12T17:21:07 *** srun: Job step aborted: Waiting up to 32 seconds for job step to finish. srun: error: bb01: tasks 0-1: Killed srun: error: bb01: tasks 2-3: Exited with exit code 1 The error is raised in the middle of the computation after many successful calls of TAOSolve and TAO iterations. My guess is that TAO computes the preconditioner during its first iteration with all variables in the active set. But the preconditioner is never updated when some variables are moved to the inactive set during the next TAO iterations. Am I right? Can you help me with that? Thanks a lot for your help and your time. Regards, Alexis -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Jan 13 18:38:41 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 13 Jan 2023 14:38:41 -1000 Subject: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) In-Reply-To: References: Message-ID: On Fri, Jan 13, 2023 at 11:22 AM Alexis Marboeuf wrote: > Hi all, > > In a variational approach of brittle fracture setting, I try to solve a > bound constraint minimization problem using TAO. I checkout on the main > branch of Petsc. Minimization with respect to the bounded variable (damage) > is achieved through the Bounded Newton Trust Region (TAOBNTR). All other > TAO parameters are set by default. On a Linux machine, I get the following > error with a 4 processors run: > Can you view the solver? Thanks, Matt > > [3]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [3]PETSC ERROR: Nonconforming object sizes > [3]PETSC ERROR: Preconditioner number of local rows 1122 does not equal > input vector size 1161 > [3]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [3]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 > GIT Date: 2023-01-04 13:37:04 +0000 > [3]PETSC ERROR: > /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a > arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 > [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [3]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none > --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" > --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 > --download-mumps=1 --download-chaco=1 --download-exodusii=1 > --download-hypre=1 --download-ml=1 --download-triangle > --download-scalapack=1 --download-superlu=1 --download-sowing=1 > --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc > --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ > --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp > --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp > --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 > --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 > --download-zlib=1 --with-cmake=1 --with-debugging=0 > --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib > --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic > --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c > [3]PETSC ERROR: #1 PCApply() at /1 > /home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 > [3]PETSC ERROR: #2 KSP_PCApply() at > /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 > [3]PETSC ERROR: #3 KSPCGSolve_STCG() at /1 > /home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 > [3]PETSC ERROR: #4 KSPSolve_Private() at /1 > /home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 > [3]PETSC ERROR: #5 KSPSolve() at /1 > /home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 > [3]PETSC ERROR: #6 TaoBNKComputeStep() at /1 > /home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 > [3]PETSC ERROR: #7 TaoSolve_BNTR() at /1 > /home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 > [3]PETSC ERROR: #8 TaoSolve() at /1 > /home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 > [2]PETSC ERROR: Nonconforming object sizes > [2]PETSC ERROR: Preconditioner number of local rows 1229 does not equal 
> input vector size 1254 > [2]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [2]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 > GIT Date: 2023-01-04 13:37:04 +0000 > [2]PETSC ERROR: > /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a > arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 > [2]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none > --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" > --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 > --download-mumps=1 --download-chaco=1 --download-exodusii=1 > --download-hypre=1 --download-ml=1 --download-triangle > --download-scalapack=1 --download-superlu=1 --download-sowing=1 > --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc > --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ > --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp > --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp > --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 > --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 > --download-zlib=1 --with-cmake=1 --with-debugging=0 > --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib > --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic > --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c > [2]PETSC ERROR: #1 PCApply() at /1 > /home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 > [2]PETSC ERROR: #2 KSP_PCApply() at > /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 > [2]PETSC ERROR: #3 KSPCGSolve_STCG() at /1 > /home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 > [2]PETSC ERROR: #4 KSPSolve_Private() at /1 > /home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 > [2]PETSC ERROR: #5 KSPSolve() at /1 > /home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 > [2]PETSC ERROR: #6 TaoBNKComputeStep() at /1 > /home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 > [2]PETSC ERROR: #7 TaoSolve_BNTR() at /1 > /home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 > [2]PETSC ERROR: #8 TaoSolve() at /1 > /home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 > [3]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 > application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 > [2]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 > application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 > slurmstepd: error: *** STEP 5034.0 ON bb01 CANCELLED AT > 2023-01-12T17:21:07 *** > srun: Job step aborted: Waiting up to 32 seconds for job step to finish. > srun: error: bb01: tasks 0-1: Killed > srun: error: bb01: tasks 2-3: Exited with exit code 1 > > The error is raised in the middle of the computation after many successful > calls of TAOSolve and TAO iterations. My guess is that TAO computes the > preconditioner during its first iteration with all variables in the active > set. But the preconditioner is never updated when some variables are moved > to the inactive set during the next TAO iterations. Am I right? Can you > help me with that? > > Thanks a lot for your help and your time. > Regards, > Alexis > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexis.marboeuf at hotmail.fr Fri Jan 13 19:21:10 2023 From: alexis.marboeuf at hotmail.fr (Alexis Marboeuf) Date: Sat, 14 Jan 2023 01:21:10 +0000 Subject: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) In-Reply-To: References: Message-ID: Hi Matt, Here is the output from Petsc when I view the TAO solver: Tao Object: (Damage_) 4 MPI processes type: bntr Tao Object: (Damage_tao_bnk_cg_) 4 MPI processes type: bncg CG Type: ssml_bfgs Skipped Stepdirection Updates: 0 Scaled gradient steps: 0 Pure gradient steps: 0 Not a descent direction: 0 Line search fails: 0 Matrix has not been preallocated yet TaoLineSearch Object: (Damage_tao_bnk_cg_) 4 MPI processes type: more-thuente maximum function evaluations=30 tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 total number of function evaluations=0 total number of gradient evaluations=0 total number of function/gradient evaluations=0 Termination reason: 0 Active Set subset type: subvec convergence tolerances: gatol=1e-08, steptol=0., gttol=0. Residual in Function/Gradient:=0. Objective value=0. total number of iterations=0, (max: 2000) Solver terminated: 0 Rejected BFGS updates: 0 CG steps: 0 Newton steps: 11 BFGS steps: 0 Scaled gradient steps: 0 Gradient steps: 0 KSP termination reasons: atol: 4 rtol: 0 ctol: 7 negc: 0 dtol: 0 iter: 0 othr: 0 TaoLineSearch Object: (Damage_) 4 MPI processes type: more-thuente maximum function evaluations=30 tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 total number of function evaluations=0 total number of gradient evaluations=0 total number of function/gradient evaluations=0 using variable bounds Termination reason: 0 KSP Object: (Damage_tao_bnk_) 4 MPI processes type: stcg maximum iterations=10000, nonzero initial guess tolerances: relative=1e-08, absolute=1e-08, divergence=1e+10 left preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: (Damage_tao_bnk_) 4 MPI processes type: lmvm Mat Object: (Damage_tao_bnk_pc_lmvm_) 4 MPI processes type: lmvmbfgs rows=30634, cols=30634 Scale type: DIAGONAL Scale history: 1 Scale params: alpha=1., beta=0.5, rho=1. Convex factors: phi=0., theta=0.125 Max. storage: 5 Used storage: 5 Number of updates: 11 Number of rejects: 0 Number of resets: 0 Mat Object: (Damage_tao_bnk_pc_lmvm_J0_) 4 MPI processes type: lmvmdiagbroyden rows=30634, cols=30634 Scale history: 1 Scale params: alpha=1., beta=0.5, rho=1. Convex factor: theta=0.125 Max. storage: 1 Used storage: 1 Number of updates: 11 Number of rejects: 0 Number of resets: 0 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=468, cols=468 total: nonzeros=2932, allocated nonzeros=2932 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines total KSP iterations: 103 Active Set subset type: subvec convergence tolerances: gatol=0.0001, steptol=0., gttol=1e-05 Residual in Function/Gradient:=9.11153e-05 Objective value=0.00665458 total number of iterations=11, (max: 50) total number of function evaluations=17, max: -1 total number of gradient evaluations=13, max: -1 total number of Hessian evaluations=12 Solution converged: ||g(X)|| <= gatol Thanks again for your help! Alexis ________________________________ De : Matthew Knepley Envoy? : samedi 14 janvier 2023 01:38 ? 
: Alexis Marboeuf Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) On Fri, Jan 13, 2023 at 11:22 AM Alexis Marboeuf > wrote: Hi all, In a variational approach of brittle fracture setting, I try to solve a bound constraint minimization problem using TAO. I checkout on the main branch of Petsc. Minimization with respect to the bounded variable (damage) is achieved through the Bounded Newton Trust Region (TAOBNTR). All other TAO parameters are set by default. On a Linux machine, I get the following error with a 4 processors run: Can you view the solver? Thanks, Matt [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Nonconforming object sizes [3]PETSC ERROR: Preconditioner number of local rows 1122 does not equal input vector size 1161 [3]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [3]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 GIT Date: 2023-01-04 13:37:04 +0000 [3]PETSC ERROR: /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 --download-mumps=1 --download-chaco=1 --download-exodusii=1 --download-hypre=1 --download-ml=1 --download-triangle --download-scalapack=1 --download-superlu=1 --download-sowing=1 --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 --download-zlib=1 --with-cmake=1 --with-debugging=0 --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c [3]PETSC ERROR: #1 PCApply() at /1/home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 [3]PETSC ERROR: #2 KSP_PCApply() at /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 [3]PETSC ERROR: #3 KSPCGSolve_STCG() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 [3]PETSC ERROR: #4 KSPSolve_Private() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 [3]PETSC ERROR: #5 KSPSolve() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 [3]PETSC ERROR: #6 TaoBNKComputeStep() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 [3]PETSC ERROR: #7 TaoSolve_BNTR() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 [3]PETSC ERROR: #8 TaoSolve() at /1/home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 [2]PETSC ERROR: Nonconforming object sizes [2]PETSC ERROR: Preconditioner number of local rows 1229 does not equal input vector size 1254 [2]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[2]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 GIT Date: 2023-01-04 13:37:04 +0000 [2]PETSC ERROR: /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 [2]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 --download-mumps=1 --download-chaco=1 --download-exodusii=1 --download-hypre=1 --download-ml=1 --download-triangle --download-scalapack=1 --download-superlu=1 --download-sowing=1 --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 --download-zlib=1 --with-cmake=1 --with-debugging=0 --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c [2]PETSC ERROR: #1 PCApply() at /1/home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 [2]PETSC ERROR: #2 KSP_PCApply() at /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 [2]PETSC ERROR: #3 KSPCGSolve_STCG() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 [2]PETSC ERROR: #4 KSPSolve_Private() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 [2]PETSC ERROR: #5 KSPSolve() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 [2]PETSC ERROR: #6 TaoBNKComputeStep() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 [2]PETSC ERROR: #7 TaoSolve_BNTR() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 [2]PETSC ERROR: #8 TaoSolve() at /1/home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 [3]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 [2]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 slurmstepd: error: *** STEP 5034.0 ON bb01 CANCELLED AT 2023-01-12T17:21:07 *** srun: Job step aborted: Waiting up to 32 seconds for job step to finish. srun: error: bb01: tasks 0-1: Killed srun: error: bb01: tasks 2-3: Exited with exit code 1 The error is raised in the middle of the computation after many successful calls of TAOSolve and TAO iterations. My guess is that TAO computes the preconditioner during its first iteration with all variables in the active set. But the preconditioner is never updated when some variables are moved to the inactive set during the next TAO iterations. Am I right? Can you help me with that? Thanks a lot for your help and your time. Regards, Alexis -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Jan 13 19:38:53 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 13 Jan 2023 15:38:53 -1000 Subject: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) In-Reply-To: References: Message-ID: On Fri, Jan 13, 2023 at 3:21 PM Alexis Marboeuf wrote: > Hi Matt, > > Here is the output from Petsc when I view the TAO solver: > Okay, there is no sophisticated caching going on. So, first I would get it to fail on1 process. It should if it just depends on the convergence (I hope). Then send the source so we can run it. It should be simple for us to find where system size changes and the KSP is not reset (if that indeed is what happens). Thanks, Matt > Tao Object: (Damage_) 4 MPI processes > > type: bntr > > Tao Object: (Damage_tao_bnk_cg_) 4 MPI processes > > type: bncg > > CG Type: ssml_bfgs > > Skipped Stepdirection Updates: 0 > > Scaled gradient steps: 0 > > Pure gradient steps: 0 > > Not a descent direction: 0 > > Line search fails: 0 > > Matrix has not been preallocated yet > > TaoLineSearch Object: (Damage_tao_bnk_cg_) 4 MPI processes > > type: more-thuente > > maximum function evaluations=30 > > tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 > > total number of function evaluations=0 > > total number of gradient evaluations=0 > > total number of function/gradient evaluations=0 > > Termination reason: 0 > > Active Set subset type: subvec > > convergence tolerances: gatol=1e-08, steptol=0., > gttol=0. > > Residual in Function/Gradient:=0. > > Objective value=0. > > total number of iterations=0, (max: > 2000) > > Solver terminated: 0 > > Rejected BFGS updates: 0 > > CG steps: 0 > > Newton steps: 11 > > BFGS steps: 0 > > Scaled gradient steps: 0 > > Gradient steps: 0 > > KSP termination reasons: > > atol: 4 > > rtol: 0 > > ctol: 7 > > negc: 0 > > dtol: 0 > > iter: 0 > > othr: 0 > > TaoLineSearch Object: (Damage_) 4 MPI processes > > type: more-thuente > > maximum function evaluations=30 > > tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 > > total number of function evaluations=0 > > total number of gradient evaluations=0 > > total number of function/gradient evaluations=0 > > using variable bounds > > Termination reason: 0 > > KSP Object: (Damage_tao_bnk_) 4 MPI processes > > type: stcg > > maximum iterations=10000, nonzero initial guess > > tolerances: relative=1e-08, absolute=1e-08, divergence=1e+10 > > left preconditioning > > using UNPRECONDITIONED norm type for convergence test > > PC Object: (Damage_tao_bnk_) 4 MPI processes > > type: lmvm > > Mat Object: (Damage_tao_bnk_pc_lmvm_) 4 MPI processes > > type: lmvmbfgs > > rows=30634, cols=30634 > > Scale type: DIAGONAL > > Scale history: 1 > > Scale params: alpha=1., beta=0.5, rho=1. > > Convex factors: phi=0., theta=0.125 > > Max. storage: 5 > > Used storage: 5 > > Number of updates: 11 > > Number of rejects: 0 > > Number of resets: 0 > > Mat Object: (Damage_tao_bnk_pc_lmvm_J0_) 4 MPI processes > > type: lmvmdiagbroyden > > rows=30634, cols=30634 > > Scale history: 1 > > Scale params: alpha=1., beta=0.5, rho=1. > > Convex factor: theta=0.125 > > Max. 
storage: 1 > > Used storage: 1 > > Number of updates: 11 > > Number of rejects: 0 > > Number of resets: 0 > > linear system matrix = precond matrix: > > Mat Object: 4 MPI processes > > type: mpiaij > > rows=468, cols=468 > > total: nonzeros=2932, allocated nonzeros=2932 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node (on process 0) routines > > total KSP iterations: 103 > > Active Set subset type: subvec > > convergence tolerances: gatol=0.0001, steptol=0., gttol=1e-05 > > Residual in Function/Gradient:=9.11153e-05 > > Objective value=0.00665458 > > total number of iterations=11, (max: 50) > > total number of function evaluations=17, max: -1 > > total number of gradient evaluations=13, max: -1 > > total number of Hessian evaluations=12 > > Solution converged: ||g(X)|| <= gatol > > Thanks again for your help! > Alexis > > ------------------------------ > *De :* Matthew Knepley > *Envoy? :* samedi 14 janvier 2023 01:38 > *? :* Alexis Marboeuf > *Cc :* petsc-users at mcs.anl.gov > *Objet :* Re: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) > > On Fri, Jan 13, 2023 at 11:22 AM Alexis Marboeuf < > alexis.marboeuf at hotmail.fr> wrote: > > Hi all, > > In a variational approach of brittle fracture setting, I try to solve a > bound constraint minimization problem using TAO. I checkout on the main > branch of Petsc. Minimization with respect to the bounded variable (damage) > is achieved through the Bounded Newton Trust Region (TAOBNTR). All other > TAO parameters are set by default. On a Linux machine, I get the following > error with a 4 processors run: > > > Can you view the solver? > > Thanks, > > Matt > > > > [3]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [3]PETSC ERROR: Nonconforming object sizes > [3]PETSC ERROR: Preconditioner number of local rows 1122 does not equal > input vector size 1161 > [3]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> [3]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 > GIT Date: 2023-01-04 13:37:04 +0000 > [3]PETSC ERROR: > /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a > arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 > [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [3]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none > --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" > --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 > --download-mumps=1 --download-chaco=1 --download-exodusii=1 > --download-hypre=1 --download-ml=1 --download-triangle > --download-scalapack=1 --download-superlu=1 --download-sowing=1 > --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc > --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ > --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp > --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp > --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 > --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 > --download-zlib=1 --with-cmake=1 --with-debugging=0 > --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib > --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic > --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c > [3]PETSC ERROR: #1 PCApply() at /1 > /home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 > [3]PETSC ERROR: #2 KSP_PCApply() at > /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 > [3]PETSC ERROR: #3 KSPCGSolve_STCG() at /1 > /home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 > [3]PETSC ERROR: #4 KSPSolve_Private() at /1 > /home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 > [3]PETSC ERROR: #5 KSPSolve() at /1 > /home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 > [3]PETSC ERROR: #6 TaoBNKComputeStep() at /1 > /home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 > [3]PETSC ERROR: #7 TaoSolve_BNTR() at /1 > /home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 > [3]PETSC ERROR: #8 TaoSolve() at /1 > /home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 > [2]PETSC ERROR: Nonconforming object sizes > [2]PETSC ERROR: Preconditioner number of local rows 1229 does not equal > input vector size 1254 > [2]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> [2]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 > GIT Date: 2023-01-04 13:37:04 +0000 > [2]PETSC ERROR: > /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a > arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 > [2]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none > --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" > --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 > --download-mumps=1 --download-chaco=1 --download-exodusii=1 > --download-hypre=1 --download-ml=1 --download-triangle > --download-scalapack=1 --download-superlu=1 --download-sowing=1 > --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc > --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ > --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp > --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp > --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 > --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 > --download-zlib=1 --with-cmake=1 --with-debugging=0 > --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib > --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic > --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c > [2]PETSC ERROR: #1 PCApply() at /1 > /home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 > [2]PETSC ERROR: #2 KSP_PCApply() at > /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 > [2]PETSC ERROR: #3 KSPCGSolve_STCG() at /1 > /home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 > [2]PETSC ERROR: #4 KSPSolve_Private() at /1 > /home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 > [2]PETSC ERROR: #5 KSPSolve() at /1 > /home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 > [2]PETSC ERROR: #6 TaoBNKComputeStep() at /1 > /home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 > [2]PETSC ERROR: #7 TaoSolve_BNTR() at /1 > /home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 > [2]PETSC ERROR: #8 TaoSolve() at /1 > /home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 > [3]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 > application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 > [2]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 > application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 > slurmstepd: error: *** STEP 5034.0 ON bb01 CANCELLED AT > 2023-01-12T17:21:07 *** > srun: Job step aborted: Waiting up to 32 seconds for job step to finish. > srun: error: bb01: tasks 0-1: Killed > srun: error: bb01: tasks 2-3: Exited with exit code 1 > > The error is raised in the middle of the computation after many successful > calls of TAOSolve and TAO iterations. My guess is that TAO computes the > preconditioner during its first iteration with all variables in the active > set. But the preconditioner is never updated when some variables are moved > to the inactive set during the next TAO iterations. Am I right? Can you > help me with that? > > Thanks a lot for your help and your time. > Regards, > Alexis > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexis.marboeuf at hotmail.fr Fri Jan 13 22:24:31 2023 From: alexis.marboeuf at hotmail.fr (Alexis Marboeuf) Date: Sat, 14 Jan 2023 04:24:31 +0000 Subject: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) In-Reply-To: References: Message-ID: Hi Matt, Indeed, it fails on 1 process with the same error. The source code is available here: https://github.com/bourdin/mef90 (branch marboeuf/vdef-tao-test) [https://opengraph.githubassets.com/8f51eb183957c4e2f2dd59e2733f43a7bc667a50d4aaad934ebb3ac8f25a17ab/bourdin/mef90] GitHub - bourdin/mef90: Official repository for mef90/vDef mef90 / vDef: A reference implementation of the variational approach to fracture, as described in: Francfort, G. and Marigo, J.-J. (1998). Revisiting brittle fracture as an energy minimization problem. github.com I can share the details (installation + command line) for running it. But the ideal would be to reproduce this error with a Petsc example so it's easier for you to investigate. I looked for a bound constraint minimization problem with TAO and TS but I didn't find it. What example could I use? Thanks! Alexis ________________________________ De : Matthew Knepley Envoy? : samedi 14 janvier 2023 02:38 ? : Alexis Marboeuf Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) On Fri, Jan 13, 2023 at 3:21 PM Alexis Marboeuf > wrote: Hi Matt, Here is the output from Petsc when I view the TAO solver: Okay, there is no sophisticated caching going on. So, first I would get it to fail on1 process. It should if it just depends on the convergence (I hope). Then send the source so we can run it. It should be simple for us to find where system size changes and the KSP is not reset (if that indeed is what happens). Thanks, Matt Tao Object: (Damage_) 4 MPI processes type: bntr Tao Object: (Damage_tao_bnk_cg_) 4 MPI processes type: bncg CG Type: ssml_bfgs Skipped Stepdirection Updates: 0 Scaled gradient steps: 0 Pure gradient steps: 0 Not a descent direction: 0 Line search fails: 0 Matrix has not been preallocated yet TaoLineSearch Object: (Damage_tao_bnk_cg_) 4 MPI processes type: more-thuente maximum function evaluations=30 tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 total number of function evaluations=0 total number of gradient evaluations=0 total number of function/gradient evaluations=0 Termination reason: 0 Active Set subset type: subvec convergence tolerances: gatol=1e-08, steptol=0., gttol=0. Residual in Function/Gradient:=0. Objective value=0. 
total number of iterations=0, (max: 2000) Solver terminated: 0 Rejected BFGS updates: 0 CG steps: 0 Newton steps: 11 BFGS steps: 0 Scaled gradient steps: 0 Gradient steps: 0 KSP termination reasons: atol: 4 rtol: 0 ctol: 7 negc: 0 dtol: 0 iter: 0 othr: 0 TaoLineSearch Object: (Damage_) 4 MPI processes type: more-thuente maximum function evaluations=30 tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 total number of function evaluations=0 total number of gradient evaluations=0 total number of function/gradient evaluations=0 using variable bounds Termination reason: 0 KSP Object: (Damage_tao_bnk_) 4 MPI processes type: stcg maximum iterations=10000, nonzero initial guess tolerances: relative=1e-08, absolute=1e-08, divergence=1e+10 left preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: (Damage_tao_bnk_) 4 MPI processes type: lmvm Mat Object: (Damage_tao_bnk_pc_lmvm_) 4 MPI processes type: lmvmbfgs rows=30634, cols=30634 Scale type: DIAGONAL Scale history: 1 Scale params: alpha=1., beta=0.5, rho=1. Convex factors: phi=0., theta=0.125 Max. storage: 5 Used storage: 5 Number of updates: 11 Number of rejects: 0 Number of resets: 0 Mat Object: (Damage_tao_bnk_pc_lmvm_J0_) 4 MPI processes type: lmvmdiagbroyden rows=30634, cols=30634 Scale history: 1 Scale params: alpha=1., beta=0.5, rho=1. Convex factor: theta=0.125 Max. storage: 1 Used storage: 1 Number of updates: 11 Number of rejects: 0 Number of resets: 0 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=468, cols=468 total: nonzeros=2932, allocated nonzeros=2932 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines total KSP iterations: 103 Active Set subset type: subvec convergence tolerances: gatol=0.0001, steptol=0., gttol=1e-05 Residual in Function/Gradient:=9.11153e-05 Objective value=0.00665458 total number of iterations=11, (max: 50) total number of function evaluations=17, max: -1 total number of gradient evaluations=13, max: -1 total number of Hessian evaluations=12 Solution converged: ||g(X)|| <= gatol Thanks again for your help! Alexis ________________________________ De : Matthew Knepley > Envoy? : samedi 14 janvier 2023 01:38 ? : Alexis Marboeuf > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) On Fri, Jan 13, 2023 at 11:22 AM Alexis Marboeuf > wrote: Hi all, In a variational approach of brittle fracture setting, I try to solve a bound constraint minimization problem using TAO. I checkout on the main branch of Petsc. Minimization with respect to the bounded variable (damage) is achieved through the Bounded Newton Trust Region (TAOBNTR). All other TAO parameters are set by default. On a Linux machine, I get the following error with a 4 processors run: Can you view the solver? Thanks, Matt [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Nonconforming object sizes [3]PETSC ERROR: Preconditioner number of local rows 1122 does not equal input vector size 1161 [3]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[3]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 GIT Date: 2023-01-04 13:37:04 +0000 [3]PETSC ERROR: /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 --download-mumps=1 --download-chaco=1 --download-exodusii=1 --download-hypre=1 --download-ml=1 --download-triangle --download-scalapack=1 --download-superlu=1 --download-sowing=1 --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 --download-zlib=1 --with-cmake=1 --with-debugging=0 --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c [3]PETSC ERROR: #1 PCApply() at /1/home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 [3]PETSC ERROR: #2 KSP_PCApply() at /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 [3]PETSC ERROR: #3 KSPCGSolve_STCG() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 [3]PETSC ERROR: #4 KSPSolve_Private() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 [3]PETSC ERROR: #5 KSPSolve() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 [3]PETSC ERROR: #6 TaoBNKComputeStep() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 [3]PETSC ERROR: #7 TaoSolve_BNTR() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 [3]PETSC ERROR: #8 TaoSolve() at /1/home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 [2]PETSC ERROR: Nonconforming object sizes [2]PETSC ERROR: Preconditioner number of local rows 1229 does not equal input vector size 1254 [2]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[2]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 GIT Date: 2023-01-04 13:37:04 +0000 [2]PETSC ERROR: /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 [2]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 --download-mumps=1 --download-chaco=1 --download-exodusii=1 --download-hypre=1 --download-ml=1 --download-triangle --download-scalapack=1 --download-superlu=1 --download-sowing=1 --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 --download-zlib=1 --with-cmake=1 --with-debugging=0 --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c [2]PETSC ERROR: #1 PCApply() at /1/home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 [2]PETSC ERROR: #2 KSP_PCApply() at /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 [2]PETSC ERROR: #3 KSPCGSolve_STCG() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 [2]PETSC ERROR: #4 KSPSolve_Private() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 [2]PETSC ERROR: #5 KSPSolve() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 [2]PETSC ERROR: #6 TaoBNKComputeStep() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 [2]PETSC ERROR: #7 TaoSolve_BNTR() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 [2]PETSC ERROR: #8 TaoSolve() at /1/home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 [3]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 [2]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 slurmstepd: error: *** STEP 5034.0 ON bb01 CANCELLED AT 2023-01-12T17:21:07 *** srun: Job step aborted: Waiting up to 32 seconds for job step to finish. srun: error: bb01: tasks 0-1: Killed srun: error: bb01: tasks 2-3: Exited with exit code 1 The error is raised in the middle of the computation after many successful calls of TAOSolve and TAO iterations. My guess is that TAO computes the preconditioner during its first iteration with all variables in the active set. But the preconditioner is never updated when some variables are moved to the inactive set during the next TAO iterations. Am I right? Can you help me with that? Thanks a lot for your help and your time. Regards, Alexis -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed... URL:
From alexis.marboeuf at hotmail.fr Mon Jan 16 14:14:54 2023 From: alexis.marboeuf at hotmail.fr (Alexis Marboeuf) Date: Mon, 16 Jan 2023 20:14:54 +0000 Subject: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) In-Reply-To: References: Message-ID:
Hi Matt,
After investigation, it fails because, at some point, the boolean needH is set to PETSC_FALSE when initializing the BNK method with TAOBNKInitialize (line 103 of $PETSC_DIR/src/tao/bound/impls/bnk/bntr.c). The Hessian and the preconditioner are thus not updated throughout the TAO iterations. It has something to do with the option BNK_INIT_INTERPOLATION set by default. It works when I choose BNK_INIT_CONSTANT. In my case, in all the successful calls of TAOSolve, the computed trial objective value is better than the current value, which implies needH = PETSC_TRUE within TAOBNKInitialize. At some point, the trial value becomes equal to the current objective value up to machine precision and then needH = PETSC_FALSE. I have to admit I am struggling to understand how that boolean needH is computed when BNK is initialized with BNK_INIT_INTERPOLATION. Can you help me with that?
Thanks a lot.
Alexis
________________________________
From: Alexis Marboeuf Sent: Saturday, January 14, 2023 05:24 To: Matthew Knepley Cc: petsc-users at mcs.anl.gov Subject: RE: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR)
Hi Matt,
Indeed, it fails on 1 process with the same error. The source code is available here: https://github.com/bourdin/mef90 (branch marboeuf/vdef-tao-test)
GitHub - bourdin/mef90: Official repository for mef90/vDef. mef90 / vDef: A reference implementation of the variational approach to fracture, as described in: Francfort, G. and Marigo, J.-J. (1998). Revisiting brittle fracture as an energy minimization problem. github.com
I can share the details (installation + command line) for running it. But the ideal would be to reproduce this error with a PETSc example so it's easier for you to investigate. I looked for a bound constraint minimization problem with TAO and TS but I didn't find it. What example could I use?
Thanks!
Alexis
________________________________
From: Matthew Knepley Sent: Saturday, January 14, 2023 02:38 To: Alexis Marboeuf Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR)
On Fri, Jan 13, 2023 at 3:21 PM Alexis Marboeuf > wrote: Hi Matt, Here is the output from Petsc when I view the TAO solver:
Okay, there is no sophisticated caching going on. So, first I would get it to fail on 1 process. It should if it just depends on the convergence (I hope). Then send the source so we can run it. It should be simple for us to find where system size changes and the KSP is not reset (if that indeed is what happens).
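As an aside, a minimal sketch of the workaround Alexis describes above, selecting BNK_INIT_CONSTANT instead of the default BNK_INIT_INTERPOLATION, is shown here. The option name -tao_bnk_init_type is inferred from the BNK_INIT_* enum and should be checked against -help for the PETSc version in use; the Damage_ prefix matches the -tao_view output quoted in this thread, and the variable tao is illustrative.

  /* Sketch: request the constant trust-region radius initialization for the
     Damage_-prefixed TAOBNTR solver before TaoSetFromOptions() is called.
     Equivalent command-line form: -Damage_tao_bnk_init_type constant
     (option name is an assumption; verify with -help). */
  PetscCall(PetscOptionsSetValue(NULL, "-Damage_tao_bnk_init_type", "constant"));
  PetscCall(TaoSetFromOptions(tao));
  PetscCall(TaoSolve(tao));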
Thanks, Matt Tao Object: (Damage_) 4 MPI processes type: bntr Tao Object: (Damage_tao_bnk_cg_) 4 MPI processes type: bncg CG Type: ssml_bfgs Skipped Stepdirection Updates: 0 Scaled gradient steps: 0 Pure gradient steps: 0 Not a descent direction: 0 Line search fails: 0 Matrix has not been preallocated yet TaoLineSearch Object: (Damage_tao_bnk_cg_) 4 MPI processes type: more-thuente maximum function evaluations=30 tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 total number of function evaluations=0 total number of gradient evaluations=0 total number of function/gradient evaluations=0 Termination reason: 0 Active Set subset type: subvec convergence tolerances: gatol=1e-08, steptol=0., gttol=0. Residual in Function/Gradient:=0. Objective value=0. total number of iterations=0, (max: 2000) Solver terminated: 0 Rejected BFGS updates: 0 CG steps: 0 Newton steps: 11 BFGS steps: 0 Scaled gradient steps: 0 Gradient steps: 0 KSP termination reasons: atol: 4 rtol: 0 ctol: 7 negc: 0 dtol: 0 iter: 0 othr: 0 TaoLineSearch Object: (Damage_) 4 MPI processes type: more-thuente maximum function evaluations=30 tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 total number of function evaluations=0 total number of gradient evaluations=0 total number of function/gradient evaluations=0 using variable bounds Termination reason: 0 KSP Object: (Damage_tao_bnk_) 4 MPI processes type: stcg maximum iterations=10000, nonzero initial guess tolerances: relative=1e-08, absolute=1e-08, divergence=1e+10 left preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: (Damage_tao_bnk_) 4 MPI processes type: lmvm Mat Object: (Damage_tao_bnk_pc_lmvm_) 4 MPI processes type: lmvmbfgs rows=30634, cols=30634 Scale type: DIAGONAL Scale history: 1 Scale params: alpha=1., beta=0.5, rho=1. Convex factors: phi=0., theta=0.125 Max. storage: 5 Used storage: 5 Number of updates: 11 Number of rejects: 0 Number of resets: 0 Mat Object: (Damage_tao_bnk_pc_lmvm_J0_) 4 MPI processes type: lmvmdiagbroyden rows=30634, cols=30634 Scale history: 1 Scale params: alpha=1., beta=0.5, rho=1. Convex factor: theta=0.125 Max. storage: 1 Used storage: 1 Number of updates: 11 Number of rejects: 0 Number of resets: 0 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=468, cols=468 total: nonzeros=2932, allocated nonzeros=2932 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines total KSP iterations: 103 Active Set subset type: subvec convergence tolerances: gatol=0.0001, steptol=0., gttol=1e-05 Residual in Function/Gradient:=9.11153e-05 Objective value=0.00665458 total number of iterations=11, (max: 50) total number of function evaluations=17, max: -1 total number of gradient evaluations=13, max: -1 total number of Hessian evaluations=12 Solution converged: ||g(X)|| <= gatol Thanks again for your help! Alexis ________________________________ De : Matthew Knepley > Envoy? : samedi 14 janvier 2023 01:38 ? : Alexis Marboeuf > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) On Fri, Jan 13, 2023 at 11:22 AM Alexis Marboeuf > wrote: Hi all, In a variational approach of brittle fracture setting, I try to solve a bound constraint minimization problem using TAO. I checkout on the main branch of Petsc. Minimization with respect to the bounded variable (damage) is achieved through the Bounded Newton Trust Region (TAOBNTR). All other TAO parameters are set by default. 
On a Linux machine, I get the following error with a 4 processors run: Can you view the solver? Thanks, Matt [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Nonconforming object sizes [3]PETSC ERROR: Preconditioner number of local rows 1122 does not equal input vector size 1161 [3]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [3]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 GIT Date: 2023-01-04 13:37:04 +0000 [3]PETSC ERROR: /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 --download-mumps=1 --download-chaco=1 --download-exodusii=1 --download-hypre=1 --download-ml=1 --download-triangle --download-scalapack=1 --download-superlu=1 --download-sowing=1 --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 --download-zlib=1 --with-cmake=1 --with-debugging=0 --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c [3]PETSC ERROR: #1 PCApply() at /1/home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 [3]PETSC ERROR: #2 KSP_PCApply() at /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 [3]PETSC ERROR: #3 KSPCGSolve_STCG() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 [3]PETSC ERROR: #4 KSPSolve_Private() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 [3]PETSC ERROR: #5 KSPSolve() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 [3]PETSC ERROR: #6 TaoBNKComputeStep() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 [3]PETSC ERROR: #7 TaoSolve_BNTR() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 [3]PETSC ERROR: #8 TaoSolve() at /1/home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 [2]PETSC ERROR: Nonconforming object sizes [2]PETSC ERROR: Preconditioner number of local rows 1229 does not equal input vector size 1254 [2]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[2]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 GIT Date: 2023-01-04 13:37:04 +0000 [2]PETSC ERROR: /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 [2]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 --download-mumps=1 --download-chaco=1 --download-exodusii=1 --download-hypre=1 --download-ml=1 --download-triangle --download-scalapack=1 --download-superlu=1 --download-sowing=1 --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 --download-zlib=1 --with-cmake=1 --with-debugging=0 --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c [2]PETSC ERROR: #1 PCApply() at /1/home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 [2]PETSC ERROR: #2 KSP_PCApply() at /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 [2]PETSC ERROR: #3 KSPCGSolve_STCG() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 [2]PETSC ERROR: #4 KSPSolve_Private() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 [2]PETSC ERROR: #5 KSPSolve() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 [2]PETSC ERROR: #6 TaoBNKComputeStep() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 [2]PETSC ERROR: #7 TaoSolve_BNTR() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 [2]PETSC ERROR: #8 TaoSolve() at /1/home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 [3]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 [2]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 slurmstepd: error: *** STEP 5034.0 ON bb01 CANCELLED AT 2023-01-12T17:21:07 *** srun: Job step aborted: Waiting up to 32 seconds for job step to finish. srun: error: bb01: tasks 0-1: Killed srun: error: bb01: tasks 2-3: Exited with exit code 1 The error is raised in the middle of the computation after many successful calls of TAOSolve and TAO iterations. My guess is that TAO computes the preconditioner during its first iteration with all variables in the active set. But the preconditioner is never updated when some variables are moved to the inactive set during the next TAO iterations. Am I right? Can you help me with that? Thanks a lot for your help and your time. Regards, Alexis -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at mcmaster.ca Mon Jan 16 16:07:37 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Mon, 16 Jan 2023 22:07:37 +0000 Subject: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) In-Reply-To: References: Message-ID: <69C797E7-6916-468D-A879-D261D07C458A@mcmaster.ca> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: eptorsion1.c Type: application/octet-stream Size: 21380 bytes Desc: eptorsion1.c URL: From FERRANJ2 at my.erau.edu Mon Jan 16 16:41:23 2023 From: FERRANJ2 at my.erau.edu (Ferrand, Jesus A.) Date: Mon, 16 Jan 2023 22:41:23 +0000 Subject: [petsc-users] DMPlex and CGNS Message-ID: Dear PETSc team: I would like to use DMPlex to partition a mesh stored as a CGNS file. I configured my installation with --download_cgns = 1, got me a .cgns file and called DMPlexCreateCGNSFromFile() on it. Doing so got me this error: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in external library [0]PETSC ERROR: CGNS file must have a single section, not 4 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.18.3, unknown [0]PETSC ERROR: ./program.exe on a arch-linux-c-debug named F86 by jesus Mon Jan 16 17:25:11 2023 [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-cgns=yes --download-metis=yes --download-parmetis=yes --download-ptscotch=yes --download-chaco=yes --with-32bits-pci-domain=1 [0]PETSC ERROR: #1 DMPlexCreateCGNS_Internal() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:104 [0]PETSC ERROR: #2 DMPlexCreateCGNS() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:60 [0]PETSC ERROR: #3 DMPlexCreateCGNSFromFile_Internal() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:27 [0]PETSC ERROR: #4 DMPlexCreateCGNSFromFile() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:29 I looked around mail archives for clused and found this one (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035544.html). There, Matt provides a link to the source code for DMPlexCreateCGNSFromFile() and another (seemingly broken) link to CGNS files that can be opened with the former. After reading the source code I now understand that it is hardcoded for CGNS files that feature a single "base" and a single "section", whatever those are. After navigating the CGNS documentation, I can sympathize with the comments in the source code. Anyhow, I wanted to ask if I could be furnished with one such CGNS file that is compatible with DMPlexCreateCGNSFromFile() to see if I can modify my CGNS files to conform to it. If not, I will look into building the DAG myself using DMPlex APIs. Sincerely: J.A. Ferrand Embry-Riddle Aeronautical University - Daytona Beach FL M.Sc. Aerospace Engineering B.Sc. Aerospace Engineering B.Sc. Computational Mathematics Phone: (386)-843-1829 Email(s): ferranj2 at my.erau.edu jesus.ferrand at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Mon Jan 16 17:15:07 2023 From: jed at jedbrown.org (Jed Brown) Date: Mon, 16 Jan 2023 16:15:07 -0700 Subject: [petsc-users] DMPlex and CGNS In-Reply-To: References: Message-ID: <878ri23wck.fsf@jedbrown.org> How soon do you need this? I understand the grumbling about CGNS, but it's easy to build, uses HDF5 parallel IO in a friendly way, supports high order elements, and is generally pretty expressive. I wrote a parallel writer (with some limitations that I'll remove) and plan to replace the current reader with a parallel reader because I have need to read big CGNS meshes. Alas, my semester is starting so it's hard to make promises, but this is a high priority for my research group and if you share your mesh file, I hope to be able to make the parallel reader work in the next couple weeks. "Ferrand, Jesus A." writes: > Dear PETSc team: > > I would like to use DMPlex to partition a mesh stored as a CGNS file. I configured my installation with --download_cgns = 1, got me a .cgns file and called DMPlexCreateCGNSFromFile() on it. Doing so got me this error: > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: CGNS file must have a single section, not 4 > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.18.3, unknown > [0]PETSC ERROR: ./program.exe on a arch-linux-c-debug named F86 by jesus Mon Jan 16 17:25:11 2023 > [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-cgns=yes --download-metis=yes --download-parmetis=yes --download-ptscotch=yes --download-chaco=yes --with-32bits-pci-domain=1 > [0]PETSC ERROR: #1 DMPlexCreateCGNS_Internal() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:104 > [0]PETSC ERROR: #2 DMPlexCreateCGNS() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:60 > [0]PETSC ERROR: #3 DMPlexCreateCGNSFromFile_Internal() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:27 > [0]PETSC ERROR: #4 DMPlexCreateCGNSFromFile() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:29 > > I looked around mail archives for clused and found this one (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035544.html). There, Matt provides a link to the source code for DMPlexCreateCGNSFromFile() and another (seemingly broken) link to CGNS files that can be opened with the former. After reading the source code I now understand that it is hardcoded for CGNS files that feature a single "base" and a single "section", whatever those are. > > After navigating the CGNS documentation, I can sympathize with the comments in the source code. > > Anyhow, I wanted to ask if I could be furnished with one such CGNS file that is compatible with DMPlexCreateCGNSFromFile() to see if I can modify my CGNS files to conform to it. If not, I will look into building the DAG myself using DMPlex APIs. > > > Sincerely: > > J.A. Ferrand > > Embry-Riddle Aeronautical University - Daytona Beach FL > > M.Sc. Aerospace Engineering > > B.Sc. Aerospace Engineering > > B.Sc. 
Computational Mathematics > > > Phone: (386)-843-1829 > > Email(s): ferranj2 at my.erau.edu > > jesus.ferrand at gmail.com From knepley at gmail.com Mon Jan 16 19:16:50 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Jan 2023 20:16:50 -0500 Subject: [petsc-users] DMPlex and CGNS In-Reply-To: <878ri23wck.fsf@jedbrown.org> References: <878ri23wck.fsf@jedbrown.org> Message-ID: On Mon, Jan 16, 2023 at 6:15 PM Jed Brown wrote: > How soon do you need this? I understand the grumbling about CGNS, but it's > easy to build, uses HDF5 parallel IO in a friendly way, supports high order > elements, and is generally pretty expressive. I wrote a parallel writer > (with some limitations that I'll remove) and plan to replace the current > reader with a parallel reader because I have need to read big CGNS meshes. > Alas, my semester is starting so it's hard to make promises, but this is a > high priority for my research group and if you share your mesh file, I hope > to be able to make the parallel reader work in the next couple weeks. > The biggest hurdle for me is understanding the CGNS format. If you can tell me what is going on, I can help write the Plex code that interprets it. Thanks, Matt > "Ferrand, Jesus A." writes: > > > Dear PETSc team: > > > > I would like to use DMPlex to partition a mesh stored as a CGNS file. I > configured my installation with --download_cgns = 1, got me a .cgns file > and called DMPlexCreateCGNSFromFile() on it. Doing so got me this error: > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Error in external library > > [0]PETSC ERROR: CGNS file must have a single section, not 4 > > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.18.3, unknown > > [0]PETSC ERROR: ./program.exe on a arch-linux-c-debug named F86 by jesus > Mon Jan 16 17:25:11 2023 > > [0]PETSC ERROR: Configure options --download-mpich=yes > --download-hdf5=yes --download-cgns=yes --download-metis=yes > --download-parmetis=yes --download-ptscotch=yes --download-chaco=yes > --with-32bits-pci-domain=1 > > [0]PETSC ERROR: #1 DMPlexCreateCGNS_Internal() at > /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:104 > > [0]PETSC ERROR: #2 DMPlexCreateCGNS() at > /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:60 > > [0]PETSC ERROR: #3 DMPlexCreateCGNSFromFile_Internal() at > /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:27 > > [0]PETSC ERROR: #4 DMPlexCreateCGNSFromFile() at > /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:29 > > > > I looked around mail archives for clused and found this one ( > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035544.html). > There, Matt provides a link to the source code for > DMPlexCreateCGNSFromFile() and another (seemingly broken) link to CGNS > files that can be opened with the former. After reading the source code I > now understand that it is hardcoded for CGNS files that feature a single > "base" and a single "section", whatever those are. > > > > After navigating the CGNS documentation, I can sympathize with the > comments in the source code. > > > > Anyhow, I wanted to ask if I could be furnished with one such CGNS file > that is compatible with DMPlexCreateCGNSFromFile() to see if I can modify > my CGNS files to conform to it. 
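For reference while the reader is being discussed, here is a minimal sketch of the read-and-partition flow that DMPlexCreateCGNSFromFile() is meant to support, assuming a PETSc build configured with --download-cgns and a file the current reader accepts (single base, single element section). The file name mesh.cgns and the use of -dm_view are illustrative.

  #include <petsc.h>

  int main(int argc, char **argv)
  {
    DM dm, dmDist;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    /* Read the CGNS mesh; PETSC_TRUE asks for an interpolated mesh
       (faces and edges in the DAG, not only cells and vertices). */
    PetscCall(DMPlexCreateCGNSFromFile(PETSC_COMM_WORLD, "mesh.cgns", PETSC_TRUE, &dm));
    /* Partition onto the ranks of PETSC_COMM_WORLD with the configured
       partitioner (ParMETIS, PT-Scotch, or Chaco); overlap 0 means no ghost cells. */
    PetscCall(DMPlexDistribute(dm, 0, NULL, &dmDist));
    if (dmDist) {
      PetscCall(DMDestroy(&dm));
      dm = dmDist;
    }
    PetscCall(DMViewFromOptions(dm, NULL, "-dm_view"));
    PetscCall(DMDestroy(&dm));
    PetscCall(PetscFinalize());
    return 0;
  }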
If not, I will look into building the DAG > myself using DMPlex APIs. > > > > > > Sincerely: > > > > J.A. Ferrand > > > > Embry-Riddle Aeronautical University - Daytona Beach FL > > > > M.Sc. Aerospace Engineering > > > > B.Sc. Aerospace Engineering > > > > B.Sc. Computational Mathematics > > > > > > Phone: (386)-843-1829 > > > > Email(s): ferranj2 at my.erau.edu > > > > jesus.ferrand at gmail.com > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Jan 16 19:40:16 2023 From: jed at jedbrown.org (Jed Brown) Date: Mon, 16 Jan 2023 18:40:16 -0700 Subject: [petsc-users] DMPlex and CGNS In-Reply-To: References: <878ri23wck.fsf@jedbrown.org> Message-ID: <87358a3pmn.fsf@jedbrown.org> Matthew Knepley writes: > On Mon, Jan 16, 2023 at 6:15 PM Jed Brown wrote: > >> How soon do you need this? I understand the grumbling about CGNS, but it's >> easy to build, uses HDF5 parallel IO in a friendly way, supports high order >> elements, and is generally pretty expressive. I wrote a parallel writer >> (with some limitations that I'll remove) and plan to replace the current >> reader with a parallel reader because I have need to read big CGNS meshes. >> Alas, my semester is starting so it's hard to make promises, but this is a >> high priority for my research group and if you share your mesh file, I hope >> to be able to make the parallel reader work in the next couple weeks. >> > > The biggest hurdle for me is understanding the CGNS format. If you can tell > me what is going on, I can help write the Plex code that > interprets it. I wrote the parallel writer and I'm pretty familiar with the data model. Let's connect in chat. From engbl7de at erau.edu Mon Jan 16 16:48:09 2023 From: engbl7de at erau.edu (Engblom, William A.) Date: Mon, 16 Jan 2023 22:48:09 +0000 Subject: [petsc-users] DMPlex and CGNS In-Reply-To: References: Message-ID: Jesus, The CGNS files we get from Pointwise have only one base, so that should not be an issue. However, sections are needed to contain each cell type, the BCs, and zonal boundaries. So, there are always several sections. The grid that Spencer made for you must have multiple sections. We have to be able to deal with grids like Spencer's example or else it's not useful. B. ________________________________ From: Ferrand, Jesus A. Sent: Monday, January 16, 2023 5:41 PM To: petsc-users at mcs.anl.gov Subject: DMPlex and CGNS Dear PETSc team: I would like to use DMPlex to partition a mesh stored as a CGNS file. I configured my installation with --download_cgns = 1, got me a .cgns file and called DMPlexCreateCGNSFromFile() on it. Doing so got me this error: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in external library [0]PETSC ERROR: CGNS file must have a single section, not 4 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.18.3, unknown [0]PETSC ERROR: ./program.exe on a arch-linux-c-debug named F86 by jesus Mon Jan 16 17:25:11 2023 [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-cgns=yes --download-metis=yes --download-parmetis=yes --download-ptscotch=yes --download-chaco=yes --with-32bits-pci-domain=1 [0]PETSC ERROR: #1 DMPlexCreateCGNS_Internal() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:104 [0]PETSC ERROR: #2 DMPlexCreateCGNS() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:60 [0]PETSC ERROR: #3 DMPlexCreateCGNSFromFile_Internal() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:27 [0]PETSC ERROR: #4 DMPlexCreateCGNSFromFile() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:29 I looked around mail archives for clused and found this one (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035544.html). There, Matt provides a link to the source code for DMPlexCreateCGNSFromFile() and another (seemingly broken) link to CGNS files that can be opened with the former. After reading the source code I now understand that it is hardcoded for CGNS files that feature a single "base" and a single "section", whatever those are. After navigating the CGNS documentation, I can sympathize with the comments in the source code. Anyhow, I wanted to ask if I could be furnished with one such CGNS file that is compatible with DMPlexCreateCGNSFromFile() to see if I can modify my CGNS files to conform to it. If not, I will look into building the DAG myself using DMPlex APIs. Sincerely: J.A. Ferrand Embry-Riddle Aeronautical University - Daytona Beach FL M.Sc. Aerospace Engineering B.Sc. Aerospace Engineering B.Sc. Computational Mathematics Phone: (386)-843-1829 Email(s): ferranj2 at my.erau.edu jesus.ferrand at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Jan 17 12:23:53 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 17 Jan 2023 11:23:53 -0700 Subject: [petsc-users] DMPlex and CGNS In-Reply-To: References: Message-ID: <87ilh52f5y.fsf@jedbrown.org> Copying my private reply that appeared off-list. If you have one base with different element types, that's in scope for what I plan to develop soon. Congrats, you crashed cgnsview. $ cgnsview dl/HybridGrid.cgns Error in startup script: file was not found while executing "CGNSfile $ProgData(file)" (procedure "file_stats" line 4) invoked from within "file_stats" (procedure "file_load" line 53) invoked from within "file_load $fname" invoked from within "if {$argc} { set fname [lindex $argv [expr $argc - 1]] if {[file isfile $fname] && [file readable $fname]} { file_load $fname } }" (file "/usr/share/cgnstools/cgnsview.tcl" line 3013) This file looks okay in cgnscheck and paraview, but I don't have a need for multi-block and I'm stretched really thin so probably won't make it work any time soon. But if you make a single block with HexElements alongside PyramidElements and TetElements, I should be able to read it. If you don't mind prepping such a file (this size or smaller), it would help me test. "Engblom, William A." writes: > Jesus, > > The CGNS files we get from Pointwise have only one base, so that should not be an issue. However, sections are needed to contain each cell type, the BCs, and zonal boundaries. So, there are always several sections. 
The grid that Spencer made for you must have multiple sections. We have to be able to deal with grids like Spencer's example or else it's not useful. > > B. > > > > > > > ________________________________ > From: Ferrand, Jesus A. > Sent: Monday, January 16, 2023 5:41 PM > To: petsc-users at mcs.anl.gov > Subject: DMPlex and CGNS > > Dear PETSc team: > > I would like to use DMPlex to partition a mesh stored as a CGNS file. I configured my installation with --download_cgns = 1, got me a .cgns file and called DMPlexCreateCGNSFromFile() on it. Doing so got me this error: > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: CGNS file must have a single section, not 4 > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.18.3, unknown > [0]PETSC ERROR: ./program.exe on a arch-linux-c-debug named F86 by jesus Mon Jan 16 17:25:11 2023 > [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-cgns=yes --download-metis=yes --download-parmetis=yes --download-ptscotch=yes --download-chaco=yes --with-32bits-pci-domain=1 > [0]PETSC ERROR: #1 DMPlexCreateCGNS_Internal() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:104 > [0]PETSC ERROR: #2 DMPlexCreateCGNS() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:60 > [0]PETSC ERROR: #3 DMPlexCreateCGNSFromFile_Internal() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:27 > [0]PETSC ERROR: #4 DMPlexCreateCGNSFromFile() at /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:29 > > I looked around mail archives for clused and found this one (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035544.html). There, Matt provides a link to the source code for DMPlexCreateCGNSFromFile() and another (seemingly broken) link to CGNS files that can be opened with the former. After reading the source code I now understand that it is hardcoded for CGNS files that feature a single "base" and a single "section", whatever those are. > > After navigating the CGNS documentation, I can sympathize with the comments in the source code. > > Anyhow, I wanted to ask if I could be furnished with one such CGNS file that is compatible with DMPlexCreateCGNSFromFile() to see if I can modify my CGNS files to conform to it. If not, I will look into building the DAG myself using DMPlex APIs. > > > Sincerely: > > J.A. Ferrand > > Embry-Riddle Aeronautical University - Daytona Beach FL > > M.Sc. Aerospace Engineering > > B.Sc. Aerospace Engineering > > B.Sc. Computational Mathematics > > > Phone: (386)-843-1829 > > Email(s): ferranj2 at my.erau.edu > > jesus.ferrand at gmail.com From venugovh at mail.uc.edu Tue Jan 17 14:12:53 2023 From: venugovh at mail.uc.edu (Venugopal, Vysakh (venugovh)) Date: Tue, 17 Jan 2023 20:12:53 +0000 Subject: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll Message-ID: Hi, I am doing the following thing. Step 1. Create DM object and get global vector 'V' using DMGetGlobalVector. Step 2. Doing some parallel operations on V. Step 3. I am using VecScatterCreateToAll on V to create a sequential vector 'V_SEQ' using VecScatterBegin/End with SCATTER_FORWARD. Step 4. I am performing an expensive operation on V_SEQ and outputting the updated V_SEQ. Step 5. 
I am using VecScatterBegin/End with SCATTER_REVERSE (global and sequential flipped) to get V that is updated with new values from V_SEQ. Step 6. I continue using this new V on the rest of the parallelized program.
Question: Suppose I have n MPI processes, is the expensive operation in Step 4 repeated n times? If yes, is there a workaround such that the operation in Step 4 is performed only once? I would like to follow the same structure as steps 1 to 6 with step 4 only performed once.
Thanks,
Vysakh Venugopal
---
Vysakh Venugopal Ph.D. Candidate Department of Mechanical Engineering University of Cincinnati, Cincinnati, OH 45221-0072
-------------- next part --------------
An HTML attachment was scrubbed... URL:
From bsmith at petsc.dev Tue Jan 17 14:28:23 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 17 Jan 2023 15:28:23 -0500 Subject: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll In-Reply-To: References: Message-ID:
> On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users wrote: > > Hi, > > I am doing the following thing. > > Step 1. Create DM object and get global vector 'V' using DMGetGlobalVector. > Step 2. Doing some parallel operations on V. > Step 3. I am using VecScatterCreateToAll on V to create a sequential vector 'V_SEQ' using VecScatterBegin/End with SCATTER_FORWARD. > Step 4. I am performing an expensive operation on V_SEQ and outputting the updated V_SEQ. > Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and sequential flipped) to get V that is updated with new values from V_SEQ. > Step 6. I continue using this new V on the rest of the parallelized program. > > Question: Suppose I have n MPI processes, is the expensive operation in Step 4 repeated n times? If yes, is there a workaround such that the operation in Step 4 is performed only once? I would like to follow the same structure as steps 1 to 6 with step 4 only performed once.
Each MPI rank is doing the same operations on its copy of the sequential vector. Since they are running in parallel it probably does not matter much that each is doing the same computation. Step 5 does not require any MPI since only part of the sequential vector (which everyone has) is needed in the parallel vector. You could use VecScatterCreateToZero(); then step 3 would require less communication, but step 5 would require communication to get parts of the solution from rank 0 to the other ranks. The time for step 4 would be roughly the same. You will likely only see a worthwhile improvement in performance if you can parallelize the computation in 4. What are you doing that is computationally intense and requires all the data on a rank?
Barry
> > Thanks, > > Vysakh Venugopal > --- > Vysakh Venugopal > Ph.D. Candidate > Department of Mechanical Engineering > University of Cincinnati, Cincinnati, OH 45221-0072
-------------- next part --------------
An HTML attachment was scrubbed... URL:
From venugovh at mail.uc.edu Tue Jan 17 15:38:48 2023 From: venugovh at mail.uc.edu (Venugopal, Vysakh (venugovh)) Date: Tue, 17 Jan 2023 21:38:48 +0000 Subject: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll In-Reply-To: References: Message-ID:
Thank you! I am doing a structural optimization filter that inherently cannot be parallelized.
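To make the round trip above concrete, here is a minimal sketch of steps 3 to 5 as they are described in this thread. V is assumed to be the DM global vector from step 1, ApplyFilterSequential() is a placeholder for the user's serial filter (assumed to return a PetscErrorCode), and error handling is reduced to PetscCall().

  Vec        V_SEQ;
  VecScatter scat;

  /* Step 3: every rank receives a full sequential copy of V. */
  PetscCall(VecScatterCreateToAll(V, &scat, &V_SEQ));
  PetscCall(VecScatterBegin(scat, V, V_SEQ, INSERT_VALUES, SCATTER_FORWARD));
  PetscCall(VecScatterEnd(scat, V, V_SEQ, INSERT_VALUES, SCATTER_FORWARD));

  /* Step 4: the expensive operation runs redundantly on every rank,
     each working on its own full copy of V_SEQ. */
  PetscCall(ApplyFilterSequential(V_SEQ)); /* placeholder for the user's routine */

  /* Step 5: arguments are flipped for the reverse scatter; the updated values
     land back in the parallel vector V. As noted above, each rank already holds
     the entries it owns, so this needs little or no communication. */
  PetscCall(VecScatterBegin(scat, V_SEQ, V, INSERT_VALUES, SCATTER_REVERSE));
  PetscCall(VecScatterEnd(scat, V_SEQ, V, INSERT_VALUES, SCATTER_REVERSE));

  PetscCall(VecScatterDestroy(&scat));
  PetscCall(VecDestroy(&V_SEQ));

With VecScatterCreateToZero() the same forward/reverse calls apply, but only rank 0 holds a full-length copy, so the reverse scatter then has to move the result from rank 0 back to the other ranks.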
Vysakh From: Barry Smith Sent: Tuesday, January 17, 2023 3:28 PM To: Venugopal, Vysakh (venugovh) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll External Email: Use Caution On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users > wrote: Hi, I am doing the following thing. Step 1. Create DM object and get global vector ?V? using DMGetGlobalVector. Step 2. Doing some parallel operations on V. Step 3. I am using VecScatterCreateToAll on V to create a sequential vector ?V_SEQ? using VecScatterBegin/End with SCATTER_FORWARD. Step 4. I am performing an expensive operation on V_SEQ and outputting the updated V_SEQ. Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and sequential flipped) to get V that is updated with new values from V_SEQ. Step 6. I continue using this new V on the rest of the parallelized program. Question: Suppose I have n MPI processes, is the expensive operation in Step 4 repeated n times? If yes, is there a workaround such that the operation in Step 4 is performed only once? I would like to follow the same structure as steps 1 to 6 with step 4 only performed once. Each MPI rank is doing the same operations on its copy of the sequential vector. Since they are running in parallel it probably does not matter much that each is doing the same computation. Step 5 does not require any MPI since only part of the sequential vector (which everyone has) is needed in the parallel vector. You could use VecScatterCreateToZero() but then step 3 would require less communication but step 5 would require communication to get parts of the solution from rank 0 to the other ranks. The time for step 4 would be roughly the same. You will likely only see a worthwhile improvement in performance if you can parallelize the computation in 4. What are you doing that is computational intense and requires all the data on a rank? Barry Thanks, Vysakh Venugopal --- Vysakh Venugopal Ph.D. Candidate Department of Mechanical Engineering University of Cincinnati, Cincinnati, OH 45221-0072 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at mcmaster.ca Tue Jan 17 15:46:38 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Tue, 17 Jan 2023 21:46:38 +0000 Subject: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll In-Reply-To: References: Message-ID: <26A3167B-96C1-4140-B760-3979FCB812DC@mcmaster.ca> An HTML attachment was scrubbed... URL: From venugovh at mail.uc.edu Tue Jan 17 15:49:06 2023 From: venugovh at mail.uc.edu (Venugopal, Vysakh (venugovh)) Date: Tue, 17 Jan 2023 21:49:06 +0000 Subject: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll In-Reply-To: <26A3167B-96C1-4140-B760-3979FCB812DC@mcmaster.ca> References: <26A3167B-96C1-4140-B760-3979FCB812DC@mcmaster.ca> Message-ID: This is the support structure minimization filter. So I need to go layer-by-layer from the bottommost slice of the array and update it as I move up. Every slice needs the updated values below that slice. Vysakh From: Blaise Bourdin Sent: Tuesday, January 17, 2023 4:47 PM To: Venugopal, Vysakh (venugovh) Cc: Barry Smith ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll External Email: Use Caution What type of filter are you implementing? 
Convolution filters are expensive to parallelize since you need an overlap of the size of the support of the filter, but it may still not be worst than doing it sequentially (typically the filter size is only one or 2 element diameters). Or you may be able to apply the filter in Fourier space. PDE-filters are typically elliptic and can be parallelized. Blaise On Jan 17, 2023, at 4:38 PM, Venugopal, Vysakh (venugovh) via petsc-users > wrote: Thank you! I am doing a structural optimization filter that inherently cannot be parallelized. Vysakh From: Barry Smith > Sent: Tuesday, January 17, 2023 3:28 PM To: Venugopal, Vysakh (venugovh) > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll External Email: Use Caution On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users > wrote: Hi, I am doing the following thing. Step 1. Create DM object and get global vector ?V? using DMGetGlobalVector. Step 2. Doing some parallel operations on V. Step 3. I am using VecScatterCreateToAll on V to create a sequential vector ?V_SEQ? using VecScatterBegin/End with SCATTER_FORWARD. Step 4. I am performing an expensive operation on V_SEQ and outputting the updated V_SEQ. Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and sequential flipped) to get V that is updated with new values from V_SEQ. Step 6. I continue using this new V on the rest of the parallelized program. Question: Suppose I have n MPI processes, is the expensive operation in Step 4 repeated n times? If yes, is there a workaround such that the operation in Step 4 is performed only once? I would like to follow the same structure as steps 1 to 6 with step 4 only performed once. Each MPI rank is doing the same operations on its copy of the sequential vector. Since they are running in parallel it probably does not matter much that each is doing the same computation. Step 5 does not require any MPI since only part of the sequential vector (which everyone has) is needed in the parallel vector. You could use VecScatterCreateToZero() but then step 3 would require less communication but step 5 would require communication to get parts of the solution from rank 0 to the other ranks. The time for step 4 would be roughly the same. You will likely only see a worthwhile improvement in performance if you can parallelize the computation in 4. What are you doing that is computational intense and requires all the data on a rank? Barry Thanks, Vysakh Venugopal --- Vysakh Venugopal Ph.D. Candidate Department of Mechanical Engineering University of Cincinnati, Cincinnati, OH 45221-0072 ? Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) Professor, Department of Mathematics & Statistics Hamilton Hall room 409A, McMaster University 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 -------------- next part -------------- An HTML attachment was scrubbed... URL: From facklerpw at ornl.gov Tue Jan 17 16:13:23 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Tue, 17 Jan 2023 22:13:23 +0000 Subject: [petsc-users] Performance problem using COO interface Message-ID: In Xolotl's feature-petsc-kokkos branch I have ported the code to use petsc's COO interface for creating the Jacobian matrix (and the Kokkos interface for interacting with Vec entries). 
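For readers who have not used it, the COO interface mentioned here boils down to two calls: MatSetPreallocationCOO(), which declares the (i, j) nonzero pattern once, and MatSetValuesCOO(), which sets all values in one shot and can be repeated at every Jacobian evaluation. A minimal sketch with purely illustrative indices follows; J is assumed to be a Mat obtained from DMCreateMatrix(), with its type controlled by -dm_mat_type.

  /* Declare a fixed nonzero pattern once (4 entries here, illustrative only). */
  PetscInt    coo_i[] = {0, 0, 1, 2};
  PetscInt    coo_j[] = {0, 1, 1, 2};
  PetscScalar coo_v[] = {4.0, -1.0, 4.0, 4.0};

  PetscCall(MatSetPreallocationCOO(J, 4, coo_i, coo_j));
  /* At each Jacobian evaluation, recompute coo_v and push all values at once. */
  PetscCall(MatSetValuesCOO(J, coo_v, INSERT_VALUES));

In this usage the single MatSetValuesCOO() call takes the place of the per-stencil MatSetValuesStencil() calls used by the dev version described below.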
As the attached plots show for one case, while the code for computing the RHSFunction and RHSJacobian performs similarly (or slightly better) after the port, the performance for the solve as a whole is significantly worse. Note: This is all CPU-only (so kokkos and kokkos-kernels are built with only the serial backend). The dev version is using MatSetValuesStencil with the default implementations for Mat and Vec. The port version is using MatSetValuesCOO and is run with -dm_mat_type aijkokkos -dm_vec_type kokkos. The port/def version is using MatSetValuesCOO and is run with -dm_vec_type kokkos (using the default Mat implementation). So, this seems to be due to a performance difference in the petsc implementations. Please advise. Is this a known issue?
Or am I missing something? Thank you for the help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From venugovh at mail.uc.edu Tue Jan 17 16:27:51 2023 From: venugovh at mail.uc.edu (Venugopal, Vysakh (venugovh)) Date: Tue, 17 Jan 2023 22:27:51 +0000 Subject: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll In-Reply-To: References: <26A3167B-96C1-4140-B760-3979FCB812DC@mcmaster.ca> Message-ID: Sure, I will try this. I will update this thread once I get it working using the suggested method. Thank you! Vysakh From: Blaise Bourdin Sent: Tuesday, January 17, 2023 5:13 PM To: Venugopal, Vysakh (venugovh) Cc: Barry Smith ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll External Email: Use Caution Got it. Can you partition your mesh with only one processor in the z-direction? (Trivial if using DMDA) Blaise On Jan 17, 2023, at 4:49 PM, Venugopal, Vysakh (venugovh) > wrote: This is the support structure minimization filter. So I need to go layer-by-layer from the bottommost slice of the array and update it as I move up. Every slice needs the updated values below that slice. Vysakh From: Blaise Bourdin > Sent: Tuesday, January 17, 2023 4:47 PM To: Venugopal, Vysakh (venugovh) > Cc: Barry Smith >; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll External Email: Use Caution What type of filter are you implementing? Convolution filters are expensive to parallelize since you need an overlap of the size of the support of the filter, but it may still not be worst than doing it sequentially (typically the filter size is only one or 2 element diameters). Or you may be able to apply the filter in Fourier space. PDE-filters are typically elliptic and can be parallelized. Blaise On Jan 17, 2023, at 4:38 PM, Venugopal, Vysakh (venugovh) via petsc-users > wrote: Thank you! I am doing a structural optimization filter that inherently cannot be parallelized. Vysakh From: Barry Smith > Sent: Tuesday, January 17, 2023 3:28 PM To: Venugopal, Vysakh (venugovh) > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll External Email: Use Caution On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users > wrote: Hi, I am doing the following thing. Step 1. Create DM object and get global vector 'V' using DMGetGlobalVector. Step 2. Doing some parallel operations on V. Step 3. I am using VecScatterCreateToAll on V to create a sequential vector 'V_SEQ' using VecScatterBegin/End with SCATTER_FORWARD. Step 4. I am performing an expensive operation on V_SEQ and outputting the updated V_SEQ. Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and sequential flipped) to get V that is updated with new values from V_SEQ. Step 6. I continue using this new V on the rest of the parallelized program. Question: Suppose I have n MPI processes, is the expensive operation in Step 4 repeated n times? If yes, is there a workaround such that the operation in Step 4 is performed only once? I would like to follow the same structure as steps 1 to 6 with step 4 only performed once. Each MPI rank is doing the same operations on its copy of the sequential vector. 
Since they are running in parallel it probably does not matter much that each is doing the same computation. Step 5 does not require any MPI since only part of the sequential vector (which everyone has) is needed in the parallel vector. You could use VecScatterCreateToZero() but then step 3 would require less communication but step 5 would require communication to get parts of the solution from rank 0 to the other ranks. The time for step 4 would be roughly the same. You will likely only see a worthwhile improvement in performance if you can parallelize the computation in 4. What are you doing that is computational intense and requires all the data on a rank? Barry Thanks, Vysakh Venugopal --- Vysakh Venugopal Ph.D. Candidate Department of Mechanical Engineering University of Cincinnati, Cincinnati, OH 45221-0072 - Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) Professor, Department of Mathematics & Statistics Hamilton Hall room 409A, McMaster University 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 - Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) Professor, Department of Mathematics & Statistics Hamilton Hall room 409A, McMaster University 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Jan 17 18:07:12 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 17 Jan 2023 19:07:12 -0500 Subject: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) In-Reply-To: <69C797E7-6916-468D-A879-D261D07C458A@mcmaster.ca> References: <69C797E7-6916-468D-A879-D261D07C458A@mcmaster.ca> Message-ID: <219B4447-34F0-4224-A24E-91FE177A9391@petsc.dev> It appears that Tao is not written to allow multiple TaoSolve() on the same Tao object; this is different from KSP, SNES, and TS. If you look at the converged reason at the beginning of the second TaoSolve() you will see it is the reason that occurred in the first solve and all the Tao data structures are in the state they were in when the previous TaoSolve ended. Thus it is using incorrect flags and previous matrices incorrectly. Fixing this would be a largish process I think. I added an error check for TaoSolve that checks if converged reason is not iterating (meaning the Tao object was previously used and left in a bad state) so that this same problem won't come up for other users. https://gitlab.com/petsc/petsc/-/merge_requests/5986 Barry > On Jan 16, 2023, at 5:07 PM, Blaise Bourdin wrote: > > Hi, > > I am attaching a small modification of the eptorsion1.c example that replicates the issue. It looks like this bug is triggered when the upper and lower bounds are equal on enough (10) degrees of freedom. > > > ---- Elastic-Plastic Torsion Problem ----- > mx: 10 my: 10 > > i: 0 > i: 1 > i: 2 > i: 3 > i: 4 > i: 5 > i: 6 > i: 7 > i: 8 > i: 9 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Nonconforming object sizes > [0]PETSC ERROR: Preconditioner number of local rows 44 does not equal input vector size 54 > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.18.2-242-g4615508c7fc GIT Date: 2022-11-28 10:21:46 -0600 > [0]PETSC ERROR: ./eptorsion1 on a ventura-gcc12.2-arm64-g64 named bblaptop.math.mcmaster.ca by blaise Mon Jan 16 17:06:57 2023 > [0]PETSC ERROR: Configure options --CFLAGS="-Wimplicit-function-declaration -Wunused" --FFLAGS="-ffree-line-length-none -fallow-argument-mismatch -Wunused" --download-ctetgen=1 --download-exodusii=1 --download-hdf5=1 --download-hypre=1 --download-metis=1 --download-netcdf=1 --download-mumps=1 --download-parmetis=1 --download-pnetcdf=1 --download-scalapack --download-triangle=1 --download-zlib=1 --with-64-bit-indices=1 --with-debugging=1 --with-exodusii-fortran-bindings --with-shared-libraries=1 --with-x11=0 > [0]PETSC ERROR: #1 PCApply() at /opt/HPC/petsc-main/src/ksp/pc/interface/precon.c:434 > [0]PETSC ERROR: #2 KSP_PCApply() at /opt/HPC/petsc-main/include/petsc/private/kspimpl.h:380 > [0]PETSC ERROR: #3 KSPCGSolve_STCG() at /opt/HPC/petsc-main/src/ksp/ksp/impls/cg/stcg/stcg.c:76 > [0]PETSC ERROR: #4 KSPSolve_Private() at /opt/HPC/petsc-main/src/ksp/ksp/interface/itfunc.c:898 > [0]PETSC ERROR: #5 KSPSolve() at /opt/HPC/petsc-main/src/ksp/ksp/interface/itfunc.c:1070 > [0]PETSC ERROR: #6 TaoBNKComputeStep() at /opt/HPC/petsc-main/src/tao/bound/impls/bnk/bnk.c:459 > [0]PETSC ERROR: #7 TaoSolve_BNTR() at /opt/HPC/petsc-main/src/tao/bound/impls/bnk/bntr.c:138 > [0]PETSC ERROR: #8 TaoSolve() at /opt/HPC/petsc-main/src/tao/interface/taosolver.c:177 > [0]PETSC ERROR: #9 main() at eptorsion1.c:166 > [0]PETSC ERROR: No PETSc Option Table entries > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > Abort(60) on node 0 (rank 0 in comm 16): application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 > > I hope that this helps. > > Blaise > > >> On Jan 16, 2023, at 3:14 PM, Alexis Marboeuf wrote: >> >> Hi Matt, >> After investigation, it fails because, at some point, the boolean needH is set to PETSC_FALSE when initializing the BNK method with TAOBNKInitialize (line 103 of $PETSC_DIR/src/tao/bound/impls/bnk/bntr.c). The Hessian and the precondtitioner are thus not updated throughout the TAO iterations. It has something to do with the option BNK_INIT_INTERPOLATION set by default. It works when I choose BNK_INIT_CONSTANT. In my case, in all the successful calls of TAOSolve, the computed trial objective value is better than the current value which implies needH = PETSC_TRUE within TAOBNKInitialize. At some point, the trial value becomes equal to the current objective value up to machine precision and then, needH = PETSC_FALSE. I have to admit I am struggling understanding how that boolean needH is computed when BNK is initialized with BNK_INIT_INTERPOLATION. Can you help me with that? >> Thanks a lot. >> Alexis >> De : Alexis Marboeuf >> Envoy? : samedi 14 janvier 2023 05:24 >> ? : Matthew Knepley >> Cc : petsc-users at mcs.anl.gov >> Objet : RE: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) >> >> Hi Matt, >> Indeed, it fails on 1 process with the same error. The source code is available here: https://github.com/bourdin/mef90 (branch marboeuf/vdef-tao-test) >> >> GitHub - bourdin/mef90: Official repository for mef90/vDef >> mef90 / vDef: A reference implementation of the variational approach to fracture, as described in: Francfort, G. and Marigo, J.-J. (1998). Revisiting brittle fracture as an energy minimization problem. 
>> github.com >> I can share the details (installation + command line) for running it. But the ideal would be to reproduce this error with a Petsc example so it's easier for you to investigate. I looked for a bound constraint minimization problem with TAO and TS but I didn't find it. What example could I use? >> Thanks! >> Alexis >> De : Matthew Knepley >> Envoy? : samedi 14 janvier 2023 02:38 >> ? : Alexis Marboeuf >> Cc : petsc-users at mcs.anl.gov >> Objet : Re: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) >> >> On Fri, Jan 13, 2023 at 3:21 PM Alexis Marboeuf > wrote: >> Hi Matt, >> >> Here is the output from Petsc when I view the TAO solver: >> >> Okay, there is no sophisticated caching going on. So, first I would get it to fail on1 process. It should if it just depends on >> the convergence (I hope). Then send the source so we can run it. It should be simple for us to find where system size >> changes and the KSP is not reset (if that indeed is what happens). >> >> Thanks, >> >> Matt >> >> Tao Object: (Damage_) 4 MPI processes >> type: bntr >> Tao Object: (Damage_tao_bnk_cg_) 4 MPI processes >> type: bncg >> CG Type: ssml_bfgs >> Skipped Stepdirection Updates: 0 >> Scaled gradient steps: 0 >> Pure gradient steps: 0 >> Not a descent direction: 0 >> Line search fails: 0 >> Matrix has not been preallocated yet >> TaoLineSearch Object: (Damage_tao_bnk_cg_) 4 MPI processes >> type: more-thuente >> maximum function evaluations=30 >> tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 >> total number of function evaluations=0 >> total number of gradient evaluations=0 >> total number of function/gradient evaluations=0 >> Termination reason: 0 >> Active Set subset type: subvec >> convergence tolerances: gatol=1e-08, steptol=0., gttol=0. >> Residual in Function/Gradient:=0. >> Objective value=0. >> total number of iterations=0, (max: 2000) >> Solver terminated: 0 >> Rejected BFGS updates: 0 >> CG steps: 0 >> Newton steps: 11 >> BFGS steps: 0 >> Scaled gradient steps: 0 >> Gradient steps: 0 >> KSP termination reasons: >> atol: 4 >> rtol: 0 >> ctol: 7 >> negc: 0 >> dtol: 0 >> iter: 0 >> othr: 0 >> TaoLineSearch Object: (Damage_) 4 MPI processes >> type: more-thuente >> maximum function evaluations=30 >> tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 >> total number of function evaluations=0 >> total number of gradient evaluations=0 >> total number of function/gradient evaluations=0 >> using variable bounds >> Termination reason: 0 >> KSP Object: (Damage_tao_bnk_) 4 MPI processes >> type: stcg >> maximum iterations=10000, nonzero initial guess >> tolerances: relative=1e-08, absolute=1e-08, divergence=1e+10 >> left preconditioning >> using UNPRECONDITIONED norm type for convergence test >> PC Object: (Damage_tao_bnk_) 4 MPI processes >> type: lmvm >> Mat Object: (Damage_tao_bnk_pc_lmvm_) 4 MPI processes >> type: lmvmbfgs >> rows=30634, cols=30634 >> Scale type: DIAGONAL >> Scale history: 1 >> Scale params: alpha=1., beta=0.5, rho=1. >> Convex factors: phi=0., theta=0.125 >> Max. storage: 5 >> Used storage: 5 >> Number of updates: 11 >> Number of rejects: 0 >> Number of resets: 0 >> Mat Object: (Damage_tao_bnk_pc_lmvm_J0_) 4 MPI processes >> type: lmvmdiagbroyden >> rows=30634, cols=30634 >> Scale history: 1 >> Scale params: alpha=1., beta=0.5, rho=1. >> Convex factor: theta=0.125 >> Max. 
storage: 1 >> Used storage: 1 >> Number of updates: 11 >> Number of rejects: 0 >> Number of resets: 0 >> linear system matrix = precond matrix: >> Mat Object: 4 MPI processes >> type: mpiaij >> rows=468, cols=468 >> total: nonzeros=2932, allocated nonzeros=2932 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node (on process 0) routines >> total KSP iterations: 103 >> Active Set subset type: subvec >> convergence tolerances: gatol=0.0001, steptol=0., gttol=1e-05 >> Residual in Function/Gradient:=9.11153e-05 >> Objective value=0.00665458 >> total number of iterations=11, (max: 50) >> total number of function evaluations=17, max: -1 >> total number of gradient evaluations=13, max: -1 >> total number of Hessian evaluations=12 >> Solution converged: ||g(X)|| <= gatol >> >> Thanks again for your help! >> Alexis >> >> De : Matthew Knepley > >> Envoy? : samedi 14 janvier 2023 01:38 >> ? : Alexis Marboeuf > >> Cc : petsc-users at mcs.anl.gov > >> Objet : Re: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) >> >> On Fri, Jan 13, 2023 at 11:22 AM Alexis Marboeuf > wrote: >> Hi all, >> >> In a variational approach of brittle fracture setting, I try to solve a bound constraint minimization problem using TAO. I checkout on the main branch of Petsc. Minimization with respect to the bounded variable (damage) is achieved through the Bounded Newton Trust Region (TAOBNTR). All other TAO parameters are set by default. On a Linux machine, I get the following error with a 4 processors run: >> >> Can you view the solver? >> >> Thanks, >> >> Matt >> >> >> [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [3]PETSC ERROR: Nonconforming object sizes >> [3]PETSC ERROR: Preconditioner number of local rows 1122 does not equal input vector size 1161 >> [3]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
>> [3]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 GIT Date: 2023-01-04 13:37:04 +0000 >> [3]PETSC ERROR: /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 >> [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [3]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 --download-mumps=1 --download-chaco=1 --download-exodusii=1 --download-hypre=1 --download-ml=1 --download-triangle --download-scalapack=1 --download-superlu=1 --download-sowing=1 --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 --download-zlib=1 --with-cmake=1 --with-debugging=0 --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c >> [3]PETSC ERROR: #1 PCApply() at /1/home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 >> [3]PETSC ERROR: #2 KSP_PCApply() at /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 >> [3]PETSC ERROR: #3 KSPCGSolve_STCG() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 >> [3]PETSC ERROR: #4 KSPSolve_Private() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 >> [3]PETSC ERROR: #5 KSPSolve() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 >> [3]PETSC ERROR: #6 TaoBNKComputeStep() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 >> [3]PETSC ERROR: #7 TaoSolve_BNTR() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 >> [3]PETSC ERROR: #8 TaoSolve() at /1/home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 >> [2]PETSC ERROR: Nonconforming object sizes >> [2]PETSC ERROR: Preconditioner number of local rows 1229 does not equal input vector size 1254 >> [2]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
>> [2]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 GIT Date: 2023-01-04 13:37:04 +0000 >> [2]PETSC ERROR: /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023 >> [2]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 --download-mumps=1 --download-chaco=1 --download-exodusii=1 --download-hypre=1 --download-ml=1 --download-triangle --download-scalapack=1 --download-superlu=1 --download-sowing=1 --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 --download-zlib=1 --with-cmake=1 --with-debugging=0 --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c >> [2]PETSC ERROR: #1 PCApply() at /1/home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434 >> [2]PETSC ERROR: #2 KSP_PCApply() at /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380 >> [2]PETSC ERROR: #3 KSPCGSolve_STCG() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76 >> [2]PETSC ERROR: #4 KSPSolve_Private() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898 >> [2]PETSC ERROR: #5 KSPSolve() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070 >> [2]PETSC ERROR: #6 TaoBNKComputeStep() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459 >> [2]PETSC ERROR: #7 TaoSolve_BNTR() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138 >> [2]PETSC ERROR: #8 TaoSolve() at /1/home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177 >> [3]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 >> application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 >> [2]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370 >> application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 >> slurmstepd: error: *** STEP 5034.0 ON bb01 CANCELLED AT 2023-01-12T17:21:07 *** >> srun: Job step aborted: Waiting up to 32 seconds for job step to finish. >> srun: error: bb01: tasks 0-1: Killed >> srun: error: bb01: tasks 2-3: Exited with exit code 1 >> >> The error is raised in the middle of the computation after many successful calls of TAOSolve and TAO iterations. My guess is that TAO computes the preconditioner during its first iteration with all variables in the active set. But the preconditioner is never updated when some variables are moved to the inactive set during the next TAO iterations. Am I right? Can you help me with that? >> >> Thanks a lot for your help and your time. >> Regards, >> Alexis >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at mcmaster.ca Tue Jan 17 20:14:15 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Wed, 18 Jan 2023 02:14:15 +0000 Subject: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR) In-Reply-To: <219B4447-34F0-4224-A24E-91FE177A9391@petsc.dev> References: <69C797E7-6916-468D-A879-D261D07C458A@mcmaster.ca> <219B4447-34F0-4224-A24E-91FE177A9391@petsc.dev> Message-ID: <62F0CDA5-F354-4D2D-8440-EC06F2AB68D9@mcmaster.ca> An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Jan 18 13:53:54 2023 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 18 Jan 2023 14:53:54 -0500 Subject: [petsc-users] multi GPU partitions have very different memory usage Message-ID: Q0) does -memory_view trace GPU memory as well, or is there another method to query the peak device memory allocation? Q1) I'm loading a aijcusparse matrix with MatLoad, and running with -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm with mat info 27,142,948 rows and cols, bs=4, total nonzeros 759,709,392. Using 8 ranks on 8x80GB GPUs, and during the setup phase before crashing with CUSPARSE_STATUS_INSUFFICIENT_RESOURCES nvidia-smi shows the below pasted content. GPU memory usage spanning from 36GB-50GB but with one rank at 77GB. Is this expected? Do I need to manually repartition this somehow? 
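One quick host-side check of how evenly the loaded matrix sits across the 8 ranks, separate from the GPU question, is a minimal sketch like the following, assuming A is the Mat already read with MatLoad:

  MatInfo     info;
  PetscInt    mloc, nloc;
  PetscMPIInt rank;

  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  PetscCall(MatGetLocalSize(A, &mloc, &nloc));
  PetscCall(MatGetInfo(A, MAT_LOCAL, &info));
  PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD,
            "[%d] local rows %" PetscInt_FMT "  nz used %.0f  nz allocated %.0f\n",
            rank, mloc, info.nz_used, info.nz_allocated));
  PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));

A large spread in local rows or nonzeros here would already account for part of the spread nvidia-smi reports.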
Thanks, Mark +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | 0 N/A N/A 1630309 C nvidia-cuda-mps-server 27MiB | | 0 N/A N/A 1696543 C ./petsc_solver_test 38407MiB | | 0 N/A N/A 1696544 C ./petsc_solver_test 467MiB | | 0 N/A N/A 1696545 C ./petsc_solver_test 467MiB | | 0 N/A N/A 1696546 C ./petsc_solver_test 467MiB | | 0 N/A N/A 1696548 C ./petsc_solver_test 467MiB | | 0 N/A N/A 1696550 C ./petsc_solver_test 471MiB | | 0 N/A N/A 1696551 C ./petsc_solver_test 467MiB | | 0 N/A N/A 1696552 C ./petsc_solver_test 467MiB | | 1 N/A N/A 1630309 C nvidia-cuda-mps-server 27MiB | | 1 N/A N/A 1696544 C ./petsc_solver_test 35849MiB | | 2 N/A N/A 1630309 C nvidia-cuda-mps-server 27MiB | | 2 N/A N/A 1696545 C ./petsc_solver_test 36719MiB | | 3 N/A N/A 1630309 C nvidia-cuda-mps-server 27MiB | | 3 N/A N/A 1696546 C ./petsc_solver_test 37343MiB | | 4 N/A N/A 1630309 C nvidia-cuda-mps-server 27MiB | | 4 N/A N/A 1696548 C ./petsc_solver_test 36935MiB | | 5 N/A N/A 1630309 C nvidia-cuda-mps-server 27MiB | | 5 N/A N/A 1696550 C ./petsc_solver_test 49953MiB | | 6 N/A N/A 1630309 C nvidia-cuda-mps-server 27MiB | | 6 N/A N/A 1696551 C ./petsc_solver_test 47693MiB | | 7 N/A N/A 1630309 C nvidia-cuda-mps-server 27MiB | | 7 N/A N/A 1696552 C ./petsc_solver_test 77331MiB | +-----------------------------------------------------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Jan 18 14:34:52 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 18 Jan 2023 15:34:52 -0500 Subject: [petsc-users] multi GPU partitions have very different memory usage In-Reply-To: References: Message-ID: Can your problem have load imbalance? You might try '-pc_type asm' (and/or jacobi) to see your baseline load imbalance. GAMG can add some load imbalance but start by getting a baseline. Mark On Wed, Jan 18, 2023 at 2:54 PM Mark Lohry wrote: > Q0) does -memory_view trace GPU memory as well, or is there another method > to query the peak device memory allocation? > > Q1) I'm loading a aijcusparse matrix with MatLoad, and running with > -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm with mat info > 27,142,948 rows and cols, bs=4, total nonzeros 759,709,392. Using 8 ranks > on 8x80GB GPUs, and during the setup phase before crashing with > CUSPARSE_STATUS_INSUFFICIENT_RESOURCES nvidia-smi shows the below pasted > content. > > GPU memory usage spanning from 36GB-50GB but with one rank at 77GB. Is > this expected? Do I need to manually repartition this somehow? 
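A concrete form of the baseline runs suggested above might be (standard PETSc options; the executable name is taken from the nvidia-smi listing):

  mpiexec -n 8 ./petsc_solver_test -ksp_type fgmres -pc_type jacobi -memory_view -log_view
  mpiexec -n 8 ./petsc_solver_test -ksp_type fgmres -pc_type asm -memory_view -log_view

Comparing the per-rank memory of these runs against the -pc_type gamg run shows how much of the imbalance is already present before GAMG adds its own.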
> > Thanks, > Mark > > > > +-----------------------------------------------------------------------------+ > > | Processes: > | > > | GPU GI CI PID Type Process name GPU > Memory | > > | ID ID > Usage | > > > |=============================================================================| > > | 0 N/A N/A 1630309 C nvidia-cuda-mps-server > 27MiB | > > | 0 N/A N/A 1696543 C ./petsc_solver_test > 38407MiB | > > | 0 N/A N/A 1696544 C ./petsc_solver_test > 467MiB | > > | 0 N/A N/A 1696545 C ./petsc_solver_test > 467MiB | > > | 0 N/A N/A 1696546 C ./petsc_solver_test > 467MiB | > > | 0 N/A N/A 1696548 C ./petsc_solver_test > 467MiB | > > | 0 N/A N/A 1696550 C ./petsc_solver_test > 471MiB | > > | 0 N/A N/A 1696551 C ./petsc_solver_test > 467MiB | > > | 0 N/A N/A 1696552 C ./petsc_solver_test > 467MiB | > > | 1 N/A N/A 1630309 C nvidia-cuda-mps-server > 27MiB | > > | 1 N/A N/A 1696544 C ./petsc_solver_test > 35849MiB | > > | 2 N/A N/A 1630309 C nvidia-cuda-mps-server > 27MiB | > > | 2 N/A N/A 1696545 C ./petsc_solver_test > 36719MiB | > > | 3 N/A N/A 1630309 C nvidia-cuda-mps-server > 27MiB | > > | 3 N/A N/A 1696546 C ./petsc_solver_test > 37343MiB | > > | 4 N/A N/A 1630309 C nvidia-cuda-mps-server > 27MiB | > > | 4 N/A N/A 1696548 C ./petsc_solver_test > 36935MiB | > > | 5 N/A N/A 1630309 C nvidia-cuda-mps-server > 27MiB | > > | 5 N/A N/A 1696550 C ./petsc_solver_test > 49953MiB | > > | 6 N/A N/A 1630309 C nvidia-cuda-mps-server > 27MiB | > > | 6 N/A N/A 1696551 C ./petsc_solver_test > 47693MiB | > > | 7 N/A N/A 1630309 C nvidia-cuda-mps-server > 27MiB | > > | 7 N/A N/A 1696552 C ./petsc_solver_test > 77331MiB | > > > +-----------------------------------------------------------------------------+ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Jan 18 14:42:19 2023 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 18 Jan 2023 15:42:19 -0500 Subject: [petsc-users] multi GPU partitions have very different memory usage In-Reply-To: References: Message-ID: With asm I see a range of 8GB-13GB, slightly smaller ratio but that probably explains it (does this still seem like a lot of memory to you for the problem size?) In general I don't have the same number of blocks per row, so I suppose it makes sense there's some memory imbalance. On Wed, Jan 18, 2023 at 3:35 PM Mark Adams wrote: > Can your problem have load imbalance? > > You might try '-pc_type asm' (and/or jacobi) to see your baseline load > imbalance. > GAMG can add some load imbalance but start by getting a baseline. > > Mark > > On Wed, Jan 18, 2023 at 2:54 PM Mark Lohry wrote: > >> Q0) does -memory_view trace GPU memory as well, or is there another >> method to query the peak device memory allocation? >> >> Q1) I'm loading a aijcusparse matrix with MatLoad, and running with >> -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm with mat info >> 27,142,948 rows and cols, bs=4, total nonzeros 759,709,392. Using 8 ranks >> on 8x80GB GPUs, and during the setup phase before crashing with >> CUSPARSE_STATUS_INSUFFICIENT_RESOURCES nvidia-smi shows the below pasted >> content. >> >> GPU memory usage spanning from 36GB-50GB but with one rank at 77GB. Is >> this expected? Do I need to manually repartition this somehow? 
>> >> Thanks, >> Mark >> >> >> >> +-----------------------------------------------------------------------------+ >> >> | Processes: >> | >> >> | GPU GI CI PID Type Process name GPU >> Memory | >> >> | ID ID >> Usage | >> >> >> |=============================================================================| >> >> | 0 N/A N/A 1630309 C nvidia-cuda-mps-server >> 27MiB | >> >> | 0 N/A N/A 1696543 C ./petsc_solver_test >> 38407MiB | >> >> | 0 N/A N/A 1696544 C ./petsc_solver_test >> 467MiB | >> >> | 0 N/A N/A 1696545 C ./petsc_solver_test >> 467MiB | >> >> | 0 N/A N/A 1696546 C ./petsc_solver_test >> 467MiB | >> >> | 0 N/A N/A 1696548 C ./petsc_solver_test >> 467MiB | >> >> | 0 N/A N/A 1696550 C ./petsc_solver_test >> 471MiB | >> >> | 0 N/A N/A 1696551 C ./petsc_solver_test >> 467MiB | >> >> | 0 N/A N/A 1696552 C ./petsc_solver_test >> 467MiB | >> >> | 1 N/A N/A 1630309 C nvidia-cuda-mps-server >> 27MiB | >> >> | 1 N/A N/A 1696544 C ./petsc_solver_test >> 35849MiB | >> >> | 2 N/A N/A 1630309 C nvidia-cuda-mps-server >> 27MiB | >> >> | 2 N/A N/A 1696545 C ./petsc_solver_test >> 36719MiB | >> >> | 3 N/A N/A 1630309 C nvidia-cuda-mps-server >> 27MiB | >> >> | 3 N/A N/A 1696546 C ./petsc_solver_test >> 37343MiB | >> >> | 4 N/A N/A 1630309 C nvidia-cuda-mps-server >> 27MiB | >> >> | 4 N/A N/A 1696548 C ./petsc_solver_test >> 36935MiB | >> >> | 5 N/A N/A 1630309 C nvidia-cuda-mps-server >> 27MiB | >> >> | 5 N/A N/A 1696550 C ./petsc_solver_test >> 49953MiB | >> >> | 6 N/A N/A 1630309 C nvidia-cuda-mps-server >> 27MiB | >> >> | 6 N/A N/A 1696551 C ./petsc_solver_test >> 47693MiB | >> >> | 7 N/A N/A 1630309 C nvidia-cuda-mps-server >> 27MiB | >> >> | 7 N/A N/A 1696552 C ./petsc_solver_test >> 77331MiB | >> >> >> +-----------------------------------------------------------------------------+ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Jan 18 15:48:36 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 18 Jan 2023 16:48:36 -0500 Subject: [petsc-users] multi GPU partitions have very different memory usage In-Reply-To: References: Message-ID: cusparse matrix triple product takes a lot of memory. We usually use Kokkos, configured with TPL turned off. If you have a complex problem different parts of the domain can coarsen at different rates. Jacobi instead of asm will save a fair amount od memory. If you run with -ksp_view you will see operator/matrix complexity from GAMG. These should be < 1.5, Mark On Wed, Jan 18, 2023 at 3:42 PM Mark Lohry wrote: > With asm I see a range of 8GB-13GB, slightly smaller ratio but that > probably explains it (does this still seem like a lot of memory to you for > the problem size?) > > In general I don't have the same number of blocks per row, so I suppose it > makes sense there's some memory imbalance. > > > > On Wed, Jan 18, 2023 at 3:35 PM Mark Adams wrote: > >> Can your problem have load imbalance? >> >> You might try '-pc_type asm' (and/or jacobi) to see your baseline load >> imbalance. >> GAMG can add some load imbalance but start by getting a baseline. >> >> Mark >> >> On Wed, Jan 18, 2023 at 2:54 PM Mark Lohry wrote: >> >>> Q0) does -memory_view trace GPU memory as well, or is there another >>> method to query the peak device memory allocation? >>> >>> Q1) I'm loading a aijcusparse matrix with MatLoad, and running with >>> -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm with mat info >>> 27,142,948 rows and cols, bs=4, total nonzeros 759,709,392. 
Using 8 ranks >>> on 8x80GB GPUs, and during the setup phase before crashing with >>> CUSPARSE_STATUS_INSUFFICIENT_RESOURCES nvidia-smi shows the below pasted >>> content. >>> >>> GPU memory usage spanning from 36GB-50GB but with one rank at 77GB. Is >>> this expected? Do I need to manually repartition this somehow? >>> >>> Thanks, >>> Mark >>> >>> >>> >>> +-----------------------------------------------------------------------------+ >>> >>> | Processes: >>> | >>> >>> | GPU GI CI PID Type Process name GPU >>> Memory | >>> >>> | ID ID >>> Usage | >>> >>> >>> |=============================================================================| >>> >>> | 0 N/A N/A 1630309 C nvidia-cuda-mps-server >>> 27MiB | >>> >>> | 0 N/A N/A 1696543 C ./petsc_solver_test >>> 38407MiB | >>> >>> | 0 N/A N/A 1696544 C ./petsc_solver_test >>> 467MiB | >>> >>> | 0 N/A N/A 1696545 C ./petsc_solver_test >>> 467MiB | >>> >>> | 0 N/A N/A 1696546 C ./petsc_solver_test >>> 467MiB | >>> >>> | 0 N/A N/A 1696548 C ./petsc_solver_test >>> 467MiB | >>> >>> | 0 N/A N/A 1696550 C ./petsc_solver_test >>> 471MiB | >>> >>> | 0 N/A N/A 1696551 C ./petsc_solver_test >>> 467MiB | >>> >>> | 0 N/A N/A 1696552 C ./petsc_solver_test >>> 467MiB | >>> >>> | 1 N/A N/A 1630309 C nvidia-cuda-mps-server >>> 27MiB | >>> >>> | 1 N/A N/A 1696544 C ./petsc_solver_test >>> 35849MiB | >>> >>> | 2 N/A N/A 1630309 C nvidia-cuda-mps-server >>> 27MiB | >>> >>> | 2 N/A N/A 1696545 C ./petsc_solver_test >>> 36719MiB | >>> >>> | 3 N/A N/A 1630309 C nvidia-cuda-mps-server >>> 27MiB | >>> >>> | 3 N/A N/A 1696546 C ./petsc_solver_test >>> 37343MiB | >>> >>> | 4 N/A N/A 1630309 C nvidia-cuda-mps-server >>> 27MiB | >>> >>> | 4 N/A N/A 1696548 C ./petsc_solver_test >>> 36935MiB | >>> >>> | 5 N/A N/A 1630309 C nvidia-cuda-mps-server >>> 27MiB | >>> >>> | 5 N/A N/A 1696550 C ./petsc_solver_test >>> 49953MiB | >>> >>> | 6 N/A N/A 1630309 C nvidia-cuda-mps-server >>> 27MiB | >>> >>> | 6 N/A N/A 1696551 C ./petsc_solver_test >>> 47693MiB | >>> >>> | 7 N/A N/A 1630309 C nvidia-cuda-mps-server >>> 27MiB | >>> >>> | 7 N/A N/A 1696552 C ./petsc_solver_test >>> 77331MiB | >>> >>> >>> +-----------------------------------------------------------------------------+ >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Jan 18 17:03:01 2023 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 18 Jan 2023 18:03:01 -0500 Subject: [petsc-users] multi GPU partitions have very different memory usage In-Reply-To: References: Message-ID: Thanks Mark, I'll try the kokkos bit. Any other suggestions for minimizing memory besides the obvious use less levels? Unfortunately Jacobi does poorly compared to ILU on these systems. I'm seeing grid complexity 1.48 and operator complexity 1.75 with pc_gamg_square_graph 0, and 1.15/1.25 with it at 1. Additionally the convergence rate is pretty healthy with 5 gmres+asm smooths but very bad with 5 Richardson+asm. On Wed, Jan 18, 2023, 4:48 PM Mark Adams wrote: > cusparse matrix triple product takes a lot of memory. We usually use > Kokkos, configured with TPL turned off. > > If you have a complex problem different parts of the domain can coarsen at > different rates. > Jacobi instead of asm will save a fair amount od memory. > If you run with -ksp_view you will see operator/matrix complexity from > GAMG. 
These should be < 1.5, > > Mark > > On Wed, Jan 18, 2023 at 3:42 PM Mark Lohry wrote: > >> With asm I see a range of 8GB-13GB, slightly smaller ratio but that >> probably explains it (does this still seem like a lot of memory to you for >> the problem size?) >> >> In general I don't have the same number of blocks per row, so I suppose >> it makes sense there's some memory imbalance. >> >> >> >> On Wed, Jan 18, 2023 at 3:35 PM Mark Adams wrote: >> >>> Can your problem have load imbalance? >>> >>> You might try '-pc_type asm' (and/or jacobi) to see your baseline load >>> imbalance. >>> GAMG can add some load imbalance but start by getting a baseline. >>> >>> Mark >>> >>> On Wed, Jan 18, 2023 at 2:54 PM Mark Lohry wrote: >>> >>>> Q0) does -memory_view trace GPU memory as well, or is there another >>>> method to query the peak device memory allocation? >>>> >>>> Q1) I'm loading a aijcusparse matrix with MatLoad, and running with >>>> -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm with mat info >>>> 27,142,948 rows and cols, bs=4, total nonzeros 759,709,392. Using 8 ranks >>>> on 8x80GB GPUs, and during the setup phase before crashing with >>>> CUSPARSE_STATUS_INSUFFICIENT_RESOURCES nvidia-smi shows the below pasted >>>> content. >>>> >>>> GPU memory usage spanning from 36GB-50GB but with one rank at 77GB. Is >>>> this expected? Do I need to manually repartition this somehow? >>>> >>>> Thanks, >>>> Mark >>>> >>>> >>>> >>>> +-----------------------------------------------------------------------------+ >>>> >>>> | Processes: >>>> | >>>> >>>> | GPU GI CI PID Type Process name GPU >>>> Memory | >>>> >>>> | ID ID >>>> Usage | >>>> >>>> >>>> |=============================================================================| >>>> >>>> | 0 N/A N/A 1630309 C nvidia-cuda-mps-server >>>> 27MiB | >>>> >>>> | 0 N/A N/A 1696543 C ./petsc_solver_test >>>> 38407MiB | >>>> >>>> | 0 N/A N/A 1696544 C ./petsc_solver_test >>>> 467MiB | >>>> >>>> | 0 N/A N/A 1696545 C ./petsc_solver_test >>>> 467MiB | >>>> >>>> | 0 N/A N/A 1696546 C ./petsc_solver_test >>>> 467MiB | >>>> >>>> | 0 N/A N/A 1696548 C ./petsc_solver_test >>>> 467MiB | >>>> >>>> | 0 N/A N/A 1696550 C ./petsc_solver_test >>>> 471MiB | >>>> >>>> | 0 N/A N/A 1696551 C ./petsc_solver_test >>>> 467MiB | >>>> >>>> | 0 N/A N/A 1696552 C ./petsc_solver_test >>>> 467MiB | >>>> >>>> | 1 N/A N/A 1630309 C nvidia-cuda-mps-server >>>> 27MiB | >>>> >>>> | 1 N/A N/A 1696544 C ./petsc_solver_test >>>> 35849MiB | >>>> >>>> | 2 N/A N/A 1630309 C nvidia-cuda-mps-server >>>> 27MiB | >>>> >>>> | 2 N/A N/A 1696545 C ./petsc_solver_test >>>> 36719MiB | >>>> >>>> | 3 N/A N/A 1630309 C nvidia-cuda-mps-server >>>> 27MiB | >>>> >>>> | 3 N/A N/A 1696546 C ./petsc_solver_test >>>> 37343MiB | >>>> >>>> | 4 N/A N/A 1630309 C nvidia-cuda-mps-server >>>> 27MiB | >>>> >>>> | 4 N/A N/A 1696548 C ./petsc_solver_test >>>> 36935MiB | >>>> >>>> | 5 N/A N/A 1630309 C nvidia-cuda-mps-server >>>> 27MiB | >>>> >>>> | 5 N/A N/A 1696550 C ./petsc_solver_test >>>> 49953MiB | >>>> >>>> | 6 N/A N/A 1630309 C nvidia-cuda-mps-server >>>> 27MiB | >>>> >>>> | 6 N/A N/A 1696551 C ./petsc_solver_test >>>> 47693MiB | >>>> >>>> | 7 N/A N/A 1630309 C nvidia-cuda-mps-server >>>> 27MiB | >>>> >>>> | 7 N/A N/A 1696552 C ./petsc_solver_test >>>> 77331MiB | >>>> >>>> >>>> +-----------------------------------------------------------------------------+ >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mfadams at lbl.gov Thu Jan 19 08:40:37 2023 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 19 Jan 2023 09:40:37 -0500 Subject: [petsc-users] multi GPU partitions have very different memory usage In-Reply-To: References: Message-ID: On Wed, Jan 18, 2023 at 6:03 PM Mark Lohry wrote: > Thanks Mark, I'll try the kokkos bit. Any other suggestions for minimizing > memory besides the obvious use less levels? > > Unfortunately Jacobi does poorly compared to ILU on these systems. > > I'm seeing grid complexity 1.48 and operator complexity 1.75 with > pc_gamg_square_graph 0, and 1.15/1.25 with it at 1. > That looks good. Use 1. > Additionally the convergence rate is pretty healthy with 5 gmres+asm > smooths but very bad with 5 Richardson+asm. > > Yea, it needs to be damped and GMRES does that automatically. > > On Wed, Jan 18, 2023, 4:48 PM Mark Adams wrote: > >> cusparse matrix triple product takes a lot of memory. We usually use >> Kokkos, configured with TPL turned off. >> >> If you have a complex problem different parts of the domain can coarsen >> at different rates. >> Jacobi instead of asm will save a fair amount od memory. >> If you run with -ksp_view you will see operator/matrix complexity from >> GAMG. These should be < 1.5, >> >> Mark >> >> On Wed, Jan 18, 2023 at 3:42 PM Mark Lohry wrote: >> >>> With asm I see a range of 8GB-13GB, slightly smaller ratio but that >>> probably explains it (does this still seem like a lot of memory to you for >>> the problem size?) >>> >>> In general I don't have the same number of blocks per row, so I suppose >>> it makes sense there's some memory imbalance. >>> >>> >>> >>> On Wed, Jan 18, 2023 at 3:35 PM Mark Adams wrote: >>> >>>> Can your problem have load imbalance? >>>> >>>> You might try '-pc_type asm' (and/or jacobi) to see your baseline load >>>> imbalance. >>>> GAMG can add some load imbalance but start by getting a baseline. >>>> >>>> Mark >>>> >>>> On Wed, Jan 18, 2023 at 2:54 PM Mark Lohry wrote: >>>> >>>>> Q0) does -memory_view trace GPU memory as well, or is there another >>>>> method to query the peak device memory allocation? >>>>> >>>>> Q1) I'm loading a aijcusparse matrix with MatLoad, and running with >>>>> -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm with mat info >>>>> 27,142,948 rows and cols, bs=4, total nonzeros 759,709,392. Using 8 ranks >>>>> on 8x80GB GPUs, and during the setup phase before crashing with >>>>> CUSPARSE_STATUS_INSUFFICIENT_RESOURCES nvidia-smi shows the below pasted >>>>> content. >>>>> >>>>> GPU memory usage spanning from 36GB-50GB but with one rank at 77GB. Is >>>>> this expected? Do I need to manually repartition this somehow? 
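Pulling the settings from this exchange together, the run being converged on would look something like (a sketch; the 5-iteration GMRES+ASM smoother count is taken from the text above, the rest is standard option syntax):

  -ksp_type fgmres -pc_type gamg -pc_gamg_square_graph 1 \
  -mg_levels_ksp_type gmres -mg_levels_ksp_max_it 5 -mg_levels_pc_type asm \
  -ksp_view

With -ksp_view the solver reports the grid and operator complexities, which should stay below roughly 1.5.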
>>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>> >>>>> +-----------------------------------------------------------------------------+ >>>>> >>>>> | Processes: >>>>> | >>>>> >>>>> | GPU GI CI PID Type Process name GPU >>>>> Memory | >>>>> >>>>> | ID ID >>>>> Usage | >>>>> >>>>> >>>>> |=============================================================================| >>>>> >>>>> | 0 N/A N/A 1630309 C >>>>> nvidia-cuda-mps-server 27MiB | >>>>> >>>>> | 0 N/A N/A 1696543 C ./petsc_solver_test >>>>> 38407MiB | >>>>> >>>>> | 0 N/A N/A 1696544 C ./petsc_solver_test >>>>> 467MiB | >>>>> >>>>> | 0 N/A N/A 1696545 C ./petsc_solver_test >>>>> 467MiB | >>>>> >>>>> | 0 N/A N/A 1696546 C ./petsc_solver_test >>>>> 467MiB | >>>>> >>>>> | 0 N/A N/A 1696548 C ./petsc_solver_test >>>>> 467MiB | >>>>> >>>>> | 0 N/A N/A 1696550 C ./petsc_solver_test >>>>> 471MiB | >>>>> >>>>> | 0 N/A N/A 1696551 C ./petsc_solver_test >>>>> 467MiB | >>>>> >>>>> | 0 N/A N/A 1696552 C ./petsc_solver_test >>>>> 467MiB | >>>>> >>>>> | 1 N/A N/A 1630309 C >>>>> nvidia-cuda-mps-server 27MiB | >>>>> >>>>> | 1 N/A N/A 1696544 C ./petsc_solver_test >>>>> 35849MiB | >>>>> >>>>> | 2 N/A N/A 1630309 C >>>>> nvidia-cuda-mps-server 27MiB | >>>>> >>>>> | 2 N/A N/A 1696545 C ./petsc_solver_test >>>>> 36719MiB | >>>>> >>>>> | 3 N/A N/A 1630309 C >>>>> nvidia-cuda-mps-server 27MiB | >>>>> >>>>> | 3 N/A N/A 1696546 C ./petsc_solver_test >>>>> 37343MiB | >>>>> >>>>> | 4 N/A N/A 1630309 C >>>>> nvidia-cuda-mps-server 27MiB | >>>>> >>>>> | 4 N/A N/A 1696548 C ./petsc_solver_test >>>>> 36935MiB | >>>>> >>>>> | 5 N/A N/A 1630309 C >>>>> nvidia-cuda-mps-server 27MiB | >>>>> >>>>> | 5 N/A N/A 1696550 C ./petsc_solver_test >>>>> 49953MiB | >>>>> >>>>> | 6 N/A N/A 1630309 C >>>>> nvidia-cuda-mps-server 27MiB | >>>>> >>>>> | 6 N/A N/A 1696551 C ./petsc_solver_test >>>>> 47693MiB | >>>>> >>>>> | 7 N/A N/A 1630309 C >>>>> nvidia-cuda-mps-server 27MiB | >>>>> >>>>> | 7 N/A N/A 1696552 C ./petsc_solver_test >>>>> 77331MiB | >>>>> >>>>> >>>>> +-----------------------------------------------------------------------------+ >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Thu Jan 19 10:57:24 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Thu, 19 Jan 2023 11:57:24 -0500 Subject: [petsc-users] Interpreting Redistribution SF Message-ID: Hi Petsc Users I'm working with a distribution start forest generated by DMPlexDistribute and PetscSFBcast and Reduce to move data between the initial distribution and the distribution generated by DMPlex Distribute. I'm trying to debug some values that aren't being copied properly and wanted to verify I understand how a redistribution SF works compared with a SF that describes overlapped points. [0] 0 <- (0,7) point 0 on the distributed plex is point 7 on process 0 on the initial distribution [0] 1 <- (0,8) point 1 on the distributed plex is point 8 on process 0 on the initial distribution [0] 2 <- (0,9) [0] 3 <- (0,10) [0] 4 <- (0,11) [1] 0 <- (1,0) point 0 on the distributed plex is point 0 on process 1 on the initial distribution [1] 1 <- (1,1) [1] 2 <- (1,2) [1] 3 <- (0,0) point 3 on the distributed plex is point 0 on process 0 on the initial distribution [1] 4 <- (0,1) [1] 5 <- (0,2) my confusion I think is how does the distributionSF inform of what cells will be leafs on the distribution? Sincerely Nicholas -- Nicholas Arnold-Medabalimi Ph.D. 
Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Thu Jan 19 12:46:03 2023 From: danyang.su at gmail.com (Danyang Su) Date: Thu, 19 Jan 2023 10:46:03 -0800 Subject: [petsc-users] Cmake problem on an old cluster Message-ID: <0f4fb83b-cc97-2902-d0c1-7c7d05404a6c@gmail.com> Hi All, I am trying to install the latest PETSc on an old cluster but always get some error information at the step of cmake. The system installed cmake is V3.2.3, which is out-of-date for PETSc. I tried to use --download-cmake first, it does not work. Then I tried to clean everything (delete the petsc_arch folder), download the latest cmake myself and pass the path to the configuration, the error is still there. The compiler there is a bit old, intel-14.0.2 and openmpi-1.6.5. I have no problem to install PETSc-3.13.6 there. The latest version cannot pass configuration, unfortunately. Attached is the last configuration I have tried. --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --with-hdf5-fortran-bindings --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" Is there any solution for this. Thanks, Danyang -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 1176282 bytes Desc: not available URL: From Tim.Meehan at grayanalytics.com Thu Jan 19 12:50:00 2023 From: Tim.Meehan at grayanalytics.com (Tim Meehan) Date: Thu, 19 Jan 2023 18:50:00 +0000 Subject: [petsc-users] locally deploy PETSc Message-ID: Hi - I am trying to set up a local workstation for a few other developers who need PETSc installed from the latest release. I figured that it would be easiest for me to just clone the repository, as mentioned in the Quick Start. So, in /home/me/opt, I issued: git clone -b release https://gitlab.com/petsc/petsc.git petsc cd petsc ./configure make all check Things work fine, but I would like to install it in /opt/petsc, minus all of the build derbris Is there some way to have './configure' do this? (I was actually thinking that the configure script was from GNU autotools or something - but obviously not) Cheers, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Jan 19 12:56:18 2023 From: jed at jedbrown.org (Jed Brown) Date: Thu, 19 Jan 2023 11:56:18 -0700 Subject: [petsc-users] locally deploy PETSc In-Reply-To: References: Message-ID: <873586z73h.fsf@jedbrown.org> You're probably looking for ./configure --prefix=/opt/petsc. It's documented in ./configure --help. Tim Meehan writes: > Hi - I am trying to set up a local workstation for a few other developers who need PETSc installed from the latest release. I figured that it would be easiest for me to just clone the repository, as mentioned in the Quick Start. > > So, in /home/me/opt, I issued: > git clone -b release https://gitlab.com/petsc/petsc.git petsc > cd petsc > ./configure > make all check > > Things work fine, but I would like to install it in /opt/petsc, minus all of the build derbris > > Is there some way to have './configure' do this? 
> (I was actually thinking that the configure script was from GNU autotools or something - but obviously not) > > Cheers, > Tim From bsmith at petsc.dev Thu Jan 19 13:00:42 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 19 Jan 2023 14:00:42 -0500 Subject: [petsc-users] Cmake problem on an old cluster In-Reply-To: <0f4fb83b-cc97-2902-d0c1-7c7d05404a6c@gmail.com> References: <0f4fb83b-cc97-2902-d0c1-7c7d05404a6c@gmail.com> Message-ID: <25400088-E41A-4F0C-8D6D-1D43319ED090@petsc.dev> Remove --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz and install CMake yourself. Then configure PETSc with --with-cmake=directory you installed it in. Barry > On Jan 19, 2023, at 1:46 PM, Danyang Su wrote: > > Hi All, > > I am trying to install the latest PETSc on an old cluster but always get some error information at the step of cmake. The system installed cmake is V3.2.3, which is out-of-date for PETSc. I tried to use --download-cmake first, it does not work. Then I tried to clean everything (delete the petsc_arch folder), download the latest cmake myself and pass the path to the configuration, the error is still there. > > The compiler there is a bit old, intel-14.0.2 and openmpi-1.6.5. I have no problem to install PETSc-3.13.6 there. The latest version cannot pass configuration, unfortunately. Attached is the last configuration I have tried. > > --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --with-hdf5-fortran-bindings --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" > > Is there any solution for this. > > Thanks, > > Danyang > > > From Tim.Meehan at grayanalytics.com Thu Jan 19 13:17:12 2023 From: Tim.Meehan at grayanalytics.com (Tim Meehan) Date: Thu, 19 Jan 2023 19:17:12 +0000 Subject: [petsc-users] locally deploy PETSc In-Reply-To: <873586z73h.fsf@jedbrown.org> References: <873586z73h.fsf@jedbrown.org> Message-ID: Thanks Jed! I ran: make clean ./configure --prefix=/opt/petsc make all check sudo make install It then worked like you said, so thanks! -----Original Message----- From: Jed Brown Sent: Thursday, January 19, 2023 12:56 PM To: Tim Meehan ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] locally deploy PETSc Caution: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. You're probably looking for ./configure --prefix=/opt/petsc. It's documented in ./configure --help. Tim Meehan writes: > Hi - I am trying to set up a local workstation for a few other developers who need PETSc installed from the latest release. I figured that it would be easiest for me to just clone the repository, as mentioned in the Quick Start. 
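For the other developers on that workstation, using the prefix install afterwards is just a matter of pointing PETSC_DIR at it. A sketch, with a hypothetical application directory:

  export PETSC_DIR=/opt/petsc
  unset PETSC_ARCH        # a --prefix install has no PETSC_ARCH
  cd my_app               # hypothetical application directory
  make app                # with a makefile that includes ${PETSC_DIR}/lib/petsc/conf/rules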
> > So, in /home/me/opt, I issued: > git clone -b release > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitl > ab.com%2Fpetsc%2Fpetsc.git&data=05%7C01%7CTim.Meehan%40grayanalytics.c > om%7C02f3576f77744f7032e708dafa4edb0f%7C0932a7697e6f40a98297c05af696b8 > 56%7C0%7C0%7C638097513854068046%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj > AwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&s > data=kO3MPjPfZ%2B3xaDBts5wG6Rv1%2BlbxEihRmdSxb8MVYNI%3D&reserved=0 > petsc cd petsc ./configure make all check > > Things work fine, but I would like to install it in /opt/petsc, minus > all of the build derbris > > Is there some way to have './configure' do this? > (I was actually thinking that the configure script was from GNU > autotools or something - but obviously not) > > Cheers, > Tim From balay at mcs.anl.gov Thu Jan 19 13:18:16 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 19 Jan 2023 13:18:16 -0600 (CST) Subject: [petsc-users] Cmake problem on an old cluster In-Reply-To: <25400088-E41A-4F0C-8D6D-1D43319ED090@petsc.dev> References: <0f4fb83b-cc97-2902-d0c1-7c7d05404a6c@gmail.com> <25400088-E41A-4F0C-8D6D-1D43319ED090@petsc.dev> Message-ID: <40cad570-0e60-b742-961a-abb867656214@mcs.anl.gov> BTW: cmake is required by superlu-dist not petsc. And its possible that petsc might not build with this old version of openmpi - [and/or the externalpackages that you are installing - might not build with this old version of intel compilers]. Satish On Thu, 19 Jan 2023, Barry Smith wrote: > > Remove --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz and install CMake yourself. Then configure PETSc with --with-cmake=directory you installed it in. > > Barry > > > > On Jan 19, 2023, at 1:46 PM, Danyang Su wrote: > > > > Hi All, > > > > I am trying to install the latest PETSc on an old cluster but always get some error information at the step of cmake. The system installed cmake is V3.2.3, which is out-of-date for PETSc. I tried to use --download-cmake first, it does not work. Then I tried to clean everything (delete the petsc_arch folder), download the latest cmake myself and pass the path to the configuration, the error is still there. > > > > The compiler there is a bit old, intel-14.0.2 and openmpi-1.6.5. I have no problem to install PETSc-3.13.6 there. The latest version cannot pass configuration, unfortunately. Attached is the last configuration I have tried. > > > > --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --with-hdf5-fortran-bindings --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" > > > > Is there any solution for this. 
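A sketch of the route Barry describes above, with a hypothetical install location for the hand-built CMake:

  # build CMake once by hand
  cd cmake-3.25.1
  ./bootstrap --prefix=$HOME/soft/cmake-3.25.1
  make
  make install

  # then configure PETSc against it, dropping --download-cmake
  ./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 \
    --with-cmake=$HOME/soft/cmake-3.25.1 \
    (remaining --download-* and *OPTFLAGS options as in the original configure line)

Passing the install directory via --with-cmake avoids the failing --download-cmake step entirely.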
> > > > Thanks, > > > > Danyang > > > > > > > From danyang.su at gmail.com Thu Jan 19 17:34:50 2023 From: danyang.su at gmail.com (Danyang Su) Date: Thu, 19 Jan 2023 15:34:50 -0800 Subject: [petsc-users] Cmake problem on an old cluster In-Reply-To: <40cad570-0e60-b742-961a-abb867656214@mcs.anl.gov> References: <0f4fb83b-cc97-2902-d0c1-7c7d05404a6c@gmail.com> <25400088-E41A-4F0C-8D6D-1D43319ED090@petsc.dev> <40cad570-0e60-b742-961a-abb867656214@mcs.anl.gov> Message-ID: Hi Barry and Satish, I guess there is compatibility problem with some external package. The latest CMake complains about the compiler, so I remove superlu_dist option since I rarely use it. Then the HYPRE package shows "Error: Hypre requires C++ compiler. None specified", which is a bit tricky since c++ compiler is specified in the configuration so I comment the related error code in hypre.py during configuration. After doing this, there is no error during PETSc configuration but new error occurs during make process. **************************ERROR************************************* ? Error during compile, check intel-14.0.2-openmpi-1.6.5/lib/petsc/conf/make.log ? Send it and intel-14.0.2-openmpi-1.6.5/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** It might be not worth checking this problem since most of the users do not work on such old cluster. Both log files are attached in case any developer wants to check. Please let me know if there is any suggestions and I am willing to make a test. Thanks, Danyang On 2023-01-19 11:18 a.m., Satish Balay wrote: > BTW: cmake is required by superlu-dist not petsc. > > And its possible that petsc might not build with this old version of openmpi - [and/or the externalpackages that you are installing - might not build with this old version of intel compilers]. > > Satish > > On Thu, 19 Jan 2023, Barry Smith wrote: > >> Remove --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz and install CMake yourself. Then configure PETSc with --with-cmake=directory you installed it in. >> >> Barry >> >> >>> On Jan 19, 2023, at 1:46 PM, Danyang Su wrote: >>> >>> Hi All, >>> >>> I am trying to install the latest PETSc on an old cluster but always get some error information at the step of cmake. The system installed cmake is V3.2.3, which is out-of-date for PETSc. I tried to use --download-cmake first, it does not work. Then I tried to clean everything (delete the petsc_arch folder), download the latest cmake myself and pass the path to the configuration, the error is still there. >>> >>> The compiler there is a bit old, intel-14.0.2 and openmpi-1.6.5. I have no problem to install PETSc-3.13.6 there. The latest version cannot pass configuration, unfortunately. Attached is the last configuration I have tried. >>> >>> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --with-hdf5-fortran-bindings --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" >>> >>> Is there any solution for this. >>> >>> Thanks, >>> >>> Danyang >>> >>> >>> -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: text/x-log Size: 2052374 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: text/x-log Size: 14516 bytes Desc: not available URL: From balay at mcs.anl.gov Thu Jan 19 17:58:56 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 19 Jan 2023 17:58:56 -0600 (CST) Subject: [petsc-users] Cmake problem on an old cluster In-Reply-To: References: <0f4fb83b-cc97-2902-d0c1-7c7d05404a6c@gmail.com> <25400088-E41A-4F0C-8D6D-1D43319ED090@petsc.dev> <40cad570-0e60-b742-961a-abb867656214@mcs.anl.gov> Message-ID: <163c1626-9548-a15a-a7ca-05877407b83a@mcs.anl.gov> > /home/danyangs/soft/petsc/petsc-3.13.6/src/sys/makefile contains a directory not on the filesystem: ['\\'] Its strange that its complaining about petsc-3.13.6. Do you have this location set in your .bashrc or similar file - that's getting sourced during the build? Perhaps you could start with a fresh copy of petsc and retry? Also suggest using 'arch-' prefix for PETSC_ARCH i.e 'arch-intel-14.0.2-openmpi-1.6.5' - just in case there are some bugs lurking with skipping build files in this location Satish On Thu, 19 Jan 2023, Danyang Su wrote: > Hi Barry and Satish, > > I guess there is compatibility problem with some external package. The latest > CMake complains about the compiler, so I remove superlu_dist option since I > rarely use it. Then the HYPRE package shows "Error: Hypre requires C++ > compiler. None specified", which is a bit tricky since c++ compiler is > specified in the configuration so I comment the related error code in hypre.py > during configuration. After doing this, there is no error during PETSc > configuration but new error occurs during make process. > > **************************ERROR************************************* > ? Error during compile, check > intel-14.0.2-openmpi-1.6.5/lib/petsc/conf/make.log > ? Send it and intel-14.0.2-openmpi-1.6.5/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > > It might be not worth checking this problem since most of the users do not > work on such old cluster. Both log files are attached in case any developer > wants to check. Please let me know if there is any suggestions and I am > willing to make a test. > > Thanks, > > Danyang > > On 2023-01-19 11:18 a.m., Satish Balay wrote: > > BTW: cmake is required by superlu-dist not petsc. > > > > And its possible that petsc might not build with this old version of openmpi > > - [and/or the externalpackages that you are installing - might not build > > with this old version of intel compilers]. > > > > Satish > > > > On Thu, 19 Jan 2023, Barry Smith wrote: > > > >> Remove > >> --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz > >> and install CMake yourself. Then configure PETSc with > >> --with-cmake=directory you installed it in. > >> > >> Barry > >> > >> > >>> On Jan 19, 2023, at 1:46 PM, Danyang Su wrote: > >>> > >>> Hi All, > >>> > >>> I am trying to install the latest PETSc on an old cluster but always get > >>> some error information at the step of cmake. The system installed cmake is > >>> V3.2.3, which is out-of-date for PETSc. I tried to use --download-cmake > >>> first, it does not work. Then I tried to clean everything (delete the > >>> petsc_arch folder), download the latest cmake myself and pass the path to > >>> the configuration, the error is still there. 
> >>> > >>> The compiler there is a bit old, intel-14.0.2 and openmpi-1.6.5. I have no > >>> problem to install PETSc-3.13.6 there. The latest version cannot pass > >>> configuration, unfortunately. Attached is the last configuration I have > >>> tried. > >>> > >>> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > >>> --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz > >>> --download-mumps --download-scalapack --download-parmetis --download-metis > >>> --download-ptscotch --download-fblaslapack --download-hypre > >>> --download-superlu_dist --download-hdf5=yes --with-hdf5-fortran-bindings > >>> --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" > >>> CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native > >>> -mtune=native" > >>> > >>> Is there any solution for this. > >>> > >>> Thanks, > >>> > >>> Danyang > >>> > >>> > >>> > From danyang.su at gmail.com Thu Jan 19 18:25:26 2023 From: danyang.su at gmail.com (Danyang Su) Date: Thu, 19 Jan 2023 16:25:26 -0800 Subject: [petsc-users] Cmake problem on an old cluster In-Reply-To: <163c1626-9548-a15a-a7ca-05877407b83a@mcs.anl.gov> References: <0f4fb83b-cc97-2902-d0c1-7c7d05404a6c@gmail.com> <25400088-E41A-4F0C-8D6D-1D43319ED090@petsc.dev> <40cad570-0e60-b742-961a-abb867656214@mcs.anl.gov> <163c1626-9548-a15a-a7ca-05877407b83a@mcs.anl.gov> Message-ID: <58f0cd58-c357-b8c8-c1cb-684082c5aeba@gmail.com> Hi Satish, That's a bit strange since I have already use export PETSC_DIR=/home/danyangs/soft/petsc/petsc-3.18.3. Yes, I have petsc 3.13.6 installed and has PETSC_DIR set in the bashrc file. After changing PETSC_DIR in the bashrc file, PETSc can be compiled now. Thanks, Danyang On 2023-01-19 3:58 p.m., Satish Balay wrote: >> /home/danyangs/soft/petsc/petsc-3.13.6/src/sys/makefile contains a directory not on the filesystem: ['\\'] > > Its strange that its complaining about petsc-3.13.6. Do you have this location set in your .bashrc or similar file - that's getting sourced during the build? > > Perhaps you could start with a fresh copy of petsc and retry? > > Also suggest using 'arch-' prefix for PETSC_ARCH i.e 'arch-intel-14.0.2-openmpi-1.6.5' - just in case there are some bugs lurking with skipping build files in this location > > Satish > > > On Thu, 19 Jan 2023, Danyang Su wrote: > >> Hi Barry and Satish, >> >> I guess there is compatibility problem with some external package. The latest >> CMake complains about the compiler, so I remove superlu_dist option since I >> rarely use it. Then the HYPRE package shows "Error: Hypre requires C++ >> compiler. None specified", which is a bit tricky since c++ compiler is >> specified in the configuration so I comment the related error code in hypre.py >> during configuration. After doing this, there is no error during PETSc >> configuration but new error occurs during make process. >> >> **************************ERROR************************************* >> ? Error during compile, check >> intel-14.0.2-openmpi-1.6.5/lib/petsc/conf/make.log >> ? Send it and intel-14.0.2-openmpi-1.6.5/lib/petsc/conf/configure.log to >> petsc-maint at mcs.anl.gov >> ******************************************************************** >> >> It might be not worth checking this problem since most of the users do not >> work on such old cluster. Both log files are attached in case any developer >> wants to check. Please let me know if there is any suggestions and I am >> willing to make a test. 
>> >> Thanks, >> >> Danyang >> >> On 2023-01-19 11:18 a.m., Satish Balay wrote: >>> BTW: cmake is required by superlu-dist not petsc. >>> >>> And its possible that petsc might not build with this old version of openmpi >>> - [and/or the externalpackages that you are installing - might not build >>> with this old version of intel compilers]. >>> >>> Satish >>> >>> On Thu, 19 Jan 2023, Barry Smith wrote: >>> >>>> Remove >>>> --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz >>>> and install CMake yourself. Then configure PETSc with >>>> --with-cmake=directory you installed it in. >>>> >>>> Barry >>>> >>>> >>>>> On Jan 19, 2023, at 1:46 PM, Danyang Su wrote: >>>>> >>>>> Hi All, >>>>> >>>>> I am trying to install the latest PETSc on an old cluster but always get >>>>> some error information at the step of cmake. The system installed cmake is >>>>> V3.2.3, which is out-of-date for PETSc. I tried to use --download-cmake >>>>> first, it does not work. Then I tried to clean everything (delete the >>>>> petsc_arch folder), download the latest cmake myself and pass the path to >>>>> the configuration, the error is still there. >>>>> >>>>> The compiler there is a bit old, intel-14.0.2 and openmpi-1.6.5. I have no >>>>> problem to install PETSc-3.13.6 there. The latest version cannot pass >>>>> configuration, unfortunately. Attached is the last configuration I have >>>>> tried. >>>>> >>>>> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 >>>>> --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz >>>>> --download-mumps --download-scalapack --download-parmetis --download-metis >>>>> --download-ptscotch --download-fblaslapack --download-hypre >>>>> --download-superlu_dist --download-hdf5=yes --with-hdf5-fortran-bindings >>>>> --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" >>>>> CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native >>>>> -mtune=native" >>>>> >>>>> Is there any solution for this. >>>>> >>>>> Thanks, >>>>> >>>>> Danyang >>>>> >>>>> >>>>> From balay at mcs.anl.gov Thu Jan 19 18:52:35 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 19 Jan 2023 18:52:35 -0600 (CST) Subject: [petsc-users] Cmake problem on an old cluster In-Reply-To: <58f0cd58-c357-b8c8-c1cb-684082c5aeba@gmail.com> References: <0f4fb83b-cc97-2902-d0c1-7c7d05404a6c@gmail.com> <25400088-E41A-4F0C-8D6D-1D43319ED090@petsc.dev> <40cad570-0e60-b742-961a-abb867656214@mcs.anl.gov> <163c1626-9548-a15a-a7ca-05877407b83a@mcs.anl.gov> <58f0cd58-c357-b8c8-c1cb-684082c5aeba@gmail.com> Message-ID: <58cca97d-ea6c-3751-d04b-5ffe65f4a186@mcs.anl.gov> Looks like .bashrc is getting sourced again during the build process [as make creates new bash shell during the build] - thus overriding the env variable that's set. Glad you have a working build now. Thanks for the update! BTW: superlu-dist requires cmake 3.18.1 or higher. You could check if this older version of cmake builds on this cluster [if you want to give superlu-dist a try again] Satish On Thu, 19 Jan 2023, Danyang Su wrote: > Hi Satish, > > That's a bit strange since I have already use export > PETSC_DIR=/home/danyangs/soft/petsc/petsc-3.18.3. > > Yes, I have petsc 3.13.6 installed and has PETSC_DIR set in the bashrc file. > After changing PETSC_DIR in the bashrc file, PETSc can be compiled now. 
> > Thanks, > > Danyang > > On 2023-01-19 3:58 p.m., Satish Balay wrote: > >> /home/danyangs/soft/petsc/petsc-3.13.6/src/sys/makefile contains a > >> directory not on the filesystem: ['\\'] > > > > Its strange that its complaining about petsc-3.13.6. Do you have this > > location set in your .bashrc or similar file - that's getting sourced during > > the build? > > > > Perhaps you could start with a fresh copy of petsc and retry? > > > > Also suggest using 'arch-' prefix for PETSC_ARCH i.e > > 'arch-intel-14.0.2-openmpi-1.6.5' - just in case there are some bugs lurking > > with skipping build files in this location > > > > Satish > > > > > > On Thu, 19 Jan 2023, Danyang Su wrote: > > > >> Hi Barry and Satish, > >> > >> I guess there is compatibility problem with some external package. The > >> latest > >> CMake complains about the compiler, so I remove superlu_dist option since I > >> rarely use it. Then the HYPRE package shows "Error: Hypre requires C++ > >> compiler. None specified", which is a bit tricky since c++ compiler is > >> specified in the configuration so I comment the related error code in > >> hypre.py > >> during configuration. After doing this, there is no error during PETSc > >> configuration but new error occurs during make process. > >> > >> **************************ERROR************************************* > >> ? Error during compile, check > >> intel-14.0.2-openmpi-1.6.5/lib/petsc/conf/make.log > >> ? Send it and intel-14.0.2-openmpi-1.6.5/lib/petsc/conf/configure.log to > >> petsc-maint at mcs.anl.gov > >> ******************************************************************** > >> > >> It might be not worth checking this problem since most of the users do not > >> work on such old cluster. Both log files are attached in case any developer > >> wants to check. Please let me know if there is any suggestions and I am > >> willing to make a test. > >> > >> Thanks, > >> > >> Danyang > >> > >> On 2023-01-19 11:18 a.m., Satish Balay wrote: > >>> BTW: cmake is required by superlu-dist not petsc. > >>> > >>> And its possible that petsc might not build with this old version of > >>> openmpi > >>> - [and/or the externalpackages that you are installing - might not build > >>> with this old version of intel compilers]. > >>> > >>> Satish > >>> > >>> On Thu, 19 Jan 2023, Barry Smith wrote: > >>> > >>>> Remove > >>>> --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz > >>>> and install CMake yourself. Then configure PETSc with > >>>> --with-cmake=directory you installed it in. > >>>> > >>>> Barry > >>>> > >>>> > >>>>> On Jan 19, 2023, at 1:46 PM, Danyang Su wrote: > >>>>> > >>>>> Hi All, > >>>>> > >>>>> I am trying to install the latest PETSc on an old cluster but always get > >>>>> some error information at the step of cmake. The system installed cmake > >>>>> is > >>>>> V3.2.3, which is out-of-date for PETSc. I tried to use --download-cmake > >>>>> first, it does not work. Then I tried to clean everything (delete the > >>>>> petsc_arch folder), download the latest cmake myself and pass the path > >>>>> to > >>>>> the configuration, the error is still there. > >>>>> > >>>>> The compiler there is a bit old, intel-14.0.2 and openmpi-1.6.5. I have > >>>>> no > >>>>> problem to install PETSc-3.13.6 there. The latest version cannot pass > >>>>> configuration, unfortunately. Attached is the last configuration I have > >>>>> tried. 
> >>>>> > >>>>> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > >>>>> --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz > >>>>> --download-mumps --download-scalapack --download-parmetis > >>>>> --download-metis > >>>>> --download-ptscotch --download-fblaslapack --download-hypre > >>>>> --download-superlu_dist --download-hdf5=yes --with-hdf5-fortran-bindings > >>>>> --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" > >>>>> CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 > >>>>> -march=native > >>>>> -mtune=native" > >>>>> > >>>>> Is there any solution for this. > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Danyang > >>>>> > >>>>> > >>>>> > From knepley at gmail.com Thu Jan 19 19:28:17 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 19 Jan 2023 20:28:17 -0500 Subject: [petsc-users] Interpreting Redistribution SF In-Reply-To: References: Message-ID: On Thu, Jan 19, 2023 at 11:58 AM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Petsc Users > > I'm working with a distribution start forest generated by > DMPlexDistribute and PetscSFBcast and Reduce to move data between the > initial distribution and the distribution generated by DMPlex Distribute. > > I'm trying to debug some values that aren't being copied properly and > wanted to verify I understand how a redistribution SF works compared with a > SF that describes overlapped points. > > [0] 0 <- (0,7) point 0 on the distributed plex is point 7 on process > 0 on the initial distribution > [0] 1 <- (0,8) point 1 on the distributed plex is point 8 on process > 0 on the initial distribution > [0] 2 <- (0,9) > [0] 3 <- (0,10) > [0] 4 <- (0,11) > > [1] 0 <- (1,0) point 0 on the distributed plex is point 0 on process > 1 on the initial distribution > [1] 1 <- (1,1) > [1] 2 <- (1,2) > [1] 3 <- (0,0) point 3 on the distributed plex is point 0 on process > 0 on the initial distribution > [1] 4 <- (0,1) > [1] 5 <- (0,2) > > my confusion I think is how does the distributionSF inform of what cells > will be leafs on the distribution? > I should eventually write something to clarify this. I am using SF in (at least) two different ways. First, there is a familiar SF that we use for dealing with "ghost" points. These are replicated points where one process is said to "own" the point and another process is said to hold a "ghost". The ghost points are leaves in the SF which point back to the root point owned by another process. We call this the pointSF for a DM. Second, we have a migration SF. Here the root points give the original point distribution. The leaf points give the new point distribution. Thus a PetscSFBcast() pushes points from the original to new distribution, which is what we mean by a migration. Third, instead of point values, we might want to communicate fields over those points. For this we make new SFes, where the numbering does not refer to points, but rather to dofs. Does this make sense? Thanks, Matt > Sincerely > Nicholas > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From narnoldm at umich.edu Thu Jan 19 20:12:48 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Thu, 19 Jan 2023 21:12:48 -0500 Subject: [petsc-users] Interpreting Redistribution SF In-Reply-To: References: Message-ID: Hi Matt Yep, that makes sense and is consistent. My question is a little more specific. So let's say I take an initial mesh and distribute it and get the distribution SF with an overlap of one. Consider a cell that is a root on process 0 and a leaf on process 1 after the distribution. Will the distribution pointSF have an entry for the cell that is a leaf in the ghost cell sense? I guess, in short does the distribution SF only have entries for the movement of points that are roots in the ghost SF? Sorry if this is a little unclear. Maybe my usage will be a bit clearer. I am generating a distributionSF (type 2 in your desc) then using that to generate a dof distribution(type 3) using the section information. I then pass the information from the initial distribution to new distribution with PetscSFBcast with MPI_REPLACE. That scatters the vector to the new distribution. I then do "stuff" and now want to redistribute back. So I pass the same dof distributionSF but call PetscSFReduce with MPI_REPLACE. My concern is I am only setting the root cell values on each partition. So if the ghost cells are part of the distribution SF there will be multiple cells reducing to the original distribution cell? Thanks Nicholas On Thu, Jan 19, 2023 at 8:28 PM Matthew Knepley wrote: > On Thu, Jan 19, 2023 at 11:58 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Petsc Users >> >> I'm working with a distribution start forest generated by >> DMPlexDistribute and PetscSFBcast and Reduce to move data between the >> initial distribution and the distribution generated by DMPlex Distribute. >> >> I'm trying to debug some values that aren't being copied properly and >> wanted to verify I understand how a redistribution SF works compared with a >> SF that describes overlapped points. >> >> [0] 0 <- (0,7) point 0 on the distributed plex is point 7 on >> process 0 on the initial distribution >> [0] 1 <- (0,8) point 1 on the distributed plex is point 8 on >> process 0 on the initial distribution >> [0] 2 <- (0,9) >> [0] 3 <- (0,10) >> [0] 4 <- (0,11) >> >> [1] 0 <- (1,0) point 0 on the distributed plex is point 0 on >> process 1 on the initial distribution >> [1] 1 <- (1,1) >> [1] 2 <- (1,2) >> [1] 3 <- (0,0) point 3 on the distributed plex is point 0 on >> process 0 on the initial distribution >> [1] 4 <- (0,1) >> [1] 5 <- (0,2) >> >> my confusion I think is how does the distributionSF inform of what cells >> will be leafs on the distribution? >> > > I should eventually write something to clarify this. I am using SF in (at > least) two different ways. > > First, there is a familiar SF that we use for dealing with "ghost" points. > These are replicated points where one process > is said to "own" the point and another process is said to hold a "ghost". > The ghost points are leaves in the SF which > point back to the root point owned by another process. We call this the > pointSF for a DM. > > Second, we have a migration SF. Here the root points give the original > point distribution. The leaf points give the new > point distribution. Thus a PetscSFBcast() pushes points from the original > to new distribution, which is what we mean > by a migration. > > Third, instead of point values, we might want to communicate fields over > those points. 
For this we make new SFes, > where the numbering does not refer to points, but rather to dofs. > > Does this make sense? > > Thanks, > > Matt > > >> Sincerely >> Nicholas >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 19 20:28:01 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 19 Jan 2023 21:28:01 -0500 Subject: [petsc-users] Interpreting Redistribution SF In-Reply-To: References: Message-ID: On Thu, Jan 19, 2023 at 9:13 PM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Matt > > Yep, that makes sense and is consistent. > > My question is a little more specific. So let's say I take an initial mesh > and distribute it and get the distribution SF with an overlap of one. > Consider a cell that is a root on process 0 and a leaf on process 1 after > the distribution. > > Will the distribution pointSF have an entry for the cell that is a leaf in > the ghost cell sense? > > I guess, in short does the distribution SF only have entries for the > movement of points that are roots in the ghost SF? > I do not understand the question. Suppose that a certain cell, say 0, in the original distribution goes to two different processes, say 0 and 1, and will happen when you distribute with overlap. Then the migration SF has two leaf entries for that cell, one from process 0 and one from process 1. They both point to root cell 0 on process 0. > Sorry if this is a little unclear. > > Maybe my usage will be a bit clearer. I am generating a distributionSF > (type 2 in your desc) then using that to generate a dof distribution(type > 3) using the section information. I then pass the information from the > initial distribution to new distribution with PetscSFBcast with > MPI_REPLACE. That scatters the vector to the new distribution. I then do > "stuff" and now want to redistribute back. So I pass the same dof > distributionSF but call PetscSFReduce with MPI_REPLACE. My concern is I am > only setting the root cell values on each partition. So if the ghost cells > are part of the distribution SF there will be multiple cells reducing to > the original distribution cell? > Yes, definitely. Thanks, Matt > Thanks > Nicholas > > > On Thu, Jan 19, 2023 at 8:28 PM Matthew Knepley wrote: > >> On Thu, Jan 19, 2023 at 11:58 AM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Hi Petsc Users >>> >>> I'm working with a distribution start forest generated by >>> DMPlexDistribute and PetscSFBcast and Reduce to move data between the >>> initial distribution and the distribution generated by DMPlex Distribute. >>> >>> I'm trying to debug some values that aren't being copied properly and >>> wanted to verify I understand how a redistribution SF works compared with a >>> SF that describes overlapped points. 
>>> >>> [0] 0 <- (0,7) point 0 on the distributed plex is point 7 on >>> process 0 on the initial distribution >>> [0] 1 <- (0,8) point 1 on the distributed plex is point 8 on >>> process 0 on the initial distribution >>> [0] 2 <- (0,9) >>> [0] 3 <- (0,10) >>> [0] 4 <- (0,11) >>> >>> [1] 0 <- (1,0) point 0 on the distributed plex is point 0 on >>> process 1 on the initial distribution >>> [1] 1 <- (1,1) >>> [1] 2 <- (1,2) >>> [1] 3 <- (0,0) point 3 on the distributed plex is point 0 on >>> process 0 on the initial distribution >>> [1] 4 <- (0,1) >>> [1] 5 <- (0,2) >>> >>> my confusion I think is how does the distributionSF inform of what >>> cells will be leafs on the distribution? >>> >> >> I should eventually write something to clarify this. I am using SF in (at >> least) two different ways. >> >> First, there is a familiar SF that we use for dealing with "ghost" >> points. These are replicated points where one process >> is said to "own" the point and another process is said to hold a "ghost". >> The ghost points are leaves in the SF which >> point back to the root point owned by another process. We call this the >> pointSF for a DM. >> >> Second, we have a migration SF. Here the root points give the original >> point distribution. The leaf points give the new >> point distribution. Thus a PetscSFBcast() pushes points from the original >> to new distribution, which is what we mean >> by a migration. >> >> Third, instead of point values, we might want to communicate fields over >> those points. For this we make new SFes, >> where the numbering does not refer to points, but rather to dofs. >> >> Does this make sense? >> >> Thanks, >> >> Matt >> >> >>> Sincerely >>> Nicholas >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Thu Jan 19 21:12:19 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Thu, 19 Jan 2023 22:12:19 -0500 Subject: [petsc-users] Interpreting Redistribution SF In-Reply-To: References: Message-ID: Ok thanks for the clarification. In theory, if before the Reduction back to the original distribution, if I call DMGlobaltoLocal then even with MPI_REPLACE all the leafs corresponding to the original root should have the same value so I won't have an ambiguity, correct? On Thu, Jan 19, 2023 at 9:28 PM Matthew Knepley wrote: > On Thu, Jan 19, 2023 at 9:13 PM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Matt >> >> Yep, that makes sense and is consistent. >> >> My question is a little more specific. So let's say I take an >> initial mesh and distribute it and get the distribution SF with an overlap >> of one. Consider a cell that is a root on process 0 and a leaf on process 1 >> after the distribution. 
>> >> Will the distribution pointSF have an entry for the cell that is a leaf >> in the ghost cell sense? >> >> I guess, in short does the distribution SF only have entries for the >> movement of points that are roots in the ghost SF? >> > > I do not understand the question. Suppose that a certain cell, say 0, in > the original distribution goes to two different processes, say 0 and 1, and > will happen when you distribute with overlap. Then the migration SF has two > leaf entries for that cell, one from process 0 and one from process 1. They > both point to root cell 0 on process 0. > > >> Sorry if this is a little unclear. >> >> Maybe my usage will be a bit clearer. I am generating a distributionSF >> (type 2 in your desc) then using that to generate a dof distribution(type >> 3) using the section information. I then pass the information from the >> initial distribution to new distribution with PetscSFBcast with >> MPI_REPLACE. That scatters the vector to the new distribution. I then do >> "stuff" and now want to redistribute back. So I pass the same dof >> distributionSF but call PetscSFReduce with MPI_REPLACE. My concern is I am >> only setting the root cell values on each partition. So if the ghost cells >> are part of the distribution SF there will be multiple cells reducing to >> the original distribution cell? >> > > Yes, definitely. > > Thanks, > > Matt > > >> Thanks >> Nicholas >> >> >> On Thu, Jan 19, 2023 at 8:28 PM Matthew Knepley >> wrote: >> >>> On Thu, Jan 19, 2023 at 11:58 AM Nicholas Arnold-Medabalimi < >>> narnoldm at umich.edu> wrote: >>> >>>> Hi Petsc Users >>>> >>>> I'm working with a distribution start forest generated by >>>> DMPlexDistribute and PetscSFBcast and Reduce to move data between the >>>> initial distribution and the distribution generated by DMPlex Distribute. >>>> >>>> I'm trying to debug some values that aren't being copied properly and >>>> wanted to verify I understand how a redistribution SF works compared with a >>>> SF that describes overlapped points. >>>> >>>> [0] 0 <- (0,7) point 0 on the distributed plex is point 7 on >>>> process 0 on the initial distribution >>>> [0] 1 <- (0,8) point 1 on the distributed plex is point 8 on >>>> process 0 on the initial distribution >>>> [0] 2 <- (0,9) >>>> [0] 3 <- (0,10) >>>> [0] 4 <- (0,11) >>>> >>>> [1] 0 <- (1,0) point 0 on the distributed plex is point 0 on >>>> process 1 on the initial distribution >>>> [1] 1 <- (1,1) >>>> [1] 2 <- (1,2) >>>> [1] 3 <- (0,0) point 3 on the distributed plex is point 0 on >>>> process 0 on the initial distribution >>>> [1] 4 <- (0,1) >>>> [1] 5 <- (0,2) >>>> >>>> my confusion I think is how does the distributionSF inform of what >>>> cells will be leafs on the distribution? >>>> >>> >>> I should eventually write something to clarify this. I am using SF in >>> (at least) two different ways. >>> >>> First, there is a familiar SF that we use for dealing with "ghost" >>> points. These are replicated points where one process >>> is said to "own" the point and another process is said to hold a >>> "ghost". The ghost points are leaves in the SF which >>> point back to the root point owned by another process. We call this the >>> pointSF for a DM. >>> >>> Second, we have a migration SF. Here the root points give the original >>> point distribution. The leaf points give the new >>> point distribution. Thus a PetscSFBcast() pushes points from the >>> original to new distribution, which is what we mean >>> by a migration. 
>>> >>> Third, instead of point values, we might want to communicate fields over >>> those points. For this we make new SFes, >>> where the numbering does not refer to points, but rather to dofs. >>> >>> Does this make sense? >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Sincerely >>>> Nicholas >>>> >>>> -- >>>> Nicholas Arnold-Medabalimi >>>> >>>> Ph.D. Candidate >>>> Computational Aeroscience Lab >>>> University of Michigan >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Thu Jan 19 23:38:34 2023 From: danyang.su at gmail.com (Danyang Su) Date: Thu, 19 Jan 2023 21:38:34 -0800 Subject: [petsc-users] Cmake problem on an old cluster In-Reply-To: <58cca97d-ea6c-3751-d04b-5ffe65f4a186@mcs.anl.gov> References: <0f4fb83b-cc97-2902-d0c1-7c7d05404a6c@gmail.com> <25400088-E41A-4F0C-8D6D-1D43319ED090@petsc.dev> <40cad570-0e60-b742-961a-abb867656214@mcs.anl.gov> <163c1626-9548-a15a-a7ca-05877407b83a@mcs.anl.gov> <58f0cd58-c357-b8c8-c1cb-684082c5aeba@gmail.com> <58cca97d-ea6c-3751-d04b-5ffe65f4a186@mcs.anl.gov> Message-ID: <737C8A4B-E77E-40A4-B40B-238DBE23F49C@gmail.com> Hi Satish, For some unknown reason during Cmake 3.18.5 installation, I get error "Cannot find a C++ compiler that supports both C++11 and the specified C++ flags.". The system installed Cmake 3.2.3 is way too old. I will just leave it as is since superlu_dist is optional in my model. Thanks for your suggestions to make it work, Danyang ?On 2023-01-19, 4:52 PM, "Satish Balay" > wrote: Looks like .bashrc is getting sourced again during the build process [as make creates new bash shell during the build] - thus overriding the env variable that's set. Glad you have a working build now. Thanks for the update! BTW: superlu-dist requires cmake 3.18.1 or higher. You could check if this older version of cmake builds on this cluster [if you want to give superlu-dist a try again] Satish On Thu, 19 Jan 2023, Danyang Su wrote: > Hi Satish, > > That's a bit strange since I have already use export > PETSC_DIR=/home/danyangs/soft/petsc/petsc-3.18.3. > > Yes, I have petsc 3.13.6 installed and has PETSC_DIR set in the bashrc file. > After changing PETSC_DIR in the bashrc file, PETSc can be compiled now. > > Thanks, > > Danyang > > On 2023-01-19 3:58 p.m., Satish Balay wrote: > >> /home/danyangs/soft/petsc/petsc-3.13.6/src/sys/makefile contains a > >> directory not on the filesystem: ['\\'] > > > > Its strange that its complaining about petsc-3.13.6. Do you have this > > location set in your .bashrc or similar file - that's getting sourced during > > the build? > > > > Perhaps you could start with a fresh copy of petsc and retry? 
> > > > Also suggest using 'arch-' prefix for PETSC_ARCH i.e > > 'arch-intel-14.0.2-openmpi-1.6.5' - just in case there are some bugs lurking > > with skipping build files in this location > > > > Satish > > > > > > On Thu, 19 Jan 2023, Danyang Su wrote: > > > >> Hi Barry and Satish, > >> > >> I guess there is compatibility problem with some external package. The > >> latest > >> CMake complains about the compiler, so I remove superlu_dist option since I > >> rarely use it. Then the HYPRE package shows "Error: Hypre requires C++ > >> compiler. None specified", which is a bit tricky since c++ compiler is > >> specified in the configuration so I comment the related error code in > >> hypre.py > >> during configuration. After doing this, there is no error during PETSc > >> configuration but new error occurs during make process. > >> > >> **************************ERROR************************************* > >> Error during compile, check > >> intel-14.0.2-openmpi-1.6.5/lib/petsc/conf/make.log > >> Send it and intel-14.0.2-openmpi-1.6.5/lib/petsc/conf/configure.log to > >> petsc-maint at mcs.anl.gov > >> ******************************************************************** > >> > >> It might be not worth checking this problem since most of the users do not > >> work on such old cluster. Both log files are attached in case any developer > >> wants to check. Please let me know if there is any suggestions and I am > >> willing to make a test. > >> > >> Thanks, > >> > >> Danyang > >> > >> On 2023-01-19 11:18 a.m., Satish Balay wrote: > >>> BTW: cmake is required by superlu-dist not petsc. > >>> > >>> And its possible that petsc might not build with this old version of > >>> openmpi > >>> - [and/or the externalpackages that you are installing - might not build > >>> with this old version of intel compilers]. > >>> > >>> Satish > >>> > >>> On Thu, 19 Jan 2023, Barry Smith wrote: > >>> > >>>> Remove > >>>> --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz > >>>> and install CMake yourself. Then configure PETSc with > >>>> --with-cmake=directory you installed it in. > >>>> > >>>> Barry > >>>> > >>>> > >>>>> On Jan 19, 2023, at 1:46 PM, Danyang Su > wrote: > >>>>> > >>>>> Hi All, > >>>>> > >>>>> I am trying to install the latest PETSc on an old cluster but always get > >>>>> some error information at the step of cmake. The system installed cmake > >>>>> is > >>>>> V3.2.3, which is out-of-date for PETSc. I tried to use --download-cmake > >>>>> first, it does not work. Then I tried to clean everything (delete the > >>>>> petsc_arch folder), download the latest cmake myself and pass the path > >>>>> to > >>>>> the configuration, the error is still there. > >>>>> > >>>>> The compiler there is a bit old, intel-14.0.2 and openmpi-1.6.5. I have > >>>>> no > >>>>> problem to install PETSc-3.13.6 there. The latest version cannot pass > >>>>> configuration, unfortunately. Attached is the last configuration I have > >>>>> tried. 
> >>>>> > >>>>> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > >>>>> --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz > >>>>> --download-mumps --download-scalapack --download-parmetis > >>>>> --download-metis > >>>>> --download-ptscotch --download-fblaslapack --download-hypre > >>>>> --download-superlu_dist --download-hdf5=yes --with-hdf5-fortran-bindings > >>>>> --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" > >>>>> CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 > >>>>> -march=native > >>>>> -mtune=native" > >>>>> > >>>>> Is there any solution for this. > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Danyang > >>>>> > >>>>> > >>>>> > From knepley at gmail.com Fri Jan 20 07:54:05 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 20 Jan 2023 08:54:05 -0500 Subject: [petsc-users] Interpreting Redistribution SF In-Reply-To: References: Message-ID: On Thu, Jan 19, 2023 at 10:12 PM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Ok thanks for the clarification. In theory, if before the Reduction back > to the original distribution, if I call DMGlobaltoLocal then even with > MPI_REPLACE all the leafs corresponding to the original root should have > the same value so I won't have an ambiguity, correct? > That is right, so it should give you the result you expect. Thanks, Matt > On Thu, Jan 19, 2023 at 9:28 PM Matthew Knepley wrote: > >> On Thu, Jan 19, 2023 at 9:13 PM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Hi Matt >>> >>> Yep, that makes sense and is consistent. >>> >>> My question is a little more specific. So let's say I take an >>> initial mesh and distribute it and get the distribution SF with an overlap >>> of one. Consider a cell that is a root on process 0 and a leaf on process 1 >>> after the distribution. >>> >>> Will the distribution pointSF have an entry for the cell that is a leaf >>> in the ghost cell sense? >>> >>> I guess, in short does the distribution SF only have entries for the >>> movement of points that are roots in the ghost SF? >>> >> >> I do not understand the question. Suppose that a certain cell, say 0, in >> the original distribution goes to two different processes, say 0 and 1, and >> will happen when you distribute with overlap. Then the migration SF has two >> leaf entries for that cell, one from process 0 and one from process 1. They >> both point to root cell 0 on process 0. >> >> >>> Sorry if this is a little unclear. >>> >>> Maybe my usage will be a bit clearer. I am generating a distributionSF >>> (type 2 in your desc) then using that to generate a dof distribution(type >>> 3) using the section information. I then pass the information from the >>> initial distribution to new distribution with PetscSFBcast with >>> MPI_REPLACE. That scatters the vector to the new distribution. I then do >>> "stuff" and now want to redistribute back. So I pass the same dof >>> distributionSF but call PetscSFReduce with MPI_REPLACE. My concern is I am >>> only setting the root cell values on each partition. So if the ghost cells >>> are part of the distribution SF there will be multiple cells reducing to >>> the original distribution cell? >>> >> >> Yes, definitely. 
>> >> Thanks, >> >> Matt >> >> >>> Thanks >>> Nicholas >>> >>> >>> On Thu, Jan 19, 2023 at 8:28 PM Matthew Knepley >>> wrote: >>> >>>> On Thu, Jan 19, 2023 at 11:58 AM Nicholas Arnold-Medabalimi < >>>> narnoldm at umich.edu> wrote: >>>> >>>>> Hi Petsc Users >>>>> >>>>> I'm working with a distribution start forest generated by >>>>> DMPlexDistribute and PetscSFBcast and Reduce to move data between the >>>>> initial distribution and the distribution generated by DMPlex Distribute. >>>>> >>>>> I'm trying to debug some values that aren't being copied properly and >>>>> wanted to verify I understand how a redistribution SF works compared with a >>>>> SF that describes overlapped points. >>>>> >>>>> [0] 0 <- (0,7) point 0 on the distributed plex is point 7 on >>>>> process 0 on the initial distribution >>>>> [0] 1 <- (0,8) point 1 on the distributed plex is point 8 on >>>>> process 0 on the initial distribution >>>>> [0] 2 <- (0,9) >>>>> [0] 3 <- (0,10) >>>>> [0] 4 <- (0,11) >>>>> >>>>> [1] 0 <- (1,0) point 0 on the distributed plex is point 0 on >>>>> process 1 on the initial distribution >>>>> [1] 1 <- (1,1) >>>>> [1] 2 <- (1,2) >>>>> [1] 3 <- (0,0) point 3 on the distributed plex is point 0 on >>>>> process 0 on the initial distribution >>>>> [1] 4 <- (0,1) >>>>> [1] 5 <- (0,2) >>>>> >>>>> my confusion I think is how does the distributionSF inform of what >>>>> cells will be leafs on the distribution? >>>>> >>>> >>>> I should eventually write something to clarify this. I am using SF in >>>> (at least) two different ways. >>>> >>>> First, there is a familiar SF that we use for dealing with "ghost" >>>> points. These are replicated points where one process >>>> is said to "own" the point and another process is said to hold a >>>> "ghost". The ghost points are leaves in the SF which >>>> point back to the root point owned by another process. We call this the >>>> pointSF for a DM. >>>> >>>> Second, we have a migration SF. Here the root points give the original >>>> point distribution. The leaf points give the new >>>> point distribution. Thus a PetscSFBcast() pushes points from the >>>> original to new distribution, which is what we mean >>>> by a migration. >>>> >>>> Third, instead of point values, we might want to communicate fields >>>> over those points. For this we make new SFes, >>>> where the numbering does not refer to points, but rather to dofs. >>>> >>>> Does this make sense? >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Sincerely >>>>> Nicholas >>>>> >>>>> -- >>>>> Nicholas Arnold-Medabalimi >>>>> >>>>> Ph.D. Candidate >>>>> Computational Aeroscience Lab >>>>> University of Michigan >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Nicholas Arnold-Medabalimi > > Ph.D. 
Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Jan 20 08:10:06 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 20 Jan 2023 08:10:06 -0600 (CST) Subject: [petsc-users] Cmake problem on an old cluster In-Reply-To: <737C8A4B-E77E-40A4-B40B-238DBE23F49C@gmail.com> References: <0f4fb83b-cc97-2902-d0c1-7c7d05404a6c@gmail.com> <25400088-E41A-4F0C-8D6D-1D43319ED090@petsc.dev> <40cad570-0e60-b742-961a-abb867656214@mcs.anl.gov> <163c1626-9548-a15a-a7ca-05877407b83a@mcs.anl.gov> <58f0cd58-c357-b8c8-c1cb-684082c5aeba@gmail.com> <58cca97d-ea6c-3751-d04b-5ffe65f4a186@mcs.anl.gov> <737C8A4B-E77E-40A4-B40B-238DBE23F49C@gmail.com> Message-ID: > [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] I guess this g++ version is a bit too old, and you might need a newer version of gcc. [if one is not already installed on this cluster] One way to install a newer gcc - say gcc-7 is via spack: git clone https://github.com/spack/spack/ cd spack ./bin/spack install gcc at 7.5.0 Satish On Thu, 19 Jan 2023, Danyang Su wrote: > Hi Satish, > > For some unknown reason during Cmake 3.18.5 installation, I get error "Cannot find a C++ compiler that supports both C++11 and the specified C++ flags.". The system installed Cmake 3.2.3 is way too old. > > I will just leave it as is since superlu_dist is optional in my model. > > Thanks for your suggestions to make it work, > > Danyang > > ?On 2023-01-19, 4:52 PM, "Satish Balay" > wrote: > > > Looks like .bashrc is getting sourced again during the build process [as make creates new bash shell during the build] - thus overriding the env variable that's set. > > > Glad you have a working build now. Thanks for the update! > > > BTW: superlu-dist requires cmake 3.18.1 or higher. You could check if this older version of cmake builds on this cluster [if you want to give superlu-dist a try again] > > > Satish > > > > > On Thu, 19 Jan 2023, Danyang Su wrote: > > > > Hi Satish, > > > > That's a bit strange since I have already use export > > PETSC_DIR=/home/danyangs/soft/petsc/petsc-3.18.3. > > > > Yes, I have petsc 3.13.6 installed and has PETSC_DIR set in the bashrc file. > > After changing PETSC_DIR in the bashrc file, PETSc can be compiled now. > > > > Thanks, > > > > Danyang > > > > On 2023-01-19 3:58 p.m., Satish Balay wrote: > > >> /home/danyangs/soft/petsc/petsc-3.13.6/src/sys/makefile contains a > > >> directory not on the filesystem: ['\\'] > > > > > > Its strange that its complaining about petsc-3.13.6. Do you have this > > > location set in your .bashrc or similar file - that's getting sourced during > > > the build? > > > > > > Perhaps you could start with a fresh copy of petsc and retry? > > > > > > Also suggest using 'arch-' prefix for PETSC_ARCH i.e > > > 'arch-intel-14.0.2-openmpi-1.6.5' - just in case there are some bugs lurking > > > with skipping build files in this location > > > > > > Satish > > > > > > > > > On Thu, 19 Jan 2023, Danyang Su wrote: > > > > > >> Hi Barry and Satish, > > >> > > >> I guess there is compatibility problem with some external package. The > > >> latest > > >> CMake complains about the compiler, so I remove superlu_dist option since I > > >> rarely use it. 
Then the HYPRE package shows "Error: Hypre requires C++ > > >> compiler. None specified", which is a bit tricky since c++ compiler is > > >> specified in the configuration so I comment the related error code in > > >> hypre.py > > >> during configuration. After doing this, there is no error during PETSc > > >> configuration but new error occurs during make process. > > >> > > >> **************************ERROR************************************* > > >> Error during compile, check > > >> intel-14.0.2-openmpi-1.6.5/lib/petsc/conf/make.log > > >> Send it and intel-14.0.2-openmpi-1.6.5/lib/petsc/conf/configure.log to > > >> petsc-maint at mcs.anl.gov > > >> ******************************************************************** > > >> > > >> It might be not worth checking this problem since most of the users do not > > >> work on such old cluster. Both log files are attached in case any developer > > >> wants to check. Please let me know if there is any suggestions and I am > > >> willing to make a test. > > >> > > >> Thanks, > > >> > > >> Danyang > > >> > > >> On 2023-01-19 11:18 a.m., Satish Balay wrote: > > >>> BTW: cmake is required by superlu-dist not petsc. > > >>> > > >>> And its possible that petsc might not build with this old version of > > >>> openmpi > > >>> - [and/or the externalpackages that you are installing - might not build > > >>> with this old version of intel compilers]. > > >>> > > >>> Satish > > >>> > > >>> On Thu, 19 Jan 2023, Barry Smith wrote: > > >>> > > >>>> Remove > > >>>> --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz > > >>>> and install CMake yourself. Then configure PETSc with > > >>>> --with-cmake=directory you installed it in. > > >>>> > > >>>> Barry > > >>>> > > >>>> > > >>>>> On Jan 19, 2023, at 1:46 PM, Danyang Su > wrote: > > >>>>> > > >>>>> Hi All, > > >>>>> > > >>>>> I am trying to install the latest PETSc on an old cluster but always get > > >>>>> some error information at the step of cmake. The system installed cmake > > >>>>> is > > >>>>> V3.2.3, which is out-of-date for PETSc. I tried to use --download-cmake > > >>>>> first, it does not work. Then I tried to clean everything (delete the > > >>>>> petsc_arch folder), download the latest cmake myself and pass the path > > >>>>> to > > >>>>> the configuration, the error is still there. > > >>>>> > > >>>>> The compiler there is a bit old, intel-14.0.2 and openmpi-1.6.5. I have > > >>>>> no > > >>>>> problem to install PETSc-3.13.6 there. The latest version cannot pass > > >>>>> configuration, unfortunately. Attached is the last configuration I have > > >>>>> tried. > > >>>>> > > >>>>> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > > >>>>> --download-cmake=/home/danyangs/soft/petsc/petsc-3.18.3/packages/cmake-3.25.1.tar.gz > > >>>>> --download-mumps --download-scalapack --download-parmetis > > >>>>> --download-metis > > >>>>> --download-ptscotch --download-fblaslapack --download-hypre > > >>>>> --download-superlu_dist --download-hdf5=yes --with-hdf5-fortran-bindings > > >>>>> --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" > > >>>>> CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 > > >>>>> -march=native > > >>>>> -mtune=native" > > >>>>> > > >>>>> Is there any solution for this. 
> > >>>>> > > >>>>> Thanks, > > >>>>> > > >>>>> Danyang > > >>>>> > > >>>>> > > >>>>> > > > > > > From facklerpw at ornl.gov Fri Jan 20 10:55:28 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Fri, 20 Jan 2023 16:55:28 +0000 Subject: [petsc-users] Performance problem using COO interface In-Reply-To: References: Message-ID: The following is the log_view output for the ported case using 4 MPI tasks. **************************************************************************************************************************************************************** *** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** **************************************************************************************************************************************************************** ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------ Unknown Name on a named iguazu with 4 processors, by 4pf Fri Jan 20 11:53:04 2023 Using Petsc Release Version 3.18.3, unknown Max Max/Min Avg Total Time (sec): 1.447e+01 1.000 1.447e+01 Objects: 1.229e+03 1.003 1.226e+03 Flops: 5.053e+09 1.217 4.593e+09 1.837e+10 Flops/sec: 3.492e+08 1.217 3.174e+08 1.269e+09 MPI Msg Count: 1.977e+04 1.067 1.895e+04 7.580e+04 MPI Msg Len (bytes): 7.374e+07 1.088 3.727e+03 2.825e+08 MPI Reductions: 2.065e+03 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 1.4471e+01 100.0% 1.8371e+10 100.0% 7.580e+04 100.0% 3.727e+03 100.0% 2.046e+03 99.1% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors) CpuToGpu Count: total number of CPU to GPU copies per processor CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor) GpuToCpu Count: total number of GPU to CPU copies per processor GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor) GPU %F: percent flops on GPU in this event ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 257 1.0 nan nan 0.00e+00 0.0 4.4e+02 8.0e+00 2.6e+02 1 0 1 0 12 1 0 1 0 13 -nan -nan 0 0.00e+00 0 0.00e+00 0 BuildTwoSidedF 210 1.0 nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 2.1e+02 1 0 0 2 10 1 0 0 2 10 -nan -nan 0 0.00e+00 0 0.00e+00 0 DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 10 0 0 0 0 10 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetGraph 69 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetUp 47 1.0 nan nan 0.00e+00 0.0 7.3e+02 2.1e+03 4.7e+01 0 0 1 1 2 0 0 1 1 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFBcastBegin 222 1.0 nan nan 0.00e+00 0.0 2.3e+03 1.9e+04 0.0e+00 0 0 3 16 0 0 0 3 16 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFBcastEnd 222 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFReduceBegin 254 1.0 nan nan 0.00e+00 0.0 1.5e+03 1.2e+04 0.0e+00 0 0 2 6 0 0 0 2 6 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFReduceEnd 254 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFFetchOpBegin 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFFetchOpEnd 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFPack 8091 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFUnpack 8092 1.0 nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecDot 60 1.0 nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 3 0 0 0 0 3 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMDot 398 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+02 0 0 0 0 19 0 0 0 0 19 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNorm 641 1.0 nan nan 4.45e+07 1.2 0.0e+00 0.0e+00 6.4e+02 1 1 0 0 31 1 1 0 0 31 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScale 601 1.0 nan nan 2.08e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecCopy 3735 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecSet 2818 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAXPY 123 1.0 nan nan 8.68e+06 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 
-nan -nan 0 0.00e+00 0 0.00e+00 100 VecAYPX 6764 1.0 nan nan 1.90e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAXPBYCZ 2388 1.0 nan nan 1.83e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecWAXPY 60 1.0 nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMAXPY 681 1.0 nan nan 1.36e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAssemblyBegin 7 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAssemblyEnd 7 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecPointwiseMult 4449 1.0 nan nan 6.06e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScatterBegin 7614 1.0 nan nan 0.00e+00 0.0 7.1e+04 2.9e+03 1.3e+01 0 0 94 73 1 0 0 94 73 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecScatterEnd 7614 1.0 nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecReduceArith 120 1.0 nan nan 8.60e+06 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecReduceComm 60 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 3 0 0 0 0 3 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNormalize 401 1.0 nan nan 4.09e+07 1.2 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 19 0 1 0 0 20 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSStep 20 1.0 1.2908e+01 1.0 5.05e+09 1.2 7.6e+04 3.7e+03 2.0e+03 89 100 100 98 96 89 100 100 98 97 1423 -nan 0 0.00e+00 0 0.00e+00 99 TSFunctionEval 140 1.0 nan nan 1.00e+07 1.2 1.1e+03 3.7e+04 0.0e+00 1 0 1 15 0 1 0 1 15 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSJacobianEval 60 1.0 nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 6.0e+01 2 0 1 6 3 2 0 1 6 3 -nan -nan 0 0.00e+00 0 0.00e+00 87 MatMult 4934 1.0 nan nan 4.16e+09 1.2 5.1e+04 2.7e+03 4.0e+00 15 82 68 49 0 15 82 68 49 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultAdd 1104 1.0 nan nan 9.00e+07 1.2 8.8e+03 1.4e+02 0.0e+00 1 2 12 0 0 1 2 12 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultTranspose 1104 1.0 nan nan 9.01e+07 1.2 8.8e+03 1.4e+02 1.0e+00 1 2 12 0 0 1 2 12 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatSolve 368 0.0 nan nan 3.57e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSOR 60 1.0 nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorSym 2 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorNum 2 1.0 nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatConvert 8 1.0 nan nan 0.00e+00 0.0 8.0e+01 1.2e+03 4.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatScale 66 1.0 nan nan 1.48e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 99 MatResidual 1104 1.0 nan nan 1.01e+09 1.2 1.2e+04 2.9e+03 0.0e+00 4 20 16 12 0 4 20 16 12 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatAssemblyBegin 590 1.0 nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 2.0e+02 1 0 0 2 10 1 0 0 2 10 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 590 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+02 2 0 0 0 7 2 0 0 0 7 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetRowIJ 2 0.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCreateSubMat 122 1.0 nan nan 0.00e+00 0.0 6.3e+01 1.8e+02 1.7e+02 2 0 0 0 8 2 0 0 0 8 -nan -nan 0 
0.00e+00 0 0.00e+00 0 MatGetOrdering 2 0.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCoarsen 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02 0 0 1 0 6 0 0 1 0 6 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatZeroEntries 61 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAXPY 6 1.0 nan nan 1.37e+06 1.2 0.0e+00 0.0e+00 1.8e+01 1 0 0 0 1 1 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatTranspose 6 1.0 nan nan 0.00e+00 0.0 2.2e+02 2.9e+04 4.8e+01 1 0 0 2 2 1 0 0 2 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatMatMultSym 4 1.0 nan nan 0.00e+00 0.0 2.2e+02 1.7e+03 2.8e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatMatMultNum 4 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatPtAPSymbolic 5 1.0 nan nan 0.00e+00 0.0 6.2e+02 5.2e+03 4.4e+01 3 0 1 1 2 3 0 1 1 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatPtAPNumeric 181 1.0 nan nan 0.00e+00 0.0 3.3e+03 1.8e+04 0.0e+00 56 0 4 21 0 56 0 4 21 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetLocalMat 185 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetValuesCOO 60 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSetUp 483 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve 60 1.0 1.1843e+01 1.0 4.91e+09 1.2 7.3e+04 2.9e+03 1.2e+03 82 97 97 75 60 82 97 97 75 60 1506 -nan 0 0.00e+00 0 0.00e+00 99 KSPGMRESOrthog 398 1.0 nan nan 7.97e+07 1.2 0.0e+00 0.0e+00 4.0e+02 1 2 0 0 19 1 2 0 0 19 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESSolve 60 1.0 1.2842e+01 1.0 5.01e+09 1.2 7.5e+04 3.6e+03 2.0e+03 89 99 100 96 95 89 99 100 96 96 1419 -nan 0 0.00e+00 0 0.00e+00 99 SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 120 1.0 nan nan 3.01e+07 1.2 9.6e+02 3.7e+04 0.0e+00 1 1 1 13 0 1 1 1 13 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESJacobianEval 60 1.0 nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 6.0e+01 2 0 1 6 3 2 0 1 6 3 -nan -nan 0 0.00e+00 0 0.00e+00 87 SNESLineSearch 60 1.0 nan nan 6.99e+07 1.2 9.6e+02 1.9e+04 2.4e+02 1 1 1 6 12 1 1 1 6 12 -nan -nan 0 0.00e+00 0 0.00e+00 100 PCSetUp_GAMG+ 60 1.0 nan nan 3.53e+07 1.2 5.2e+03 1.4e+04 4.3e+02 62 1 7 25 21 62 1 7 25 21 -nan -nan 0 0.00e+00 0 0.00e+00 96 PCGAMGCreateG 3 1.0 nan nan 1.32e+06 1.2 2.2e+02 2.9e+04 4.2e+01 1 0 0 2 2 1 0 0 2 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG Coarsen 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02 1 0 1 0 6 1 0 1 0 6 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG MIS/Agg 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02 0 0 1 0 6 0 0 1 0 6 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMGProl 3 1.0 nan nan 0.00e+00 0.0 7.8e+01 7.8e+02 4.8e+01 0 0 0 0 2 0 0 0 0 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG Prol-col 3 1.0 nan nan 0.00e+00 0.0 5.2e+01 5.8e+02 2.1e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG Prol-lift 3 1.0 nan nan 0.00e+00 0.0 2.6e+01 1.2e+03 1.5e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMGOptProl 3 1.0 nan nan 3.40e+07 1.2 5.8e+02 2.4e+03 1.1e+02 1 1 1 0 6 1 1 1 0 6 -nan -nan 0 0.00e+00 0 0.00e+00 100 GAMG smooth 3 1.0 nan nan 2.85e+05 1.2 1.9e+02 1.9e+03 3.0e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 43 PCGAMGCreateL 3 
1.0 nan nan 0.00e+00 0.0 4.8e+02 6.5e+03 8.0e+01 3 0 1 1 4 3 0 1 1 4 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG PtAP 3 1.0 nan nan 0.00e+00 0.0 4.5e+02 7.1e+03 2.7e+01 3 0 1 1 1 3 0 1 1 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG Reduce 1 1.0 nan nan 0.00e+00 0.0 3.6e+01 3.7e+01 5.3e+01 0 0 0 0 3 0 0 0 0 3 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Gal l00 60 1.0 nan nan 0.00e+00 0.0 1.1e+03 1.4e+04 9.0e+00 46 0 1 6 0 46 0 1 6 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Opt l00 1 1.0 nan nan 0.00e+00 0.0 4.8e+01 1.7e+02 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Gal l01 60 1.0 nan nan 0.00e+00 0.0 1.6e+03 2.9e+04 9.0e+00 13 0 2 16 0 13 0 2 16 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Opt l01 1 1.0 nan nan 0.00e+00 0.0 7.2e+01 4.8e+03 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Gal l02 60 1.0 nan nan 0.00e+00 0.0 1.1e+03 1.2e+03 1.7e+01 0 0 1 0 1 0 0 1 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Opt l02 1 1.0 nan nan 0.00e+00 0.0 7.2e+01 2.2e+02 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCSetUp 182 1.0 nan nan 3.53e+07 1.2 5.3e+03 1.4e+04 7.7e+02 64 1 7 27 37 64 1 7 27 38 -nan -nan 0 0.00e+00 0 0.00e+00 96 PCSetUpOnBlocks 368 1.0 nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCApply 60 1.0 nan nan 4.85e+09 1.2 7.3e+04 2.9e+03 1.1e+03 81 96 96 75 54 81 96 96 75 54 -nan -nan 0 0.00e+00 0 0.00e+00 99 KSPSolve_FS_0 60 1.0 nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve_FS_1 60 1.0 nan nan 4.79e+09 1.2 7.2e+04 2.9e+03 1.1e+03 81 95 96 75 54 81 95 96 75 54 -nan -nan 0 0.00e+00 0 0.00e+00 100 --- Event Stage 1: Unknown ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- Object Type Creations Destructions. Reports information only for process 0. 
--- Event Stage 0: Main Stage Container 14 14 Distributed Mesh 9 9 Index Set 120 120 IS L to G Mapping 10 10 Star Forest Graph 87 87 Discrete System 9 9 Weak Form 9 9 Vector 761 761 TSAdapt 1 1 TS 1 1 DMTS 1 1 SNES 1 1 DMSNES 3 3 SNESLineSearch 1 1 Krylov Solver 11 11 DMKSP interface 1 1 Matrix 171 171 Matrix Coarsen 3 3 Preconditioner 11 11 Viewer 2 1 PetscRandom 3 3 --- Event Stage 1: Unknown ======================================================================================================================== Average time to get PetscTime(): 3.82e-08 Average time for MPI_Barrier(): 2.2968e-06 Average time for zero size MPI_Send(): 3.371e-06 #PETSc Option Table entries: -log_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: PETSC_DIR=/home2/4pf/petsc PETSC_ARCH=arch-kokkos-serial --prefix=/home2/4pf/.local/serial --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --with-cuda=0 --with-shared-libraries --with-64-bit-indices --with-debugging=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --with-kokkos-dir=/home2/4pf/.local/serial --with-kokkos-kernels-dir=/home2/4pf/.local/serial --download-f2cblaslapack ----------------------------------------- Libraries compiled on 2023-01-06 18:21:31 on iguazu Machine characteristics: Linux-4.18.0-383.el8.x86_64-x86_64-with-glibc2.28 Using PETSc directory: /home2/4pf/.local/serial Using PETSc arch: ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -O3 ----------------------------------------- Using include paths: -I/home2/4pf/.local/serial/include ----------------------------------------- Using C linker: mpicc Using libraries: -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lpetsc -Wl,-rpath,/home2/4pf/.local/serial/lib64 -L/home2/4pf/.local/serial/lib64 -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lkokkoskernels -lkokkoscontainers -lkokkoscore -lf2clapack -lf2cblas -lm -lX11 -lquadmath -lstdc++ -ldl ----------------------------------------- --- Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Zhang, Junchao Sent: Tuesday, January 17, 2023 17:25 To: Fackler, Philip ; xolotl-psi-development at lists.sourceforge.net ; petsc-users at mcs.anl.gov Cc: Mills, Richard Tran ; Blondel, Sophie ; Roth, Philip Subject: [EXTERNAL] Re: Performance problem using COO interface Hi, Philip, Could you add -log_view and see what functions are used in the solve? Since it is CPU-only, perhaps with -log_view of different runs, we can easily see which functions slowed down. --Junchao Zhang ________________________________ From: Fackler, Philip Sent: Tuesday, January 17, 2023 4:13 PM To: xolotl-psi-development at lists.sourceforge.net ; petsc-users at mcs.anl.gov Cc: Mills, Richard Tran ; Zhang, Junchao ; Blondel, Sophie ; Roth, Philip Subject: Performance problem using COO interface In Xolotl's feature-petsc-kokkos branch I have ported the code to use petsc's COO interface for creating the Jacobian matrix (and the Kokkos interface for interacting with Vec entries). 
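For reference, PETSc's COO assembly path boils down to the pattern sketched below. This is a minimal standalone sketch with made-up sizes, indices and values; it is not Xolotl's actual assembly code, and the option name in the comment is only one way to select the Kokkos backend.

#include <petscmat.h>

/* Sketch of the COO assembly path: the (i,j) pattern is declared once at
   setup, after which each (re)assembly only has to supply a values array. */
int main(int argc, char **argv)
{
  Mat         J;
  PetscInt    coo_i[] = {0, 0, 1, 1};            /* assumed row indices    */
  PetscInt    coo_j[] = {0, 1, 0, 1};            /* assumed column indices */
  PetscScalar coo_v[] = {2.0, -1.0, -1.0, 2.0};  /* assumed values         */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &J));
  PetscCall(MatSetSizes(J, PETSC_DECIDE, PETSC_DECIDE, 2, 2));
  PetscCall(MatSetFromOptions(J));   /* e.g. -mat_type aijkokkos, or -dm_mat_type aijkokkos through a DM */
  PetscCall(MatSetPreallocationCOO(J, 4, coo_i, coo_j));  /* once, at setup     */
  PetscCall(MatSetValuesCOO(J, coo_v, INSERT_VALUES));    /* every (re)assembly */
  PetscCall(MatView(J, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(MatDestroy(&J));
  PetscCall(PetscFinalize());
  return 0;
}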
As the attached plots show for one case, while the code for computing the RHSFunction and RHSJacobian performs similarly (or slightly better) after the port, the performance for the solve as a whole is significantly worse.

Note: This is all CPU-only (so kokkos and kokkos-kernels are built with only the serial backend).

The dev version is using MatSetValuesStencil with the default implementations for Mat and Vec.

The port version is using MatSetValuesCOO and is run with -dm_mat_type aijkokkos -dm_vec_type kokkos.

The port/def version is using MatSetValuesCOO and is run with -dm_vec_type kokkos (using the default Mat implementation).

So, this seems to be due to a performance difference in the petsc implementations. Please advise. Is this a known issue? Or am I missing something?

Thank you for the help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From facklerpw at ornl.gov Fri Jan 20 11:00:51 2023
From: facklerpw at ornl.gov (Fackler, Philip)
Date: Fri, 20 Jan 2023 17:00:51 +0000
Subject: [petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.
In-Reply-To:
References:
Message-ID:

Any progress on this? Any info/help needed?

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Fackler, Philip
Sent: Thursday, December 8, 2022 09:07
To: Junchao Zhang
Cc: xolotl-psi-development at lists.sourceforge.net ; petsc-users at mcs.anl.gov ; Blondel, Sophie ; Roth, Philip
Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.

Great! Thank you!

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Junchao Zhang
Sent: Wednesday, December 7, 2022 18:47
To: Fackler, Philip
Cc: xolotl-psi-development at lists.sourceforge.net ; petsc-users at mcs.anl.gov ; Blondel, Sophie ; Roth, Philip
Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.

Hi, Philip,
I could reproduce the error. I need to find a way to debug it. Thanks.

/home/jczhang/xolotl/test/system/SystemTestCase.cpp(317): fatal error: in "System/PSI_1": absolute value of diffNorm{0.19704848134353209} exceeds 1e-10
*** 1 failure is detected in the test module "Regression"

--Junchao Zhang

On Tue, Dec 6, 2022 at 10:10 AM Fackler, Philip > wrote:
I think it would be simpler to use the develop branch for this issue. But you can still just build the SystemTester. Then (if you changed the PSI_1 case) run:

./test/system/SystemTester -t System/PSI_1 -- -v
(No need for multiple MPI ranks) Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Monday, December 5, 2022 15:40 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. I configured with xolotl branch feature-petsc-kokkos, and typed `make` under ~/xolotl-build/. Though there were errors, a lot of *Tester were built. [ 62%] Built target xolotlViz [ 63%] Linking CXX executable TemperatureProfileHandlerTester [ 64%] Linking CXX executable TemperatureGradientHandlerTester [ 64%] Built target TemperatureProfileHandlerTester [ 64%] Built target TemperatureConstantHandlerTester [ 64%] Built target TemperatureGradientHandlerTester [ 65%] Linking CXX executable HeatEquationHandlerTester [ 65%] Built target HeatEquationHandlerTester [ 66%] Linking CXX executable FeFitFluxHandlerTester [ 66%] Linking CXX executable W111FitFluxHandlerTester [ 67%] Linking CXX executable FuelFitFluxHandlerTester [ 67%] Linking CXX executable W211FitFluxHandlerTester Which Tester should I use to run with the parameter file benchmarks/params_system_PSI_2.txt? And how many ranks should I use? Could you give an example command line? Thanks. --Junchao Zhang On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang > wrote: Hello, Philip, Do I still need to use the feature-petsc-kokkos branch? --Junchao Zhang On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip > wrote: Junchao, Thank you for working on this. If you open the parameter file for, say, the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type aijkokkos -dm_vec_type kokkos?` to the "petscArgs=" field (or the corresponding cusparse/cuda option). Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Thursday, December 1, 2022 17:05 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry for the long delay. I could not get something useful from the -log_view output. Since I have already built xolotl, could you give me instructions on how to do a xolotl test to reproduce the divergence with petsc GPU backends (but fine on CPU)? Thank you. 
--Junchao Zhang On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip > wrote: ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------ Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022 Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000 Max Max/Min Avg Total Time (sec): 6.023e+00 1.000 6.023e+00 Objects: 1.020e+02 1.000 1.020e+02 Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors) CpuToGpu Count: total number of CPU to GPU copies per processor CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor) GpuToCpu Count: total number of GPU to CPU copies per processor GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor) GPU %F: percent flops on GPU in this event ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScatterBegin 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 2 5.14e-03 0 0.00e+00 0 VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan 
-nan 0 0.00e+00 0 0.00e+00 100 VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 184 -nan 2 5.14e-03 0 0.00e+00 54 TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan -nan 1 3.36e-04 0 0.00e+00 100 TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 -nan 1 4.80e-03 0 0.00e+00 46 KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 -nan 1 4.80e-03 0 0.00e+00 53 SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan -nan 0 
0.00e+00 0 0.00e+00 0 PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan -nan 1 4.80e-03 0 0.00e+00 19 KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 --- Event Stage 1: Unknown ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- Object Type Creations Destructions. Reports information only for process 0. --- Event Stage 0: Main Stage Container 5 5 Distributed Mesh 2 2 Index Set 11 11 IS L to G Mapping 1 1 Star Forest Graph 7 7 Discrete System 2 2 Weak Form 2 2 Vector 49 49 TSAdapt 1 1 TS 1 1 DMTS 1 1 SNES 1 1 DMSNES 3 3 SNESLineSearch 1 1 Krylov Solver 4 4 DMKSP interface 1 1 Matrix 4 4 Preconditioner 4 4 Viewer 2 1 --- Event Stage 1: Unknown ======================================================================================================================== Average time to get PetscTime(): 3.14e-08 #PETSc Option Table entries: -log_view -log_view_gpu_times #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install ----------------------------------------- Libraries compiled on 2022-11-01 21:01:08 on PC0115427 Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install Using PETSc arch: ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3 ----------------------------------------- Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include ----------------------------------------- Using C linker: mpicc Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib -L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl ----------------------------------------- Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory 
________________________________ From: Junchao Zhang > Sent: Tuesday, November 15, 2022 13:03 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Can you paste -log_view result so I can see what functions are used? --Junchao Zhang On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip > wrote: Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos backend. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Monday, November 14, 2022 19:34 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Zhang, Junchao >; Roth, Philip > Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry to hear that. It seems you could run the same code on CPUs but not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it right? --Junchao Zhang On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users > wrote: This is an issue I've brought up before (and discussed in-person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out. The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here the files that begin with "params_system_". Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem. Any help would be appreciated. Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Jan 20 11:31:48 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 20 Jan 2023 11:31:48 -0600 Subject: [petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device. In-Reply-To: References: Message-ID: Sorry, no progress. I guess that is because a vector was gotten but not restored (e.g., VecRestoreArray() etc), causing host and device data not synced. Maybe in your code, or in petsc code. After the ECP AM, I will have more time on this bug. Thanks. --Junchao Zhang On Fri, Jan 20, 2023 at 11:00 AM Fackler, Philip wrote: > Any progress on this? Any info/help needed? 
> > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Fackler, Philip > *Sent:* Thursday, December 8, 2022 09:07 > *To:* Junchao Zhang > *Cc:* xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov>; Blondel, Sophie ; Roth, > Philip > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and > Vec diverging when running on CUDA device. > > Great! Thank you! > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang > *Sent:* Wednesday, December 7, 2022 18:47 > *To:* Fackler, Philip > *Cc:* xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov>; Blondel, Sophie ; Roth, > Philip > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and > Vec diverging when running on CUDA device. > > Hi, Philip, > I could reproduce the error. I need to find a way to debug it. Thanks. > > /home/jczhang/xolotl/test/system/SystemTestCase.cpp(317): fatal error: in > "System/PSI_1": absolute value of diffNorm{0.19704848134353209} exceeds > 1e-10 > *** 1 failure is detected in the test module "Regression" > > > --Junchao Zhang > > > On Tue, Dec 6, 2022 at 10:10 AM Fackler, Philip > wrote: > > I think it would be simpler to use the develop branch for this issue. But > you can still just build the SystemTester. Then (if you changed the PSI_1 > case) run: > > ./test/system/SystemTester -t System/PSI_1 -- -v? > > (No need for multiple MPI ranks) > > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, December 5, 2022 15:40 > *To:* Fackler, Philip > *Cc:* xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov>; Blondel, Sophie ; Roth, > Philip > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and > Vec diverging when running on CUDA device. > > I configured with xolotl branch feature-petsc-kokkos, and typed `make` > under ~/xolotl-build/. Though there were errors, a lot of *Tester were > built. 
> > [ 62%] Built target xolotlViz > [ 63%] Linking CXX executable TemperatureProfileHandlerTester > [ 64%] Linking CXX executable TemperatureGradientHandlerTester > [ 64%] Built target TemperatureProfileHandlerTester > [ 64%] Built target TemperatureConstantHandlerTester > [ 64%] Built target TemperatureGradientHandlerTester > [ 65%] Linking CXX executable HeatEquationHandlerTester > [ 65%] Built target HeatEquationHandlerTester > [ 66%] Linking CXX executable FeFitFluxHandlerTester > [ 66%] Linking CXX executable W111FitFluxHandlerTester > [ 67%] Linking CXX executable FuelFitFluxHandlerTester > [ 67%] Linking CXX executable W211FitFluxHandlerTester > > Which Tester should I use to run with the parameter file > benchmarks/params_system_PSI_2.txt? And how many ranks should I use? > Could you give an example command line? > Thanks. > > --Junchao Zhang > > > On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang > wrote: > > Hello, Philip, > Do I still need to use the feature-petsc-kokkos branch? > --Junchao Zhang > > > On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip > wrote: > > Junchao, > > Thank you for working on this. If you open the parameter file for, say, > the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type > aijkokkos -dm_vec_type kokkos?` to the "petscArgs=" field (or the > corresponding cusparse/cuda option). > > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang > *Sent:* Thursday, December 1, 2022 17:05 > *To:* Fackler, Philip > *Cc:* xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov>; Blondel, Sophie ; Roth, > Philip > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and > Vec diverging when running on CUDA device. > > Hi, Philip, > Sorry for the long delay. I could not get something useful from the > -log_view output. Since I have already built xolotl, could you give me > instructions on how to do a xolotl test to reproduce the divergence with > petsc GPU backends (but fine on CPU)? > Thank you. 
> --Junchao Zhang > > > On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip > wrote: > > ------------------------------------------------------------------ PETSc > Performance Summary: > ------------------------------------------------------------------ > > Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 > 14:36:46 2022 > Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: > 2022-10-28 14:39:41 +0000 > > Max Max/Min Avg Total > Time (sec): 6.023e+00 1.000 6.023e+00 > Objects: 1.020e+02 1.000 1.020e+02 > Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 > Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 > MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU > time over all processors) > CpuToGpu Count: total number of CPU to GPU copies per processor > CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per > processor) > GpuToCpu Count: total number of GPU to CPU copies per processor > GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per > processor) > GPU %F: percent flops on GPU in this event > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > GPU - CpuToGpu - - GpuToCpu - GPU > > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > Mflop/s Count Size Count Size %F > > > ------------------------------------------------------------------------------------------------------------------------ > --------------------------------------- > > > --- Event Stage 0: Main Stage > > BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecScatterBegin 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 2 5.14e-03 0 0.00e+00 0 > > 
VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 97100 0 0 0 97100 0 0 0 184 > -nan 2 5.14e-03 0 0.00e+00 54 > > TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan > -nan 1 3.36e-04 0 0.00e+00 100 > > TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 97 > > MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 > -nan 1 4.80e-03 0 0.00e+00 46 > > KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 > -nan 1 4.80e-03 0 0.00e+00 53 > > SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 
0.0e+00 > 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 97 > > SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan > -nan 1 4.80e-03 0 0.00e+00 19 > > KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > > --- Event Stage 1: Unknown > > > ------------------------------------------------------------------------------------------------------------------------ > --------------------------------------- > > > Object Type Creations Destructions. Reports information only > for process 0. > > --- Event Stage 0: Main Stage > > Container 5 5 > Distributed Mesh 2 2 > Index Set 11 11 > IS L to G Mapping 1 1 > Star Forest Graph 7 7 > Discrete System 2 2 > Weak Form 2 2 > Vector 49 49 > TSAdapt 1 1 > TS 1 1 > DMTS 1 1 > SNES 1 1 > DMSNES 3 3 > SNESLineSearch 1 1 > Krylov Solver 4 4 > DMKSP interface 1 1 > Matrix 4 4 > Preconditioner 4 4 > Viewer 2 1 > > --- Event Stage 1: Unknown > > > ======================================================================================================================== > Average time to get PetscTime(): 3.14e-08 > #PETSc Option Table entries: > -log_view > -log_view_gpu_times > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with 64 bit PetscInt > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 8 > Configure options: PETSC_DIR=/home/4pf/repos/petsc > PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx > --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries > --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices > --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 > --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install > --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install > > ----------------------------------------- > Libraries compiled on 2022-11-01 21:01:08 on PC0115427 > Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 > Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install > Using PETSc arch: > ----------------------------------------- > > Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -O3 > ----------------------------------------- > > Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include > -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include > -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include > ----------------------------------------- > > Using C linker: mpicc > Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib > -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc > -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib > 
-L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib > -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib > -L/home/4pf/build/kokkos/cuda/install/lib > -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 > -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers > -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas > -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl > ----------------------------------------- > > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang > *Sent:* Tuesday, November 15, 2022 13:03 > *To:* Fackler, Philip > *Cc:* xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov>; Blondel, Sophie ; Roth, > Philip > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and > Vec diverging when running on CUDA device. > > Can you paste -log_view result so I can see what functions are used? > > --Junchao Zhang > > > On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip > wrote: > > Yes, most (but not all) of our system test cases fail with the kokkos/cuda > or cuda backends. All of them pass with the CPU-only kokkos backend. > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, November 14, 2022 19:34 > *To:* Fackler, Philip > *Cc:* xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov>; Blondel, Sophie ; Zhang, > Junchao ; Roth, Philip > *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec > diverging when running on CUDA device. > > Hi, Philip, > Sorry to hear that. It seems you could run the same code on CPUs but > not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it > right? > > --Junchao Zhang > > > On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > This is an issue I've brought up before (and discussed in-person with > Richard). I wanted to bring it up again because I'm hitting the limits of > what I know to do, and I need help figuring this out. > > The problem can be reproduced using Xolotl's "develop" branch built > against a petsc build with kokkos and kokkos-kernels enabled. Then, either > add the relevant kokkos options to the "petscArgs=" line in the system test > parameter file(s), or just replace the system test parameter files with the > ones from the "feature-petsc-kokkos" branch. See here the files that > begin with "params_system_". > > Note that those files use the "kokkos" options, but the problem is similar > using the corresponding cuda/cusparse options. I've already tried building > kokkos-kernels with no TPLs and got slightly different results, but the > same problem. > > Any help would be appreciated. 
> > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From quentin.chevalier at polytechnique.edu Sun Jan 22 03:40:11 2023 From: quentin.chevalier at polytechnique.edu (Quentin Chevalier) Date: Sun, 22 Jan 2023 10:40:11 +0100 Subject: [petsc-users] MUMPS icntl for petsc4py Message-ID: Hello PETSc users, I'm getting an INFOG(1)=-9 and INFO(2)=27 error on an eigenvalue code based on dolfinx run in a docker container. Based on https://mumps-solver.org/doc/userguide_5.5.1.pdf, I figured the fix would be to increase ICNTL(14). I'm coding in python through the petsc4py/slepc4py wrapper. I found a Mat.setMumpsIcntl method but I can't seem to place it properly and always obtain another error : "Operation done in wrong order and the like". Here's the code snippet that is failing : # Solver EPS = SLEPc.EPS().create(COMM_WORLD) EPS.setOperators(-A,M) # Solve Ax=sigma*Mx EPS.setProblemType(SLEPc.EPS.ProblemType.PGNHEP) # Specify that A is not hermitian, but M is semi-definite EPS.setWhichEigenpairs(EPS.Which.TARGET_MAGNITUDE) # Find eigenvalues close to sigma EPS.setTarget(sigma) EPS.setDimensions(2,10) # Find k eigenvalues only with max number of Lanczos vectors EPS.setTolerances(1e-9,100) # Set absolute tolerance and number of iterations # Spectral transform ST = EPS.getST(); ST.setType('sinvert') # Krylov subspace KSP = ST.getKSP() KSP.setTolerances(rtol=1e-6, atol=1e-9, max_it=100) # Krylov subspace KSP.setType('preonly') # Preconditioner PC = KSP.getPC(); PC.setType('lu') PC.setFactorSolverType('mumps') KSP.setFromOptions() EPS.setFromOptions() PC.getFactorMatrix().setMumpsIcntl(14,50) print(f"Solver launch for sig={sigma:.1f}...",flush=True) EPS.solve() n=EPS.getConverged() For context, matrix A is complex, size 500k x 500k but AIJ sparse, and I'm running this code on 36 nodes. I'd appreciate any insight on how to fix this issue, it's not clear to me what the order of operations should be. Funnily enough, it's very shift-dependent. Cheers, Quentin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Sun Jan 22 03:58:11 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sun, 22 Jan 2023 10:58:11 +0100 Subject: [petsc-users] MUMPS icntl for petsc4py In-Reply-To: References: Message-ID: You have to call ST.getOperator() as is done in this C example: https://slepc.upv.es/documentation/current/src/eps/tutorials/ex43.c.html Jose > El 22 ene 2023, a las 10:40, Quentin Chevalier escribi?: > > Hello PETSc users, > > I'm getting an INFOG(1)=-9 and INFO(2)=27 error on an eigenvalue code based on dolfinx run in a docker container. Based on https://mumps-solver.org/doc/userguide_5.5.1.pdf, I figured the fix would be to increase ICNTL(14). > > I'm coding in python through the petsc4py/slepc4py wrapper. I found a Mat.setMumpsIcntl method but I can't seem to place it properly and always obtain another error : "Operation done in wrong order and the like". 
> > Here's the code snippet that is failing : > # Solver > EPS = SLEPc.EPS().create(COMM_WORLD) > EPS.setOperators(-A,M) # Solve Ax=sigma*Mx > EPS.setProblemType(SLEPc.EPS.ProblemType.PGNHEP) # Specify that A is not hermitian, but M is semi-definite > EPS.setWhichEigenpairs(EPS.Which.TARGET_MAGNITUDE) # Find eigenvalues close to sigma > EPS.setTarget(sigma) > EPS.setDimensions(2,10) # Find k eigenvalues only with max number of Lanczos vectors > EPS.setTolerances(1e-9,100) # Set absolute tolerance and number of iterations > # Spectral transform > ST = EPS.getST(); ST.setType('sinvert') > # Krylov subspace > KSP = ST.getKSP() > KSP.setTolerances(rtol=1e-6, atol=1e-9, max_it=100) > # Krylov subspace > KSP.setType('preonly') > # Preconditioner > PC = KSP.getPC(); PC.setType('lu') > PC.setFactorSolverType('mumps') > KSP.setFromOptions() > EPS.setFromOptions() > PC.getFactorMatrix().setMumpsIcntl(14,50) > print(f"Solver launch for sig={sigma:.1f}...",flush=True) > EPS.solve() > n=EPS.getConverged() > > For context, matrix A is complex, size 500k x 500k but AIJ sparse, and I'm running this code on 36 nodes. > > I'd appreciate any insight on how to fix this issue, it's not clear to me what the order of operations should be. Funnily enough, it's very shift-dependent. > > Cheers, > > Quentin From quentin.chevalier at polytechnique.edu Mon Jan 23 06:13:17 2023 From: quentin.chevalier at polytechnique.edu (Quentin Chevalier) Date: Mon, 23 Jan 2023 13:13:17 +0100 Subject: [petsc-users] MUMPS icntl for petsc4py In-Reply-To: References: Message-ID: Many thanks Jose, it works beautifully ! I'm at a loss as to why, but thanks for the quick fix ! Quentin Quentin CHEVALIER ? IA parcours recherche LadHyX - Ecole polytechnique __________ On Sun, 22 Jan 2023 at 10:58, Jose E. Roman wrote: > > You have to call ST.getOperator() as is done in this C example: > https://slepc.upv.es/documentation/current/src/eps/tutorials/ex43.c.html > > Jose > > > > El 22 ene 2023, a las 10:40, Quentin Chevalier escribi?: > > > > Hello PETSc users, > > > > I'm getting an INFOG(1)=-9 and INFO(2)=27 error on an eigenvalue code based on dolfinx run in a docker container. Based on https://mumps-solver.org/doc/userguide_5.5.1.pdf, I figured the fix would be to increase ICNTL(14). > > > > I'm coding in python through the petsc4py/slepc4py wrapper. I found a Mat.setMumpsIcntl method but I can't seem to place it properly and always obtain another error : "Operation done in wrong order and the like". 
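In petsc4py/slepc4py terms, the ordering Jose points to amounts to something like the sketch below. It mirrors the pattern in ex43.c; the object names follow the snippet quoted below, the rest of the setup is assumed unchanged, the precise placement should be checked against ex43.c (depending on the SLEPc version an explicit EPS.setUp() beforehand may also be needed), and ICNTL(14)=50 is only an example value.

# Sketch only: force the spectral transform to build its operator (and with
# it the MUMPS factor matrix) before touching the ICNTL settings.
KSP.setFromOptions()
EPS.setFromOptions()
ST.getOperator()                            # creates the factored operator; return value unused here
PC.getFactorMatrix().setMumpsIcntl(14, 50)  # legal now that the factor matrix exists
EPS.solve()
n = EPS.getConverged()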
> > > > Here's the code snippet that is failing : > > # Solver > > EPS = SLEPc.EPS().create(COMM_WORLD) > > EPS.setOperators(-A,M) # Solve Ax=sigma*Mx > > EPS.setProblemType(SLEPc.EPS.ProblemType.PGNHEP) # Specify that A is not hermitian, but M is semi-definite > > EPS.setWhichEigenpairs(EPS.Which.TARGET_MAGNITUDE) # Find eigenvalues close to sigma > > EPS.setTarget(sigma) > > EPS.setDimensions(2,10) # Find k eigenvalues only with max number of Lanczos vectors > > EPS.setTolerances(1e-9,100) # Set absolute tolerance and number of iterations > > # Spectral transform > > ST = EPS.getST(); ST.setType('sinvert') > > # Krylov subspace > > KSP = ST.getKSP() > > KSP.setTolerances(rtol=1e-6, atol=1e-9, max_it=100) > > # Krylov subspace > > KSP.setType('preonly') > > # Preconditioner > > PC = KSP.getPC(); PC.setType('lu') > > PC.setFactorSolverType('mumps') > > KSP.setFromOptions() > > EPS.setFromOptions() > > PC.getFactorMatrix().setMumpsIcntl(14,50) > > print(f"Solver launch for sig={sigma:.1f}...",flush=True) > > EPS.solve() > > n=EPS.getConverged() > > > > For context, matrix A is complex, size 500k x 500k but AIJ sparse, and I'm running this code on 36 nodes. > > > > I'd appreciate any insight on how to fix this issue, it's not clear to me what the order of operations should be. Funnily enough, it's very shift-dependent. > > > > Cheers, > > > > Quentin > From jonas.lundgren at liu.se Fri Jan 20 18:21:20 2023 From: jonas.lundgren at liu.se (Jonas Lundgren) Date: Sat, 21 Jan 2023 00:21:20 +0000 Subject: [petsc-users] Using PCREDISTRIBUTE together with PCFIELDSPLIT Message-ID: Hi! (Sorry for a long message, I have tried to cover the essentials only. I am happy to provide further details and logs if necessary, but I have tried to keep it as short as possible.) I am trying to solve a Stokes flow problem with PCFIELDSPLIT as preconditioner (and KSPBCGS as solver). I have successfully set up the preconditioner and solved several examples by applying the following options when setting up the solver: // Preliminaries KSP ksp; PC pc; KSPCreate(PETSC_COMM_WORLD, ksp); KSPSetType(ksp, KSPBCGS); KSPSetFromOptions(ksp); // here, "-pc_type redistribute" is read from command line KSPSetOperators(ksp, K, K); KSPGetPC(ksp, &pc); // Preconditioner PCSetType(pc, PCFIELDSPLIT); PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR); PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_LOWER); PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELFP, K); PCFieldSplitSetIS(pc,"0",isu); where "isu" in the last row is an IS containing all flow indices (no pressure indices), and "K" is my system matrix, which is created using DMCreateMatrix(dm_state, &K); and "dm_state" is a DMStag object. (However, K is not assembled yet.) I want to try to use PCREDISTRIBUTE on top of this, since I believe that I have a lot of degrees-of-freedom (DOF) locked, and if I can reduce the size of my linear problem, the solution time can decrease as well. My idea was to use PCREDISTRIBUTE as the main preconditioner (with KSPPREONLY as the solver), and to move the KSPBCGS and PCFIELDSPLIT down one level, to act as a sub-KSP and sub-PC, by introducing the following between the two code blocks: PC ipc; PCRedistributeGetKSP(pc, &iksp); KSPGetPC(iksp, &ipc); and letting "ipc" replace "pc" in the second code block ("Preconditioner"). The two last rows of in the "Preconditioner" block refers to "K" and "isu", which are defined in terms of the entire system, not the reduced. 
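Spelled out in one place, the nesting described above reads roughly as follows. This is only a sketch of that arrangement, not a fix for the K/isu problem: it adds the KSP declaration for the inner solver that the fragment above omits, sets PCREDISTRIBUTE explicitly instead of relying on -pc_type redistribute, and keeps K and isu (the full-system matrix and flow-index IS from the first code block) as placeholders.

// Sketch of the intended PCREDISTRIBUTE / PCFIELDSPLIT nesting
KSP ksp, iksp;   /* outer solver and the solver living inside PCREDISTRIBUTE */
PC  pc, ipc;

KSPCreate(PETSC_COMM_WORLD, &ksp);
KSPSetType(ksp, KSPPREONLY);              /* outer KSP only applies the PC      */
KSPSetOperators(ksp, K, K);
KSPGetPC(ksp, &pc);
PCSetType(pc, PCREDISTRIBUTE);            /* same effect as -pc_type redistribute */

PCRedistributeGetKSP(pc, &iksp);          /* KSP acting on the reduced system   */
KSPSetType(iksp, KSPBCGS);
KSPGetPC(iksp, &ipc);
PCSetType(ipc, PCFIELDSPLIT);
PCFieldSplitSetType(ipc, PC_COMPOSITE_SCHUR);
PCFieldSplitSetSchurFactType(ipc, PC_FIELDSPLIT_SCHUR_FACT_LOWER);
/* As noted above, these two still refer to the full-system K and isu rather
   than to the reduced matrix/IS that PCREDISTRIBUTE extracts during setup.  */
PCFieldSplitSetSchurPre(ipc, PC_FIELDSPLIT_SCHUR_PRE_SELFP, K);
PCFieldSplitSetIS(ipc, "0", isu);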
I believe that I have to replace these two variables with a Mat ("A") that has locked DOFs removed, and an IS ("isfree") that contains only free flow DOF indices (no locked flow indices and no pressure indices). And here comes my problem: the Mat "A" and the IS "isfree" will only be available when the original KSP ("ksp") - or possibly the PC "pc" - are set up (using KSPSetUp(ksp); or PCSetUp(pc); ). I have no way of knowing how "A" or "isfree" looks before it is automatically extracted from "K" and "isu" during set up - they will be different depending on the specifics of the example I am running. I think I can solve the matrix problem by inserting the following (instead of the second to last row above): PCFieldSplitSetSchurPre(ipc,PC_FIELDSPLIT_SCHUR_PRE_SELFP, ipc->mat); Even though "ipc->mat" is not created yet, I believe that this might lead to the use of the extracted sub-flow-matrix as (a part of the) preconditioner for the flow block in PCFIELDSPLIT. I do not know for sure, since I have other issues that have prevented me from testing this approach. Does this sound reasonable? (One might come up with a better alternative than the system matrix to use for this sub-PC, but that work is yet to be done...) Furthermore, the second issue I have is regarding the IS "isfree". After the "Preconditioner" code block I am trying to set the Fieldsplit sub-PCs by calling PCFieldSplitSchurGetSubKSP(); It appears as if I need to set up the outer KSP "ksp" or PC "pc" before I call PCFieldSplitSchurGetSubKSP(); or otherwise I get an error message looking like: [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Must call KSPSetUp() or PCSetUp() before calling PCFieldSplitSchurGetSubKSP() However, when I try to call KSPSetUp() as suggested above (which is possible thanks to assembling the system matrix before the call to set up the KSP), I just get another error message indicating that I have the wrong IS ("isu") (most likely because the indices "isu" refers to the entire system, not the reduced): [1]PETSC ERROR: Argument out of range [1]PETSC ERROR: Index 0's value 3864 is larger than maximum given 2334 [1]PETSC ERROR: #1 ISComplement() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/vec/is/is/utils/iscoloring.c:804 [1]PETSC ERROR: #2 PCFieldSplitSetDefaults() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/fieldsplit/fieldsplit.c:544 [1]PETSC ERROR: #3 PCSetUp_FieldSplit() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/fieldsplit/fieldsplit.c:588 [1]PETSC ERROR: #4 PCSetUp() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/interface/precon.c:994 [1]PETSC ERROR: #5 KSPSetUp() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/ksp/interface/itfunc.c:406 [1]PETSC ERROR: #6 PCSetUp_Redistribute() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/redistribute/redistribute.c:240 [1]PETSC ERROR: #7 PCSetUp() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/interface/precon.c:994 [1]PETSC ERROR: #8 KSPSetUp() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/ksp/interface/itfunc.c:406 Okay, so now I want to obtain the reduced IS "isfree" that contains only free flow DOFs (not locked flow DOFs or pressure DOFs). 
One option is to try to obtain it from the same IS that extracts the sub-matrix when setting up the PCREDISTRIBUTE preconditioner, see red->is in MatCreateSubMatrix() in https://petsc.org/main/src/ksp/pc/impls/redistribute/redistribute.c.html And annoyingly, I need to have called PCSetUp(pc) or KSPSetUp(ksp) before using red->is (thus, before calling PCFieldSplitSetIS() ), which is something that is impossible it seems: [0]PETSC ERROR: Petsc has generated inconsistent data [0]PETSC ERROR: Unhandled case, must have at least two fields, not 1 [0]PETSC ERROR: #1 PCFieldSplitSetDefaults() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/fieldsplit/fieldsplit.c:550 [0]PETSC ERROR: #2 PCSetUp_FieldSplit() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/fieldsplit/fieldsplit.c:588 [0]PETSC ERROR: #3 PCSetUp() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/interface/precon.c:994 [0]PETSC ERROR: #4 KSPSetUp() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/ksp/interface/itfunc.c:406 [0]PETSC ERROR: #5 PCSetUp_Redistribute() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/redistribute/redistribute.c:240 [0]PETSC ERROR: #6 PCSetUp() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/interface/precon.c:994 [0]PETSC ERROR: #7 KSPSetUp() at /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/ksp/interface/itfunc.c:406 So, to summarize, I seem to be stuck in a Catch 22; I cannot set up the KSP ("ksp") before I have the correct IS ("isfree") set to my Fieldsplit precondtitioner, but I cannot obtain the correct IS preconditioner ("isfree") before I set up the (outer) KSP ("ksp"). Is there any possible ideas on how to resolve this issue? Do I have to hard code the IS "isfree" in order to achieve what I want? (I don't know how...) Or maybe this approach is wrong all together. Can I achieve my goal of using PCFIELDSPLIT and PCREDISTRIBUTE together in another way? Is this even a sane way of looking at this problem? Best regards, Jonas Lundgren -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 23 08:23:19 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 23 Jan 2023 09:23:19 -0500 Subject: [petsc-users] Using PCREDISTRIBUTE together with PCFIELDSPLIT In-Reply-To: References: Message-ID: On Mon, Jan 23, 2023 at 9:18 AM Jonas Lundgren via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi! > > > > (Sorry for a long message, I have tried to cover the essentials only. I am > happy to provide further details and logs if necessary, but I have tried to > keep it as short as possible.) > > > > I am trying to solve a Stokes flow problem with PCFIELDSPLIT as > preconditioner (and KSPBCGS as solver). I have successfully set up the > preconditioner and solved several examples by applying the following > options when setting up the solver: > > > > // Preliminaries > > KSP ksp; > > PC pc; > > KSPCreate(PETSC_COMM_WORLD, ksp); > > KSPSetType(ksp, KSPBCGS); > > KSPSetFromOptions(ksp); // here, ?-pc_type redistribute? is read from > command line > > KSPSetOperators(ksp, K, K); > > KSPGetPC(ksp, &pc); > > > > // Preconditioner > > PCSetType(pc, PCFIELDSPLIT); > > PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR); > > PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_LOWER); > > PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELFP, K); > This seems wrong. 
K is your system matrix, but the matrix here should be the size of the Schur complement, and furthermore it is overwritten with the Schur complement preconditioner. I think you should be passing NULL. Thanks, Matt > PCFieldSplitSetIS(pc,"0",isu); > > > > where ?isu? in the last row is an IS containing all flow indices (no > pressure indices), and ?K? is my system matrix, which is created using > DMCreateMatrix(dm_state, &K); and ?dm_state? is a DMStag object. (However, > K is not assembled yet.) > > > > I want to try to use PCREDISTRIBUTE on top of this, since I believe that I > have a lot of degrees-of-freedom (DOF) locked, and if I can reduce the size > of my linear problem, the solution time can decrease as well. My idea was > to use PCREDISTRIBUTE as the main preconditioner (with KSPPREONLY as the > solver), and to move the KSPBCGS and PCFIELDSPLIT down one level, to act as > a sub-KSP and sub-PC, by introducing the following between the two code > blocks: > > PC ipc; > > PCRedistributeGetKSP(pc, &iksp); > > KSPGetPC(iksp, &ipc); > > and letting ?ipc? replace ?pc? in the second code block (?Preconditioner?). > > > > The two last rows of in the ?Preconditioner? block refers to ?K? and > ?isu?, which are defined in terms of the entire system, not the reduced. I > believe that I have to replace these two variables with a Mat (?A?) that > has locked DOFs removed, and an IS (?isfree?) that contains only free flow > DOF indices (no locked flow indices and no pressure indices). > > > > And here comes my problem: the Mat ?A? and the IS ?isfree? will only be > available when the original KSP (?ksp?) ? or possibly the PC ?pc? ? are set > up (using KSPSetUp(ksp); or PCSetUp(pc); ). I have no way of knowing how > ?A? or ?isfree? looks before it is automatically extracted from ?K? and > ?isu? during set up ? they will be different depending on the specifics of > the example I am running. > > > > I think I can solve the matrix problem by inserting the following (instead > of the second to last row above): > > PCFieldSplitSetSchurPre(ipc,PC_FIELDSPLIT_SCHUR_PRE_SELFP, ipc->mat); > > Even though ?ipc->mat? is not created yet, I believe that this might lead > to the use of the extracted sub-flow-matrix as (a part of the) > preconditioner for the flow block in PCFIELDSPLIT. I do not know for sure, > since I have other issues that have prevented me from testing this > approach. Does this sound reasonable? > > > > (One might come up with a better alternative than the system matrix to use > for this sub-PC, but that work is yet to be done?) > > > > Furthermore, the second issue I have is regarding the IS ?isfree?. After > the ?Preconditioner? code block I am trying to set the Fieldsplit sub-PCs > by calling PCFieldSplitSchurGetSubKSP(); It appears as if I need to set up > the outer KSP ?ksp? or PC ?pc? before I call PCFieldSplitSchurGetSubKSP(); > or otherwise I get an error message looking like: > > [0]PETSC ERROR: Object is in wrong state > > [0]PETSC ERROR: Must call KSPSetUp() or PCSetUp() before calling > PCFieldSplitSchurGetSubKSP() > > > > However, when I try to call KSPSetUp() as suggested above (which is > possible thanks to assembling the system matrix before the call to set up > the KSP), I just get another error message indicating that I have the wrong > IS (?isu?) (most likely because the indices ?isu? 
refers to the entire > system, not the reduced): > > [1]PETSC ERROR: Argument out of range > > [1]PETSC ERROR: Index 0's value 3864 is larger than maximum given 2334 > > [1]PETSC ERROR: #1 ISComplement() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/vec/is/is/utils/iscoloring.c:804 > > [1]PETSC ERROR: #2 PCFieldSplitSetDefaults() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/fieldsplit/fieldsplit.c:544 > > [1]PETSC ERROR: #3 PCSetUp_FieldSplit() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/fieldsplit/fieldsplit.c:588 > > [1]PETSC ERROR: #4 PCSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/interface/precon.c:994 > > [1]PETSC ERROR: #5 KSPSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/ksp/interface/itfunc.c:406 > > [1]PETSC ERROR: #6 PCSetUp_Redistribute() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/redistribute/redistribute.c:240 > > [1]PETSC ERROR: #7 PCSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/interface/precon.c:994 > > [1]PETSC ERROR: #8 KSPSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/ksp/interface/itfunc.c:406 > > > > Okay, so now I want to obtain the reduced IS ?isfree? that contains only > free flow DOFs (not locked flow DOFs or pressure DOFs). One option is to > try to obtain it from the same IS that extracts the sub-matrix when setting > up the PCREDISTRIBUTE preconditioner, see red->is in MatCreateSubMatrix() > in > https://petsc.org/main/src/ksp/pc/impls/redistribute/redistribute.c.html > > > > And annoyingly, I need to have called PCSetUp(pc) or KSPSetUp(ksp) before > using red->is (thus, before calling PCFieldSplitSetIS() ), which is > something that is impossible it seems: > > [0]PETSC ERROR: Petsc has generated inconsistent data > > [0]PETSC ERROR: Unhandled case, must have at least two fields, not 1 > > [0]PETSC ERROR: #1 PCFieldSplitSetDefaults() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/fieldsplit/fieldsplit.c:550 > > [0]PETSC ERROR: #2 PCSetUp_FieldSplit() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/fieldsplit/fieldsplit.c:588 > > [0]PETSC ERROR: #3 PCSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/interface/precon.c:994 > > [0]PETSC ERROR: #4 KSPSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/ksp/interface/itfunc.c:406 > > [0]PETSC ERROR: #5 PCSetUp_Redistribute() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/redistribute/redistribute.c:240 > > [0]PETSC ERROR: #6 PCSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/interface/precon.c:994 > > [0]PETSC ERROR: #7 KSPSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/ksp/interface/itfunc.c:406 > > > > So, to summarize, I seem to be stuck in a Catch 22; I cannot set up the > KSP (?ksp?) before I have the correct IS (?isfree?) set to my Fieldsplit > precondtitioner, but I cannot obtain the correct IS preconditioner > (?isfree?) before I set up the (outer) KSP (?ksp?). > > > > Is there any possible ideas on how to resolve this issue? Do I have to > hard code the IS ?isfree? in order to achieve what I want? (I don?t know > how?) Or maybe this approach is wrong all together. Can I achieve my goal > of using PCFIELDSPLIT and PCREDISTRIBUTE together in another way? > > > > Is this even a sane way of looking at this problem? 
> > > > Best regards, > > Jonas Lundgren > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Jan 23 09:28:45 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 23 Jan 2023 16:28:45 +0100 Subject: [petsc-users] MUMPS icntl for petsc4py In-Reply-To: References: Message-ID: Here is the explanation. With shift-and-invert, two main things must be done at STSetUp: 1) build the matrix A-sigma*B, (2) factorize it. Normally this is done at the beginning of EPSSolve. Before that you can set PC options, but the problem is that MUMPS options belong to Mat, not PC, so step 1) must be done beforehand. But you cannot call PCSetUp because you have not yet configured MUMPS options. Around version 3.12 we split the implementation of STSetUp so that 1) and 2) can be done separately. STGetOperator is what triggers 1). Jose > El 23 ene 2023, a las 13:13, Quentin Chevalier escribi?: > > Many thanks Jose, it works beautifully ! > > I'm at a loss as to why, but thanks for the quick fix ! > > Quentin > > > > Quentin CHEVALIER ? IA parcours recherche > > LadHyX - Ecole polytechnique > > __________ > > > > On Sun, 22 Jan 2023 at 10:58, Jose E. Roman wrote: >> >> You have to call ST.getOperator() as is done in this C example: >> https://slepc.upv.es/documentation/current/src/eps/tutorials/ex43.c.html >> >> Jose >> >> >>> El 22 ene 2023, a las 10:40, Quentin Chevalier escribi?: >>> >>> Hello PETSc users, >>> >>> I'm getting an INFOG(1)=-9 and INFO(2)=27 error on an eigenvalue code based on dolfinx run in a docker container. Based on https://mumps-solver.org/doc/userguide_5.5.1.pdf, I figured the fix would be to increase ICNTL(14). >>> >>> I'm coding in python through the petsc4py/slepc4py wrapper. I found a Mat.setMumpsIcntl method but I can't seem to place it properly and always obtain another error : "Operation done in wrong order and the like". >>> >>> Here's the code snippet that is failing : >>> # Solver >>> EPS = SLEPc.EPS().create(COMM_WORLD) >>> EPS.setOperators(-A,M) # Solve Ax=sigma*Mx >>> EPS.setProblemType(SLEPc.EPS.ProblemType.PGNHEP) # Specify that A is not hermitian, but M is semi-definite >>> EPS.setWhichEigenpairs(EPS.Which.TARGET_MAGNITUDE) # Find eigenvalues close to sigma >>> EPS.setTarget(sigma) >>> EPS.setDimensions(2,10) # Find k eigenvalues only with max number of Lanczos vectors >>> EPS.setTolerances(1e-9,100) # Set absolute tolerance and number of iterations >>> # Spectral transform >>> ST = EPS.getST(); ST.setType('sinvert') >>> # Krylov subspace >>> KSP = ST.getKSP() >>> KSP.setTolerances(rtol=1e-6, atol=1e-9, max_it=100) >>> # Krylov subspace >>> KSP.setType('preonly') >>> # Preconditioner >>> PC = KSP.getPC(); PC.setType('lu') >>> PC.setFactorSolverType('mumps') >>> KSP.setFromOptions() >>> EPS.setFromOptions() >>> PC.getFactorMatrix().setMumpsIcntl(14,50) >>> print(f"Solver launch for sig={sigma:.1f}...",flush=True) >>> EPS.solve() >>> n=EPS.getConverged() >>> >>> For context, matrix A is complex, size 500k x 500k but AIJ sparse, and I'm running this code on 36 nodes. >>> >>> I'd appreciate any insight on how to fix this issue, it's not clear to me what the order of operations should be. Funnily enough, it's very shift-dependent. 
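A C sketch of the ordering Jose explains above, following the pattern of the ex43.c example he links; the operators, the shift sigma, and the ICNTL(14)=50 value are taken from the thread, while the rest is a generic shift-and-invert setup (problem-specific settings such as dimensions and tolerances are omitted), so treat it as a sketch rather than the original dolfinx script:

  EPS eps;  ST st;  KSP ksp;  PC pc;  Mat Op, F;
  EPSCreate(PETSC_COMM_WORLD, &eps);
  EPSSetOperators(eps, A, M);                 /* A, M assembled elsewhere */
  EPSSetProblemType(eps, EPS_PGNHEP);
  EPSSetTarget(eps, sigma);
  EPSGetST(eps, &st);
  STSetType(st, STSINVERT);
  STGetKSP(st, &ksp);
  KSPSetType(ksp, KSPPREONLY);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCLU);
  PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);
  EPSSetFromOptions(eps);
  /* Step 1) of the explanation above: force the ST to build A - sigma*M now,
     so the factorization object exists and MUMPS options can be set before
     EPSSolve() performs step 2), the actual factorization. */
  STGetOperator(st, &Op);
  PCFactorSetUpMatSolverType(pc);             /* create the MUMPS factor matrix */
  PCFactorGetMatrix(pc, &F);
  MatMumpsSetIcntl(F, 14, 50);                /* ICNTL(14): 50% extra workspace */
  STRestoreOperator(st, &Op);
  EPSSolve(eps);

The petsc4py calls used in this thread (ST.getOperator() and PC.getFactorMatrix().setMumpsIcntl(14,50)) wrap these same routines, so the same placement applies: after the EPS/KSP/PC options are set, before EPS.solve().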
>>> >>> Cheers, >>> >>> Quentin >> From junchao.zhang at gmail.com Mon Jan 23 09:34:35 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 23 Jan 2023 09:34:35 -0600 Subject: [petsc-users] Performance problem using COO interface In-Reply-To: References: Message-ID: Hi, Philip, It looks the performance of MatPtAP is pretty bad. There are a lot of issues with PtAP, which I am going to address. MatPtAPNumeric 181 1.0 nan nan 0.00e+00 0.0 3.3e+03 1.8e+04 0.0e+00 56 0 4 21 0 56 0 4 21 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 Thanks. --Junchao Zhang On Fri, Jan 20, 2023 at 10:55 AM Fackler, Philip via petsc-users < petsc-users at mcs.anl.gov> wrote: > The following is the log_view output for the ported case using 4 MPI tasks. > > **************************************************************************************************************************************************************** > > *** WIDEN YOUR WINDOW TO 160 CHARACTERS. > Use 'enscript -r -fCourier9' to print this document > *** > > **************************************************************************************************************************************************************** > > ------------------------------------------------------------------ PETSc > Performance Summary: > ------------------------------------------------------------------ > > Unknown Name on a named iguazu with 4 processors, by 4pf Fri Jan 20 > 11:53:04 2023 > Using Petsc Release Version 3.18.3, unknown > > Max Max/Min Avg Total > Time (sec): 1.447e+01 1.000 1.447e+01 > Objects: 1.229e+03 1.003 1.226e+03 > Flops: 5.053e+09 1.217 4.593e+09 1.837e+10 > Flops/sec: 3.492e+08 1.217 3.174e+08 1.269e+09 > MPI Msg Count: 1.977e+04 1.067 1.895e+04 7.580e+04 > MPI Msg Len (bytes): 7.374e+07 1.088 3.727e+03 2.825e+08 > MPI Reductions: 2.065e+03 1.000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > 0: Main Stage: 1.4471e+01 100.0% 1.8371e+10 100.0% 7.580e+04 > 100.0% 3.727e+03 100.0% 2.046e+03 99.1% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU > time over all processors) > CpuToGpu Count: total number of CPU to GPU copies per processor > CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per > processor) > GpuToCpu Count: total number of GPU to CPU copies per processor > GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per > processor) > GPU %F: percent flops on GPU in this event > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > GPU - CpuToGpu - - GpuToCpu - GPU > > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > Mflop/s Count Size Count Size %F > > > ------------------------------------------------------------------------------------------------------------------------ > --------------------------------------- > > > --- Event Stage 0: Main Stage > > BuildTwoSided 257 1.0 nan nan 0.00e+00 0.0 4.4e+02 8.0e+00 > 2.6e+02 1 0 1 0 12 1 0 1 0 13 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > BuildTwoSidedF 210 1.0 nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 > 2.1e+02 1 0 0 2 10 1 0 0 2 10 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 7.0e+00 10 0 0 0 0 10 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFSetGraph 69 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFSetUp 47 1.0 nan nan 0.00e+00 0.0 7.3e+02 2.1e+03 > 4.7e+01 0 0 1 1 2 0 0 1 1 2 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFBcastBegin 222 1.0 nan nan 0.00e+00 0.0 2.3e+03 1.9e+04 > 0.0e+00 0 0 3 16 0 0 0 3 16 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFBcastEnd 222 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFReduceBegin 254 1.0 nan nan 0.00e+00 0.0 1.5e+03 1.2e+04 > 0.0e+00 0 0 2 6 0 0 0 2 6 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFReduceEnd 254 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFFetchOpBegin 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFFetchOpEnd 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFPack 8091 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFUnpack 8092 1.0 nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecDot 60 1.0 nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 > 6.0e+01 0 0 0 0 3 0 0 0 0 3 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecMDot 398 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 4.0e+02 0 0 0 0 19 0 0 0 0 19 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecNorm 641 1.0 nan nan 4.45e+07 1.2 0.0e+00 0.0e+00 > 6.4e+02 1 1 0 0 31 1 1 0 0 31 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecScale 601 1.0 nan nan 2.08e+07 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecCopy 3735 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 
0 0.00e+00 0 > > VecSet 2818 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecAXPY 123 1.0 nan nan 8.68e+06 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecAYPX 6764 1.0 nan nan 1.90e+08 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 4 0 0 0 0 4 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecAXPBYCZ 2388 1.0 nan nan 1.83e+08 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 4 0 0 0 0 4 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecWAXPY 60 1.0 nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecMAXPY 681 1.0 nan nan 1.36e+08 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecAssemblyBegin 7 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 6.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecAssemblyEnd 7 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecPointwiseMult 4449 1.0 nan nan 6.06e+07 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecScatterBegin 7614 1.0 nan nan 0.00e+00 0.0 7.1e+04 2.9e+03 > 1.3e+01 0 0 94 73 1 0 0 94 73 1 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecScatterEnd 7614 1.0 nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 > 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecReduceArith 120 1.0 nan nan 8.60e+06 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecReduceComm 60 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 6.0e+01 0 0 0 0 3 0 0 0 0 3 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecNormalize 401 1.0 nan nan 4.09e+07 1.2 0.0e+00 0.0e+00 > 4.0e+02 0 1 0 0 19 0 1 0 0 20 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > TSStep 20 1.0 1.2908e+01 1.0 5.05e+09 1.2 7.6e+04 3.7e+03 > 2.0e+03 89 100 100 98 96 89 100 100 98 97 1423 > -nan 0 0.00e+00 0 0.00e+00 99 > > TSFunctionEval 140 1.0 nan nan 1.00e+07 1.2 1.1e+03 3.7e+04 > 0.0e+00 1 0 1 15 0 1 0 1 15 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > TSJacobianEval 60 1.0 nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 > 6.0e+01 2 0 1 6 3 2 0 1 6 3 -nan > -nan 0 0.00e+00 0 0.00e+00 87 > > MatMult 4934 1.0 nan nan 4.16e+09 1.2 5.1e+04 2.7e+03 > 4.0e+00 15 82 68 49 0 15 82 68 49 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatMultAdd 1104 1.0 nan nan 9.00e+07 1.2 8.8e+03 1.4e+02 > 0.0e+00 1 2 12 0 0 1 2 12 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatMultTranspose 1104 1.0 nan nan 9.01e+07 1.2 8.8e+03 1.4e+02 > 1.0e+00 1 2 12 0 0 1 2 12 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatSolve 368 0.0 nan nan 3.57e+04 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSOR 60 1.0 nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatLUFactorSym 2 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatLUFactorNum 2 1.0 nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatConvert 8 1.0 nan nan 0.00e+00 0.0 8.0e+01 1.2e+03 > 4.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatScale 66 1.0 nan nan 1.48e+07 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 99 > > MatResidual 1104 1.0 nan nan 1.01e+09 1.2 1.2e+04 2.9e+03 > 0.0e+00 4 20 16 12 0 4 20 16 12 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatAssemblyBegin 590 1.0 nan nan 
0.00e+00 0.0 1.5e+02 4.2e+04 > 2.0e+02 1 0 0 2 10 1 0 0 2 10 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatAssemblyEnd 590 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.4e+02 2 0 0 0 7 2 0 0 0 7 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatGetRowIJ 2 0.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatCreateSubMat 122 1.0 nan nan 0.00e+00 0.0 6.3e+01 1.8e+02 > 1.7e+02 2 0 0 0 8 2 0 0 0 8 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatGetOrdering 2 0.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatCoarsen 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 > 1.2e+02 0 0 1 0 6 0 0 1 0 6 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatZeroEntries 61 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatAXPY 6 1.0 nan nan 1.37e+06 1.2 0.0e+00 0.0e+00 > 1.8e+01 1 0 0 0 1 1 0 0 0 1 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatTranspose 6 1.0 nan nan 0.00e+00 0.0 2.2e+02 2.9e+04 > 4.8e+01 1 0 0 2 2 1 0 0 2 2 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatMatMultSym 4 1.0 nan nan 0.00e+00 0.0 2.2e+02 1.7e+03 > 2.8e+01 0 0 0 0 1 0 0 0 0 1 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatMatMultNum 4 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatPtAPSymbolic 5 1.0 nan nan 0.00e+00 0.0 6.2e+02 5.2e+03 > 4.4e+01 3 0 1 1 2 3 0 1 1 2 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatPtAPNumeric 181 1.0 nan nan 0.00e+00 0.0 3.3e+03 1.8e+04 > 0.0e+00 56 0 4 21 0 56 0 4 21 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatGetLocalMat 185 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+01 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSetValuesCOO 60 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSetUp 483 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.2e+01 0 0 0 0 1 0 0 0 0 1 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSolve 60 1.0 1.1843e+01 1.0 4.91e+09 1.2 7.3e+04 2.9e+03 > 1.2e+03 82 97 97 75 60 82 97 97 75 60 1506 > -nan 0 0.00e+00 0 0.00e+00 99 > > KSPGMRESOrthog 398 1.0 nan nan 7.97e+07 1.2 0.0e+00 0.0e+00 > 4.0e+02 1 2 0 0 19 1 2 0 0 19 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > SNESSolve 60 1.0 1.2842e+01 1.0 5.01e+09 1.2 7.5e+04 3.6e+03 > 2.0e+03 89 99 100 96 95 89 99 100 96 96 1419 > -nan 0 0.00e+00 0 0.00e+00 99 > > SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SNESFunctionEval 120 1.0 nan nan 3.01e+07 1.2 9.6e+02 3.7e+04 > 0.0e+00 1 1 1 13 0 1 1 1 13 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > SNESJacobianEval 60 1.0 nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 > 6.0e+01 2 0 1 6 3 2 0 1 6 3 -nan > -nan 0 0.00e+00 0 0.00e+00 87 > > SNESLineSearch 60 1.0 nan nan 6.99e+07 1.2 9.6e+02 1.9e+04 > 2.4e+02 1 1 1 6 12 1 1 1 6 12 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > PCSetUp_GAMG+ 60 1.0 nan nan 3.53e+07 1.2 5.2e+03 1.4e+04 > 4.3e+02 62 1 7 25 21 62 1 7 25 21 -nan > -nan 0 0.00e+00 0 0.00e+00 96 > > PCGAMGCreateG 3 1.0 nan nan 1.32e+06 1.2 2.2e+02 2.9e+04 > 4.2e+01 1 0 0 2 2 1 0 0 2 2 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > GAMG Coarsen 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 > 1.2e+02 1 0 1 0 6 1 0 1 0 6 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > GAMG MIS/Agg 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 > 1.2e+02 0 0 1 0 6 0 
0 1 0 6 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCGAMGProl 3 1.0 nan nan 0.00e+00 0.0 7.8e+01 7.8e+02 > 4.8e+01 0 0 0 0 2 0 0 0 0 2 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > GAMG Prol-col 3 1.0 nan nan 0.00e+00 0.0 5.2e+01 5.8e+02 > 2.1e+01 0 0 0 0 1 0 0 0 0 1 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > GAMG Prol-lift 3 1.0 nan nan 0.00e+00 0.0 2.6e+01 1.2e+03 > 1.5e+01 0 0 0 0 1 0 0 0 0 1 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCGAMGOptProl 3 1.0 nan nan 3.40e+07 1.2 5.8e+02 2.4e+03 > 1.1e+02 1 1 1 0 6 1 1 1 0 6 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > GAMG smooth 3 1.0 nan nan 2.85e+05 1.2 1.9e+02 1.9e+03 > 3.0e+01 0 0 0 0 1 0 0 0 0 1 -nan > -nan 0 0.00e+00 0 0.00e+00 43 > > PCGAMGCreateL 3 1.0 nan nan 0.00e+00 0.0 4.8e+02 6.5e+03 > 8.0e+01 3 0 1 1 4 3 0 1 1 4 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > GAMG PtAP 3 1.0 nan nan 0.00e+00 0.0 4.5e+02 7.1e+03 > 2.7e+01 3 0 1 1 1 3 0 1 1 1 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > GAMG Reduce 1 1.0 nan nan 0.00e+00 0.0 3.6e+01 3.7e+01 > 5.3e+01 0 0 0 0 3 0 0 0 0 3 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCGAMG Gal l00 60 1.0 nan nan 0.00e+00 0.0 1.1e+03 1.4e+04 > 9.0e+00 46 0 1 6 0 46 0 1 6 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCGAMG Opt l00 1 1.0 nan nan 0.00e+00 0.0 4.8e+01 1.7e+02 > 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCGAMG Gal l01 60 1.0 nan nan 0.00e+00 0.0 1.6e+03 2.9e+04 > 9.0e+00 13 0 2 16 0 13 0 2 16 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCGAMG Opt l01 1 1.0 nan nan 0.00e+00 0.0 7.2e+01 4.8e+03 > 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCGAMG Gal l02 60 1.0 nan nan 0.00e+00 0.0 1.1e+03 1.2e+03 > 1.7e+01 0 0 1 0 1 0 0 1 0 1 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCGAMG Opt l02 1 1.0 nan nan 0.00e+00 0.0 7.2e+01 2.2e+02 > 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCSetUp 182 1.0 nan nan 3.53e+07 1.2 5.3e+03 1.4e+04 > 7.7e+02 64 1 7 27 37 64 1 7 27 38 -nan > -nan 0 0.00e+00 0 0.00e+00 96 > > PCSetUpOnBlocks 368 1.0 nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCApply 60 1.0 nan nan 4.85e+09 1.2 7.3e+04 2.9e+03 > 1.1e+03 81 96 96 75 54 81 96 96 75 54 -nan > -nan 0 0.00e+00 0 0.00e+00 99 > > KSPSolve_FS_0 60 1.0 nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSolve_FS_1 60 1.0 nan nan 4.79e+09 1.2 7.2e+04 2.9e+03 > 1.1e+03 81 95 96 75 54 81 95 96 75 54 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > > --- Event Stage 1: Unknown > > > ------------------------------------------------------------------------------------------------------------------------ > --------------------------------------- > > > Object Type Creations Destructions. Reports information only > for process 0. 
> > --- Event Stage 0: Main Stage > > Container 14 14 > Distributed Mesh 9 9 > Index Set 120 120 > IS L to G Mapping 10 10 > Star Forest Graph 87 87 > Discrete System 9 9 > Weak Form 9 9 > Vector 761 761 > TSAdapt 1 1 > TS 1 1 > DMTS 1 1 > SNES 1 1 > DMSNES 3 3 > SNESLineSearch 1 1 > Krylov Solver 11 11 > DMKSP interface 1 1 > Matrix 171 171 > Matrix Coarsen 3 3 > Preconditioner 11 11 > Viewer 2 1 > PetscRandom 3 3 > > --- Event Stage 1: Unknown > > > ======================================================================================================================== > Average time to get PetscTime(): 3.82e-08 > Average time for MPI_Barrier(): 2.2968e-06 > Average time for zero size MPI_Send(): 3.371e-06 > #PETSc Option Table entries: > -log_view > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with 64 bit PetscInt > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 8 > Configure options: PETSC_DIR=/home2/4pf/petsc > PETSC_ARCH=arch-kokkos-serial --prefix=/home2/4pf/.local/serial > --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --with-cuda=0 > --with-shared-libraries --with-64-bit-indices --with-debugging=0 > --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 > --with-kokkos-dir=/home2/4pf/.local/serial > --with-kokkos-kernels-dir=/home2/4pf/.local/serial --download-f2cblaslapack > > ----------------------------------------- > Libraries compiled on 2023-01-06 18:21:31 on iguazu > Machine characteristics: Linux-4.18.0-383.el8.x86_64-x86_64-with-glibc2.28 > Using PETSc directory: /home2/4pf/.local/serial > Using PETSc arch: > ----------------------------------------- > > Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -O3 > ----------------------------------------- > > Using include paths: -I/home2/4pf/.local/serial/include > ----------------------------------------- > > Using C linker: mpicc > Using libraries: -Wl,-rpath,/home2/4pf/.local/serial/lib > -L/home2/4pf/.local/serial/lib -lpetsc > -Wl,-rpath,/home2/4pf/.local/serial/lib64 -L/home2/4pf/.local/serial/lib64 > -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib > -lkokkoskernels -lkokkoscontainers -lkokkoscore -lf2clapack -lf2cblas -lm > -lX11 -lquadmath -lstdc++ -ldl > ----------------------------------------- > > > --- > > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Zhang, Junchao > *Sent:* Tuesday, January 17, 2023 17:25 > *To:* Fackler, Philip ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Cc:* Mills, Richard Tran ; Blondel, Sophie < > sblondel at utk.edu>; Roth, Philip > *Subject:* [EXTERNAL] Re: Performance problem using COO interface > > Hi, Philip, > Could you add -log_view and see what functions are used in the solve? > Since it is CPU-only, perhaps with -log_view of different runs, we can > easily see which functions slowed down. 
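One small addition that can make the -log_view comparison between runs easier to read is a separate logging stage around the solve, so that solve-time events are reported apart from setup. A minimal sketch, assuming a TS-based driver like the one in the log (ts and x stand in for the application's own objects):

  PetscLogStage solveStage;
  PetscLogStageRegister("Solve", &solveStage);  /* appears as a named stage in -log_view */
  PetscLogStagePush(solveStage);
  TSSolve(ts, x);
  PetscLogStagePop();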
> > --Junchao Zhang > ------------------------------ > *From:* Fackler, Philip > *Sent:* Tuesday, January 17, 2023 4:13 PM > *To:* xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Cc:* Mills, Richard Tran ; Zhang, Junchao < > jczhang at mcs.anl.gov>; Blondel, Sophie ; Roth, Philip < > rothpc at ornl.gov> > *Subject:* Performance problem using COO interface > > In Xolotl's feature-petsc-kokkos branch I have ported the code to use > petsc's COO interface for creating the Jacobian matrix (and the Kokkos > interface for interacting with Vec entries). As the attached plots show for > one case, while the code for computing the RHSFunction and RHSJacobian > perform similarly (or slightly better) after the port, the performance for > the solve as a whole is significantly worse. > > Note: > This is all CPU-only (so kokkos and kokkos-kernels are built with only the > serial backend). > The dev version is using MatSetValuesStencil with the default > implementations for Mat and Vec. > The port version is using MatSetValuesCOO and is run with -dm_mat_type > aijkokkos -dm_vec_type kokkos?. > The port/def version is using MatSetValuesCOO and is run with -dm_vec_type > kokkos? (using the default Mat implementation). > > So, this seems to be due be a performance difference in the petsc > implementations. Please advise. Is this a known issue? Or am I missing > something? > > Thank you for the help, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From facklerpw at ornl.gov Mon Jan 23 09:52:08 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Mon, 23 Jan 2023 15:52:08 +0000 Subject: [petsc-users] [EXTERNAL] Re: Performance problem using COO interface In-Reply-To: References: Message-ID: Thank you for looking into that. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang Sent: Monday, January 23, 2023 10:34 To: Fackler, Philip Cc: Zhang, Junchao ; xolotl-psi-development at lists.sourceforge.net ; petsc-users at mcs.anl.gov ; Blondel, Sophie ; Roth, Philip Subject: [EXTERNAL] Re: [petsc-users] Performance problem using COO interface Hi, Philip, It looks the performance of MatPtAP is pretty bad. There are a lot of issues with PtAP, which I am going to address. MatPtAPNumeric 181 1.0 nan nan 0.00e+00 0.0 3.3e+03 1.8e+04 0.0e+00 56 0 4 21 0 56 0 4 21 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 Thanks. --Junchao Zhang On Fri, Jan 20, 2023 at 10:55 AM Fackler, Philip via petsc-users > wrote: The following is the log_view output for the ported case using 4 MPI tasks. **************************************************************************************************************************************************************** *** WIDEN YOUR WINDOW TO 160 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** **************************************************************************************************************************************************************** ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------ Unknown Name on a named iguazu with 4 processors, by 4pf Fri Jan 20 11:53:04 2023 Using Petsc Release Version 3.18.3, unknown Max Max/Min Avg Total Time (sec): 1.447e+01 1.000 1.447e+01 Objects: 1.229e+03 1.003 1.226e+03 Flops: 5.053e+09 1.217 4.593e+09 1.837e+10 Flops/sec: 3.492e+08 1.217 3.174e+08 1.269e+09 MPI Msg Count: 1.977e+04 1.067 1.895e+04 7.580e+04 MPI Msg Len (bytes): 7.374e+07 1.088 3.727e+03 2.825e+08 MPI Reductions: 2.065e+03 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 1.4471e+01 100.0% 1.8371e+10 100.0% 7.580e+04 100.0% 3.727e+03 100.0% 2.046e+03 99.1% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors) CpuToGpu Count: total number of CPU to GPU copies per processor CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor) GpuToCpu Count: total number of GPU to CPU copies per processor GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor) GPU %F: percent flops on GPU in this event ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 257 1.0 nan nan 0.00e+00 0.0 4.4e+02 8.0e+00 2.6e+02 1 0 1 0 12 1 0 1 0 13 -nan -nan 0 0.00e+00 0 0.00e+00 0 BuildTwoSidedF 210 1.0 nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 2.1e+02 1 0 0 2 10 1 0 0 2 10 -nan -nan 0 0.00e+00 0 0.00e+00 0 DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 10 0 0 0 0 10 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetGraph 69 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetUp 47 1.0 nan nan 0.00e+00 0.0 7.3e+02 2.1e+03 4.7e+01 0 0 1 1 2 0 0 1 1 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFBcastBegin 222 1.0 nan nan 0.00e+00 0.0 2.3e+03 1.9e+04 0.0e+00 0 0 3 16 0 0 0 3 16 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFBcastEnd 222 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFReduceBegin 254 1.0 nan nan 0.00e+00 0.0 1.5e+03 1.2e+04 0.0e+00 0 0 2 6 0 0 0 2 6 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFReduceEnd 254 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFFetchOpBegin 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFFetchOpEnd 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFPack 8091 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFUnpack 8092 1.0 nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecDot 60 1.0 nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 3 0 0 0 0 3 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMDot 398 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+02 0 0 0 0 19 0 0 0 0 19 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNorm 641 1.0 nan nan 4.45e+07 1.2 0.0e+00 0.0e+00 6.4e+02 1 1 0 0 31 1 1 0 0 31 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScale 601 1.0 nan nan 2.08e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecCopy 3735 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecSet 2818 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAXPY 123 1.0 nan nan 8.68e+06 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 
-nan -nan 0 0.00e+00 0 0.00e+00 100 VecAYPX 6764 1.0 nan nan 1.90e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAXPBYCZ 2388 1.0 nan nan 1.83e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecWAXPY 60 1.0 nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMAXPY 681 1.0 nan nan 1.36e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAssemblyBegin 7 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAssemblyEnd 7 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecPointwiseMult 4449 1.0 nan nan 6.06e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScatterBegin 7614 1.0 nan nan 0.00e+00 0.0 7.1e+04 2.9e+03 1.3e+01 0 0 94 73 1 0 0 94 73 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecScatterEnd 7614 1.0 nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecReduceArith 120 1.0 nan nan 8.60e+06 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecReduceComm 60 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 3 0 0 0 0 3 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNormalize 401 1.0 nan nan 4.09e+07 1.2 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 19 0 1 0 0 20 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSStep 20 1.0 1.2908e+01 1.0 5.05e+09 1.2 7.6e+04 3.7e+03 2.0e+03 89 100 100 98 96 89 100 100 98 97 1423 -nan 0 0.00e+00 0 0.00e+00 99 TSFunctionEval 140 1.0 nan nan 1.00e+07 1.2 1.1e+03 3.7e+04 0.0e+00 1 0 1 15 0 1 0 1 15 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSJacobianEval 60 1.0 nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 6.0e+01 2 0 1 6 3 2 0 1 6 3 -nan -nan 0 0.00e+00 0 0.00e+00 87 MatMult 4934 1.0 nan nan 4.16e+09 1.2 5.1e+04 2.7e+03 4.0e+00 15 82 68 49 0 15 82 68 49 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultAdd 1104 1.0 nan nan 9.00e+07 1.2 8.8e+03 1.4e+02 0.0e+00 1 2 12 0 0 1 2 12 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultTranspose 1104 1.0 nan nan 9.01e+07 1.2 8.8e+03 1.4e+02 1.0e+00 1 2 12 0 0 1 2 12 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatSolve 368 0.0 nan nan 3.57e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSOR 60 1.0 nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorSym 2 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorNum 2 1.0 nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatConvert 8 1.0 nan nan 0.00e+00 0.0 8.0e+01 1.2e+03 4.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatScale 66 1.0 nan nan 1.48e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 99 MatResidual 1104 1.0 nan nan 1.01e+09 1.2 1.2e+04 2.9e+03 0.0e+00 4 20 16 12 0 4 20 16 12 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatAssemblyBegin 590 1.0 nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 2.0e+02 1 0 0 2 10 1 0 0 2 10 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 590 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+02 2 0 0 0 7 2 0 0 0 7 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetRowIJ 2 0.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCreateSubMat 122 1.0 nan nan 0.00e+00 0.0 6.3e+01 1.8e+02 1.7e+02 2 0 0 0 8 2 0 0 0 8 -nan -nan 0 
0.00e+00 0 0.00e+00 0 MatGetOrdering 2 0.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCoarsen 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02 0 0 1 0 6 0 0 1 0 6 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatZeroEntries 61 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAXPY 6 1.0 nan nan 1.37e+06 1.2 0.0e+00 0.0e+00 1.8e+01 1 0 0 0 1 1 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatTranspose 6 1.0 nan nan 0.00e+00 0.0 2.2e+02 2.9e+04 4.8e+01 1 0 0 2 2 1 0 0 2 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatMatMultSym 4 1.0 nan nan 0.00e+00 0.0 2.2e+02 1.7e+03 2.8e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatMatMultNum 4 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatPtAPSymbolic 5 1.0 nan nan 0.00e+00 0.0 6.2e+02 5.2e+03 4.4e+01 3 0 1 1 2 3 0 1 1 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatPtAPNumeric 181 1.0 nan nan 0.00e+00 0.0 3.3e+03 1.8e+04 0.0e+00 56 0 4 21 0 56 0 4 21 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetLocalMat 185 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetValuesCOO 60 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSetUp 483 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve 60 1.0 1.1843e+01 1.0 4.91e+09 1.2 7.3e+04 2.9e+03 1.2e+03 82 97 97 75 60 82 97 97 75 60 1506 -nan 0 0.00e+00 0 0.00e+00 99 KSPGMRESOrthog 398 1.0 nan nan 7.97e+07 1.2 0.0e+00 0.0e+00 4.0e+02 1 2 0 0 19 1 2 0 0 19 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESSolve 60 1.0 1.2842e+01 1.0 5.01e+09 1.2 7.5e+04 3.6e+03 2.0e+03 89 99 100 96 95 89 99 100 96 96 1419 -nan 0 0.00e+00 0 0.00e+00 99 SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 120 1.0 nan nan 3.01e+07 1.2 9.6e+02 3.7e+04 0.0e+00 1 1 1 13 0 1 1 1 13 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESJacobianEval 60 1.0 nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 6.0e+01 2 0 1 6 3 2 0 1 6 3 -nan -nan 0 0.00e+00 0 0.00e+00 87 SNESLineSearch 60 1.0 nan nan 6.99e+07 1.2 9.6e+02 1.9e+04 2.4e+02 1 1 1 6 12 1 1 1 6 12 -nan -nan 0 0.00e+00 0 0.00e+00 100 PCSetUp_GAMG+ 60 1.0 nan nan 3.53e+07 1.2 5.2e+03 1.4e+04 4.3e+02 62 1 7 25 21 62 1 7 25 21 -nan -nan 0 0.00e+00 0 0.00e+00 96 PCGAMGCreateG 3 1.0 nan nan 1.32e+06 1.2 2.2e+02 2.9e+04 4.2e+01 1 0 0 2 2 1 0 0 2 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG Coarsen 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02 1 0 1 0 6 1 0 1 0 6 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG MIS/Agg 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02 0 0 1 0 6 0 0 1 0 6 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMGProl 3 1.0 nan nan 0.00e+00 0.0 7.8e+01 7.8e+02 4.8e+01 0 0 0 0 2 0 0 0 0 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG Prol-col 3 1.0 nan nan 0.00e+00 0.0 5.2e+01 5.8e+02 2.1e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG Prol-lift 3 1.0 nan nan 0.00e+00 0.0 2.6e+01 1.2e+03 1.5e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMGOptProl 3 1.0 nan nan 3.40e+07 1.2 5.8e+02 2.4e+03 1.1e+02 1 1 1 0 6 1 1 1 0 6 -nan -nan 0 0.00e+00 0 0.00e+00 100 GAMG smooth 3 1.0 nan nan 2.85e+05 1.2 1.9e+02 1.9e+03 3.0e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 43 PCGAMGCreateL 3 
1.0 nan nan 0.00e+00 0.0 4.8e+02 6.5e+03 8.0e+01 3 0 1 1 4 3 0 1 1 4 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG PtAP 3 1.0 nan nan 0.00e+00 0.0 4.5e+02 7.1e+03 2.7e+01 3 0 1 1 1 3 0 1 1 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG Reduce 1 1.0 nan nan 0.00e+00 0.0 3.6e+01 3.7e+01 5.3e+01 0 0 0 0 3 0 0 0 0 3 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Gal l00 60 1.0 nan nan 0.00e+00 0.0 1.1e+03 1.4e+04 9.0e+00 46 0 1 6 0 46 0 1 6 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Opt l00 1 1.0 nan nan 0.00e+00 0.0 4.8e+01 1.7e+02 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Gal l01 60 1.0 nan nan 0.00e+00 0.0 1.6e+03 2.9e+04 9.0e+00 13 0 2 16 0 13 0 2 16 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Opt l01 1 1.0 nan nan 0.00e+00 0.0 7.2e+01 4.8e+03 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Gal l02 60 1.0 nan nan 0.00e+00 0.0 1.1e+03 1.2e+03 1.7e+01 0 0 1 0 1 0 0 1 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Opt l02 1 1.0 nan nan 0.00e+00 0.0 7.2e+01 2.2e+02 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCSetUp 182 1.0 nan nan 3.53e+07 1.2 5.3e+03 1.4e+04 7.7e+02 64 1 7 27 37 64 1 7 27 38 -nan -nan 0 0.00e+00 0 0.00e+00 96 PCSetUpOnBlocks 368 1.0 nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCApply 60 1.0 nan nan 4.85e+09 1.2 7.3e+04 2.9e+03 1.1e+03 81 96 96 75 54 81 96 96 75 54 -nan -nan 0 0.00e+00 0 0.00e+00 99 KSPSolve_FS_0 60 1.0 nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve_FS_1 60 1.0 nan nan 4.79e+09 1.2 7.2e+04 2.9e+03 1.1e+03 81 95 96 75 54 81 95 96 75 54 -nan -nan 0 0.00e+00 0 0.00e+00 100 --- Event Stage 1: Unknown ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- Object Type Creations Destructions. Reports information only for process 0. 
--- Event Stage 0: Main Stage Container 14 14 Distributed Mesh 9 9 Index Set 120 120 IS L to G Mapping 10 10 Star Forest Graph 87 87 Discrete System 9 9 Weak Form 9 9 Vector 761 761 TSAdapt 1 1 TS 1 1 DMTS 1 1 SNES 1 1 DMSNES 3 3 SNESLineSearch 1 1 Krylov Solver 11 11 DMKSP interface 1 1 Matrix 171 171 Matrix Coarsen 3 3 Preconditioner 11 11 Viewer 2 1 PetscRandom 3 3 --- Event Stage 1: Unknown ======================================================================================================================== Average time to get PetscTime(): 3.82e-08 Average time for MPI_Barrier(): 2.2968e-06 Average time for zero size MPI_Send(): 3.371e-06 #PETSc Option Table entries: -log_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: PETSC_DIR=/home2/4pf/petsc PETSC_ARCH=arch-kokkos-serial --prefix=/home2/4pf/.local/serial --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --with-cuda=0 --with-shared-libraries --with-64-bit-indices --with-debugging=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --with-kokkos-dir=/home2/4pf/.local/serial --with-kokkos-kernels-dir=/home2/4pf/.local/serial --download-f2cblaslapack ----------------------------------------- Libraries compiled on 2023-01-06 18:21:31 on iguazu Machine characteristics: Linux-4.18.0-383.el8.x86_64-x86_64-with-glibc2.28 Using PETSc directory: /home2/4pf/.local/serial Using PETSc arch: ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -O3 ----------------------------------------- Using include paths: -I/home2/4pf/.local/serial/include ----------------------------------------- Using C linker: mpicc Using libraries: -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lpetsc -Wl,-rpath,/home2/4pf/.local/serial/lib64 -L/home2/4pf/.local/serial/lib64 -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lkokkoskernels -lkokkoscontainers -lkokkoscore -lf2clapack -lf2cblas -lm -lX11 -lquadmath -lstdc++ -ldl ----------------------------------------- --- Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Zhang, Junchao > Sent: Tuesday, January 17, 2023 17:25 To: Fackler, Philip >; xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov > Cc: Mills, Richard Tran >; Blondel, Sophie >; Roth, Philip > Subject: [EXTERNAL] Re: Performance problem using COO interface Hi, Philip, Could you add -log_view and see what functions are used in the solve? Since it is CPU-only, perhaps with -log_view of different runs, we can easily see which functions slowed down. --Junchao Zhang ________________________________ From: Fackler, Philip > Sent: Tuesday, January 17, 2023 4:13 PM To: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov > Cc: Mills, Richard Tran >; Zhang, Junchao >; Blondel, Sophie >; Roth, Philip > Subject: Performance problem using COO interface In Xolotl's feature-petsc-kokkos branch I have ported the code to use petsc's COO interface for creating the Jacobian matrix (and the Kokkos interface for interacting with Vec entries). 
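For readers who have not used it, the COO interface referred to here (the MatSetPreallCOO and MatSetValuesCOO events in the log above) splits assembly into a one-time declaration of the index pattern and a per-evaluation update of the values, so the communication pattern is computed once and reused. A minimal sketch with a made-up 2x2 pattern; J, the indices, and the values are placeholders, not the Xolotl code:

  /* One-time setup: hand the matrix every (row, col) pair that will ever be
     written; repeated pairs are allowed. */
  PetscInt    coo_i[] = {0, 0, 1, 1};
  PetscInt    coo_j[] = {0, 1, 0, 1};
  MatSetPreallocationCOO(J, 4, coo_i, coo_j);

  /* Each Jacobian evaluation: only the values change, given in the same
     order as the (row, col) pairs above. */
  PetscScalar coo_v[] = {4.0, -1.0, -1.0, 4.0};
  MatSetValuesCOO(J, coo_v, INSERT_VALUES);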
As the attached plots show for one case, while the code for computing the
RHSFunction and RHSJacobian performs similarly (or slightly better) after the
port, the performance of the solve as a whole is significantly worse.

Note: This is all CPU-only (so kokkos and kokkos-kernels are built with only
the serial backend).

The dev version is using MatSetValuesStencil with the default implementations
for Mat and Vec.

The port version is using MatSetValuesCOO and is run with
"-dm_mat_type aijkokkos -dm_vec_type kokkos".

The port/def version is using MatSetValuesCOO and is run with
"-dm_vec_type kokkos" (using the default Mat implementation).

So, this seems to be due to a performance difference in the petsc
implementations. Please advise. Is this a known issue? Or am I missing
something?

Thank you for the help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knepley at gmail.com  Mon Jan 23 11:46:06 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 23 Jan 2023 12:46:06 -0500
Subject: [petsc-users] Using PCREDISTRIBUTE together with PCFIELDSPLIT
In-Reply-To: 
References: 
Message-ID: 

On Mon, Jan 23, 2023 at 9:18 AM Jonas Lundgren via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hi!
>
> (Sorry for a long message, I have tried to cover the essentials only. I am
> happy to provide further details and logs if necessary, but I have tried to
> keep it as short as possible.)
>
> I am trying to solve a Stokes flow problem with PCFIELDSPLIT as
> preconditioner (and KSPBCGS as solver). I have successfully set up the
> preconditioner and solved several examples by applying the following
> options when setting up the solver:
>
> // Preliminaries
> KSP ksp;
> PC pc;
> KSPCreate(PETSC_COMM_WORLD, &ksp);
> KSPSetType(ksp, KSPBCGS);
> KSPSetFromOptions(ksp); // here, "-pc_type redistribute" is read from command line
> KSPSetOperators(ksp, K, K);
> KSPGetPC(ksp, &pc);
>
> // Preconditioner
> PCSetType(pc, PCFIELDSPLIT);
> PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);
> PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_LOWER);
> PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELFP, K);
> PCFieldSplitSetIS(pc, "0", isu);
>
> where "isu" in the last row is an IS containing all flow indices (no
> pressure indices), and "K" is my system matrix, which is created using
> DMCreateMatrix(dm_state, &K); and "dm_state" is a DMStag object. (However,
> K is not assembled yet.)
>
> I want to try to use PCREDISTRIBUTE on top of this, since I believe that I
> have a lot of degrees-of-freedom (DOF) locked, and if I can reduce the size
> of my linear problem, the solution time can decrease as well.
>

Let me reply to the substance of the question. I will restate it to make
sure I understand. You would like to eliminate the diagonal rows/cols from
your operator before the solve, which PCREDISTRIBUTE can do. So you could have

  -ksp_type preonly
  -pc_type redistribute
    -redistribute_ksp_type bcgs
    -redistribute_pc_type fieldsplit
    -redistribute_pc_fieldsplit_detect_saddle_point
    -redistribute_pc_fieldsplit_type schur
    -redistribute_pc_fieldsplit_schur_fact_type lower
    -redistribute_pc_fieldsplit_schur_precondition selfp

Using options solves the ordering problems you are having in setup and this
is what we recommend.
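Concretely, the driver can then stay as small as the following sketch (the
operator K and the vectors b and x are assumed to be assembled elsewhere; the
options above select PCREDISTRIBUTE and the inner fieldsplit at run time):

  KSP ksp;

  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, K, K));
  PetscCall(KSPSetFromOptions(ksp)); /* picks up -ksp_type, -pc_type and all -redistribute_* options */
  PetscCall(KSPSolve(ksp, b, x));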
Satisfying the ordering constraints by hand, rather than programmatically, is
very hard.

Also note that selfp is not a good preconditioner for Stokes. The Schur
complement looks like

  div (lap)^{-1} grad ~ id

so you want the mass matrix as the preconditioning matrix, not the diagonally
weighted pressure Laplacian, which is what you get with selfp. What you can do
is make a preconditioning matrix for the whole problem that looks like

  / A B \
  \ C M /

where M is the mass matrix, or leave out B and C and use -pc_amat to cleverly
get them from the system matrix. You can see this in SNES ex69.

  Thanks,

     Matt

> My idea was to use PCREDISTRIBUTE as the main preconditioner (with
> KSPPREONLY as the solver), and to move the KSPBCGS and PCFIELDSPLIT down
> one level, to act as a sub-KSP and sub-PC, by introducing the following
> between the two code blocks:
>
> PC ipc;
> PCRedistributeGetKSP(pc, &iksp);
> KSPGetPC(iksp, &ipc);
>
> and letting "ipc" replace "pc" in the second code block ("Preconditioner").
>
> The two last rows in the "Preconditioner" block refer to "K" and "isu",
> which are defined in terms of the entire system, not the reduced. I believe
> that I have to replace these two variables with a Mat ("A") that has locked
> DOFs removed, and an IS ("isfree") that contains only free flow DOF indices
> (no locked flow indices and no pressure indices).
>
> And here comes my problem: the Mat "A" and the IS "isfree" will only be
> available when the original KSP ("ksp") - or possibly the PC ("pc") - are
> set up (using KSPSetUp(ksp); or PCSetUp(pc); ). I have no way of knowing
> how "A" or "isfree" looks before it is automatically extracted from "K" and
> "isu" during set up - they will be different depending on the specifics of
> the example I am running.
>
> I think I can solve the matrix problem by inserting the following (instead
> of the second to last row above):
>
> PCFieldSplitSetSchurPre(ipc, PC_FIELDSPLIT_SCHUR_PRE_SELFP, ipc->mat);
>
> Even though "ipc->mat" is not created yet, I believe that this might lead
> to the use of the extracted sub-flow-matrix as (a part of the)
> preconditioner for the flow block in PCFIELDSPLIT. I do not know for sure,
> since I have other issues that have prevented me from testing this
> approach. Does this sound reasonable?
>
> (One might come up with a better alternative than the system matrix to use
> for this sub-PC, but that work is yet to be done...)
>
> Furthermore, the second issue I have is regarding the IS "isfree". After
> the "Preconditioner" code block I am trying to set the Fieldsplit sub-PCs
> by calling PCFieldSplitSchurGetSubKSP(); It appears as if I need to set up
> the outer KSP "ksp" or PC "pc" before I call PCFieldSplitSchurGetSubKSP();
> or otherwise I get an error message looking like:
>
> [0]PETSC ERROR: Object is in wrong state
> [0]PETSC ERROR: Must call KSPSetUp() or PCSetUp() before calling PCFieldSplitSchurGetSubKSP()
>
> However, when I try to call KSPSetUp() as suggested above (which is
> possible thanks to assembling the system matrix before the call to set up
> the KSP), I just get another error message indicating that I have the wrong
> IS ("isu") (most likely because the indices "isu"
refers to the entire > system, not the reduced): > > [1]PETSC ERROR: Argument out of range > > [1]PETSC ERROR: Index 0's value 3864 is larger than maximum given 2334 > > [1]PETSC ERROR: #1 ISComplement() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/vec/is/is/utils/iscoloring.c:804 > > [1]PETSC ERROR: #2 PCFieldSplitSetDefaults() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/fieldsplit/fieldsplit.c:544 > > [1]PETSC ERROR: #3 PCSetUp_FieldSplit() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/fieldsplit/fieldsplit.c:588 > > [1]PETSC ERROR: #4 PCSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/interface/precon.c:994 > > [1]PETSC ERROR: #5 KSPSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/ksp/interface/itfunc.c:406 > > [1]PETSC ERROR: #6 PCSetUp_Redistribute() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/redistribute/redistribute.c:240 > > [1]PETSC ERROR: #7 PCSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/interface/precon.c:994 > > [1]PETSC ERROR: #8 KSPSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/ksp/interface/itfunc.c:406 > > > > Okay, so now I want to obtain the reduced IS ?isfree? that contains only > free flow DOFs (not locked flow DOFs or pressure DOFs). One option is to > try to obtain it from the same IS that extracts the sub-matrix when setting > up the PCREDISTRIBUTE preconditioner, see red->is in MatCreateSubMatrix() > in > https://petsc.org/main/src/ksp/pc/impls/redistribute/redistribute.c.html > > > > And annoyingly, I need to have called PCSetUp(pc) or KSPSetUp(ksp) before > using red->is (thus, before calling PCFieldSplitSetIS() ), which is > something that is impossible it seems: > > [0]PETSC ERROR: Petsc has generated inconsistent data > > [0]PETSC ERROR: Unhandled case, must have at least two fields, not 1 > > [0]PETSC ERROR: #1 PCFieldSplitSetDefaults() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/fieldsplit/fieldsplit.c:550 > > [0]PETSC ERROR: #2 PCSetUp_FieldSplit() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/fieldsplit/fieldsplit.c:588 > > [0]PETSC ERROR: #3 PCSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/interface/precon.c:994 > > [0]PETSC ERROR: #4 KSPSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/ksp/interface/itfunc.c:406 > > [0]PETSC ERROR: #5 PCSetUp_Redistribute() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/impls/redistribute/redistribute.c:240 > > [0]PETSC ERROR: #6 PCSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/pc/interface/precon.c:994 > > [0]PETSC ERROR: #7 KSPSetUp() at > /mnt/c/Users/jonlu89/install/petsc-v3-18-1/src/ksp/ksp/interface/itfunc.c:406 > > > > So, to summarize, I seem to be stuck in a Catch 22; I cannot set up the > KSP (?ksp?) before I have the correct IS (?isfree?) set to my Fieldsplit > precondtitioner, but I cannot obtain the correct IS preconditioner > (?isfree?) before I set up the (outer) KSP (?ksp?). > > > > Is there any possible ideas on how to resolve this issue? Do I have to > hard code the IS ?isfree? in order to achieve what I want? (I don?t know > how?) Or maybe this approach is wrong all together. Can I achieve my goal > of using PCFIELDSPLIT and PCREDISTRIBUTE together in another way? > > > > Is this even a sane way of looking at this problem? 
> > > > Best regards, > > Jonas Lundgren > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From quentin.chevalier at polytechnique.edu Mon Jan 23 15:41:52 2023 From: quentin.chevalier at polytechnique.edu (Quentin Chevalier) Date: Mon, 23 Jan 2023 22:41:52 +0100 Subject: [petsc-users] MUMPS icntl for petsc4py In-Reply-To: References: Message-ID: Makes sense. Thanks again ! On Mon, 23 Jan 2023, 16:28 Jose E. Roman, wrote: > Here is the explanation. With shift-and-invert, two main things must be > done at STSetUp: 1) build the matrix A-sigma*B, (2) factorize it. Normally > this is done at the beginning of EPSSolve. Before that you can set PC > options, but the problem is that MUMPS options belong to Mat, not PC, so > step 1) must be done beforehand. But you cannot call PCSetUp because you > have not yet configured MUMPS options. Around version 3.12 we split the > implementation of STSetUp so that 1) and 2) can be done separately. > STGetOperator is what triggers 1). > > Jose > > > > El 23 ene 2023, a las 13:13, Quentin Chevalier < > quentin.chevalier at polytechnique.edu> escribi?: > > > > Many thanks Jose, it works beautifully ! > > > > I'm at a loss as to why, but thanks for the quick fix ! > > > > Quentin > > > > > > > > Quentin CHEVALIER ? IA parcours recherche > > > > LadHyX - Ecole polytechnique > > > > __________ > > > > > > > > On Sun, 22 Jan 2023 at 10:58, Jose E. Roman wrote: > >> > >> You have to call ST.getOperator() as is done in this C example: > >> > https://slepc.upv.es/documentation/current/src/eps/tutorials/ex43.c.html > >> > >> Jose > >> > >> > >>> El 22 ene 2023, a las 10:40, Quentin Chevalier < > quentin.chevalier at polytechnique.edu> escribi?: > >>> > >>> Hello PETSc users, > >>> > >>> I'm getting an INFOG(1)=-9 and INFO(2)=27 error on an eigenvalue code > based on dolfinx run in a docker container. Based on > https://mumps-solver.org/doc/userguide_5.5.1.pdf, I figured the fix would > be to increase ICNTL(14). > >>> > >>> I'm coding in python through the petsc4py/slepc4py wrapper. I found a > Mat.setMumpsIcntl method but I can't seem to place it properly and always > obtain another error : "Operation done in wrong order and the like". 
> >>> > >>> Here's the code snippet that is failing : > >>> # Solver > >>> EPS = SLEPc.EPS().create(COMM_WORLD) > >>> EPS.setOperators(-A,M) # Solve Ax=sigma*Mx > >>> EPS.setProblemType(SLEPc.EPS.ProblemType.PGNHEP) # Specify that A is > not hermitian, but M is semi-definite > >>> EPS.setWhichEigenpairs(EPS.Which.TARGET_MAGNITUDE) # Find eigenvalues > close to sigma > >>> EPS.setTarget(sigma) > >>> EPS.setDimensions(2,10) # Find k eigenvalues only with max number of > Lanczos vectors > >>> EPS.setTolerances(1e-9,100) # Set absolute tolerance and number of > iterations > >>> # Spectral transform > >>> ST = EPS.getST(); ST.setType('sinvert') > >>> # Krylov subspace > >>> KSP = ST.getKSP() > >>> KSP.setTolerances(rtol=1e-6, atol=1e-9, max_it=100) > >>> # Krylov subspace > >>> KSP.setType('preonly') > >>> # Preconditioner > >>> PC = KSP.getPC(); PC.setType('lu') > >>> PC.setFactorSolverType('mumps') > >>> KSP.setFromOptions() > >>> EPS.setFromOptions() > >>> PC.getFactorMatrix().setMumpsIcntl(14,50) > >>> print(f"Solver launch for sig={sigma:.1f}...",flush=True) > >>> EPS.solve() > >>> n=EPS.getConverged() > >>> > >>> For context, matrix A is complex, size 500k x 500k but AIJ sparse, and > I'm running this code on 36 nodes. > >>> > >>> I'd appreciate any insight on how to fix this issue, it's not clear to > me what the order of operations should be. Funnily enough, it's very > shift-dependent. > >>> > >>> Cheers, > >>> > >>> Quentin > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From berend.vanwachem at ovgu.de Tue Jan 24 09:39:55 2023 From: berend.vanwachem at ovgu.de (Berend van Wachem) Date: Tue, 24 Jan 2023 16:39:55 +0100 Subject: [petsc-users] reading and writing periodic DMPlex to file In-Reply-To: References: <83e2b092-5440-e009-ef84-dfde3ff6804d@ovgu.de> Message-ID: <9b91727e-fa6d-09b6-fd42-93e00947cc38@ovgu.de> Dear Matt, I have been working on this now with Petsc-3.18.3 1) I can confirm that enforcing periodicity works for a single core simulation. 2) However, when using multiple cores, the code still hangs. Is there something I should do to fix this? Or should this be fixed in the next Petsc version? 3) This is strange, as it works fine for me. Thanks, best, Berend. On 12/15/22 18:56, Matthew Knepley wrote: > On Wed, Dec 14, 2022 at 3:58 AM Berend van Wachem > > wrote: > > > Dear PETSc team and users, > > I have asked a few times about this before, but we haven't really > gotten > this to work yet. > > In our code, we use the DMPlex framework and are also interested in > periodic geometries. > > As our simulations typically require many time-steps, we would like to > be able to save the DM to file and to read it again to resume the > simulation (a restart). > > Although this works for a non-periodic DM, we haven't been able to get > this to work for a periodic one. To illustrate this, I have made a > working example, consisting of 2 files, createandwrite.c and > readandcreate.c. I have attached these 2 working examples. We are using > Petsc-3.18.2. > > In the first file (createandwrite.c) a DMPlex is created and written to > a file. Periodicity is activated on lines 52-55 of the code. > > In the second file (readandcreate.c) a DMPlex is read from the file. > When a periodic DM is read, this does not work. Also, trying to > 'enforce' periodicity, lines 55 - 66, does not work if the number of > processes is larger than 1 - the code "hangs" without producing an > error. > > Could you indicate what I am missing? 
I have really tried many > different > options, without finding a solution. > > > Hi Berend, > > There are several problems. I will eventually fix all of them, but I > think we can get this working quickly. > > 1) Periodicity information is not saved. I will fix this, but forcing it > should work. > > 2) You were getting a hang because the blocksize on the local > coordinates was not set correctly after loading > ? ? ?since the vector had zero length. This does not happen in any test > because HDF5 loads a global vector, but > ? ? ?most other things create local coordinates. I have a fix for this, > which I will get in an MR, Also, I moved DMLocalizeCoordinates() > ? ? ?after distribution, since this is where it belongs. > > knepley/fix-plex-periodic-faces *$:/PETSc3/petsc/petsc-dev$ git diff > diff --git a/src/dm/interface/dmcoordinates.c > b/src/dm/interface/dmcoordinates.c > index a922348f95b..6437e9f7259 100644 > --- a/src/dm/interface/dmcoordinates.c > +++ b/src/dm/interface/dmcoordinates.c > @@ -551,10 +551,14 @@ PetscErrorCode DMGetCoordinatesLocalSetUp(DM dm) > ? ?PetscFunctionBegin; > ? ?PetscValidHeaderSpecific(dm, DM_CLASSID, 1); > ? ?if (!dm->coordinates[0].xl && dm->coordinates[0].x) { > - ? ?DM cdm = NULL; > + ? ?DM ? ? ? cdm = NULL; > + ? ?PetscInt bs; > > ? ? ?PetscCall(DMGetCoordinateDM(dm, &cdm)); > ? ? ?PetscCall(DMCreateLocalVector(cdm, &dm->coordinates[0].xl)); > + ? ?// If the size of the vector is 0, it will not get the right block size > + ? ?PetscCall(VecGetBlockSize(dm->coordinates[0].x, &bs)); > + ? ?PetscCall(VecSetBlockSize(dm->coordinates[0].xl, bs)); > ? ? ?PetscCall(PetscObjectSetName((PetscObject)dm->coordinates[0].xl, > "coordinates")); > ? ? ?PetscCall(DMGlobalToLocalBegin(cdm, dm->coordinates[0].x, > INSERT_VALUES, dm->coordinates[0].xl)); > ? ? ?PetscCall(DMGlobalToLocalEnd(cdm, dm->coordinates[0].x, > INSERT_VALUES, dm->coordinates[0].xl)); > > ?3) If I comment out forcing the periodicity, your example does not run > for me. I will try to figure it out > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Nonconforming object sizes > [0]PETSC ERROR: SF roots 4400 < pEnd 6000 > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! > Could be the program crashed before they were used or a spelling > mistake, etc! > [1]PETSC ERROR: Nonconforming object sizes > [0]PETSC ERROR: Option left: name:-start_in_debugger_no (no value) > source: command line > [1]PETSC ERROR: SF roots 4400 < pEnd 6000 > [0]PETSC ERROR: See https://petsc.org/release/faq/ > for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-494-g16200351da0 > ?GIT Date: 2022-12-12 23:42:20 +0000 > [1]PETSC ERROR: WARNING! There are option(s) set that were not used! > Could be the program crashed before they were used or a spelling > mistake, etc! > [1]PETSC ERROR: Option left: name:-start_in_debugger_no (no value) > source: command line > [0]PETSC ERROR: ./readandcreate on a arch-master-debug named > MacBook-Pro.cable.rcn.com by knepley > Thu Dec 15 12:50:26 2022 > [1]PETSC ERROR: See https://petsc.org/release/faq/ > for trouble shooting. 
> [0]PETSC ERROR: Configure options --PETSC_ARCH=arch-master-debug > --download-bamg --download-bison --download-chaco --download-ctetgen > --download-egads --download-eigen --download-exodusii --download-fftw > --download-hpddm --download-ks --download-libceed --download-libpng > --download-metis --download-ml --download-mumps --download-muparser > --download-netcdf --download-opencascade --download-p4est > --download-parmetis --download-pnetcdf --download-pragmatic > --download-ptscotch --download-scalapack --download-slepc > --download-suitesparse --download-superlu_dist --download-tetgen > --download-triangle --with-cmake-exec=/PETSc3/petsc/apple/bin/cmake > --with-ctest-exec=/PETSc3/petsc/apple/bin/ctest > --with-hdf5-dir=/PETSc3/petsc/apple --with-mpi-dir=/PETSc3/petsc/apple > --with-petsc4py=1 --with-shared-libraries --with-slepc --with-zlib > [1]PETSC ERROR: Petsc Development GIT revision: v3.18.1-494-g16200351da0 > ?GIT Date: 2022-12-12 23:42:20 +0000 > [0]PETSC ERROR: #1 PetscSectionCreateGlobalSection() at > /PETSc3/petsc/petsc-dev/src/vec/is/section/interface/section.c:1322 > [1]PETSC ERROR: ./readandcreate on a arch-master-debug named > MacBook-Pro.cable.rcn.com by knepley > Thu Dec 15 12:50:26 2022 > [0]PETSC ERROR: #2 DMGetGlobalSection() at > /PETSc3/petsc/petsc-dev/src/dm/interface/dm.c:4527 > [1]PETSC ERROR: Configure options --PETSC_ARCH=arch-master-debug > --download-bamg --download-bison --download-chaco --download-ctetgen > --download-egads --download-eigen --download-exodusii --download-fftw > --download-hpddm --download-ks --download-libceed --download-libpng > --download-metis --download-ml --download-mumps --download-muparser > --download-netcdf --download-opencascade --download-p4est > --download-parmetis --download-pnetcdf --download-pragmatic > --download-ptscotch --download-scalapack --download-slepc > --download-suitesparse --download-superlu_dist --download-tetgen > --download-triangle --with-cmake-exec=/PETSc3/petsc/apple/bin/cmake > --with-ctest-exec=/PETSc3/petsc/apple/bin/ctest > --with-hdf5-dir=/PETSc3/petsc/apple --with-mpi-dir=/PETSc3/petsc/apple > --with-petsc4py=1 --with-shared-libraries --with-slepc --with-zlib > [0]PETSC ERROR: #3 DMPlexSectionLoad_HDF5_Internal() at > /PETSc3/petsc/petsc-dev/src/dm/impls/plex/plexhdf5.c:2750 > [1]PETSC ERROR: #1 PetscSectionCreateGlobalSection() at > /PETSc3/petsc/petsc-dev/src/vec/is/section/interface/section.c:1322 > [0]PETSC ERROR: #4 DMPlexSectionLoad() at > /PETSc3/petsc/petsc-dev/src/dm/impls/plex/plex.c:2364 > [1]PETSC ERROR: #2 DMGetGlobalSection() at > /PETSc3/petsc/petsc-dev/src/dm/interface/dm.c:4527 > [0]PETSC ERROR: #5 main() at > /Users/knepley/Downloads/tmp/Berend/readandcreate.c:85 > [1]PETSC ERROR: #3 DMPlexSectionLoad_HDF5_Internal() at > /PETSc3/petsc/petsc-dev/src/dm/impls/plex/plexhdf5.c:2750 > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -malloc_debug (source: environment) > [1]PETSC ERROR: #4 DMPlexSectionLoad() at > /PETSc3/petsc/petsc-dev/src/dm/impls/plex/plex.c:2364 > [1]PETSC ERROR: #5 main() at > /Users/knepley/Downloads/tmp/Berend/readandcreate.c:85 > [0]PETSC ERROR: -start_in_debugger_no (source: command line) > [1]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > [1]PETSC ERROR: -malloc_debug (source: environment) > application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 > [1]PETSC ERROR: -start_in_debugger_no (source: command line) > 
[1]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 > 4) We now have parallel HDF5 loading, so you should not have to manually > distribute. I will change your example to use it > ? ? ?and send it back when I am done. > > ? Thanks! > > ? ? ?Matt > > Many thanks and kind regards, > Berend. > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Tue Jan 24 10:26:05 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 24 Jan 2023 11:26:05 -0500 Subject: [petsc-users] reading and writing periodic DMPlex to file In-Reply-To: <9b91727e-fa6d-09b6-fd42-93e00947cc38@ovgu.de> References: <83e2b092-5440-e009-ef84-dfde3ff6804d@ovgu.de> <9b91727e-fa6d-09b6-fd42-93e00947cc38@ovgu.de> Message-ID: On Tue, Jan 24, 2023 at 10:39 AM Berend van Wachem wrote: > Dear Matt, > > I have been working on this now with Petsc-3.18.3 > > 1) I can confirm that enforcing periodicity works for a single core > simulation. > > 2) However, when using multiple cores, the code still hangs. Is there > something I should do to fix this? Or should this be fixed in the next > Petsc version? > Dang dang dang. I forgot to merge this fix. Thanks for reminding me. It is now here: https://gitlab.com/petsc/petsc/-/merge_requests/6001 > 3) This is strange, as it works fine for me. > Will try again with current main. Thanks Matt > Thanks, best, Berend. > > > On 12/15/22 18:56, Matthew Knepley wrote: > > On Wed, Dec 14, 2022 at 3:58 AM Berend van Wachem > > > wrote: > > > > > > Dear PETSc team and users, > > > > I have asked a few times about this before, but we haven't really > > gotten > > this to work yet. > > > > In our code, we use the DMPlex framework and are also interested in > > periodic geometries. > > > > As our simulations typically require many time-steps, we would like > to > > be able to save the DM to file and to read it again to resume the > > simulation (a restart). > > > > Although this works for a non-periodic DM, we haven't been able to > get > > this to work for a periodic one. To illustrate this, I have made a > > working example, consisting of 2 files, createandwrite.c and > > readandcreate.c. I have attached these 2 working examples. We are > using > > Petsc-3.18.2. > > > > In the first file (createandwrite.c) a DMPlex is created and written > to > > a file. Periodicity is activated on lines 52-55 of the code. > > > > In the second file (readandcreate.c) a DMPlex is read from the file. > > When a periodic DM is read, this does not work. Also, trying to > > 'enforce' periodicity, lines 55 - 66, does not work if the number of > > processes is larger than 1 - the code "hangs" without producing an > > error. > > > > Could you indicate what I am missing? I have really tried many > > different > > options, without finding a solution. > > > > > > Hi Berend, > > > > There are several problems. I will eventually fix all of them, but I > > think we can get this working quickly. > > > > 1) Periodicity information is not saved. I will fix this, but forcing it > > should work. > > > > 2) You were getting a hang because the blocksize on the local > > coordinates was not set correctly after loading > > since the vector had zero length. 
This does not happen in any test > > because HDF5 loads a global vector, but > > most other things create local coordinates. I have a fix for this, > > which I will get in an MR, Also, I moved DMLocalizeCoordinates() > > after distribution, since this is where it belongs. > > > > knepley/fix-plex-periodic-faces *$:/PETSc3/petsc/petsc-dev$ git diff > > diff --git a/src/dm/interface/dmcoordinates.c > > b/src/dm/interface/dmcoordinates.c > > index a922348f95b..6437e9f7259 100644 > > --- a/src/dm/interface/dmcoordinates.c > > +++ b/src/dm/interface/dmcoordinates.c > > @@ -551,10 +551,14 @@ PetscErrorCode DMGetCoordinatesLocalSetUp(DM dm) > > PetscFunctionBegin; > > PetscValidHeaderSpecific(dm, DM_CLASSID, 1); > > if (!dm->coordinates[0].xl && dm->coordinates[0].x) { > > - DM cdm = NULL; > > + DM cdm = NULL; > > + PetscInt bs; > > > > PetscCall(DMGetCoordinateDM(dm, &cdm)); > > PetscCall(DMCreateLocalVector(cdm, &dm->coordinates[0].xl)); > > + // If the size of the vector is 0, it will not get the right block > size > > + PetscCall(VecGetBlockSize(dm->coordinates[0].x, &bs)); > > + PetscCall(VecSetBlockSize(dm->coordinates[0].xl, bs)); > > PetscCall(PetscObjectSetName((PetscObject)dm->coordinates[0].xl, > > "coordinates")); > > PetscCall(DMGlobalToLocalBegin(cdm, dm->coordinates[0].x, > > INSERT_VALUES, dm->coordinates[0].xl)); > > PetscCall(DMGlobalToLocalEnd(cdm, dm->coordinates[0].x, > > INSERT_VALUES, dm->coordinates[0].xl)); > > > > 3) If I comment out forcing the periodicity, your example does not run > > for me. I will try to figure it out > > > > [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [0]PETSC ERROR: Nonconforming object sizes > > [0]PETSC ERROR: SF roots 4400 < pEnd 6000 > > [1]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! > > Could be the program crashed before they were used or a spelling > > mistake, etc! > > [1]PETSC ERROR: Nonconforming object sizes > > [0]PETSC ERROR: Option left: name:-start_in_debugger_no (no value) > > source: command line > > [1]PETSC ERROR: SF roots 4400 < pEnd 6000 > > [0]PETSC ERROR: See https://petsc.org/release/faq/ > > for trouble shooting. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-494-g16200351da0 > > GIT Date: 2022-12-12 23:42:20 +0000 > > [1]PETSC ERROR: WARNING! There are option(s) set that were not used! > > Could be the program crashed before they were used or a spelling > > mistake, etc! > > [1]PETSC ERROR: Option left: name:-start_in_debugger_no (no value) > > source: command line > > [0]PETSC ERROR: ./readandcreate on a arch-master-debug named > > MacBook-Pro.cable.rcn.com by knepley > > Thu Dec 15 12:50:26 2022 > > [1]PETSC ERROR: See https://petsc.org/release/faq/ > > for trouble shooting. 
> > [0]PETSC ERROR: Configure options --PETSC_ARCH=arch-master-debug > > --download-bamg --download-bison --download-chaco --download-ctetgen > > --download-egads --download-eigen --download-exodusii --download-fftw > > --download-hpddm --download-ks --download-libceed --download-libpng > > --download-metis --download-ml --download-mumps --download-muparser > > --download-netcdf --download-opencascade --download-p4est > > --download-parmetis --download-pnetcdf --download-pragmatic > > --download-ptscotch --download-scalapack --download-slepc > > --download-suitesparse --download-superlu_dist --download-tetgen > > --download-triangle --with-cmake-exec=/PETSc3/petsc/apple/bin/cmake > > --with-ctest-exec=/PETSc3/petsc/apple/bin/ctest > > --with-hdf5-dir=/PETSc3/petsc/apple --with-mpi-dir=/PETSc3/petsc/apple > > --with-petsc4py=1 --with-shared-libraries --with-slepc --with-zlib > > [1]PETSC ERROR: Petsc Development GIT revision: v3.18.1-494-g16200351da0 > > GIT Date: 2022-12-12 23:42:20 +0000 > > [0]PETSC ERROR: #1 PetscSectionCreateGlobalSection() at > > /PETSc3/petsc/petsc-dev/src/vec/is/section/interface/section.c:1322 > > [1]PETSC ERROR: ./readandcreate on a arch-master-debug named > > MacBook-Pro.cable.rcn.com by knepley > > Thu Dec 15 12:50:26 2022 > > [0]PETSC ERROR: #2 DMGetGlobalSection() at > > /PETSc3/petsc/petsc-dev/src/dm/interface/dm.c:4527 > > [1]PETSC ERROR: Configure options --PETSC_ARCH=arch-master-debug > > --download-bamg --download-bison --download-chaco --download-ctetgen > > --download-egads --download-eigen --download-exodusii --download-fftw > > --download-hpddm --download-ks --download-libceed --download-libpng > > --download-metis --download-ml --download-mumps --download-muparser > > --download-netcdf --download-opencascade --download-p4est > > --download-parmetis --download-pnetcdf --download-pragmatic > > --download-ptscotch --download-scalapack --download-slepc > > --download-suitesparse --download-superlu_dist --download-tetgen > > --download-triangle --with-cmake-exec=/PETSc3/petsc/apple/bin/cmake > > --with-ctest-exec=/PETSc3/petsc/apple/bin/ctest > > --with-hdf5-dir=/PETSc3/petsc/apple --with-mpi-dir=/PETSc3/petsc/apple > > --with-petsc4py=1 --with-shared-libraries --with-slepc --with-zlib > > [0]PETSC ERROR: #3 DMPlexSectionLoad_HDF5_Internal() at > > /PETSc3/petsc/petsc-dev/src/dm/impls/plex/plexhdf5.c:2750 > > [1]PETSC ERROR: #1 PetscSectionCreateGlobalSection() at > > /PETSc3/petsc/petsc-dev/src/vec/is/section/interface/section.c:1322 > > [0]PETSC ERROR: #4 DMPlexSectionLoad() at > > /PETSc3/petsc/petsc-dev/src/dm/impls/plex/plex.c:2364 > > [1]PETSC ERROR: #2 DMGetGlobalSection() at > > /PETSc3/petsc/petsc-dev/src/dm/interface/dm.c:4527 > > [0]PETSC ERROR: #5 main() at > > /Users/knepley/Downloads/tmp/Berend/readandcreate.c:85 > > [1]PETSC ERROR: #3 DMPlexSectionLoad_HDF5_Internal() at > > /PETSc3/petsc/petsc-dev/src/dm/impls/plex/plexhdf5.c:2750 > > [0]PETSC ERROR: PETSc Option Table entries: > > [0]PETSC ERROR: -malloc_debug (source: environment) > > [1]PETSC ERROR: #4 DMPlexSectionLoad() at > > /PETSc3/petsc/petsc-dev/src/dm/impls/plex/plex.c:2364 > > [1]PETSC ERROR: #5 main() at > > /Users/knepley/Downloads/tmp/Berend/readandcreate.c:85 > > [0]PETSC ERROR: -start_in_debugger_no (source: command line) > > [1]PETSC ERROR: PETSc Option Table entries: > > [0]PETSC ERROR: ----------------End of Error Message -------send entire > > error message to petsc-maint at mcs.anl.gov---------- > > [1]PETSC ERROR: -malloc_debug (source: environment) > > application 
called MPI_Abort(MPI_COMM_SELF, 60) - process 0 > > [1]PETSC ERROR: -start_in_debugger_no (source: command line) > > [1]PETSC ERROR: ----------------End of Error Message -------send entire > > error message to petsc-maint at mcs.anl.gov---------- > > application called MPI_Abort(MPI_COMM_SELF, 60) - process 0 > > 4) We now have parallel HDF5 loading, so you should not have to manually > > distribute. I will change your example to use it > > and send it back when I am done. > > > > Thanks! > > > > Matt > > > > Many thanks and kind regards, > > Berend. > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ < > http://www.cse.buffalo.edu/~knepley/> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at siemens.com Tue Jan 24 15:01:47 2023 From: sam.guo at siemens.com (Guo, Sam) Date: Tue, 24 Jan 2023 21:01:47 +0000 Subject: [petsc-users] compile PETSc on win using clang Message-ID: Hi PETSc dev team, I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I've already made intel compiler work on win using win32fe icl). Thanks, Sam Guo -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Jan 24 15:12:05 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 24 Jan 2023 16:12:05 -0500 Subject: [petsc-users] compile PETSc on win using clang In-Reply-To: References: Message-ID: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> Are you using clang as a replacement for the * "Unix-like" Cygwin GNU compilers compilers or * MinGW GNU compilers that are compatible with the Microsoft compilers? If the former, follow the instructions for using the Cygwin GNU compilers, if the latter follow the directions for the MinGW compilers. Send the configure.log and make.log if things go wrong and we'll help you out. Barry > On Jan 24, 2023, at 4:01 PM, Guo, Sam wrote: > > Hi PETSc dev team, > I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I?ve already made intel compiler work on win using win32fe icl). > > Thanks, > Sam Guo -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Jan 24 16:00:11 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 24 Jan 2023 16:00:11 -0600 (CST) Subject: [petsc-users] compile PETSc on win using clang In-Reply-To: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> References: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> Message-ID: <82e36b51-8340-e738-a0d8-959befc746e5@mcs.anl.gov> https://www.wikihow.com/Install-Clang-on-Windows Is the clang you have from visual studio - as described above? We don't have experience with using this variant of clang. 
If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: '--with-cc=win32fe cl --use clang' Satish On Tue, 24 Jan 2023, Barry Smith wrote: > > Are you using clang as a replacement for the > > * "Unix-like" Cygwin GNU compilers compilers or > > * MinGW GNU compilers that are compatible with the Microsoft compilers? > > If the former, follow the instructions for using the Cygwin GNU compilers, if the latter follow the directions for the MinGW compilers. > > Send the configure.log and make.log if things go wrong and we'll help you out. > > Barry > > > > > > On Jan 24, 2023, at 4:01 PM, Guo, Sam wrote: > > > > Hi PETSc dev team, > > I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I?ve already made intel compiler work on win using win32fe icl). > > > > Thanks, > > Sam Guo > > From sam.guo at siemens.com Tue Jan 24 17:22:59 2023 From: sam.guo at siemens.com (Guo, Sam) Date: Tue, 24 Jan 2023 23:22:59 +0000 Subject: [petsc-users] compile PETSc on win using clang In-Reply-To: <82e36b51-8340-e738-a0d8-959befc746e5@mcs.anl.gov> References: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> <82e36b51-8340-e738-a0d8-959befc746e5@mcs.anl.gov> Message-ID: Attached please find configure.log. error messgae: C:\home\xian\dev\star\petsc\src\sys\objects\device\INTERF~1\device.cxx(486): error C2065: 'PETSC_DEVICE_CASE': undeclared identifier ________________________________ From: Satish Balay Sent: Tuesday, January 24, 2023 2:00 PM To: Barry Smith Cc: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] compile PETSc on win using clang https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.wikihow.com%2FInstall-Clang-on-Windows&data=05%7C01%7Csam.guo%40siemens.com%7Ca6e1607f7e23403f9b4008dafe56627d%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%7C638101944252560682%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=BOy9RDMGw11IlwRthzcB5Il3YUIgVVrukbzOMFdV8MI%3D&reserved=0 Is the clang you have from visual studio - as described above? We don't have experience with using this variant of clang. If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: '--with-cc=win32fe cl --use clang' Satish On Tue, 24 Jan 2023, Barry Smith wrote: > > Are you using clang as a replacement for the > > * "Unix-like" Cygwin GNU compilers compilers or > > * MinGW GNU compilers that are compatible with the Microsoft compilers? > > If the former, follow the instructions for using the Cygwin GNU compilers, if the latter follow the directions for the MinGW compilers. > > Send the configure.log and make.log if things go wrong and we'll help you out. > > Barry > > > > > > On Jan 24, 2023, at 4:01 PM, Guo, Sam wrote: > > > > Hi PETSc dev team, > > I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I?ve already made intel compiler work on win using win32fe icl). > > > > Thanks, > > Sam Guo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: text/x-log Size: 1239118 bytes Desc: configure.log URL: From balay at mcs.anl.gov Tue Jan 24 19:15:46 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 24 Jan 2023 19:15:46 -0600 (CST) Subject: [petsc-users] compile PETSc on win using clang In-Reply-To: References: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> <82e36b51-8340-e738-a0d8-959befc746e5@mcs.anl.gov> Message-ID: <02301c55-53ae-eb1f-7669-d08087ce1be2@mcs.anl.gov> Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-cc="win32fe cl" --use win32fe clang --with-fc=0 --with-debugging=0 -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD -wd4996" --with-mpi=0 --with-clean=1 --force --with-scalar-type=real --ignore-cygwin-link -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD -wd4996" It should be --with-cc="win32fe cl --use clang" But then - this mode is untested with configure - so there could be other issues. Also - do you need c++? If not - use --with-cxx=0. This can avoid the error below. [for clang++ - you might need --with-cxx="win32fe cl --use clang++" - again untested - so might not work..] Satish On Tue, 24 Jan 2023, Guo, Sam wrote: > Attached please find configure.log. > > error messgae: > C:\home\xian\dev\star\petsc\src\sys\objects\device\INTERF~1\device.cxx(486): error C2065: 'PETSC_DEVICE_CASE': undeclared identifier > > ________________________________ > From: Satish Balay > Sent: Tuesday, January 24, 2023 2:00 PM > To: Barry Smith > Cc: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] compile PETSc on win using clang > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.wikihow.com%2FInstall-Clang-on-Windows&data=05%7C01%7Csam.guo%40siemens.com%7Ca6e1607f7e23403f9b4008dafe56627d%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%7C638101944252560682%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=BOy9RDMGw11IlwRthzcB5Il3YUIgVVrukbzOMFdV8MI%3D&reserved=0 > > Is the clang you have from visual studio - as described above? > > We don't have experience with using this variant of clang. > > If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: > > '--with-cc=win32fe cl --use clang' > > Satish > > > On Tue, 24 Jan 2023, Barry Smith wrote: > > > > > Are you using clang as a replacement for the > > > > * "Unix-like" Cygwin GNU compilers compilers or > > > > * MinGW GNU compilers that are compatible with the Microsoft compilers? > > > > If the former, follow the instructions for using the Cygwin GNU compilers, if the latter follow the directions for the MinGW compilers. > > > > Send the configure.log and make.log if things go wrong and we'll help you out. > > > > Barry > > > > > > > > > > > On Jan 24, 2023, at 4:01 PM, Guo, Sam wrote: > > > > > > Hi PETSc dev team, > > > I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I?ve already made intel compiler work on win using win32fe icl). 
> > > > > > Thanks, > > > Sam Guo > > > > > From sam.guo at siemens.com Tue Jan 24 19:26:13 2023 From: sam.guo at siemens.com (Guo, Sam) Date: Wed, 25 Jan 2023 01:26:13 +0000 Subject: [petsc-users] compile PETSc on win using clang In-Reply-To: <02301c55-53ae-eb1f-7669-d08087ce1be2@mcs.anl.gov> References: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> <82e36b51-8340-e738-a0d8-959befc746e5@mcs.anl.gov> <02301c55-53ae-eb1f-7669-d08087ce1be2@mcs.anl.gov> Message-ID: configure.log with -with-cc="win32fe cl --use clang" ________________________________ From: Satish Balay Sent: Tuesday, January 24, 2023 5:15 PM To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) Cc: Barry Smith ; petsc-users Subject: Re: [petsc-users] compile PETSc on win using clang Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-cc="win32fe cl" --use win32fe clang --with-fc=0 --with-debugging=0 -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD -wd4996" --with-mpi=0 --with-clean=1 --force --with-scalar-type=real --ignore-cygwin-link -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD -wd4996" It should be --with-cc="win32fe cl --use clang" But then - this mode is untested with configure - so there could be other issues. Also - do you need c++? If not - use --with-cxx=0. This can avoid the error below. [for clang++ - you might need --with-cxx="win32fe cl --use clang++" - again untested - so might not work..] Satish On Tue, 24 Jan 2023, Guo, Sam wrote: > Attached please find configure.log. > > error messgae: > C:\home\xian\dev\star\petsc\src\sys\objects\device\INTERF~1\device.cxx(486): error C2065: 'PETSC_DEVICE_CASE': undeclared identifier > > ________________________________ > From: Satish Balay > Sent: Tuesday, January 24, 2023 2:00 PM > To: Barry Smith > Cc: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] compile PETSc on win using clang > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.wikihow.com%2FInstall-Clang-on-Windows&data=05%7C01%7Csam.guo%40siemens.com%7Cf91c34cd02bb4cea9d4808dafe71b408%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%7C638102061582605791%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=DW33e3209lC4piF9wlJeQLR%2BRmAHQakczlgTUKe%2Bcuo%3D&reserved=0 > > Is the clang you have from visual studio - as described above? > > We don't have experience with using this variant of clang. > > If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: > > '--with-cc=win32fe cl --use clang' > > Satish > > > On Tue, 24 Jan 2023, Barry Smith wrote: > > > > > Are you using clang as a replacement for the > > > > * "Unix-like" Cygwin GNU compilers compilers or > > > > * MinGW GNU compilers that are compatible with the Microsoft compilers? > > > > If the former, follow the instructions for using the Cygwin GNU compilers, if the latter follow the directions for the MinGW compilers. > > > > Send the configure.log and make.log if things go wrong and we'll help you out. > > > > Barry > > > > > > > > > > > On Jan 24, 2023, at 4:01 PM, Guo, Sam wrote: > > > > > > Hi PETSc dev team, > > > I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I?ve already made intel compiler work on win using win32fe icl). 
> > > > > > Thanks, > > > Sam Guo > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 202062 bytes Desc: configure.log URL: From balay at mcs.anl.gov Tue Jan 24 19:36:15 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 24 Jan 2023 19:36:15 -0600 (CST) Subject: [petsc-users] compile PETSc on win using clang In-Reply-To: References: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> <82e36b51-8340-e738-a0d8-959befc746e5@mcs.anl.gov> <02301c55-53ae-eb1f-7669-d08087ce1be2@mcs.anl.gov> Message-ID: Do you have clang in your PATH? Is the binary named clang for something else? >>>>>>> Defined make macro "CC" to "/home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang" Executing: /home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang -c -o /tmp/petsc-t4q6osvc/config.setCompilers/conftest.o -I/tmp/petsc-t4q6osvc/config.setCompilers -O2 -MD -wd4996 /tmp/petsc-t4q6osvc/config.setCompilers/conftest.c Possible ERROR while running compiler: exit code 15 Source: #include "confdefs.h" #include "conffix.h" int main() { ; return 0; } Error testing C compiler: Cannot compile C with /home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang. <<<<<< You can try this test below manually - with the --verbose option. [I don't have clang - so using 'cl' here] >>>>>> balay at ps5 ~/petsc/src/benchmarks $ ~/petsc/lib/petsc/bin/win32fe/win32fe cl --use cl --verbose sizeof.c Using tool: cl Win32 Development Tool Front End, version 1.11.4 Fri, Sep 10, 2021 6:33:40 PM Attempting to create file: C:\cygwin64\tmp\wfe2972.tmp cl Microsoft (R) C/C++ Optimizing Compiler Version 19.30.30709 for x64 del C:\cygwin64\tmp\wfe2972.tmp Attempting to create file: C:\cygwin64\tmp\wfe2983.tmp cl sizeof.c -link Microsoft (R) C/C++ Optimizing Compiler Version 19.30.30709 for x64 Copyright (C) Microsoft Corporation. All rights reserved. sizeof.c Microsoft (R) Incremental Linker Version 14.30.30709.0 Copyright (C) Microsoft Corporation. All rights reserved. /out:sizeof.exe sizeof.obj del C:\cygwin64\tmp\wfe2983.tmp balay at ps5 ~/petsc/src/benchmarks $ ~/petsc/lib/petsc/bin/win32fe/win32fe cl --use dummy-binary --verbose sizeof.c Using tool: dummy-binary Win32 Development Tool Front End, version 1.11.4 Fri, Sep 10, 2021 6:33:40 PM Attempting to create file: C:\cygwin64\tmp\wfe6AD0.tmp dummy-binary win32feutils::CreateProcess failed. dummy-binary The system cannot find the file specified. del C:\cygwin64\tmp\wfe6AD0.tmp balay at ps5 ~/petsc/src/benchmarks $ <<<<<<< Satish On Wed, 25 Jan 2023, Guo, Sam wrote: > configure.log with -with-cc="win32fe cl --use clang" > ________________________________ > From: Satish Balay > Sent: Tuesday, January 24, 2023 5:15 PM > To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) > Cc: Barry Smith ; petsc-users > Subject: Re: [petsc-users] compile PETSc on win using clang > > Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-cc="win32fe cl" --use win32fe clang --with-fc=0 > --with-debugging=0 -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD -wd4996" --with-mpi=0 --with-clean=1 --force --with-scalar-type=real > --ignore-cygwin-link -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD -wd4996" > > It should be --with-cc="win32fe cl --use clang" > > But then - this mode is untested with configure - so there could be other issues. > > Also - do you need c++? 
If not - use --with-cxx=0. This can avoid the error below. [for clang++ - you might need --with-cxx="win32fe cl --use clang++" - again untested - so might not work..] > > Satish > > > On Tue, 24 Jan 2023, Guo, Sam wrote: > > > Attached please find configure.log. > > > > error messgae: > > C:\home\xian\dev\star\petsc\src\sys\objects\device\INTERF~1\device.cxx(486): error C2065: 'PETSC_DEVICE_CASE': undeclared identifier > > > > ________________________________ > > From: Satish Balay > > Sent: Tuesday, January 24, 2023 2:00 PM > > To: Barry Smith > > Cc: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) ; petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] compile PETSc on win using clang > > > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.wikihow.com%2FInstall-Clang-on-Windows&data=05%7C01%7Csam.guo%40siemens.com%7Cf91c34cd02bb4cea9d4808dafe71b408%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%7C638102061582605791%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=DW33e3209lC4piF9wlJeQLR%2BRmAHQakczlgTUKe%2Bcuo%3D&reserved=0 > > > > Is the clang you have from visual studio - as described above? > > > > We don't have experience with using this variant of clang. > > > > If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: > > > > '--with-cc=win32fe cl --use clang' > > > > Satish > > > > > > On Tue, 24 Jan 2023, Barry Smith wrote: > > > > > > > > Are you using clang as a replacement for the > > > > > > * "Unix-like" Cygwin GNU compilers compilers or > > > > > > * MinGW GNU compilers that are compatible with the Microsoft compilers? > > > > > > If the former, follow the instructions for using the Cygwin GNU compilers, if the latter follow the directions for the MinGW compilers. > > > > > > Send the configure.log and make.log if things go wrong and we'll help you out. > > > > > > Barry > > > > > > > > > > > > > > > > On Jan 24, 2023, at 4:01 PM, Guo, Sam wrote: > > > > > > > > Hi PETSc dev team, > > > > I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I?ve already made intel compiler work on win using win32fe icl). > > > > > > > > Thanks, > > > > Sam Guo > > > > > > > > > From sam.guo at siemens.com Tue Jan 24 20:14:27 2023 From: sam.guo at siemens.com (Guo, Sam) Date: Wed, 25 Jan 2023 02:14:27 +0000 Subject: [petsc-users] compile PETSc on win using clang In-Reply-To: References: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> <82e36b51-8340-e738-a0d8-959befc746e5@mcs.anl.gov> <02301c55-53ae-eb1f-7669-d08087ce1be2@mcs.anl.gov> Message-ID: I've added the path to clang to PATH but configuration still failed. configure.log is attached. Thanks a lot for your help. ________________________________ From: Satish Balay Sent: Tuesday, January 24, 2023 5:36 PM To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) Cc: petsc-users Subject: Re: [petsc-users] compile PETSc on win using clang Do you have clang in your PATH? Is the binary named clang for something else? 
>>>>>>> Defined make macro "CC" to "/home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang" Executing: /home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang -c -o /tmp/petsc-t4q6osvc/config.setCompilers/conftest.o -I/tmp/petsc-t4q6osvc/config.setCompilers -O2 -MD -wd4996 /tmp/petsc-t4q6osvc/config.setCompilers/conftest.c Possible ERROR while running compiler: exit code 15 Source: #include "confdefs.h" #include "conffix.h" int main() { ; return 0; } Error testing C compiler: Cannot compile C with /home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang. <<<<<< You can try this test below manually - with the --verbose option. [I don't have clang - so using 'cl' here] >>>>>> balay at ps5 ~/petsc/src/benchmarks $ ~/petsc/lib/petsc/bin/win32fe/win32fe cl --use cl --verbose sizeof.c Using tool: cl Win32 Development Tool Front End, version 1.11.4 Fri, Sep 10, 2021 6:33:40 PM Attempting to create file: C:\cygwin64\tmp\wfe2972.tmp cl Microsoft (R) C/C++ Optimizing Compiler Version 19.30.30709 for x64 del C:\cygwin64\tmp\wfe2972.tmp Attempting to create file: C:\cygwin64\tmp\wfe2983.tmp cl sizeof.c -link Microsoft (R) C/C++ Optimizing Compiler Version 19.30.30709 for x64 Copyright (C) Microsoft Corporation. All rights reserved. sizeof.c Microsoft (R) Incremental Linker Version 14.30.30709.0 Copyright (C) Microsoft Corporation. All rights reserved. /out:sizeof.exe sizeof.obj del C:\cygwin64\tmp\wfe2983.tmp balay at ps5 ~/petsc/src/benchmarks $ ~/petsc/lib/petsc/bin/win32fe/win32fe cl --use dummy-binary --verbose sizeof.c Using tool: dummy-binary Win32 Development Tool Front End, version 1.11.4 Fri, Sep 10, 2021 6:33:40 PM Attempting to create file: C:\cygwin64\tmp\wfe6AD0.tmp dummy-binary win32feutils::CreateProcess failed. dummy-binary The system cannot find the file specified. del C:\cygwin64\tmp\wfe6AD0.tmp balay at ps5 ~/petsc/src/benchmarks $ <<<<<<< Satish On Wed, 25 Jan 2023, Guo, Sam wrote: > configure.log with -with-cc="win32fe cl --use clang" > ________________________________ > From: Satish Balay > Sent: Tuesday, January 24, 2023 5:15 PM > To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) > Cc: Barry Smith ; petsc-users > Subject: Re: [petsc-users] compile PETSc on win using clang > > Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-cc="win32fe cl" --use win32fe clang --with-fc=0 > --with-debugging=0 -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD -wd4996" --with-mpi=0 --with-clean=1 --force --with-scalar-type=real > --ignore-cygwin-link -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD -wd4996" > > It should be --with-cc="win32fe cl --use clang" > > But then - this mode is untested with configure - so there could be other issues. > > Also - do you need c++? If not - use --with-cxx=0. This can avoid the error below. [for clang++ - you might need --with-cxx="win32fe cl --use clang++" - again untested - so might not work..] > > Satish > > > On Tue, 24 Jan 2023, Guo, Sam wrote: > > > Attached please find configure.log. 
> > > > error messgae: > > C:\home\xian\dev\star\petsc\src\sys\objects\device\INTERF~1\device.cxx(486): error C2065: 'PETSC_DEVICE_CASE': undeclared identifier > > > > ________________________________ > > From: Satish Balay > > Sent: Tuesday, January 24, 2023 2:00 PM > > To: Barry Smith > > Cc: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) ; petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] compile PETSc on win using clang > > > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.wikihow.com%2FInstall-Clang-on-Windows&data=05%7C01%7Csam.guo%40siemens.com%7C1ca3b3b2be7e4886974d08dafe749081%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%7C638102073862928368%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=6Qn3tRbky7pGr1qqSz2IBDsO2q8MJGis%2B57JEyzNSBk%3D&reserved=0 > > > > Is the clang you have from visual studio - as described above? > > > > We don't have experience with using this variant of clang. > > > > If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: > > > > '--with-cc=win32fe cl --use clang' > > > > Satish > > > > > > On Tue, 24 Jan 2023, Barry Smith wrote: > > > > > > > > Are you using clang as a replacement for the > > > > > > * "Unix-like" Cygwin GNU compilers compilers or > > > > > > * MinGW GNU compilers that are compatible with the Microsoft compilers? > > > > > > If the former, follow the instructions for using the Cygwin GNU compilers, if the latter follow the directions for the MinGW compilers. > > > > > > Send the configure.log and make.log if things go wrong and we'll help you out. > > > > > > Barry > > > > > > > > > > > > > > > > On Jan 24, 2023, at 4:01 PM, Guo, Sam wrote: > > > > > > > > Hi PETSc dev team, > > > > I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I?ve already made intel compiler work on win using win32fe icl). > > > > > > > > Thanks, > > > > Sam Guo > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 205160 bytes Desc: configure.log URL: From balay at mcs.anl.gov Tue Jan 24 21:36:51 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 24 Jan 2023 21:36:51 -0600 (CST) Subject: [petsc-users] compile PETSc on win using clang In-Reply-To: References: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> <82e36b51-8340-e738-a0d8-959befc746e5@mcs.anl.gov> <02301c55-53ae-eb1f-7669-d08087ce1be2@mcs.anl.gov> Message-ID: Here is my prior message: >>>>>> https://www.wikihow.com/Install-Clang-on-Windows Is the clang you have from visual studio - as described above? We don't have experience with using this variant of clang. If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: '--with-cc=win32fe cl --use clang' <<<< However I see: >>> /home/xian/dev/compilers/win64/clang11.1.0/bin/ <<< Clearly its not the above compiler [with the above constraints] - so the above instructions won't work. How do you get this compiler? Any particular reason you need to build PETSc with this compiler? Satish On Wed, 25 Jan 2023, Guo, Sam wrote: > I've added the path to clang to PATH but configuration still failed. 
configure.log is attached. Thanks a lot for your help. > ________________________________ > From: Satish Balay > Sent: Tuesday, January 24, 2023 5:36 PM > To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) > Cc: petsc-users > Subject: Re: [petsc-users] compile PETSc on win using clang > > Do you have clang in your PATH? Is the binary named clang for something else? > > > >>>>>>> > Defined make macro "CC" to "/home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang" > Executing: /home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang -c -o /tmp/petsc-t4q6osvc/config.setCompilers/conftest.o -I/tmp/petsc-t4q6osvc/config.setCompilers -O2 -MD -wd4996 /tmp/petsc-t4q6osvc/config.setCompilers/conftest.c > Possible ERROR while running compiler: exit code 15 > Source: > #include "confdefs.h" > #include "conffix.h" > > int main() { > ; > return 0; > } > > Error testing C compiler: Cannot compile C with /home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang. > > <<<<<< > > You can try this test below manually - with the --verbose option. [I don't have clang - so using 'cl' here] > > >>>>>> > balay at ps5 ~/petsc/src/benchmarks > $ ~/petsc/lib/petsc/bin/win32fe/win32fe cl --use cl --verbose sizeof.c > > Using tool: cl > Win32 Development Tool Front End, version 1.11.4 Fri, Sep 10, 2021 6:33:40 PM > Attempting to create file: C:\cygwin64\tmp\wfe2972.tmp > cl > Microsoft (R) C/C++ Optimizing Compiler Version 19.30.30709 for x64 > del C:\cygwin64\tmp\wfe2972.tmp > Attempting to create file: C:\cygwin64\tmp\wfe2983.tmp > cl sizeof.c -link > Microsoft (R) C/C++ Optimizing Compiler Version 19.30.30709 for x64 > Copyright (C) Microsoft Corporation. All rights reserved. > > sizeof.c > Microsoft (R) Incremental Linker Version 14.30.30709.0 > Copyright (C) Microsoft Corporation. All rights reserved. > > /out:sizeof.exe > sizeof.obj > del C:\cygwin64\tmp\wfe2983.tmp > > balay at ps5 ~/petsc/src/benchmarks > $ ~/petsc/lib/petsc/bin/win32fe/win32fe cl --use dummy-binary --verbose sizeof.c > > Using tool: dummy-binary > Win32 Development Tool Front End, version 1.11.4 Fri, Sep 10, 2021 6:33:40 PM > Attempting to create file: C:\cygwin64\tmp\wfe6AD0.tmp > dummy-binary > win32feutils::CreateProcess failed. > dummy-binary > The system cannot find the file specified. > > del C:\cygwin64\tmp\wfe6AD0.tmp > > balay at ps5 ~/petsc/src/benchmarks > $ > > <<<<<<< > > Satish > > > > On Wed, 25 Jan 2023, Guo, Sam wrote: > > > configure.log with -with-cc="win32fe cl --use clang" > > ________________________________ > > From: Satish Balay > > Sent: Tuesday, January 24, 2023 5:15 PM > > To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) > > Cc: Barry Smith ; petsc-users > > Subject: Re: [petsc-users] compile PETSc on win using clang > > > > Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-cc="win32fe cl" --use win32fe clang --with-fc=0 > > --with-debugging=0 -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD -wd4996" --with-mpi=0 --with-clean=1 --force --with-scalar-type=real > > --ignore-cygwin-link -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD -wd4996" > > > > It should be --with-cc="win32fe cl --use clang" > > > > But then - this mode is untested with configure - so there could be other issues. > > > > Also - do you need c++? If not - use --with-cxx=0. This can avoid the error below. [for clang++ - you might need --with-cxx="win32fe cl --use clang++" - again untested - so might not work..] 
> > > > Satish > > > > > > On Tue, 24 Jan 2023, Guo, Sam wrote: > > > > > Attached please find configure.log. > > > > > > error messgae: > > > C:\home\xian\dev\star\petsc\src\sys\objects\device\INTERF~1\device.cxx(486): error C2065: 'PETSC_DEVICE_CASE': undeclared identifier > > > > > > ________________________________ > > > From: Satish Balay > > > Sent: Tuesday, January 24, 2023 2:00 PM > > > To: Barry Smith > > > Cc: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) ; petsc-users at mcs.anl.gov > > > Subject: Re: [petsc-users] compile PETSc on win using clang > > > > > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.wikihow.com%2FInstall-Clang-on-Windows&data=05%7C01%7Csam.guo%40siemens.com%7C1ca3b3b2be7e4886974d08dafe749081%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%7C638102073862928368%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=6Qn3tRbky7pGr1qqSz2IBDsO2q8MJGis%2B57JEyzNSBk%3D&reserved=0 > > > > > > Is the clang you have from visual studio - as described above? > > > > > > We don't have experience with using this variant of clang. > > > > > > If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: > > > > > > '--with-cc=win32fe cl --use clang' > > > > > > Satish > > > > > > > > > On Tue, 24 Jan 2023, Barry Smith wrote: > > > > > > > > > > > Are you using clang as a replacement for the > > > > > > > > * "Unix-like" Cygwin GNU compilers compilers or > > > > > > > > * MinGW GNU compilers that are compatible with the Microsoft compilers? > > > > > > > > If the former, follow the instructions for using the Cygwin GNU compilers, if the latter follow the directions for the MinGW compilers. > > > > > > > > Send the configure.log and make.log if things go wrong and we'll help you out. > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > > > On Jan 24, 2023, at 4:01 PM, Guo, Sam wrote: > > > > > > > > > > Hi PETSc dev team, > > > > > I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I?ve already made intel compiler work on win using win32fe icl). > > > > > > > > > > Thanks, > > > > > Sam Guo > > > > > > > > > > > > > > From sam.guo at siemens.com Tue Jan 24 22:19:48 2023 From: sam.guo at siemens.com (Guo, Sam) Date: Wed, 25 Jan 2023 04:19:48 +0000 Subject: [petsc-users] compile PETSc on win using clang In-Reply-To: References: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> <82e36b51-8340-e738-a0d8-959befc746e5@mcs.anl.gov> <02301c55-53ae-eb1f-7669-d08087ce1be2@mcs.anl.gov> Message-ID: Sorry, I don't know too much about clang. Our code is compiled on win using clang and that's why I want to compile PETSc using clang as well. -----Original Message----- From: Satish Balay Sent: Tuesday, January 24, 2023 7:37 PM To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) Cc: petsc-users Subject: Re: [petsc-users] compile PETSc on win using clang Here is my prior message: >>>>>> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.wikihow.com%2FInstall-Clang-on-Windows&data=05%7C01%7Csam.guo%40siemens.com%7C2dcacf01ab3848588c5908dafe856a17%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%7C638102146240080885%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=pUZfgzke8JGODkOLNR1101lrdINswnqqg2kM%2FOAN2kU%3D&reserved=0 Is the clang you have from visual studio - as described above? 
We don't have experience with using this variant of clang. If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: '--with-cc=win32fe cl --use clang' <<<< However I see: >>> /home/xian/dev/compilers/win64/clang11.1.0/bin/ <<< Clearly its not the above compiler [with the above constraints] - so the above instructions won't work. How do you get this compiler? Any particular reason you need to build PETSc with this compiler? Satish On Wed, 25 Jan 2023, Guo, Sam wrote: > I've added the path to clang to PATH but configuration still failed. configure.log is attached. Thanks a lot for your help. > ________________________________ > From: Satish Balay > Sent: Tuesday, January 24, 2023 5:36 PM > To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) > Cc: petsc-users > Subject: Re: [petsc-users] compile PETSc on win using clang > > Do you have clang in your PATH? Is the binary named clang for something else? > > > >>>>>>> > Defined make macro "CC" to "/home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang" > Executing: /home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl > --use clang -c -o /tmp/petsc-t4q6osvc/config.setCompilers/conftest.o > -I/tmp/petsc-t4q6osvc/config.setCompilers -O2 -MD -wd4996 > /tmp/petsc-t4q6osvc/config.setCompilers/conftest.c > Possible ERROR while running compiler: exit code 15 > Source: > #include "confdefs.h" > #include "conffix.h" > > int main() { > ; > return 0; > } > > Error testing C compiler: Cannot compile C with /home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang. > > <<<<<< > > You can try this test below manually - with the --verbose option. [I > don't have clang - so using 'cl' here] > > >>>>>> > balay at ps5 ~/petsc/src/benchmarks > $ ~/petsc/lib/petsc/bin/win32fe/win32fe cl --use cl --verbose sizeof.c > > Using tool: cl > Win32 Development Tool Front End, version 1.11.4 Fri, Sep 10, 2021 > 6:33:40 PM Attempting to create file: C:\cygwin64\tmp\wfe2972.tmp cl > Microsoft (R) C/C++ Optimizing Compiler Version 19.30.30709 for x64 > del C:\cygwin64\tmp\wfe2972.tmp Attempting to create file: > C:\cygwin64\tmp\wfe2983.tmp cl sizeof.c -link Microsoft (R) C/C++ > Optimizing Compiler Version 19.30.30709 for x64 Copyright (C) > Microsoft Corporation. All rights reserved. > > sizeof.c > Microsoft (R) Incremental Linker Version 14.30.30709.0 Copyright (C) > Microsoft Corporation. All rights reserved. > > /out:sizeof.exe > sizeof.obj > del C:\cygwin64\tmp\wfe2983.tmp > > balay at ps5 ~/petsc/src/benchmarks > $ ~/petsc/lib/petsc/bin/win32fe/win32fe cl --use dummy-binary > --verbose sizeof.c > > Using tool: dummy-binary > Win32 Development Tool Front End, version 1.11.4 Fri, Sep 10, 2021 > 6:33:40 PM Attempting to create file: C:\cygwin64\tmp\wfe6AD0.tmp > dummy-binary win32feutils::CreateProcess failed. > dummy-binary > The system cannot find the file specified. 
> > del C:\cygwin64\tmp\wfe6AD0.tmp > > balay at ps5 ~/petsc/src/benchmarks > $ > > <<<<<<< > > Satish > > > > On Wed, 25 Jan 2023, Guo, Sam wrote: > > > configure.log with -with-cc="win32fe cl --use clang" > > ________________________________ > > From: Satish Balay > > Sent: Tuesday, January 24, 2023 5:15 PM > > To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) > > Cc: Barry Smith ; petsc-users > > > > Subject: Re: [petsc-users] compile PETSc on win using clang > > > > Configure Options: --configModules=PETSc.Configure > > --optionsModule=config.compilerOptions --with-cc="win32fe cl" --use > > win32fe clang --with-fc=0 > > --with-debugging=0 -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD > > -wd4996" --with-mpi=0 --with-clean=1 --force --with-scalar-type=real --ignore-cygwin-link -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD -wd4996" > > > > It should be --with-cc="win32fe cl --use clang" > > > > But then - this mode is untested with configure - so there could be other issues. > > > > Also - do you need c++? If not - use --with-cxx=0. This can avoid > > the error below. [for clang++ - you might need --with-cxx="win32fe > > cl --use clang++" - again untested - so might not work..] > > > > Satish > > > > > > On Tue, 24 Jan 2023, Guo, Sam wrote: > > > > > Attached please find configure.log. > > > > > > error messgae: > > > C:\home\xian\dev\star\petsc\src\sys\objects\device\INTERF~1\device > > > .cxx(486): error C2065: 'PETSC_DEVICE_CASE': undeclared identifier > > > > > > ________________________________ > > > From: Satish Balay > > > Sent: Tuesday, January 24, 2023 2:00 PM > > > To: Barry Smith > > > Cc: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) > > > ; petsc-users at mcs.anl.gov > > > > > > Subject: Re: [petsc-users] compile PETSc on win using clang > > > > > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2F > > > www.wikihow.com%2FInstall-Clang-on-Windows&data=05%7C01%7Csam.guo% > > > 40siemens.com%7C2dcacf01ab3848588c5908dafe856a17%7C38ae3bcd95794fd > > > 4addab42e1495d55a%7C1%7C0%7C638102146240080885%7CUnknown%7CTWFpbGZ > > > sb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6M > > > n0%3D%7C3000%7C%7C%7C&sdata=pUZfgzke8JGODkOLNR1101lrdINswnqqg2kM%2 > > > FOAN2kU%3D&reserved=0 > > > > > > Is the clang you have from visual studio - as described above? > > > > > > We don't have experience with using this variant of clang. > > > > > > If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: > > > > > > '--with-cc=win32fe cl --use clang' > > > > > > Satish > > > > > > > > > On Tue, 24 Jan 2023, Barry Smith wrote: > > > > > > > > > > > Are you using clang as a replacement for the > > > > > > > > * "Unix-like" Cygwin GNU compilers compilers or > > > > > > > > * MinGW GNU compilers that are compatible with the Microsoft compilers? > > > > > > > > If the former, follow the instructions for using the Cygwin GNU compilers, if the latter follow the directions for the MinGW compilers. > > > > > > > > Send the configure.log and make.log if things go wrong and we'll help you out. > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > > > On Jan 24, 2023, at 4:01 PM, Guo, Sam wrote: > > > > > > > > > > Hi PETSc dev team, > > > > > I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I?ve already made intel compiler work on win using win32fe icl). 
> > > > > > > > > > Thanks, > > > > > Sam Guo > > > > > > > > > > > > > > From balay at mcs.anl.gov Wed Jan 25 00:17:01 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 25 Jan 2023 00:17:01 -0600 (CST) Subject: [petsc-users] compile PETSc on win using clang In-Reply-To: References: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> <82e36b51-8340-e738-a0d8-959befc746e5@mcs.anl.gov> <02301c55-53ae-eb1f-7669-d08087ce1be2@mcs.anl.gov> Message-ID: <1fca0a3c-0a23-886d-ea59-6182030e28ca@mcs.anl.gov> Where does one download this install of clang from? I see one at https://github.com/llvm/llvm-project/releases/tag/llvmorg-15.0.6 i.e https://github.com/llvm/llvm-project/releases/download/llvmorg-15.0.6/LLVM-15.0.6-win64.exe - but it looks a bit different that what you have. There is also clang available via cygwin, and PETSc can be build with it. [but that's a bit different than win-clang] However to build with clang that you have [and the above from llvm] - it might need significant work from someone who understand the windows issues. One way cold be to improve win32fe to support win-clang [but that's not easy - win32fe code base is a very old - and hardly modified in past many years] Alternative could be to get petsc configure working with win-python - and use it with win-clang [deal with issues like /tmp and such paths without cygwin]. Again could be a major project. [and then fortran, MPI?] So no easy solution here to get this working. Satish On Wed, 25 Jan 2023, Guo, Sam wrote: > Sorry, I don't know too much about clang. Our code is compiled on win using clang and that's why I want to compile PETSc using clang as well. > > -----Original Message----- > From: Satish Balay > Sent: Tuesday, January 24, 2023 7:37 PM > To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) > Cc: petsc-users > Subject: Re: [petsc-users] compile PETSc on win using clang > > Here is my prior message: > > >>>>>> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.wikihow.com%2FInstall-Clang-on-Windows&data=05%7C01%7Csam.guo%40siemens.com%7C2dcacf01ab3848588c5908dafe856a17%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%7C638102146240080885%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=pUZfgzke8JGODkOLNR1101lrdINswnqqg2kM%2FOAN2kU%3D&reserved=0 > > Is the clang you have from visual studio - as described above? > > We don't have experience with using this variant of clang. > > If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: > > '--with-cc=win32fe cl --use clang' > <<<< > > However I see: > > >>> > /home/xian/dev/compilers/win64/clang11.1.0/bin/ > <<< > > Clearly its not the above compiler [with the above constraints] - so the above instructions won't work. > > How do you get this compiler? > > Any particular reason you need to build PETSc with this compiler? > > Satish > > > On Wed, 25 Jan 2023, Guo, Sam wrote: > > > I've added the path to clang to PATH but configuration still failed. configure.log is attached. Thanks a lot for your help. > > ________________________________ > > From: Satish Balay > > Sent: Tuesday, January 24, 2023 5:36 PM > > To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) > > Cc: petsc-users > > Subject: Re: [petsc-users] compile PETSc on win using clang > > > > Do you have clang in your PATH? Is the binary named clang for something else? 
> > > > > > >>>>>>> > > Defined make macro "CC" to "/home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang" > > Executing: /home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl > > --use clang -c -o /tmp/petsc-t4q6osvc/config.setCompilers/conftest.o > > -I/tmp/petsc-t4q6osvc/config.setCompilers -O2 -MD -wd4996 > > /tmp/petsc-t4q6osvc/config.setCompilers/conftest.c > > Possible ERROR while running compiler: exit code 15 > > Source: > > #include "confdefs.h" > > #include "conffix.h" > > > > int main() { > > ; > > return 0; > > } > > > > Error testing C compiler: Cannot compile C with /home/xian/dev/star/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang. > > > > <<<<<< > > > > You can try this test below manually - with the --verbose option. [I > > don't have clang - so using 'cl' here] > > > > >>>>>> > > balay at ps5 ~/petsc/src/benchmarks > > $ ~/petsc/lib/petsc/bin/win32fe/win32fe cl --use cl --verbose sizeof.c > > > > Using tool: cl > > Win32 Development Tool Front End, version 1.11.4 Fri, Sep 10, 2021 > > 6:33:40 PM Attempting to create file: C:\cygwin64\tmp\wfe2972.tmp cl > > Microsoft (R) C/C++ Optimizing Compiler Version 19.30.30709 for x64 > > del C:\cygwin64\tmp\wfe2972.tmp Attempting to create file: > > C:\cygwin64\tmp\wfe2983.tmp cl sizeof.c -link Microsoft (R) C/C++ > > Optimizing Compiler Version 19.30.30709 for x64 Copyright (C) > > Microsoft Corporation. All rights reserved. > > > > sizeof.c > > Microsoft (R) Incremental Linker Version 14.30.30709.0 Copyright (C) > > Microsoft Corporation. All rights reserved. > > > > /out:sizeof.exe > > sizeof.obj > > del C:\cygwin64\tmp\wfe2983.tmp > > > > balay at ps5 ~/petsc/src/benchmarks > > $ ~/petsc/lib/petsc/bin/win32fe/win32fe cl --use dummy-binary > > --verbose sizeof.c > > > > Using tool: dummy-binary > > Win32 Development Tool Front End, version 1.11.4 Fri, Sep 10, 2021 > > 6:33:40 PM Attempting to create file: C:\cygwin64\tmp\wfe6AD0.tmp > > dummy-binary win32feutils::CreateProcess failed. > > dummy-binary > > The system cannot find the file specified. > > > > del C:\cygwin64\tmp\wfe6AD0.tmp > > > > balay at ps5 ~/petsc/src/benchmarks > > $ > > > > <<<<<<< > > > > Satish > > > > > > > > On Wed, 25 Jan 2023, Guo, Sam wrote: > > > > > configure.log with -with-cc="win32fe cl --use clang" > > > ________________________________ > > > From: Satish Balay > > > Sent: Tuesday, January 24, 2023 5:15 PM > > > To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) > > > Cc: Barry Smith ; petsc-users > > > > > > Subject: Re: [petsc-users] compile PETSc on win using clang > > > > > > Configure Options: --configModules=PETSc.Configure > > > --optionsModule=config.compilerOptions --with-cc="win32fe cl" --use > > > win32fe clang --with-fc=0 > > > --with-debugging=0 -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD > > > -wd4996" --with-mpi=0 --with-clean=1 --force --with-scalar-type=real --ignore-cygwin-link -CFLAGS="-O2 -MD -wd4996" -CXXFLAGS="-O2 -MD -wd4996" > > > > > > It should be --with-cc="win32fe cl --use clang" > > > > > > But then - this mode is untested with configure - so there could be other issues. > > > > > > Also - do you need c++? If not - use --with-cxx=0. This can avoid > > > the error below. [for clang++ - you might need --with-cxx="win32fe > > > cl --use clang++" - again untested - so might not work..] > > > > > > Satish > > > > > > > > > On Tue, 24 Jan 2023, Guo, Sam wrote: > > > > > > > Attached please find configure.log. 
> > > > > > > > error messgae: > > > > C:\home\xian\dev\star\petsc\src\sys\objects\device\INTERF~1\device > > > > .cxx(486): error C2065: 'PETSC_DEVICE_CASE': undeclared identifier > > > > > > > > ________________________________ > > > > From: Satish Balay > > > > Sent: Tuesday, January 24, 2023 2:00 PM > > > > To: Barry Smith > > > > Cc: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) > > > > ; petsc-users at mcs.anl.gov > > > > > > > > Subject: Re: [petsc-users] compile PETSc on win using clang > > > > > > > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2F > > > > www.wikihow.com%2FInstall-Clang-on-Windows&data=05%7C01%7Csam.guo% > > > > 40siemens.com%7C2dcacf01ab3848588c5908dafe856a17%7C38ae3bcd95794fd > > > > 4addab42e1495d55a%7C1%7C0%7C638102146240080885%7CUnknown%7CTWFpbGZ > > > > sb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6M > > > > n0%3D%7C3000%7C%7C%7C&sdata=pUZfgzke8JGODkOLNR1101lrdINswnqqg2kM%2 > > > > FOAN2kU%3D&reserved=0 > > > > > > > > Is the clang you have from visual studio - as described above? > > > > > > > > We don't have experience with using this variant of clang. > > > > > > > > If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: > > > > > > > > '--with-cc=win32fe cl --use clang' > > > > > > > > Satish > > > > > > > > > > > > On Tue, 24 Jan 2023, Barry Smith wrote: > > > > > > > > > > > > > > Are you using clang as a replacement for the > > > > > > > > > > * "Unix-like" Cygwin GNU compilers compilers or > > > > > > > > > > * MinGW GNU compilers that are compatible with the Microsoft compilers? > > > > > > > > > > If the former, follow the instructions for using the Cygwin GNU compilers, if the latter follow the directions for the MinGW compilers. > > > > > > > > > > Send the configure.log and make.log if things go wrong and we'll help you out. > > > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > > > > > > > > On Jan 24, 2023, at 4:01 PM, Guo, Sam wrote: > > > > > > > > > > > > Hi PETSc dev team, > > > > > > I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I?ve already made intel compiler work on win using win32fe icl). > > > > > > > > > > > > Thanks, > > > > > > Sam Guo > > > > > > > > > > > > > > > > > > > > From bsmith at petsc.dev Wed Jan 25 09:47:39 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 25 Jan 2023 10:47:39 -0500 Subject: [petsc-users] compile PETSc on win using clang In-Reply-To: References: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> <82e36b51-8340-e738-a0d8-959befc746e5@mcs.anl.gov> Message-ID: Please do as Satish previously suggested ~/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang --verbose sizeof.c Also do clang sizeof.c and send the output of both. Where sizeof.c is #include int main(int argc,char **args) { printf("%d\n",(int)sizeof(int)); return 0; } I may have typos in my sample code so please fix those. Barry > On Jan 24, 2023, at 6:22 PM, Guo, Sam wrote: > > Attached please find configure.log. 
> > error messgae: > C:\home\xian\dev\star\petsc\src\sys\objects\device\INTERF~1\device.cxx(486): error C2065: 'PETSC_DEVICE_CASE': undeclared identifier > > From: Satish Balay > > Sent: Tuesday, January 24, 2023 2:00 PM > To: Barry Smith > > Cc: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) >; petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] compile PETSc on win using clang > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.wikihow.com%2FInstall-Clang-on-Windows&data=05%7C01%7Csam.guo%40siemens.com%7Ca6e1607f7e23403f9b4008dafe56627d%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%7C638101944252560682%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=BOy9RDMGw11IlwRthzcB5Il3YUIgVVrukbzOMFdV8MI%3D&reserved=0 > > Is the clang you have from visual studio - as described above? > > We don't have experience with using this variant of clang. > > If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: > > '--with-cc=win32fe cl --use clang' > > Satish > > > On Tue, 24 Jan 2023, Barry Smith wrote: > > > > > Are you using clang as a replacement for the > > > > * "Unix-like" Cygwin GNU compilers compilers or > > > > * MinGW GNU compilers that are compatible with the Microsoft compilers? > > > > If the former, follow the instructions for using the Cygwin GNU compilers, if the latter follow the directions for the MinGW compilers. > > > > Send the configure.log and make.log if things go wrong and we'll help you out. > > > > Barry > > > > > > > > > > > On Jan 24, 2023, at 4:01 PM, Guo, Sam > wrote: > > > > > > Hi PETSc dev team, > > > I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I?ve already made intel compiler work on win using win32fe icl). > > > > > > Thanks, > > > Sam Guo > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at siemens.com Wed Jan 25 11:31:47 2023 From: sam.guo at siemens.com (Guo, Sam) Date: Wed, 25 Jan 2023 17:31:47 +0000 Subject: [petsc-users] compile PETSc on win using clang In-Reply-To: References: <95C464C1-C5D4-436B-99A7-E1D15F8630B5@petsc.dev> <82e36b51-8340-e738-a0d8-959befc746e5@mcs.anl.gov> Message-ID: Hi Barry, Here is the output of win32fe cl --use clang --verbose sizeof.c Using tool: clang Win32 Development Tool Front End, version 1.11.4 Fri, Sep 10, 2021 6:33:40 PM Attempting to create file: C:\cygwin64\tmp\wfe1EB0.tmp clang clang: error: no input files del C:\cygwin64\tmp\wfe1EB0.tmp Attempting to create file: C:\cygwin64\tmp\wfe1ED1.tmp clang sizeof.c -link LINK : fatal error LNK1181: cannot open input file 'ink.lib' clang: error: linker command failed with exit code 1181 (use -v to see invocation) del C:\cygwin64\tmp\wfe1ED1.tmp clang sizeof.c works which produces a.exe. This is from clang -help: OVERVIEW: clang LLVM compiler From: Barry Smith Sent: Wednesday, January 25, 2023 7:48 AM To: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) Cc: petsc-users Subject: Re: [petsc-users] compile PETSc on win using clang Please do as Satish previously suggested ~/petsc/lib/petsc/bin/win32fe/win32fe cl --use clang --verbose sizeof.c Also do clang sizeof.c and send the output of both. Where sizeof.c is #include int main(int argc,char **args) { printf("%d\n",(int)sizeof(int)); return 0; } I may have typos in my sample code so please fix those. 
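For readers of the archive: the header name in the #include line of the sizeof.c sample above was lost when the HTML message was scrubbed. A complete, compilable version, assuming the intended header was <stdio.h>, is:

#include <stdio.h>

int main(int argc, char **args)
{
  /* print the size of an int on this target; typically 4 on 64-bit Windows */
  printf("%d\n", (int)sizeof(int));
  return 0;
}

It can be driven through the front end exactly as in the test above, e.g. win32fe cl --use clang --verbose sizeof.c, or compiled natively with clang sizeof.c.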
Barry On Jan 24, 2023, at 6:22 PM, Guo, Sam > wrote: Attached please find configure.log. error messgae: C:\home\xian\dev\star\petsc\src\sys\objects\device\INTERF~1\device.cxx(486): error C2065: 'PETSC_DEVICE_CASE': undeclared identifier ________________________________ From: Satish Balay > Sent: Tuesday, January 24, 2023 2:00 PM To: Barry Smith > Cc: Guo, Sam (DI SW STS SDDEV MECH PHY FEA FW) >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] compile PETSc on win using clang https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.wikihow.com%2FInstall-Clang-on-Windows&data=05%7C01%7Csam.guo%40siemens.com%7Ca6e1607f7e23403f9b4008dafe56627d%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%7C638101944252560682%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=BOy9RDMGw11IlwRthzcB5Il3YUIgVVrukbzOMFdV8MI%3D&reserved=0 Is the clang you have from visual studio - as described above? We don't have experience with using this variant of clang. If its compatible with 'cl' - and supports the same command interface as 'cl' then the following might work [assuming clang.exe is the compiler binary installed - and available in PATH]: '--with-cc=win32fe cl --use clang' Satish On Tue, 24 Jan 2023, Barry Smith wrote: > > Are you using clang as a replacement for the > > * "Unix-like" Cygwin GNU compilers compilers or > > * MinGW GNU compilers that are compatible with the Microsoft compilers? > > If the former, follow the instructions for using the Cygwin GNU compilers, if the latter follow the directions for the MinGW compilers. > > Send the configure.log and make.log if things go wrong and we'll help you out. > > Barry > > > > > > On Jan 24, 2023, at 4:01 PM, Guo, Sam > wrote: > > > > Hi PETSc dev team, > > I try to compile PETSc on win using clang. I am wondering if you could give me some hint. (I've already made intel compiler work on win using win32fe icl). > > > > Thanks, > > Sam Guo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Jan 25 13:07:32 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 25 Jan 2023 14:07:32 -0500 Subject: [petsc-users] Register for the next PETSc users meeting: Chicago June 5-7, 2023 Message-ID: We are pleased to announce the next PETSc users meeting in Chicago on June 5-7, 2023. https://petsc.org/release/community/meetings/2023 Please register now and submit your talks. The meeting will include a lightning tutorial for new users Monday morning June 5, a workshop for potential PETSc contributors, a speed dating session to discuss your application needs with PETSc developers, mini tutorials on advanced PETSc topics, as well as user presentations on their work using PETSc. Mark your calendars for the 2024 PETSc users meeting, May 23,24 in Cologne, Germany. Barry -------------- next part -------------- An HTML attachment was scrubbed... URL: From berend.vanwachem at ovgu.de Thu Jan 26 04:00:33 2023 From: berend.vanwachem at ovgu.de (Berend van Wachem) Date: Thu, 26 Jan 2023 11:00:33 +0100 Subject: [petsc-users] Bug in DMLocalizeCoordinates/Periodic for PETSc versions >= 3.18? Message-ID: <0cf3b6ef-4735-b0ae-73a3-2950f6883e0f@ovgu.de> Dear Petsc-Team, Since Petsc-3.18 our code no longer runs successfully with periodic geometries. Since this version, the call to DMGetPeriodicity has changed arguments, but even after adapting our code, for some reason calling DMLocalizeCoordinates, at least the way we do, no longer works. 
To illustrate, I have made the following example, please find it attached. The example creates a box mesh (10 x 10 x 10 cells) with unit size, partitions this and distributes this. Then, the co-ordinates are localized. The vertices of cell 9 (which on a 1 processor execution of the code lies on a periodic domain) are printed before the partitioning and after.

For PETSc versions <3.18, the output is:

Before partitioning/distributing/localizing: Cell 9 has Vecclosuresize 24 and vertices:
9.000000e-01 0.000000e+00 0.000000e+00
9.000000e-01 1.000000e-01 0.000000e+00
0.000000e+00 1.000000e-01 0.000000e+00
0.000000e+00 0.000000e+00 0.000000e+00
9.000000e-01 0.000000e+00 1.000000e-01
0.000000e+00 0.000000e+00 1.000000e-01
0.000000e+00 1.000000e-01 1.000000e-01
9.000000e-01 1.000000e-01 1.000000e-01

After partitioning/distributing/localizing: Cell 9 has Vecclosuresize 48 and vertices:
9.000000e-01 0.000000e+00 0.000000e+00
9.000000e-01 1.000000e-01 0.000000e+00
1.000000e+00 1.000000e-01 0.000000e+00
1.000000e+00 0.000000e+00 0.000000e+00
9.000000e-01 0.000000e+00 1.000000e-01
1.000000e+00 0.000000e+00 1.000000e-01
1.000000e+00 1.000000e-01 1.000000e-01
9.000000e-01 1.000000e-01 1.000000e-01

(the 'far away' X vertices are correctly put to 1.0)

But, with PETSc versions >= 3.18, we get the following output:

Before partitioning/distributing/localizing: Cell 9 has Vecclosuresize 24 and vertices:
9.000000e-01 0.000000e+00 0.000000e+00
9.000000e-01 1.000000e-01 0.000000e+00
0.000000e+00 1.000000e-01 0.000000e+00
0.000000e+00 0.000000e+00 0.000000e+00
9.000000e-01 0.000000e+00 1.000000e-01
0.000000e+00 0.000000e+00 1.000000e-01
0.000000e+00 1.000000e-01 1.000000e-01
9.000000e-01 1.000000e-01 1.000000e-01

After partitioning/distributing/localizing: Cell 9 has Vecclosuresize 24 and vertices:
9.000000e-01 0.000000e+00 0.000000e+00
9.000000e-01 1.000000e-01 0.000000e+00
0.000000e+00 1.000000e-01 0.000000e+00
0.000000e+00 0.000000e+00 0.000000e+00
9.000000e-01 0.000000e+00 1.000000e-01
0.000000e+00 0.000000e+00 1.000000e-01
0.000000e+00 1.000000e-01 1.000000e-01
9.000000e-01 1.000000e-01 1.000000e-01

and both the vertices as well as the Vecclosuresize seem to be different from the earlier PETSc versions.

Is this a bug? Or are we calling functions in the incorrect order?

Thanks, best regards,

Berend.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: examplecode.c
Type: text/x-csrc
Size: 3712 bytes
Desc: not available
URL: 

From venugovh at mail.uc.edu  Mon Jan 30 09:36:44 2023
From: venugovh at mail.uc.edu (Venugopal, Vysakh (venugovh))
Date: Mon, 30 Jan 2023 15:36:44 +0000
Subject: [petsc-users] global indices of a vector in each process
Message-ID: 

Hello,

I am using a DMCreateGlobalVector to create a vector V. If V is divided into m processes, is there a way to get the global indices of V assigned to each process?

For example: V = [10, 20, 30, 40, 50, 60, 70, 80].
If MPI process 0 has [10, 40, 50, 60] and process 1 has [20, 30, 70, 80], is it possible to get the indices for process 0 as [0,3,4,5] and process 1 as [1,2,6,7]?

Thanks,

Vysakh
---
Vysakh Venugopal
Ph.D. Candidate
Department of Mechanical Engineering
University of Cincinnati, Cincinnati, OH 45221-0072
-------------- next part --------------
An HTML attachment was scrubbed...
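For reference, a minimal sketch of the approach suggested in the reply that follows: VecGetOwnershipRange() reports the contiguous block of global indices (in the PETSc ordering) owned by each rank of a DM-created global vector. The 1d DMDA of size 8 is only illustrative, the error checking assumes a recent PETSc with PetscCall(), and note that the PETSc ordering generally differs from the natural (application) ordering, which is what DMDAGetAO() and DMDACreateNaturalVector() address.

#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM          da;
  Vec         V;
  PetscInt    low, high;
  PetscMPIInt rank;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  /* illustrative 1d DMDA: 8 global points, 1 dof, stencil width 1 */
  PetscCall(DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 8, 1, 1, NULL, &da));
  PetscCall(DMSetUp(da));
  PetscCall(DMCreateGlobalVector(da, &V));
  /* each rank owns the contiguous global indices [low, high) of V in PETSc ordering */
  PetscCall(VecGetOwnershipRange(V, &low, &high));
  PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD, "rank %d owns global indices %" PetscInt_FMT " .. %" PetscInt_FMT "\n", rank, low, high - 1));
  PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));
  PetscCall(VecDestroy(&V));
  PetscCall(DMDestroy(&da));
  PetscCall(PetscFinalize());
  return 0;
}

Running with, for example, mpiexec -n 2 prints each rank's index range.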
URL: From bsmith at petsc.dev Mon Jan 30 10:18:22 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 30 Jan 2023 11:18:22 -0500 Subject: [petsc-users] global indices of a vector in each process In-Reply-To: References: Message-ID: <3A539C96-300C-4341-A206-4AAE853F084D@petsc.dev> VecGetOwnershipRange() works with any vector, including DM created vectors. You could not have the global ownership you provide below. Perhaps you are thinking about the Natural ownership values? For those if you using DMDA take a look at DMDAGetAO() and DMDACreateNaturalVector() Barry > On Jan 30, 2023, at 10:36 AM, Venugopal, Vysakh (venugovh) via petsc-users wrote: > > Hello, > > I am using a DMCreateGlobalVector to create a vector V. If V is divided into m processes, is there a way to get the global indices of V assigned to each process? > > For example: V = [10, 20, 30, 40, 50, 60, 70, 80]. > If MPI process 0 has [10, 40, 50, 60] and process 1 has [20, 30, 70, 80], is it possible to get the indices for process 0 as [0,3,4,5] and process 1 as [1,2,6,7]? > > Thanks, > > Vysakh > --- > Vysakh Venugopal > Ph.D. Candidate > Department of Mechanical Engineering > University of Cincinnati, Cincinnati, OH 45221-0072 -------------- next part -------------- An HTML attachment was scrubbed... URL: From guglielmo2 at llnl.gov Mon Jan 30 11:53:23 2023 From: guglielmo2 at llnl.gov (Guglielmo, Tyler Hardy) Date: Mon, 30 Jan 2023 17:53:23 +0000 Subject: [petsc-users] Kronecker Product Message-ID: Hi all, I am wondering if there is any functionality for taking Kronecker products of large sparse matrices that are parallel? MatSeqAIJKron is as close as I have found, but it seems like this does not work for parallel matrices. Any ideas here? An option could be to make A and B sequential, compute the Kronecker product, C, then scatter C into a parallel matrix? This seems like a horribly inefficient procedure. I?m still fairly new to petsc, so thanks for patience :)! Best, Tyler +++++++++++++++++++++++++++++ Tyler Guglielmo Postdoctoral Researcher Lawrence Livermore National Lab Office: 925-423-6186 Cell: 210-480-8000 +++++++++++++++++++++++++++++ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Jan 30 13:11:49 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 30 Jan 2023 14:11:49 -0500 Subject: [petsc-users] Kronecker Product In-Reply-To: References: Message-ID: <5C1E5862-E7F5-4E9C-A7BD-DB8EAD19D2AB@petsc.dev> Do you need the explicit sparse representation of the Kronecker product? Or do you want to apply it as an operator or solve systems with it? If the latter you can use https://petsc.org/release/docs/manualpages/Mat/MatCreateKAIJ/#matcreatekaij Barry > On Jan 30, 2023, at 12:53 PM, Guglielmo, Tyler Hardy via petsc-users wrote: > > Hi all, > > I am wondering if there is any functionality for taking Kronecker products of large sparse matrices that are parallel? MatSeqAIJKron is as close as I have found, but it seems like this does not work for parallel matrices. Any ideas here? > > An option could be to make A and B sequential, compute the Kronecker product, C, then scatter C into a parallel matrix? This seems like a horribly inefficient procedure. I?m still fairly new to petsc, so thanks for patience :)! 
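As a concrete illustration of the MatCreateKAIJ() suggestion above, a minimal sketch follows. It covers the case the KAIJ format is designed for: S and T are small dense blocks, A is a large sparse AIJ matrix, and the operator [I (x) S + A (x) T] is only applied, never assembled. The 2x2 S and T values are placeholders, and their column-major storage is an assumption here; the MatCreateKAIJ man page is the authority on the expected layout.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat         A, K; /* A is sparse; K applies I (x) S + A (x) T matrix-free */
  Vec         x, y;
  PetscInt    n = 8, i, Istart, Iend;
  PetscScalar S[4] = {1.0, 0.0, 0.0, 1.0}; /* 2x2 blocks, assumed column-major */
  PetscScalar T[4] = {0.0, 1.0, 1.0, 0.0};

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* a simple sparse A: 1d Laplacian-like stencil, distributed over the ranks */
  PetscCall(MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 3, NULL, 1, NULL, &A));
  PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
  for (i = Istart; i < Iend; i++) {
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
    if (i > 0) PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  /* K acts on vectors of length 2*n; the Kronecker products are never formed */
  PetscCall(MatCreateKAIJ(A, 2, 2, S, T, &K));
  PetscCall(MatCreateVecs(K, &x, &y));
  PetscCall(VecSet(x, 1.0));
  PetscCall(MatMult(K, x, y));

  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&y));
  PetscCall(MatDestroy(&K));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

As the next message notes, this does not directly cover the case where both Kronecker factors are large and sparse.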
> > Best, > Tyler > > +++++++++++++++++++++++++++++ > Tyler Guglielmo > Postdoctoral Researcher > Lawrence Livermore National Lab > Office: 925-423-6186 > Cell: 210-480-8000 > +++++++++++++++++++++++++++++ -------------- next part -------------- An HTML attachment was scrubbed... URL: From guglielmo2 at llnl.gov Mon Jan 30 13:24:36 2023 From: guglielmo2 at llnl.gov (Guglielmo, Tyler Hardy) Date: Mon, 30 Jan 2023 19:24:36 +0000 Subject: [petsc-users] Kronecker Product In-Reply-To: <5C1E5862-E7F5-4E9C-A7BD-DB8EAD19D2AB@petsc.dev> References: <5C1E5862-E7F5-4E9C-A7BD-DB8EAD19D2AB@petsc.dev> Message-ID: Thanks Barry, I saw that function, but wasn?t sure how to apply it since the documentation says that S and T are dense matrices, but in my case all matrices involved are sparse. Is there a way to work around the dense requirement? Best, Tyler From: Barry Smith Date: Monday, January 30, 2023 at 11:12 AM To: Guglielmo, Tyler Hardy Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Kronecker Product Do you need the explicit sparse representation of the Kronecker product? Or do you want to apply it as an operator or solve systems with it? If the latter you can use https://petsc.org/release/docs/manualpages/Mat/MatCreateKAIJ/#matcreatekaij Barry On Jan 30, 2023, at 12:53 PM, Guglielmo, Tyler Hardy via petsc-users wrote: Hi all, I am wondering if there is any functionality for taking Kronecker products of large sparse matrices that are parallel? MatSeqAIJKron is as close as I have found, but it seems like this does not work for parallel matrices. Any ideas here? An option could be to make A and B sequential, compute the Kronecker product, C, then scatter C into a parallel matrix? This seems like a horribly inefficient procedure. I?m still fairly new to petsc, so thanks for patience :)! Best, Tyler +++++++++++++++++++++++++++++ Tyler Guglielmo Postdoctoral Researcher Lawrence Livermore National Lab Office: 925-423-6186 Cell: 210-480-8000 +++++++++++++++++++++++++++++ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 30 13:30:59 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 30 Jan 2023 14:30:59 -0500 Subject: [petsc-users] Kronecker Product In-Reply-To: References: <5C1E5862-E7F5-4E9C-A7BD-DB8EAD19D2AB@petsc.dev> Message-ID: On Mon, Jan 30, 2023 at 2:24 PM Guglielmo, Tyler Hardy via petsc-users < petsc-users at mcs.anl.gov> wrote: > Thanks Barry, > > > > I saw that function, but wasn?t sure how to apply it since the > documentation says that S and T are dense matrices, but in my case all > matrices involved are sparse. Is there a way to work around the dense > requirement? > We don't have parallel sparse-sparse. It would not be too hard to write, but it would be some work. It is hard to understand the use case. Is one matrix much smaller? If not, and you inherit the distribution from A, it seems like it might be very suboptimal, and otherwise you would have to redistribute on the fly and it would get very complicated. Thanks, Matt > Best, > > Tyler > > > > *From: *Barry Smith > *Date: *Monday, January 30, 2023 at 11:12 AM > *To: *Guglielmo, Tyler Hardy > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Kronecker Product > > > > Do you need the explicit sparse representation of the Kronecker > product? Or do you want to apply it as an operator or solve systems with > it? 
If the latter you can use > https://petsc.org/release/docs/manualpages/Mat/MatCreateKAIJ/#matcreatekaij > > > > > Barry > > > > > > > > On Jan 30, 2023, at 12:53 PM, Guglielmo, Tyler Hardy via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Hi all, > > > > I am wondering if there is any functionality for taking Kronecker products > of large sparse matrices that are parallel? MatSeqAIJKron is as close as I > have found, but it seems like this does not work for parallel matrices. > Any ideas here? > > > > An option could be to make A and B sequential, compute the Kronecker > product, C, then scatter C into a parallel matrix? This seems like a > horribly inefficient procedure. I?m still fairly new to petsc, so thanks > for patience :)! > > > > Best, > > Tyler > > > > +++++++++++++++++++++++++++++ > > Tyler Guglielmo > > Postdoctoral Researcher > > Lawrence Livermore National Lab > > Office: 925-423-6186 > > Cell: 210-480-8000 > > +++++++++++++++++++++++++++++ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From guglielmo2 at llnl.gov Mon Jan 30 13:48:27 2023 From: guglielmo2 at llnl.gov (Guglielmo, Tyler Hardy) Date: Mon, 30 Jan 2023 19:48:27 +0000 Subject: [petsc-users] Kronecker Product In-Reply-To: References: <5C1E5862-E7F5-4E9C-A7BD-DB8EAD19D2AB@petsc.dev> Message-ID: Both matrices (A and B) would be approximately the same size and large. The use case (for me at least) is to create several large sparse matrices which will be combined in various ways through Kronecker products. The combination happens at every time step in an evolution, so it really needs to be fast as well. I?m thinking mpi/petsc is probably not the most optimal way for dealing with this, and might just have to work with single node multi-threading. Best, Tyler From: Matthew Knepley Date: Monday, January 30, 2023 at 11:31 AM To: Guglielmo, Tyler Hardy Cc: Barry Smith , petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Kronecker Product On Mon, Jan 30, 2023 at 2:24 PM Guglielmo, Tyler Hardy via petsc-users > wrote: Thanks Barry, I saw that function, but wasn?t sure how to apply it since the documentation says that S and T are dense matrices, but in my case all matrices involved are sparse. Is there a way to work around the dense requirement? We don't have parallel sparse-sparse. It would not be too hard to write, but it would be some work. It is hard to understand the use case. Is one matrix much smaller? If not, and you inherit the distribution from A, it seems like it might be very suboptimal, and otherwise you would have to redistribute on the fly and it would get very complicated. Thanks, Matt Best, Tyler From: Barry Smith > Date: Monday, January 30, 2023 at 11:12 AM To: Guglielmo, Tyler Hardy > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Kronecker Product Do you need the explicit sparse representation of the Kronecker product? Or do you want to apply it as an operator or solve systems with it? If the latter you can use https://petsc.org/release/docs/manualpages/Mat/MatCreateKAIJ/#matcreatekaij Barry On Jan 30, 2023, at 12:53 PM, Guglielmo, Tyler Hardy via petsc-users > wrote: Hi all, I am wondering if there is any functionality for taking Kronecker products of large sparse matrices that are parallel? 
MatSeqAIJKron is as close as I have found, but it seems like this does not work for parallel matrices. Any ideas here? An option could be to make A and B sequential, compute the Kronecker product, C, then scatter C into a parallel matrix? This seems like a horribly inefficient procedure. I?m still fairly new to petsc, so thanks for patience :)! Best, Tyler +++++++++++++++++++++++++++++ Tyler Guglielmo Postdoctoral Researcher Lawrence Livermore National Lab Office: 925-423-6186 Cell: 210-480-8000 +++++++++++++++++++++++++++++ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Jan 30 14:00:43 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 30 Jan 2023 15:00:43 -0500 Subject: [petsc-users] Kronecker Product In-Reply-To: References: <5C1E5862-E7F5-4E9C-A7BD-DB8EAD19D2AB@petsc.dev> Message-ID: <074DC183-D3A2-45D0-9181-79C67F034456@petsc.dev> What is large? If A and B have dimensions of 1000, then the Kronecker product is of size 1,000,000. Do you want the Kronecker product to be explicitly formed or just available as matrix vector products? If just explicitly available then I think you can just store sparse A (for example) completely on all ranks, 10,000 by 10,000 sparse matrix is small for sequential) while B is distributed. Barry > On Jan 30, 2023, at 2:48 PM, Guglielmo, Tyler Hardy wrote: > > Both matrices (A and B) would be approximately the same size and large. The use case (for me at least) is to create several large sparse matrices which will be combined in various ways through Kronecker products. The combination happens at every time step in an evolution, so it really needs to be fast as well. I?m thinking mpi/petsc is probably not the most optimal way for dealing with this, and might just have to work with single node multi-threading. > > Best, > Tyler > > From: Matthew Knepley > > Date: Monday, January 30, 2023 at 11:31 AM > To: Guglielmo, Tyler Hardy > > Cc: Barry Smith >, petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Kronecker Product > > On Mon, Jan 30, 2023 at 2:24 PM Guglielmo, Tyler Hardy via petsc-users > wrote: > Thanks Barry, > > I saw that function, but wasn?t sure how to apply it since the documentation says that S and T are dense matrices, but in my case all matrices involved are sparse. Is there a way to work around the dense requirement? > > We don't have parallel sparse-sparse. It would not be too hard to write, but it would be some work. > > It is hard to understand the use case. Is one matrix much smaller? If not, and you inherit the distribution from A, it seems > like it might be very suboptimal, and otherwise you would have to redistribute on the fly and it would get very complicated. > > Thanks, > > Matt > > Best, > Tyler > > From: Barry Smith > > Date: Monday, January 30, 2023 at 11:12 AM > To: Guglielmo, Tyler Hardy > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Kronecker Product > > > Do you need the explicit sparse representation of the Kronecker product? Or do you want to apply it as an operator or solve systems with it? 
If the latter you can use https://petsc.org/release/docs/manualpages/Mat/MatCreateKAIJ/#matcreatekaij > > Barry > > > > > > On Jan 30, 2023, at 12:53 PM, Guglielmo, Tyler Hardy via petsc-users > wrote: > > Hi all, > > I am wondering if there is any functionality for taking Kronecker products of large sparse matrices that are parallel? MatSeqAIJKron is as close as I have found, but it seems like this does not work for parallel matrices. Any ideas here? > > An option could be to make A and B sequential, compute the Kronecker product, C, then scatter C into a parallel matrix? This seems like a horribly inefficient procedure. I?m still fairly new to petsc, so thanks for patience :)! > > Best, > Tyler > > +++++++++++++++++++++++++++++ > Tyler Guglielmo > Postdoctoral Researcher > Lawrence Livermore National Lab > Office: 925-423-6186 > Cell: 210-480-8000 > +++++++++++++++++++++++++++++ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From guglielmo2 at llnl.gov Mon Jan 30 14:07:51 2023 From: guglielmo2 at llnl.gov (Guglielmo, Tyler Hardy) Date: Mon, 30 Jan 2023 20:07:51 +0000 Subject: [petsc-users] Kronecker Product In-Reply-To: <074DC183-D3A2-45D0-9181-79C67F034456@petsc.dev> References: <5C1E5862-E7F5-4E9C-A7BD-DB8EAD19D2AB@petsc.dev> <074DC183-D3A2-45D0-9181-79C67F034456@petsc.dev> Message-ID: I would need the Kronecker product to be explicitly available to perform matrix exponentials. A and B are of order 5000, so not too large. I will give storing them on all ranks a shot. Thanks for the tips! Best, Tyler From: Barry Smith Date: Monday, January 30, 2023 at 12:01 PM To: Guglielmo, Tyler Hardy Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Kronecker Product What is large? If A and B have dimensions of 1000, then the Kronecker product is of size 1,000,000. Do you want the Kronecker product to be explicitly formed or just available as matrix vector products? If just explicitly available then I think you can just store sparse A (for example) completely on all ranks, 10,000 by 10,000 sparse matrix is small for sequential) while B is distributed. Barry On Jan 30, 2023, at 2:48 PM, Guglielmo, Tyler Hardy wrote: Both matrices (A and B) would be approximately the same size and large. The use case (for me at least) is to create several large sparse matrices which will be combined in various ways through Kronecker products. The combination happens at every time step in an evolution, so it really needs to be fast as well. I?m thinking mpi/petsc is probably not the most optimal way for dealing with this, and might just have to work with single node multi-threading. Best, Tyler From: Matthew Knepley > Date: Monday, January 30, 2023 at 11:31 AM To: Guglielmo, Tyler Hardy > Cc: Barry Smith >, petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Kronecker Product On Mon, Jan 30, 2023 at 2:24 PM Guglielmo, Tyler Hardy via petsc-users > wrote: Thanks Barry, I saw that function, but wasn?t sure how to apply it since the documentation says that S and T are dense matrices, but in my case all matrices involved are sparse. Is there a way to work around the dense requirement? We don't have parallel sparse-sparse. It would not be too hard to write, but it would be some work. It is hard to understand the use case. Is one matrix much smaller? 
If not, and you inherit the distribution from A, it seems like it might be very suboptimal, and otherwise you would have to redistribute on the fly and it would get very complicated. Thanks, Matt Best, Tyler From: Barry Smith > Date: Monday, January 30, 2023 at 11:12 AM To: Guglielmo, Tyler Hardy > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Kronecker Product Do you need the explicit sparse representation of the Kronecker product? Or do you want to apply it as an operator or solve systems with it? If the latter you can use https://petsc.org/release/docs/manualpages/Mat/MatCreateKAIJ/#matcreatekaij Barry On Jan 30, 2023, at 12:53 PM, Guglielmo, Tyler Hardy via petsc-users > wrote: Hi all, I am wondering if there is any functionality for taking Kronecker products of large sparse matrices that are parallel? MatSeqAIJKron is as close as I have found, but it seems like this does not work for parallel matrices. Any ideas here? An option could be to make A and B sequential, compute the Kronecker product, C, then scatter C into a parallel matrix? This seems like a horribly inefficient procedure. I?m still fairly new to petsc, so thanks for patience :)! Best, Tyler +++++++++++++++++++++++++++++ Tyler Guglielmo Postdoctoral Researcher Lawrence Livermore National Lab Office: 925-423-6186 Cell: 210-480-8000 +++++++++++++++++++++++++++++ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 30 14:24:02 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 30 Jan 2023 15:24:02 -0500 Subject: [petsc-users] Kronecker Product In-Reply-To: References: <5C1E5862-E7F5-4E9C-A7BD-DB8EAD19D2AB@petsc.dev> <074DC183-D3A2-45D0-9181-79C67F034456@petsc.dev> Message-ID: On Mon, Jan 30, 2023 at 3:08 PM Guglielmo, Tyler Hardy via petsc-users < petsc-users at mcs.anl.gov> wrote: > I would need the Kronecker product to be explicitly available to perform > matrix exponentials. A and B are of order 5000, so not too large. I will > give storing them on all ranks a shot. Thanks for the tips! > Were you going to do exponentials by explicit factorization? For large matrices, I thought it was common to use matrix-free methods ( https://slepc.upv.es/documentation/current/docs/manualpages/MFN/index.html) Thanks, Matt > > > Best, > > Tyler > > > > *From: *Barry Smith > *Date: *Monday, January 30, 2023 at 12:01 PM > *To: *Guglielmo, Tyler Hardy > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Kronecker Product > > > > What is large? If A and B have dimensions of 1000, then the Kronecker > product is of size 1,000,000. Do you want the Kronecker product to be > explicitly formed or just available as matrix vector products? If just > explicitly available then I think you can just store sparse A (for example) > completely on all ranks, 10,000 by 10,000 sparse matrix is small for > sequential) while B is distributed. > > > > Barry > > > > > > On Jan 30, 2023, at 2:48 PM, Guglielmo, Tyler Hardy > wrote: > > > > Both matrices (A and B) would be approximately the same size and large. > The use case (for me at least) is to create several large sparse matrices > which will be combined in various ways through Kronecker products. The > combination happens at every time step in an evolution, so it really needs > to be fast as well. 
I?m thinking mpi/petsc is probably not the most > optimal way for dealing with this, and might just have to work with single > node multi-threading. > > > > Best, > > Tyler > > > > *From: *Matthew Knepley > *Date: *Monday, January 30, 2023 at 11:31 AM > *To: *Guglielmo, Tyler Hardy > *Cc: *Barry Smith , petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject: *Re: [petsc-users] Kronecker Product > > On Mon, Jan 30, 2023 at 2:24 PM Guglielmo, Tyler Hardy via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Thanks Barry, > > > > I saw that function, but wasn?t sure how to apply it since the > documentation says that S and T are dense matrices, but in my case all > matrices involved are sparse. Is there a way to work around the dense > requirement? > > > > We don't have parallel sparse-sparse. It would not be too hard to write, > but it would be some work. > > > > It is hard to understand the use case. Is one matrix much smaller? If not, > and you inherit the distribution from A, it seems > > like it might be very suboptimal, and otherwise you would have to > redistribute on the fly and it would get very complicated. > > > > Thanks, > > > > Matt > > > > Best, > > Tyler > > > > *From: *Barry Smith > *Date: *Monday, January 30, 2023 at 11:12 AM > *To: *Guglielmo, Tyler Hardy > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Kronecker Product > > > > Do you need the explicit sparse representation of the Kronecker > product? Or do you want to apply it as an operator or solve systems with > it? If the latter you can use > https://petsc.org/release/docs/manualpages/Mat/MatCreateKAIJ/#matcreatekaij > > > > > Barry > > > > > > > > On Jan 30, 2023, at 12:53 PM, Guglielmo, Tyler Hardy via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Hi all, > > > > I am wondering if there is any functionality for taking Kronecker products > of large sparse matrices that are parallel? MatSeqAIJKron is as close as I > have found, but it seems like this does not work for parallel matrices. > Any ideas here? > > > > An option could be to make A and B sequential, compute the Kronecker > product, C, then scatter C into a parallel matrix? This seems like a > horribly inefficient procedure. I?m still fairly new to petsc, so thanks > for patience :)! > > > > Best, > > Tyler > > > > +++++++++++++++++++++++++++++ > > Tyler Guglielmo > > Postdoctoral Researcher > > Lawrence Livermore National Lab > > Office: 925-423-6186 > > Cell: 210-480-8000 > > +++++++++++++++++++++++++++++ > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From guglielmo2 at llnl.gov Mon Jan 30 15:23:04 2023 From: guglielmo2 at llnl.gov (Guglielmo, Tyler Hardy) Date: Mon, 30 Jan 2023 21:23:04 +0000 Subject: [petsc-users] Kronecker Product In-Reply-To: References: <5C1E5862-E7F5-4E9C-A7BD-DB8EAD19D2AB@petsc.dev> <074DC183-D3A2-45D0-9181-79C67F034456@petsc.dev> Message-ID: I have an implementation of the slepc MFN matrix exponential which is implicitly using ExpoKit. 
You have to supply a matrix into the Slepc MFN operator to set the problem up as far as I know.

Tyler

From: Matthew Knepley
Date: Monday, January 30, 2023 at 12:24 PM
To: Guglielmo, Tyler Hardy
Cc: Barry Smith, petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Kronecker Product

Were you going to do exponentials by explicit factorization? For large matrices, I thought it was common to use matrix-free methods (https://slepc.upv.es/documentation/current/docs/manualpages/MFN/index.html)

  Thanks,

    Matt

From jroman at dsic.upv.es Mon Jan 30 16:30:06 2023
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Mon, 30 Jan 2023 23:30:06 +0100
Subject: [petsc-users] Kronecker Product
In-Reply-To: References: <5C1E5862-E7F5-4E9C-A7BD-DB8EAD19D2AB@petsc.dev> <074DC183-D3A2-45D0-9181-79C67F034456@petsc.dev>
Message-ID:

The matrix can be a shell matrix, only the matrix-vector product operation is required.

Jose

> On 30 Jan 2023, at 22:23, Guglielmo, Tyler Hardy via petsc-users wrote:
>
> I have an implementation of the slepc MFN matrix exponential which is implicitly using ExpoKit. You have to supply a matrix into the Slepc MFN operator to set the problem up as far as I know.
>
> Tyler
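[Editor's note: to make the last two suggestions concrete, applying the Kronecker product only as an operator and handing that operator to SLEPc's MFN solver, here is a minimal sequential sketch in petsc4py/slepc4py. It is not code from the thread: the sizes, the random test factors, and the scaling t are made up for illustration. The shell matrix applies K = kron(A, B) through the identity kron(A, B) * vec(X) = vec(A @ X @ B.T), where vec() stacks the rows of X, so only products with A and B are ever formed; a parallel version would additionally have to decide how the block rows of X are distributed across ranks. The same idea in C would use MatCreateShell together with MFNSetOperator and an FN of type FNEXP.]

import numpy as np
import scipy.sparse as sp
from petsc4py import PETSc
from slepc4py import SLEPc

class KronShell:
    """Shell-matrix context that applies K = kron(A, B) without forming K."""
    def __init__(self, A, B):
        self.A, self.B = A, B                  # sparse factors (scipy CSR here)
        self.m, self.n = A.shape[0], B.shape[0]

    def mult(self, mat, x, y):
        # View x as an m-by-n matrix X (row-major blocks) and apply A X B^T.
        X = x.getArray(readonly=True).reshape(self.m, self.n)
        Y = self.A @ (self.B @ X.T).T          # equals A @ X @ B.T
        y.setArray(np.ascontiguousarray(Y).ravel())

# Made-up test factors; in the thread A and B would be the user's order-5000 matrices.
m, n, t = 50, 40, 0.1
A = sp.random(m, m, density=0.05, random_state=0, format="csr")
B = sp.random(n, n, density=0.05, random_state=1, format="csr")

K = PETSc.Mat().createPython([m * n, m * n], context=KronShell(A, B),
                             comm=PETSc.COMM_SELF)
K.setUp()

v = PETSc.Vec().createSeq(m * n)   # input vector
y = v.duplicate()                  # result vector
v.set(1.0)

mfn = SLEPc.MFN().create(comm=PETSc.COMM_SELF)
mfn.setOperator(K)                 # MFN only needs MatMult from K
f = mfn.getFN()
f.setType(SLEPc.FN.Type.EXP)
f.setScale(t)                      # compute y = exp(t*K) * v
mfn.setFromOptions()
mfn.solve(v, y)
PETSc.Sys.Print("|| exp(t*K) v || =", y.norm())

Since setFromOptions is called, the usual MFN options (-mfn_type, -mfn_tol, -mfn_ncv, ...) still apply, so the Krylov variant and tolerances can be switched at run time.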
From adantra at gmail.com Mon Jan 30 17:56:11 2023
From: adantra at gmail.com (Adolfo Rodriguez)
Date: Mon, 30 Jan 2023 17:56:11 -0600
Subject: [petsc-users] Composite preconditioners in petsc4py
Message-ID:

Hi,

How do you use composite preconditioners in petsc4py? I used to have a script that worked but it does not work anymore. Something has changed. Any simple examples?

Regards,

Adolfo

From ksi2443 at gmail.com Tue Jan 31 04:48:33 2023
From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=)
Date: Tue, 31 Jan 2023 19:48:33 +0900
Subject: [petsc-users] Question about error
Message-ID:

Hello,

There are different results from different commands, although they should compute the same thing.

The first command is just

    ./app

(app is my petsc application) and the results are as below.
[image: image.png]

The second command is

    mpiexec -n 1 ./app

and with it I get the result that I want.

At first I suspected my a1.c:195, but there is no error there. How can I fix this problem? I don't know where to start. We continue to investigate from the input data to the structure, but we are unable to find any suspicious parts.

Thanks,
Hyung Kim

From knepley at gmail.com Tue Jan 31 06:55:29 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 31 Jan 2023 07:55:29 -0500
Subject: [petsc-users] Composite preconditioners in petsc4py
In-Reply-To: References:
Message-ID:

On Mon, Jan 30, 2023 at 6:56 PM Adolfo Rodriguez wrote:

> Hi,
>
> How do you use composite preconditioners in petsc4py? I used to have a
> script that worked but it does not work anymore. Something has changed. Any
> simple examples?

Do you mean using PCCOMPOSITE?

  Thanks,

    Matt

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

From knepley at gmail.com Tue Jan 31 06:58:02 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 31 Jan 2023 07:58:02 -0500
Subject: [petsc-users] Question about error
In-Reply-To: References:
Message-ID:

On Tue, Jan 31, 2023 at 5:49 AM 김성익 wrote:

> There are different results from different commands, although they should
> compute the same thing.

What MPI are you using? It is possible that it requires you to run with mpiexec.

To track down the error, you can run with -start_in_debugger, and when you get the signal, print out the stack.

  Thanks,

    Matt

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

From adantra at gmail.com Tue Jan 31 07:44:21 2023
From: adantra at gmail.com (Adolfo Rodriguez)
Date: Tue, 31 Jan 2023 13:44:21 +0000
Subject: [petsc-users] Composite preconditioners in petsc4py
In-Reply-To: References:
Message-ID:

Matthew,

Yes, exactly: PCCOMPOSITE.
From knepley at gmail.com Tue Jan 31 09:13:26 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 31 Jan 2023 10:13:26 -0500
Subject: [petsc-users] Composite preconditioners in petsc4py
In-Reply-To: References:
Message-ID:

On Tue, Jan 31, 2023 at 8:44 AM Adolfo Rodriguez wrote:

> Matthew,
>
> Yes, exactly: PCCOMPOSITE.

The list of options for it is here: https://petsc.org/main/docs/manualpages/PC/PCCOMPOSITE/

Is there a particular thing you want to do?

  Thanks,

    Matt

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
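[Editor's note: a minimal petsc4py sketch of the PCCOMPOSITE setup being discussed, added for readers looking for the "simple example". It is not Adolfo's script; it drives PCCOMPOSITE entirely through the options database, using the -pc_composite_type and -pc_composite_pcs options from the manual page linked above. The tridiagonal test matrix and the particular sub-preconditioners are made up for illustration.]

from petsc4py import PETSc

# Made-up 1-D Laplacian test matrix, just to have something to precondition.
n = 100
A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
A.setSizes([n, n])
A.setType(PETSc.Mat.Type.AIJ)
A.setUp()
rstart, rend = A.getOwnershipRange()
for i in range(rstart, rend):
    if i > 0:
        A[i, i - 1] = -1.0
    if i < n - 1:
        A[i, i + 1] = -1.0
    A[i, i] = 2.0
A.assemble()

# Configure a composite preconditioner through the options database.
opts = PETSc.Options()
opts["pc_type"] = "composite"
opts["pc_composite_type"] = "multiplicative"   # or "additive"
opts["pc_composite_pcs"] = "sor,jacobi"        # sub-preconditioners, applied in order

ksp = PETSc.KSP().create(comm=PETSc.COMM_WORLD)
ksp.setOperators(A)
ksp.setFromOptions()                           # picks up the pc_* options set above

b, x = A.createVecs()
b.set(1.0)
ksp.solve(b, x)
PETSc.Sys.Print("converged in", ksp.getIterationNumber(), "iterations")

The same setup can also be done programmatically on the KSP's PC object; if the old script used that programmatic interface, an API change there may be what broke it, and the options-database route above sidesteps the question.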
From daniel.stone at opengosim.com Tue Jan 31 10:11:23 2023
From: daniel.stone at opengosim.com (Daniel Stone)
Date: Tue, 31 Jan 2023 16:11:23 +0000
Subject: [petsc-users] win32fe?
Message-ID:

Hello all,

I am currently having to figure out a way to get petsc working on windows, using compilers from the intel oneAPI package. This means new compiler names, such as "icx" for the c compiler and "ifx" for the fortran one.

I see from the installation instructions, and from old notes from a colleague, how to set up the cygwin environment and to use, e.g., --with-cc="win32fe icl" (if using an older intel compiler) when configuring.

Unfortunately, win32fe only works with a number of fixed (older) compilers, and simply cannot be made to work with icx or ifx, as far as I can see. When I try to use the compilers without win32fe (e.g. --with-cc=icx), I get a slew of problems when the configure script tries to compile and link test C programs. It seems that icx works fine, but gets extremely confused when having to target output files to, e.g., /tmp or /anywhere/non/trivial, when running under cygwin. Is this the sort of problem that win32fe is intended to solve?

I can't find any source code for win32fe, nor really any explanations of what it's supposed to be doing or how it works. Since it looks like I have to find a way of making it work with icx and ifx, I'm a bit stuck. Can anyone provide any insight as to what win32fe does and how, or where I can look for some?

Many thanks,

Daniel

From balay at mcs.anl.gov Tue Jan 31 11:32:13 2023
From: balay at mcs.anl.gov (Satish Balay)
Date: Tue, 31 Jan 2023 11:32:13 -0600 (CST)
Subject: [petsc-users] win32fe?
In-Reply-To: References:
Message-ID: <57213448-6801-b436-aa29-0b94999bc3da@mcs.anl.gov>

Assuming 'icx/ifx' compiler options are the same as 'icl/ifort' - you can try:

--with-cc='win32fe icl --use icx' --with-cxx='win32fe icl --use icx' --with-fc='win32fe ifort --use ifx'

Also check 'win32fe --help'

However I think the above assumption [i.e. icx is a drop-in replacement for icl] is likely invalid - so this might not work.

Note: petsc/main has shell wrappers for 'win32fe icl' - so one could add similar wrappers for icx - if the above would work.

balay at p1 /home/balay/petsc (main =)
$ cat lib/petsc/bin/win32fe/win_cl
#!/usr/bin/env sh
#
# Wrapper for Microsoft Windows cl using win32fe as a full path compiler
#
p=`dirname $0`
${p}/win32fe cl $*

The sources for win32fe are at https://bitbucket.org/petsc/win32fe

And 'win32fe icl --verbose' is useful to understand the commands it is invoking internally.

Satish

From balay at mcs.anl.gov Tue Jan 31 11:49:37 2023
From: balay at mcs.anl.gov (Satish Balay)
Date: Tue, 31 Jan 2023 11:49:37 -0600 (CST)
Subject: [petsc-users] win32fe?
In-Reply-To: <57213448-6801-b436-aa29-0b94999bc3da@mcs.anl.gov> References: <57213448-6801-b436-aa29-0b94999bc3da@mcs.anl.gov>
Message-ID: <04a8cac9-f898-f7f1-fdf8-95ad008dd21c@mcs.anl.gov>

Some additional notes on win32fe:

- PETSc uses cygwin python/make/shell etc. for build tools - i.e. they all work with cygwin (aka unix) paths. However cl, icl etc. work with MS native paths.
  So win32fe parses the compiler options and converts the cygwin paths to native paths before invoking the windows compilers.

- And the compile targets used [in makefiles] assume compilers behave like gcc (most unix compilers essentially behave the same way - except for some differences in compiler options). However MS compilers have additional syntax restrictions - i.e. options have to be set in a particular order. Win32fe does this reordering of compiler options before invoking cl.

- And there are other quirks with MS build tools [like lib] that win32fe handles - and gives a unix 'ar'-like interface.

Note: This was developed 20+ years ago - and only minor updates since then [as the primary developer has moved on].

Satish