From facklerpw at ornl.gov Mon Oct 2 09:40:28 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Mon, 2 Oct 2023 14:40:28 +0000 Subject: [petsc-users] Unexpected performance losses switching to COO interface Message-ID: We finally have xolotl ported to use the new COO interface and the aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port to our previous version (using MatSetValuesStencil and the default Mat and Vec implementations), we expected to see an improvement in performance for both the "serial" and "cuda" builds (here I'm referring to the kokkos configuration). Attached are two plots that show timings for three different cases. All of these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on a single node). The CUDA cases were given one GPU per task (and used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible. The performance of RHSJacobian (where the bulk of computation happens in xolotl) behaved basically as expected (better than expected in the serial build). NE_3 case in CUDA was the only one that performed worse, but not surprisingly, since its workload for the GPUs is much smaller. We've still got more optimization to do on this. The real surprise was how much worse the overall solve times were. This seems to be due simply to switching to the kokkos-based implementation. I'm wondering if there are any changes we can make in configuration or runtime arguments to help with PETSc's performance here. Any help looking into this would be appreciated. The tarballs linked here and here are profiling databases which, once extracted, can be viewed with hpcviewer. I don't know how helpful that will be, but hopefully it can give you some direction. Thanks for your help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: TotalSolve.png Type: image/png Size: 15036 bytes Desc: TotalSolve.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: RHSJacobian.png Type: image/png Size: 16082 bytes Desc: RHSJacobian.png URL: From junchao.zhang at gmail.com Mon Oct 2 09:52:41 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 2 Oct 2023 09:52:41 -0500 Subject: [petsc-users] Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Hi, Philip, I will look into the tarballs and get back to you. Thanks. --Junchao Zhang On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users < petsc-users at mcs.anl.gov> wrote: > We finally have xolotl ported to use the new COO interface and the > aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port > to our previous version (using MatSetValuesStencil and the default Mat and > Vec implementations), we expected to see an improvement in performance for > both the "serial" and "cuda" builds (here I'm referring to the kokkos > configuration). > > Attached are two plots that show timings for three different cases. All of > these were run on Ascent (the Summit-like training system) with 6 MPI tasks > (on a single node). The CUDA cases were given one GPU per task (and used > CUDA-aware MPI). 
The labels on the blue bars indicate speedup. In all cases > we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent > as possible. > > The performance of RHSJacobian (where the bulk of computation happens in > xolotl) behaved basically as expected (better than expected in the serial > build). NE_3 case in CUDA was the only one that performed worse, but not > surprisingly, since its workload for the GPUs is much smaller. We've still > got more optimization to do on this. > > The real surprise was how much worse the overall solve times were. This > seems to be due simply to switching to the kokkos-based implementation. I'm > wondering if there are any changes we can make in configuration or runtime > arguments to help with PETSc's performance here. Any help looking into this > would be appreciated. > > The tarballs linked here > > and here > > are profiling databases which, once extracted, can be viewed with > hpcviewer. I don't know how helpful that will be, but hopefully it can give > you some direction. > > Thanks for your help, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Tue Oct 3 05:05:22 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Tue, 3 Oct 2023 12:05:22 +0200 Subject: [petsc-users] Concatenation of local-to-global matrix Message-ID: <83F9C3F4-CA98-45F1-ADBB-EB58588B3AC0@corintis.com> I am trying to multiply a sequential PETsc matrix with an mpi PETSc matrix in parallel. The final step is to concatenate the product matrix, which is a local sequential PETSc matrix that is different for every proc, so that I get the full mpi matrix as a result. This has proven to work, but setting the values one by one using a loop is very inefficient and slow. In the following MFE, I am trying to make this concatenation more efficient by setting the values in batches. However it doesn?t work and I am wondering why: """Experimenting with PETSc mat-mat multiplication""" import time import numpy as np from colorama import Fore from firedrake import COMM_SELF, COMM_WORLD from firedrake.petsc import PETSc from mpi4py import MPI from numpy.testing import assert_array_almost_equal from utilities import Print size = COMM_WORLD.size rank = COMM_WORLD.rank def create_petsc_matrix(input_array, sparse=True): """Create a PETSc matrix from an input_array Args: input_array (np array): Input array partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
Returns: PETSc mat: PETSc matrix """ # Check if input_array is 1D and reshape if necessary assert len(input_array.shape) == 2, "Input array should be 2-dimensional" global_rows, global_cols = input_array.shape size = ((None, global_rows), (global_cols, global_cols)) # Create a sparse or dense matrix based on the 'sparse' argument if sparse: matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) else: matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) matrix.setUp() local_rows_start, local_rows_end = matrix.getOwnershipRange() for counter, i in enumerate(range(local_rows_start, local_rows_end)): # Calculate the correct row in the array for the current process row_in_array = counter + local_rows_start matrix.setValues( i, range(global_cols), input_array[row_in_array, :], addv=False ) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix def get_local_submatrix(A): """Get the local submatrix of A Args: A (mpi PETSc mat): partitioned PETSc matrix Returns: seq mat: PETSc matrix """ local_rows_start, local_rows_end = A.getOwnershipRange() local_rows = local_rows_end - local_rows_start comm = A.getComm() rows = PETSc.IS().createStride( local_rows, first=local_rows_start, step=1, comm=comm ) _, k = A.getSize() # Get the number of columns (k) from A's size cols = PETSc.IS().createStride(k, first=0, step=1, comm=comm) # Getting the local submatrix # TODO: To be replaced by MatMPIAIJGetLocalMat() in the future (see petsc-users mailing list). There is a missing petsc4py binding, need to add it myself (and please create a merge request) A_local = A.createSubMatrices(rows, cols)[0] return A_local def create_petsc_matrix_seq(input_array): """Building a sequential PETSc matrix from an array Args: input_array (np array): Input array Returns: seq mat: PETSc matrix """ assert len(input_array.shape) == 2 m, n = input_array.shape matrix = PETSc.Mat().createAIJ(size=(m, n), comm=COMM_SELF) matrix.setUp() matrix.setValues(range(m), range(n), input_array, addv=False) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix def multiply_matrices_seq(A_seq, B_seq): """Multiply 2 sequential matrices Args: A_seq (seqaij): local submatrix of A B_seq (seqaij): sequential matrix B Returns: seq mat: PETSc matrix that is the product of A_seq and B_seq """ _, A_seq_cols = A_seq.getSize() B_seq_rows, _ = B_seq.getSize() assert ( A_seq_cols == B_seq_rows ), f"Incompatible matrix sizes for multiplication: {A_seq_cols} != {B_seq_rows}" C_local = A_seq.matMult(B_seq) return C_local def concatenate_local_to_global_matrix(local_matrix, mat_type=None): """Create the global matrix C from the local submatrix local_matrix Args: local_matrix (seqaij): local submatrix of global_matrix partition_like (mpiaij): partitioned PETSc matrix mat_type (str): type of the global matrix. Defaults to None. If None, the type of local_matrix is used. 
Returns: mpi PETSc mat: partitioned PETSc matrix """ local_matrix_rows, local_matrix_cols = local_matrix.getSize() global_rows = COMM_WORLD.allreduce(local_matrix_rows, op=MPI.SUM) # Determine the local portion of the vector size = ((None, global_rows), (local_matrix_cols, local_matrix_cols)) if mat_type is None: mat_type = local_matrix.getType() if "dense" in mat_type: sparse = False else: sparse = True if sparse: global_matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) else: global_matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) global_matrix.setUp() # The exscan operation is used to get the starting global row for each process. # The result of the exclusive scan is the sum of the local rows from previous ranks. global_row_start = COMM_WORLD.exscan(local_matrix_rows, op=MPI.SUM) if rank == 0: global_row_start = 0 concatenate_start = time.time() # This works but is very inefficient # for i in range(local_matrix_rows): # cols, values = local_matrix.getRow(i) # global_row = i + global_row_start # global_matrix.setValues(global_row, cols, values) all_cols = [] all_values = [] all_global_rows = [i + global_row_start for i in range(local_matrix_rows)] for i in range(len(all_global_rows)): cols, values = local_matrix.getRow(i) # print(f"cols: {cols}, values: {values}") all_cols.append(cols) all_values.append(values) for j in range(local_matrix_cols): columns = [all_cols[i][j] for i in range(len(all_cols))] values = [all_values[i][j] for i in range(len(all_values))] global_matrix.setValues(all_global_rows, columns, values) concatenate_end = time.time() Print( f" -Setting values: {concatenate_end - concatenate_start: 2.2f} s", Fore.GREEN, ) global_matrix.assemblyBegin() global_matrix.assemblyEnd() return global_matrix # -------------------------------------------- # EXP: Multiplication of an mpi PETSc matrix with a sequential PETSc matrix # C = A * B # [m x k] = [m x k] * [k x k] # -------------------------------------------- m, k = 11, 7 # Generate the random numpy matrices np.random.seed(0) # sets the seed to 0 A_np = np.random.randint(low=0, high=6, size=(m, k)) B_np = np.random.randint(low=0, high=6, size=(k, k)) # Create B as a sequential matrix on each process B_seq = create_petsc_matrix_seq(B_np) A = create_petsc_matrix(A_np) # Getting the correct local submatrix to be multiplied by B_seq A_local = get_local_submatrix(A) # Multiplication of 2 sequential matrices C_local = multiply_matrices_seq(A_local, B_seq) # Creating the global C matrix C = concatenate_local_to_global_matrix(C_local) if size > 1 else C_local # -------------------------------------------- # TEST: Multiplication of 2 numpy matrices # -------------------------------------------- AB_np = np.dot(A_np, B_np) Print(f"MATRIX AB_np [{AB_np.shape[0]}x{AB_np.shape[1]}]") Print(f"{AB_np}") # Get the local values from C local_rows_start, local_rows_end = C.getOwnershipRange() C_local = C.getValues(range(local_rows_start, local_rows_end), range(k)) # Assert the correctness of the multiplication for the local subset assert_array_almost_equal(C_local, AB_np[local_rows_start:local_rows_end, :], decimal=5) Any idea how to fix this? Thanks, Thanos -------------- next part -------------- An HTML attachment was scrubbed... 
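One reason the batched loop above cannot work as written: petsc4py's Mat.setValues, when given a list of rows and a list of columns, addresses the full dense logical block rows x cols and expects len(rows)*len(cols) values in row-major order, while each AIJ row generally has a different set of nonzero columns, so column-wise batching cannot describe the sparsity pattern (and passes the wrong number of values). A minimal, self-contained illustration of that block semantics, with toy sizes and values that are not taken from the MFE:

import numpy as np
from petsc4py import PETSc

# Toy 4x6 sequential AIJ matrix, only to show the block semantics of setValues.
A = PETSc.Mat().createAIJ(size=(4, 6), comm=PETSc.COMM_SELF)
A.setUp()

# Rows [0, 1] and columns [2, 5] address the dense logical block {0,1} x {2,5}:
# exactly len(rows) * len(cols) = 4 values are expected, in row-major order,
# i.e. entries (0,2), (0,5), (1,2), (1,5).
A.setValues([0, 1], [2, 5], np.array([10.0, 11.0, 20.0, 21.0]))
A.assemble()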
URL: From bsmith at petsc.dev Tue Oct 3 08:07:36 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 3 Oct 2023 09:07:36 -0400 Subject: [petsc-users] Concatenation of local-to-global matrix In-Reply-To: <83F9C3F4-CA98-45F1-ADBB-EB58588B3AC0@corintis.com> References: <83F9C3F4-CA98-45F1-ADBB-EB58588B3AC0@corintis.com> Message-ID: Take a look at MatCreateMPIMatConcatenateSeqMat_MPIAIJ() in src/mat/impls/aij/mpi/mpiaij.c In that file you will find several routines similar to what you are building. Note the preallocation: MatPreallocateBegin(comm, m, n, dnz, onz); for (i = 0; i < m; i++) { PetscCall(MatGetRow_SeqAIJ(inmat, i, &nnz, &indx, NULL)); PetscCall(MatPreallocateSet(i + rstart, nnz, indx, dnz, onz)); PetscCall(MatRestoreRow_SeqAIJ(inmat, i, &nnz, &indx, NULL)); } ... PetscCall(MatSeqAIJSetPreallocation(*outmat, 0, dnz)); PetscCall(MatMPIAIJSetPreallocation(*outmat, 0, dnz, 0, onz)); Probably best to reuse the C code than have slower Python code. > On Oct 3, 2023, at 6:05 AM, Thanasis Boutsikakis wrote: > > I am trying to multiply a sequential PETsc matrix with an mpi PETSc matrix in parallel. The final step is to concatenate the product matrix, which is a local sequential PETSc matrix that is different for every proc, so that I get the full mpi matrix as a result. This has proven to work, but setting the values one by one using a loop is very inefficient and slow. > > In the following MFE, I am trying to make this concatenation more efficient by setting the values in batches. However it doesn?t work and I am wondering why: > > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import Print > > size = COMM_WORLD.size > rank = COMM_WORLD.rank > > def create_petsc_matrix(input_array, sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
> > Returns: > PETSc mat: PETSc matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > def get_local_submatrix(A): > """Get the local submatrix of A > > Args: > A (mpi PETSc mat): partitioned PETSc matrix > > Returns: > seq mat: PETSc matrix > """ > local_rows_start, local_rows_end = A.getOwnershipRange() > local_rows = local_rows_end - local_rows_start > comm = A.getComm() > rows = PETSc.IS().createStride( > local_rows, first=local_rows_start, step=1, comm=comm > ) > _, k = A.getSize() # Get the number of columns (k) from A's size > cols = PETSc.IS().createStride(k, first=0, step=1, comm=comm) > > # Getting the local submatrix > # TODO: To be replaced by MatMPIAIJGetLocalMat() in the future (see petsc-users mailing list). There is a missing petsc4py binding, need to add it myself (and please create a merge request) > A_local = A.createSubMatrices(rows, cols)[0] > return A_local > > > def create_petsc_matrix_seq(input_array): > """Building a sequential PETSc matrix from an array > > Args: > input_array (np array): Input array > > Returns: > seq mat: PETSc matrix > """ > assert len(input_array.shape) == 2 > > m, n = input_array.shape > matrix = PETSc.Mat().createAIJ(size=(m, n), comm=COMM_SELF) > matrix.setUp() > > matrix.setValues(range(m), range(n), input_array, addv=False) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > > def multiply_matrices_seq(A_seq, B_seq): > """Multiply 2 sequential matrices > > Args: > A_seq (seqaij): local submatrix of A > B_seq (seqaij): sequential matrix B > > Returns: > seq mat: PETSc matrix that is the product of A_seq and B_seq > """ > _, A_seq_cols = A_seq.getSize() > B_seq_rows, _ = B_seq.getSize() > assert ( > A_seq_cols == B_seq_rows > ), f"Incompatible matrix sizes for multiplication: {A_seq_cols} != {B_seq_rows}" > C_local = A_seq.matMult(B_seq) > return C_local > > > def concatenate_local_to_global_matrix(local_matrix, mat_type=None): > """Create the global matrix C from the local submatrix local_matrix > > Args: > local_matrix (seqaij): local submatrix of global_matrix > partition_like (mpiaij): partitioned PETSc matrix > mat_type (str): type of the global matrix. Defaults to None. If None, the type of local_matrix is used. 
> > Returns: > mpi PETSc mat: partitioned PETSc matrix > """ > local_matrix_rows, local_matrix_cols = local_matrix.getSize() > global_rows = COMM_WORLD.allreduce(local_matrix_rows, op=MPI.SUM) > > # Determine the local portion of the vector > size = ((None, global_rows), (local_matrix_cols, local_matrix_cols)) > > if mat_type is None: > mat_type = local_matrix.getType() > > if "dense" in mat_type: > sparse = False > else: > sparse = True > > if sparse: > global_matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > global_matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > global_matrix.setUp() > > # The exscan operation is used to get the starting global row for each process. > # The result of the exclusive scan is the sum of the local rows from previous ranks. > global_row_start = COMM_WORLD.exscan(local_matrix_rows, op=MPI.SUM) > if rank == 0: > global_row_start = 0 > > concatenate_start = time.time() > > # This works but is very inefficient > # for i in range(local_matrix_rows): > # cols, values = local_matrix.getRow(i) > # global_row = i + global_row_start > # global_matrix.setValues(global_row, cols, values) > > all_cols = [] > all_values = [] > all_global_rows = [i + global_row_start for i in range(local_matrix_rows)] > > for i in range(len(all_global_rows)): > cols, values = local_matrix.getRow(i) > # print(f"cols: {cols}, values: {values}") > all_cols.append(cols) > all_values.append(values) > > for j in range(local_matrix_cols): > columns = [all_cols[i][j] for i in range(len(all_cols))] > values = [all_values[i][j] for i in range(len(all_values))] > > global_matrix.setValues(all_global_rows, columns, values) > > concatenate_end = time.time() > Print( > f" -Setting values: {concatenate_end - concatenate_start: 2.2f} s", > Fore.GREEN, > ) > > global_matrix.assemblyBegin() > global_matrix.assemblyEnd() > > return global_matrix > > > # -------------------------------------------- > # EXP: Multiplication of an mpi PETSc matrix with a sequential PETSc matrix > # C = A * B > # [m x k] = [m x k] * [k x k] > # -------------------------------------------- > > m, k = 11, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, k)) > B_np = np.random.randint(low=0, high=6, size=(k, k)) > > # Create B as a sequential matrix on each process > B_seq = create_petsc_matrix_seq(B_np) > > A = create_petsc_matrix(A_np) > > # Getting the correct local submatrix to be multiplied by B_seq > A_local = get_local_submatrix(A) > > # Multiplication of 2 sequential matrices > C_local = multiply_matrices_seq(A_local, B_seq) > > # Creating the global C matrix > C = concatenate_local_to_global_matrix(C_local) if size > 1 else C_local > > # -------------------------------------------- > # TEST: Multiplication of 2 numpy matrices > # -------------------------------------------- > AB_np = np.dot(A_np, B_np) > Print(f"MATRIX AB_np [{AB_np.shape[0]}x{AB_np.shape[1]}]") > Print(f"{AB_np}") > > # Get the local values from C > local_rows_start, local_rows_end = C.getOwnershipRange() > C_local = C.getValues(range(local_rows_start, local_rows_end), range(k)) > > # Assert the correctness of the multiplication for the local subset > assert_array_almost_equal(C_local, AB_np[local_rows_start:local_rows_end, :], decimal=5) > > > > Any idea how to fix this? > > Thanks, > Thanos > -------------- next part -------------- An HTML attachment was scrubbed... 
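For the concatenation itself, an alternative to inserting values row by row is to extract each rank's local product in CSR form and hand it straight to the parallel constructor, which then handles preallocation internally. The sketch below is untested and assumes a petsc4py version that provides Mat.getValuesCSR() and the csr= argument of Mat.createAIJ(); it covers only the AIJ case (the dense branch of the MFE would need a separate path), and the names follow the MFE above.

from petsc4py import PETSc

def concatenate_local_to_global_csr(local_matrix, comm=PETSc.COMM_WORLD):
    """Stack per-rank sequential AIJ blocks (each local_rows x k) into one MPIAIJ matrix."""
    local_rows, k = local_matrix.getSize()

    # CSR of the local block; its column indices are already global column
    # indices because every local block spans the full width k.
    indptr, indices, data = local_matrix.getValuesCSR()

    # The local row count comes from this rank's block; PETSc decides the
    # column layout (None = PETSC_DECIDE).
    global_matrix = PETSc.Mat().createAIJ(
        size=((local_rows, None), (None, k)),
        csr=(indptr, indices, data),
        comm=comm,
    )
    global_matrix.assemble()
    return global_matrix

Because the ranks contribute their rows in rank order, this gives the same row distribution as the exscan-based loop in the MFE, without any setValues calls.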
From gongding at cn.cogenda.com  Tue Oct  3 12:51:38 2023
From: gongding at cn.cogenda.com (Gong Ding)
Date: Wed, 4 Oct 2023 01:51:38 +0800
Subject: [petsc-users] How to do a precondition in SNES flow
Message-ID: 

Hi all,

I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.

In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.

I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.

However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.

As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.

Any suggestions?

Gong Ding

From bsmith at petsc.dev  Tue Oct  3 13:13:32 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 3 Oct 2023 14:13:32 -0400
Subject: [petsc-users] How to do a precondition in SNES flow
In-Reply-To: 
References: 
Message-ID: <7A43650F-8584-4E0B-8689-68F73CA35C01@petsc.dev>

   Simply evaluate the Jacobian during your SNESComputeFunction and save it for SNESComputeJacobian.

> On Oct 3, 2023, at 1:51 PM, Gong Ding wrote:
>
> Hi all,
>
> I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.
>
> In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.
>
> I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.
>
> However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.
>
> As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.
>
> Any suggestions?
>
> Gong Ding

From gongding at cn.cogenda.com  Tue Oct  3 13:44:31 2023
From: gongding at cn.cogenda.com (Gong Ding)
Date: Wed, 4 Oct 2023 02:44:31 +0800
Subject: [petsc-users] How to do a precondition in SNES flow
In-Reply-To: <7A43650F-8584-4E0B-8689-68F73CA35C01@petsc.dev>
References: <7A43650F-8584-4E0B-8689-68F73CA35C01@petsc.dev>
Message-ID: 

Is there a better choice if I use right preconditioning instead?

Merging the Jacobian into the function evaluation costs performance.

On 2023/10/4 02:13, Barry Smith wrote:
>
>   Simply evaluate the Jacobian during your SNESComputeFunction and save it for SNESComputeJacobian.
>
>> On Oct 3, 2023, at 1:51 PM, Gong Ding wrote:
>>
>> Hi all,
>>
>> I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.
>>
>> In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.
>>
>> I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.
>>
>> However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.
>>
>> As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.
>>
>> Any suggestions?
>>
>> Gong Ding
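In code, Barry's suggestion amounts to forming the (preconditioned) Jacobian inside the residual callback and simply handing the cached matrix back when SNES asks for it. A minimal petsc4py sketch of that pattern; assemble_system and apply_my_preconditioner are hypothetical placeholders for the user's assembly routine and for the explicit P*J, P*rhs scaling discussed here, not PETSc API.

from petsc4py import PETSc

class CachedJacobianProblem:
    """Build the (preconditioned) Jacobian while evaluating the residual; reuse it later."""

    def __init__(self, J_template):
        # Cache with the same layout and nonzero pattern as the Jacobian.
        self.Jcache = J_template.duplicate()

    def residual(self, snes, x, F):
        assemble_system(x, F, self.Jcache)       # hypothetical user assembly of F and J
        apply_my_preconditioner(self.Jcache, F)  # hypothetical explicit scaling: P*J, P*F

    def jacobian(self, snes, x, J, P):
        # The work was already done in residual(); just copy the cached matrix out.
        self.Jcache.copy(J)
        if P.handle != J.handle:
            self.Jcache.copy(P)

# Usage sketch:
#   problem = CachedJacobianProblem(J)
#   snes.setFunction(problem.residual, F)
#   snes.setJacobian(problem.jacobian, J)

Note the trade-off raised in this thread: the residual may be evaluated more often than the Jacobian (for example during line searches), so caching this way can recompute the Jacobian more frequently than a plain SNESComputeJacobian would.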
From knepley at gmail.com  Tue Oct  3 13:47:49 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 3 Oct 2023 14:47:49 -0400
Subject: [petsc-users] How to do a precondition in SNES flow
In-Reply-To: 
References: 
Message-ID: 

On Tue, Oct 3, 2023 at 1:51 PM Gong Ding wrote:

> Hi all,
>
> I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.
>
> In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.
>
> I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.
>
What you are proposing is exactly what PETSc does with left preconditioning: it multiplies both sides by the preconditioner. What do you want to change?

  Thanks,

    Matt

> However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.
>
> As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.
>
> Any suggestions?
>
> Gong Ding
>
-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

From bsmith at petsc.dev  Tue Oct  3 13:57:09 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 3 Oct 2023 14:57:09 -0400
Subject: [petsc-users] How to do a precondition in SNES flow
In-Reply-To: 
References: <7A43650F-8584-4E0B-8689-68F73CA35C01@petsc.dev>
Message-ID: 

> On Oct 3, 2023, at 2:44 PM, Gong Ding wrote:
>
> Is there a better choice if I use right preconditioning instead?
>
> Merging the Jacobian into the function evaluation costs performance.
>
   Why? You should still compute the Jacobian only once, just earlier in the process.

> On 2023/10/4 02:13, Barry Smith wrote:
>>
>>   Simply evaluate the Jacobian during your SNESComputeFunction and save it for SNESComputeJacobian.
>>
>>> On Oct 3, 2023, at 1:51 PM, Gong Ding wrote:
>>>
>>> Hi all,
>>>
>>> I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.
>>>
>>> In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.
>>>
>>> I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.
>>>
>>> However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.
>>>
>>> As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.
>>>
>>> Any suggestions?
>>>
>>> Gong Ding

From gongding at cn.cogenda.com  Tue Oct  3 14:04:50 2023
From: gongding at cn.cogenda.com (Gong Ding)
Date: Wed, 4 Oct 2023 03:04:50 +0800
Subject: [petsc-users] How to do a precondition in SNES flow
In-Reply-To: 
References: 
Message-ID: <739f4aec-9da4-623d-8d48-973eeab4193e@cn.cogenda.com>

On 2023/10/4 02:47, Matthew Knepley wrote:
> On Tue, Oct 3, 2023 at 1:51 PM Gong Ding wrote:
>
> Hi all,
>
> I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.
>
> In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.
>
> I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.
>
> What you are proposing is exactly what PETSc does with left preconditioning: it multiplies both sides by the preconditioner. What do you want to change?

I'd like to multiply the preconditioning matrix into the Jacobian matrix and then do an LU factorization of the Jacobian, not use an iterative method. Something like

Kelley, C. T. "Newton's Method in Three Precisions." arXiv preprint arXiv:2307.16051 (2023).

BTW: does PETSc plan to support multi-precision?

>   Thanks,
>
>      Matt
>
> However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.
>
> As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.
>
> Any suggestions?
>
> Gong Ding

From knepley at gmail.com  Tue Oct  3 14:32:27 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 3 Oct 2023 15:32:27 -0400
Subject: [petsc-users] How to do a precondition in SNES flow
In-Reply-To: <739f4aec-9da4-623d-8d48-973eeab4193e@cn.cogenda.com>
References: <739f4aec-9da4-623d-8d48-973eeab4193e@cn.cogenda.com>
Message-ID: 

On Tue, Oct 3, 2023 at 3:05 PM Gong Ding wrote:

> On 2023/10/4 02:47, Matthew Knepley wrote:
>
> On Tue, Oct 3, 2023 at 1:51 PM Gong Ding wrote:
>
>> Hi all,
>>
>> I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.
>>
>> In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.
>>
>> I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.
>>
> What you are proposing is exactly what PETSc does with left preconditioning: it multiplies both sides by the preconditioner. What do you want to change?
>
> I'd like to multiply the preconditioning matrix into the Jacobian matrix and then do an LU factorization of the Jacobian, not use an iterative method. Something like
>
> Kelley, C. T. "Newton's Method in Three Precisions." arXiv preprint arXiv:2307.16051 (2023).
>
> BTW: does PETSc plan to support multi-precision?
>
1. Tim is just solving the Newton equation with LU. You can do this using -pc_type lu

2. We do not support this kind of multi-precision. We had a plan to do this, but no one to work on it. It does not seem to be a priority of users so far.

  Thanks,

    Matt

> Thanks,
>
> Matt
>
>> However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.
>>
>> As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.
>>
>> Any suggestions?
>>
>> Gong Ding
>>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsosajones at oakland.edu Tue Oct 3 14:55:01 2023 From: gsosajones at oakland.edu (Giselle Sosa Jones) Date: Tue, 3 Oct 2023 15:55:01 -0400 Subject: [petsc-users] Scalapack issue Message-ID: Hello, I have a Mac with M1 chip and I struggled a lot to install PETSc on it. I did it eventually (thanks to your help), but with the latest MacOS update, things stopped working. I am trying to configure the latest version of PETSc, and I have the following error popping up: Cannot use scalapack without Fortran, make sure you do NOT have --with-fc=0 I have gfortran installed with brew. I am going to send my configure.log file to the other mailing list. Thank you for your help in advance. Best, Giselle -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.kramer at imperial.ac.uk Tue Oct 3 23:30:38 2023 From: s.kramer at imperial.ac.uk (Stephan Kramer) Date: Wed, 4 Oct 2023 15:30:38 +1100 Subject: [petsc-users] performance regression with GAMG In-Reply-To: References: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> <99896e04-7ac2-9e92-0922-e78f2d0c710d@imperial.ac.uk> Message-ID: <0b512a75-d6ae-8a3f-1478-970b700c008a@imperial.ac.uk> Hi Mark Thanks again for re-enabling the square graph aggressive coarsening option which seems to have restored performance for most of our cases. Unfortunately we do have a remaining issue, which only seems to occur for the larger mesh size ("level 7" which has 6,389,890 vertices and we normally run on 1536 cpus): we either get a "Petsc has generated inconsistent data" error, or a hang - both when constructing the square graph matrix. So this is with the new -pc_gamg_aggressive_square_graph=true option, without the option there's no error but of course we would get back to the worse performance. Backtrace for the "inconsistent data" error. Note this is actually just petsc main from 17 Sep, git 9a75acf6e50cfe213617e - so after your merge of adams/gamg-add-old-coarsening into main - with one unrelated commit from firedrake [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Petsc has generated inconsistent data [0]PETSC ERROR: j 8 not equal to expected number of sends 9 [0]PETSC ERROR: Petsc Development GIT revision: v3.4.2-43104-ga3b76b71a1? GIT Date: 2023-09-18 10:26:04 +0100 [0]PETSC ERROR: stokes_cubed_sphere_7e3_A3_TS1.py on a? named gadi-cpu-clx-0241.gadi.nci.org.au by sck551 Wed Oct? 
4 14:30:41 2023 [0]PETSC ERROR: Configure options --prefix=/tmp/firedrake-prefix --with-make-np=4 --with-debugging=0 --with-shared-libraries=1 --with-fortran-bindings=0 --with-zlib --with-c2html=0 --with-mpiexec=mpiexec --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --download-hdf5 --download-hypre --download-superlu_dist --download-ptscotch --download-suitesparse --download-pastix --download-hwloc --download-metis --download-scalapack --download-mumps --download-chaco --download-ml CFLAGS=-diag-disable=10441 CXXFLAGS=-diag-disable=10441 [0]PETSC ERROR: #1 PetscGatherMessageLengths2() at /jobfs/95504034.gadi-pbs/petsc/src/sys/utils/mpimesg.c:270 [0]PETSC ERROR: #2 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1867 [0]PETSC ERROR: #3 MatProductSymbolic_AtB_MPIAIJ_MPIAIJ() at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 [0]PETSC ERROR: #4 MatProductSymbolic() at /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 [0]PETSC ERROR: #5 PCGAMGSquareGraph_GAMG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 [0]PETSC ERROR: #6 PCGAMGCoarsen_AGG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 [0]PETSC ERROR: #7 PCSetUp_GAMG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 [0]PETSC ERROR: #8 PCSetUp() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 [0]PETSC ERROR: #9 PCApply() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 [0]PETSC ERROR: #10 PCApply() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 [0]PETSC ERROR: #11 KSP_PCApply() at /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 [0]PETSC ERROR: #12 KSPSolve_CG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 [0]PETSC ERROR: #13 KSPSolve_Private() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 [0]PETSC ERROR: #14 KSPSolve() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 [0]PETSC ERROR: #15 PCApply_FieldSplit_Schur() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1175 [0]PETSC ERROR: #16 PCApply() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 [0]PETSC ERROR: #17 KSP_PCApply() at /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 [0]PETSC ERROR: #18 KSPSolve_PREONLY() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/preonly/preonly.c:25 [0]PETSC ERROR: #19 KSPSolve_Private() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 [0]PETSC ERROR: #20 KSPSolve() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 [0]PETSC ERROR: #21 SNESSolve_KSPONLY() at /jobfs/95504034.gadi-pbs/petsc/src/snes/impls/ksponly/ksponly.c:49 [0]PETSC ERROR: #22 SNESSolve() at /jobfs/95504034.gadi-pbs/petsc/src/snes/interface/snes.c:4635 Last -info :pc messages: [0] PCSetUp(): Setting up PC for first time [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: level 0) N=152175366, n data rows=3, n data cols=6, nnz/row (ave)=191, np=1536 [0] PCGAMGCreateGraph_AGG(): Filtering left 100. 
% edges in graph (1.588710e+07 1.765233e+06) [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: Square Graph on level 1 [0] fixAggregatesWithSquare(): isMPI = yes [0] PCGAMGProlongator_AGG(): Stokes_fieldsplit_0_assembled_: New grid 380144 nodes [0] PCGAMGOptProlongator_AGG(): Stokes_fieldsplit_0_assembled_: Smooth P0: max eigen=4.489376e+00 min=9.015236e-02 PC=jacobi [0] PCGAMGOptProlongator_AGG(): Stokes_fieldsplit_0_assembled_: Smooth P0: level 0, cache spectra 0.0901524 4.48938 [0] PCGAMGCreateLevel_GAMG(): Stokes_fieldsplit_0_assembled_: Coarse grid reduction from 1536 to 1536 active processes [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: 1) N=2280864, n data cols=6, nnz/row (ave)=503, 1536 active pes [0] PCGAMGCreateGraph_AGG(): Filtering left 36.2891 % edges in graph (5.310360e+05 5.353000e+03) [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: Square Graph on level 2 The hang (on a slightly different model configuration but on the same mesh and n/o cores) seems to occur in the same location. If I use gdb to attach to the running processes, it seems on some cores it has somehow manages to fall out of the pcsetup and is waiting in the first norm calculation in the outside CG iteration: #0? 0x000014cce9999119 in hmca_bcol_basesmuma_bcast_k_nomial_knownroot_progress () from /apps/hcoll/4.7.3202/lib/hcoll/hmca_bcol_basesmuma.so #1? 0x000014ccef2c2737 in _coll_ml_allreduce () from /apps/hcoll/4.7.3202/lib/libhcoll.so.1 #2? 0x000014ccef5dd95b in mca_coll_hcoll_allreduce (sbuf=0x1, rbuf=0x7fff74ecbee8, count=1, dtype=0x14cd26ce6f80 , op=0x14cd26cfbc20 , comm=0x3076fb0, module=0x43a0110) at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/mca/coll/hcoll/coll_hcoll_ops.c:228 #3? 0x000014cd26a1de28 in PMPI_Allreduce (sendbuf=0x1, recvbuf=, count=1, datatype=, op=0x14cd26cfbc20 , comm=0x3076fb0) at pallreduce.c:113 #4? 0x000014cd271c9889 in VecNorm_MPI_Default (xin=, type=, z=, VecNorm_SeqFn=) at /jobfs/95504034.gadi-pbs/petsc/include/../src/vec/vec/impls/mpi/pvecimpl.h:168 #5? VecNorm_MPI (xin=0x14ccee1ddb80, type=3924123648, z=0x22d) at /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/impls/mpi/pvec2.c:39 #6? 0x000014cd2718cddd in VecNorm (x=0x14ccee1ddb80, type=3924123648, val=0x22d) at /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/interface/rvector.c:214 #7? 0x000014cd27f5a0b9 in KSPSolve_CG (ksp=0x14ccee1ddb80) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:163 etc. but with other cores still stuck at: #0? 0x000015375cf41e8a in ucp_worker_progress () from /apps/ucx/1.12.0/lib/libucp.so.0 #1? 0x000015377d4bd57b in opal_progress () at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/runtime/opal_progress.c:231 #2? 0x000015377d4c3ba5 in ompi_sync_wait_mt (sync=sync at entry=0x7ffd6aedf6f0) at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/threads/wait_sync.c:85 #3? 0x000015378bf7cf38 in ompi_request_default_wait_any (count=8, requests=0x8d465a0, index=0x7ffd6aedfa60, status=0x7ffd6aedfa10) at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/request/req_wait.c:124 #4? 0x000015378bfc1b4b in PMPI_Waitany (count=8, requests=0x8d465a0, indx=0x7ffd6aedfa60, status=) at pwaitany.c:86 #5? 0x000015378c88ef2c in MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ (P=0x2cc7500, A=0x1, fill=2.1219957934356005e-314, C=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1884 #6? 
0x000015378c88dd4f in MatProductSymbolic_AtB_MPIAIJ_MPIAIJ (C=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 #7? 0x000015378cc665b8 in MatProductSymbolic (mat=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 #8? 0x000015378d294473 in PCGAMGSquareGraph_GAMG (a_pc=0x2cc7500, Gmat1=0x1, Gmat2=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 #9? 0x000015378d27b83e in PCGAMGCoarsen_AGG (a_pc=0x2cc7500, a_Gmat1=0x1, agg_lists=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 #10 0x000015378d294c73 in PCSetUp_GAMG (pc=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 #11 0x000015378d215721 in PCSetUp (pc=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 #12 0x000015378d216b82 in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 #13 0x000015378eb91b2f in __pyx_pw_8petsc4py_5PETSc_2PC_45apply (__pyx_v_self=0x2cc7500, __pyx_args=0x1, __pyx_nargs=3237876524, __pyx_kwds=0x1) at src/petsc4py/PETSc.c:259082 #14 0x000015379e0a69f7 in method_vectorcall_FASTCALL_KEYWORDS (func=0x15378f302890, args=0x83b3218, nargsf=, kwnames=) at ../Objects/descrobject.c:405 #15 0x000015379e11d435 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, args=0x83b3218, callable=0x15378f302890, tstate=0x23e0020) at ../Include/cpython/abstract.h:114 #16 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x83b3218, callable=0x15378f302890) at ../Include/cpython/abstract.h:123 #17 call_function (kwnames=0x0, oparg=, pp_stack=, trace_info=0x7ffd6aee0390, tstate=) at ../Python/ceval.c:5867 #18 _PyEval_EvalFrameDefault (tstate=, f=, throwflag=) at ../Python/ceval.c:4198 #19 0x000015379e11b63b in _PyEval_EvalFrame (throwflag=0, f=0x83b3080, tstate=0x23e0020) at ../Include/internal/pycore_ceval.h:46 #20 _PyEval_Vector (tstate=, con=, locals=, args=, argcount=4, kwnames=) at ../Python/ceval.c:5065 #21 0x000015378ee1e057 in __Pyx_PyObject_FastCallDict (func=, args=0x1, _nargs=, kwargs=) at src/petsc4py/PETSc.c:548022 #22 __pyx_f_8petsc4py_5PETSc_PCApply_Python (__pyx_v_pc=0x2cc7500, __pyx_v_x=0x1, __pyx_v_y=0xc0fe132c) at src/petsc4py/PETSc.c:31979 #23 0x000015378d216cba in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 #24 0x000015378d4d153c in KSP_PCApply (ksp=0x2cc7500, x=0x1, y=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 #25 0x000015378d4d1097 in KSPSolve_CG (ksp=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 Let me know if there is anything further we can try to debug this issue Kind regards Stephan Kramer On 02/09/2023 01:58, Mark Adams wrote: > Fantastic! > > I fixed a memory free problem. You should be OK now. > I am pretty sure you are good but I would like to wait to get any feedback > from you. > We should have a release at the end of the month and it would be nice to > get this into it. > > Thanks, > Mark > > > On Fri, Sep 1, 2023 at 7:07?AM Stephan Kramer > wrote: > >> Hi Mark >> >> Sorry took a while to report back. We have tried your branch but hit a >> few issues, some of which we're not entirely sure are related. 
>> >> First switching off minimum degree ordering, and then switching to the >> old version of aggressive coarsening, as you suggested, got us back to >> the coarsening behaviour that we had previously, but then we also >> observed an even further worsening of the iteration count: it had >> previously gone up by 50% already (with the newer main petsc), but now >> was more than double "old" petsc. Took us a while to realize this was >> due to the default smoother changing from Cheby+SOR to Cheby+Jacobi. >> Switching this also back to the old default we get back to very similar >> coarsening levels (see below for more details if it is of interest) and >> iteration counts. >> >> So that's all very good news. However, we were also starting seeing >> memory errors (double free or corruption) when we switched off the >> minimum degree ordering. Because this was at an earlier version of your >> branch we then rebuild, hoping this was just an earlier bug that had >> been fixed, but then we were having MPI-lockup issues. We have now >> figured out the MPI issues are completely unrelated - some combination >> with a newer mpi build and firedrake on our cluster which also occur >> using main branches of everything. So switching back to an older MPI >> build we are hoping to now test your most recent version of >> adams/gamg-add-old-coarsening with these options and see whether the >> memory errors are still there. Will let you know >> >> Best wishes >> Stephan Kramer >> >> Coarsening details with various options for Level 6 of the test case: >> >> In our original setup (using "old" petsc), we had: >> >> rows=516, cols=516, bs=6 >> rows=12660, cols=12660, bs=6 >> rows=346974, cols=346974, bs=6 >> rows=19169670, cols=19169670, bs=3 >> >> Then with the newer main petsc we had >> >> rows=666, cols=666, bs=6 >> rows=7740, cols=7740, bs=6 >> rows=34902, cols=34902, bs=6 >> rows=736578, cols=736578, bs=6 >> rows=19169670, cols=19169670, bs=3 >> >> Then on your branch with minimum_degree_ordering False: >> >> rows=504, cols=504, bs=6 >> rows=2274, cols=2274, bs=6 >> rows=11010, cols=11010, bs=6 >> rows=35790, cols=35790, bs=6 >> rows=430686, cols=430686, bs=6 >> rows=19169670, cols=19169670, bs=3 >> >> And with minimum_degree_ordering False and use_aggressive_square_graph >> True: >> >> rows=498, cols=498, bs=6 >> rows=12672, cols=12672, bs=6 >> rows=346974, cols=346974, bs=6 >> rows=19169670, cols=19169670, bs=3 >> >> So that is indeed pretty much back to what it was before >> >> >> >> >> >> >> >> >> On 31/08/2023 23:40, Mark Adams wrote: >>> Hi Stephan, >>> >>> This branch is settling down. adams/gamg-add-old-coarsening >>> >>> I made the old, not minimum degree, ordering the default but kept the new >>> "aggressive" coarsening as the default, so I am hoping that just adding >>> "-pc_gamg_use_aggressive_square_graph true" to your regression tests will >>> get you back to where you were before. >>> Fingers crossed ... let me know if you have any success or not. >>> >>> Thanks, >>> Mark >>> >>> >>> On Tue, Aug 15, 2023 at 1:45?PM Mark Adams wrote: >>> >>>> Hi Stephan, >>>> >>>> I have a branch that you can try: adams/gamg-add-old-coarsening >>>> >>> Things to test: >>>> * First, verify that nothing unintended changed by reproducing your bad >>>> results with this branch (the defaults are the same) >>>> * Try not using the minimum degree ordering that I suggested >>>> with: -pc_gamg_use_minimum_degree_ordering false >>>> -- I am eager to see if that is the main problem. 
>>>> * Go back to what I think is the old method: >>>> -pc_gamg_use_minimum_degree_ordering >>>> false -pc_gamg_use_aggressive_square_graph true >>>> >>>> When we get back to where you were, I would like to try to get modern >>>> stuff working. >>>> I did add a -pc_gamg_aggressive_mis_k <2> >>>> You could to another step of MIS coarsening with >> -pc_gamg_aggressive_mis_k >>>> 3 >>>> >>>> Anyway, lots to look at but, alas, AMG does have a lot of parameters. >>>> >>>> Thanks, >>>> Mark >>>> >>>> On Mon, Aug 14, 2023 at 4:26?PM Mark Adams wrote: >>>> >>>>> On Mon, Aug 14, 2023 at 11:03?AM Stephan Kramer < >> s.kramer at imperial.ac.uk> >>>>> wrote: >>>>> >>>>>> Many thanks for looking into this, Mark >>>>>>> My 3D tests were not that different and I see you lowered the >>>>>> threshold. >>>>>>> Note, you can set the threshold to zero, but your test is running so >>>>>> much >>>>>>> differently than mine there is something else going on. >>>>>>> Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot >>>>>> for >>>>>>> in 3D. >>>>>>> >>>>>>> So it is not clear what the problem is. Some questions: >>>>>>> >>>>>>> * do you have a picture of this mesh to show me? >>>>>> It's just a standard hexahedral cubed sphere mesh with the refinement >>>>>> level giving the number of times each of the six sides have been >>>>>> subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to 16 >>>>>> layers. So the total number of elements at Level_5 is 6 x 32 x 32 x >> 16 = >>>>>> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) >> going >>>>>> to the next Level >>>>>> >>>>> I see, and I assume these are pretty stretched elements. >>>>> >>>>> >>>>>>> * what do you mean by Q1-Q2 elements? >>>>>> Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for velocity >>>>>> and (tri)linear for pressure >>>>>> >>>>>> I guess you could argue we could/should just do good old geometric >>>>>> multigrid instead. More generally we do use this solver configuration >> a >>>>>> lot for tetrahedral Taylor Hood (P2-P1) in particular also for our >>>>>> adaptive mesh runs - would it be worth to see if we have the same >>>>>> performance issues with tetrahedral P2-P1? >>>>>> >>>>> No, you have a clear reproducer, if not minimal. >>>>> The first coarsening is very different. >>>>> >>>>> I am working on this and I see that I added a heuristic for thin bodies >>>>> where you order the vertices in greedy algorithms with minimum degree >> first. >>>>> This will tend to pick corners first, edges then faces, etc. >>>>> That may be the problem. I would like to understand it better (see >> below). >>>>> >>>>> >>>>>>> It would be nice to see if the new and old codes are similar without >>>>>>> aggressive coarsening. >>>>>>> This was the intended change of the major change in this time frame >> as >>>>>> you >>>>>>> noticed. >>>>>>> If these jobs are easy to run, could you check that the old and new >>>>>>> versions are similar with "-pc_gamg_square_graph 0 ", ( and you >> only >>>>>> need >>>>>>> one time step). >>>>>>> All you need to do is check that the first coarse grid has about the >>>>>> same >>>>>>> number of equations (large). >>>>>> Unfortunately we're seeing some memory errors when we use this option, >>>>>> and I'm not entirely clear whether we're just running out of memory >> and >>>>>> need to put it on a special queue. 
>>>>>> >>>>>> The run with square_graph 0 using new PETSc managed to get through one >>>>>> solve at level 5, and is giving the following mg levels: >>>>>> >>>>>> rows=174, cols=174, bs=6 >>>>>> total: nonzeros=30276, allocated nonzeros=30276 >>>>>> -- >>>>>> rows=2106, cols=2106, bs=6 >>>>>> total: nonzeros=4238532, allocated nonzeros=4238532 >>>>>> -- >>>>>> rows=21828, cols=21828, bs=6 >>>>>> total: nonzeros=62588232, allocated nonzeros=62588232 >>>>>> -- >>>>>> rows=589824, cols=589824, bs=6 >>>>>> total: nonzeros=1082528928, allocated nonzeros=1082528928 >>>>>> -- >>>>>> rows=2433222, cols=2433222, bs=3 >>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 >>>>>> >>>>>> comparing with square_graph 100 with new PETSc >>>>>> >>>>>> rows=96, cols=96, bs=6 >>>>>> total: nonzeros=9216, allocated nonzeros=9216 >>>>>> -- >>>>>> rows=1440, cols=1440, bs=6 >>>>>> total: nonzeros=647856, allocated nonzeros=647856 >>>>>> -- >>>>>> rows=97242, cols=97242, bs=6 >>>>>> total: nonzeros=65656836, allocated nonzeros=65656836 >>>>>> -- >>>>>> rows=2433222, cols=2433222, bs=3 >>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 >>>>>> >>>>>> and old PETSc with square_graph 100 >>>>>> >>>>>> rows=90, cols=90, bs=6 >>>>>> total: nonzeros=8100, allocated nonzeros=8100 >>>>>> -- >>>>>> rows=1872, cols=1872, bs=6 >>>>>> total: nonzeros=1234080, allocated nonzeros=1234080 >>>>>> -- >>>>>> rows=47652, cols=47652, bs=6 >>>>>> total: nonzeros=23343264, allocated nonzeros=23343264 >>>>>> -- >>>>>> rows=2433222, cols=2433222, bs=3 >>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 >>>>>> -- >>>>>> >>>>>> Unfortunately old PETSc with square_graph 0 did not complete a single >>>>>> solve before giving the memory error >>>>>> >>>>> OK, thanks for trying. >>>>> >>>>> I am working on this and I will give you a branch to test, but if you >> can >>>>> rebuild PETSc here is a quick test that might fix your problem. >>>>> In src/ksp/pc/impls/gamg/agg.c you will see: >>>>> >>>>> PetscCall(PetscSortIntWithArray(nloc, degree, permute)); >>>>> >>>>> If you can comment this out in the new code and compare with the old, >>>>> that might fix the problem. >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>>>> BTW, I am starting to think I should add the old method back as an >>>>>> option. >>>>>>> I did not think this change would cause large differences. >>>>>> Yes, I think that would be much appreciated. Let us know if we can do >>>>>> any testing >>>>>> >>>>>> Best wishes >>>>>> Stephan >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Note that we are providing the rigid body near nullspace, >>>>>>>> hence the bs=3 to bs=6. >>>>>>>> We have tried different values for the gamg_threshold but it doesn't >>>>>>>> really seem to significantly alter the coarsening amount in that >> first >>>>>>>> step. >>>>>>>> >>>>>>>> Do you have any suggestions for further things we should try/look >> at? 
>>>>>>>> Any feedback would be much appreciated >>>>>>>> >>>>>>>> Best wishes >>>>>>>> Stephan Kramer >>>>>>>> >>>>>>>> Full logs including log_view timings available from >>>>>>>> https://github.com/stephankramer/petsc-scaling/ >>>>>>>> >>>>>>>> In particular: >>>>>>>> >>>>>>>> >>>>>>>> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat >> From mfadams at lbl.gov Wed Oct 4 09:11:46 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 4 Oct 2023 10:11:46 -0400 Subject: [petsc-users] performance regression with GAMG In-Reply-To: <0b512a75-d6ae-8a3f-1478-970b700c008a@imperial.ac.uk> References: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> <99896e04-7ac2-9e92-0922-e78f2d0c710d@imperial.ac.uk> <0b512a75-d6ae-8a3f-1478-970b700c008a@imperial.ac.uk> Message-ID: Thanks Stephan, It looks like the matrix is in a bad/incorrect state and parallel Mat-Mat is waiting for messages that were not sent. A bug. Can you try my branch, which is ready to merge, adams/gamg-fast-filter. We added a new filtering method in main that uses low memory but I found it was slow, so this branch brings back the old filter code, used by default, and keeps the low memory version as an option. It is possible this low memory filtering messed up the internals of the Mat in some way. I hope this is it, but if not we can continue. This MR also makes square graph the default. I have found it does create better aggregates and on GPUs, with Kokkos bug fixes from Junchao, Mat-Mat is fast. (it might be slow on CPUs) Mark On Wed, Oct 4, 2023 at 12:30?AM Stephan Kramer wrote: > Hi Mark > > Thanks again for re-enabling the square graph aggressive coarsening > option which seems to have restored performance for most of our cases. > Unfortunately we do have a remaining issue, which only seems to occur > for the larger mesh size ("level 7" which has 6,389,890 vertices and we > normally run on 1536 cpus): we either get a "Petsc has generated > inconsistent data" error, or a hang - both when constructing the square > graph matrix. So this is with the new > -pc_gamg_aggressive_square_graph=true option, without the option there's > no error but of course we would get back to the worse performance. > > Backtrace for the "inconsistent data" error. 
Note this is actually just > petsc main from 17 Sep, git 9a75acf6e50cfe213617e - so after your merge > of adams/gamg-add-old-coarsening into main - with one unrelated commit > from firedrake > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: j 8 not equal to expected number of sends 9 > [0]PETSC ERROR: Petsc Development GIT revision: > v3.4.2-43104-ga3b76b71a1 GIT Date: 2023-09-18 10:26:04 +0100 > [0]PETSC ERROR: stokes_cubed_sphere_7e3_A3_TS1.py on a named > gadi-cpu-clx-0241.gadi.nci.org.au by sck551 Wed Oct 4 14:30:41 2023 > [0]PETSC ERROR: Configure options --prefix=/tmp/firedrake-prefix > --with-make-np=4 --with-debugging=0 --with-shared-libraries=1 > --with-fortran-bindings=0 --with-zlib --with-c2html=0 > --with-mpiexec=mpiexec --with-cc=mpicc --with-cxx=mpicxx > --with-fc=mpifort --download-hdf5 --download-hypre > --download-superlu_dist --download-ptscotch --download-suitesparse > --download-pastix --download-hwloc --download-metis --download-scalapack > --download-mumps --download-chaco --download-ml > CFLAGS=-diag-disable=10441 CXXFLAGS=-diag-disable=10441 > [0]PETSC ERROR: #1 PetscGatherMessageLengths2() at > /jobfs/95504034.gadi-pbs/petsc/src/sys/utils/mpimesg.c:270 > [0]PETSC ERROR: #2 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() at > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1867 > [0]PETSC ERROR: #3 MatProductSymbolic_AtB_MPIAIJ_MPIAIJ() at > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 > [0]PETSC ERROR: #4 MatProductSymbolic() at > /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 > [0]PETSC ERROR: #5 PCGAMGSquareGraph_GAMG() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 > [0]PETSC ERROR: #6 PCGAMGCoarsen_AGG() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 > [0]PETSC ERROR: #7 PCSetUp_GAMG() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 > [0]PETSC ERROR: #8 PCSetUp() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 > [0]PETSC ERROR: #9 PCApply() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 > [0]PETSC ERROR: #10 PCApply() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 > [0]PETSC ERROR: #11 KSP_PCApply() at > /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 > [0]PETSC ERROR: #12 KSPSolve_CG() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 > [0]PETSC ERROR: #13 KSPSolve_Private() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 > [0]PETSC ERROR: #14 KSPSolve() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 > [0]PETSC ERROR: #15 PCApply_FieldSplit_Schur() at > > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1175 > [0]PETSC ERROR: #16 PCApply() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 > [0]PETSC ERROR: #17 KSP_PCApply() at > /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 > [0]PETSC ERROR: #18 KSPSolve_PREONLY() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/preonly/preonly.c:25 > [0]PETSC ERROR: #19 KSPSolve_Private() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 > [0]PETSC ERROR: #20 KSPSolve() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 > [0]PETSC ERROR: #21 SNESSolve_KSPONLY() at > 
/jobfs/95504034.gadi-pbs/petsc/src/snes/impls/ksponly/ksponly.c:49 > [0]PETSC ERROR: #22 SNESSolve() at > /jobfs/95504034.gadi-pbs/petsc/src/snes/interface/snes.c:4635 > > Last -info :pc messages: > > [0] PCSetUp(): Setting up PC for first time > [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: level 0) > N=152175366, n data rows=3, n data cols=6, nnz/row (ave)=191, np=1536 > [0] PCGAMGCreateGraph_AGG(): Filtering left 100. % edges in > graph (1.588710e+07 1.765233e+06) > [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: > Square Graph on level 1 > [0] fixAggregatesWithSquare(): isMPI = yes > [0] PCGAMGProlongator_AGG(): Stokes_fieldsplit_0_assembled_: > New grid 380144 nodes > [0] PCGAMGOptProlongator_AGG(): > Stokes_fieldsplit_0_assembled_: Smooth P0: max eigen=4.489376e+00 > min=9.015236e-02 PC=jacobi > [0] PCGAMGOptProlongator_AGG(): > Stokes_fieldsplit_0_assembled_: Smooth P0: level 0, cache spectra > 0.0901524 4.48938 > [0] PCGAMGCreateLevel_GAMG(): Stokes_fieldsplit_0_assembled_: > Coarse grid reduction from 1536 to 1536 active processes > [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: 1) > N=2280864, n data cols=6, nnz/row (ave)=503, 1536 active pes > [0] PCGAMGCreateGraph_AGG(): Filtering left 36.2891 % edges in > graph (5.310360e+05 5.353000e+03) > [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: > Square Graph on level 2 > > The hang (on a slightly different model configuration but on the same > mesh and n/o cores) seems to occur in the same location. If I use gdb to > attach to the running processes, it seems on some cores it has somehow > manages to fall out of the pcsetup and is waiting in the first norm > calculation in the outside CG iteration: > > #0 0x000014cce9999119 in > hmca_bcol_basesmuma_bcast_k_nomial_knownroot_progress () from > /apps/hcoll/4.7.3202/lib/hcoll/hmca_bcol_basesmuma.so > #1 0x000014ccef2c2737 in _coll_ml_allreduce () from > /apps/hcoll/4.7.3202/lib/libhcoll.so.1 > #2 0x000014ccef5dd95b in mca_coll_hcoll_allreduce (sbuf=0x1, > rbuf=0x7fff74ecbee8, count=1, dtype=0x14cd26ce6f80 , > op=0x14cd26cfbc20 , comm=0x3076fb0, module=0x43a0110) > at > > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/mca/coll/hcoll/coll_hcoll_ops.c:228 > #3 0x000014cd26a1de28 in PMPI_Allreduce (sendbuf=0x1, > recvbuf=, count=1, datatype=, > op=0x14cd26cfbc20 , comm=0x3076fb0) at pallreduce.c:113 > #4 0x000014cd271c9889 in VecNorm_MPI_Default (xin=, > type=, z=, VecNorm_SeqFn=) > at > > /jobfs/95504034.gadi-pbs/petsc/include/../src/vec/vec/impls/mpi/pvecimpl.h:168 > #5 VecNorm_MPI (xin=0x14ccee1ddb80, type=3924123648, z=0x22d) at > /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/impls/mpi/pvec2.c:39 > #6 0x000014cd2718cddd in VecNorm (x=0x14ccee1ddb80, type=3924123648, > val=0x22d) at > /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/interface/rvector.c:214 > #7 0x000014cd27f5a0b9 in KSPSolve_CG (ksp=0x14ccee1ddb80) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:163 > etc. 
> > but with other cores still stuck at: > > #0 0x000015375cf41e8a in ucp_worker_progress () from > /apps/ucx/1.12.0/lib/libucp.so.0 > #1 0x000015377d4bd57b in opal_progress () at > > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/runtime/opal_progress.c:231 > #2 0x000015377d4c3ba5 in ompi_sync_wait_mt > (sync=sync at entry=0x7ffd6aedf6f0) at > > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/threads/wait_sync.c:85 > #3 0x000015378bf7cf38 in ompi_request_default_wait_any (count=8, > requests=0x8d465a0, index=0x7ffd6aedfa60, status=0x7ffd6aedfa10) at > > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/request/req_wait.c:124 > #4 0x000015378bfc1b4b in PMPI_Waitany (count=8, requests=0x8d465a0, > indx=0x7ffd6aedfa60, status=) at pwaitany.c:86 > #5 0x000015378c88ef2c in MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ > (P=0x2cc7500, A=0x1, fill=2.1219957934356005e-314, C=0xc0fe132c) at > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1884 > #6 0x000015378c88dd4f in MatProductSymbolic_AtB_MPIAIJ_MPIAIJ > (C=0x2cc7500) at > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 > #7 0x000015378cc665b8 in MatProductSymbolic (mat=0x2cc7500) at > /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 > #8 0x000015378d294473 in PCGAMGSquareGraph_GAMG (a_pc=0x2cc7500, > Gmat1=0x1, Gmat2=0xc0fe132c) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 > #9 0x000015378d27b83e in PCGAMGCoarsen_AGG (a_pc=0x2cc7500, > a_Gmat1=0x1, agg_lists=0xc0fe132c) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 > #10 0x000015378d294c73 in PCSetUp_GAMG (pc=0x2cc7500) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 > #11 0x000015378d215721 in PCSetUp (pc=0x2cc7500) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 > #12 0x000015378d216b82 in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 > #13 0x000015378eb91b2f in __pyx_pw_8petsc4py_5PETSc_2PC_45apply > (__pyx_v_self=0x2cc7500, __pyx_args=0x1, __pyx_nargs=3237876524, > __pyx_kwds=0x1) at src/petsc4py/PETSc.c:259082 > #14 0x000015379e0a69f7 in method_vectorcall_FASTCALL_KEYWORDS > (func=0x15378f302890, args=0x83b3218, nargsf=, > kwnames=) at ../Objects/descrobject.c:405 > #15 0x000015379e11d435 in _PyObject_VectorcallTstate (kwnames=0x0, > nargsf=, args=0x83b3218, callable=0x15378f302890, > tstate=0x23e0020) at ../Include/cpython/abstract.h:114 > #16 PyObject_Vectorcall (kwnames=0x0, nargsf=, > args=0x83b3218, callable=0x15378f302890) at > ../Include/cpython/abstract.h:123 > #17 call_function (kwnames=0x0, oparg=, > pp_stack=, trace_info=0x7ffd6aee0390, > tstate=) at ../Python/ceval.c:5867 > #18 _PyEval_EvalFrameDefault (tstate=, f=, > throwflag=) at ../Python/ceval.c:4198 > #19 0x000015379e11b63b in _PyEval_EvalFrame (throwflag=0, f=0x83b3080, > tstate=0x23e0020) at ../Include/internal/pycore_ceval.h:46 > #20 _PyEval_Vector (tstate=, con=, > locals=, args=, argcount=4, > kwnames=) at ../Python/ceval.c:5065 > #21 0x000015378ee1e057 in __Pyx_PyObject_FastCallDict (func= out>, args=0x1, _nargs=, kwargs=) at > src/petsc4py/PETSc.c:548022 > #22 __pyx_f_8petsc4py_5PETSc_PCApply_Python (__pyx_v_pc=0x2cc7500, > __pyx_v_x=0x1, __pyx_v_y=0xc0fe132c) at src/petsc4py/PETSc.c:31979 > #23 0x000015378d216cba in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 > #24 
0x000015378d4d153c in KSP_PCApply (ksp=0x2cc7500, x=0x1, > y=0xc0fe132c) at > /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 > #25 0x000015378d4d1097 in KSPSolve_CG (ksp=0x2cc7500) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 > > Let me know if there is anything further we can try to debug this issue > > Kind regards > Stephan Kramer > > > On 02/09/2023 01:58, Mark Adams wrote: > > Fantastic! > > > > I fixed a memory free problem. You should be OK now. > > I am pretty sure you are good but I would like to wait to get any > feedback > > from you. > > We should have a release at the end of the month and it would be nice to > > get this into it. > > > > Thanks, > > Mark > > > > > > On Fri, Sep 1, 2023 at 7:07?AM Stephan Kramer > > wrote: > > > >> Hi Mark > >> > >> Sorry took a while to report back. We have tried your branch but hit a > >> few issues, some of which we're not entirely sure are related. > >> > >> First switching off minimum degree ordering, and then switching to the > >> old version of aggressive coarsening, as you suggested, got us back to > >> the coarsening behaviour that we had previously, but then we also > >> observed an even further worsening of the iteration count: it had > >> previously gone up by 50% already (with the newer main petsc), but now > >> was more than double "old" petsc. Took us a while to realize this was > >> due to the default smoother changing from Cheby+SOR to Cheby+Jacobi. > >> Switching this also back to the old default we get back to very similar > >> coarsening levels (see below for more details if it is of interest) and > >> iteration counts. > >> > >> So that's all very good news. However, we were also starting seeing > >> memory errors (double free or corruption) when we switched off the > >> minimum degree ordering. Because this was at an earlier version of your > >> branch we then rebuild, hoping this was just an earlier bug that had > >> been fixed, but then we were having MPI-lockup issues. We have now > >> figured out the MPI issues are completely unrelated - some combination > >> with a newer mpi build and firedrake on our cluster which also occur > >> using main branches of everything. So switching back to an older MPI > >> build we are hoping to now test your most recent version of > >> adams/gamg-add-old-coarsening with these options and see whether the > >> memory errors are still there. 
Will let you know > >> > >> Best wishes > >> Stephan Kramer > >> > >> Coarsening details with various options for Level 6 of the test case: > >> > >> In our original setup (using "old" petsc), we had: > >> > >> rows=516, cols=516, bs=6 > >> rows=12660, cols=12660, bs=6 > >> rows=346974, cols=346974, bs=6 > >> rows=19169670, cols=19169670, bs=3 > >> > >> Then with the newer main petsc we had > >> > >> rows=666, cols=666, bs=6 > >> rows=7740, cols=7740, bs=6 > >> rows=34902, cols=34902, bs=6 > >> rows=736578, cols=736578, bs=6 > >> rows=19169670, cols=19169670, bs=3 > >> > >> Then on your branch with minimum_degree_ordering False: > >> > >> rows=504, cols=504, bs=6 > >> rows=2274, cols=2274, bs=6 > >> rows=11010, cols=11010, bs=6 > >> rows=35790, cols=35790, bs=6 > >> rows=430686, cols=430686, bs=6 > >> rows=19169670, cols=19169670, bs=3 > >> > >> And with minimum_degree_ordering False and use_aggressive_square_graph > >> True: > >> > >> rows=498, cols=498, bs=6 > >> rows=12672, cols=12672, bs=6 > >> rows=346974, cols=346974, bs=6 > >> rows=19169670, cols=19169670, bs=3 > >> > >> So that is indeed pretty much back to what it was before > >> > >> > >> > >> > >> > >> > >> > >> > >> On 31/08/2023 23:40, Mark Adams wrote: > >>> Hi Stephan, > >>> > >>> This branch is settling down. adams/gamg-add-old-coarsening > >>> < > https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening> > >>> I made the old, not minimum degree, ordering the default but kept the > new > >>> "aggressive" coarsening as the default, so I am hoping that just adding > >>> "-pc_gamg_use_aggressive_square_graph true" to your regression tests > will > >>> get you back to where you were before. > >>> Fingers crossed ... let me know if you have any success or not. > >>> > >>> Thanks, > >>> Mark > >>> > >>> > >>> On Tue, Aug 15, 2023 at 1:45?PM Mark Adams wrote: > >>> > >>>> Hi Stephan, > >>>> > >>>> I have a branch that you can try: adams/gamg-add-old-coarsening > >>>> < > https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening > >>>> Things to test: > >>>> * First, verify that nothing unintended changed by reproducing your > bad > >>>> results with this branch (the defaults are the same) > >>>> * Try not using the minimum degree ordering that I suggested > >>>> with: -pc_gamg_use_minimum_degree_ordering false > >>>> -- I am eager to see if that is the main problem. > >>>> * Go back to what I think is the old method: > >>>> -pc_gamg_use_minimum_degree_ordering > >>>> false -pc_gamg_use_aggressive_square_graph true > >>>> > >>>> When we get back to where you were, I would like to try to get modern > >>>> stuff working. > >>>> I did add a -pc_gamg_aggressive_mis_k <2> > >>>> You could to another step of MIS coarsening with > >> -pc_gamg_aggressive_mis_k > >>>> 3 > >>>> > >>>> Anyway, lots to look at but, alas, AMG does have a lot of parameters. > >>>> > >>>> Thanks, > >>>> Mark > >>>> > >>>> On Mon, Aug 14, 2023 at 4:26?PM Mark Adams wrote: > >>>> > >>>>> On Mon, Aug 14, 2023 at 11:03?AM Stephan Kramer < > >> s.kramer at imperial.ac.uk> > >>>>> wrote: > >>>>> > >>>>>> Many thanks for looking into this, Mark > >>>>>>> My 3D tests were not that different and I see you lowered the > >>>>>> threshold. > >>>>>>> Note, you can set the threshold to zero, but your test is running > so > >>>>>> much > >>>>>>> differently than mine there is something else going on. > >>>>>>> Note, the new, bad, coarsening rate of 30:1 is what we tend to > shoot > >>>>>> for > >>>>>>> in 3D. 
> >>>>>>> > >>>>>>> So it is not clear what the problem is. Some questions: > >>>>>>> > >>>>>>> * do you have a picture of this mesh to show me? > >>>>>> It's just a standard hexahedral cubed sphere mesh with the > refinement > >>>>>> level giving the number of times each of the six sides have been > >>>>>> subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to > 16 > >>>>>> layers. So the total number of elements at Level_5 is 6 x 32 x 32 x > >> 16 = > >>>>>> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) > >> going > >>>>>> to the next Level > >>>>>> > >>>>> I see, and I assume these are pretty stretched elements. > >>>>> > >>>>> > >>>>>>> * what do you mean by Q1-Q2 elements? > >>>>>> Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for > velocity > >>>>>> and (tri)linear for pressure > >>>>>> > >>>>>> I guess you could argue we could/should just do good old geometric > >>>>>> multigrid instead. More generally we do use this solver > configuration > >> a > >>>>>> lot for tetrahedral Taylor Hood (P2-P1) in particular also for our > >>>>>> adaptive mesh runs - would it be worth to see if we have the same > >>>>>> performance issues with tetrahedral P2-P1? > >>>>>> > >>>>> No, you have a clear reproducer, if not minimal. > >>>>> The first coarsening is very different. > >>>>> > >>>>> I am working on this and I see that I added a heuristic for thin > bodies > >>>>> where you order the vertices in greedy algorithms with minimum degree > >> first. > >>>>> This will tend to pick corners first, edges then faces, etc. > >>>>> That may be the problem. I would like to understand it better (see > >> below). > >>>>> > >>>>> > >>>>>>> It would be nice to see if the new and old codes are similar > without > >>>>>>> aggressive coarsening. > >>>>>>> This was the intended change of the major change in this time frame > >> as > >>>>>> you > >>>>>>> noticed. > >>>>>>> If these jobs are easy to run, could you check that the old and new > >>>>>>> versions are similar with "-pc_gamg_square_graph 0 ", ( and you > >> only > >>>>>> need > >>>>>>> one time step). > >>>>>>> All you need to do is check that the first coarse grid has about > the > >>>>>> same > >>>>>>> number of equations (large). > >>>>>> Unfortunately we're seeing some memory errors when we use this > option, > >>>>>> and I'm not entirely clear whether we're just running out of memory > >> and > >>>>>> need to put it on a special queue. 
> >>>>>> > >>>>>> The run with square_graph 0 using new PETSc managed to get through > one > >>>>>> solve at level 5, and is giving the following mg levels: > >>>>>> > >>>>>> rows=174, cols=174, bs=6 > >>>>>> total: nonzeros=30276, allocated nonzeros=30276 > >>>>>> -- > >>>>>> rows=2106, cols=2106, bs=6 > >>>>>> total: nonzeros=4238532, allocated nonzeros=4238532 > >>>>>> -- > >>>>>> rows=21828, cols=21828, bs=6 > >>>>>> total: nonzeros=62588232, allocated nonzeros=62588232 > >>>>>> -- > >>>>>> rows=589824, cols=589824, bs=6 > >>>>>> total: nonzeros=1082528928, allocated > nonzeros=1082528928 > >>>>>> -- > >>>>>> rows=2433222, cols=2433222, bs=3 > >>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 > >>>>>> > >>>>>> comparing with square_graph 100 with new PETSc > >>>>>> > >>>>>> rows=96, cols=96, bs=6 > >>>>>> total: nonzeros=9216, allocated nonzeros=9216 > >>>>>> -- > >>>>>> rows=1440, cols=1440, bs=6 > >>>>>> total: nonzeros=647856, allocated nonzeros=647856 > >>>>>> -- > >>>>>> rows=97242, cols=97242, bs=6 > >>>>>> total: nonzeros=65656836, allocated nonzeros=65656836 > >>>>>> -- > >>>>>> rows=2433222, cols=2433222, bs=3 > >>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 > >>>>>> > >>>>>> and old PETSc with square_graph 100 > >>>>>> > >>>>>> rows=90, cols=90, bs=6 > >>>>>> total: nonzeros=8100, allocated nonzeros=8100 > >>>>>> -- > >>>>>> rows=1872, cols=1872, bs=6 > >>>>>> total: nonzeros=1234080, allocated nonzeros=1234080 > >>>>>> -- > >>>>>> rows=47652, cols=47652, bs=6 > >>>>>> total: nonzeros=23343264, allocated nonzeros=23343264 > >>>>>> -- > >>>>>> rows=2433222, cols=2433222, bs=3 > >>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 > >>>>>> -- > >>>>>> > >>>>>> Unfortunately old PETSc with square_graph 0 did not complete a > single > >>>>>> solve before giving the memory error > >>>>>> > >>>>> OK, thanks for trying. > >>>>> > >>>>> I am working on this and I will give you a branch to test, but if you > >> can > >>>>> rebuild PETSc here is a quick test that might fix your problem. > >>>>> In src/ksp/pc/impls/gamg/agg.c you will see: > >>>>> > >>>>> PetscCall(PetscSortIntWithArray(nloc, degree, permute)); > >>>>> > >>>>> If you can comment this out in the new code and compare with the old, > >>>>> that might fix the problem. > >>>>> > >>>>> Thanks, > >>>>> Mark > >>>>> > >>>>> > >>>>>>> BTW, I am starting to think I should add the old method back as an > >>>>>> option. > >>>>>>> I did not think this change would cause large differences. > >>>>>> Yes, I think that would be much appreciated. Let us know if we can > do > >>>>>> any testing > >>>>>> > >>>>>> Best wishes > >>>>>> Stephan > >>>>>> > >>>>>> > >>>>>>> Thanks, > >>>>>>> Mark > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> Note that we are providing the rigid body near nullspace, > >>>>>>>> hence the bs=3 to bs=6. > >>>>>>>> We have tried different values for the gamg_threshold but it > doesn't > >>>>>>>> really seem to significantly alter the coarsening amount in that > >> first > >>>>>>>> step. > >>>>>>>> > >>>>>>>> Do you have any suggestions for further things we should try/look > >> at? 
> >>>>>>>> Any feedback would be much appreciated > >>>>>>>> > >>>>>>>> Best wishes > >>>>>>>> Stephan Kramer > >>>>>>>> > >>>>>>>> Full logs including log_view timings available from > >>>>>>>> https://github.com/stephankramer/petsc-scaling/ > >>>>>>>> > >>>>>>>> In particular: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >> > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 4 11:12:56 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 4 Oct 2023 12:12:56 -0400 Subject: [petsc-users] Compute integral using DMPlexComputeIntegralFEM In-Reply-To: References: Message-ID: On Fri, Sep 8, 2023 at 6:26?PM David Andrs wrote: > Hi all! > > I am trying to use DMPlexComputeIntegralFEM to compute an integral > $\int_\Omega u d\Omega$. My domain is a square (-1, 1)^2 (2x2 QUAD4 > elements), I add first order Lagrange FE field on it, set the solution > vector (computed by a previous simulation). > > The value I am seeing computed by PETSc is -4, but the hand-calculated > value of this integral is -4.6. I also checked this in paraview using the > ?Integrate Variables? filter and it also returns -4.6 (this was to double > check that my hand-calculated value is correct). > Sorry it took so long. You caught me at a bad time. Something must be wrong with your analytic integrals. Here is me doing them by hand. You have a 3x3 vertex arrangement with coefficients 1 0 -3 -2 -1 -2 -3 0 1 >From the symmetry, the integrals of the cells along each diagonal must be equal. Now, the shape functions for Q_1 are (1 - x) y x y (1 - x)(1 - y) x (1 - y) Thus the integral for the lower left cell is \int^1_0 dx \int^1_0 dy -3 + 3 x + y - 2 xy = -3 + 3/2 + 1/2 - 2/4 = -1.5 which is also the upper right cell. The integral for the lower right cell is \int^1_0 dx \int^1_0 dy x - y - 2 xy = 1/2 - 1/2 - 2/4 = -1/2 which is also the upper left cell. Thus we get -1.5 - 1.5 - 0.5 - 0.5 = -4, which is what Plex gets. THanks, Matt So, I must be missing something obvious in my code. Attached is the minimal > PETSc code to show what I am doing. This is against PETSc 3.19.4. > > Thanks in advance for your help, > > David > > -- > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From srvenkat at utexas.edu Wed Oct 4 18:02:37 2023 From: srvenkat at utexas.edu (Sreeram R Venkat) Date: Wed, 4 Oct 2023 18:02:37 -0500 Subject: [petsc-users] Scattering a vector to/from a subset of processors Message-ID: Suppose I am running on 12 processors, and I have a vector "v" of size 36 partitioned over the first 4. v still uses the PETSC_COMM_WORLD, so it has a layout of (9, 9, 9, 9, 0, 0, ..., 0). Now, I would like to repartition it over all 12 processors, so that the layout becomes (3, 3, 3, ..., 3). 
I've been trying to use VecScatter to do this, but I'm not sure what IndexSets to use for the sender and receiver. The result I am trying to achieve is this: Assume the vector is v = <0, 1, 2, ..., 35> Start Finish Proc | Entries Proc | Entries 0 | 0,...,8 0 | 0, 1, 2 1 | 9,...,17 1 | 3, 4, 5 2 | 18,...,26 2 | 6, 7, 8 3 | 27,...,35 3 | 9, 10, 11 4 | None 4 | 12, 13, 14 5 | None 5 | 15, 16, 17 6 | None 6 | 18, 19, 20 7 | None 7 | 21, 22, 23 8 | None 8 | 24, 25, 26 9 | None 9 | 27, 28, 29 10 | None 10 | 30, 31, 32 11 | None 11 | 33, 34, 35 Appreciate any help you can provide on this. Thanks, Sreeram -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Wed Oct 4 18:40:50 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 4 Oct 2023 18:40:50 -0500 Subject: [petsc-users] Scattering a vector to/from a subset of processors In-Reply-To: References: Message-ID: Hi, Sreeram, You can try this code. Since x, y are both MPI vectors, we just need to say we want to scatter x[0:N] to y[0:N]. The 12 index sets with your example on the 12 processes would be [0..8], [9..17], [18..26], [27..35], [], ..., []. Actually, you can do it arbitrarily, say, with 12 index sets [0..17], [18..35], .., []. PETSc will figure out how to do the communication. PetscInt rstart, rend, N; IS ix; VecScatter vscat; Vec y; MPI_Comm comm; VecType type; PetscObjectGetComm((PetscObject)x, &comm); VecGetType(x, &type); VecGetSize(x, &N); VecGetOwnershipRange(x, &rstart, &rend); VecCreate(comm, &y); VecSetSizes(y, PETSC_DECIDE, N); VecSetType(y, type); ISCreateStride(PetscObjectComm((PetscObject)x), rend - rstart, rstart, 1, &ix); VecScatterCreate(x, ix, y, ix, &vscat); --Junchao Zhang On Wed, Oct 4, 2023 at 6:03?PM Sreeram R Venkat wrote: > Suppose I am running on 12 processors, and I have a vector "v" of size 36 > partitioned over the first 4. v still uses the PETSC_COMM_WORLD, so it has > a layout of (9, 9, 9, 9, 0, 0, ..., 0). Now, I would like to repartition it > over all 12 processors, so that the layout becomes (3, 3, 3, ..., 3). I've > been trying to use VecScatter to do this, but I'm not sure what IndexSets > to use for the sender and receiver. > > The result I am trying to achieve is this: > > Assume the vector is v = <0, 1, 2, ..., 35> > > Start Finish > Proc | Entries Proc | Entries > 0 | 0,...,8 0 | 0, 1, 2 > 1 | 9,...,17 1 | 3, 4, 5 > 2 | 18,...,26 2 | 6, 7, 8 > 3 | 27,...,35 3 | 9, 10, 11 > 4 | None 4 | 12, 13, 14 > 5 | None 5 | 15, 16, 17 > 6 | None 6 | 18, 19, 20 > 7 | None 7 | 21, 22, 23 > 8 | None 8 | 24, 25, 26 > 9 | None 9 | 27, 28, 29 > 10 | None 10 | 30, 31, 32 > 11 | None 11 | 33, 34, 35 > > Appreciate any help you can provide on this. > > Thanks, > Sreeram > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Thu Oct 5 06:09:11 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Thu, 5 Oct 2023 13:09:11 +0200 Subject: [petsc-users] Galerkin projection using petsc4py Message-ID: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> Hi everyone, I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. 
The error is

Phi.transposeMatMult(A, A1)
File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult
petsc4py.PETSc.Error: error code 56
[0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135
[0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989
[0] No support for this operation for this object type
[0] Call MatProductCreate() first

Do you know if these are exposed to petsc4py, or maybe there is another way? I cannot get the MFE to work (neither in sequential nor in parallel)

"""Experimenting with PETSc mat-mat multiplication"""

import time

import numpy as np
from colorama import Fore
from firedrake import COMM_SELF, COMM_WORLD
from firedrake.petsc import PETSc
from mpi4py import MPI
from numpy.testing import assert_array_almost_equal

from utilities import (
Print,
create_petsc_matrix,
)

nproc = COMM_WORLD.size
rank = COMM_WORLD.rank

# --------------------------------------------
# EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi
# A' = Phi.T * A * Phi
# [k x k] <- [k x m] x [m x m] x [m x k]
# --------------------------------------------

m, k = 11, 7
# Generate the random numpy matrices
np.random.seed(0) # sets the seed to 0
A_np = np.random.randint(low=0, high=6, size=(m, m))
Phi_np = np.random.randint(low=0, high=6, size=(m, k))

# Create A as an mpi matrix distributed on each process
A = create_petsc_matrix(A_np)

# Create Phi as an mpi matrix distributed on each process
Phi = create_petsc_matrix(Phi_np)

A1 = create_petsc_matrix(np.zeros((k, m)))

# Now A1 contains the result of Phi^T * A
Phi.transposeMatMult(A, A1)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From thanasis.boutsikakis at corintis.com Thu Oct 5 06:18:07 2023
From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis)
Date: Thu, 5 Oct 2023 13:18:07 +0200
Subject: [petsc-users] Galerkin projection using petsc4py
In-Reply-To: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com>
References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com>
Message-ID: <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com>

Sorry, forgot function create_petsc_matrix()

def create_petsc_matrix(input_array, sparse=True):
"""Create a PETSc matrix from an input_array

Args:
input_array (np array): Input array
partition_like (PETSc mat, optional): Petsc matrix. Defaults to None.
sparse (bool, optional): Toggle for sparse or dense. Defaults to True.
Returns: PETSc mat: PETSc matrix """ # Check if input_array is 1D and reshape if necessary assert len(input_array.shape) == 2, "Input array should be 2-dimensional" global_rows, global_cols = input_array.shape size = ((None, global_rows), (global_cols, global_cols)) # Create a sparse or dense matrix based on the 'sparse' argument if sparse: matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) else: matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) matrix.setUp() local_rows_start, local_rows_end = matrix.getOwnershipRange() for counter, i in enumerate(range(local_rows_start, local_rows_end)): # Calculate the correct row in the array for the current process row_in_array = counter + local_rows_start matrix.setValues( i, range(global_cols), input_array[row_in_array, :], addv=False ) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix > On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: > > Hi everyone, > > I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is > > Phi.transposeMatMult(A, A1) > File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult > petsc4py.PETSc.Error: error code 56 > [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 > [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 > [0] No support for this operation for this object type > [0] Call MatProductCreate() first > > Do you know if these exposed to petsc4py or maybe there is another way? I cannot get the MFE to work (neither in sequential nor in parallel) > > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import ( > Print, > create_petsc_matrix, > ) > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 11, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np) > > A1 = create_petsc_matrix(np.zeros((k, m))) > > # Now A1 contains the result of Phi^T * A > Phi.transposeMatMult(A, A1) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Thu Oct 5 06:22:11 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Thu, 5 Oct 2023 13:22:11 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> Message-ID: How about using ptap which will use MatPtAP? It will be more efficient (and it will help you bypass the issue). 
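A minimal sketch of what that suggestion could look like in petsc4py, reusing A and Phi from the quoted script (this is only an illustration; the exact call is spelled out further down the thread):

# assumes A (m x m) and Phi (m x k) are already assembled PETSc matrices, as above
A_prime = A.ptap(Phi)  # computes Phi^T * A * Phi and returns the k x k result

Note that ptap is called on A, takes Phi as its argument, and returns the product matrix, so no pre-allocated output matrix is needed.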
Thanks, Pierre > On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis wrote: > > Sorry, forgot function create_petsc_matrix() > > def create_petsc_matrix(input_array sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. > > Returns: > PETSc mat: PETSc matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > >> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: >> >> Hi everyone, >> >> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >> >> Phi.transposeMatMult(A, A1) >> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >> petsc4py.PETSc.Error: error code 56 >> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >> [0] No support for this operation for this object type >> [0] Call MatProductCreate() first >> >> Do you know if these exposed to petsc4py or maybe there is another way? I cannot get the MFE to work (neither in sequential nor in parallel) >> >> """Experimenting with PETSc mat-mat multiplication""" >> >> import time >> >> import numpy as np >> from colorama import Fore >> from firedrake import COMM_SELF, COMM_WORLD >> from firedrake.petsc import PETSc >> from mpi4py import MPI >> from numpy.testing import assert_array_almost_equal >> >> from utilities import ( >> Print, >> create_petsc_matrix, >> ) >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> # -------------------------------------------- >> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >> # A' = Phi.T * A * Phi >> # [k x k] <- [k x m] x [m x m] x [m x k] >> # -------------------------------------------- >> >> m, k = 11, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, m)) >> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> # Create A as an mpi matrix distributed on each process >> A = create_petsc_matrix(A_np) >> >> # Create Phi as an mpi matrix distributed on each process >> Phi = create_petsc_matrix(Phi_np) >> >> A1 = create_petsc_matrix(np.zeros((k, m))) >> >> # Now A1 contains the result of Phi^T * A >> Phi.transposeMatMult(A, A1) >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thanasis.boutsikakis at corintis.com Thu Oct 5 07:02:16 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Thu, 5 Oct 2023 14:02:16 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> Message-ID: Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 """Experimenting with PETSc mat-mat multiplication""" import time import numpy as np from colorama import Fore from firedrake import COMM_SELF, COMM_WORLD from firedrake.petsc import PETSc from mpi4py import MPI from numpy.testing import assert_array_almost_equal from utilities import ( Print, create_petsc_matrix, print_matrix_partitioning, ) nproc = COMM_WORLD.size rank = COMM_WORLD.rank # -------------------------------------------- # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi # A' = Phi.T * A * Phi # [k x k] <- [k x m] x [m x m] x [m x k] # -------------------------------------------- m, k = 11, 7 # Generate the random numpy matrices np.random.seed(0) # sets the seed to 0 A_np = np.random.randint(low=0, high=6, size=(m, m)) Phi_np = np.random.randint(low=0, high=6, size=(m, k)) # -------------------------------------------- # TEST: Galerking projection of numpy matrices A_np and Phi_np # -------------------------------------------- Aprime_np = Phi_np.T @ A_np @ Phi_np Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") Print(f"{Aprime_np}") # Create A as an mpi matrix distributed on each process A = create_petsc_matrix(A_np, sparse=False) # Create Phi as an mpi matrix distributed on each process Phi = create_petsc_matrix(Phi_np, sparse=False) # Create an empty PETSc matrix object to store the result of the PtAP operation. # This will hold the result A' = Phi.T * A * Phi after the computation. A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) # Perform the PtAP (Phi Transpose times A times Phi) operation. # In mathematical terms, this operation is A' = Phi.T * A * Phi. # A_prime will store the result of the operation. Phi.PtAP(A, A_prime) > On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: > > How about using ptap which will use MatPtAP? > It will be more efficient (and it will help you bypass the issue). > > Thanks, > Pierre > >> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis wrote: >> >> Sorry, forgot function create_petsc_matrix() >> >> def create_petsc_matrix(input_array sparse=True): >> """Create a PETSc matrix from an input_array >> >> Args: >> input_array (np array): Input array >> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>> >> Returns: >> PETSc mat: PETSc matrix >> """ >> # Check if input_array is 1D and reshape if necessary >> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >> global_rows, global_cols = input_array.shape >> >> size = ((None, global_rows), (global_cols, global_cols)) >> >> # Create a sparse or dense matrix based on the 'sparse' argument >> if sparse: >> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >> else: >> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >> matrix.setUp() >> >> local_rows_start, local_rows_end = matrix.getOwnershipRange() >> >> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >> # Calculate the correct row in the array for the current process >> row_in_array = counter + local_rows_start >> matrix.setValues( >> i, range(global_cols), input_array[row_in_array, :], addv=False >> ) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: >>> >>> Hi everyone, >>> >>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>> >>> Phi.transposeMatMult(A, A1) >>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>> petsc4py.PETSc.Error: error code 56 >>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>> [0] No support for this operation for this object type >>> [0] Call MatProductCreate() first >>> >>> Do you know if these exposed to petsc4py or maybe there is another way? I cannot get the MFE to work (neither in sequential nor in parallel) >>> >>> """Experimenting with PETSc mat-mat multiplication""" >>> >>> import time >>> >>> import numpy as np >>> from colorama import Fore >>> from firedrake import COMM_SELF, COMM_WORLD >>> from firedrake.petsc import PETSc >>> from mpi4py import MPI >>> from numpy.testing import assert_array_almost_equal >>> >>> from utilities import ( >>> Print, >>> create_petsc_matrix, >>> ) >>> >>> nproc = COMM_WORLD.size >>> rank = COMM_WORLD.rank >>> >>> # -------------------------------------------- >>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>> # A' = Phi.T * A * Phi >>> # [k x k] <- [k x m] x [m x m] x [m x k] >>> # -------------------------------------------- >>> >>> m, k = 11, 7 >>> # Generate the random numpy matrices >>> np.random.seed(0) # sets the seed to 0 >>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>> >>> # Create A as an mpi matrix distributed on each process >>> A = create_petsc_matrix(A_np) >>> >>> # Create Phi as an mpi matrix distributed on each process >>> Phi = create_petsc_matrix(Phi_np) >>> >>> A1 = create_petsc_matrix(np.zeros((k, m))) >>> >>> # Now A1 contains the result of Phi^T * A >>> Phi.transposeMatMult(A, A1) >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre at joliv.et Thu Oct 5 07:17:52 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Thu, 5 Oct 2023 14:17:52 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> Message-ID: Not a petsc4py expert here, but you may to try instead: A_prime = A.ptap(Phi) Thanks, Pierre > On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis wrote: > > Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? > > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. > Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import ( > Print, > create_petsc_matrix, > print_matrix_partitioning, > ) > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 11, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # -------------------------------------------- > # TEST: Galerking projection of numpy matrices A_np and Phi_np > # -------------------------------------------- > Aprime_np = Phi_np.T @ A_np @ Phi_np > Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") > Print(f"{Aprime_np}") > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np, sparse=False) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np, sparse=False) > > # Create an empty PETSc matrix object to store the result of the PtAP operation. > # This will hold the result A' = Phi.T * A * Phi after the computation. > A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) > > # Perform the PtAP (Phi Transpose times A times Phi) operation. > # In mathematical terms, this operation is A' = Phi.T * A * Phi. > # A_prime will store the result of the operation. > Phi.PtAP(A, A_prime) > >> On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: >> >> How about using ptap which will use MatPtAP? >> It will be more efficient (and it will help you bypass the issue). 
>> >> Thanks, >> Pierre >> >>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis wrote: >>> >>> Sorry, forgot function create_petsc_matrix() >>> >>> def create_petsc_matrix(input_array sparse=True): >>> """Create a PETSc matrix from an input_array >>> >>> Args: >>> input_array (np array): Input array >>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >>> >>> Returns: >>> PETSc mat: PETSc matrix >>> """ >>> # Check if input_array is 1D and reshape if necessary >>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>> global_rows, global_cols = input_array.shape >>> >>> size = ((None, global_rows), (global_cols, global_cols)) >>> >>> # Create a sparse or dense matrix based on the 'sparse' argument >>> if sparse: >>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>> else: >>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>> matrix.setUp() >>> >>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>> >>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>> # Calculate the correct row in the array for the current process >>> row_in_array = counter + local_rows_start >>> matrix.setValues( >>> i, range(global_cols), input_array[row_in_array, :], addv=False >>> ) >>> >>> # Assembly the matrix to compute the final structure >>> matrix.assemblyBegin() >>> matrix.assemblyEnd() >>> >>> return matrix >>> >>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: >>>> >>>> Hi everyone, >>>> >>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>> >>>> Phi.transposeMatMult(A, A1) >>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>> petsc4py.PETSc.Error: error code 56 >>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>> [0] No support for this operation for this object type >>>> [0] Call MatProductCreate() first >>>> >>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>> >>>> """Experimenting with PETSc mat-mat multiplication""" >>>> >>>> import time >>>> >>>> import numpy as np >>>> from colorama import Fore >>>> from firedrake import COMM_SELF, COMM_WORLD >>>> from firedrake.petsc import PETSc >>>> from mpi4py import MPI >>>> from numpy.testing import assert_array_almost_equal >>>> >>>> from utilities import ( >>>> Print, >>>> create_petsc_matrix, >>>> ) >>>> >>>> nproc = COMM_WORLD.size >>>> rank = COMM_WORLD.rank >>>> >>>> # -------------------------------------------- >>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>> # A' = Phi.T * A * Phi >>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>> # -------------------------------------------- >>>> >>>> m, k = 11, 7 >>>> # Generate the random numpy matrices >>>> np.random.seed(0) # sets the seed to 0 >>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>> >>>> # Create A as an mpi matrix distributed on each process >>>> A = create_petsc_matrix(A_np) >>>> >>>> # Create Phi as an mpi matrix distributed on each process >>>> Phi = create_petsc_matrix(Phi_np) >>>> >>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>> >>>> # Now A1 contains the result of Phi^T * A >>>> Phi.transposeMatMult(A, A1) >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Thu Oct 5 07:23:20 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Thu, 5 Oct 2023 14:23:20 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> Message-ID: <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> This works Pierre. Amazing input, thanks a lot! > On 5 Oct 2023, at 14:17, Pierre Jolivet wrote: > > Not a petsc4py expert here, but you may to try instead: > A_prime = A.ptap(Phi) > > Thanks, > Pierre > >> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis wrote: >> >> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >> >> [0]PETSC ERROR: ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >> >> """Experimenting with PETSc mat-mat multiplication""" >> >> import time >> >> import numpy as np >> from colorama import Fore >> from firedrake import COMM_SELF, COMM_WORLD >> from firedrake.petsc import PETSc >> from mpi4py import MPI >> from numpy.testing import assert_array_almost_equal >> >> from utilities import ( >> Print, >> create_petsc_matrix, >> print_matrix_partitioning, >> ) >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> # -------------------------------------------- >> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >> # A' = Phi.T * A * Phi >> # [k x k] <- [k x m] x [m x m] x [m x k] >> # -------------------------------------------- >> >> m, k = 11, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, m)) >> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> # -------------------------------------------- >> # TEST: Galerking projection of numpy matrices A_np and Phi_np >> # -------------------------------------------- >> Aprime_np = Phi_np.T @ A_np @ Phi_np >> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >> Print(f"{Aprime_np}") >> >> # Create A as an mpi matrix distributed on each process >> A = create_petsc_matrix(A_np, sparse=False) >> >> # Create Phi as an mpi matrix distributed on each process >> Phi = create_petsc_matrix(Phi_np, sparse=False) >> >> # Create an empty PETSc matrix object to store the result of the PtAP operation. >> # This will hold the result A' = Phi.T * A * Phi after the computation. >> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >> >> # Perform the PtAP (Phi Transpose times A times Phi) operation. >> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >> # A_prime will store the result of the operation. >> Phi.PtAP(A, A_prime) >> >>> On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: >>> >>> How about using ptap which will use MatPtAP? >>> It will be more efficient (and it will help you bypass the issue). >>> >>> Thanks, >>> Pierre >>> >>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis wrote: >>>> >>>> Sorry, forgot function create_petsc_matrix() >>>> >>>> def create_petsc_matrix(input_array sparse=True): >>>> """Create a PETSc matrix from an input_array >>>> >>>> Args: >>>> input_array (np array): Input array >>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>> >>>> Returns: >>>> PETSc mat: PETSc matrix >>>> """ >>>> # Check if input_array is 1D and reshape if necessary >>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>> global_rows, global_cols = input_array.shape >>>> >>>> size = ((None, global_rows), (global_cols, global_cols)) >>>> >>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>> if sparse: >>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>> else: >>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>> matrix.setUp() >>>> >>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>> >>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>> # Calculate the correct row in the array for the current process >>>> row_in_array = counter + local_rows_start >>>> matrix.setValues( >>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>> ) >>>> >>>> # Assembly the matrix to compute the final structure >>>> matrix.assemblyBegin() >>>> matrix.assemblyEnd() >>>> >>>> return matrix >>>> >>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: >>>>> >>>>> Hi everyone, >>>>> >>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>> >>>>> Phi.transposeMatMult(A, A1) >>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>> petsc4py.PETSc.Error: error code 56 >>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>> [0] No support for this operation for this object type >>>>> [0] Call MatProductCreate() first >>>>> >>>>> Do you know if these exposed to petsc4py or maybe there is another way? I cannot get the MFE to work (neither in sequential nor in parallel) >>>>> >>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>> >>>>> import time >>>>> >>>>> import numpy as np >>>>> from colorama import Fore >>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>> from firedrake.petsc import PETSc >>>>> from mpi4py import MPI >>>>> from numpy.testing import assert_array_almost_equal >>>>> >>>>> from utilities import ( >>>>> Print, >>>>> create_petsc_matrix, >>>>> ) >>>>> >>>>> nproc = COMM_WORLD.size >>>>> rank = COMM_WORLD.rank >>>>> >>>>> # -------------------------------------------- >>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>> # A' = Phi.T * A * Phi >>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>> # -------------------------------------------- >>>>> >>>>> m, k = 11, 7 >>>>> # Generate the random numpy matrices >>>>> np.random.seed(0) # sets the seed to 0 >>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>> >>>>> # Create A as an mpi matrix distributed on each process >>>>> A = create_petsc_matrix(A_np) >>>>> >>>>> # Create Phi as an mpi matrix distributed on each process >>>>> Phi = create_petsc_matrix(Phi_np) >>>>> >>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>> >>>>> # Now A1 contains the result of Phi^T * A >>>>> Phi.transposeMatMult(A, A1) >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Thu Oct 5 10:58:50 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 5 Oct 2023 10:58:50 -0500 Subject: [petsc-users] Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Hi, Philip, I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() instead of the COO interface? MatSetValues() needs to copy the data from device to host and thus is expensive. Do you have profiling results with COO enabled? [image: Screenshot 2023-10-05 at 10.55.29?AM.png] --Junchao Zhang On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang wrote: > Hi, Philip, > I will look into the tarballs and get back to you. > Thanks. > --Junchao Zhang > > > On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> We finally have xolotl ported to use the new COO interface and the >> aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port >> to our previous version (using MatSetValuesStencil and the default Mat and >> Vec implementations), we expected to see an improvement in performance for >> both the "serial" and "cuda" builds (here I'm referring to the kokkos >> configuration). >> >> Attached are two plots that show timings for three different cases. All >> of these were run on Ascent (the Summit-like training system) with 6 MPI >> tasks (on a single node). The CUDA cases were given one GPU per task (and >> used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all >> cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as >> consistent as possible. >> >> The performance of RHSJacobian (where the bulk of computation happens in >> xolotl) behaved basically as expected (better than expected in the serial >> build). NE_3 case in CUDA was the only one that performed worse, but not >> surprisingly, since its workload for the GPUs is much smaller. We've still >> got more optimization to do on this. >> >> The real surprise was how much worse the overall solve times were. This >> seems to be due simply to switching to the kokkos-based implementation. I'm >> wondering if there are any changes we can make in configuration or runtime >> arguments to help with PETSc's performance here. Any help looking into this >> would be appreciated. >> >> The tarballs linked here >> >> and here >> >> are profiling databases which, once extracted, can be viewed with >> hpcviewer. I don't know how helpful that will be, but hopefully it can give >> you some direction. >> >> Thanks for your help, >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-10-05 at 10.55.29?AM.png Type: image/png Size: 144341 bytes Desc: not available URL: From srvenkat at utexas.edu Thu Oct 5 12:57:00 2023 From: srvenkat at utexas.edu (Sreeram R Venkat) Date: Thu, 5 Oct 2023 12:57:00 -0500 Subject: [petsc-users] Scattering a vector to/from a subset of processors In-Reply-To: References: Message-ID: Thank you. This works for me. Sreeram On Wed, Oct 4, 2023 at 6:41?PM Junchao Zhang wrote: > Hi, Sreeram, > You can try this code. Since x, y are both MPI vectors, we just need to > say we want to scatter x[0:N] to y[0:N]. 
The 12 index sets with your > example on the 12 processes would be [0..8], [9..17], [18..26], [27..35], > [], ..., []. Actually, you can do it arbitrarily, say, with 12 index sets > [0..17], [18..35], .., []. PETSc will figure out how to do the > communication. > > PetscInt rstart, rend, N; > IS ix; > VecScatter vscat; > Vec y; > MPI_Comm comm; > VecType type; > > PetscObjectGetComm((PetscObject)x, &comm); > VecGetType(x, &type); > VecGetSize(x, &N); > VecGetOwnershipRange(x, &rstart, &rend); > > VecCreate(comm, &y); > VecSetSizes(y, PETSC_DECIDE, N); > VecSetType(y, type); > > ISCreateStride(PetscObjectComm((PetscObject)x), rend - rstart, rstart, 1, > &ix); > VecScatterCreate(x, ix, y, ix, &vscat); > > --Junchao Zhang > > > On Wed, Oct 4, 2023 at 6:03?PM Sreeram R Venkat > wrote: > >> Suppose I am running on 12 processors, and I have a vector "v" of size 36 >> partitioned over the first 4. v still uses the PETSC_COMM_WORLD, so it has >> a layout of (9, 9, 9, 9, 0, 0, ..., 0). Now, I would like to repartition it >> over all 12 processors, so that the layout becomes (3, 3, 3, ..., 3). I've >> been trying to use VecScatter to do this, but I'm not sure what IndexSets >> to use for the sender and receiver. >> >> The result I am trying to achieve is this: >> >> Assume the vector is v = <0, 1, 2, ..., 35> >> >> Start Finish >> Proc | Entries Proc | Entries >> 0 | 0,...,8 0 | 0, 1, 2 >> 1 | 9,...,17 1 | 3, 4, 5 >> 2 | 18,...,26 2 | 6, 7, 8 >> 3 | 27,...,35 3 | 9, 10, 11 >> 4 | None 4 | 12, 13, 14 >> 5 | None 5 | 15, 16, 17 >> 6 | None 6 | 18, 19, 20 >> 7 | None 7 | 21, 22, 23 >> 8 | None 8 | 24, 25, 26 >> 9 | None 9 | 27, 28, 29 >> 10 | None 10 | 30, 31, 32 >> 11 | None 11 | 33, 34, 35 >> >> Appreciate any help you can provide on this. >> >> Thanks, >> Sreeram >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrsd at gmail.com Thu Oct 5 15:59:58 2023 From: andrsd at gmail.com (David Andrs) Date: Thu, 5 Oct 2023 14:59:58 -0600 Subject: [petsc-users] Compute integral using DMPlexComputeIntegralFEM In-Reply-To: References: Message-ID: Hi Matt! Thanks for getting back to this. I found a mistake in my hand calculation - I used wrong shape functions (pretty stupid mistake :face_palm:). Thanks again for your help, David On Wed, Oct 4, 2023 at 10:13?AM Matthew Knepley wrote: > On Fri, Sep 8, 2023 at 6:26?PM David Andrs wrote: > >> Hi all! >> >> I am trying to use DMPlexComputeIntegralFEM to compute an integral >> $\int_\Omega u d\Omega$. My domain is a square (-1, 1)^2 (2x2 QUAD4 >> elements), I add first order Lagrange FE field on it, set the solution >> vector (computed by a previous simulation). >> >> The value I am seeing computed by PETSc is -4, but the hand-calculated >> value of this integral is -4.6. I also checked this in paraview using the >> ?Integrate Variables? filter and it also returns -4.6 (this was to double >> check that my hand-calculated value is correct). >> > > Sorry it took so long. You caught me at a bad time. > > Something must be wrong with your analytic integrals. Here is me doing > them by hand. You have a 3x3 vertex arrangement with coefficients > > 1 0 -3 > -2 -1 -2 > -3 0 1 > > From the symmetry, the integrals of the cells along each diagonal must be > equal. Now, the shape functions for Q_1 are > > (1 - x) y x y > > (1 - x)(1 - y) x (1 - y) > > Thus the integral for the lower left cell is > > \int^1_0 dx \int^1_0 dy -3 + 3 x + y - 2 xy = -3 + 3/2 + 1/2 - 2/4 = -1.5 > > which is also the upper right cell. 
The integral for the lower right cell > is > > \int^1_0 dx \int^1_0 dy x - y - 2 xy = 1/2 - 1/2 - 2/4 = -1/2 > > which is also the upper left cell. Thus we get -1.5 - 1.5 - 0.5 - 0.5 = > -4, which is what Plex gets. > > THanks, > > Matt > > So, I must be missing something obvious in my code. Attached is the >> minimal PETSc code to show what I am doing. This is against PETSc 3.19.4. >> >> Thanks in advance for your help, >> >> David >> >> -- >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Oct 5 16:29:30 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 5 Oct 2023 16:29:30 -0500 Subject: [petsc-users] Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Wait a moment, it seems it was because we do not have a GPU implementation of MatShift... Let me see how to add it. --Junchao Zhang On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang wrote: > Hi, Philip, > I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() > instead of the COO interface? MatSetValues() needs to copy the data from > device to host and thus is expensive. > Do you have profiling results with COO enabled? > > [image: Screenshot 2023-10-05 at 10.55.29?AM.png] > > > --Junchao Zhang > > > On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: > >> Hi, Philip, >> I will look into the tarballs and get back to you. >> Thanks. >> --Junchao Zhang >> >> >> On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> We finally have xolotl ported to use the new COO interface and the >>> aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port >>> to our previous version (using MatSetValuesStencil and the default Mat and >>> Vec implementations), we expected to see an improvement in performance for >>> both the "serial" and "cuda" builds (here I'm referring to the kokkos >>> configuration). >>> >>> Attached are two plots that show timings for three different cases. All >>> of these were run on Ascent (the Summit-like training system) with 6 MPI >>> tasks (on a single node). The CUDA cases were given one GPU per task (and >>> used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all >>> cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as >>> consistent as possible. >>> >>> The performance of RHSJacobian (where the bulk of computation happens in >>> xolotl) behaved basically as expected (better than expected in the serial >>> build). NE_3 case in CUDA was the only one that performed worse, but not >>> surprisingly, since its workload for the GPUs is much smaller. We've still >>> got more optimization to do on this. >>> >>> The real surprise was how much worse the overall solve times were. This >>> seems to be due simply to switching to the kokkos-based implementation. I'm >>> wondering if there are any changes we can make in configuration or runtime >>> arguments to help with PETSc's performance here. Any help looking into this >>> would be appreciated. >>> >>> The tarballs linked here >>> >>> and here >>> >>> are profiling databases which, once extracted, can be viewed with >>> hpcviewer. 
I don't know how helpful that will be, but hopefully it can give >>> you some direction. >>> >>> Thanks for your help, >>> >>> >>> *Philip Fackler * >>> Research Software Engineer, Application Engineering Group >>> Advanced Computing Systems Research Section >>> Computer Science and Mathematics Division >>> *Oak Ridge National Laboratory* >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-10-05 at 10.55.29?AM.png Type: image/png Size: 144341 bytes Desc: not available URL: From facklerpw at ornl.gov Thu Oct 5 16:52:09 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Thu, 5 Oct 2023 21:52:09 +0000 Subject: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Aha! That makes sense. Thank you. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang Sent: Thursday, October 5, 2023 17:29 To: Fackler, Philip Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net ; Blondel, Sophie Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface Wait a moment, it seems it was because we do not have a GPU implementation of MatShift... Let me see how to add it. --Junchao Zhang On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang > wrote: Hi, Philip, I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() instead of the COO interface? MatSetValues() needs to copy the data from device to host and thus is expensive. Do you have profiling results with COO enabled? [Screenshot 2023-10-05 at 10.55.29?AM.png] --Junchao Zhang On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: Hi, Philip, I will look into the tarballs and get back to you. Thanks. --Junchao Zhang On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users > wrote: We finally have xolotl ported to use the new COO interface and the aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port to our previous version (using MatSetValuesStencil and the default Mat and Vec implementations), we expected to see an improvement in performance for both the "serial" and "cuda" builds (here I'm referring to the kokkos configuration). Attached are two plots that show timings for three different cases. All of these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on a single node). The CUDA cases were given one GPU per task (and used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible. The performance of RHSJacobian (where the bulk of computation happens in xolotl) behaved basically as expected (better than expected in the serial build). NE_3 case in CUDA was the only one that performed worse, but not surprisingly, since its workload for the GPUs is much smaller. We've still got more optimization to do on this. The real surprise was how much worse the overall solve times were. This seems to be due simply to switching to the kokkos-based implementation. I'm wondering if there are any changes we can make in configuration or runtime arguments to help with PETSc's performance here. Any help looking into this would be appreciated. 
The tarballs linked here and here are profiling databases which, once extracted, can be viewed with hpcviewer. I don't know how helpful that will be, but hopefully it can give you some direction. Thanks for your help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-10-05 at 10.55.29?AM.png Type: image/png Size: 144341 bytes Desc: Screenshot 2023-10-05 at 10.55.29?AM.png URL: From kenneth.c.hall at duke.edu Thu Oct 5 17:37:04 2023 From: kenneth.c.hall at duke.edu (Kenneth C Hall) Date: Thu, 5 Oct 2023 22:37:04 +0000 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) Message-ID: Hi all, I have a very large eigenvalue problem of the form T(\lambda).x = 0. The eigenvalues appear in a complicated way, and I must use a matrix-free approach to compute the products T.x and T?.x. I am trying to implement in SLEPc/NEP. To get started, I have defined a much smaller and simpler system of the form A.x - \lambda x = 0 where A is a 10x10 matrix. This is of course a simple standard eigenvalue problem, but I am using it as a surrogate to understand how to use NEP. I have set the problem up using shell matrices (as that is my ultimate goal). The full code is attached, but here is a smaller snippet of code: !.... Create matrix-free operators for A and B PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, A, ierr)) PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, B, ierr)) PetscCall(MatShellSetOperation(A, MATOP_MULT, MatMult_A, ierr)) PetscCall(MatShellSetOperation(B, MATOP_MULT, MatMult_B, ierr)) !.... Create nonlinear eigensolver PetscCall(NEPCreate(PETSC_COMM_SELF, nep, ierr)) !.... Set the problem type PetscCall(NEPSetProblemType(nep, NEP_GENERAL, ierr)) ! !.... set the solver type PetscCall(NEPSetType(nep, NEPNLEIGS, ierr)) ! !.... Set functions and Jacobians for NEP PetscCall(NEPSetFunction(nep, A, A, MyNEPFunction, PETSC_NULL_INTEGER, ierr)) PetscCall(NEPSetJacobian(nep, B, MyNEPJacobian, PETSC_NULL_INTEGER, ierr)) The code runs, calls MyNEPFunction and MatMult_A multiple times, sweeping over the prescribed RG range, but crashes before it ever calls MyNEPJacobian or MatMult_B. The NEP viewer and error messages are attached. Any help on getting this problem properly set up would be greatly appreciated. Kenneth Hall ATTACHMENTS: test_nep.f90 code_output -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: code_output Type: application/octet-stream Size: 3674 bytes Desc: code_output URL: -------------- next part -------------- A non-text attachment was scrubbed... 
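[Editor's note: for readers following the shell-matrix discussion from petsc4py rather than Fortran, a matrix-free operator analogous to the 10x10 toy A above can be sketched with a Python-type Mat. This is an illustrative, sequential (COMM_SELF) toy with assumed names, not Kenneth's attached test_nep.f90, and it says nothing about the NEP crash itself; it only shows the shell / matrix-free idea in petsc4py.]

import numpy as np
from petsc4py import PETSc

n = 10
rng = np.random.default_rng(0)
A_np = rng.standard_normal((n, n)).astype(PETSc.ScalarType)

class MatFreeA:
    """Matrix-free context: applies y = A_np @ x without storing A as a PETSc Mat."""
    def mult(self, mat, x, y):
        xx = x.getArray(readonly=True)   # sequential toy: the whole vector is local
        y.setArray(A_np @ xx)

A = PETSc.Mat().createPython(((None, n), (None, n)), comm=PETSc.COMM_SELF)
A.setPythonContext(MatFreeA())
A.setUp()

x = A.createVecRight()
y = A.createVecLeft()
x.set(1.0)
A.mult(x, y)   # y = A_np @ ones, dispatched through the Python context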
Name: test_nep.f90 Type: application/octet-stream Size: 7440 bytes Desc: test_nep.f90 URL: From s.kramer at imperial.ac.uk Thu Oct 5 18:22:48 2023 From: s.kramer at imperial.ac.uk (Stephan Kramer) Date: Fri, 6 Oct 2023 10:22:48 +1100 Subject: [petsc-users] performance regression with GAMG In-Reply-To: References: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> <99896e04-7ac2-9e92-0922-e78f2d0c710d@imperial.ac.uk> <0b512a75-d6ae-8a3f-1478-970b700c008a@imperial.ac.uk> Message-ID: <0aec9ffa-ccc1-9481-47c7-c32e69903f45@imperial.ac.uk> Great, that seems to fix the issue indeed - i.e. on the branch with the low memory filtering switched off (by default) we no longer see the "inconsistent data" error or hangs, and going back to the square graph aggressive coarsening brings us back the old performance. So we'd be keen to have that branch merged indeed Many thanks for your assistance with this Stephan On 05/10/2023 01:11, Mark Adams wrote: > Thanks Stephan, > > It looks like the matrix is in a bad/incorrect state and parallel Mat-Mat > is waiting for messages that were not sent. A bug. > > Can you try my branch, which is ready to merge, adams/gamg-fast-filter. > We added a new filtering method in main that uses low memory but I found it > was slow, so this branch brings back the old filter code, used by default, > and keeps the low memory version as an option. > It is possible this low memory filtering messed up the internals of the Mat > in some way. > I hope this is it, but if not we can continue. > > This MR also makes square graph the default. > I have found it does create better aggregates and on GPUs, with Kokkos bug > fixes from Junchao, Mat-Mat is fast. (it might be slow on CPUs) > > Mark > > > > > On Wed, Oct 4, 2023 at 12:30?AM Stephan Kramer > wrote: > >> Hi Mark >> >> Thanks again for re-enabling the square graph aggressive coarsening >> option which seems to have restored performance for most of our cases. >> Unfortunately we do have a remaining issue, which only seems to occur >> for the larger mesh size ("level 7" which has 6,389,890 vertices and we >> normally run on 1536 cpus): we either get a "Petsc has generated >> inconsistent data" error, or a hang - both when constructing the square >> graph matrix. So this is with the new >> -pc_gamg_aggressive_square_graph=true option, without the option there's >> no error but of course we would get back to the worse performance. >> >> Backtrace for the "inconsistent data" error. 
Note this is actually just >> petsc main from 17 Sep, git 9a75acf6e50cfe213617e - so after your merge >> of adams/gamg-add-old-coarsening into main - with one unrelated commit >> from firedrake >> >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Petsc has generated inconsistent data >> [0]PETSC ERROR: j 8 not equal to expected number of sends 9 >> [0]PETSC ERROR: Petsc Development GIT revision: >> v3.4.2-43104-ga3b76b71a1 GIT Date: 2023-09-18 10:26:04 +0100 >> [0]PETSC ERROR: stokes_cubed_sphere_7e3_A3_TS1.py on a named >> gadi-cpu-clx-0241.gadi.nci.org.au by sck551 Wed Oct 4 14:30:41 2023 >> [0]PETSC ERROR: Configure options --prefix=/tmp/firedrake-prefix >> --with-make-np=4 --with-debugging=0 --with-shared-libraries=1 >> --with-fortran-bindings=0 --with-zlib --with-c2html=0 >> --with-mpiexec=mpiexec --with-cc=mpicc --with-cxx=mpicxx >> --with-fc=mpifort --download-hdf5 --download-hypre >> --download-superlu_dist --download-ptscotch --download-suitesparse >> --download-pastix --download-hwloc --download-metis --download-scalapack >> --download-mumps --download-chaco --download-ml >> CFLAGS=-diag-disable=10441 CXXFLAGS=-diag-disable=10441 >> [0]PETSC ERROR: #1 PetscGatherMessageLengths2() at >> /jobfs/95504034.gadi-pbs/petsc/src/sys/utils/mpimesg.c:270 >> [0]PETSC ERROR: #2 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() at >> /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1867 >> [0]PETSC ERROR: #3 MatProductSymbolic_AtB_MPIAIJ_MPIAIJ() at >> /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 >> [0]PETSC ERROR: #4 MatProductSymbolic() at >> /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 >> [0]PETSC ERROR: #5 PCGAMGSquareGraph_GAMG() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 >> [0]PETSC ERROR: #6 PCGAMGCoarsen_AGG() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 >> [0]PETSC ERROR: #7 PCSetUp_GAMG() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 >> [0]PETSC ERROR: #8 PCSetUp() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 >> [0]PETSC ERROR: #9 PCApply() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 >> [0]PETSC ERROR: #10 PCApply() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 >> [0]PETSC ERROR: #11 KSP_PCApply() at >> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 >> [0]PETSC ERROR: #12 KSPSolve_CG() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 >> [0]PETSC ERROR: #13 KSPSolve_Private() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 >> [0]PETSC ERROR: #14 KSPSolve() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 >> [0]PETSC ERROR: #15 PCApply_FieldSplit_Schur() at >> >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1175 >> [0]PETSC ERROR: #16 PCApply() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 >> [0]PETSC ERROR: #17 KSP_PCApply() at >> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 >> [0]PETSC ERROR: #18 KSPSolve_PREONLY() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/preonly/preonly.c:25 >> [0]PETSC ERROR: #19 KSPSolve_Private() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 >> [0]PETSC ERROR: #20 KSPSolve() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 >> [0]PETSC ERROR: 
#21 SNESSolve_KSPONLY() at >> /jobfs/95504034.gadi-pbs/petsc/src/snes/impls/ksponly/ksponly.c:49 >> [0]PETSC ERROR: #22 SNESSolve() at >> /jobfs/95504034.gadi-pbs/petsc/src/snes/interface/snes.c:4635 >> >> Last -info :pc messages: >> >> [0] PCSetUp(): Setting up PC for first time >> [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: level 0) >> N=152175366, n data rows=3, n data cols=6, nnz/row (ave)=191, np=1536 >> [0] PCGAMGCreateGraph_AGG(): Filtering left 100. % edges in >> graph (1.588710e+07 1.765233e+06) >> [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: >> Square Graph on level 1 >> [0] fixAggregatesWithSquare(): isMPI = yes >> [0] PCGAMGProlongator_AGG(): Stokes_fieldsplit_0_assembled_: >> New grid 380144 nodes >> [0] PCGAMGOptProlongator_AGG(): >> Stokes_fieldsplit_0_assembled_: Smooth P0: max eigen=4.489376e+00 >> min=9.015236e-02 PC=jacobi >> [0] PCGAMGOptProlongator_AGG(): >> Stokes_fieldsplit_0_assembled_: Smooth P0: level 0, cache spectra >> 0.0901524 4.48938 >> [0] PCGAMGCreateLevel_GAMG(): Stokes_fieldsplit_0_assembled_: >> Coarse grid reduction from 1536 to 1536 active processes >> [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: 1) >> N=2280864, n data cols=6, nnz/row (ave)=503, 1536 active pes >> [0] PCGAMGCreateGraph_AGG(): Filtering left 36.2891 % edges in >> graph (5.310360e+05 5.353000e+03) >> [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: >> Square Graph on level 2 >> >> The hang (on a slightly different model configuration but on the same >> mesh and n/o cores) seems to occur in the same location. If I use gdb to >> attach to the running processes, it seems on some cores it has somehow >> manages to fall out of the pcsetup and is waiting in the first norm >> calculation in the outside CG iteration: >> >> #0 0x000014cce9999119 in >> hmca_bcol_basesmuma_bcast_k_nomial_knownroot_progress () from >> /apps/hcoll/4.7.3202/lib/hcoll/hmca_bcol_basesmuma.so >> #1 0x000014ccef2c2737 in _coll_ml_allreduce () from >> /apps/hcoll/4.7.3202/lib/libhcoll.so.1 >> #2 0x000014ccef5dd95b in mca_coll_hcoll_allreduce (sbuf=0x1, >> rbuf=0x7fff74ecbee8, count=1, dtype=0x14cd26ce6f80 , >> op=0x14cd26cfbc20 , comm=0x3076fb0, module=0x43a0110) >> at >> >> /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/mca/coll/hcoll/coll_hcoll_ops.c:228 >> #3 0x000014cd26a1de28 in PMPI_Allreduce (sendbuf=0x1, >> recvbuf=, count=1, datatype=, >> op=0x14cd26cfbc20 , comm=0x3076fb0) at pallreduce.c:113 >> #4 0x000014cd271c9889 in VecNorm_MPI_Default (xin=, >> type=, z=, VecNorm_SeqFn=) >> at >> >> /jobfs/95504034.gadi-pbs/petsc/include/../src/vec/vec/impls/mpi/pvecimpl.h:168 >> #5 VecNorm_MPI (xin=0x14ccee1ddb80, type=3924123648, z=0x22d) at >> /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/impls/mpi/pvec2.c:39 >> #6 0x000014cd2718cddd in VecNorm (x=0x14ccee1ddb80, type=3924123648, >> val=0x22d) at >> /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/interface/rvector.c:214 >> #7 0x000014cd27f5a0b9 in KSPSolve_CG (ksp=0x14ccee1ddb80) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:163 >> etc. 
>> >> but with other cores still stuck at: >> >> #0 0x000015375cf41e8a in ucp_worker_progress () from >> /apps/ucx/1.12.0/lib/libucp.so.0 >> #1 0x000015377d4bd57b in opal_progress () at >> >> /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/runtime/opal_progress.c:231 >> #2 0x000015377d4c3ba5 in ompi_sync_wait_mt >> (sync=sync at entry=0x7ffd6aedf6f0) at >> >> /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/threads/wait_sync.c:85 >> #3 0x000015378bf7cf38 in ompi_request_default_wait_any (count=8, >> requests=0x8d465a0, index=0x7ffd6aedfa60, status=0x7ffd6aedfa10) at >> >> /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/request/req_wait.c:124 >> #4 0x000015378bfc1b4b in PMPI_Waitany (count=8, requests=0x8d465a0, >> indx=0x7ffd6aedfa60, status=) at pwaitany.c:86 >> #5 0x000015378c88ef2c in MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ >> (P=0x2cc7500, A=0x1, fill=2.1219957934356005e-314, C=0xc0fe132c) at >> /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1884 >> #6 0x000015378c88dd4f in MatProductSymbolic_AtB_MPIAIJ_MPIAIJ >> (C=0x2cc7500) at >> /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 >> #7 0x000015378cc665b8 in MatProductSymbolic (mat=0x2cc7500) at >> /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 >> #8 0x000015378d294473 in PCGAMGSquareGraph_GAMG (a_pc=0x2cc7500, >> Gmat1=0x1, Gmat2=0xc0fe132c) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 >> #9 0x000015378d27b83e in PCGAMGCoarsen_AGG (a_pc=0x2cc7500, >> a_Gmat1=0x1, agg_lists=0xc0fe132c) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 >> #10 0x000015378d294c73 in PCSetUp_GAMG (pc=0x2cc7500) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 >> #11 0x000015378d215721 in PCSetUp (pc=0x2cc7500) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 >> #12 0x000015378d216b82 in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 >> #13 0x000015378eb91b2f in __pyx_pw_8petsc4py_5PETSc_2PC_45apply >> (__pyx_v_self=0x2cc7500, __pyx_args=0x1, __pyx_nargs=3237876524, >> __pyx_kwds=0x1) at src/petsc4py/PETSc.c:259082 >> #14 0x000015379e0a69f7 in method_vectorcall_FASTCALL_KEYWORDS >> (func=0x15378f302890, args=0x83b3218, nargsf=, >> kwnames=) at ../Objects/descrobject.c:405 >> #15 0x000015379e11d435 in _PyObject_VectorcallTstate (kwnames=0x0, >> nargsf=, args=0x83b3218, callable=0x15378f302890, >> tstate=0x23e0020) at ../Include/cpython/abstract.h:114 >> #16 PyObject_Vectorcall (kwnames=0x0, nargsf=, >> args=0x83b3218, callable=0x15378f302890) at >> ../Include/cpython/abstract.h:123 >> #17 call_function (kwnames=0x0, oparg=, >> pp_stack=, trace_info=0x7ffd6aee0390, >> tstate=) at ../Python/ceval.c:5867 >> #18 _PyEval_EvalFrameDefault (tstate=, f=, >> throwflag=) at ../Python/ceval.c:4198 >> #19 0x000015379e11b63b in _PyEval_EvalFrame (throwflag=0, f=0x83b3080, >> tstate=0x23e0020) at ../Include/internal/pycore_ceval.h:46 >> #20 _PyEval_Vector (tstate=, con=, >> locals=, args=, argcount=4, >> kwnames=) at ../Python/ceval.c:5065 >> #21 0x000015378ee1e057 in __Pyx_PyObject_FastCallDict (func=> out>, args=0x1, _nargs=, kwargs=) at >> src/petsc4py/PETSc.c:548022 >> #22 __pyx_f_8petsc4py_5PETSc_PCApply_Python (__pyx_v_pc=0x2cc7500, >> __pyx_v_x=0x1, __pyx_v_y=0xc0fe132c) at src/petsc4py/PETSc.c:31979 >> #23 0x000015378d216cba in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at >> 
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 >> #24 0x000015378d4d153c in KSP_PCApply (ksp=0x2cc7500, x=0x1, >> y=0xc0fe132c) at >> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 >> #25 0x000015378d4d1097 in KSPSolve_CG (ksp=0x2cc7500) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 >> >> Let me know if there is anything further we can try to debug this issue >> >> Kind regards >> Stephan Kramer >> >> >> On 02/09/2023 01:58, Mark Adams wrote: >>> Fantastic! >>> >>> I fixed a memory free problem. You should be OK now. >>> I am pretty sure you are good but I would like to wait to get any >> feedback >>> from you. >>> We should have a release at the end of the month and it would be nice to >>> get this into it. >>> >>> Thanks, >>> Mark >>> >>> >>> On Fri, Sep 1, 2023 at 7:07?AM Stephan Kramer >>> wrote: >>> >>>> Hi Mark >>>> >>>> Sorry took a while to report back. We have tried your branch but hit a >>>> few issues, some of which we're not entirely sure are related. >>>> >>>> First switching off minimum degree ordering, and then switching to the >>>> old version of aggressive coarsening, as you suggested, got us back to >>>> the coarsening behaviour that we had previously, but then we also >>>> observed an even further worsening of the iteration count: it had >>>> previously gone up by 50% already (with the newer main petsc), but now >>>> was more than double "old" petsc. Took us a while to realize this was >>>> due to the default smoother changing from Cheby+SOR to Cheby+Jacobi. >>>> Switching this also back to the old default we get back to very similar >>>> coarsening levels (see below for more details if it is of interest) and >>>> iteration counts. >>>> >>>> So that's all very good news. However, we were also starting seeing >>>> memory errors (double free or corruption) when we switched off the >>>> minimum degree ordering. Because this was at an earlier version of your >>>> branch we then rebuild, hoping this was just an earlier bug that had >>>> been fixed, but then we were having MPI-lockup issues. We have now >>>> figured out the MPI issues are completely unrelated - some combination >>>> with a newer mpi build and firedrake on our cluster which also occur >>>> using main branches of everything. So switching back to an older MPI >>>> build we are hoping to now test your most recent version of >>>> adams/gamg-add-old-coarsening with these options and see whether the >>>> memory errors are still there. 
Will let you know >>>> >>>> Best wishes >>>> Stephan Kramer >>>> >>>> Coarsening details with various options for Level 6 of the test case: >>>> >>>> In our original setup (using "old" petsc), we had: >>>> >>>> rows=516, cols=516, bs=6 >>>> rows=12660, cols=12660, bs=6 >>>> rows=346974, cols=346974, bs=6 >>>> rows=19169670, cols=19169670, bs=3 >>>> >>>> Then with the newer main petsc we had >>>> >>>> rows=666, cols=666, bs=6 >>>> rows=7740, cols=7740, bs=6 >>>> rows=34902, cols=34902, bs=6 >>>> rows=736578, cols=736578, bs=6 >>>> rows=19169670, cols=19169670, bs=3 >>>> >>>> Then on your branch with minimum_degree_ordering False: >>>> >>>> rows=504, cols=504, bs=6 >>>> rows=2274, cols=2274, bs=6 >>>> rows=11010, cols=11010, bs=6 >>>> rows=35790, cols=35790, bs=6 >>>> rows=430686, cols=430686, bs=6 >>>> rows=19169670, cols=19169670, bs=3 >>>> >>>> And with minimum_degree_ordering False and use_aggressive_square_graph >>>> True: >>>> >>>> rows=498, cols=498, bs=6 >>>> rows=12672, cols=12672, bs=6 >>>> rows=346974, cols=346974, bs=6 >>>> rows=19169670, cols=19169670, bs=3 >>>> >>>> So that is indeed pretty much back to what it was before >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 31/08/2023 23:40, Mark Adams wrote: >>>>> Hi Stephan, >>>>> >>>>> This branch is settling down. adams/gamg-add-old-coarsening >>>>> < >> https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening> >>>>> I made the old, not minimum degree, ordering the default but kept the >> new >>>>> "aggressive" coarsening as the default, so I am hoping that just adding >>>>> "-pc_gamg_use_aggressive_square_graph true" to your regression tests >> will >>>>> get you back to where you were before. >>>>> Fingers crossed ... let me know if you have any success or not. >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>> On Tue, Aug 15, 2023 at 1:45?PM Mark Adams wrote: >>>>> >>>>>> Hi Stephan, >>>>>> >>>>>> I have a branch that you can try: adams/gamg-add-old-coarsening >>>>>> < >> https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening >>>>>> Things to test: >>>>>> * First, verify that nothing unintended changed by reproducing your >> bad >>>>>> results with this branch (the defaults are the same) >>>>>> * Try not using the minimum degree ordering that I suggested >>>>>> with: -pc_gamg_use_minimum_degree_ordering false >>>>>> -- I am eager to see if that is the main problem. >>>>>> * Go back to what I think is the old method: >>>>>> -pc_gamg_use_minimum_degree_ordering >>>>>> false -pc_gamg_use_aggressive_square_graph true >>>>>> >>>>>> When we get back to where you were, I would like to try to get modern >>>>>> stuff working. >>>>>> I did add a -pc_gamg_aggressive_mis_k <2> >>>>>> You could to another step of MIS coarsening with >>>> -pc_gamg_aggressive_mis_k >>>>>> 3 >>>>>> >>>>>> Anyway, lots to look at but, alas, AMG does have a lot of parameters. >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>>> On Mon, Aug 14, 2023 at 4:26?PM Mark Adams wrote: >>>>>> >>>>>>> On Mon, Aug 14, 2023 at 11:03?AM Stephan Kramer < >>>> s.kramer at imperial.ac.uk> >>>>>>> wrote: >>>>>>> >>>>>>>> Many thanks for looking into this, Mark >>>>>>>>> My 3D tests were not that different and I see you lowered the >>>>>>>> threshold. >>>>>>>>> Note, you can set the threshold to zero, but your test is running >> so >>>>>>>> much >>>>>>>>> differently than mine there is something else going on. >>>>>>>>> Note, the new, bad, coarsening rate of 30:1 is what we tend to >> shoot >>>>>>>> for >>>>>>>>> in 3D. 
>>>>>>>>> >>>>>>>>> So it is not clear what the problem is. Some questions: >>>>>>>>> >>>>>>>>> * do you have a picture of this mesh to show me? >>>>>>>> It's just a standard hexahedral cubed sphere mesh with the >> refinement >>>>>>>> level giving the number of times each of the six sides have been >>>>>>>> subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to >> 16 >>>>>>>> layers. So the total number of elements at Level_5 is 6 x 32 x 32 x >>>> 16 = >>>>>>>> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) >>>> going >>>>>>>> to the next Level >>>>>>>> >>>>>>> I see, and I assume these are pretty stretched elements. >>>>>>> >>>>>>> >>>>>>>>> * what do you mean by Q1-Q2 elements? >>>>>>>> Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for >> velocity >>>>>>>> and (tri)linear for pressure >>>>>>>> >>>>>>>> I guess you could argue we could/should just do good old geometric >>>>>>>> multigrid instead. More generally we do use this solver >> configuration >>>> a >>>>>>>> lot for tetrahedral Taylor Hood (P2-P1) in particular also for our >>>>>>>> adaptive mesh runs - would it be worth to see if we have the same >>>>>>>> performance issues with tetrahedral P2-P1? >>>>>>>> >>>>>>> No, you have a clear reproducer, if not minimal. >>>>>>> The first coarsening is very different. >>>>>>> >>>>>>> I am working on this and I see that I added a heuristic for thin >> bodies >>>>>>> where you order the vertices in greedy algorithms with minimum degree >>>> first. >>>>>>> This will tend to pick corners first, edges then faces, etc. >>>>>>> That may be the problem. I would like to understand it better (see >>>> below). >>>>>>> >>>>>>>>> It would be nice to see if the new and old codes are similar >> without >>>>>>>>> aggressive coarsening. >>>>>>>>> This was the intended change of the major change in this time frame >>>> as >>>>>>>> you >>>>>>>>> noticed. >>>>>>>>> If these jobs are easy to run, could you check that the old and new >>>>>>>>> versions are similar with "-pc_gamg_square_graph 0 ", ( and you >>>> only >>>>>>>> need >>>>>>>>> one time step). >>>>>>>>> All you need to do is check that the first coarse grid has about >> the >>>>>>>> same >>>>>>>>> number of equations (large). >>>>>>>> Unfortunately we're seeing some memory errors when we use this >> option, >>>>>>>> and I'm not entirely clear whether we're just running out of memory >>>> and >>>>>>>> need to put it on a special queue. 
>>>>>>>> >>>>>>>> The run with square_graph 0 using new PETSc managed to get through >> one >>>>>>>> solve at level 5, and is giving the following mg levels: >>>>>>>> >>>>>>>> rows=174, cols=174, bs=6 >>>>>>>> total: nonzeros=30276, allocated nonzeros=30276 >>>>>>>> -- >>>>>>>> rows=2106, cols=2106, bs=6 >>>>>>>> total: nonzeros=4238532, allocated nonzeros=4238532 >>>>>>>> -- >>>>>>>> rows=21828, cols=21828, bs=6 >>>>>>>> total: nonzeros=62588232, allocated nonzeros=62588232 >>>>>>>> -- >>>>>>>> rows=589824, cols=589824, bs=6 >>>>>>>> total: nonzeros=1082528928, allocated >> nonzeros=1082528928 >>>>>>>> -- >>>>>>>> rows=2433222, cols=2433222, bs=3 >>>>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 >>>>>>>> >>>>>>>> comparing with square_graph 100 with new PETSc >>>>>>>> >>>>>>>> rows=96, cols=96, bs=6 >>>>>>>> total: nonzeros=9216, allocated nonzeros=9216 >>>>>>>> -- >>>>>>>> rows=1440, cols=1440, bs=6 >>>>>>>> total: nonzeros=647856, allocated nonzeros=647856 >>>>>>>> -- >>>>>>>> rows=97242, cols=97242, bs=6 >>>>>>>> total: nonzeros=65656836, allocated nonzeros=65656836 >>>>>>>> -- >>>>>>>> rows=2433222, cols=2433222, bs=3 >>>>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 >>>>>>>> >>>>>>>> and old PETSc with square_graph 100 >>>>>>>> >>>>>>>> rows=90, cols=90, bs=6 >>>>>>>> total: nonzeros=8100, allocated nonzeros=8100 >>>>>>>> -- >>>>>>>> rows=1872, cols=1872, bs=6 >>>>>>>> total: nonzeros=1234080, allocated nonzeros=1234080 >>>>>>>> -- >>>>>>>> rows=47652, cols=47652, bs=6 >>>>>>>> total: nonzeros=23343264, allocated nonzeros=23343264 >>>>>>>> -- >>>>>>>> rows=2433222, cols=2433222, bs=3 >>>>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 >>>>>>>> -- >>>>>>>> >>>>>>>> Unfortunately old PETSc with square_graph 0 did not complete a >> single >>>>>>>> solve before giving the memory error >>>>>>>> >>>>>>> OK, thanks for trying. >>>>>>> >>>>>>> I am working on this and I will give you a branch to test, but if you >>>> can >>>>>>> rebuild PETSc here is a quick test that might fix your problem. >>>>>>> In src/ksp/pc/impls/gamg/agg.c you will see: >>>>>>> >>>>>>> PetscCall(PetscSortIntWithArray(nloc, degree, permute)); >>>>>>> >>>>>>> If you can comment this out in the new code and compare with the old, >>>>>>> that might fix the problem. >>>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>>> >>>>>>>>> BTW, I am starting to think I should add the old method back as an >>>>>>>> option. >>>>>>>>> I did not think this change would cause large differences. >>>>>>>> Yes, I think that would be much appreciated. Let us know if we can >> do >>>>>>>> any testing >>>>>>>> >>>>>>>> Best wishes >>>>>>>> Stephan >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Mark >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Note that we are providing the rigid body near nullspace, >>>>>>>>>> hence the bs=3 to bs=6. >>>>>>>>>> We have tried different values for the gamg_threshold but it >> doesn't >>>>>>>>>> really seem to significantly alter the coarsening amount in that >>>> first >>>>>>>>>> step. >>>>>>>>>> >>>>>>>>>> Do you have any suggestions for further things we should try/look >>>> at? 
>>>>>>>>>> Any feedback would be much appreciated >>>>>>>>>> >>>>>>>>>> Best wishes >>>>>>>>>> Stephan Kramer >>>>>>>>>> >>>>>>>>>> Full logs including log_view timings available from >>>>>>>>>> https://github.com/stephankramer/petsc-scaling/ >>>>>>>>>> >>>>>>>>>> In particular: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat >> From mfadams at lbl.gov Thu Oct 5 19:51:38 2023 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 5 Oct 2023 20:51:38 -0400 Subject: [petsc-users] performance regression with GAMG In-Reply-To: <0aec9ffa-ccc1-9481-47c7-c32e69903f45@imperial.ac.uk> References: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> <99896e04-7ac2-9e92-0922-e78f2d0c710d@imperial.ac.uk> <0b512a75-d6ae-8a3f-1478-970b700c008a@imperial.ac.uk> <0aec9ffa-ccc1-9481-47c7-c32e69903f45@imperial.ac.uk> Message-ID: Fantastic, it will get merged soon. Thank you for your diligence and patience. This would have been a time bomb waiting to explode. Mark On Thu, Oct 5, 2023 at 7:23?PM Stephan Kramer wrote: > Great, that seems to fix the issue indeed - i.e. on the branch with the > low memory filtering switched off (by default) we no longer see the > "inconsistent data" error or hangs, and going back to the square graph > aggressive coarsening brings us back the old performance. So we'd be > keen to have that branch merged indeed > Many thanks for your assistance with this > Stephan > > On 05/10/2023 01:11, Mark Adams wrote: > > Thanks Stephan, > > > > It looks like the matrix is in a bad/incorrect state and parallel Mat-Mat > > is waiting for messages that were not sent. A bug. > > > > Can you try my branch, which is ready to merge, adams/gamg-fast-filter. > > We added a new filtering method in main that uses low memory but I found > it > > was slow, so this branch brings back the old filter code, used by > default, > > and keeps the low memory version as an option. > > It is possible this low memory filtering messed up the internals of the > Mat > > in some way. > > I hope this is it, but if not we can continue. > > > > This MR also makes square graph the default. > > I have found it does create better aggregates and on GPUs, with Kokkos > bug > > fixes from Junchao, Mat-Mat is fast. (it might be slow on CPUs) > > > > Mark > > > > > > > > > > On Wed, Oct 4, 2023 at 12:30?AM Stephan Kramer > > wrote: > > > >> Hi Mark > >> > >> Thanks again for re-enabling the square graph aggressive coarsening > >> option which seems to have restored performance for most of our cases. > >> Unfortunately we do have a remaining issue, which only seems to occur > >> for the larger mesh size ("level 7" which has 6,389,890 vertices and we > >> normally run on 1536 cpus): we either get a "Petsc has generated > >> inconsistent data" error, or a hang - both when constructing the square > >> graph matrix. So this is with the new > >> -pc_gamg_aggressive_square_graph=true option, without the option there's > >> no error but of course we would get back to the worse performance. > >> > >> Backtrace for the "inconsistent data" error. 
Note this is actually just > >> petsc main from 17 Sep, git 9a75acf6e50cfe213617e - so after your merge > >> of adams/gamg-add-old-coarsening into main - with one unrelated commit > >> from firedrake > >> > >> [0]PETSC ERROR: --------------------- Error Message > >> -------------------------------------------------------------- > >> [0]PETSC ERROR: Petsc has generated inconsistent data > >> [0]PETSC ERROR: j 8 not equal to expected number of sends 9 > >> [0]PETSC ERROR: Petsc Development GIT revision: > >> v3.4.2-43104-ga3b76b71a1 GIT Date: 2023-09-18 10:26:04 +0100 > >> [0]PETSC ERROR: stokes_cubed_sphere_7e3_A3_TS1.py on a named > >> gadi-cpu-clx-0241.gadi.nci.org.au by sck551 Wed Oct 4 14:30:41 2023 > >> [0]PETSC ERROR: Configure options --prefix=/tmp/firedrake-prefix > >> --with-make-np=4 --with-debugging=0 --with-shared-libraries=1 > >> --with-fortran-bindings=0 --with-zlib --with-c2html=0 > >> --with-mpiexec=mpiexec --with-cc=mpicc --with-cxx=mpicxx > >> --with-fc=mpifort --download-hdf5 --download-hypre > >> --download-superlu_dist --download-ptscotch --download-suitesparse > >> --download-pastix --download-hwloc --download-metis --download-scalapack > >> --download-mumps --download-chaco --download-ml > >> CFLAGS=-diag-disable=10441 CXXFLAGS=-diag-disable=10441 > >> [0]PETSC ERROR: #1 PetscGatherMessageLengths2() at > >> /jobfs/95504034.gadi-pbs/petsc/src/sys/utils/mpimesg.c:270 > >> [0]PETSC ERROR: #2 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() at > >> > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1867 > >> [0]PETSC ERROR: #3 MatProductSymbolic_AtB_MPIAIJ_MPIAIJ() at > >> > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 > >> [0]PETSC ERROR: #4 MatProductSymbolic() at > >> /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 > >> [0]PETSC ERROR: #5 PCGAMGSquareGraph_GAMG() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 > >> [0]PETSC ERROR: #6 PCGAMGCoarsen_AGG() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 > >> [0]PETSC ERROR: #7 PCSetUp_GAMG() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 > >> [0]PETSC ERROR: #8 PCSetUp() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 > >> [0]PETSC ERROR: #9 PCApply() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 > >> [0]PETSC ERROR: #10 PCApply() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 > >> [0]PETSC ERROR: #11 KSP_PCApply() at > >> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 > >> [0]PETSC ERROR: #12 KSPSolve_CG() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 > >> [0]PETSC ERROR: #13 KSPSolve_Private() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 > >> [0]PETSC ERROR: #14 KSPSolve() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 > >> [0]PETSC ERROR: #15 PCApply_FieldSplit_Schur() at > >> > >> > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1175 > >> [0]PETSC ERROR: #16 PCApply() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 > >> [0]PETSC ERROR: #17 KSP_PCApply() at > >> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 > >> [0]PETSC ERROR: #18 KSPSolve_PREONLY() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/preonly/preonly.c:25 > >> [0]PETSC ERROR: #19 KSPSolve_Private() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 
> >> [0]PETSC ERROR: #20 KSPSolve() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 > >> [0]PETSC ERROR: #21 SNESSolve_KSPONLY() at > >> /jobfs/95504034.gadi-pbs/petsc/src/snes/impls/ksponly/ksponly.c:49 > >> [0]PETSC ERROR: #22 SNESSolve() at > >> /jobfs/95504034.gadi-pbs/petsc/src/snes/interface/snes.c:4635 > >> > >> Last -info :pc messages: > >> > >> [0] PCSetUp(): Setting up PC for first time > >> [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: level 0) > >> N=152175366, n data rows=3, n data cols=6, nnz/row (ave)=191, np=1536 > >> [0] PCGAMGCreateGraph_AGG(): Filtering left 100. % edges in > >> graph (1.588710e+07 1.765233e+06) > >> [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: > >> Square Graph on level 1 > >> [0] fixAggregatesWithSquare(): isMPI = yes > >> [0] PCGAMGProlongator_AGG(): Stokes_fieldsplit_0_assembled_: > >> New grid 380144 nodes > >> [0] PCGAMGOptProlongator_AGG(): > >> Stokes_fieldsplit_0_assembled_: Smooth P0: max eigen=4.489376e+00 > >> min=9.015236e-02 PC=jacobi > >> [0] PCGAMGOptProlongator_AGG(): > >> Stokes_fieldsplit_0_assembled_: Smooth P0: level 0, cache spectra > >> 0.0901524 4.48938 > >> [0] PCGAMGCreateLevel_GAMG(): Stokes_fieldsplit_0_assembled_: > >> Coarse grid reduction from 1536 to 1536 active processes > >> [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: 1) > >> N=2280864, n data cols=6, nnz/row (ave)=503, 1536 active pes > >> [0] PCGAMGCreateGraph_AGG(): Filtering left 36.2891 % edges in > >> graph (5.310360e+05 5.353000e+03) > >> [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: > >> Square Graph on level 2 > >> > >> The hang (on a slightly different model configuration but on the same > >> mesh and n/o cores) seems to occur in the same location. If I use gdb to > >> attach to the running processes, it seems on some cores it has somehow > >> manages to fall out of the pcsetup and is waiting in the first norm > >> calculation in the outside CG iteration: > >> > >> #0 0x000014cce9999119 in > >> hmca_bcol_basesmuma_bcast_k_nomial_knownroot_progress () from > >> /apps/hcoll/4.7.3202/lib/hcoll/hmca_bcol_basesmuma.so > >> #1 0x000014ccef2c2737 in _coll_ml_allreduce () from > >> /apps/hcoll/4.7.3202/lib/libhcoll.so.1 > >> #2 0x000014ccef5dd95b in mca_coll_hcoll_allreduce (sbuf=0x1, > >> rbuf=0x7fff74ecbee8, count=1, dtype=0x14cd26ce6f80 , > >> op=0x14cd26cfbc20 , comm=0x3076fb0, module=0x43a0110) > >> at > >> > >> > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/mca/coll/hcoll/coll_hcoll_ops.c:228 > >> #3 0x000014cd26a1de28 in PMPI_Allreduce (sendbuf=0x1, > >> recvbuf=, count=1, datatype=, > >> op=0x14cd26cfbc20 , comm=0x3076fb0) at pallreduce.c:113 > >> #4 0x000014cd271c9889 in VecNorm_MPI_Default (xin=, > >> type=, z=, VecNorm_SeqFn=) > >> at > >> > >> > /jobfs/95504034.gadi-pbs/petsc/include/../src/vec/vec/impls/mpi/pvecimpl.h:168 > >> #5 VecNorm_MPI (xin=0x14ccee1ddb80, type=3924123648, z=0x22d) at > >> /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/impls/mpi/pvec2.c:39 > >> #6 0x000014cd2718cddd in VecNorm (x=0x14ccee1ddb80, type=3924123648, > >> val=0x22d) at > >> /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/interface/rvector.c:214 > >> #7 0x000014cd27f5a0b9 in KSPSolve_CG (ksp=0x14ccee1ddb80) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:163 > >> etc. 
> >> > >> but with other cores still stuck at: > >> > >> #0 0x000015375cf41e8a in ucp_worker_progress () from > >> /apps/ucx/1.12.0/lib/libucp.so.0 > >> #1 0x000015377d4bd57b in opal_progress () at > >> > >> > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/runtime/opal_progress.c:231 > >> #2 0x000015377d4c3ba5 in ompi_sync_wait_mt > >> (sync=sync at entry=0x7ffd6aedf6f0) at > >> > >> > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/threads/wait_sync.c:85 > >> #3 0x000015378bf7cf38 in ompi_request_default_wait_any (count=8, > >> requests=0x8d465a0, index=0x7ffd6aedfa60, status=0x7ffd6aedfa10) at > >> > >> > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/request/req_wait.c:124 > >> #4 0x000015378bfc1b4b in PMPI_Waitany (count=8, requests=0x8d465a0, > >> indx=0x7ffd6aedfa60, status=) at pwaitany.c:86 > >> #5 0x000015378c88ef2c in MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ > >> (P=0x2cc7500, A=0x1, fill=2.1219957934356005e-314, C=0xc0fe132c) at > >> > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1884 > >> #6 0x000015378c88dd4f in MatProductSymbolic_AtB_MPIAIJ_MPIAIJ > >> (C=0x2cc7500) at > >> > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 > >> #7 0x000015378cc665b8 in MatProductSymbolic (mat=0x2cc7500) at > >> /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 > >> #8 0x000015378d294473 in PCGAMGSquareGraph_GAMG (a_pc=0x2cc7500, > >> Gmat1=0x1, Gmat2=0xc0fe132c) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 > >> #9 0x000015378d27b83e in PCGAMGCoarsen_AGG (a_pc=0x2cc7500, > >> a_Gmat1=0x1, agg_lists=0xc0fe132c) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 > >> #10 0x000015378d294c73 in PCSetUp_GAMG (pc=0x2cc7500) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 > >> #11 0x000015378d215721 in PCSetUp (pc=0x2cc7500) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 > >> #12 0x000015378d216b82 in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 > >> #13 0x000015378eb91b2f in __pyx_pw_8petsc4py_5PETSc_2PC_45apply > >> (__pyx_v_self=0x2cc7500, __pyx_args=0x1, __pyx_nargs=3237876524, > >> __pyx_kwds=0x1) at src/petsc4py/PETSc.c:259082 > >> #14 0x000015379e0a69f7 in method_vectorcall_FASTCALL_KEYWORDS > >> (func=0x15378f302890, args=0x83b3218, nargsf=, > >> kwnames=) at ../Objects/descrobject.c:405 > >> #15 0x000015379e11d435 in _PyObject_VectorcallTstate (kwnames=0x0, > >> nargsf=, args=0x83b3218, callable=0x15378f302890, > >> tstate=0x23e0020) at ../Include/cpython/abstract.h:114 > >> #16 PyObject_Vectorcall (kwnames=0x0, nargsf=, > >> args=0x83b3218, callable=0x15378f302890) at > >> ../Include/cpython/abstract.h:123 > >> #17 call_function (kwnames=0x0, oparg=, > >> pp_stack=, trace_info=0x7ffd6aee0390, > >> tstate=) at ../Python/ceval.c:5867 > >> #18 _PyEval_EvalFrameDefault (tstate=, f=, > >> throwflag=) at ../Python/ceval.c:4198 > >> #19 0x000015379e11b63b in _PyEval_EvalFrame (throwflag=0, f=0x83b3080, > >> tstate=0x23e0020) at ../Include/internal/pycore_ceval.h:46 > >> #20 _PyEval_Vector (tstate=, con=, > >> locals=, args=, argcount=4, > >> kwnames=) at ../Python/ceval.c:5065 > >> #21 0x000015378ee1e057 in __Pyx_PyObject_FastCallDict (func= >> out>, args=0x1, _nargs=, kwargs=) at > >> src/petsc4py/PETSc.c:548022 > >> #22 __pyx_f_8petsc4py_5PETSc_PCApply_Python (__pyx_v_pc=0x2cc7500, > >> __pyx_v_x=0x1, 
__pyx_v_y=0xc0fe132c) at src/petsc4py/PETSc.c:31979 > >> #23 0x000015378d216cba in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 > >> #24 0x000015378d4d153c in KSP_PCApply (ksp=0x2cc7500, x=0x1, > >> y=0xc0fe132c) at > >> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 > >> #25 0x000015378d4d1097 in KSPSolve_CG (ksp=0x2cc7500) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 > >> > >> Let me know if there is anything further we can try to debug this issue > >> > >> Kind regards > >> Stephan Kramer > >> > >> > >> On 02/09/2023 01:58, Mark Adams wrote: > >>> Fantastic! > >>> > >>> I fixed a memory free problem. You should be OK now. > >>> I am pretty sure you are good but I would like to wait to get any > >> feedback > >>> from you. > >>> We should have a release at the end of the month and it would be nice > to > >>> get this into it. > >>> > >>> Thanks, > >>> Mark > >>> > >>> > >>> On Fri, Sep 1, 2023 at 7:07?AM Stephan Kramer > > >>> wrote: > >>> > >>>> Hi Mark > >>>> > >>>> Sorry took a while to report back. We have tried your branch but hit a > >>>> few issues, some of which we're not entirely sure are related. > >>>> > >>>> First switching off minimum degree ordering, and then switching to the > >>>> old version of aggressive coarsening, as you suggested, got us back to > >>>> the coarsening behaviour that we had previously, but then we also > >>>> observed an even further worsening of the iteration count: it had > >>>> previously gone up by 50% already (with the newer main petsc), but now > >>>> was more than double "old" petsc. Took us a while to realize this was > >>>> due to the default smoother changing from Cheby+SOR to Cheby+Jacobi. > >>>> Switching this also back to the old default we get back to very > similar > >>>> coarsening levels (see below for more details if it is of interest) > and > >>>> iteration counts. > >>>> > >>>> So that's all very good news. However, we were also starting seeing > >>>> memory errors (double free or corruption) when we switched off the > >>>> minimum degree ordering. Because this was at an earlier version of > your > >>>> branch we then rebuild, hoping this was just an earlier bug that had > >>>> been fixed, but then we were having MPI-lockup issues. We have now > >>>> figured out the MPI issues are completely unrelated - some combination > >>>> with a newer mpi build and firedrake on our cluster which also occur > >>>> using main branches of everything. So switching back to an older MPI > >>>> build we are hoping to now test your most recent version of > >>>> adams/gamg-add-old-coarsening with these options and see whether the > >>>> memory errors are still there. 
Will let you know > >>>> > >>>> Best wishes > >>>> Stephan Kramer > >>>> > >>>> Coarsening details with various options for Level 6 of the test case: > >>>> > >>>> In our original setup (using "old" petsc), we had: > >>>> > >>>> rows=516, cols=516, bs=6 > >>>> rows=12660, cols=12660, bs=6 > >>>> rows=346974, cols=346974, bs=6 > >>>> rows=19169670, cols=19169670, bs=3 > >>>> > >>>> Then with the newer main petsc we had > >>>> > >>>> rows=666, cols=666, bs=6 > >>>> rows=7740, cols=7740, bs=6 > >>>> rows=34902, cols=34902, bs=6 > >>>> rows=736578, cols=736578, bs=6 > >>>> rows=19169670, cols=19169670, bs=3 > >>>> > >>>> Then on your branch with minimum_degree_ordering False: > >>>> > >>>> rows=504, cols=504, bs=6 > >>>> rows=2274, cols=2274, bs=6 > >>>> rows=11010, cols=11010, bs=6 > >>>> rows=35790, cols=35790, bs=6 > >>>> rows=430686, cols=430686, bs=6 > >>>> rows=19169670, cols=19169670, bs=3 > >>>> > >>>> And with minimum_degree_ordering False and use_aggressive_square_graph > >>>> True: > >>>> > >>>> rows=498, cols=498, bs=6 > >>>> rows=12672, cols=12672, bs=6 > >>>> rows=346974, cols=346974, bs=6 > >>>> rows=19169670, cols=19169670, bs=3 > >>>> > >>>> So that is indeed pretty much back to what it was before > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> On 31/08/2023 23:40, Mark Adams wrote: > >>>>> Hi Stephan, > >>>>> > >>>>> This branch is settling down. adams/gamg-add-old-coarsening > >>>>> < > >> https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening> > >>>>> I made the old, not minimum degree, ordering the default but kept the > >> new > >>>>> "aggressive" coarsening as the default, so I am hoping that just > adding > >>>>> "-pc_gamg_use_aggressive_square_graph true" to your regression tests > >> will > >>>>> get you back to where you were before. > >>>>> Fingers crossed ... let me know if you have any success or not. > >>>>> > >>>>> Thanks, > >>>>> Mark > >>>>> > >>>>> > >>>>> On Tue, Aug 15, 2023 at 1:45?PM Mark Adams wrote: > >>>>> > >>>>>> Hi Stephan, > >>>>>> > >>>>>> I have a branch that you can try: adams/gamg-add-old-coarsening > >>>>>> < > >> https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening > >>>>>> Things to test: > >>>>>> * First, verify that nothing unintended changed by reproducing your > >> bad > >>>>>> results with this branch (the defaults are the same) > >>>>>> * Try not using the minimum degree ordering that I suggested > >>>>>> with: -pc_gamg_use_minimum_degree_ordering false > >>>>>> -- I am eager to see if that is the main problem. > >>>>>> * Go back to what I think is the old method: > >>>>>> -pc_gamg_use_minimum_degree_ordering > >>>>>> false -pc_gamg_use_aggressive_square_graph true > >>>>>> > >>>>>> When we get back to where you were, I would like to try to get > modern > >>>>>> stuff working. > >>>>>> I did add a -pc_gamg_aggressive_mis_k <2> > >>>>>> You could to another step of MIS coarsening with > >>>> -pc_gamg_aggressive_mis_k > >>>>>> 3 > >>>>>> > >>>>>> Anyway, lots to look at but, alas, AMG does have a lot of > parameters. > >>>>>> > >>>>>> Thanks, > >>>>>> Mark > >>>>>> > >>>>>> On Mon, Aug 14, 2023 at 4:26?PM Mark Adams wrote: > >>>>>> > >>>>>>> On Mon, Aug 14, 2023 at 11:03?AM Stephan Kramer < > >>>> s.kramer at imperial.ac.uk> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Many thanks for looking into this, Mark > >>>>>>>>> My 3D tests were not that different and I see you lowered the > >>>>>>>> threshold. 
> >>>>>>>>> Note, you can set the threshold to zero, but your test is running > >> so > >>>>>>>> much > >>>>>>>>> differently than mine there is something else going on. > >>>>>>>>> Note, the new, bad, coarsening rate of 30:1 is what we tend to > >> shoot > >>>>>>>> for > >>>>>>>>> in 3D. > >>>>>>>>> > >>>>>>>>> So it is not clear what the problem is. Some questions: > >>>>>>>>> > >>>>>>>>> * do you have a picture of this mesh to show me? > >>>>>>>> It's just a standard hexahedral cubed sphere mesh with the > >> refinement > >>>>>>>> level giving the number of times each of the six sides have been > >>>>>>>> subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to > >> 16 > >>>>>>>> layers. So the total number of elements at Level_5 is 6 x 32 x 32 > x > >>>> 16 = > >>>>>>>> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) > >>>> going > >>>>>>>> to the next Level > >>>>>>>> > >>>>>>> I see, and I assume these are pretty stretched elements. > >>>>>>> > >>>>>>> > >>>>>>>>> * what do you mean by Q1-Q2 elements? > >>>>>>>> Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for > >> velocity > >>>>>>>> and (tri)linear for pressure > >>>>>>>> > >>>>>>>> I guess you could argue we could/should just do good old geometric > >>>>>>>> multigrid instead. More generally we do use this solver > >> configuration > >>>> a > >>>>>>>> lot for tetrahedral Taylor Hood (P2-P1) in particular also for our > >>>>>>>> adaptive mesh runs - would it be worth to see if we have the same > >>>>>>>> performance issues with tetrahedral P2-P1? > >>>>>>>> > >>>>>>> No, you have a clear reproducer, if not minimal. > >>>>>>> The first coarsening is very different. > >>>>>>> > >>>>>>> I am working on this and I see that I added a heuristic for thin > >> bodies > >>>>>>> where you order the vertices in greedy algorithms with minimum > degree > >>>> first. > >>>>>>> This will tend to pick corners first, edges then faces, etc. > >>>>>>> That may be the problem. I would like to understand it better (see > >>>> below). > >>>>>>> > >>>>>>>>> It would be nice to see if the new and old codes are similar > >> without > >>>>>>>>> aggressive coarsening. > >>>>>>>>> This was the intended change of the major change in this time > frame > >>>> as > >>>>>>>> you > >>>>>>>>> noticed. > >>>>>>>>> If these jobs are easy to run, could you check that the old and > new > >>>>>>>>> versions are similar with "-pc_gamg_square_graph 0 ", ( and you > >>>> only > >>>>>>>> need > >>>>>>>>> one time step). > >>>>>>>>> All you need to do is check that the first coarse grid has about > >> the > >>>>>>>> same > >>>>>>>>> number of equations (large). > >>>>>>>> Unfortunately we're seeing some memory errors when we use this > >> option, > >>>>>>>> and I'm not entirely clear whether we're just running out of > memory > >>>> and > >>>>>>>> need to put it on a special queue. 
> >>>>>>>> > >>>>>>>> The run with square_graph 0 using new PETSc managed to get through > >> one > >>>>>>>> solve at level 5, and is giving the following mg levels: > >>>>>>>> > >>>>>>>> rows=174, cols=174, bs=6 > >>>>>>>> total: nonzeros=30276, allocated nonzeros=30276 > >>>>>>>> -- > >>>>>>>> rows=2106, cols=2106, bs=6 > >>>>>>>> total: nonzeros=4238532, allocated nonzeros=4238532 > >>>>>>>> -- > >>>>>>>> rows=21828, cols=21828, bs=6 > >>>>>>>> total: nonzeros=62588232, allocated > nonzeros=62588232 > >>>>>>>> -- > >>>>>>>> rows=589824, cols=589824, bs=6 > >>>>>>>> total: nonzeros=1082528928, allocated > >> nonzeros=1082528928 > >>>>>>>> -- > >>>>>>>> rows=2433222, cols=2433222, bs=3 > >>>>>>>> total: nonzeros=456526098, allocated > nonzeros=456526098 > >>>>>>>> > >>>>>>>> comparing with square_graph 100 with new PETSc > >>>>>>>> > >>>>>>>> rows=96, cols=96, bs=6 > >>>>>>>> total: nonzeros=9216, allocated nonzeros=9216 > >>>>>>>> -- > >>>>>>>> rows=1440, cols=1440, bs=6 > >>>>>>>> total: nonzeros=647856, allocated nonzeros=647856 > >>>>>>>> -- > >>>>>>>> rows=97242, cols=97242, bs=6 > >>>>>>>> total: nonzeros=65656836, allocated > nonzeros=65656836 > >>>>>>>> -- > >>>>>>>> rows=2433222, cols=2433222, bs=3 > >>>>>>>> total: nonzeros=456526098, allocated > nonzeros=456526098 > >>>>>>>> > >>>>>>>> and old PETSc with square_graph 100 > >>>>>>>> > >>>>>>>> rows=90, cols=90, bs=6 > >>>>>>>> total: nonzeros=8100, allocated nonzeros=8100 > >>>>>>>> -- > >>>>>>>> rows=1872, cols=1872, bs=6 > >>>>>>>> total: nonzeros=1234080, allocated nonzeros=1234080 > >>>>>>>> -- > >>>>>>>> rows=47652, cols=47652, bs=6 > >>>>>>>> total: nonzeros=23343264, allocated > nonzeros=23343264 > >>>>>>>> -- > >>>>>>>> rows=2433222, cols=2433222, bs=3 > >>>>>>>> total: nonzeros=456526098, allocated > nonzeros=456526098 > >>>>>>>> -- > >>>>>>>> > >>>>>>>> Unfortunately old PETSc with square_graph 0 did not complete a > >> single > >>>>>>>> solve before giving the memory error > >>>>>>>> > >>>>>>> OK, thanks for trying. > >>>>>>> > >>>>>>> I am working on this and I will give you a branch to test, but if > you > >>>> can > >>>>>>> rebuild PETSc here is a quick test that might fix your problem. > >>>>>>> In src/ksp/pc/impls/gamg/agg.c you will see: > >>>>>>> > >>>>>>> PetscCall(PetscSortIntWithArray(nloc, degree, permute)); > >>>>>>> > >>>>>>> If you can comment this out in the new code and compare with the > old, > >>>>>>> that might fix the problem. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Mark > >>>>>>> > >>>>>>> > >>>>>>>>> BTW, I am starting to think I should add the old method back as > an > >>>>>>>> option. > >>>>>>>>> I did not think this change would cause large differences. > >>>>>>>> Yes, I think that would be much appreciated. Let us know if we can > >> do > >>>>>>>> any testing > >>>>>>>> > >>>>>>>> Best wishes > >>>>>>>> Stephan > >>>>>>>> > >>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Mark > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> Note that we are providing the rigid body near nullspace, > >>>>>>>>>> hence the bs=3 to bs=6. > >>>>>>>>>> We have tried different values for the gamg_threshold but it > >> doesn't > >>>>>>>>>> really seem to significantly alter the coarsening amount in that > >>>> first > >>>>>>>>>> step. > >>>>>>>>>> > >>>>>>>>>> Do you have any suggestions for further things we should > try/look > >>>> at? 
> >>>>>>>>>> Any feedback would be much appreciated > >>>>>>>>>> > >>>>>>>>>> Best wishes > >>>>>>>>>> Stephan Kramer > >>>>>>>>>> > >>>>>>>>>> Full logs including log_view timings available from > >>>>>>>>>> https://github.com/stephankramer/petsc-scaling/ > >>>>>>>>>> > >>>>>>>>>> In particular: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >> > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Oct 6 06:01:08 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 6 Oct 2023 13:01:08 +0200 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: References: Message-ID: I am getting an error in a different place than you. I started to debug, but don't have much time at the moment. Can you try something? Comparing to ex21.c, I see that a difference that may be relevant is the MATOP_DUPLICATE operation. Can you try defining it for your A matrix? Note: If you plan to use the NLEIGS solver, there is no need to define the derivative T' so you can skip the call to NEPSetJacobian(). Jose > El 6 oct 2023, a las 0:37, Kenneth C Hall escribi?: > > Hi all, > > I have a very large eigenvalue problem of the form T(\lambda).x = 0. The eigenvalues appear in a complicated way, and I must use a matrix-free approach to compute the products T.x and T?.x. > > I am trying to implement in SLEPc/NEP. To get started, I have defined a much smaller and simpler system of the form > A.x - \lambda x = 0 where A is a 10x10 matrix. This is of course a simple standard eigenvalue problem, but I am using it as a surrogate to understand how to use NEP. > > I have set the problem up using shell matrices (as that is my ultimate goal). The full code is attached, but here is a smaller snippet of code: > > !.... Create matrix-free operators for A and B > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, A, ierr)) > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, B, ierr)) > PetscCall(MatShellSetOperation(A, MATOP_MULT, MatMult_A, ierr)) > PetscCall(MatShellSetOperation(B, MATOP_MULT, MatMult_B, ierr)) > > !.... Create nonlinear eigensolver > PetscCall(NEPCreate(PETSC_COMM_SELF, nep, ierr)) > > !.... Set the problem type > PetscCall(NEPSetProblemType(nep, NEP_GENERAL, ierr)) > ! > !.... set the solver type > PetscCall(NEPSetType(nep, NEPNLEIGS, ierr)) > ! > !.... Set functions and Jacobians for NEP > PetscCall(NEPSetFunction(nep, A, A, MyNEPFunction, PETSC_NULL_INTEGER, ierr)) > PetscCall(NEPSetJacobian(nep, B, MyNEPJacobian, PETSC_NULL_INTEGER, ierr)) > > The code runs, calls MyNEPFunction and MatMult_A multiple times, sweeping over the prescribed RG range, but crashes before it ever calls MyNEPJacobian or MatMult_B. The NEP viewer and error messages are attached. > > Any help on getting this problem properly set up would be greatly appreciated. 
> > Kenneth Hall > ATTACHMENTS: > test_nep.f90 > code_output > > From hongzhang at anl.gov Fri Oct 6 08:15:12 2023 From: hongzhang at anl.gov (Zhang, Hong) Date: Fri, 6 Oct 2023 13:15:12 +0000 Subject: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: <7839BCF7-8FEC-4AAA-94FF-AABEB42586BC@anl.gov> I noticed that you are using ARKIMEX in the code. A temporary workaround you can try is to disable adaptive time stepping, e.g. by using the option -ts_adapt_type none. Then MatShift() will not be called when the Jacobians are computed. Hong (Mr.) On Oct 5, 2023, at 4:52 PM, Fackler, Philip via petsc-users wrote: Aha! That makes sense. Thank you. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang Sent: Thursday, October 5, 2023 17:29 To: Fackler, Philip Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net ; Blondel, Sophie Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface Wait a moment, it seems it was because we do not have a GPU implementation of MatShift... Let me see how to add it. --Junchao Zhang On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang > wrote: Hi, Philip, I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() instead of the COO interface? MatSetValues() needs to copy the data from device to host and thus is expensive. Do you have profiling results with COO enabled? --Junchao Zhang On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: Hi, Philip, I will look into the tarballs and get back to you. Thanks. --Junchao Zhang On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users > wrote: We finally have xolotl ported to use the new COO interface and the aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port to our previous version (using MatSetValuesStencil and the default Mat and Vec implementations), we expected to see an improvement in performance for both the "serial" and "cuda" builds (here I'm referring to the kokkos configuration). Attached are two plots that show timings for three different cases. All of these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on a single node). The CUDA cases were given one GPU per task (and used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible. The performance of RHSJacobian (where the bulk of computation happens in xolotl) behaved basically as expected (better than expected in the serial build). NE_3 case in CUDA was the only one that performed worse, but not surprisingly, since its workload for the GPUs is much smaller. We've still got more optimization to do on this. The real surprise was how much worse the overall solve times were. This seems to be due simply to switching to the kokkos-based implementation. I'm wondering if there are any changes we can make in configuration or runtime arguments to help with PETSc's performance here. Any help looking into this would be appreciated. The tarballs linked here and here are profiling databases which, once extracted, can be viewed with hpcviewer. I don't know how helpful that will be, but hopefully it can give you some direction. 
Thanks for your help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.c.hall at duke.edu Fri Oct 6 09:56:29 2023 From: kenneth.c.hall at duke.edu (Kenneth C Hall) Date: Fri, 6 Oct 2023 14:56:29 +0000 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: References: Message-ID: Jose, Thanks for this. I will try this and report back. Kenneth From: Jose E. Roman Date: Friday, October 6, 2023 at 7:01 AM To: Kenneth C Hall Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) I am getting an error in a different place than you. I started to debug, but don't have much time at the moment. Can you try something? Comparing to ex21.c, I see that a difference that may be relevant is the MATOP_DUPLICATE operation. Can you try defining it for your A matrix? Note: If you plan to use the NLEIGS solver, there is no need to define the derivative T' so you can skip the call to NEPSetJacobian(). Jose > El 6 oct 2023, a las 0:37, Kenneth C Hall escribi?: > > Hi all, > > I have a very large eigenvalue problem of the form T(\lambda).x = 0. The eigenvalues appear in a complicated way, and I must use a matrix-free approach to compute the products T.x and T?.x. > > I am trying to implement in SLEPc/NEP. To get started, I have defined a much smaller and simpler system of the form > A.x - \lambda x = 0 where A is a 10x10 matrix. This is of course a simple standard eigenvalue problem, but I am using it as a surrogate to understand how to use NEP. > > I have set the problem up using shell matrices (as that is my ultimate goal). The full code is attached, but here is a smaller snippet of code: > > !.... Create matrix-free operators for A and B > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, A, ierr)) > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, B, ierr)) > PetscCall(MatShellSetOperation(A, MATOP_MULT, MatMult_A, ierr)) > PetscCall(MatShellSetOperation(B, MATOP_MULT, MatMult_B, ierr)) > > !.... Create nonlinear eigensolver > PetscCall(NEPCreate(PETSC_COMM_SELF, nep, ierr)) > > !.... Set the problem type > PetscCall(NEPSetProblemType(nep, NEP_GENERAL, ierr)) > ! > !.... set the solver type > PetscCall(NEPSetType(nep, NEPNLEIGS, ierr)) > ! > !.... Set functions and Jacobians for NEP > PetscCall(NEPSetFunction(nep, A, A, MyNEPFunction, PETSC_NULL_INTEGER, ierr)) > PetscCall(NEPSetJacobian(nep, B, MyNEPJacobian, PETSC_NULL_INTEGER, ierr)) > > The code runs, calls MyNEPFunction and MatMult_A multiple times, sweeping over the prescribed RG range, but crashes before it ever calls MyNEPJacobian or MatMult_B. The NEP viewer and error messages are attached. > > Any help on getting this problem properly set up would be greatly appreciated. > > Kenneth Hall > ATTACHMENTS: > test_nep.f90 > code_output > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kenneth.c.hall at duke.edu Fri Oct 6 15:28:14 2023 From: kenneth.c.hall at duke.edu (Kenneth C Hall) Date: Fri, 6 Oct 2023 20:28:14 +0000 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: References: Message-ID: Jose, Unfortunately, I was unable to implement the MATOP_DUPLICATE operation in fortran (and I do not know enough c to work in c). Here is the error message I get: [0]PETSC ERROR: #1 MatShellSetOperation_Fortran() at /Users/hall/Documents/Fortran_Codes/Packages/petsc/src/mat/impls/shell/ftn-custom/zshellf.c:283 [0]PETSC ERROR: #2 src/test_nep.f90:62 When I look at zshellf.c, MATOP_DUPLICATE is not one of the supported operations. See below. Kenneth /** * Subset of MatOperation that is supported by the Fortran wrappers. */ enum FortranMatOperation { FORTRAN_MATOP_MULT = 0, FORTRAN_MATOP_MULT_ADD = 1, FORTRAN_MATOP_MULT_TRANSPOSE = 2, FORTRAN_MATOP_MULT_TRANSPOSE_ADD = 3, FORTRAN_MATOP_SOR = 4, FORTRAN_MATOP_TRANSPOSE = 5, FORTRAN_MATOP_GET_DIAGONAL = 6, FORTRAN_MATOP_DIAGONAL_SCALE = 7, FORTRAN_MATOP_ZERO_ENTRIES = 8, FORTRAN_MATOP_AXPY = 9, FORTRAN_MATOP_SHIFT = 10, FORTRAN_MATOP_DIAGONAL_SET = 11, FORTRAN_MATOP_DESTROY = 12, FORTRAN_MATOP_VIEW = 13, FORTRAN_MATOP_CREATE_VECS = 14, FORTRAN_MATOP_GET_DIAGONAL_BLOCK = 15, FORTRAN_MATOP_COPY = 16, FORTRAN_MATOP_SCALE = 17, FORTRAN_MATOP_SET_RANDOM = 18, FORTRAN_MATOP_ASSEMBLY_BEGIN = 19, FORTRAN_MATOP_ASSEMBLY_END = 20, FORTRAN_MATOP_SIZE = 21 }; From: Jose E. Roman Date: Friday, October 6, 2023 at 7:01 AM To: Kenneth C Hall Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) I am getting an error in a different place than you. I started to debug, but don't have much time at the moment. Can you try something? Comparing to ex21.c, I see that a difference that may be relevant is the MATOP_DUPLICATE operation. Can you try defining it for your A matrix? Note: If you plan to use the NLEIGS solver, there is no need to define the derivative T' so you can skip the call to NEPSetJacobian(). Jose > El 6 oct 2023, a las 0:37, Kenneth C Hall escribi?: > > Hi all, > > I have a very large eigenvalue problem of the form T(\lambda).x = 0. The eigenvalues appear in a complicated way, and I must use a matrix-free approach to compute the products T.x and T?.x. > > I am trying to implement in SLEPc/NEP. To get started, I have defined a much smaller and simpler system of the form > A.x - \lambda x = 0 where A is a 10x10 matrix. This is of course a simple standard eigenvalue problem, but I am using it as a surrogate to understand how to use NEP. > > I have set the problem up using shell matrices (as that is my ultimate goal). The full code is attached, but here is a smaller snippet of code: > > !.... Create matrix-free operators for A and B > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, A, ierr)) > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, B, ierr)) > PetscCall(MatShellSetOperation(A, MATOP_MULT, MatMult_A, ierr)) > PetscCall(MatShellSetOperation(B, MATOP_MULT, MatMult_B, ierr)) > > !.... Create nonlinear eigensolver > PetscCall(NEPCreate(PETSC_COMM_SELF, nep, ierr)) > > !.... Set the problem type > PetscCall(NEPSetProblemType(nep, NEP_GENERAL, ierr)) > ! > !.... set the solver type > PetscCall(NEPSetType(nep, NEPNLEIGS, ierr)) > ! > !.... 
Set functions and Jacobians for NEP > PetscCall(NEPSetFunction(nep, A, A, MyNEPFunction, PETSC_NULL_INTEGER, ierr)) > PetscCall(NEPSetJacobian(nep, B, MyNEPJacobian, PETSC_NULL_INTEGER, ierr)) > > The code runs, calls MyNEPFunction and MatMult_A multiple times, sweeping over the prescribed RG range, but crashes before it ever calls MyNEPJacobian or MatMult_B. The NEP viewer and error messages are attached. > > Any help on getting this problem properly set up would be greatly appreciated. > > Kenneth Hall > ATTACHMENTS: > test_nep.f90 > code_output > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qiyuelu1 at gmail.com Fri Oct 6 16:40:12 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Fri, 6 Oct 2023 16:40:12 -0500 Subject: [petsc-users] 'nvcc -show' Error for configure with NVCC Message-ID: Hello, I am trying to configure PETSc(current release version) with NVCC, with these options: ./configure --with-cc=nvcc --with-cxx=nvcc --with-fc=0 --with-cuda=1 However, I got error like: --------------------------------------------------------------------------------------------- Could not execute "['nvcc -show']": nvcc fatal : Unknown option '-show' ********************************************************************************************* I wonder where this -show option comes from? It seems safe to disable this option. Thanks, Qiyue Lu -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Oct 6 16:50:03 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 6 Oct 2023 16:50:03 -0500 (CDT) Subject: [petsc-users] 'nvcc -show' Error for configure with NVCC In-Reply-To: References: Message-ID: <28d271d8-f320-e982-5cbb-1e2bf50893bb@mcs.anl.gov> On Fri, 6 Oct 2023, Qiyue Lu wrote: > Hello, > I am trying to configure PETSc(current release version) with NVCC, with > these options: > ./configure --with-cc=nvcc --with-cxx=nvcc --with-fc=0 --with-cuda=1 this usage is incorrect. You need: --with-cc=mpicc --with-cxx=mpicxx --with-cudac=nvcc --with-cuda=1 Satish > > However, I got error like: > --------------------------------------------------------------------------------------------- > Could not execute "['nvcc -show']": > nvcc fatal : Unknown option '-show' > ********************************************************************************************* > > I wonder where this -show option comes from? It seems safe to disable this > option. > > Thanks, > Qiyue Lu > From Roland.Richter at empa.ch Mon Oct 9 07:32:16 2023 From: Roland.Richter at empa.ch (Richter, Roland) Date: Mon, 9 Oct 2023 12:32:16 +0000 Subject: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails Message-ID: Hei, I'm currently trying to install PETSc on a server (Ubuntu 22.04) with Intel MPI and Intel OneAPI. To combine both, I have to use f. ex. "mpiicc -cc=icx" as C-compiler, as described by https://stackoverflow.com/a/76362396. 
Therefore, I adapted the configure-line as follow: ./configure --prefix=/media/storage/local_opt/petsc --with-scalar-type=complex --with-cc="mpiicc -cc=icx" --with-cxx="mpiicpc -cxx=icpx" --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC -march=native -mavx2" --with-fc="mpiifort -fc=ifx" --with-pic=true --with-mpi=true --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ --with-openmp=true --download-hdf5=yes --download-netcdf=yes --download-chaco=no --download-metis=yes --download-slepc=yes --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost --download-opencascade=yes --with-fftw=1 --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 --download-muparser=no --download-p4est=yes --download-sowing=yes --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=1 The configuration, however, fails with The CMAKE_C_COMPILER: mpiicc -cc=icx is not a full path and was not found in the PATH for all additional modules which use a cmake-based configuration approach (such as OPENCASCADE). How could I solve that problem? Thank you! Regards, Roland Richter -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 3969230 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 7926 bytes Desc: not available URL: From junchao.zhang at gmail.com Mon Oct 9 09:23:28 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 9 Oct 2023 09:23:28 -0500 Subject: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails In-Reply-To: References: Message-ID: Could you just use "--with-cc=mpiicx --with-cxx=mpiicpx" ? In addition, you can export environment vars I_MPI_CC=icx and I_MPI_CXX=icpx to specify the underlying compilers. --Junchao Zhang On Mon, Oct 9, 2023 at 7:33?AM Richter, Roland wrote: > Hei, > > I'm currently trying to install PETSc on a server (Ubuntu 22.04) with > Intel MPI and Intel OneAPI. To combine both, I have to use f. ex. "mpiicc > -cc=icx" as C-compiler, as described by > https://stackoverflow.com/a/76362396. 
Therefore, I adapted the > configure-line as follow: > > > > *./configure --prefix=/media/storage/local_opt/petsc > --with-scalar-type=complex --with-cc="mpiicc -cc=icx" --with-cxx="mpiicpc > -cxx=icpx" --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC > -march=native -mavx2" --with-fc="mpiifort -fc=ifx" --with-pic=true > --with-mpi=true > --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ > --with-openmp=true --download-hdf5=yes --download-netcdf=yes > --download-chaco=no --download-metis=yes --download-slepc=yes > --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes > --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes > --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 > --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost > --download-opencascade=yes --with-fftw=1 > --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes > --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 > --download-muparser=no --download-p4est=yes --download-sowing=yes > --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=1* > > > > The configuration, however, fails with > > > > *The CMAKE_C_COMPILER:* > > > > * mpiicc -cc=icx* > > > > * is not a full path and was not found in the PATH* > > > > for all additional modules which use a cmake-based configuration approach > (such as OPENCASCADE). How could I solve that problem? > > > > Thank you! > > Regards, > > Roland Richter > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Oct 9 09:23:55 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 9 Oct 2023 10:23:55 -0400 Subject: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails In-Reply-To: References: Message-ID: <3CF831A3-F5DC-4055-9F00-FA7DD7242EBB@petsc.dev> Instead of using the mpiicc -cc=icx style use -- with-cc=mpiicc (etc) and export I_MPI_CC=icx export I_MPI_CXX=icpx export I_MPI_F90=ifx > On Oct 9, 2023, at 8:32 AM, Richter, Roland wrote: > > Hei, > I'm currently trying to install PETSc on a server (Ubuntu 22.04) with Intel MPI and Intel OneAPI. To combine both, I have to use f. ex. "mpiicc -cc=icx" as C-compiler, as described by https://stackoverflow.com/a/76362396. 
Therefore, I adapted the configure-line as follow: > > ./configure --prefix=/media/storage/local_opt/petsc --with-scalar-type=complex --with-cc="mpiicc -cc=icx" --with-cxx="mpiicpc -cxx=icpx" --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC -march=native -mavx2" --with-fc="mpiifort -fc=ifx" --with-pic=true --with-mpi=true --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ --with-openmp=true --download-hdf5=yes --download-netcdf=yes --download-chaco=no --download-metis=yes --download-slepc=yes --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost --download-opencascade=yes --with-fftw=1 --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 --download-muparser=no --download-p4est=yes --download-sowing=yes --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=1 > > The configuration, however, fails with > > The CMAKE_C_COMPILER: > > mpiicc -cc=icx > > is not a full path and was not found in the PATH > > for all additional modules which use a cmake-based configuration approach (such as OPENCASCADE). How could I solve that problem? > > Thank you! > Regards, > Roland Richter > -------------- next part -------------- An HTML attachment was scrubbed... URL: 

From Pierre.LEDAC at cea.fr Mon Oct 9 09:35:49 2023
From: Pierre.LEDAC at cea.fr (LEDAC Pierre)
Date: Mon, 9 Oct 2023 14:35:49 +0000
Subject: [petsc-users] PETSc 3.14 to PETSc 3.20: Different (slower) convergence for classical AMG (sequential and especially in parallel)
Message-ID: <4c9f02898f324fcd8be1fe5dcc9f0416@cea.fr>

Hello all,

I am struggling to get the same convergence in iterations when using classical algebraic multigrid in my code with PETSc 3.20 compared to PETSc 3.14. In order to solve a Poisson system, I am using:

-ksp_type cg -pc_type gamg -pc_gamg_type classical

I read the release notes of the versions between 3.15 and 3.20:

https://petsc.org/release/changes/317
https://petsc.org/main/manualpages/PC/PCGAMGSetThreshold/

And had a look at the archive mailing list (especially this one: https://www.mail-archive.com/petsc-users at mcs.anl.gov/msg46688.html), so I added some other options to try to get the same behaviour as with PETSc 3.14:

-ksp_type cg -pc_type gamg -pc_gamg_type classical -mg_levels_pc_type sor -pc_gamg_threshold 0.

It improves the convergence, but the convergence is still different (26 vs 18 iterations). On another of my test cases, the number of levels is also different (e.g. 6 vs 4), and here it is the same but with a different coarsening, according to the output from the -ksp_view option. The main point is that the convergence dramatically degrades in parallel on a third test case, so unfortunately I can't upgrade to PETSc 3.20 for now.

I send you the partial report (petsc_314_vs_petsc_320.ksp_view) with -ksp_view (left PETSc 3.14, right PETSc 3.20) and the configure/command line options used (in the petsc_XXX_petsc.TU files).

Could my issue be related to the following 3.18 changes? I have not tried the first one (see the sketch below).

* Remove PCGAMGSetSymGraph() and -pc_gamg_sym_graph. The user should now indicate symmetry and structural symmetry using MatSetOption() and GAMG will symmetrize the graph if a symmetric options is not set
* Change -pc_gamg_reuse_interpolation default from false to true. 
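For reference, a minimal sketch (in C, illustrative only, not my actual code) of how I understand the first change, i.e. passing the symmetry information through MatSetOption() on the assembled Poisson matrix (called A here) before the solve:

    /* Tell PETSc/GAMG that the operator is symmetric; this replaces the
       removed PCGAMGSetSymGraph() / -pc_gamg_sym_graph path.
       A is assumed to be the assembled Poisson matrix. */
    PetscCall(MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE));
    /* Optionally state that the symmetry is never lost by later
       assemblies, so the flag does not need to be set again. */
    PetscCall(MatSetOption(A, MAT_SYMMETRY_ETERNAL, PETSC_TRUE));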
Any advice would be greatly appreciated,

Pierre LEDAC
Commissariat à l'énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 - point courrier n°43
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_320_petsc.TU
Type: application/octet-stream
Size: 15358 bytes
Desc: petsc_320_petsc.TU
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_314_petsc.TU
Type: application/octet-stream
Size: 14761 bytes
Desc: petsc_314_petsc.TU
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_314_vs_petsc_320.ksp_view
Type: application/octet-stream
Size: 30948 bytes
Desc: petsc_314_vs_petsc_320.ksp_view
URL: 

From balay at mcs.anl.gov Mon Oct 9 10:29:08 2023
From: balay at mcs.anl.gov (Satish Balay)
Date: Mon, 9 Oct 2023 10:29:08 -0500 (CDT)
Subject: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails
In-Reply-To: <3CF831A3-F5DC-4055-9F00-FA7DD7242EBB@petsc.dev>
References: <3CF831A3-F5DC-4055-9F00-FA7DD7242EBB@petsc.dev>
Message-ID: <78e0a665-e6fc-4566-4900-6faa2e593c72@mcs.anl.gov>

Will note - OneAPI MPI usage is documented at https://petsc.org/release/install/install/#mpi

Satish

On Mon, 9 Oct 2023, Barry Smith wrote:

> > Instead of using the mpiicc -cc=icx style use -- with-cc=mpiicc (etc) and > > export I_MPI_CC=icx > export I_MPI_CXX=icpx > export I_MPI_F90=ifx > > > > On Oct 9, 2023, at 8:32 AM, Richter, Roland wrote: > > > > Hei, > > I'm currently trying to install PETSc on a server (Ubuntu 22.04) with Intel MPI and Intel OneAPI. To combine both, I have to use f. ex. "mpiicc -cc=icx" as C-compiler, as described by https://stackoverflow.com/a/76362396. Therefore, I adapted the configure-line as follow: > > > > ./configure --prefix=/media/storage/local_opt/petsc --with-scalar-type=complex --with-cc="mpiicc -cc=icx" --with-cxx="mpiicpc -cxx=icpx" --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC -march=native -mavx2" --with-fc="mpiifort -fc=ifx" --with-pic=true --with-mpi=true --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ --with-openmp=true --download-hdf5=yes --download-netcdf=yes --download-chaco=no --download-metis=yes --download-slepc=yes --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost --download-opencascade=yes --with-fftw=1 --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 --download-muparser=no --download-p4est=yes --download-sowing=yes --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=1 > > > > The configuration, however, fails with > > > > The CMAKE_C_COMPILER: > > > > mpiicc -cc=icx > > > > is not a full path and was not found in the PATH > > > > for all additional modules which use a cmake-based configuration approach (such as OPENCASCADE). How could I solve that problem? > > > > Thank you! 
> > Regards, > > Roland Richter > > > > From yc17470 at connect.um.edu.mo Tue Oct 10 08:27:57 2023 From: yc17470 at connect.um.edu.mo (Gong Yujie) Date: Tue, 10 Oct 2023 13:27:57 +0000 Subject: [petsc-users] Scalability problem using PETSc with local installed OpenMPI Message-ID: Dear PETSc developers, I installed OpenMPI3 first and then installed PETSc with that mpi. Currently, I'm facing a scalability issue, in detail, I tested that using OpenMPI to calculate an addition of two distributed arrays and I get a good scalability. The problem is when I calculate the addition of two vectors in PETSc, I don't have any scalability. For the same size of the problem, PETSc costs a lot much time than merely using OpenMPI. My PETSc version is 3.16.0 and the version of OpenMPI is 3.1.4. Hope you can give me some suggestions. Best Regards, Yujie -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 10 08:54:27 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Oct 2023 09:54:27 -0400 Subject: [petsc-users] Scalability problem using PETSc with local installed OpenMPI In-Reply-To: References: Message-ID: On Tue, Oct 10, 2023 at 9:28?AM Gong Yujie wrote: > Dear PETSc developers, > > I installed OpenMPI3 first and then installed PETSc with that mpi. > Currently, I'm facing a scalability issue, in detail, I tested that using > OpenMPI to calculate an addition of two distributed arrays and I get a good > scalability. The problem is when I calculate the addition of two vectors in > PETSc, I don't have any scalability. For the same size of the problem, > PETSc costs a lot much time than merely using OpenMPI. > > My PETSc version is 3.16.0 and the version of OpenMPI is 3.1.4. Hope you > can give me some suggestions. > 1. For any performance question, we really need to see the output of -log_view for each run. 2. I am not sure I understand your question. Vector addition does not involve communication. Thus it will scale perfectly in the absence of load imbalance. Thanks, Matt > Best Regards, > Yujie > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Oct 10 08:59:53 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Oct 2023 09:59:53 -0400 Subject: [petsc-users] Scalability problem using PETSc with local installed OpenMPI In-Reply-To: References: Message-ID: <764C6422-14C5-4A19-97A3-36BEB80690FB@petsc.dev> Take a look at https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup Check the binding that OpenMPI is using (by the way, there are much more recent OpenMPI versions, I suggest using them). Run the STREAMS benchmark as indicated on that page. Barry > On Oct 10, 2023, at 9:27 AM, Gong Yujie wrote: > > Dear PETSc developers, > > I installed OpenMPI3 first and then installed PETSc with that mpi. Currently, I'm facing a scalability issue, in detail, I tested that using OpenMPI to calculate an addition of two distributed arrays and I get a good scalability. The problem is when I calculate the addition of two vectors in PETSc, I don't have any scalability. For the same size of the problem, PETSc costs a lot much time than merely using OpenMPI. 
> > My PETSc version is 3.16.0 and the version of OpenMPI is 3.1.4. Hope you can give me some suggestions. > > Best Regards, > Yujie -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Oct 10 09:39:08 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Oct 2023 10:39:08 -0400 Subject: [petsc-users] Scalability problem using PETSc with local installed OpenMPI In-Reply-To: References: <764C6422-14C5-4A19-97A3-36BEB80690FB@petsc.dev> Message-ID: Run STREAMS with MPI_BINDING="-map-by socket --bind-to core --report-bindings" make mpistreams send the result Also run lscpu numactl -H if they are available on your machine, send the result > On Oct 10, 2023, at 10:17 AM, Gong Yujie wrote: > > Dear Barry, > > I tried to use the binding as suggested by PETSc: > mpiexec -n 4 --map-by socket --bind-to socket --report-bindings > But it seems not improving the performance. Here is the make stream log > > Best Regards, > Yujie > > mpicc -o MPIVersion.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O -I/home/tt/petsc-3.16.0/include -I/home/tt/petsc-3.16.0/arch-linux-c-opt/include `pwd`/MPIVersion.c > Running streams with 'mpiexec --oversubscribe ' using 'NPMAX=16' > 1 26119.1937 Rate (MB/s) > 2 29833.4281 Rate (MB/s) 1.1422 > 3 65338.5050 Rate (MB/s) 2.50155 > 4 59832.7482 Rate (MB/s) 2.29076 > 5 48629.8396 Rate (MB/s) 1.86184 > 6 58569.4289 Rate (MB/s) 2.24239 > 7 63827.1144 Rate (MB/s) 2.44369 > 8 57448.5349 Rate (MB/s) 2.19948 > 9 61405.3273 Rate (MB/s) 2.35097 > 10 68021.6111 Rate (MB/s) 2.60428 > 11 71289.0422 Rate (MB/s) 2.72937 > 12 76900.6386 Rate (MB/s) 2.94422 > 13 80198.6807 Rate (MB/s) 3.07049 > 14 64846.3685 Rate (MB/s) 2.48271 > 15 83072.8631 Rate (MB/s) 3.18053 > 16 70128.0166 Rate (MB/s) 2.68492 > ------------------------------------------------ > Traceback (most recent call last): > File "process.py", line 89, in > process(sys.argv[1],len(sys.argv)-2) > File "process.py", line 33, in process > speedups[i] = triads[i]/triads[0] > TypeError: 'dict_values' object does not support indexing > make[2]: [makefile:47: mpistream] Error 1 (ignored) > Traceback (most recent call last): > File "process.py", line 89, in > process(sys.argv[1],len(sys.argv)-2) > File "process.py", line 33, in process > speedups[i] = triads[i]/triads[0] > TypeError: 'dict_values' object does not support indexing > make[2]: [makefile:79: mpistreams] Error 1 (ignored) > From: Barry Smith > Sent: Tuesday, October 10, 2023 9:59 PM > To: Gong Yujie > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Scalability problem using PETSc with local installed OpenMPI > > > Take a look at https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup > > Check the binding that OpenMPI is using (by the way, there are much more recent OpenMPI versions, I suggest using them). Run the STREAMS benchmark as indicated on that page. > > Barry > > >> On Oct 10, 2023, at 9:27 AM, Gong Yujie wrote: >> >> Dear PETSc developers, >> >> I installed OpenMPI3 first and then installed PETSc with that mpi. Currently, I'm facing a scalability issue, in detail, I tested that using OpenMPI to calculate an addition of two distributed arrays and I get a good scalability. The problem is when I calculate the addition of two vectors in PETSc, I don't have any scalability. 
For the same size of the problem, PETSc costs a lot much time than merely using OpenMPI. >> >> My PETSc version is 3.16.0 and the version of OpenMPI is 3.1.4. Hope you can give me some suggestions. >> >> Best Regards, >> Yujie -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Oct 10 10:10:56 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Oct 2023 11:10:56 -0400 Subject: [petsc-users] Scalability problem using PETSc with local installed OpenMPI In-Reply-To: References: <764C6422-14C5-4A19-97A3-36BEB80690FB@petsc.dev> Message-ID: <0BFABF42-4509-488D-AF88-4559A4ACA14D@petsc.dev> This tells me you cannot realistically expect for large PETSc problems to get much more than a speedup of 2 on this system using two or four MPI processes; there simply is not more memory bandwidth available. There are two NUMA regions and a single core largely saturates a region. Also, always running MPI with the binding is important to get that small speedup. Barry > On Oct 10, 2023, at 10:47 AM, Gong Yujie wrote: > > Here is the result from STREAMS, lscpu and numactl. > > > -----------------------------lscpu------------------------------------ > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Byte Order: Little Endian > CPU(s): 128 > On-line CPU(s) list: 0-127 > Thread(s) per core: 1 > Core(s) per socket: 64 > Socket(s): 2 > NUMA node(s): 2 > Vendor ID: AuthenticAMD > CPU family: 23 > Model: 49 > Model name: AMD EPYC 7702 64-Core Processor > Stepping: 0 > CPU MHz: 1996.019 > BogoMIPS: 3992.03 > Virtualization: AMD-V > L1d cache: 32K > L1i cache: 32K > L2 cache: 512K > L3 cache: 16384K > NUMA node0 CPU(s): 0-63 > NUMA node1 CPU(s): 64-127 > Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 hw_pstate sme retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip overflow_recov succor smca > > ----------------------------numactl -H----------------------------- > available: 2 nodes (0-1) > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 > node 0 size: 128418 MB > node 0 free: 123340 MB > node 1 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 > node 1 size: 129010 MB > node 1 free: 124685 MB > node distances: > node 0 1 > 0: 10 32 > 1: 32 10 > > > --------------------------STREAMS---------------------------------- > > mpicc -o MPIVersion.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O -I/home/tt/petsc-3.16.0/include 
-I/home/tt/petsc-3.16.0/arch-linux-c-opt/include `pwd`/MPIVersion.c > Running streams with 'mpiexec --oversubscribe -map-by socket --bind-to core --report-bindings' using 'NPMAX=40' > [cpunode1:68038] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > 1 26155.1277 Rate (MB/s) > [cpunode1:68050] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68050] MCW rank 1 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > 2 52098.6873 Rate (MB/s) 1.99191 > [cpunode1:68065] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68065] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68065] MCW rank 2 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > 3 44731.8512 Rate (MB/s) 1.71025 > [cpunode1:68082] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68082] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68082] MCW rank 2 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68082] MCW rank 3 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
>   np   Rate (MB/s)   Speedup
>    4   59559.5275    2.27717
>    5   48477.2117    1.85345
>    6   58136.2545    2.22275
>    7   50119.2133    1.91623
>    8   57432.5057    2.19584
>    9   52345.9115    2.00137
>   10   57727.5090    2.20712
>   11   52568.6771    2.00988
>   12   57286.7990    2.19027
>   13   52721.4401    2.01572
>   14   56787.2447    2.17117
>   15   53317.0901    2.0385
>   16   56708.7028    2.16817
>   17   58994.6721    2.25557
>   18   62089.5079    2.3739
>   19   63588.1264    2.43119
>   20   67097.8382    2.56538
>
> For each of these runs, the Open MPI binding report on cpunode1 ("MCW rank N bound to socket S[core C[hwt 0]]") shows one rank bound per physical core, with the ranks split roughly evenly between socket 0 (cores 0, 1, 2, ...) and socket 1 (cores 64, 65, 66, ...) of the two-socket node.
> [cpunode1:68844] MCW rank 20 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > 21 68642.9757 Rate (MB/s) 2.62446 > [cpunode1:68917] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:68917] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 11 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 12 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 13 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 14 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 15 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 16 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 17 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 18 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:68917] MCW rank 19 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 20 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 21 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > 22 71264.2836 Rate (MB/s) 2.72468 > [cpunode1:68991] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:68991] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 12 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 13 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 14 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 15 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 16 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:68991] MCW rank 17 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 18 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 19 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 20 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 21 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 22 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > 23 72876.6138 Rate (MB/s) 2.78633 > [cpunode1:69069] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69069] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 12 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 13 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69069] MCW rank 14 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 15 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 16 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 17 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 18 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 19 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 20 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 21 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 22 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 23 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> 24 75732.6676 Rate (MB/s) 2.89552 > [cpunode1:69149] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69149] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 13 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 14 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 15 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 16 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 17 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 18 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 19 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69149] MCW rank 20 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 21 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 22 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 23 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 24 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] > 25 77217.0466 Rate (MB/s) 2.95227 > [cpunode1:69232] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69232] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 13 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 14 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69232] MCW rank 15 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 16 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 17 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 18 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 19 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 20 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 21 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 22 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 23 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 24 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69232] MCW rank 25 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > 26 80035.7602 Rate (MB/s) 3.06004 > [cpunode1:69318] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69318] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 14 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 15 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 16 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 17 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 18 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69318] MCW ranks 19-26 bound one per core to socket 1 (cores 69-76)
> 27 80846.6416 Rate (MB/s) 3.09105
> [cpunode1:69408] MCW ranks 0-13 bound one per core to socket 0 (cores 0-13); ranks 14-27 to socket 1 (cores 64-77)
> 28 83282.5335 Rate (MB/s) 3.18418
> [cpunode1:69500] MCW ranks 0-14 bound one per core to socket 0 (cores 0-14); ranks 15-28 to socket 1 (cores 64-77)
> 29 83988.1592 Rate (MB/s) 3.21116
> [cpunode1:69596] MCW ranks 0-14 bound one per core to socket 0 (cores 0-14); ranks 15-29 to socket 1 (cores 64-78)
> 30 87241.9164 Rate (MB/s) 3.33556
> [cpunode1:69707] MCW ranks 0-15 bound one per core to socket 0 (cores 0-15); ranks 16-30 to socket 1 (cores 64-78)
> 31 87821.6811 Rate (MB/s) 3.35773
> [cpunode1:69810] MCW ranks 0-15 bound one per core to socket 0 (cores 0-15); ranks 16-31 to socket 1 (cores 64-79)
> 32 90156.4778 Rate (MB/s) 3.44699
> [cpunode1:69914] MCW ranks 0-16 bound one per core to socket 0 (cores 0-16); ranks 17-32 to socket 1 (cores 64-79)
> 33 90112.8468 Rate (MB/s) 3.44533
> [cpunode1:70021] MCW ranks 0-16 bound one per core to socket 0 (cores 0-16); ranks 17-28 to socket 1 (cores 64-75)
> [cpunode1:70021] MCW rank 29 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70021] MCW rank 30 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70021] MCW rank 31 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70021] MCW rank 32 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70021] MCW rank 33 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] > 34 92366.4171 Rate (MB/s) 3.53149 > [cpunode1:70131] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70131] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70131] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 18 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 19 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 20 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 21 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 22 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 23 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 24 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70131] MCW rank 25 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 26 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 27 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 28 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 29 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 30 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 31 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 32 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 33 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 34 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] 
> 35 91504.9533 Rate (MB/s) 3.49855 > [cpunode1:70244] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70244] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 18 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 19 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70244] MCW rank 20 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 21 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 22 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 23 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 24 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 25 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 26 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 27 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 28 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 29 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70244] MCW rank 30 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 31 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 32 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 33 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 34 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 35 bound to socket 1[core 81[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.] > 36 94404.1634 Rate (MB/s) 3.6094 > [cpunode1:70360] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70360] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70360] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 18 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 19 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 20 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 21 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 22 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 23 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70360] MCW rank 24 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 25 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 26 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 27 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 28 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 29 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 30 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 31 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 32 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 33 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70360] MCW rank 34 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 35 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 36 bound to socket 1[core 81[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.] > 37 93616.1843 Rate (MB/s) 3.57927 > [cpunode1:70479] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70479] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70479] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 18 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 19 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 20 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 21 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 22 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 23 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 24 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 25 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 26 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70479] MCW rank 27 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 28 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 29 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 30 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 31 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 32 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 33 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 34 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 35 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 36 bound to socket 1[core 81[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70479] MCW rank 37 bound to socket 1[core 82[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.] > 38 95857.0121 Rate (MB/s) 3.66495 > [cpunode1:70601] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70601] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 18 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70601] MCW rank 19 bound to socket 0[core 19[hwt 0]]: [./././././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 20 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 21 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 22 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 23 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 24 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 25 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 26 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 27 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 28 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70601] MCW rank 29 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 30 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 31 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 32 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 33 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 34 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 35 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 36 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 37 bound to socket 1[core 81[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 38 bound to socket 1[core 82[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.] 
> 39 95242.8041 Rate (MB/s) 3.64146 > [cpunode1:70726] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70726] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 18 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 19 bound to socket 0[core 19[hwt 0]]: [./././././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70726] MCW rank 20 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 21 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 22 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 23 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 24 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 25 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 26 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 27 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 28 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 29 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70726] MCW rank 30 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 31 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 32 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 33 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 34 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 35 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 36 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 37 bound to socket 1[core 81[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 38 bound to socket 1[core 82[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 39 bound to socket 1[core 83[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././.] 
> 40 97441.9980 Rate (MB/s) 3.72554 > ------------------------------------------------ > Traceback (most recent call last): > File "process.py", line 89, in > process(sys.argv[1],len(sys.argv)-2) > File "process.py", line 33, in process > speedups[i] = triads[i]/triads[0] > TypeError: 'dict_values' object does not support indexing > make[2]: [makefile:47: mpistream] Error 1 (ignored) > Traceback (most recent call last): > File "process.py", line 89, in > process(sys.argv[1],len(sys.argv)-2) > File "process.py", line 33, in process > speedups[i] = triads[i]/triads[0] > TypeError: 'dict_values' object does not support indexing > make[2]: [makefile:79: mpistreams] Error 1 (ignored) > From: Barry Smith > > Sent: Tuesday, October 10, 2023 10:39 PM > To: Gong Yujie > > Cc: PETSc users list > > Subject: Re: [petsc-users] Scalability problem using PETSc with local installed OpenMPI > > > Run STREAMS with > > MPI_BINDING="-map-by socket --bind-to core --report-bindings" make mpistreams > > send the result > > Also run > > lscpu > numactl -H > > if they are available on your machine, send the result > > >> On Oct 10, 2023, at 10:17 AM, Gong Yujie > wrote: >> >> Dear Barry, >> >> I tried to use the binding as suggested by PETSc: >> mpiexec -n 4 --map-by socket --bind-to socket --report-bindings >> But it seems not improving the performance. Here is the make stream log >> >> Best Regards, >> Yujie >> >> mpicc -o MPIVersion.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O -I/home/tt/petsc-3.16.0/include -I/home/tt/petsc-3.16.0/arch-linux-c-opt/include `pwd`/MPIVersion.c >> Running streams with 'mpiexec --oversubscribe ' using 'NPMAX=16' >> 1 26119.1937 Rate (MB/s) >> 2 29833.4281 Rate (MB/s) 1.1422 >> 3 65338.5050 Rate (MB/s) 2.50155 >> 4 59832.7482 Rate (MB/s) 2.29076 >> 5 48629.8396 Rate (MB/s) 1.86184 >> 6 58569.4289 Rate (MB/s) 2.24239 >> 7 63827.1144 Rate (MB/s) 2.44369 >> 8 57448.5349 Rate (MB/s) 2.19948 >> 9 61405.3273 Rate (MB/s) 2.35097 >> 10 68021.6111 Rate (MB/s) 2.60428 >> 11 71289.0422 Rate (MB/s) 2.72937 >> 12 76900.6386 Rate (MB/s) 2.94422 >> 13 80198.6807 Rate (MB/s) 3.07049 >> 14 64846.3685 Rate (MB/s) 2.48271 >> 15 83072.8631 Rate (MB/s) 3.18053 >> 16 70128.0166 Rate (MB/s) 2.68492 >> ------------------------------------------------ >> Traceback (most recent call last): >> File "process.py", line 89, in >> process(sys.argv[1],len(sys.argv)-2) >> File "process.py", line 33, in process >> speedups[i] = triads[i]/triads[0] >> TypeError: 'dict_values' object does not support indexing >> make[2]: [makefile:47: mpistream] Error 1 (ignored) >> Traceback (most recent call last): >> File "process.py", line 89, in >> process(sys.argv[1],len(sys.argv)-2) >> File "process.py", line 33, in process >> speedups[i] = triads[i]/triads[0] >> TypeError: 'dict_values' object does not support indexing >> make[2]: [makefile:79: mpistreams] Error 1 (ignored) >> From: Barry Smith > >> Sent: Tuesday, October 10, 2023 9:59 PM >> To: Gong Yujie > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Scalability problem using PETSc with local installed OpenMPI >> >> >> Take a look at https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup >> >> Check the binding that OpenMPI is using (by the way, there are much more recent OpenMPI versions, I suggest using them). Run the STREAMS benchmark as indicated on that page. 
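The process.py traceback that appears at the end of the STREAMS runs above is a separate, minor issue: in Python 3, dict.values() returns a view that cannot be indexed, which is what raises the TypeError; the make target reports the error as ignored, and the per-process rates are printed before that step. A minimal sketch of the failing pattern and a workaround, offered purely as an illustration: only the variable names come from the traceback, the numbers are made up to mirror the log, and this is not the actual process.py source.

# Illustrative only; not the actual process.py source.
rates = {1: 26119.1937, 2: 29833.4281, 3: 65338.5050}  # np -> triad rate (MB/s)
triads = rates.values()        # dict_values view: not indexable in Python 3

triads = list(triads)          # workaround: materialize the view first
speedups = {}
for i in range(len(triads)):
    speedups[i] = triads[i] / triads[0]
print(speedups)                # {0: 1.0, 1: 1.1422..., 2: 2.5015...}, matching the ratios in the log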
>> >> Barry >> >>> On Oct 10, 2023, at 9:27 AM, Gong Yujie > wrote: >>> >>> Dear PETSc developers, >>> >>> I installed OpenMPI3 first and then installed PETSc with that MPI. Currently, I'm facing a scalability issue. In detail, I tested using OpenMPI to calculate the addition of two distributed arrays and I get good scalability. The problem is that when I calculate the addition of two vectors in PETSc, I don't get any scalability. For the same size of the problem, PETSc costs a lot more time than merely using OpenMPI. >>> >>> My PETSc version is 3.16.0 and the version of OpenMPI is 3.1.4. Hope you can give me some suggestions. >>> >>> Best Regards, >>> Yujie -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Tue Oct 10 16:33:48 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Tue, 10 Oct 2023 23:33:48 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> Message-ID: Hi all, Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between the number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). """Experimenting with PETSc mat-mat multiplication""" import time import numpy as np from colorama import Fore from firedrake import COMM_SELF, COMM_WORLD from firedrake.petsc import PETSc from mpi4py import MPI from numpy.testing import assert_array_almost_equal from utilities import Print nproc = COMM_WORLD.size rank = COMM_WORLD.rank def create_petsc_matrix(input_array, sparse=True): """Create a PETSc matrix from an input_array Args: input_array (np array): Input array partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. sparse (bool, optional): Toggle for sparse or dense. Defaults to True.
Returns: PETSc mat: PETSc mpi matrix """ # Check if input_array is 1D and reshape if necessary assert len(input_array.shape) == 2, "Input array should be 2-dimensional" global_rows, global_cols = input_array.shape size = ((None, global_rows), (global_cols, global_cols)) # Create a sparse or dense matrix based on the 'sparse' argument if sparse: matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) else: matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) matrix.setUp() local_rows_start, local_rows_end = matrix.getOwnershipRange() for counter, i in enumerate(range(local_rows_start, local_rows_end)): # Calculate the correct row in the array for the current process row_in_array = counter + local_rows_start matrix.setValues( i, range(global_cols), input_array[row_in_array, :], addv=False ) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix # -------------------------------------------- # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi # A' = Phi.T * A * Phi # [k x k] <- [k x m] x [m x m] x [m x k] # -------------------------------------------- m, k = 100, 7 # Generate the random numpy matrices np.random.seed(0) # sets the seed to 0 A_np = np.random.randint(low=0, high=6, size=(m, m)) Phi_np = np.random.randint(low=0, high=6, size=(m, k)) # -------------------------------------------- # TEST: Galerking projection of numpy matrices A_np and Phi_np # -------------------------------------------- Aprime_np = Phi_np.T @ A_np @ Phi_np Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") Print(f"{Aprime_np}") # Create A as an mpi matrix distributed on each process A = create_petsc_matrix(A_np, sparse=False) # Create Phi as an mpi matrix distributed on each process Phi = create_petsc_matrix(Phi_np, sparse=False) # Create an empty PETSc matrix object to store the result of the PtAP operation. # This will hold the result A' = Phi.T * A * Phi after the computation. A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) # Perform the PtAP (Phi Transpose times A times Phi) operation. # In mathematical terms, this operation is A' = Phi.T * A * Phi. # A_prime will store the result of the operation. A_prime = A.ptap(Phi) Here is the error MATRIX mpiaij A [100x100] Assembled Partitioning for A: Rank 0: Rows [0, 34) Rank 1: Rows [34, 67) Rank 2: Rows [67, 100) MATRIX mpiaij Phi [100x7] Assembled Partitioning for Phi: Rank 0: Rows [0, 34) Rank 1: Rows [34, 67) Rank 2: Rows [67, 100) Traceback (most recent call last): File "/Users/boutsitron/work/galerkin_projection.py", line 87, in A_prime = A.ptap(Phi) ^^^^^^^^^^^ File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap petsc4py.PETSc.Error: error code 60 [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 [0] Nonconforming object sizes [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 Any thoughts? 
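One possible culprit, offered as a sketch rather than a confirmed diagnosis: the error compares A's column ownership range (0, 100) on rank 0 with Phi's row ownership range (0, 34), and MatPtAP requires those two layouts to match. In create_petsc_matrix the size tuple ((None, global_rows), (global_cols, global_cols)) pins the local column size to the global column count instead of letting PETSc split it. A variant that leaves both local sizes to PETSc (None, i.e. PETSC_DECIDE), so that the column layout of the [100 x 100] A and the row layout of the [100 x 7] Phi are split identically across ranks, could look like the following; it reuses the imports from the script above and the helper name is changed to mark it as hypothetical:

def create_petsc_matrix_decide(input_array, sparse=True):
    """Sketch: like create_petsc_matrix above, but PETSc decides both local sizes."""
    assert len(input_array.shape) == 2, "Input array should be 2-dimensional"
    global_rows, global_cols = input_array.shape
    # ((local_rows, global_rows), (local_cols, global_cols)); None = PETSC_DECIDE
    size = ((None, global_rows), (None, global_cols))
    if sparse:
        matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD)
    else:
        matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD)
    matrix.setUp()
    local_rows_start, local_rows_end = matrix.getOwnershipRange()
    for i in range(local_rows_start, local_rows_end):
        # input_array is replicated on every rank, so global row i can be read directly
        matrix.setValues(i, range(global_cols), input_array[i, :], addv=False)
    matrix.assemblyBegin()
    matrix.assemblyEnd()
    return matrix

A = create_petsc_matrix_decide(A_np)      # AIJ, as in the error output above
Phi = create_petsc_matrix_decide(Phi_np)
A_prime = A.ptap(Phi)                     # A' = Phi.T * A * Phi with compatible layouts

Whether this is what actually happened in the run above cannot be confirmed from the log alone, so treat it as a starting point for debugging rather than a fix.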
Thanks, Thanos > On 5 Oct 2023, at 14:23, Thanasis Boutsikakis wrote: > > This works Pierre. Amazing input, thanks a lot! > >> On 5 Oct 2023, at 14:17, Pierre Jolivet wrote: >> >> Not a petsc4py expert here, but you may to try instead: >> A_prime = A.ptap(Phi) >> >> Thanks, >> Pierre >> >>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis wrote: >>> >>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>> >>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>> [0]PETSC ERROR: to get more information on the crash. >>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. >>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>> >>> """Experimenting with PETSc mat-mat multiplication""" >>> >>> import time >>> >>> import numpy as np >>> from colorama import Fore >>> from firedrake import COMM_SELF, COMM_WORLD >>> from firedrake.petsc import PETSc >>> from mpi4py import MPI >>> from numpy.testing import assert_array_almost_equal >>> >>> from utilities import ( >>> Print, >>> create_petsc_matrix, >>> print_matrix_partitioning, >>> ) >>> >>> nproc = COMM_WORLD.size >>> rank = COMM_WORLD.rank >>> >>> # -------------------------------------------- >>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>> # A' = Phi.T * A * Phi >>> # [k x k] <- [k x m] x [m x m] x [m x k] >>> # -------------------------------------------- >>> >>> m, k = 11, 7 >>> # Generate the random numpy matrices >>> np.random.seed(0) # sets the seed to 0 >>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>> >>> # -------------------------------------------- >>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>> # -------------------------------------------- >>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>> Print(f"{Aprime_np}") >>> >>> # Create A as an mpi matrix distributed on each process >>> A = create_petsc_matrix(A_np, sparse=False) >>> >>> # Create Phi as an mpi matrix distributed on each process >>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>> >>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>> >>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>> # A_prime will store the result of the operation. >>> Phi.PtAP(A, A_prime) >>> >>>> On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: >>>> >>>> How about using ptap which will use MatPtAP? >>>> It will be more efficient (and it will help you bypass the issue). 
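For reference, a small sketch of both routes in petsc4py, assuming A (m x m) and Phi (m x k) are already assembled PETSc matrices as in the scripts above. Letting the calls allocate their own result objects avoids handing in a preallocated result of an unsuitable type, although support for each combination still depends on the matrix types involved:

# Sketch only: A and Phi are assumed to be assembled petsc4py Mat objects
A_prime = A.ptap(Phi)            # one call: A' = Phi^T * A * Phi (MatPtAP underneath)

# Two-step alternative, with the intermediate created by the call itself
A1 = Phi.transposeMatMult(A)     # A1 = Phi^T * A   (k x m)
A_prime2 = A1.matMult(Phi)       # A'  = A1 * Phi   (k x k)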
>>>> >>>> Thanks, >>>> Pierre >>>> >>>>> On 5 Oct 2023, at 1:18 PM, Thanasis Boutsikakis wrote: >>>>> >>>>> Sorry, forgot function create_petsc_matrix() >>>>> >>>>> def create_petsc_matrix(input_array, sparse=True): >>>>> """Create a PETSc matrix from an input_array >>>>> >>>>> Args: >>>>> input_array (np array): Input array >>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>> sparse (bool, optional): Toggle for sparse or dense. Defaults to True. >>>>> >>>>> Returns: >>>>> PETSc mat: PETSc matrix >>>>> """ >>>>> # Check if input_array is 1D and reshape if necessary >>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>> global_rows, global_cols = input_array.shape >>>>> >>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>> >>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>> if sparse: >>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>> else: >>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>> matrix.setUp() >>>>> >>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>> >>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>> # Calculate the correct row in the array for the current process >>>>> row_in_array = counter + local_rows_start >>>>> matrix.setValues( >>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>> ) >>>>> >>>>> # Assemble the matrix to compute the final structure >>>>> matrix.assemblyBegin() >>>>> matrix.assemblyEnd() >>>>> >>>>> return matrix >>>>> >>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: >>>>>> >>>>>> Hi everyone, >>>>>> >>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get Phi.transposeMatMult(A, A1) to work. The error is >>>>>> >>>>>> Phi.transposeMatMult(A, A1) >>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>> petsc4py.PETSc.Error: error code 56 >>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>> [0] No support for this operation for this object type >>>>>> [0] Call MatProductCreate() first >>>>>> >>>>>> Do you know if these are exposed to petsc4py, or maybe there is another way?
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>> >>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>> >>>>>> import time >>>>>> >>>>>> import numpy as np >>>>>> from colorama import Fore >>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>> from firedrake.petsc import PETSc >>>>>> from mpi4py import MPI >>>>>> from numpy.testing import assert_array_almost_equal >>>>>> >>>>>> from utilities import ( >>>>>> Print, >>>>>> create_petsc_matrix, >>>>>> ) >>>>>> >>>>>> nproc = COMM_WORLD.size >>>>>> rank = COMM_WORLD.rank >>>>>> >>>>>> # -------------------------------------------- >>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>> # A' = Phi.T * A * Phi >>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>> # -------------------------------------------- >>>>>> >>>>>> m, k = 11, 7 >>>>>> # Generate the random numpy matrices >>>>>> np.random.seed(0) # sets the seed to 0 >>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>> >>>>>> # Create A as an mpi matrix distributed on each process >>>>>> A = create_petsc_matrix(A_np) >>>>>> >>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>> >>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>> >>>>>> # Now A1 contains the result of Phi^T * A >>>>>> Phi.transposeMatMult(A, A1) >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From erdemguer at proton.me Tue Oct 10 18:01:13 2023 From: erdemguer at proton.me (erdemguer) Date: Tue, 10 Oct 2023 23:01:13 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: Message-ID: Hi, Sorry for my late response. I tried your suggestions and I think I made some progress, but I still have issues. Let me explain my latest mesh routine: - DMPlexCreateBoxMesh - DMSetFromOptions - PetscSectionCreate - PetscSectionSetNumFields - PetscSectionSetFieldDof - PetscSectionSetDof - PetscSectionSetUp - DMSetLocalSection - DMSetAdjacency - DMPlexDistribute It's still not working, but it's promising: if I call DMPlexGetDepthStratum for cells, I can see that after distribution the processors have more cells. But I couldn't figure out how to decide where the ghost/processor boundary cells start. In older mails I saw there is a function DMPlexGetHybridBounds, but I think that function is deprecated. I tried to use DMPlexGetCellTypeStratum, as in ts/tutorials/ex11_sa.c, but I'm getting -1 as cEndInterior before and after distribution. I tried it for the DM_POLYTOPE_FV_GHOST and DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum, but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution, but I think there is a better way I'm currently missing. Thanks again, Guer. ------- Original Message ------- On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < knepley at gmail.com> wrote: > On Thu, Sep 28, 2023 at 3:38 PM erdemguer via petsc-users wrote: >> Hi, >> >> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >> >> Create a DMPlex mesh from GMSH. >> Reorder it with DMPlexPermute. >> Create necessary pre-processing arrays related to the mesh/problem. >> Create field(s) with multi-dofs. >> Create residual vectors.
>> Define a function to calculate the residual for each cell and, use SNES. >> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? > > The intention was that there is enough information in the manual to do this. > > Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. > > So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. > > Thanks, > > Matt > >> Thank you. >> Guer. >> >> Sent with [Proton Mail](https://proton.me/) secure email. > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 10 19:26:34 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Oct 2023 20:26:34 -0400 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> Message-ID: On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis < thanasis.boutsikakis at corintis.com> wrote: > Hi all, > > Revisiting my code and the proposed solution from Pierre, I realized this > works only in sequential. The reason is that PETSc partitions those > matrices only row-wise, which leads to an error due to the mismatch between > number of columns of A (non-partitioned) and the number of rows of Phi > (partitioned). > Are you positive about this? P^T A P is designed to run in this scenario, so either we have a bug or the diagnosis is wrong. Thanks, Matt > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import Print > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > def create_petsc_matrix(input_array, sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
> > Returns: > PETSc mat: PETSc mpi matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc > matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 100, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # -------------------------------------------- > # TEST: Galerking projection of numpy matrices A_np and Phi_np > # -------------------------------------------- > Aprime_np = Phi_np.T @ A_np @ Phi_np > Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") > Print(f"{Aprime_np}") > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np, sparse=False) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np, sparse=False) > > # Create an empty PETSc matrix object to store the result of the PtAP > operation. > # This will hold the result A' = Phi.T * A * Phi after the computation. > A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) > > # Perform the PtAP (Phi Transpose times A times Phi) operation. > # In mathematical terms, this operation is A' = Phi.T * A * Phi. > # A_prime will store the result of the operation. 
> A_prime = A.ptap(Phi) > > Here is the error > > MATRIX mpiaij A [100x100] > Assembled > > Partitioning for A: > Rank 0: Rows [0, 34) > Rank 1: Rows [34, 67) > Rank 2: Rows [67, 100) > > MATRIX mpiaij Phi [100x7] > Assembled > > Partitioning for Phi: > Rank 0: Rows [0, 34) > Rank 1: Rows [34, 67) > Rank 2: Rows [67, 100) > > Traceback (most recent call last): > File "/Users/boutsitron/work/galerkin_projection.py", line 87, in > > A_prime = A.ptap(Phi) > ^^^^^^^^^^^ > File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap > petsc4py.PETSc.Error: error code 60 > [0] MatPtAP() at > /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 > [0] MatProductSetFromOptions() at > /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 > [0] MatProductSetFromOptions_Private() at > /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 > [0] MatProductSetFromOptions_MPIAIJ() at > /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 > [0] MatProductSetFromOptions_MPIAIJ_PtAP() at > /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 > [0] Nonconforming object sizes > [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) > Abort(1) on node 0 (rank 0 in comm 496): application called > MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 > > > Any thoughts? > > Thanks, > Thanos > > On 5 Oct 2023, at 14:23, Thanasis Boutsikakis < > thanasis.boutsikakis at corintis.com> wrote: > > This works Pierre. Amazing input, thanks a lot! > > On 5 Oct 2023, at 14:17, Pierre Jolivet wrote: > > Not a petsc4py expert here, but you may to try instead: > A_prime = A.ptap(Phi) > > Thanks, > Pierre > > On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis < > thanasis.boutsikakis at corintis.com> wrote: > > Thanks Pierre! So I tried this and got a segmentation fault. Is this > supposed to work right off the bat or am I missing sth? > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is > causing the crash. 
> Abort(59) on node 0 (rank 0 in comm 0): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import ( > Print, > create_petsc_matrix, > print_matrix_partitioning, > ) > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc > matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 11, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # -------------------------------------------- > # TEST: Galerking projection of numpy matrices A_np and Phi_np > # -------------------------------------------- > Aprime_np = Phi_np.T @ A_np @ Phi_np > Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") > Print(f"{Aprime_np}") > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np, sparse=False) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np, sparse=False) > > # Create an empty PETSc matrix object to store the result of the PtAP > operation. > # This will hold the result A' = Phi.T * A * Phi after the computation. > A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) > > # Perform the PtAP (Phi Transpose times A times Phi) operation. > # In mathematical terms, this operation is A' = Phi.T * A * Phi. > # A_prime will store the result of the operation. > Phi.PtAP(A, A_prime) > > On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: > > How about using ptap which will use MatPtAP? > It will be more efficient (and it will help you bypass the issue). > > Thanks, > Pierre > > On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis < > thanasis.boutsikakis at corintis.com> wrote: > > Sorry, forgot function create_petsc_matrix() > > def create_petsc_matrix(input_array sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
> > Returns: > PETSc mat: PETSc matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > On 5 Oct 2023, at 13:09, Thanasis Boutsikakis < > thanasis.boutsikakis at corintis.com> wrote: > > Hi everyone, > > I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, > A1) work. The error is > > Phi.transposeMatMult(A, A1) > File "petsc4py/PETSc/Mat.pyx", line 1514, in > petsc4py.PETSc.Mat.transposeMatMult > petsc4py.PETSc.Error: error code 56 > [0] MatTransposeMatMult() at > /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 > [0] MatProduct_Private() at > /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 > [0] No support for this operation for this object type > [0] Call MatProductCreate() first > > Do you know if these exposed to petsc4py or maybe there is another way? I > cannot get the MFE to work (neither in sequential nor in parallel) > > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import ( > Print, > create_petsc_matrix, > ) > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc > matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 11, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np) > > A1 = create_petsc_matrix(np.zeros((k, m))) > > # Now A1 contains the result of Phi^T * A > Phi.transposeMatMult(A, A1) > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Oct 10 19:33:18 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Oct 2023 20:33:18 -0400 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: Message-ID: On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: > > Hi, > Sorry for my late response. I tried with your suggestions and I think I > made a progress. But I still got issues. Let me explain my latest mesh > routine: > > > 1. DMPlexCreateBoxMesh > 2. DMSetFromOptions > 3. PetscSectionCreate > 4. PetscSectionSetNumFields > 5. PetscSectionSetFieldDof > 6. PetscSectionSetDof > 7. PetscSectionSetUp > 8. DMSetLocalSection > 9. DMSetAdjacency > 10. DMPlexDistribute > > > It's still not working but it's promising, if I call DMPlexGetDepthStratum > for cells, I can see that after distribution processors have more cells. > Please send the output of DMPlexView() for each incarnation of the mesh. What I do is put DMViewFromOptions(dm, NULL, "-dm1_view") with a different string after each call. > But I couldn't figure out how to decide where the ghost/processor boundary > cells start. > Please send the actual code because the above is not specific enough. For example, you will not have "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, so each process gets a unique set. Thanks, Matt > In older mails I saw there is a function DMPlexGetHybridBounds but I > think that function is deprecated. I tried to use, > DMPlexGetCellTypeStratum as in ts/tutorials/ex11_sa.c but I'm getting -1 > as cEndInterior before and after distribution. I tried it for > DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. I also > tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but > nothing changed. I think I can calculate the ghost cell indices using > cStart/cEnd before & after distribution but I think there is a better way > I'm currently missing. > > Thanks again, > Guer. > > ------- Original Message ------- > On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < > knepley at gmail.com> wrote: > > On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi, >> >> I am currently using DMPlex in my code. It runs serially at the moment, >> but I'm interested in adding parallel options. Here is my workflow: >> >> Create a DMPlex mesh from GMSH. >> Reorder it with DMPlexPermute. >> Create necessary pre-processing arrays related to the mesh/problem. >> Create field(s) with multi-dofs. >> Create residual vectors. >> Define a function to calculate the residual for each cell and, use SNES. >> As you can see, I'm not using FV or FE structures (most examples do). >> Now, I'm trying to implement this in parallel using a similar approach. >> However, I'm struggling to understand how to create corresponding vectors >> and how to obtain index sets for each processor. Is there a tutorial or >> paper that covers this topic? >> > > The intention was that there is enough information in the manual to do > this. > > Using PetscFE/PetscFV is not required. However, I strongly encourage you > to use PetscSection. Without this, it would be incredibly hard to do what > you want. Once the DM has a Section, it can do things like automatically > create vectors and matrices for you. It can redistribute them, subset them, > etc. The Section describes how dofs are assigned to pieces of the mesh > (mesh points). This is in the manual, and there are a few examples that do > it by hand. 
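A minimal petsc4py sketch of that kind of setup, with purely illustrative sizes and layout (a small box mesh, one field with one dof per cell, and a distribution with overlap so that ghost cells exist); it is not code from this thread:

from petsc4py import PETSc

# Illustrative: 4x4 quadrilateral box mesh, one dof per cell
dm = PETSc.DMPlex().createBoxMesh([4, 4], simplex=False, comm=PETSc.COMM_WORLD)
dm.setBasicAdjacency(True, False)   # FV-style adjacency (useCone=True, useClosure=False)
dm.distribute(overlap=1)            # overlap=1 gives each rank a layer of ghost cells
dm.view()                           # inspect the mesh at this stage, as suggested above

# Build a Section on the distributed mesh: one field, one dof on every cell
sec = PETSc.Section().create(comm=dm.comm)
sec.setNumFields(1)
pStart, pEnd = dm.getChart()
sec.setChart(pStart, pEnd)
cStart, cEnd = dm.getHeightStratum(0)   # cells, including the overlap cells
for c in range(cStart, cEnd):
    sec.setDof(c, 1)
    sec.setFieldDof(c, 0, 1)
sec.setUp()
dm.setLocalSection(sec)

# With a Section attached, the DM hands out matching vectors automatically
gvec = dm.createGlobalVector()      # owned dofs only
lvec = dm.createLocalVector()       # owned + ghost dofs
# Local points that appear as leaves of dm.getPointSF() are owned by another
# rank, which is one way to tell the overlap (ghost) cells from the owned ones.

Here the Section is built after the distribution only to keep the sketch short.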
> > So I suggest changing your code to use PetscSection, and then letting us > know if things still do not work. > > Thanks, > > Matt > >> Thank you. >> Guer. >> >> Sent with Proton Mail secure email. >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Oct 10 19:42:56 2023 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 10 Oct 2023 20:42:56 -0400 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> Message-ID: This looks like a false positive or there is some subtle bug here that we are not seeing. Could this be the first time parallel PtAP has been used (and reported) in petsc4py? Mark On Tue, Oct 10, 2023 at 8:27?PM Matthew Knepley wrote: > On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis < > thanasis.boutsikakis at corintis.com> wrote: > >> Hi all, >> >> Revisiting my code and the proposed solution from Pierre, I realized this >> works only in sequential. The reason is that PETSc partitions those >> matrices only row-wise, which leads to an error due to the mismatch between >> number of columns of A (non-partitioned) and the number of rows of Phi >> (partitioned). >> > > Are you positive about this? P^T A P is designed to run in this scenario, > so either we have a bug or the diagnosis is wrong. > > Thanks, > > Matt > > >> """Experimenting with PETSc mat-mat multiplication""" >> >> import time >> >> import numpy as np >> from colorama import Fore >> from firedrake import COMM_SELF, COMM_WORLD >> from firedrake.petsc import PETSc >> from mpi4py import MPI >> from numpy.testing import assert_array_almost_equal >> >> from utilities import Print >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> def create_petsc_matrix(input_array, sparse=True): >> """Create a PETSc matrix from an input_array >> >> Args: >> input_array (np array): Input array >> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>> >> Returns: >> PETSc mat: PETSc mpi matrix >> """ >> # Check if input_array is 1D and reshape if necessary >> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >> global_rows, global_cols = input_array.shape >> size = ((None, global_rows), (global_cols, global_cols)) >> >> # Create a sparse or dense matrix based on the 'sparse' argument >> if sparse: >> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >> else: >> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >> matrix.setUp() >> >> local_rows_start, local_rows_end = matrix.getOwnershipRange() >> >> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >> # Calculate the correct row in the array for the current process >> row_in_array = counter + local_rows_start >> matrix.setValues( >> i, range(global_cols), input_array[row_in_array, :], addv=False >> ) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >> # -------------------------------------------- >> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc >> matrix Phi >> # A' = Phi.T * A * Phi >> # [k x k] <- [k x m] x [m x m] x [m x k] >> # -------------------------------------------- >> >> m, k = 100, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, m)) >> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> # -------------------------------------------- >> # TEST: Galerking projection of numpy matrices A_np and Phi_np >> # -------------------------------------------- >> Aprime_np = Phi_np.T @ A_np @ Phi_np >> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >> Print(f"{Aprime_np}") >> >> # Create A as an mpi matrix distributed on each process >> A = create_petsc_matrix(A_np, sparse=False) >> >> # Create Phi as an mpi matrix distributed on each process >> Phi = create_petsc_matrix(Phi_np, sparse=False) >> >> # Create an empty PETSc matrix object to store the result of the PtAP >> operation. >> # This will hold the result A' = Phi.T * A * Phi after the computation. >> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >> >> # Perform the PtAP (Phi Transpose times A times Phi) operation. >> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >> # A_prime will store the result of the operation. 
>> A_prime = A.ptap(Phi) >> >> Here is the error >> >> MATRIX mpiaij A [100x100] >> Assembled >> >> Partitioning for A: >> Rank 0: Rows [0, 34) >> Rank 1: Rows [34, 67) >> Rank 2: Rows [67, 100) >> >> MATRIX mpiaij Phi [100x7] >> Assembled >> >> Partitioning for Phi: >> Rank 0: Rows [0, 34) >> Rank 1: Rows [34, 67) >> Rank 2: Rows [67, 100) >> >> Traceback (most recent call last): >> File "/Users/boutsitron/work/galerkin_projection.py", line 87, in >> >> A_prime = A.ptap(Phi) >> ^^^^^^^^^^^ >> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >> petsc4py.PETSc.Error: error code 60 >> [0] MatPtAP() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >> [0] MatProductSetFromOptions() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >> [0] MatProductSetFromOptions_Private() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >> [0] MatProductSetFromOptions_MPIAIJ() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >> [0] Nonconforming object sizes >> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >> Abort(1) on node 0 (rank 0 in comm 496): application called >> MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >> >> >> Any thoughts? >> >> Thanks, >> Thanos >> >> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis < >> thanasis.boutsikakis at corintis.com> wrote: >> >> This works Pierre. Amazing input, thanks a lot! >> >> On 5 Oct 2023, at 14:17, Pierre Jolivet wrote: >> >> Not a petsc4py expert here, but you may to try instead: >> A_prime = A.ptap(Phi) >> >> Thanks, >> Pierre >> >> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis < >> thanasis.boutsikakis at corintis.com> wrote: >> >> Thanks Pierre! So I tried this and got a segmentation fault. Is this >> supposed to work right off the bat or am I missing sth? >> >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and >> https://petsc.org/release/faq/ >> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, >> and run >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is >> causing the crash. 
>> Abort(59) on node 0 (rank 0 in comm 0): application called >> MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >> >> """Experimenting with PETSc mat-mat multiplication""" >> >> import time >> >> import numpy as np >> from colorama import Fore >> from firedrake import COMM_SELF, COMM_WORLD >> from firedrake.petsc import PETSc >> from mpi4py import MPI >> from numpy.testing import assert_array_almost_equal >> >> from utilities import ( >> Print, >> create_petsc_matrix, >> print_matrix_partitioning, >> ) >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> # -------------------------------------------- >> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc >> matrix Phi >> # A' = Phi.T * A * Phi >> # [k x k] <- [k x m] x [m x m] x [m x k] >> # -------------------------------------------- >> >> m, k = 11, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, m)) >> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> # -------------------------------------------- >> # TEST: Galerking projection of numpy matrices A_np and Phi_np >> # -------------------------------------------- >> Aprime_np = Phi_np.T @ A_np @ Phi_np >> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >> Print(f"{Aprime_np}") >> >> # Create A as an mpi matrix distributed on each process >> A = create_petsc_matrix(A_np, sparse=False) >> >> # Create Phi as an mpi matrix distributed on each process >> Phi = create_petsc_matrix(Phi_np, sparse=False) >> >> # Create an empty PETSc matrix object to store the result of the PtAP >> operation. >> # This will hold the result A' = Phi.T * A * Phi after the computation. >> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >> >> # Perform the PtAP (Phi Transpose times A times Phi) operation. >> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >> # A_prime will store the result of the operation. >> Phi.PtAP(A, A_prime) >> >> On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: >> >> How about using ptap which will use MatPtAP? >> It will be more efficient (and it will help you bypass the issue). >> >> Thanks, >> Pierre >> >> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis < >> thanasis.boutsikakis at corintis.com> wrote: >> >> Sorry, forgot function create_petsc_matrix() >> >> def create_petsc_matrix(input_array sparse=True): >> """Create a PETSc matrix from an input_array >> >> Args: >> input_array (np array): Input array >> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>> >> Returns: >> PETSc mat: PETSc matrix >> """ >> # Check if input_array is 1D and reshape if necessary >> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >> global_rows, global_cols = input_array.shape >> >> size = ((None, global_rows), (global_cols, global_cols)) >> >> # Create a sparse or dense matrix based on the 'sparse' argument >> if sparse: >> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >> else: >> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >> matrix.setUp() >> >> local_rows_start, local_rows_end = matrix.getOwnershipRange() >> >> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >> # Calculate the correct row in the array for the current process >> row_in_array = counter + local_rows_start >> matrix.setValues( >> i, range(global_cols), input_array[row_in_array, :], addv=False >> ) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis < >> thanasis.boutsikakis at corintis.com> wrote: >> >> Hi everyone, >> >> I am trying a Galerkin projection (see MFE below) and I cannot get the >> Phi.transposeMatMult(A, A1) work. The error is >> >> Phi.transposeMatMult(A, A1) >> File "petsc4py/PETSc/Mat.pyx", line 1514, in >> petsc4py.PETSc.Mat.transposeMatMult >> petsc4py.PETSc.Error: error code 56 >> [0] MatTransposeMatMult() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >> [0] MatProduct_Private() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >> [0] No support for this operation for this object type >> [0] Call MatProductCreate() first >> >> Do you know if these exposed to petsc4py or maybe there is another way? I >> cannot get the MFE to work (neither in sequential nor in parallel) >> >> """Experimenting with PETSc mat-mat multiplication""" >> >> import time >> >> import numpy as np >> from colorama import Fore >> from firedrake import COMM_SELF, COMM_WORLD >> from firedrake.petsc import PETSc >> from mpi4py import MPI >> from numpy.testing import assert_array_almost_equal >> >> from utilities import ( >> Print, >> create_petsc_matrix, >> ) >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> # -------------------------------------------- >> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc >> matrix Phi >> # A' = Phi.T * A * Phi >> # [k x k] <- [k x m] x [m x m] x [m x k] >> # -------------------------------------------- >> >> m, k = 11, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, m)) >> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> # Create A as an mpi matrix distributed on each process >> A = create_petsc_matrix(A_np) >> >> # Create Phi as an mpi matrix distributed on each process >> Phi = create_petsc_matrix(Phi_np) >> >> A1 = create_petsc_matrix(np.zeros((k, m))) >> >> # Now A1 contains the result of Phi^T * A >> Phi.transposeMatMult(A, A1) >> >> >> >> >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bldenton at buffalo.edu Tue Oct 10 20:34:16 2023 From: bldenton at buffalo.edu (Brandon Denton) Date: Wed, 11 Oct 2023 01:34:16 +0000 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization Message-ID: Good Evening, I am looking to implement a form of Navier-Stokes with SUPG Stabilization and shock capturing using PETSc's FEM infrastructure. In this implementation, I need access to the cell's shape function gradients and natural coordinate gradients for calculations within the point-wise residual calculations. How do I get these quantities at the quadrature points? The signatures for fo and f1 don't seem to contain this information. Thank you in advance for your time. Brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Oct 10 21:18:07 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 10 Oct 2023 20:18:07 -0600 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: Message-ID: <401ffc8a-38ec-4a30-a26d-8c8028ccfcca@app.fastmail.com> Do you want to write a new code using only PETSc or would you be up for collaborating on ceed-fluids, which is a high-performance compressible SUPG solver based on DMPlex with good GPU support? It uses the metric to compute covariant length for stabilization. We have YZ? shock capturing, though it hasn't been tested much beyond shock tube experiments. (Most of our work has been low Mach.) https://libceed.org/en/latest/examples/fluids/ https://github.com/CEED/libCEED/blob/main/examples/fluids/qfunctions/stabilization.h#L76 On Tue, Oct 10, 2023, at 7:34 PM, Brandon Denton via petsc-users wrote: > Good Evening, > > I am looking to implement a form of Navier-Stokes with SUPG Stabilization and shock capturing using PETSc's FEM infrastructure. In this implementation, I need access to the cell's shape function gradients and natural coordinate gradients for calculations within the point-wise residual calculations. How do I get these quantities at the quadrature points? The signatures for fo and f1 don't seem to contain this information. > > Thank you in advance for your time. > Brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: From bldenton at buffalo.edu Tue Oct 10 22:54:11 2023 From: bldenton at buffalo.edu (Brandon Denton) Date: Wed, 11 Oct 2023 03:54:11 +0000 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: <401ffc8a-38ec-4a30-a26d-8c8028ccfcca@app.fastmail.com> References: <401ffc8a-38ec-4a30-a26d-8c8028ccfcca@app.fastmail.com> Message-ID: My initial plan was to write a new code using only PETSc. However, I don't see how to do what I want within the point-wise residual function. Am I missing something? Yes. I would be interested in collaborating on the ceed-fluids. I took a quick look at the links you provided and it looks interesting. I'll warn you though. I'm a Mechanical Engineer by trade/training. The calculus and programming sometimes take me a little while to wrap my head around. Let me know how I can help. In the meantime, I'll continue to review the information you sent over. 
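For reference while digging into the linked stabilization code, here is a small NumPy sketch of the metric-based length scale mentioned above: the covariant element metric G = (dxi/dx)^T (dxi/dx) built from the inverse of the coordinate Jacobian, and one common Shakib/Tezduyar-style tau_SUPG assembled from it. The constant C_I, the time step, and the sample Jacobian below are illustrative assumptions, not necessarily what ceed-fluids uses:

import numpy as np

def supg_tau(dx_dxi, u, nu, dt, C_I=36.0):
    # dx_dxi : (dim, dim) coordinate Jacobian dx/dxi at a quadrature point
    # u      : (dim,) advective velocity at that point
    # nu     : kinematic viscosity
    # dt     : time step (drop the 4/dt**2 term for steady problems)
    # C_I    : inverse-estimate constant (element/order dependent; 36 is a
    #          common choice for linear elements and is an assumption here)
    dxi_dx = np.linalg.inv(dx_dxi)       # natural-coordinate gradients dxi/dx
    G = dxi_dx.T @ dxi_dx                # covariant metric g_ij
    adv = u @ G @ u                      # u . G u
    diff = C_I * nu**2 * np.sum(G * G)   # C_I nu^2 (G : G)
    return 1.0 / np.sqrt(4.0 / dt**2 + adv + diff)

# Purely illustrative numbers
tau = supg_tau(np.diag([0.1, 0.05]), np.array([1.0, 0.2]), nu=1e-3, dt=1e-2)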
________________________________ From: Jed Brown Sent: Tuesday, October 10, 2023 10:18 PM To: Brandon Denton ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization Do you want to write a new code using only PETSc or would you be up for collaborating on ceed-fluids, which is a high-performance compressible SUPG solver based on DMPlex with good GPU support? It uses the metric to compute covariant length for stabilization. We have YZ? shock capturing, though it hasn't been tested much beyond shock tube experiments. (Most of our work has been low Mach.) https://libceed.org/en/latest/examples/fluids/ https://github.com/CEED/libCEED/blob/main/examples/fluids/qfunctions/stabilization.h#L76 On Tue, Oct 10, 2023, at 7:34 PM, Brandon Denton via petsc-users wrote: Good Evening, I am looking to implement a form of Navier-Stokes with SUPG Stabilization and shock capturing using PETSc's FEM infrastructure. In this implementation, I need access to the cell's shape function gradients and natural coordinate gradients for calculations within the point-wise residual calculations. How do I get these quantities at the quadrature points? The signatures for fo and f1 don't seem to contain this information. Thank you in advance for your time. Brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Wed Oct 11 00:18:10 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Wed, 11 Oct 2023 07:18:10 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> Message-ID: <3C8FA7CA-63CB-49F2-8756-535D7FC657C3@joliv.et> I disagree with what Mark and Matt are saying: your code is fine, the error message is fine, petsc4py is fine (in this instance). It?s not a typical use case of MatPtAP(), which is mostly designed for MatAIJ, not MatDense. On the one hand, in the MatDense case, indeed there will be a mismatch between the number of columns of A and the number of rows of P, as written in the error message. On the other hand, there is not much to optimize when computing C = P? A P with everything being dense. I would just write this as B = A P and then C = P? B (but then you may face the same issue as initially reported, please let us know then). Thanks, Pierre > On 11 Oct 2023, at 2:42?AM, Mark Adams wrote: > > This looks like a false positive or there is some subtle bug here that we are not seeing. > Could this be the first time parallel PtAP has been used (and reported) in petsc4py? > > Mark > > On Tue, Oct 10, 2023 at 8:27?PM Matthew Knepley > wrote: >> On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis > wrote: >>> Hi all, >>> >>> Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). >> >> Are you positive about this? P^T A P is designed to run in this scenario, so either we have a bug or the diagnosis is wrong. 
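A short petsc4py sketch of the two-step product Pierre suggests above (B = A Phi, then A' = Phi^T B), assuming A (m x m) and Phi (m x k) have already been assembled as in the scripts in this thread; as Pierre notes, the dense transpose-product step may still hit the same MatProduct limitation, in which case please report back:

B = A.matMult(Phi)                  # B  = A * Phi        (m x k)
A_prime = Phi.transposeMatMult(B)   # A' = Phi^T * B      (k x k)

# One-shot alternative discussed in this thread:
# A_prime = A.ptap(Phi)             # A' = Phi^T * A * Phi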
>> >> Thanks, >> >> Matt >> >>> """Experimenting with PETSc mat-mat multiplication""" >>> >>> import time >>> >>> import numpy as np >>> from colorama import Fore >>> from firedrake import COMM_SELF, COMM_WORLD >>> from firedrake.petsc import PETSc >>> from mpi4py import MPI >>> from numpy.testing import assert_array_almost_equal >>> >>> from utilities import Print >>> >>> nproc = COMM_WORLD.size >>> rank = COMM_WORLD.rank >>> >>> def create_petsc_matrix(input_array, sparse=True): >>> """Create a PETSc matrix from an input_array >>> >>> Args: >>> input_array (np array): Input array >>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >>> >>> Returns: >>> PETSc mat: PETSc mpi matrix >>> """ >>> # Check if input_array is 1D and reshape if necessary >>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>> global_rows, global_cols = input_array.shape >>> size = ((None, global_rows), (global_cols, global_cols)) >>> >>> # Create a sparse or dense matrix based on the 'sparse' argument >>> if sparse: >>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>> else: >>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>> matrix.setUp() >>> >>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>> >>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>> # Calculate the correct row in the array for the current process >>> row_in_array = counter + local_rows_start >>> matrix.setValues( >>> i, range(global_cols), input_array[row_in_array, :], addv=False >>> ) >>> >>> # Assembly the matrix to compute the final structure >>> matrix.assemblyBegin() >>> matrix.assemblyEnd() >>> >>> return matrix >>> >>> # -------------------------------------------- >>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>> # A' = Phi.T * A * Phi >>> # [k x k] <- [k x m] x [m x m] x [m x k] >>> # -------------------------------------------- >>> >>> m, k = 100, 7 >>> # Generate the random numpy matrices >>> np.random.seed(0) # sets the seed to 0 >>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>> >>> # -------------------------------------------- >>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>> # -------------------------------------------- >>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>> Print(f"{Aprime_np}") >>> >>> # Create A as an mpi matrix distributed on each process >>> A = create_petsc_matrix(A_np, sparse=False) >>> >>> # Create Phi as an mpi matrix distributed on each process >>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>> >>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>> >>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>> # A_prime will store the result of the operation. 
>>> A_prime = A.ptap(Phi) >>> >>> Here is the error >>> >>> MATRIX mpiaij A [100x100] >>> Assembled >>> >>> Partitioning for A: >>> Rank 0: Rows [0, 34) >>> Rank 1: Rows [34, 67) >>> Rank 2: Rows [67, 100) >>> >>> MATRIX mpiaij Phi [100x7] >>> Assembled >>> >>> Partitioning for Phi: >>> Rank 0: Rows [0, 34) >>> Rank 1: Rows [34, 67) >>> Rank 2: Rows [67, 100) >>> >>> Traceback (most recent call last): >>> File "/Users/boutsitron/work/galerkin_projection.py", line 87, in >>> A_prime = A.ptap(Phi) >>> ^^^^^^^^^^^ >>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>> petsc4py.PETSc.Error: error code 60 >>> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>> [0] Nonconforming object sizes >>> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >>> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >>> >>> Any thoughts? >>> >>> Thanks, >>> Thanos >>> >>>> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis > wrote: >>>> >>>> This works Pierre. Amazing input, thanks a lot! >>>> >>>>> On 5 Oct 2023, at 14:17, Pierre Jolivet > wrote: >>>>> >>>>> Not a petsc4py expert here, but you may to try instead: >>>>> A_prime = A.ptap(Phi) >>>>> >>>>> Thanks, >>>>> Pierre >>>>> >>>>>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis > wrote: >>>>>> >>>>>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>>>>> >>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>>>>>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>>> >>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>> >>>>>> import time >>>>>> >>>>>> import numpy as np >>>>>> from colorama import Fore >>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>> from firedrake.petsc import PETSc >>>>>> from mpi4py import MPI >>>>>> from numpy.testing import assert_array_almost_equal >>>>>> >>>>>> from utilities import ( >>>>>> Print, >>>>>> create_petsc_matrix, >>>>>> print_matrix_partitioning, >>>>>> ) >>>>>> >>>>>> nproc = COMM_WORLD.size >>>>>> rank = COMM_WORLD.rank >>>>>> >>>>>> # -------------------------------------------- >>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>> # A' = Phi.T * A * Phi >>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>> # -------------------------------------------- >>>>>> >>>>>> m, k = 11, 7 >>>>>> # Generate the random numpy matrices >>>>>> np.random.seed(0) # sets the seed to 0 >>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>> >>>>>> # -------------------------------------------- >>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>> # -------------------------------------------- >>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>> Print(f"{Aprime_np}") >>>>>> >>>>>> # Create A as an mpi matrix distributed on each process >>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>> >>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>> >>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>> >>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>> # A_prime will store the result of the operation. >>>>>> Phi.PtAP(A, A_prime) >>>>>> >>>>>>> On 5 Oct 2023, at 13:22, Pierre Jolivet > wrote: >>>>>>> >>>>>>> How about using ptap which will use MatPtAP? >>>>>>> It will be more efficient (and it will help you bypass the issue). >>>>>>> >>>>>>> Thanks, >>>>>>> Pierre >>>>>>> >>>>>>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis > wrote: >>>>>>>> >>>>>>>> Sorry, forgot function create_petsc_matrix() >>>>>>>> >>>>>>>> def create_petsc_matrix(input_array sparse=True): >>>>>>>> """Create a PETSc matrix from an input_array >>>>>>>> >>>>>>>> Args: >>>>>>>> input_array (np array): Input array >>>>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>>>> >>>>>>>> Returns: >>>>>>>> PETSc mat: PETSc matrix >>>>>>>> """ >>>>>>>> # Check if input_array is 1D and reshape if necessary >>>>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>>>> global_rows, global_cols = input_array.shape >>>>>>>> >>>>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>>>> >>>>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>>>> if sparse: >>>>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>>>> else: >>>>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>>>> matrix.setUp() >>>>>>>> >>>>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>>>> >>>>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>>>> # Calculate the correct row in the array for the current process >>>>>>>> row_in_array = counter + local_rows_start >>>>>>>> matrix.setValues( >>>>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>>>> ) >>>>>>>> >>>>>>>> # Assembly the matrix to compute the final structure >>>>>>>> matrix.assemblyBegin() >>>>>>>> matrix.assemblyEnd() >>>>>>>> >>>>>>>> return matrix >>>>>>>> >>>>>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis > wrote: >>>>>>>>> >>>>>>>>> Hi everyone, >>>>>>>>> >>>>>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>>>>>> >>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>>>>> petsc4py.PETSc.Error: error code 56 >>>>>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>>>>> [0] No support for this operation for this object type >>>>>>>>> [0] Call MatProductCreate() first >>>>>>>>> >>>>>>>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>>>>> >>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>> >>>>>>>>> import time >>>>>>>>> >>>>>>>>> import numpy as np >>>>>>>>> from colorama import Fore >>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>> from mpi4py import MPI >>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>> >>>>>>>>> from utilities import ( >>>>>>>>> Print, >>>>>>>>> create_petsc_matrix, >>>>>>>>> ) >>>>>>>>> >>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>> >>>>>>>>> # -------------------------------------------- >>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>> # -------------------------------------------- >>>>>>>>> >>>>>>>>> m, k = 11, 7 >>>>>>>>> # Generate the random numpy matrices >>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>> >>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>> A = create_petsc_matrix(A_np) >>>>>>>>> >>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>>>>> >>>>>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>>>>> >>>>>>>>> # Now A1 contains the result of Phi^T * A >>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Wed Oct 11 01:41:22 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 11 Oct 2023 08:41:22 +0200 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: References: Message-ID: <89E53665-4C0D-4583-9C90-13C4C108A4EA@dsic.upv.es> Kenneth, The MatDuplicate issue should be fixed in the following MR https://gitlab.com/petsc/petsc/-/merge_requests/6912 Note that the NLEIGS solver internally uses MatDuplicate for creating multiple copies of the shell matrix, each one with its own value of lambda. Hence your implementation of the shell matrix is not appropriate, since you have a single global lambda within the module. I have attempted to write a Fortran example that duplicates the lambda correctly (see the MR), but does not work yet. Jose > El 6 oct 2023, a las 22:28, Kenneth C Hall escribi?: > > Jose, > > Unfortunately, I was unable to implement the MATOP_DUPLICATE operation in fortran (and I do not know enough c to work in c). Here is the error message I get: > > [0]PETSC ERROR: #1 MatShellSetOperation_Fortran() at /Users/hall/Documents/Fortran_Codes/Packages/petsc/src/mat/impls/shell/ftn-custom/zshellf.c:283 > [0]PETSC ERROR: #2 src/test_nep.f90:62 > > When I look at zshellf.c, MATOP_DUPLICATE is not one of the supported operations. See below. > > Kenneth > > > /** > * Subset of MatOperation that is supported by the Fortran wrappers. 
> */ > enum FortranMatOperation { > FORTRAN_MATOP_MULT = 0, > FORTRAN_MATOP_MULT_ADD = 1, > FORTRAN_MATOP_MULT_TRANSPOSE = 2, > FORTRAN_MATOP_MULT_TRANSPOSE_ADD = 3, > FORTRAN_MATOP_SOR = 4, > FORTRAN_MATOP_TRANSPOSE = 5, > FORTRAN_MATOP_GET_DIAGONAL = 6, > FORTRAN_MATOP_DIAGONAL_SCALE = 7, > FORTRAN_MATOP_ZERO_ENTRIES = 8, > FORTRAN_MATOP_AXPY = 9, > FORTRAN_MATOP_SHIFT = 10, > FORTRAN_MATOP_DIAGONAL_SET = 11, > FORTRAN_MATOP_DESTROY = 12, > FORTRAN_MATOP_VIEW = 13, > FORTRAN_MATOP_CREATE_VECS = 14, > FORTRAN_MATOP_GET_DIAGONAL_BLOCK = 15, > FORTRAN_MATOP_COPY = 16, > FORTRAN_MATOP_SCALE = 17, > FORTRAN_MATOP_SET_RANDOM = 18, > FORTRAN_MATOP_ASSEMBLY_BEGIN = 19, > FORTRAN_MATOP_ASSEMBLY_END = 20, > FORTRAN_MATOP_SIZE = 21 > }; > > > From: Jose E. Roman > Date: Friday, October 6, 2023 at 7:01 AM > To: Kenneth C Hall > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) > > I am getting an error in a different place than you. I started to debug, but don't have much time at the moment. > Can you try something? Comparing to ex21.c, I see that a difference that may be relevant is the MATOP_DUPLICATE operation. Can you try defining it for your A matrix? > > Note: If you plan to use the NLEIGS solver, there is no need to define the derivative T' so you can skip the call to NEPSetJacobian(). > > Jose > > > > El 6 oct 2023, a las 0:37, Kenneth C Hall escribi?: > > > > Hi all, > > > > I have a very large eigenvalue problem of the form T(\lambda).x = 0. The eigenvalues appear in a complicated way, and I must use a matrix-free approach to compute the products T.x and T?.x. > > > > I am trying to implement in SLEPc/NEP. To get started, I have defined a much smaller and simpler system of the form > > A.x - \lambda x = 0 where A is a 10x10 matrix. This is of course a simple standard eigenvalue problem, but I am using it as a surrogate to understand how to use NEP. > > > > I have set the problem up using shell matrices (as that is my ultimate goal). The full code is attached, but here is a smaller snippet of code: > > > > !.... Create matrix-free operators for A and B > > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, A, ierr)) > > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, B, ierr)) > > PetscCall(MatShellSetOperation(A, MATOP_MULT, MatMult_A, ierr)) > > PetscCall(MatShellSetOperation(B, MATOP_MULT, MatMult_B, ierr)) > > > > !.... Create nonlinear eigensolver > > PetscCall(NEPCreate(PETSC_COMM_SELF, nep, ierr)) > > > > !.... Set the problem type > > PetscCall(NEPSetProblemType(nep, NEP_GENERAL, ierr)) > > ! > > !.... set the solver type > > PetscCall(NEPSetType(nep, NEPNLEIGS, ierr)) > > ! > > !.... Set functions and Jacobians for NEP > > PetscCall(NEPSetFunction(nep, A, A, MyNEPFunction, PETSC_NULL_INTEGER, ierr)) > > PetscCall(NEPSetJacobian(nep, B, MyNEPJacobian, PETSC_NULL_INTEGER, ierr)) > > > > The code runs, calls MyNEPFunction and MatMult_A multiple times, sweeping over the prescribed RG range, but crashes before it ever calls MyNEPJacobian or MatMult_B. The NEP viewer and error messages are attached. > > > > Any help on getting this problem properly set up would be greatly appreciated. 
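For anyone following this in Python, a rough petsc4py/slepc4py sketch of the point Jose makes above about MatDuplicate: lambda has to live in the shell matrix's own context, so that every internal copy of T can carry its own value, rather than in a module-level variable. The 10x10 dense stand-in operator is a placeholder, and the callback wiring is an assumption (NEP.setFunction is assumed to mirror NEPSetFunction); this is a sketch, not the Fortran fix itself:

import numpy as np
from petsc4py import PETSc
from slepc4py import SLEPc

n = 10
A_np = np.random.rand(n, n)  # stand-in for the real operator

class TShellCtx:
    # Context for T(lambda) = A - lambda*I; lambda is stored per instance,
    # so a duplicated matrix can hold a different value than the original.
    def __init__(self, lmbda=0.0):
        self.lmbda = lmbda
    def mult(self, mat, x, y):
        xa = x.getArray(readonly=True)
        ya = y.getArray()
        ya[:] = A_np @ xa - self.lmbda * xa

T = PETSc.Mat().createPython([n, n], context=TShellCtx(), comm=PETSc.COMM_SELF)
T.setUp()

def form_function(nep, lmbda, F, P):
    # Store lambda in *this* matrix's context, not in a global variable;
    # NLEIGS duplicates T internally and each copy needs its own lambda.
    F.getPythonContext().lmbda = lmbda

nep = SLEPc.NEP().create(PETSc.COMM_SELF)
nep.setType(SLEPc.NEP.Type.NLEIGS)   # NLEIGS does not need a Jacobian callback
nep.setFunction(form_function, T)    # assumed to mirror NEPSetFunction(nep, T, T, ...)
nep.setFromOptions()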
> > > > Kenneth Hall > > ATTACHMENTS: > > test_nep.f90 > > code_output > > > > > From Roland.Richter at empa.ch Wed Oct 11 01:44:48 2023 From: Roland.Richter at empa.ch (Richter, Roland) Date: Wed, 11 Oct 2023 06:44:48 +0000 Subject: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails In-Reply-To: <78e0a665-e6fc-4566-4900-6faa2e593c72@mcs.anl.gov> References: <3CF831A3-F5DC-4055-9F00-FA7DD7242EBB@petsc.dev> <78e0a665-e6fc-4566-4900-6faa2e593c72@mcs.anl.gov> Message-ID: Hei, Thank you very much for the answer! I looked it up, but petsc.org seems to be a bit unstable here, quite often I can't reach petsc.org. Regards, Roland Richter -----Urspr?ngliche Nachricht----- Von: Satish Balay Gesendet: mandag 9. oktober 2023 17:29 An: Barry Smith Cc: Richter, Roland ; petsc-users at mcs.anl.gov Betreff: Re: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails Will note - OneAPI MPI usage is documented at https://petsc.org/release/install/install/#mpi Satish On Mon, 9 Oct 2023, Barry Smith wrote: > > Instead of using the mpiicc -cc=icx style use -- with-cc=mpiicc (etc) and > > export I_MPI_CC=icx > export I_MPI_CXX=icpx > export I_MPI_F90=ifx > > > > On Oct 9, 2023, at 8:32 AM, Richter, Roland wrote: > > > > Hei, > > I'm currently trying to install PETSc on a server (Ubuntu 22.04) with Intel MPI and Intel OneAPI. To combine both, I have to use f. ex. "mpiicc -cc=icx" as C-compiler, as described by https://stackoverflow.com/a/76362396. Therefore, I adapted the configure-line as follow: > > > > ./configure --prefix=/media/storage/local_opt/petsc --with-scalar-type=complex --with-cc="mpiicc -cc=icx" --with-cxx="mpiicpc -cxx=icpx" --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC -march=native -mavx2" --with-fc="mpiifort -fc=ifx" --with-pic=true --with-mpi=true --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ --with-openmp=true --download-hdf5=yes --download-netcdf=yes --download-chaco=no --download-metis=yes --download-slepc=yes --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost --download-opencascade=yes --with-fftw=1 --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 --download-muparser=no --download-p4est=yes --download-sowing=y es --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=1 > > > > The configuration, however, fails with > > > > The CMAKE_C_COMPILER: > > > > mpiicc -cc=icx > > > > is not a full path and was not found in the PATH > > > > for all additional modules which use a cmake-based configuration approach (such as OPENCASCADE). How could I solve that problem? > > > > Thank you! > > Regards, > > Roland Richter > > > > -------------- next part -------------- A non-text attachment was scrubbed... 
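Putting Barry's suggestion above into a concrete form (the idea being that CMake-configured subpackages then see a plain mpiicc/mpiicpc/mpiifort executable on PATH, while the environment variables steer Intel MPI to the oneAPI compilers); the trailing options are the ones from the original configure line and are left elided here:

export I_MPI_CC=icx
export I_MPI_CXX=icpx
export I_MPI_F90=ifx

./configure --prefix=/media/storage/local_opt/petsc \
    --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort \
    ...   # remaining options unchanged from the original command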
Name: smime.p7s Type: application/pkcs7-signature Size: 7926 bytes Desc: not available URL: From thanasis.boutsikakis at corintis.com Wed Oct 11 01:58:18 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Wed, 11 Oct 2023 08:58:18 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: <3C8FA7CA-63CB-49F2-8756-535D7FC657C3@joliv.et> References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> <3C8FA7CA-63CB-49F2-8756-535D7FC657C3@joliv.et> Message-ID: <327E3AAA-1AD0-4051-B977-55420DE24067@corintis.com> Pierre, I see your point, but my experiment shows that it does not even run due to size mismatch, so I don?t see how being sparse would change things here. There must be some kind of problem with the parallel ptap(), because it does run sequentially. In order to test that, I changed the flags of the matrix creation to sparse=True and ran it again. Here is the code """Experimenting with PETSc mat-mat multiplication""" import numpy as np from firedrake import COMM_WORLD from firedrake.petsc import PETSc from utilities import Print nproc = COMM_WORLD.size rank = COMM_WORLD.rank def create_petsc_matrix(input_array, sparse=True): """Create a PETSc matrix from an input_array Args: input_array (np array): Input array partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. sparse (bool, optional): Toggle for sparese or dense. Defaults to True. Returns: PETSc mat: PETSc mpi matrix """ # Check if input_array is 1D and reshape if necessary assert len(input_array.shape) == 2, "Input array should be 2-dimensional" global_rows, global_cols = input_array.shape size = ((None, global_rows), (global_cols, global_cols)) # Create a sparse or dense matrix based on the 'sparse' argument if sparse: matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) else: matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) matrix.setUp() local_rows_start, local_rows_end = matrix.getOwnershipRange() for counter, i in enumerate(range(local_rows_start, local_rows_end)): # Calculate the correct row in the array for the current process row_in_array = counter + local_rows_start matrix.setValues( i, range(global_cols), input_array[row_in_array, :], addv=False ) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix # -------------------------------------------- # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi # A' = Phi.T * A * Phi # [k x k] <- [k x m] x [m x m] x [m x k] # -------------------------------------------- m, k = 100, 7 # Generate the random numpy matrices np.random.seed(0) # sets the seed to 0 A_np = np.random.randint(low=0, high=6, size=(m, m)) Phi_np = np.random.randint(low=0, high=6, size=(m, k)) # -------------------------------------------- # TEST: Galerking projection of numpy matrices A_np and Phi_np # -------------------------------------------- Aprime_np = Phi_np.T @ A_np @ Phi_np # Create A as an mpi matrix distributed on each process A = create_petsc_matrix(A_np, sparse=True) # Create Phi as an mpi matrix distributed on each process Phi = create_petsc_matrix(Phi_np, sparse=True) # Create an empty PETSc matrix object to store the result of the PtAP operation. # This will hold the result A' = Phi.T * A * Phi after the computation. 
A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=True) # Perform the PtAP (Phi Transpose times A times Phi) operation. # In mathematical terms, this operation is A' = Phi.T * A * Phi. # A_prime will store the result of the operation. A_prime = A.ptap(Phi) I got Traceback (most recent call last): File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in Traceback (most recent call last): File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in Traceback (most recent call last): File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in A_prime = A.ptap(Phi) A_prime = A.ptap(Phi) ^^^^^^^^^^^ File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap A_prime = A.ptap(Phi) ^^^^^^^^^^^ ^^^^^^^^^^^ File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap petsc4py.PETSc.Error: error code 60 [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 [0] Nonconforming object sizes [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 petsc4py.PETSc.Error: error code 60 [1] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 [1] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [1] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 [1] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 [1] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 [1] Nonconforming object sizes [1] Matrix local dimensions are incompatible, Acol (100, 200) != Prow (34,67) Abort(1) on node 1 (rank 1 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 1 petsc4py.PETSc.Error: error code 60 [2] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 [2] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [2] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 [2] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 [2] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 [2] Nonconforming object sizes [2] Matrix local dimensions are incompatible, Acol (200, 300) != Prow (67,100) Abort(1) on node 2 (rank 2 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 2 > On 11 Oct 2023, at 07:18, Pierre Jolivet wrote: > > I disagree with what Mark and Matt are saying: your code is fine, the error message is fine, petsc4py is fine (in this instance). 
> It?s not a typical use case of MatPtAP(), which is mostly designed for MatAIJ, not MatDense. > On the one hand, in the MatDense case, indeed there will be a mismatch between the number of columns of A and the number of rows of P, as written in the error message. > On the other hand, there is not much to optimize when computing C = P? A P with everything being dense. > I would just write this as B = A P and then C = P? B (but then you may face the same issue as initially reported, please let us know then). > > Thanks, > Pierre > >> On 11 Oct 2023, at 2:42?AM, Mark Adams wrote: >> >> This looks like a false positive or there is some subtle bug here that we are not seeing. >> Could this be the first time parallel PtAP has been used (and reported) in petsc4py? >> >> Mark >> >> On Tue, Oct 10, 2023 at 8:27?PM Matthew Knepley > wrote: >>> On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis > wrote: >>>> Hi all, >>>> >>>> Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). >>> >>> Are you positive about this? P^T A P is designed to run in this scenario, so either we have a bug or the diagnosis is wrong. >>> >>> Thanks, >>> >>> Matt >>> >>>> """Experimenting with PETSc mat-mat multiplication""" >>>> >>>> import time >>>> >>>> import numpy as np >>>> from colorama import Fore >>>> from firedrake import COMM_SELF, COMM_WORLD >>>> from firedrake.petsc import PETSc >>>> from mpi4py import MPI >>>> from numpy.testing import assert_array_almost_equal >>>> >>>> from utilities import Print >>>> >>>> nproc = COMM_WORLD.size >>>> rank = COMM_WORLD.rank >>>> >>>> def create_petsc_matrix(input_array, sparse=True): >>>> """Create a PETSc matrix from an input_array >>>> >>>> Args: >>>> input_array (np array): Input array >>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>> >>>> Returns: >>>> PETSc mat: PETSc mpi matrix >>>> """ >>>> # Check if input_array is 1D and reshape if necessary >>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>> global_rows, global_cols = input_array.shape >>>> size = ((None, global_rows), (global_cols, global_cols)) >>>> >>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>> if sparse: >>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>> else: >>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>> matrix.setUp() >>>> >>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>> >>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>> # Calculate the correct row in the array for the current process >>>> row_in_array = counter + local_rows_start >>>> matrix.setValues( >>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>> ) >>>> >>>> # Assembly the matrix to compute the final structure >>>> matrix.assemblyBegin() >>>> matrix.assemblyEnd() >>>> >>>> return matrix >>>> >>>> # -------------------------------------------- >>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>> # A' = Phi.T * A * Phi >>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>> # -------------------------------------------- >>>> >>>> m, k = 100, 7 >>>> # Generate the random numpy matrices >>>> np.random.seed(0) # sets the seed to 0 >>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>> >>>> # -------------------------------------------- >>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>> # -------------------------------------------- >>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>> Print(f"{Aprime_np}") >>>> >>>> # Create A as an mpi matrix distributed on each process >>>> A = create_petsc_matrix(A_np, sparse=False) >>>> >>>> # Create Phi as an mpi matrix distributed on each process >>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>> >>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>> >>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>> # A_prime will store the result of the operation. 
>>>> A_prime = A.ptap(Phi) >>>> >>>> Here is the error >>>> >>>> MATRIX mpiaij A [100x100] >>>> Assembled >>>> >>>> Partitioning for A: >>>> Rank 0: Rows [0, 34) >>>> Rank 1: Rows [34, 67) >>>> Rank 2: Rows [67, 100) >>>> >>>> MATRIX mpiaij Phi [100x7] >>>> Assembled >>>> >>>> Partitioning for Phi: >>>> Rank 0: Rows [0, 34) >>>> Rank 1: Rows [34, 67) >>>> Rank 2: Rows [67, 100) >>>> >>>> Traceback (most recent call last): >>>> File "/Users/boutsitron/work/galerkin_projection.py", line 87, in >>>> A_prime = A.ptap(Phi) >>>> ^^^^^^^^^^^ >>>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>>> petsc4py.PETSc.Error: error code 60 >>>> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>>> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>>> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>>> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>>> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>>> [0] Nonconforming object sizes >>>> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >>>> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >>>> >>>> Any thoughts? >>>> >>>> Thanks, >>>> Thanos >>>> >>>>> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis > wrote: >>>>> >>>>> This works Pierre. Amazing input, thanks a lot! >>>>> >>>>>> On 5 Oct 2023, at 14:17, Pierre Jolivet > wrote: >>>>>> >>>>>> Not a petsc4py expert here, but you may to try instead: >>>>>> A_prime = A.ptap(Phi) >>>>>> >>>>>> Thanks, >>>>>> Pierre >>>>>> >>>>>>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis > wrote: >>>>>>> >>>>>>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>>>>>> >>>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>>>>>>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>>>> >>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>> >>>>>>> import time >>>>>>> >>>>>>> import numpy as np >>>>>>> from colorama import Fore >>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>> from firedrake.petsc import PETSc >>>>>>> from mpi4py import MPI >>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>> >>>>>>> from utilities import ( >>>>>>> Print, >>>>>>> create_petsc_matrix, >>>>>>> print_matrix_partitioning, >>>>>>> ) >>>>>>> >>>>>>> nproc = COMM_WORLD.size >>>>>>> rank = COMM_WORLD.rank >>>>>>> >>>>>>> # -------------------------------------------- >>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>> # A' = Phi.T * A * Phi >>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>> # -------------------------------------------- >>>>>>> >>>>>>> m, k = 11, 7 >>>>>>> # Generate the random numpy matrices >>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>> >>>>>>> # -------------------------------------------- >>>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>>> # -------------------------------------------- >>>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>>> Print(f"{Aprime_np}") >>>>>>> >>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>>> >>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>>> >>>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>>> >>>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>>> # A_prime will store the result of the operation. >>>>>>> Phi.PtAP(A, A_prime) >>>>>>> >>>>>>>> On 5 Oct 2023, at 13:22, Pierre Jolivet > wrote: >>>>>>>> >>>>>>>> How about using ptap which will use MatPtAP? >>>>>>>> It will be more efficient (and it will help you bypass the issue). >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Pierre >>>>>>>> >>>>>>>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis > wrote: >>>>>>>>> >>>>>>>>> Sorry, forgot function create_petsc_matrix() >>>>>>>>> >>>>>>>>> def create_petsc_matrix(input_array sparse=True): >>>>>>>>> """Create a PETSc matrix from an input_array >>>>>>>>> >>>>>>>>> Args: >>>>>>>>> input_array (np array): Input array >>>>>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>>>>> >>>>>>>>> Returns: >>>>>>>>> PETSc mat: PETSc matrix >>>>>>>>> """ >>>>>>>>> # Check if input_array is 1D and reshape if necessary >>>>>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>>>>> global_rows, global_cols = input_array.shape >>>>>>>>> >>>>>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>>>>> >>>>>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>>>>> if sparse: >>>>>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>>>>> else: >>>>>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>>>>> matrix.setUp() >>>>>>>>> >>>>>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>>>>> >>>>>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>>>>> # Calculate the correct row in the array for the current process >>>>>>>>> row_in_array = counter + local_rows_start >>>>>>>>> matrix.setValues( >>>>>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>>>>> ) >>>>>>>>> >>>>>>>>> # Assembly the matrix to compute the final structure >>>>>>>>> matrix.assemblyBegin() >>>>>>>>> matrix.assemblyEnd() >>>>>>>>> >>>>>>>>> return matrix >>>>>>>>> >>>>>>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis > wrote: >>>>>>>>>> >>>>>>>>>> Hi everyone, >>>>>>>>>> >>>>>>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>>>>>>> >>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>>>>>> petsc4py.PETSc.Error: error code 56 >>>>>>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>>>>>> [0] No support for this operation for this object type >>>>>>>>>> [0] Call MatProductCreate() first >>>>>>>>>> >>>>>>>>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>>>>>> >>>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>>> >>>>>>>>>> import time >>>>>>>>>> >>>>>>>>>> import numpy as np >>>>>>>>>> from colorama import Fore >>>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>>> from mpi4py import MPI >>>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>>> >>>>>>>>>> from utilities import ( >>>>>>>>>> Print, >>>>>>>>>> create_petsc_matrix, >>>>>>>>>> ) >>>>>>>>>> >>>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>>> >>>>>>>>>> # -------------------------------------------- >>>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>>> # -------------------------------------------- >>>>>>>>>> >>>>>>>>>> m, k = 11, 7 >>>>>>>>>> # Generate the random numpy matrices >>>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>>> >>>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>>> A = create_petsc_matrix(A_np) >>>>>>>>>> >>>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>>>>>> >>>>>>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>>>>>> >>>>>>>>>> # Now A1 contains the result of Phi^T * A >>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... 
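The "Nonconforming object sizes" report quoted above ("Acol (0, 100) != Prow (0,34)") is PETSc comparing the column ownership range of A against the row ownership range of Phi on each rank. Before calling a product it can help to dump both layouts; the following is a minimal sketch, assuming A and Phi are the matrices built by the quoted create_petsc_matrix helper, and uses only standard petsc4py introspection calls:

# Minimal layout dump, assuming A and Phi come from the quoted script.
# For A.ptap(Phi) to be well defined, A's column ownership range must
# line up with Phi's row ownership range on every rank -- these are the
# "Acol" and "Prow" pairs printed in the error above.
from petsc4py import PETSc


def dump_layout(mat, name):
    M, N = mat.getSize()                  # global rows, columns
    mloc, nloc = mat.getLocalSize()       # local rows, columns
    rows = mat.getOwnershipRange()        # owned row range [start, end)
    cols = mat.getOwnershipRangeColumn()  # owned column range [start, end)
    PETSc.Sys.syncPrint(
        f"[{PETSc.COMM_WORLD.rank}] {name}: global {M}x{N}, "
        f"local {mloc}x{nloc}, rows {rows}, cols {cols}"
    )
    PETSc.Sys.syncFlush()


dump_layout(A, "A")
dump_layout(Phi, "Phi")

With the size tuple used in the quoted helper, A's column ranges come out stacked rank by rank instead of matching Phi's 34/33/33 row split, which is what the error message is pointing at.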
URL: From thanasis.boutsikakis at corintis.com Wed Oct 11 02:04:29 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Wed, 11 Oct 2023 09:04:29 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> Message-ID: Furthermore, I tried to perform the Galerkin projection in two steps by substituting > A_prime = A.ptap(Phi) With AL = Phi.transposeMatMult(A) A_prime = AL.matMult(Phi) And running this with 3 procs, results to the false creation of a matrix AL that has 3 times bigger dimensions that it should (A is of size 100x100 and Phi of size 100x7): MATRIX mpiaij AL [21x300] Assembled Partitioning for AL: Rank 0: Rows [0, 7) Rank 1: Rows [7, 14) Rank 2: Rows [14, 21) And naturally, in another dimension incompatibility: Traceback (most recent call last): File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 85, in Traceback (most recent call last): File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 85, in A_prime = AL.matMult(Phi) A_prime = AL.matMult(Phi) ^^^^^^^^^^^^^^^ File "petsc4py/PETSc/Mat.pyx", line 1492, in petsc4py.PETSc.Mat.matMult ^^^^^^^^^^^^^^^ File "petsc4py/PETSc/Mat.pyx", line 1492, in petsc4py.PETSc.Mat.matMult Traceback (most recent call last): File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 85, in petsc4py.PETSc.Error: error code 60 [2] MatMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10053 [2] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9976 [2] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [2] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:421 [2] Nonconforming object sizes [2] Matrix dimensions of A and B are incompatible for MatProductType AB: A 21x300, B 100x7 Abort(1) on node 2 (rank 2 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 2 petsc4py.PETSc.Error: error code 60 [1] MatMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10053 [1] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9976 [1] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [1] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:421 [1] Nonconforming object sizes [1] Matrix dimensions of A and B are incompatible for MatProductType AB: A 21x300, B 100x7 Abort(1) on node 1 (rank 1 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 1 A_prime = AL.matMult(Phi) ^^^^^^^^^^^^^^^ File "petsc4py/PETSc/Mat.pyx", line 1492, in petsc4py.PETSc.Mat.matMult petsc4py.PETSc.Error: error code 60 [0] MatMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10053 [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9976 [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:421 [0] Nonconforming object sizes [0] Matrix dimensions of A and B are incompatible for MatProductType AB: A 21x300, B 100x7 
Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 > On 10 Oct 2023, at 23:33, Thanasis Boutsikakis wrote: > > Hi all, > > Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). > > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import Print > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > def create_petsc_matrix(input_array, sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. > > Returns: > PETSc mat: PETSc mpi matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 100, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # -------------------------------------------- > # TEST: Galerking projection of numpy matrices A_np and Phi_np > # -------------------------------------------- > Aprime_np = Phi_np.T @ A_np @ Phi_np > Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") > Print(f"{Aprime_np}") > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np, sparse=False) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np, sparse=False) > > # Create an empty PETSc matrix object to store the result of the PtAP operation. > # This will hold the result A' = Phi.T * A * Phi after the computation. > A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) > > # Perform the PtAP (Phi Transpose times A times Phi) operation. > # In mathematical terms, this operation is A' = Phi.T * A * Phi. > # A_prime will store the result of the operation. 
> A_prime = A.ptap(Phi) > > Here is the error > > MATRIX mpiaij A [100x100] > Assembled > > Partitioning for A: > Rank 0: Rows [0, 34) > Rank 1: Rows [34, 67) > Rank 2: Rows [67, 100) > > MATRIX mpiaij Phi [100x7] > Assembled > > Partitioning for Phi: > Rank 0: Rows [0, 34) > Rank 1: Rows [34, 67) > Rank 2: Rows [67, 100) > > Traceback (most recent call last): > File "/Users/boutsitron/work/galerkin_projection.py", line 87, in > A_prime = A.ptap(Phi) > ^^^^^^^^^^^ > File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap > petsc4py.PETSc.Error: error code 60 > [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 > [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 > [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 > [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 > [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 > [0] Nonconforming object sizes > [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) > Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 > > Any thoughts? > > Thanks, > Thanos > >> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis wrote: >> >> This works Pierre. Amazing input, thanks a lot! >> >>> On 5 Oct 2023, at 14:17, Pierre Jolivet wrote: >>> >>> Not a petsc4py expert here, but you may to try instead: >>> A_prime = A.ptap(Phi) >>> >>> Thanks, >>> Pierre >>> >>>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis wrote: >>>> >>>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>>> >>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>> [0]PETSC ERROR: to get more information on the crash. >>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>>>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>> >>>> """Experimenting with PETSc mat-mat multiplication""" >>>> >>>> import time >>>> >>>> import numpy as np >>>> from colorama import Fore >>>> from firedrake import COMM_SELF, COMM_WORLD >>>> from firedrake.petsc import PETSc >>>> from mpi4py import MPI >>>> from numpy.testing import assert_array_almost_equal >>>> >>>> from utilities import ( >>>> Print, >>>> create_petsc_matrix, >>>> print_matrix_partitioning, >>>> ) >>>> >>>> nproc = COMM_WORLD.size >>>> rank = COMM_WORLD.rank >>>> >>>> # -------------------------------------------- >>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>> # A' = Phi.T * A * Phi >>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>> # -------------------------------------------- >>>> >>>> m, k = 11, 7 >>>> # Generate the random numpy matrices >>>> np.random.seed(0) # sets the seed to 0 >>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>> >>>> # -------------------------------------------- >>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>> # -------------------------------------------- >>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>> Print(f"{Aprime_np}") >>>> >>>> # Create A as an mpi matrix distributed on each process >>>> A = create_petsc_matrix(A_np, sparse=False) >>>> >>>> # Create Phi as an mpi matrix distributed on each process >>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>> >>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>> >>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>> # A_prime will store the result of the operation. >>>> Phi.PtAP(A, A_prime) >>>> >>>>> On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: >>>>> >>>>> How about using ptap which will use MatPtAP? >>>>> It will be more efficient (and it will help you bypass the issue). >>>>> >>>>> Thanks, >>>>> Pierre >>>>> >>>>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis wrote: >>>>>> >>>>>> Sorry, forgot function create_petsc_matrix() >>>>>> >>>>>> def create_petsc_matrix(input_array sparse=True): >>>>>> """Create a PETSc matrix from an input_array >>>>>> >>>>>> Args: >>>>>> input_array (np array): Input array >>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>> >>>>>> Returns: >>>>>> PETSc mat: PETSc matrix >>>>>> """ >>>>>> # Check if input_array is 1D and reshape if necessary >>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>> global_rows, global_cols = input_array.shape >>>>>> >>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>> >>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>> if sparse: >>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>> else: >>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>> matrix.setUp() >>>>>> >>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>> >>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>> # Calculate the correct row in the array for the current process >>>>>> row_in_array = counter + local_rows_start >>>>>> matrix.setValues( >>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>> ) >>>>>> >>>>>> # Assembly the matrix to compute the final structure >>>>>> matrix.assemblyBegin() >>>>>> matrix.assemblyEnd() >>>>>> >>>>>> return matrix >>>>>> >>>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: >>>>>>> >>>>>>> Hi everyone, >>>>>>> >>>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>>>> >>>>>>> Phi.transposeMatMult(A, A1) >>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>>> petsc4py.PETSc.Error: error code 56 >>>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>>> [0] No support for this operation for this object type >>>>>>> [0] Call MatProductCreate() first >>>>>>> >>>>>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>>> >>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>> >>>>>>> import time >>>>>>> >>>>>>> import numpy as np >>>>>>> from colorama import Fore >>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>> from firedrake.petsc import PETSc >>>>>>> from mpi4py import MPI >>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>> >>>>>>> from utilities import ( >>>>>>> Print, >>>>>>> create_petsc_matrix, >>>>>>> ) >>>>>>> >>>>>>> nproc = COMM_WORLD.size >>>>>>> rank = COMM_WORLD.rank >>>>>>> >>>>>>> # -------------------------------------------- >>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>> # A' = Phi.T * A * Phi >>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>> # -------------------------------------------- >>>>>>> >>>>>>> m, k = 11, 7 >>>>>>> # Generate the random numpy matrices >>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>> >>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>> A = create_petsc_matrix(A_np) >>>>>>> >>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>>> >>>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>>> >>>>>>> # Now A1 contains the result of Phi^T * A >>>>>>> Phi.transposeMatMult(A, A1) >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Wed Oct 11 02:04:51 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Wed, 11 Oct 2023 09:04:51 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: <327E3AAA-1AD0-4051-B977-55420DE24067@corintis.com> References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> <3C8FA7CA-63CB-49F2-8756-535D7FC657C3@joliv.et> <327E3AAA-1AD0-4051-B977-55420DE24067@corintis.com> Message-ID: That?s because: size = ((None, global_rows), (global_cols, global_cols)) should be: size = ((None, global_rows), (None, global_cols)) Then, it will work. $ ~/repo/petsc/arch-darwin-c-debug-real/bin/mpirun -n 4 python3.12 test.py && echo $? 0 Thanks, Pierre > On 11 Oct 2023, at 8:58?AM, Thanasis Boutsikakis wrote: > > Pierre, I see your point, but my experiment shows that it does not even run due to size mismatch, so I don?t see how being sparse would change things here. There must be some kind of problem with the parallel ptap(), because it does run sequentially. In order to test that, I changed the flags of the matrix creation to sparse=True and ran it again. Here is the code > > """Experimenting with PETSc mat-mat multiplication""" > > import numpy as np > from firedrake import COMM_WORLD > from firedrake.petsc import PETSc > > from utilities import Print > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > > def create_petsc_matrix(input_array, sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
> > Returns: > PETSc mat: PETSc mpi matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 100, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # -------------------------------------------- > # TEST: Galerking projection of numpy matrices A_np and Phi_np > # -------------------------------------------- > Aprime_np = Phi_np.T @ A_np @ Phi_np > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np, sparse=True) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np, sparse=True) > > # Create an empty PETSc matrix object to store the result of the PtAP operation. > # This will hold the result A' = Phi.T * A * Phi after the computation. > A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=True) > > # Perform the PtAP (Phi Transpose times A times Phi) operation. > # In mathematical terms, this operation is A' = Phi.T * A * Phi. > # A_prime will store the result of the operation. 
> A_prime = A.ptap(Phi) > > I got > > Traceback (most recent call last): > File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in > Traceback (most recent call last): > File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in > Traceback (most recent call last): > File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in > A_prime = A.ptap(Phi) > A_prime = A.ptap(Phi) > ^^^^^^^^^^^ > File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap > A_prime = A.ptap(Phi) > ^^^^^^^^^^^ > ^^^^^^^^^^^ > File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap > File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap > petsc4py.PETSc.Error: error code 60 > [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 > [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 > [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 > [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 > [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 > [0] Nonconforming object sizes > [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) > Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 > petsc4py.PETSc.Error: error code 60 > [1] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 > [1] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 > [1] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 > [1] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 > [1] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 > [1] Nonconforming object sizes > [1] Matrix local dimensions are incompatible, Acol (100, 200) != Prow (34,67) > Abort(1) on node 1 (rank 1 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 1 > petsc4py.PETSc.Error: error code 60 > [2] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 > [2] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 > [2] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 > [2] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 > [2] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 > [2] Nonconforming object sizes > [2] Matrix local dimensions are incompatible, Acol (200, 300) != Prow (67,100) > Abort(1) on node 2 (rank 2 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 2 > >> On 11 Oct 2023, at 07:18, Pierre Jolivet wrote: >> >> I disagree with what Mark and Matt are saying: your code is fine, the error message is fine, petsc4py is fine (in this instance). >> It?s not a typical use case of MatPtAP(), which is mostly designed for MatAIJ, not MatDense. 
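When the basis Phi really is stored as a dense tall-skinny matrix, one low-risk variant of the projection -- a hedged sketch, not the solution adopted later in the thread -- is to keep A as MPIAIJ, form the tall product B = A * Phi with matMult (sparse-times-dense is a well-supported combination), and reduce the small k x k block by hand with a local GEMM plus an allreduce. The names A, Phi, m and k mirror the quoted script; treating Phi as MPIDENSE and having mpi4py available are assumptions, and the k x k result comes back as a replicated numpy array rather than a PETSc Mat:

# Hedged sketch: projection with a dense basis Phi (MPIDENSE, m x k) and a
# sparse operator A (MPIAIJ, m x m), both created with
# size=((None, rows), (None, cols)) so that PETSc owns the layouts.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# B = A * Phi: sparse-times-dense product, result is MPIDENSE of size m x k.
B = A.matMult(Phi)

# A' = Phi^T * B: the result is only k x k, so reduce it by hand --
# multiply the locally owned row blocks and sum the k x k contributions.
Phi_local = Phi.getDenseArray()        # local rows of Phi, shape (m_loc, k)
B_local = B.getDenseArray()            # local rows of B,   shape (m_loc, k)
A_prime_local = Phi_local.T @ B_local  # this rank's k x k contribution
A_prime = np.empty_like(A_prime_local)
comm.Allreduce(A_prime_local, A_prime, op=MPI.SUM)  # replicated k x k array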
>> On the one hand, in the MatDense case, indeed there will be a mismatch between the number of columns of A and the number of rows of P, as written in the error message. >> On the other hand, there is not much to optimize when computing C = P? A P with everything being dense. >> I would just write this as B = A P and then C = P? B (but then you may face the same issue as initially reported, please let us know then). >> >> Thanks, >> Pierre >> >>> On 11 Oct 2023, at 2:42?AM, Mark Adams wrote: >>> >>> This looks like a false positive or there is some subtle bug here that we are not seeing. >>> Could this be the first time parallel PtAP has been used (and reported) in petsc4py? >>> >>> Mark >>> >>> On Tue, Oct 10, 2023 at 8:27?PM Matthew Knepley > wrote: >>>> On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis > wrote: >>>>> Hi all, >>>>> >>>>> Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). >>>> >>>> Are you positive about this? P^T A P is designed to run in this scenario, so either we have a bug or the diagnosis is wrong. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>> >>>>> import time >>>>> >>>>> import numpy as np >>>>> from colorama import Fore >>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>> from firedrake.petsc import PETSc >>>>> from mpi4py import MPI >>>>> from numpy.testing import assert_array_almost_equal >>>>> >>>>> from utilities import Print >>>>> >>>>> nproc = COMM_WORLD.size >>>>> rank = COMM_WORLD.rank >>>>> >>>>> def create_petsc_matrix(input_array, sparse=True): >>>>> """Create a PETSc matrix from an input_array >>>>> >>>>> Args: >>>>> input_array (np array): Input array >>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>> >>>>> Returns: >>>>> PETSc mat: PETSc mpi matrix >>>>> """ >>>>> # Check if input_array is 1D and reshape if necessary >>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>> global_rows, global_cols = input_array.shape >>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>> >>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>> if sparse: >>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>> else: >>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>> matrix.setUp() >>>>> >>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>> >>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>> # Calculate the correct row in the array for the current process >>>>> row_in_array = counter + local_rows_start >>>>> matrix.setValues( >>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>> ) >>>>> >>>>> # Assembly the matrix to compute the final structure >>>>> matrix.assemblyBegin() >>>>> matrix.assemblyEnd() >>>>> >>>>> return matrix >>>>> >>>>> # -------------------------------------------- >>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>> # A' = Phi.T * A * Phi >>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>> # -------------------------------------------- >>>>> >>>>> m, k = 100, 7 >>>>> # Generate the random numpy matrices >>>>> np.random.seed(0) # sets the seed to 0 >>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>> >>>>> # -------------------------------------------- >>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>> # -------------------------------------------- >>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>> Print(f"{Aprime_np}") >>>>> >>>>> # Create A as an mpi matrix distributed on each process >>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>> >>>>> # Create Phi as an mpi matrix distributed on each process >>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>> >>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>> >>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>> # A_prime will store the result of the operation. 
>>>>> A_prime = A.ptap(Phi) >>>>> >>>>> Here is the error >>>>> >>>>> MATRIX mpiaij A [100x100] >>>>> Assembled >>>>> >>>>> Partitioning for A: >>>>> Rank 0: Rows [0, 34) >>>>> Rank 1: Rows [34, 67) >>>>> Rank 2: Rows [67, 100) >>>>> >>>>> MATRIX mpiaij Phi [100x7] >>>>> Assembled >>>>> >>>>> Partitioning for Phi: >>>>> Rank 0: Rows [0, 34) >>>>> Rank 1: Rows [34, 67) >>>>> Rank 2: Rows [67, 100) >>>>> >>>>> Traceback (most recent call last): >>>>> File "/Users/boutsitron/work/galerkin_projection.py", line 87, in >>>>> A_prime = A.ptap(Phi) >>>>> ^^^^^^^^^^^ >>>>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>>>> petsc4py.PETSc.Error: error code 60 >>>>> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>>>> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>>>> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>>>> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>>>> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>>>> [0] Nonconforming object sizes >>>>> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >>>>> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >>>>> >>>>> Any thoughts? >>>>> >>>>> Thanks, >>>>> Thanos >>>>> >>>>>> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis > wrote: >>>>>> >>>>>> This works Pierre. Amazing input, thanks a lot! >>>>>> >>>>>>> On 5 Oct 2023, at 14:17, Pierre Jolivet > wrote: >>>>>>> >>>>>>> Not a petsc4py expert here, but you may to try instead: >>>>>>> A_prime = A.ptap(Phi) >>>>>>> >>>>>>> Thanks, >>>>>>> Pierre >>>>>>> >>>>>>>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis > wrote: >>>>>>>> >>>>>>>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>>>>>>> >>>>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>>>>>>>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>>>>> >>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>> >>>>>>>> import time >>>>>>>> >>>>>>>> import numpy as np >>>>>>>> from colorama import Fore >>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>> from firedrake.petsc import PETSc >>>>>>>> from mpi4py import MPI >>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>> >>>>>>>> from utilities import ( >>>>>>>> Print, >>>>>>>> create_petsc_matrix, >>>>>>>> print_matrix_partitioning, >>>>>>>> ) >>>>>>>> >>>>>>>> nproc = COMM_WORLD.size >>>>>>>> rank = COMM_WORLD.rank >>>>>>>> >>>>>>>> # -------------------------------------------- >>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>> # A' = Phi.T * A * Phi >>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>> # -------------------------------------------- >>>>>>>> >>>>>>>> m, k = 11, 7 >>>>>>>> # Generate the random numpy matrices >>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>> >>>>>>>> # -------------------------------------------- >>>>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>>>> # -------------------------------------------- >>>>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>>>> Print(f"{Aprime_np}") >>>>>>>> >>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>>>> >>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>>>> >>>>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>>>> >>>>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>>>> # A_prime will store the result of the operation. >>>>>>>> Phi.PtAP(A, A_prime) >>>>>>>> >>>>>>>>> On 5 Oct 2023, at 13:22, Pierre Jolivet > wrote: >>>>>>>>> >>>>>>>>> How about using ptap which will use MatPtAP? >>>>>>>>> It will be more efficient (and it will help you bypass the issue). >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Pierre >>>>>>>>> >>>>>>>>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis > wrote: >>>>>>>>>> >>>>>>>>>> Sorry, forgot function create_petsc_matrix() >>>>>>>>>> >>>>>>>>>> def create_petsc_matrix(input_array sparse=True): >>>>>>>>>> """Create a PETSc matrix from an input_array >>>>>>>>>> >>>>>>>>>> Args: >>>>>>>>>> input_array (np array): Input array >>>>>>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>>>>>> >>>>>>>>>> Returns: >>>>>>>>>> PETSc mat: PETSc matrix >>>>>>>>>> """ >>>>>>>>>> # Check if input_array is 1D and reshape if necessary >>>>>>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>>>>>> global_rows, global_cols = input_array.shape >>>>>>>>>> >>>>>>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>>>>>> >>>>>>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>>>>>> if sparse: >>>>>>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>>>>>> else: >>>>>>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>>>>>> matrix.setUp() >>>>>>>>>> >>>>>>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>>>>>> >>>>>>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>>>>>> # Calculate the correct row in the array for the current process >>>>>>>>>> row_in_array = counter + local_rows_start >>>>>>>>>> matrix.setValues( >>>>>>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>>>>>> ) >>>>>>>>>> >>>>>>>>>> # Assembly the matrix to compute the final structure >>>>>>>>>> matrix.assemblyBegin() >>>>>>>>>> matrix.assemblyEnd() >>>>>>>>>> >>>>>>>>>> return matrix >>>>>>>>>> >>>>>>>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis > wrote: >>>>>>>>>>> >>>>>>>>>>> Hi everyone, >>>>>>>>>>> >>>>>>>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>>>>>>>> >>>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>>>>>>> petsc4py.PETSc.Error: error code 56 >>>>>>>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>>>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>>>>>>> [0] No support for this operation for this object type >>>>>>>>>>> [0] Call MatProductCreate() first >>>>>>>>>>> >>>>>>>>>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>>>>>>> >>>>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>>>> >>>>>>>>>>> import time >>>>>>>>>>> >>>>>>>>>>> import numpy as np >>>>>>>>>>> from colorama import Fore >>>>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>>>> from mpi4py import MPI >>>>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>>>> >>>>>>>>>>> from utilities import ( >>>>>>>>>>> Print, >>>>>>>>>>> create_petsc_matrix, >>>>>>>>>>> ) >>>>>>>>>>> >>>>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>>>> >>>>>>>>>>> # -------------------------------------------- >>>>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>>>> # -------------------------------------------- >>>>>>>>>>> >>>>>>>>>>> m, k = 11, 7 >>>>>>>>>>> # Generate the random numpy matrices >>>>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>>>> >>>>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>>>> A = create_petsc_matrix(A_np) >>>>>>>>>>> >>>>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>>>>>>> >>>>>>>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>>>>>>> >>>>>>>>>>> # Now A1 contains the result of Phi^T * A >>>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Wed Oct 11 02:13:28 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Wed, 11 Oct 2023 09:13:28 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> <3C8FA7CA-63CB-49F2-8756-535D7FC657C3@joliv.et> <327E3AAA-1AD0-4051-B977-55420DE24067@corintis.com> Message-ID: <74C597F2-65FA-4CCF-9611-C1C196E4C4C0@corintis.com> Very good catch Pierre, thanks a lot! This made everything work: the two-step process and the ptap(). I mistakenly thought that I should not let the local number of columns to be None, since the matrix is only partitioned row-wise. Could you please explain what happened because of my setting the local column number so that I get the philosophy behind this partitioning? Thanks again, Thanos > On 11 Oct 2023, at 09:04, Pierre Jolivet wrote: > > That?s because: > size = ((None, global_rows), (global_cols, global_cols)) > should be: > size = ((None, global_rows), (None, global_cols)) > Then, it will work. > $ ~/repo/petsc/arch-darwin-c-debug-real/bin/mpirun -n 4 python3.12 test.py && echo $? 
> 0 > > Thanks, > Pierre > >> On 11 Oct 2023, at 8:58?AM, Thanasis Boutsikakis wrote: >> >> Pierre, I see your point, but my experiment shows that it does not even run due to size mismatch, so I don?t see how being sparse would change things here. There must be some kind of problem with the parallel ptap(), because it does run sequentially. In order to test that, I changed the flags of the matrix creation to sparse=True and ran it again. Here is the code >> >> """Experimenting with PETSc mat-mat multiplication""" >> >> import numpy as np >> from firedrake import COMM_WORLD >> from firedrake.petsc import PETSc >> >> from utilities import Print >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> >> def create_petsc_matrix(input_array, sparse=True): >> """Create a PETSc matrix from an input_array >> >> Args: >> input_array (np array): Input array >> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >> >> Returns: >> PETSc mat: PETSc mpi matrix >> """ >> # Check if input_array is 1D and reshape if necessary >> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >> global_rows, global_cols = input_array.shape >> size = ((None, global_rows), (global_cols, global_cols)) >> >> # Create a sparse or dense matrix based on the 'sparse' argument >> if sparse: >> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >> else: >> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >> matrix.setUp() >> >> local_rows_start, local_rows_end = matrix.getOwnershipRange() >> >> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >> # Calculate the correct row in the array for the current process >> row_in_array = counter + local_rows_start >> matrix.setValues( >> i, range(global_cols), input_array[row_in_array, :], addv=False >> ) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >> >> # -------------------------------------------- >> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >> # A' = Phi.T * A * Phi >> # [k x k] <- [k x m] x [m x m] x [m x k] >> # -------------------------------------------- >> >> m, k = 100, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, m)) >> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> # -------------------------------------------- >> # TEST: Galerking projection of numpy matrices A_np and Phi_np >> # -------------------------------------------- >> Aprime_np = Phi_np.T @ A_np @ Phi_np >> >> # Create A as an mpi matrix distributed on each process >> A = create_petsc_matrix(A_np, sparse=True) >> >> # Create Phi as an mpi matrix distributed on each process >> Phi = create_petsc_matrix(Phi_np, sparse=True) >> >> # Create an empty PETSc matrix object to store the result of the PtAP operation. >> # This will hold the result A' = Phi.T * A * Phi after the computation. >> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=True) >> >> # Perform the PtAP (Phi Transpose times A times Phi) operation. >> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >> # A_prime will store the result of the operation. 
>> A_prime = A.ptap(Phi) >> >> I got >> >> Traceback (most recent call last): >> File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in >> Traceback (most recent call last): >> File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in >> Traceback (most recent call last): >> File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in >> A_prime = A.ptap(Phi) >> A_prime = A.ptap(Phi) >> ^^^^^^^^^^^ >> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >> A_prime = A.ptap(Phi) >> ^^^^^^^^^^^ >> ^^^^^^^^^^^ >> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >> petsc4py.PETSc.Error: error code 60 >> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >> [0] Nonconforming object sizes >> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >> petsc4py.PETSc.Error: error code 60 >> [1] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >> [1] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >> [1] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >> [1] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >> [1] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >> [1] Nonconforming object sizes >> [1] Matrix local dimensions are incompatible, Acol (100, 200) != Prow (34,67) >> Abort(1) on node 1 (rank 1 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 1 >> petsc4py.PETSc.Error: error code 60 >> [2] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >> [2] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >> [2] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >> [2] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >> [2] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >> [2] Nonconforming object sizes >> [2] Matrix local dimensions are incompatible, Acol (200, 300) != Prow (67,100) >> Abort(1) on node 2 (rank 2 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 2 >> >>> On 11 Oct 2023, at 07:18, Pierre Jolivet wrote: >>> >>> I disagree with what Mark and Matt are saying: your code is fine, the error message is fine, petsc4py is fine (in this instance). >>> It?s not a typical use case of MatPtAP(), which is mostly designed for MatAIJ, not MatDense. 
>>> On the one hand, in the MatDense case, indeed there will be a mismatch between the number of columns of A and the number of rows of P, as written in the error message. >>> On the other hand, there is not much to optimize when computing C = P? A P with everything being dense. >>> I would just write this as B = A P and then C = P? B (but then you may face the same issue as initially reported, please let us know then). >>> >>> Thanks, >>> Pierre >>> >>>> On 11 Oct 2023, at 2:42?AM, Mark Adams wrote: >>>> >>>> This looks like a false positive or there is some subtle bug here that we are not seeing. >>>> Could this be the first time parallel PtAP has been used (and reported) in petsc4py? >>>> >>>> Mark >>>> >>>> On Tue, Oct 10, 2023 at 8:27?PM Matthew Knepley > wrote: >>>>> On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis > wrote: >>>>>> Hi all, >>>>>> >>>>>> Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). >>>>> >>>>> Are you positive about this? P^T A P is designed to run in this scenario, so either we have a bug or the diagnosis is wrong. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>> >>>>>> import time >>>>>> >>>>>> import numpy as np >>>>>> from colorama import Fore >>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>> from firedrake.petsc import PETSc >>>>>> from mpi4py import MPI >>>>>> from numpy.testing import assert_array_almost_equal >>>>>> >>>>>> from utilities import Print >>>>>> >>>>>> nproc = COMM_WORLD.size >>>>>> rank = COMM_WORLD.rank >>>>>> >>>>>> def create_petsc_matrix(input_array, sparse=True): >>>>>> """Create a PETSc matrix from an input_array >>>>>> >>>>>> Args: >>>>>> input_array (np array): Input array >>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>> >>>>>> Returns: >>>>>> PETSc mat: PETSc mpi matrix >>>>>> """ >>>>>> # Check if input_array is 1D and reshape if necessary >>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>> global_rows, global_cols = input_array.shape >>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>> >>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>> if sparse: >>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>> else: >>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>> matrix.setUp() >>>>>> >>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>> >>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>> # Calculate the correct row in the array for the current process >>>>>> row_in_array = counter + local_rows_start >>>>>> matrix.setValues( >>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>> ) >>>>>> >>>>>> # Assembly the matrix to compute the final structure >>>>>> matrix.assemblyBegin() >>>>>> matrix.assemblyEnd() >>>>>> >>>>>> return matrix >>>>>> >>>>>> # -------------------------------------------- >>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>> # A' = Phi.T * A * Phi >>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>> # -------------------------------------------- >>>>>> >>>>>> m, k = 100, 7 >>>>>> # Generate the random numpy matrices >>>>>> np.random.seed(0) # sets the seed to 0 >>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>> >>>>>> # -------------------------------------------- >>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>> # -------------------------------------------- >>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>> Print(f"{Aprime_np}") >>>>>> >>>>>> # Create A as an mpi matrix distributed on each process >>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>> >>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>> >>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>> >>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>> # A_prime will store the result of the operation. 
>>>>>> A_prime = A.ptap(Phi) >>>>>> >>>>>> Here is the error >>>>>> >>>>>> MATRIX mpiaij A [100x100] >>>>>> Assembled >>>>>> >>>>>> Partitioning for A: >>>>>> Rank 0: Rows [0, 34) >>>>>> Rank 1: Rows [34, 67) >>>>>> Rank 2: Rows [67, 100) >>>>>> >>>>>> MATRIX mpiaij Phi [100x7] >>>>>> Assembled >>>>>> >>>>>> Partitioning for Phi: >>>>>> Rank 0: Rows [0, 34) >>>>>> Rank 1: Rows [34, 67) >>>>>> Rank 2: Rows [67, 100) >>>>>> >>>>>> Traceback (most recent call last): >>>>>> File "/Users/boutsitron/work/galerkin_projection.py", line 87, in >>>>>> A_prime = A.ptap(Phi) >>>>>> ^^^^^^^^^^^ >>>>>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>>>>> petsc4py.PETSc.Error: error code 60 >>>>>> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>>>>> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>>>>> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>>>>> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>>>>> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>>>>> [0] Nonconforming object sizes >>>>>> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >>>>>> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >>>>>> >>>>>> Any thoughts? >>>>>> >>>>>> Thanks, >>>>>> Thanos >>>>>> >>>>>>> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis > wrote: >>>>>>> >>>>>>> This works Pierre. Amazing input, thanks a lot! >>>>>>> >>>>>>>> On 5 Oct 2023, at 14:17, Pierre Jolivet > wrote: >>>>>>>> >>>>>>>> Not a petsc4py expert here, but you may to try instead: >>>>>>>> A_prime = A.ptap(Phi) >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Pierre >>>>>>>> >>>>>>>>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis > wrote: >>>>>>>>> >>>>>>>>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>>>>>>>> >>>>>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>>>>>>>>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>>>>>> >>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>> >>>>>>>>> import time >>>>>>>>> >>>>>>>>> import numpy as np >>>>>>>>> from colorama import Fore >>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>> from mpi4py import MPI >>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>> >>>>>>>>> from utilities import ( >>>>>>>>> Print, >>>>>>>>> create_petsc_matrix, >>>>>>>>> print_matrix_partitioning, >>>>>>>>> ) >>>>>>>>> >>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>> >>>>>>>>> # -------------------------------------------- >>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>> # -------------------------------------------- >>>>>>>>> >>>>>>>>> m, k = 11, 7 >>>>>>>>> # Generate the random numpy matrices >>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>> >>>>>>>>> # -------------------------------------------- >>>>>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>>>>> # -------------------------------------------- >>>>>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>>>>> Print(f"{Aprime_np}") >>>>>>>>> >>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>>>>> >>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>>>>> >>>>>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>>>>> >>>>>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>>>>> # A_prime will store the result of the operation. >>>>>>>>> Phi.PtAP(A, A_prime) >>>>>>>>> >>>>>>>>>> On 5 Oct 2023, at 13:22, Pierre Jolivet > wrote: >>>>>>>>>> >>>>>>>>>> How about using ptap which will use MatPtAP? >>>>>>>>>> It will be more efficient (and it will help you bypass the issue). >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Pierre >>>>>>>>>> >>>>>>>>>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis > wrote: >>>>>>>>>>> >>>>>>>>>>> Sorry, forgot function create_petsc_matrix() >>>>>>>>>>> >>>>>>>>>>> def create_petsc_matrix(input_array sparse=True): >>>>>>>>>>> """Create a PETSc matrix from an input_array >>>>>>>>>>> >>>>>>>>>>> Args: >>>>>>>>>>> input_array (np array): Input array >>>>>>>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>>>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>>>>>>> >>>>>>>>>>> Returns: >>>>>>>>>>> PETSc mat: PETSc matrix >>>>>>>>>>> """ >>>>>>>>>>> # Check if input_array is 1D and reshape if necessary >>>>>>>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>>>>>>> global_rows, global_cols = input_array.shape >>>>>>>>>>> >>>>>>>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>>>>>>> >>>>>>>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>>>>>>> if sparse: >>>>>>>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>>>>>>> else: >>>>>>>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>>>>>>> matrix.setUp() >>>>>>>>>>> >>>>>>>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>>>>>>> >>>>>>>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>>>>>>> # Calculate the correct row in the array for the current process >>>>>>>>>>> row_in_array = counter + local_rows_start >>>>>>>>>>> matrix.setValues( >>>>>>>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>>>>>>> ) >>>>>>>>>>> >>>>>>>>>>> # Assembly the matrix to compute the final structure >>>>>>>>>>> matrix.assemblyBegin() >>>>>>>>>>> matrix.assemblyEnd() >>>>>>>>>>> >>>>>>>>>>> return matrix >>>>>>>>>>> >>>>>>>>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis > wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi everyone, >>>>>>>>>>>> >>>>>>>>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>>>>>>>>> >>>>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>>>>>>>> petsc4py.PETSc.Error: error code 56 >>>>>>>>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>>>>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>>>>>>>> [0] No support for this operation for this object type >>>>>>>>>>>> [0] Call MatProductCreate() first >>>>>>>>>>>> >>>>>>>>>>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>>>>>>>> >>>>>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>>>>> >>>>>>>>>>>> import time >>>>>>>>>>>> >>>>>>>>>>>> import numpy as np >>>>>>>>>>>> from colorama import Fore >>>>>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>>>>> from mpi4py import MPI >>>>>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>>>>> >>>>>>>>>>>> from utilities import ( >>>>>>>>>>>> Print, >>>>>>>>>>>> create_petsc_matrix, >>>>>>>>>>>> ) >>>>>>>>>>>> >>>>>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>>>>> >>>>>>>>>>>> # -------------------------------------------- >>>>>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>>>>> # -------------------------------------------- >>>>>>>>>>>> >>>>>>>>>>>> m, k = 11, 7 >>>>>>>>>>>> # Generate the random numpy matrices >>>>>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>>>>> >>>>>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>>>>> A = create_petsc_matrix(A_np) >>>>>>>>>>>> >>>>>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>>>>>>>> >>>>>>>>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>>>>>>>> >>>>>>>>>>>> # Now A1 contains the result of Phi^T * A >>>>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Roland.Richter at empa.ch Wed Oct 11 03:21:55 2023 From: Roland.Richter at empa.ch (Richter, Roland) Date: Wed, 11 Oct 2023 08:21:55 +0000 Subject: [petsc-users] Compilation failure of PETSc with "The procedure name of the INTERFACE block conflicts with a name in the encompassing scoping unit" Message-ID: Hei, following my last question I managed to configure PETSc with Intel MPI and Intel OneAPI using the following configure-line: ./configure --prefix=/media/storage/local_opt/petsc --with-scalar-type=complex --with-cc=mpiicc --with-cxx=mpiicpc --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC -march=native -mavx2" --with-fc=mpiifort --with-pic=true --with-mpi=true --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ --with-openmp=true --download-hdf5=yes --download-netcdf=yes --download-chaco=no --download-metis=yes --download-slepc=yes --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost --download-opencascade=yes --with-fftw=1 --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 --download-muparser=yes --download-p4est=yes --download-sowing=yes --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=0 Now, however, compilation fails with the following error: /home/user/Downloads/git-files/petsc/include/../src/ksp/f90-mod/ftn-auto-int erfaces/petscpc.h90(699): error #6623: The procedure name of the INTERFACE block conflicts with a name in the encompassing scoping unit. [PCGASMCREATESUBDOMAINS2D] subroutine PCGASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,j,z) -----------------^ /home/user/Downloads/git-files/petsc/include/../src/ksp/f90-mod/ftn-auto-int erfaces/petscpc.h90(1199): error #6623: The procedure name of the INTERFACE block conflicts with a name in the encompassing scoping unit. [PCASMCREATESUBDOMAINS2D] subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) -----------------^ I'm on the latest version of origin/main, but can't figure out how to fix that issue by myself. Therefore, I'd appreciate additional insight. Thanks! Regards, Roland Richter -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: compilation_log.log Type: application/octet-stream Size: 18866 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 7926 bytes Desc: not available URL: From pierre at joliv.et Wed Oct 11 03:29:26 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Wed, 11 Oct 2023 10:29:26 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: <74C597F2-65FA-4CCF-9611-C1C196E4C4C0@corintis.com> References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> <3C8FA7CA-63CB-49F2-8756-535D7FC657C3@joliv.et> <327E3AAA-1AD0-4051-B977-55420DE24067@corintis.com> <74C597F2-65FA-4CCF-9611-C1C196E4C4C0@corintis.com> Message-ID: <80B91AD7-7FC5-46FF-9FE0-B3205719C6CE@joliv.et> > On 11 Oct 2023, at 9:13?AM, Thanasis Boutsikakis wrote: > > Very good catch Pierre, thanks a lot! 
> > This made everything work: the two-step process and the ptap(). I mistakenly thought that I should not let the local number of columns to be None, since the matrix is only partitioned row-wise. Could you please explain what happened because of my setting the local column number so that I get the philosophy behind this partitioning? Hopefully this should make things clearer to you: https://petsc.org/release/manual/mat/#sec-matlayout Thanks, Pierre > Thanks again, > Thanos > >> On 11 Oct 2023, at 09:04, Pierre Jolivet wrote: >> >> That?s because: >> size = ((None, global_rows), (global_cols, global_cols)) >> should be: >> size = ((None, global_rows), (None, global_cols)) >> Then, it will work. >> $ ~/repo/petsc/arch-darwin-c-debug-real/bin/mpirun -n 4 python3.12 test.py && echo $? >> 0 >> >> Thanks, >> Pierre >> >>> On 11 Oct 2023, at 8:58?AM, Thanasis Boutsikakis wrote: >>> >>> Pierre, I see your point, but my experiment shows that it does not even run due to size mismatch, so I don?t see how being sparse would change things here. There must be some kind of problem with the parallel ptap(), because it does run sequentially. In order to test that, I changed the flags of the matrix creation to sparse=True and ran it again. Here is the code >>> >>> """Experimenting with PETSc mat-mat multiplication""" >>> >>> import numpy as np >>> from firedrake import COMM_WORLD >>> from firedrake.petsc import PETSc >>> >>> from utilities import Print >>> >>> nproc = COMM_WORLD.size >>> rank = COMM_WORLD.rank >>> >>> >>> def create_petsc_matrix(input_array, sparse=True): >>> """Create a PETSc matrix from an input_array >>> >>> Args: >>> input_array (np array): Input array >>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>> >>> Returns: >>> PETSc mat: PETSc mpi matrix >>> """ >>> # Check if input_array is 1D and reshape if necessary >>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>> global_rows, global_cols = input_array.shape >>> size = ((None, global_rows), (global_cols, global_cols)) >>> >>> # Create a sparse or dense matrix based on the 'sparse' argument >>> if sparse: >>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>> else: >>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>> matrix.setUp() >>> >>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>> >>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>> # Calculate the correct row in the array for the current process >>> row_in_array = counter + local_rows_start >>> matrix.setValues( >>> i, range(global_cols), input_array[row_in_array, :], addv=False >>> ) >>> >>> # Assembly the matrix to compute the final structure >>> matrix.assemblyBegin() >>> matrix.assemblyEnd() >>> >>> return matrix >>> >>> >>> # -------------------------------------------- >>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>> # A' = Phi.T * A * Phi >>> # [k x k] <- [k x m] x [m x m] x [m x k] >>> # -------------------------------------------- >>> >>> m, k = 100, 7 >>> # Generate the random numpy matrices >>> np.random.seed(0) # sets the seed to 0 >>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>> >>> # -------------------------------------------- >>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>> # -------------------------------------------- >>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>> >>> # Create A as an mpi matrix distributed on each process >>> A = create_petsc_matrix(A_np, sparse=True) >>> >>> # Create Phi as an mpi matrix distributed on each process >>> Phi = create_petsc_matrix(Phi_np, sparse=True) >>> >>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=True) >>> >>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>> # A_prime will store the result of the operation. 
>>> A_prime = A.ptap(Phi) >>> >>> I got >>> >>> Traceback (most recent call last): >>> File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in >>> Traceback (most recent call last): >>> File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in >>> Traceback (most recent call last): >>> File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in >>> A_prime = A.ptap(Phi) >>> A_prime = A.ptap(Phi) >>> ^^^^^^^^^^^ >>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>> A_prime = A.ptap(Phi) >>> ^^^^^^^^^^^ >>> ^^^^^^^^^^^ >>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>> petsc4py.PETSc.Error: error code 60 >>> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>> [0] Nonconforming object sizes >>> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >>> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >>> petsc4py.PETSc.Error: error code 60 >>> [1] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>> [1] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>> [1] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>> [1] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>> [1] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>> [1] Nonconforming object sizes >>> [1] Matrix local dimensions are incompatible, Acol (100, 200) != Prow (34,67) >>> Abort(1) on node 1 (rank 1 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 1 >>> petsc4py.PETSc.Error: error code 60 >>> [2] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>> [2] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>> [2] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>> [2] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>> [2] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>> [2] Nonconforming object sizes >>> [2] Matrix local dimensions are incompatible, Acol (200, 300) != Prow (67,100) >>> Abort(1) on node 2 (rank 2 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 2 >>> >>>> On 11 Oct 2023, at 07:18, Pierre Jolivet wrote: >>>> >>>> I disagree with what Mark and Matt are saying: your code is fine, the error message is fine, petsc4py is fine (in this instance). 
>>>> It?s not a typical use case of MatPtAP(), which is mostly designed for MatAIJ, not MatDense. >>>> On the one hand, in the MatDense case, indeed there will be a mismatch between the number of columns of A and the number of rows of P, as written in the error message. >>>> On the other hand, there is not much to optimize when computing C = P? A P with everything being dense. >>>> I would just write this as B = A P and then C = P? B (but then you may face the same issue as initially reported, please let us know then). >>>> >>>> Thanks, >>>> Pierre >>>> >>>>> On 11 Oct 2023, at 2:42?AM, Mark Adams wrote: >>>>> >>>>> This looks like a false positive or there is some subtle bug here that we are not seeing. >>>>> Could this be the first time parallel PtAP has been used (and reported) in petsc4py? >>>>> >>>>> Mark >>>>> >>>>> On Tue, Oct 10, 2023 at 8:27?PM Matthew Knepley > wrote: >>>>>> On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis > wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). >>>>>> >>>>>> Are you positive about this? P^T A P is designed to run in this scenario, so either we have a bug or the diagnosis is wrong. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>> >>>>>>> import time >>>>>>> >>>>>>> import numpy as np >>>>>>> from colorama import Fore >>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>> from firedrake.petsc import PETSc >>>>>>> from mpi4py import MPI >>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>> >>>>>>> from utilities import Print >>>>>>> >>>>>>> nproc = COMM_WORLD.size >>>>>>> rank = COMM_WORLD.rank >>>>>>> >>>>>>> def create_petsc_matrix(input_array, sparse=True): >>>>>>> """Create a PETSc matrix from an input_array >>>>>>> >>>>>>> Args: >>>>>>> input_array (np array): Input array >>>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>>> >>>>>>> Returns: >>>>>>> PETSc mat: PETSc mpi matrix >>>>>>> """ >>>>>>> # Check if input_array is 1D and reshape if necessary >>>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>>> global_rows, global_cols = input_array.shape >>>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>>> >>>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>>> if sparse: >>>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>>> else: >>>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>>> matrix.setUp() >>>>>>> >>>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>>> >>>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>>> # Calculate the correct row in the array for the current process >>>>>>> row_in_array = counter + local_rows_start >>>>>>> matrix.setValues( >>>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>>> ) >>>>>>> >>>>>>> # Assembly the matrix to compute the final structure >>>>>>> matrix.assemblyBegin() >>>>>>> matrix.assemblyEnd() >>>>>>> >>>>>>> return matrix >>>>>>> >>>>>>> # -------------------------------------------- >>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>> # A' = Phi.T * A * Phi >>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>> # -------------------------------------------- >>>>>>> >>>>>>> m, k = 100, 7 >>>>>>> # Generate the random numpy matrices >>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>> >>>>>>> # -------------------------------------------- >>>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>>> # -------------------------------------------- >>>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>>> Print(f"{Aprime_np}") >>>>>>> >>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>>> >>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>>> >>>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>>> >>>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>>> # A_prime will store the result of the operation. 
>>>>>>> A_prime = A.ptap(Phi) >>>>>>> >>>>>>> Here is the error >>>>>>> >>>>>>> MATRIX mpiaij A [100x100] >>>>>>> Assembled >>>>>>> >>>>>>> Partitioning for A: >>>>>>> Rank 0: Rows [0, 34) >>>>>>> Rank 1: Rows [34, 67) >>>>>>> Rank 2: Rows [67, 100) >>>>>>> >>>>>>> MATRIX mpiaij Phi [100x7] >>>>>>> Assembled >>>>>>> >>>>>>> Partitioning for Phi: >>>>>>> Rank 0: Rows [0, 34) >>>>>>> Rank 1: Rows [34, 67) >>>>>>> Rank 2: Rows [67, 100) >>>>>>> >>>>>>> Traceback (most recent call last): >>>>>>> File "/Users/boutsitron/work/galerkin_projection.py", line 87, in >>>>>>> A_prime = A.ptap(Phi) >>>>>>> ^^^^^^^^^^^ >>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>>>>>> petsc4py.PETSc.Error: error code 60 >>>>>>> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>>>>>> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>>>>>> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>>>>>> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>>>>>> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>>>>>> [0] Nonconforming object sizes >>>>>>> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >>>>>>> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >>>>>>> >>>>>>> Any thoughts? >>>>>>> >>>>>>> Thanks, >>>>>>> Thanos >>>>>>> >>>>>>>> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis > wrote: >>>>>>>> >>>>>>>> This works Pierre. Amazing input, thanks a lot! >>>>>>>> >>>>>>>>> On 5 Oct 2023, at 14:17, Pierre Jolivet > wrote: >>>>>>>>> >>>>>>>>> Not a petsc4py expert here, but you may to try instead: >>>>>>>>> A_prime = A.ptap(Phi) >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Pierre >>>>>>>>> >>>>>>>>>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis > wrote: >>>>>>>>>> >>>>>>>>>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>>>>>>>>> >>>>>>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>>>>>>>>>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>>>>>>> >>>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>>> >>>>>>>>>> import time >>>>>>>>>> >>>>>>>>>> import numpy as np >>>>>>>>>> from colorama import Fore >>>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>>> from mpi4py import MPI >>>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>>> >>>>>>>>>> from utilities import ( >>>>>>>>>> Print, >>>>>>>>>> create_petsc_matrix, >>>>>>>>>> print_matrix_partitioning, >>>>>>>>>> ) >>>>>>>>>> >>>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>>> >>>>>>>>>> # -------------------------------------------- >>>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>>> # -------------------------------------------- >>>>>>>>>> >>>>>>>>>> m, k = 11, 7 >>>>>>>>>> # Generate the random numpy matrices >>>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>>> >>>>>>>>>> # -------------------------------------------- >>>>>>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>>>>>> # -------------------------------------------- >>>>>>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>>>>>> Print(f"{Aprime_np}") >>>>>>>>>> >>>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>>>>>> >>>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>>>>>> >>>>>>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>>>>>> >>>>>>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>>>>>> # A_prime will store the result of the operation. >>>>>>>>>> Phi.PtAP(A, A_prime) >>>>>>>>>> >>>>>>>>>>> On 5 Oct 2023, at 13:22, Pierre Jolivet > wrote: >>>>>>>>>>> >>>>>>>>>>> How about using ptap which will use MatPtAP? >>>>>>>>>>> It will be more efficient (and it will help you bypass the issue). >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Pierre >>>>>>>>>>> >>>>>>>>>>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis > wrote: >>>>>>>>>>>> >>>>>>>>>>>> Sorry, forgot function create_petsc_matrix() >>>>>>>>>>>> >>>>>>>>>>>> def create_petsc_matrix(input_array sparse=True): >>>>>>>>>>>> """Create a PETSc matrix from an input_array >>>>>>>>>>>> >>>>>>>>>>>> Args: >>>>>>>>>>>> input_array (np array): Input array >>>>>>>>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>>>>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>>>>>>>> >>>>>>>>>>>> Returns: >>>>>>>>>>>> PETSc mat: PETSc matrix >>>>>>>>>>>> """ >>>>>>>>>>>> # Check if input_array is 1D and reshape if necessary >>>>>>>>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>>>>>>>> global_rows, global_cols = input_array.shape >>>>>>>>>>>> >>>>>>>>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>>>>>>>> >>>>>>>>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>>>>>>>> if sparse: >>>>>>>>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>>>>>>>> else: >>>>>>>>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>>>>>>>> matrix.setUp() >>>>>>>>>>>> >>>>>>>>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>>>>>>>> >>>>>>>>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>>>>>>>> # Calculate the correct row in the array for the current process >>>>>>>>>>>> row_in_array = counter + local_rows_start >>>>>>>>>>>> matrix.setValues( >>>>>>>>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>>>>>>>> ) >>>>>>>>>>>> >>>>>>>>>>>> # Assembly the matrix to compute the final structure >>>>>>>>>>>> matrix.assemblyBegin() >>>>>>>>>>>> matrix.assemblyEnd() >>>>>>>>>>>> >>>>>>>>>>>> return matrix >>>>>>>>>>>> >>>>>>>>>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis > wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>> >>>>>>>>>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>>>>>>>>>> >>>>>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>>>>>>>>> petsc4py.PETSc.Error: error code 56 >>>>>>>>>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>>>>>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>>>>>>>>> [0] No support for this operation for this object type >>>>>>>>>>>>> [0] Call MatProductCreate() first >>>>>>>>>>>>> >>>>>>>>>>>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>>>>>>>>> >>>>>>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>>>>>> >>>>>>>>>>>>> import time >>>>>>>>>>>>> >>>>>>>>>>>>> import numpy as np >>>>>>>>>>>>> from colorama import Fore >>>>>>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>>>>>> from mpi4py import MPI >>>>>>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>>>>>> >>>>>>>>>>>>> from utilities import ( >>>>>>>>>>>>> Print, >>>>>>>>>>>>> create_petsc_matrix, >>>>>>>>>>>>> ) >>>>>>>>>>>>> >>>>>>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>>>>>> >>>>>>>>>>>>> # -------------------------------------------- >>>>>>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>>>>>> # -------------------------------------------- >>>>>>>>>>>>> >>>>>>>>>>>>> m, k = 11, 7 >>>>>>>>>>>>> # Generate the random numpy matrices >>>>>>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>>>>>> >>>>>>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>>>>>> A = create_petsc_matrix(A_np) >>>>>>>>>>>>> >>>>>>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>>>>>>>>> >>>>>>>>>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>>>>>>>>> >>>>>>>>>>>>> # Now A1 contains the result of Phi^T * A >>>>>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From erdemguer at proton.me Wed Oct 11 03:42:14 2023 From: erdemguer at proton.me (erdemguer) Date: Wed, 11 Oct 2023 08:42:14 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: Message-ID: Hi again, Here is my code: #include static char help[] = "dmplex"; int main(int argc, char **argv) { PetscCall(PetscInitialize(&argc, &argv, NULL, help)); DM dm, dm_dist; PetscSection section; PetscInt cStart, cEndInterior, cEnd, rank; PetscInt nc[3] = {3, 3, 3}; PetscReal upper[3] = {1, 1, 1}; PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, NULL, PETSC_TRUE, &dm); DMViewFromOptions(dm, NULL, "-dm1_view"); PetscCall(DMSetFromOptions(dm)); DMViewFromOptions(dm, NULL, "-dm2_view"); PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); DMPlexComputeCellTypes(dm); PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, &cEndInterior, NULL)); PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, cEndInterior, cEnd); PetscInt nField = 1, nDof = 3, field = 0; PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); PetscSectionSetNumFields(section, nField); PetscCall(PetscSectionSetChart(section, cStart, cEnd)); for (PetscInt p = cStart; p < cEnd; p++) { PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); PetscCall(PetscSectionSetDof(section, p, nDof)); } PetscCall(PetscSectionSetUp(section)); DMSetLocalSection(dm, section); DMViewFromOptions(dm, NULL, "-dm3_view"); DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); DMViewFromOptions(dm, NULL, "-dm4_view"); PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); if (dm_dist) { DMDestroy(&dm); dm = dm_dist; } DMViewFromOptions(dm, NULL, "-dm5_view"); PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); DMPlexComputeCellTypes(dm); PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL)); PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, cEndInterior, cEnd); DMDestroy(&dm); PetscCall(PetscFinalize());} This codes output is currently (on 2 processors) is: Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 DMView outputs: dm1_view (after creation): DM Object: 2 MPI processes type: plex DM_0x84000004_0 in 3 dimensions: Number of 0-cells per rank: 64 0 Number of 1-cells per rank: 144 0 Number of 2-cells per rank: 108 0 Number of 3-cells per rank: 27 0 Labels: marker: 1 strata with value/size (1 (218)) Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 (9)) depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) dm2_view (after setfromoptions): DM Object: 2 MPI processes type: plex DM_0x84000004_0 in 3 dimensions: Number of 0-cells per rank: 40 46 Number of 1-cells per rank: 83 95 Number of 2-cells per rank: 57 64 Number of 3-cells per rank: 13 14 Labels: depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) marker: 1 strata with value/size (1 (109)) Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) dm3_view (after setting local section): DM Object: 2 MPI processes type: plex 
DM_0x84000004_0 in 3 dimensions: Number of 0-cells per rank: 40 46 Number of 1-cells per rank: 83 95 Number of 2-cells per rank: 57 64 Number of 3-cells per rank: 13 14 Labels: depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) marker: 1 strata with value/size (1 (109)) Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) Field Field_0: adjacency FEM dm4_view (after setting adjacency): DM Object: 2 MPI processes type: plex DM_0x84000004_0 in 3 dimensions: Number of 0-cells per rank: 40 46 Number of 1-cells per rank: 83 95 Number of 2-cells per rank: 57 64 Number of 3-cells per rank: 13 14 Labels: depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) marker: 1 strata with value/size (1 (109)) Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) Field Field_0: adjacency FVM++ dm5_view (after distribution): DM Object: Parallel Mesh 2 MPI processes type: plex Parallel Mesh in 3 dimensions: Number of 0-cells per rank: 64 60 Number of 1-cells per rank: 144 133 Number of 2-cells per rank: 108 98 Number of 3-cells per rank: 27 24 Labels: depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) marker: 1 strata with value/size (1 (218)) Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 (9)) celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) Field Field_0: adjacency FVM++ Thanks, Guer. Sent with [Proton Mail](https://proton.me/) secure email. ------- Original Message ------- On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley wrote: > On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: > >> Hi, >> Sorry for my late response. I tried with your suggestions and I think I made a progress. But I still got issues. Let me explain my latest mesh routine: >> >> - DMPlexCreateBoxMesh >> >> - DMSetFromOptions >> - PetscSectionCreate >> - PetscSectionSetNumFields >> - PetscSectionSetFieldDof >> >> - PetscSectionSetDof >> >> - PetscSectionSetUp >> - DMSetLocalSection >> - DMSetAdjacency >> - DMPlexDistribute >> >> It's still not working but it's promising, if I call DMPlexGetDepthStratum for cells, I can see that after distribution processors have more cells. > > Please send the output of DMPlexView() for each incarnation of the mesh. What I do is put > > DMViewFromOptions(dm, NULL, "-dm1_view") > > with a different string after each call. > >> But I couldn't figure out how to decide where the ghost/processor boundary cells start. > > Please send the actual code because the above is not specific enough. For example, you will not have > "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, > so each process gets a unique set. > > Thanks, > > Matt > >> In older mails I saw there is a function DMPlexGetHybridBounds but I think that function is deprecated. I tried to use, DMPlexGetCellTypeStratumas in ts/tutorials/ex11_sa.c but I'm getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution but I think there is a better way I'm currently missing. >> >> Thanks again, >> Guer. 
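As an illustration of the "where do the processor-boundary cells start" question above (this sketch is not part of the original exchange): after DMPlexDistribute(dm, 1, NULL, &dm_dist), the cells a rank received through the overlap are exactly the cells that appear as leaves of the DM's point SF, so they can be counted or flagged directly. The fragment below assumes it is dropped into the earlier listing right after distribution, with dm and rank already in scope.

/* Count cells obtained through the overlap: a point that is a leaf of the
   point SF is owned by another rank. */
PetscSF         sf;
const PetscInt *ilocal;
PetscInt        nleaves, nOverlap = 0, cStart, cEnd;

PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd)); /* height 0 = cells */
PetscCall(DMGetPointSF(dm, &sf));
PetscCall(PetscSFGetGraph(sf, NULL, &nleaves, &ilocal, NULL));
for (PetscInt l = 0; l < nleaves; ++l) {
  const PetscInt p = ilocal ? ilocal[l] : l; /* NULL ilocal means leaves are 0..nleaves-1 */
  if (p >= cStart && p < cEnd) ++nOverlap;
}
PetscPrintf(PETSC_COMM_SELF, "Rank %d: %d local cells, %d of them from the overlap\n",
            (int)rank, (int)(cEnd - cStart), (int)nOverlap);

The DM_POLYTOPE_FV_GHOST and DM_POLYTOPE_INTERIOR_GHOST strata stay empty in this setup, which is why the calls above keep returning -1; those cell types only appear once ghost cells are explicitly constructed, as discussed later in the thread.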
>> >> ------- Original Message ------- >> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley wrote: >> >>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users wrote: >>> >>>> Hi, >>>> >>>> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >>>> >>>> Create a DMPlex mesh from GMSH. >>>> Reorder it with DMPlexPermute. >>>> Create necessary pre-processing arrays related to the mesh/problem. >>>> Create field(s) with multi-dofs. >>>> Create residual vectors. >>>> Define a function to calculate the residual for each cell and, use SNES. >>>> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? >>> >>> The intention was that there is enough information in the manual to do this. >>> >>> Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. >>> >>> So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. >>> >>> Thanks, >>> >>> Matt >>> >>>> Thank you. >>>> Guer. >>>> >>>> Sent with [Proton Mail](https://proton.me/) secure email. >>> >>> -- >>> >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 11 06:02:42 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Oct 2023 07:02:42 -0400 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: Message-ID: On Tue, Oct 10, 2023 at 9:34?PM Brandon Denton via petsc-users < petsc-users at mcs.anl.gov> wrote: > Good Evening, > > I am looking to implement a form of Navier-Stokes with SUPG Stabilization > and shock capturing using PETSc's FEM infrastructure. In this > implementation, I need access to the cell's shape function gradients and > natural coordinate gradients for calculations within the point-wise > residual calculations. How do I get these quantities at the quadrature > points? The signatures for fo and f1 don't seem to contain this information. > Are you sure you need those? Darsh and I implemented SUPG without that. You would need local second derivative information, which you can get using -dm_ds_jet_degree 2. If you check in an example, I can go over it. Thanks, Matt > Thank you in advance for your time. 
> Brandon > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 11 06:07:25 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Oct 2023 07:07:25 -0400 Subject: [petsc-users] Compilation failure of PETSc with "The procedure name of the INTERFACE block conflicts with a name in the encompassing scoping unit" In-Reply-To: References: Message-ID: On Wed, Oct 11, 2023 at 4:22?AM Richter, Roland wrote: > Hei, > > following my last question I managed to configure PETSc with Intel MPI and > Intel OneAPI using the following configure-line: > > > > *./configure --prefix=/media/storage/local_opt/petsc > --with-scalar-type=complex --with-cc=mpiicc --with-cxx=mpiicpc > --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC -march=native > -mavx2" --with-fc=mpiifort --with-pic=true --with-mpi=true > --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ > --with-openmp=true --download-hdf5=yes --download-netcdf=yes > --download-chaco=no --download-metis=yes --download-slepc=yes > --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes > --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes > --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 > --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost > --download-opencascade=yes --with-fftw=1 > --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes > --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 > --download-muparser=yes --download-p4est=yes --download-sowing=yes > --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=0* > > > > Now, however, compilation fails with the following error: > > /home/user/Downloads/git-files/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90(699): > error #6623: The procedure name of the INTERFACE block conflicts with a > name in the encompassing scoping unit. [PCGASMCREATESUBDOMAINS2D] > > subroutine PCGASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,j,z) > > -----------------^ > > /home/user/Downloads/git-files/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90(1199): > error #6623: The procedure name of the INTERFACE block conflicts with a > name in the encompassing scoping unit. [PCASMCREATESUBDOMAINS2D] > > subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) > > -----------------^ > > I'm on the latest version of origin/main, but can't figure out how to fix > that issue by myself. Therefore, I'd appreciate additional insight. > You have old build files in the tree. We changed the Fortran stubs to be generated in the PETSC_ARCH tree so that you can build the stubs for different branches in the same PETSc tree. You have old stubs in the src tree. You can get rid of these using git clean -f -d -x unless you have your own files in the source tree, in which case you need to remove the ftn-auto-interfaces directories yourself. Thanks, Matt > Thanks! > > Regards, > > Roland Richter > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bldenton at buffalo.edu Wed Oct 11 07:25:10 2023 From: bldenton at buffalo.edu (Brandon Denton) Date: Wed, 11 Oct 2023 12:25:10 +0000 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: Message-ID: I was thinking about trying to implement Ben Kirk's approach to Navier-Stokes (see attached paper; Section 5). His approach uses these quantities to align the orientation of the unstructured element/cell with the fluid velocity to apply the stabilization/upwinding and to detect shocks. If you have an example of the approach you mentioned, could you please send it over so I can review it? On Oct 11, 2023 6:02 AM, Matthew Knepley wrote: On Tue, Oct 10, 2023 at 9:34?PM Brandon Denton via petsc-users > wrote: Good Evening, I am looking to implement a form of Navier-Stokes with SUPG Stabilization and shock capturing using PETSc's FEM infrastructure. In this implementation, I need access to the cell's shape function gradients and natural coordinate gradients for calculations within the point-wise residual calculations. How do I get these quantities at the quadrature points? The signatures for fo and f1 don't seem to contain this information. Are you sure you need those? Darsh and I implemented SUPG without that. You would need local second derivative information, which you can get using -dm_ds_jet_degree 2. If you check in an example, I can go over it. Thanks, Matt Thank you in advance for your time. Brandon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 11 08:17:36 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Oct 2023 09:17:36 -0400 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: Message-ID: On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: > Hi again, > I see the problem. FV ghosts mean extra boundary cells added in FV methods using DMPlexCreateGhostCells() in order to impose boundary conditions. They are not the "ghost" cells for overlapping parallel decompositions. I have changed your code to give you what you want. It is attached. 
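For completeness, a minimal sketch of how those FV boundary ghost cells are usually added (illustrative only, and separate from the code attached to the reply below; it assumes the routine is spelled DMPlexConstructGhostCells() in current PETSc):

/* Add one ghost cell outside every boundary face, then query the FV ghost stratum. */
DM       gdm;
PetscInt cStart, cEnd, cEndInterior;

PetscCall(DMPlexConstructGhostCells(dm, NULL, NULL, &gdm)); /* NULL label -> "Face Sets" */
PetscCall(DMDestroy(&dm));
dm = gdm;
PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd));
PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL));
/* cells [cStart, cEndInterior) are interior, [cEndInterior, cEnd) are boundary ghosts */

After this call the DM_POLYTOPE_FV_GHOST stratum is non-empty, so cEndInterior becomes a meaningful bound instead of -1.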
Thanks, Matt > Here is my code: > #include > static char help[] = "dmplex"; > > int main(int argc, char **argv) > { > PetscCall(PetscInitialize(&argc, &argv, NULL, help)); > DM dm, dm_dist; > PetscSection section; > PetscInt cStart, cEndInterior, cEnd, rank; > PetscInt nc[3] = {3, 3, 3}; > PetscReal upper[3] = {1, 1, 1}; > > PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); > > DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, > NULL, PETSC_TRUE, &dm); > DMViewFromOptions(dm, NULL, "-dm1_view"); > PetscCall(DMSetFromOptions(dm)); > DMViewFromOptions(dm, NULL, "-dm2_view"); > > PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); > DMPlexComputeCellTypes(dm); > PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, > &cEndInterior, NULL)); > PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: > %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, > cEndInterior, cEnd); > > PetscInt nField = 1, nDof = 3, field = 0; > PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); > PetscSectionSetNumFields(section, nField); > PetscCall(PetscSectionSetChart(section, cStart, cEnd)); > for (PetscInt p = cStart; p < cEnd; p++) > { > PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); > PetscCall(PetscSectionSetDof(section, p, nDof)); > } > > PetscCall(PetscSectionSetUp(section)); > > DMSetLocalSection(dm, section); > DMViewFromOptions(dm, NULL, "-dm3_view"); > > DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); > DMViewFromOptions(dm, NULL, "-dm4_view"); > PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); > if (dm_dist) > { > DMDestroy(&dm); > dm = dm_dist; > } > DMViewFromOptions(dm, NULL, "-dm5_view"); > PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); > DMPlexComputeCellTypes(dm); > PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, > &cEndInterior, NULL)); > PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, > cEndInterior: %d, cEnd: %d\n", rank, cStart, > cEndInterior, cEnd); > > DMDestroy(&dm); > PetscCall(PetscFinalize()); > } > > This codes output is currently (on 2 processors) is: > Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 > Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 > After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27 > After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 > > DMView outputs: > dm1_view (after creation): > DM Object: 2 MPI processes > type: plex > DM_0x84000004_0 in 3 dimensions: > Number of 0-cells per rank: 64 0 > Number of 1-cells per rank: 144 0 > Number of 2-cells per rank: 108 0 > Number of 3-cells per rank: 27 0 > Labels: > marker: 1 strata with value/size (1 (218)) > Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), > 2 (9)) > depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) > celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) > > dm2_view (after setfromoptions): > DM Object: 2 MPI processes > type: plex > DM_0x84000004_0 in 3 dimensions: > Number of 0-cells per rank: 40 46 > Number of 1-cells per rank: 83 95 > Number of 2-cells per rank: 57 64 > Number of 3-cells per rank: 13 14 > Labels: > depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) > marker: 1 strata with value/size (1 (109)) > Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) > celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) > > dm3_view (after setting local section): > DM Object: 2 MPI processes > type: plex > 
DM_0x84000004_0 in 3 dimensions: > Number of 0-cells per rank: 40 46 > Number of 1-cells per rank: 83 95 > Number of 2-cells per rank: 57 64 > Number of 3-cells per rank: 13 14 > Labels: > depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) > marker: 1 strata with value/size (1 (109)) > Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) > celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) > Field Field_0: > adjacency FEM > > dm4_view (after setting adjacency): > DM Object: 2 MPI processes > type: plex > DM_0x84000004_0 in 3 dimensions: > Number of 0-cells per rank: 40 46 > Number of 1-cells per rank: 83 95 > Number of 2-cells per rank: 57 64 > Number of 3-cells per rank: 13 14 > Labels: > depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) > marker: 1 strata with value/size (1 (109)) > Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) > celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) > Field Field_0: > adjacency FVM++ > > dm5_view (after distribution): > DM Object: Parallel Mesh 2 MPI processes > type: plex > Parallel Mesh in 3 dimensions: > Number of 0-cells per rank: 64 60 > Number of 1-cells per rank: 144 133 > Number of 2-cells per rank: 108 98 > Number of 3-cells per rank: 27 24 > Labels: > depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) > marker: 1 strata with value/size (1 (218)) > Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), > 6 (9)) > celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) > Field Field_0: > adjacency FVM++ > > Thanks, > Guer. > Sent with Proton Mail secure email. > > ------- Original Message ------- > On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley < > knepley at gmail.com> wrote: > > On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: > >> >> Hi, >> Sorry for my late response. I tried with your suggestions and I think I >> made a progress. But I still got issues. Let me explain my latest mesh >> routine: >> >> >> 1. DMPlexCreateBoxMesh >> 2. DMSetFromOptions >> 3. PetscSectionCreate >> 4. PetscSectionSetNumFields >> 5. PetscSectionSetFieldDof >> 6. PetscSectionSetDof >> 7. PetscSectionSetUp >> 8. DMSetLocalSection >> 9. DMSetAdjacency >> 10. DMPlexDistribute >> >> >> It's still not working but it's promising, if I call >> DMPlexGetDepthStratum for cells, I can see that after distribution >> processors have more cells. >> > > Please send the output of DMPlexView() for each incarnation of the mesh. > What I do is put > > DMViewFromOptions(dm, NULL, "-dm1_view") > > > with a different string after each call. > >> But I couldn't figure out how to decide where the ghost/processor >> boundary cells start. >> > > Please send the actual code because the above is not specific enough. For > example, you will not have > "ghost cells" unless you partition with overlap. This is because by > default cells are the partitioned quantity, > so each process gets a unique set. > > Thanks, > > Matt > >> In older mails I saw there is a function DMPlexGetHybridBounds but I >> think that function is deprecated. I tried to use, >> DMPlexGetCellTypeStratum as in ts/tutorials/ex11_sa.c but I'm getting -1 >> as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, >> DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling >> DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. 
>> I think I can calculate the ghost cell indices using cStart/cEnd before & >> after distribution but I think there is a better way I'm currently missing. >> >> Thanks again, >> Guer. >> >> ------- Original Message ------- >> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < >> knepley at gmail.com> wrote: >> >> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> Hi, >>> >>> I am currently using DMPlex in my code. It runs serially at the moment, >>> but I'm interested in adding parallel options. Here is my workflow: >>> >>> Create a DMPlex mesh from GMSH. >>> Reorder it with DMPlexPermute. >>> Create necessary pre-processing arrays related to the mesh/problem. >>> Create field(s) with multi-dofs. >>> Create residual vectors. >>> Define a function to calculate the residual for each cell and, use SNES. >>> As you can see, I'm not using FV or FE structures (most examples do). >>> Now, I'm trying to implement this in parallel using a similar approach. >>> However, I'm struggling to understand how to create corresponding vectors >>> and how to obtain index sets for each processor. Is there a tutorial or >>> paper that covers this topic? >>> >> >> The intention was that there is enough information in the manual to do >> this. >> >> Using PetscFE/PetscFV is not required. However, I strongly encourage you >> to use PetscSection. Without this, it would be incredibly hard to do what >> you want. Once the DM has a Section, it can do things like automatically >> create vectors and matrices for you. It can redistribute them, subset them, >> etc. The Section describes how dofs are assigned to pieces of the mesh >> (mesh points). This is in the manual, and there are a few examples that do >> it by hand. >> >> So I suggest changing your code to use PetscSection, and then letting us >> know if things still do not work. >> >> Thanks, >> >> Matt >> >>> Thank you. >>> Guer. >>> >>> Sent with Proton Mail secure email. >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1.c Type: application/octet-stream Size: 3039 bytes Desc: not available URL: From junchao.zhang at gmail.com Wed Oct 11 09:14:57 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 11 Oct 2023 09:14:57 -0500 Subject: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Hi, Philip, Could you try this branch jczhang/2023-10-05/feature-support-matshift-aijkokkos ? Thanks. --Junchao Zhang On Thu, Oct 5, 2023 at 4:52?PM Fackler, Philip wrote: > Aha! That makes sense. Thank you. 
> > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang > *Sent:* Thursday, October 5, 2023 17:29 > *To:* Fackler, Philip > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; Blondel, Sophie < > sblondel at utk.edu> > *Subject:* [EXTERNAL] Re: [petsc-users] Unexpected performance losses > switching to COO interface > > Wait a moment, it seems it was because we do not have a GPU implementation > of MatShift... > Let me see how to add it. > --Junchao Zhang > > > On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang > wrote: > > Hi, Philip, > I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() > instead of the COO interface? MatSetValues() needs to copy the data from > device to host and thus is expensive. > Do you have profiling results with COO enabled? > > [image: Screenshot 2023-10-05 at 10.55.29?AM.png] > > > --Junchao Zhang > > > On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: > > Hi, Philip, > I will look into the tarballs and get back to you. > Thanks. > --Junchao Zhang > > > On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > We finally have xolotl ported to use the new COO interface and the > aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port > to our previous version (using MatSetValuesStencil and the default Mat and > Vec implementations), we expected to see an improvement in performance for > both the "serial" and "cuda" builds (here I'm referring to the kokkos > configuration). > > Attached are two plots that show timings for three different cases. All of > these were run on Ascent (the Summit-like training system) with 6 MPI tasks > (on a single node). The CUDA cases were given one GPU per task (and used > CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases > we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent > as possible. > > The performance of RHSJacobian (where the bulk of computation happens in > xolotl) behaved basically as expected (better than expected in the serial > build). NE_3 case in CUDA was the only one that performed worse, but not > surprisingly, since its workload for the GPUs is much smaller. We've still > got more optimization to do on this. > > The real surprise was how much worse the overall solve times were. This > seems to be due simply to switching to the kokkos-based implementation. I'm > wondering if there are any changes we can make in configuration or runtime > arguments to help with PETSc's performance here. Any help looking into this > would be appreciated. > > The tarballs linked here > > and here > > are profiling databases which, once extracted, can be viewed with > hpcviewer. I don't know how helpful that will be, but hopefully it can give > you some direction. > > Thanks for your help, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Screenshot 2023-10-05 at 10.55.29?AM.png Type: image/png Size: 144341 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-10-05 at 10.55.29?AM.png Type: image/png Size: 144341 bytes Desc: not available URL: From balay at mcs.anl.gov Wed Oct 11 09:28:09 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 11 Oct 2023 09:28:09 -0500 (CDT) Subject: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails In-Reply-To: References: <3CF831A3-F5DC-4055-9F00-FA7DD7242EBB@petsc.dev> <78e0a665-e6fc-4566-4900-6faa2e593c72@mcs.anl.gov> Message-ID: <9b267e6f-3e92-9492-3851-f2265231bbaa@mcs.anl.gov> The same docs should be available in https://web.cels.anl.gov/projects/petsc/download/release-snapshots/petsc-with-docs-3.20.0.tar.gz Satish On Wed, 11 Oct 2023, Richter, Roland wrote: > Hei, > Thank you very much for the answer! I looked it up, but petsc.org seems to > be a bit unstable here, quite often I can't reach petsc.org. > Regards, > Roland Richter > > -----Urspr?ngliche Nachricht----- > Von: Satish Balay > Gesendet: mandag 9. oktober 2023 17:29 > An: Barry Smith > Cc: Richter, Roland ; petsc-users at mcs.anl.gov > Betreff: Re: [petsc-users] Configuration of PETSc with Intel OneAPI and > Intel MPI fails > > Will note - OneAPI MPI usage is documented at > https://petsc.org/release/install/install/#mpi > > Satish > > On Mon, 9 Oct 2023, Barry Smith wrote: > > > > > Instead of using the mpiicc -cc=icx style use -- with-cc=mpiicc (etc) > and > > > > export I_MPI_CC=icx > > export I_MPI_CXX=icpx > > export I_MPI_F90=ifx > > > > > > > On Oct 9, 2023, at 8:32 AM, Richter, Roland > wrote: > > > > > > Hei, > > > I'm currently trying to install PETSc on a server (Ubuntu 22.04) with > Intel MPI and Intel OneAPI. To combine both, I have to use f. ex. "mpiicc > -cc=icx" as C-compiler, as described by > https://stackoverflow.com/a/76362396. Therefore, I adapted the > configure-line as follow: > > > > > > ./configure --prefix=/media/storage/local_opt/petsc > --with-scalar-type=complex --with-cc="mpiicc -cc=icx" --with-cxx="mpiicpc > -cxx=icpx" --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC > -march=native -mavx2" --with-fc="mpiifort -fc=ifx" --with-pic=true > --with-mpi=true > --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ > --with-openmp=true --download-hdf5=yes --download-netcdf=yes > --download-chaco=no --download-metis=yes --download-slepc=yes > --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes > --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes > --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 > --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost > --download-opencascade=yes --with-fftw=1 > --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes > --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 > --download-muparser=no --download-p4est=yes --download-sowing=y > es --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 > --with-cuda=1 > > > > > > The configuration, however, fails with > > > > > > The CMAKE_C_COMPILER: > > > > > > mpiicc -cc=icx > > > > > > is not a full path and was not found in the PATH > > > > > > for all additional modules which use a cmake-based configuration > approach (such as OPENCASCADE). How could I solve that problem? > > > > > > Thank you! 
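For reference, the environment-variable route suggested above, combined with the original command, would look roughly like this (a schematic sketch only; every option not shown stays exactly as in the original configure line):

export I_MPI_CC=icx
export I_MPI_CXX=icpx
export I_MPI_F90=ifx

./configure --prefix=/media/storage/local_opt/petsc \
  --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort \
  ...   (remaining options unchanged)

The point is that cmake-based external packages such as OPENCASCADE then see a single compiler name (mpiicc) that exists on PATH, instead of the multi-word "mpiicc -cc=icx" string that CMake rejects as "not a full path".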
> > > Regards, > > > Roland Richter > > > > > > > > From kenneth.c.hall at duke.edu Wed Oct 11 10:27:03 2023 From: kenneth.c.hall at duke.edu (Kenneth C Hall) Date: Wed, 11 Oct 2023 15:27:03 +0000 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: <89E53665-4C0D-4583-9C90-13C4C108A4EA@dsic.upv.es> References: <89E53665-4C0D-4583-9C90-13C4C108A4EA@dsic.upv.es> Message-ID: Jose, Thanks very much for your help with this. Greatly appreciated. I will look at the MR. Please let me know if you do get the Fortran example working. Thanks, and best regards, Kenneth From: Jose E. Roman Date: Wednesday, October 11, 2023 at 2:41 AM To: Kenneth C Hall Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) Kenneth, The MatDuplicate issue should be fixed in the following MR https://urldefense.com/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/6912__;!!OToaGQ!p1tu1lzpyqM4wU-3WRzXN9bH3sFnXjyJvwQZh4PQBG5GNgB472qfxKOASyjxsg23AUQGusU-HpzI855ViaFfRCI$ Note that the NLEIGS solver internally uses MatDuplicate for creating multiple copies of the shell matrix, each one with its own value of lambda. Hence your implementation of the shell matrix is not appropriate, since you have a single global lambda within the module. I have attempted to write a Fortran example that duplicates the lambda correctly (see the MR), but does not work yet. Jose > El 6 oct 2023, a las 22:28, Kenneth C Hall escribi?: > > Jose, > > Unfortunately, I was unable to implement the MATOP_DUPLICATE operation in fortran (and I do not know enough c to work in c). Here is the error message I get: > > [0]PETSC ERROR: #1 MatShellSetOperation_Fortran() at /Users/hall/Documents/Fortran_Codes/Packages/petsc/src/mat/impls/shell/ftn-custom/zshellf.c:283 > [0]PETSC ERROR: #2 src/test_nep.f90:62 > > When I look at zshellf.c, MATOP_DUPLICATE is not one of the supported operations. See below. > > Kenneth > > > /** > * Subset of MatOperation that is supported by the Fortran wrappers. > */ > enum FortranMatOperation { > FORTRAN_MATOP_MULT = 0, > FORTRAN_MATOP_MULT_ADD = 1, > FORTRAN_MATOP_MULT_TRANSPOSE = 2, > FORTRAN_MATOP_MULT_TRANSPOSE_ADD = 3, > FORTRAN_MATOP_SOR = 4, > FORTRAN_MATOP_TRANSPOSE = 5, > FORTRAN_MATOP_GET_DIAGONAL = 6, > FORTRAN_MATOP_DIAGONAL_SCALE = 7, > FORTRAN_MATOP_ZERO_ENTRIES = 8, > FORTRAN_MATOP_AXPY = 9, > FORTRAN_MATOP_SHIFT = 10, > FORTRAN_MATOP_DIAGONAL_SET = 11, > FORTRAN_MATOP_DESTROY = 12, > FORTRAN_MATOP_VIEW = 13, > FORTRAN_MATOP_CREATE_VECS = 14, > FORTRAN_MATOP_GET_DIAGONAL_BLOCK = 15, > FORTRAN_MATOP_COPY = 16, > FORTRAN_MATOP_SCALE = 17, > FORTRAN_MATOP_SET_RANDOM = 18, > FORTRAN_MATOP_ASSEMBLY_BEGIN = 19, > FORTRAN_MATOP_ASSEMBLY_END = 20, > FORTRAN_MATOP_SIZE = 21 > }; > > > From: Jose E. Roman > Date: Friday, October 6, 2023 at 7:01 AM > To: Kenneth C Hall > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) > > I am getting an error in a different place than you. I started to debug, but don't have much time at the moment. > Can you try something? Comparing to ex21.c, I see that a difference that may be relevant is the MATOP_DUPLICATE operation. Can you try defining it for your A matrix? > > Note: If you plan to use the NLEIGS solver, there is no need to define the derivative T' so you can skip the call to NEPSetJacobian(). 
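For reference, the duplicate operation itself is straightforward to supply from C even though the Fortran wrapper does not expose it. A minimal sketch (MyCtx and MatMult_A are placeholders for the user's context type and mult routine; a full version would also set MATOP_DESTROY so each copy frees its own context):

typedef struct { PetscScalar lambda; } MyCtx;   /* placeholder context */
extern PetscErrorCode MatMult_A(Mat, Vec, Vec); /* user's mult callback (placeholder) */

static PetscErrorCode MatDuplicate_MyShell(Mat A, MatDuplicateOption op, Mat *B)
{
  MyCtx   *ctx, *newctx;
  PetscInt m, n, M, N;

  PetscFunctionBeginUser;
  /* op is ignored in this sketch; we always build a fresh shell with a copied context */
  PetscCall(MatShellGetContext(A, &ctx));
  PetscCall(PetscNew(&newctx));
  *newctx = *ctx;                          /* each copy carries its own lambda */
  PetscCall(MatGetLocalSize(A, &m, &n));
  PetscCall(MatGetSize(A, &M, &N));
  PetscCall(MatCreateShell(PetscObjectComm((PetscObject)A), m, n, M, N, newctx, B));
  PetscCall(MatShellSetOperation(*B, MATOP_MULT, (void (*)(void))MatMult_A));
  PetscCall(MatShellSetOperation(*B, MATOP_DUPLICATE, (void (*)(void))MatDuplicate_MyShell));
  PetscFunctionReturn(PETSC_SUCCESS);
}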
> > Jose > > > > El 6 oct 2023, a las 0:37, Kenneth C Hall escribi?: > > > > Hi all, > > > > I have a very large eigenvalue problem of the form T(\lambda).x = 0. The eigenvalues appear in a complicated way, and I must use a matrix-free approach to compute the products T.x and T?.x. > > > > I am trying to implement in SLEPc/NEP. To get started, I have defined a much smaller and simpler system of the form > > A.x - \lambda x = 0 where A is a 10x10 matrix. This is of course a simple standard eigenvalue problem, but I am using it as a surrogate to understand how to use NEP. > > > > I have set the problem up using shell matrices (as that is my ultimate goal). The full code is attached, but here is a smaller snippet of code: > > > > !.... Create matrix-free operators for A and B > > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, A, ierr)) > > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, B, ierr)) > > PetscCall(MatShellSetOperation(A, MATOP_MULT, MatMult_A, ierr)) > > PetscCall(MatShellSetOperation(B, MATOP_MULT, MatMult_B, ierr)) > > > > !.... Create nonlinear eigensolver > > PetscCall(NEPCreate(PETSC_COMM_SELF, nep, ierr)) > > > > !.... Set the problem type > > PetscCall(NEPSetProblemType(nep, NEP_GENERAL, ierr)) > > ! > > !.... set the solver type > > PetscCall(NEPSetType(nep, NEPNLEIGS, ierr)) > > ! > > !.... Set functions and Jacobians for NEP > > PetscCall(NEPSetFunction(nep, A, A, MyNEPFunction, PETSC_NULL_INTEGER, ierr)) > > PetscCall(NEPSetJacobian(nep, B, MyNEPJacobian, PETSC_NULL_INTEGER, ierr)) > > > > The code runs, calls MyNEPFunction and MatMult_A multiple times, sweeping over the prescribed RG range, but crashes before it ever calls MyNEPJacobian or MatMult_B. The NEP viewer and error messages are attached. > > > > Any help on getting this problem properly set up would be greatly appreciated. > > > > Kenneth Hall > > ATTACHMENTS: > > test_nep.f90 > > code_output > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From facklerpw at ornl.gov Wed Oct 11 10:31:02 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Wed, 11 Oct 2023 15:31:02 +0000 Subject: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: I'm on it. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang Sent: Wednesday, October 11, 2023 10:14 To: Fackler, Philip Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net ; Blondel, Sophie Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface Hi, Philip, Could you try this branch jczhang/2023-10-05/feature-support-matshift-aijkokkos ? Thanks. --Junchao Zhang On Thu, Oct 5, 2023 at 4:52?PM Fackler, Philip > wrote: Aha! That makes sense. Thank you. 
Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Thursday, October 5, 2023 17:29 To: Fackler, Philip > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net >; Blondel, Sophie > Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface Wait a moment, it seems it was because we do not have a GPU implementation of MatShift... Let me see how to add it. --Junchao Zhang On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang > wrote: Hi, Philip, I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() instead of the COO interface? MatSetValues() needs to copy the data from device to host and thus is expensive. Do you have profiling results with COO enabled? [Screenshot 2023-10-05 at 10.55.29?AM.png] --Junchao Zhang On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: Hi, Philip, I will look into the tarballs and get back to you. Thanks. --Junchao Zhang On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users > wrote: We finally have xolotl ported to use the new COO interface and the aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port to our previous version (using MatSetValuesStencil and the default Mat and Vec implementations), we expected to see an improvement in performance for both the "serial" and "cuda" builds (here I'm referring to the kokkos configuration). Attached are two plots that show timings for three different cases. All of these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on a single node). The CUDA cases were given one GPU per task (and used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible. The performance of RHSJacobian (where the bulk of computation happens in xolotl) behaved basically as expected (better than expected in the serial build). NE_3 case in CUDA was the only one that performed worse, but not surprisingly, since its workload for the GPUs is much smaller. We've still got more optimization to do on this. The real surprise was how much worse the overall solve times were. This seems to be due simply to switching to the kokkos-based implementation. I'm wondering if there are any changes we can make in configuration or runtime arguments to help with PETSc's performance here. Any help looking into this would be appreciated. The tarballs linked here and here are profiling databases which, once extracted, can be viewed with hpcviewer. I don't know how helpful that will be, but hopefully it can give you some direction. Thanks for your help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Oct 11 12:03:12 2023 From: jed at jedbrown.org (Jed Brown) Date: Wed, 11 Oct 2023 11:03:12 -0600 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: Message-ID: <87ttqx148f.fsf@jedbrown.org> I don't see an attachment, but his thesis used conservative variables and defined an effective length scale in a way that seemed to assume constant shape function gradients. 
I'm not aware of systematic literature comparing the covariant and contravariant length measures on anisotropic meshes, but I believe most people working in the Shakib/Hughes approach use the covariant measure. Our docs have a brief discussion of this choice. https://libceed.org/en/latest/examples/fluids/#equation-eq-peclet Matt, I don't understand how the second derivative comes into play as a length measure on anistropic meshes -- the second derivatives can be uniformly zero and yet you still need a length measure. Brandon Denton via petsc-users writes: > I was thinking about trying to implement Ben Kirk's approach to Navier-Stokes (see attached paper; Section 5). His approach uses these quantities to align the orientation of the unstructured element/cell with the fluid velocity to apply the stabilization/upwinding and to detect shocks. > > If you have an example of the approach you mentioned, could you please send it over so I can review it? > > On Oct 11, 2023 6:02 AM, Matthew Knepley wrote: > On Tue, Oct 10, 2023 at 9:34?PM Brandon Denton via petsc-users > wrote: > Good Evening, > > I am looking to implement a form of Navier-Stokes with SUPG Stabilization and shock capturing using PETSc's FEM infrastructure. In this implementation, I need access to the cell's shape function gradients and natural coordinate gradients for calculations within the point-wise residual calculations. How do I get these quantities at the quadrature points? The signatures for fo and f1 don't seem to contain this information. > > Are you sure you need those? Darsh and I implemented SUPG without that. You would need local second derivative information, which you can get using -dm_ds_jet_degree 2. If you check in an example, I can go over it. > > Thanks, > > Matt > > Thank you in advance for your time. > Brandon > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Wed Oct 11 12:33:54 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Oct 2023 13:33:54 -0400 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: <87ttqx148f.fsf@jedbrown.org> References: <87ttqx148f.fsf@jedbrown.org> Message-ID: On Wed, Oct 11, 2023 at 1:03?PM Jed Brown wrote: > I don't see an attachment, but his thesis used conservative variables and > defined an effective length scale in a way that seemed to assume constant > shape function gradients. I'm not aware of systematic literature comparing > the covariant and contravariant length measures on anisotropic meshes, but > I believe most people working in the Shakib/Hughes approach use the > covariant measure. Our docs have a brief discussion of this choice. > > https://libceed.org/en/latest/examples/fluids/#equation-eq-peclet > > Matt, I don't understand how the second derivative comes into play as a > length measure on anistropic meshes -- the second derivatives can be > uniformly zero and yet you still need a length measure. > I was talking about the usual SUPG where we just penalize the true residual. Matt > Brandon Denton via petsc-users writes: > > > I was thinking about trying to implement Ben Kirk's approach to > Navier-Stokes (see attached paper; Section 5). His approach uses these > quantities to align the orientation of the unstructured element/cell with > the fluid velocity to apply the stabilization/upwinding and to detect > shocks. 
> > > > If you have an example of the approach you mentioned, could you please > send it over so I can review it? > > > > On Oct 11, 2023 6:02 AM, Matthew Knepley wrote: > > On Tue, Oct 10, 2023 at 9:34?PM Brandon Denton via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Good Evening, > > > > I am looking to implement a form of Navier-Stokes with SUPG > Stabilization and shock capturing using PETSc's FEM infrastructure. In this > implementation, I need access to the cell's shape function gradients and > natural coordinate gradients for calculations within the point-wise > residual calculations. How do I get these quantities at the quadrature > points? The signatures for fo and f1 don't seem to contain this information. > > > > Are you sure you need those? Darsh and I implemented SUPG without that. > You would need local second derivative information, which you can get using > -dm_ds_jet_degree 2. If you check in an example, I can go over it. > > > > Thanks, > > > > Matt > > > > Thank you in advance for your time. > > Brandon > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/< > http://www.cse.buffalo.edu/~knepley/> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Oct 11 12:38:17 2023 From: jed at jedbrown.org (Jed Brown) Date: Wed, 11 Oct 2023 11:38:17 -0600 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: <87ttqx148f.fsf@jedbrown.org> Message-ID: <87o7h512ly.fsf@jedbrown.org> Matthew Knepley writes: > On Wed, Oct 11, 2023 at 1:03?PM Jed Brown wrote: > >> I don't see an attachment, but his thesis used conservative variables and >> defined an effective length scale in a way that seemed to assume constant >> shape function gradients. I'm not aware of systematic literature comparing >> the covariant and contravariant length measures on anisotropic meshes, but >> I believe most people working in the Shakib/Hughes approach use the >> covariant measure. Our docs have a brief discussion of this choice. >> >> https://libceed.org/en/latest/examples/fluids/#equation-eq-peclet >> >> Matt, I don't understand how the second derivative comes into play as a >> length measure on anistropic meshes -- the second derivatives can be >> uniformly zero and yet you still need a length measure. >> > > I was talking about the usual SUPG where we just penalize the true residual. I think you're focused on computing the strong diffusive flux (which can be done using second derivatives or by a projection; the latter produces somewhat better results). But you still need a length scale and that's most naturally computed using the derivative of reference coordinates with respect to physical (or equivalently, the associated metric tensor). 
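To make the length-scale remark concrete, one common Shakib/Hughes-style construction (written here as a sketch in the covariant-measure convention discussed above; C_I is the usual inverse-estimate constant, \Delta t the time step, \nu the viscosity) is

    g_{ij} = \sum_k (\partial \xi_k / \partial x_i) (\partial \xi_k / \partial x_j),
    \tau   = ( 4/\Delta t^2 + u_i g_{ij} u_j + C_I \nu^2 g_{ij} g_{ij} )^{-1/2}.

Since g_{ij} is built only from the derivatives of the reference coordinates \xi with respect to the physical coordinates x, it remains well defined on anisotropic elements even where second derivatives of the shape functions vanish, which is the point being made above.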
From bldenton at buffalo.edu Wed Oct 11 13:09:01 2023 From: bldenton at buffalo.edu (Brandon Denton) Date: Wed, 11 Oct 2023 18:09:01 +0000 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: <87o7h512ly.fsf@jedbrown.org> References: <87ttqx148f.fsf@jedbrown.org> <87o7h512ly.fsf@jedbrown.org> Message-ID: Thank you for the discussion. Are we agreed then that the derivatives of the natural coordinates are required for the described approach? If so, is this something PETSc can currently do within the point-wise residual functions? Matt - Thank you for the command line option for the 2nd derivatives. Those will be needed to implement the discussed approach. Specifically in the stabilization and shock capture parameters. (Ref.: B. Kirk's Thesis). What is a good reference for the usual SUPG method you are referencing? I've been looking through my textbooks but haven't found a good reference. Jed - Thank you for the link. I will review the information on it. Sorry about the attachment. I will upload it to this thread later (I'm at work right now and I can't do it from here). ________________________________ From: Jed Brown Sent: Wednesday, October 11, 2023 1:38 PM To: Matthew Knepley Cc: Brandon Denton ; petsc-users Subject: Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization Matthew Knepley writes: > On Wed, Oct 11, 2023 at 1:03?PM Jed Brown wrote: > >> I don't see an attachment, but his thesis used conservative variables and >> defined an effective length scale in a way that seemed to assume constant >> shape function gradients. I'm not aware of systematic literature comparing >> the covariant and contravariant length measures on anisotropic meshes, but >> I believe most people working in the Shakib/Hughes approach use the >> covariant measure. Our docs have a brief discussion of this choice. >> >> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Flibceed.org%2Fen%2Flatest%2Fexamples%2Ffluids%2F%23equation-eq-peclet&data=05%7C01%7Cbldenton%40buffalo.edu%7Cd9372f934b26455371a708dbca80dc8e%7C96464a8af8ed40b199e25f6b50a20250%7C0%7C0%7C638326427028053956%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=skMsKDmpBxiaXtBSqhsyckvVpTOkGqDsNJIYo22Ywps%3D&reserved=0 >> >> Matt, I don't understand how the second derivative comes into play as a >> length measure on anistropic meshes -- the second derivatives can be >> uniformly zero and yet you still need a length measure. >> > > I was talking about the usual SUPG where we just penalize the true residual. I think you're focused on computing the strong diffusive flux (which can be done using second derivatives or by a projection; the latter produces somewhat better results). But you still need a length scale and that's most naturally computed using the derivative of reference coordinates with respect to physical (or equivalently, the associated metric tensor). -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 11 14:13:40 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Oct 2023 15:13:40 -0400 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: <87ttqx148f.fsf@jedbrown.org> <87o7h512ly.fsf@jedbrown.org> Message-ID: On Wed, Oct 11, 2023 at 2:09?PM Brandon Denton wrote: > Thank you for the discussion. > > Are we agreed then that the derivatives of the natural coordinates are > required for the described approach? 
If so, is this something PETSc can > currently do within the point-wise residual functions? > I am not sure what natural coordinates are. Do we just mean the Jacobian, derivatives of the map between reference and real coordinates? If so, yes the Jacobian is available. Right now I do not pass it directly, but passing it is easy. Thanks, Matt > Matt - Thank you for the command line option for the 2nd derivatives. > Those will be needed to implement the discussed approach. Specifically in > the stabilization and shock capture parameters. (Ref.: B. Kirk's Thesis). > What is a good reference for the usual SUPG method you are referencing? > I've been looking through my textbooks but haven't found a good reference. > > Jed - Thank you for the link. I will review the information on it. > > Sorry about the attachment. I will upload it to this thread later (I'm at > work right now and I can't do it from here). > ------------------------------ > *From:* Jed Brown > *Sent:* Wednesday, October 11, 2023 1:38 PM > *To:* Matthew Knepley > *Cc:* Brandon Denton ; petsc-users < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] FEM Implementation of NS with SUPG > Stabilization > > Matthew Knepley writes: > > > On Wed, Oct 11, 2023 at 1:03?PM Jed Brown wrote: > > > >> I don't see an attachment, but his thesis used conservative variables > and > >> defined an effective length scale in a way that seemed to assume > constant > >> shape function gradients. I'm not aware of systematic literature > comparing > >> the covariant and contravariant length measures on anisotropic meshes, > but > >> I believe most people working in the Shakib/Hughes approach use the > >> covariant measure. Our docs have a brief discussion of this choice. > >> > >> > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Flibceed.org%2Fen%2Flatest%2Fexamples%2Ffluids%2F%23equation-eq-peclet&data=05%7C01%7Cbldenton%40buffalo.edu%7Cd9372f934b26455371a708dbca80dc8e%7C96464a8af8ed40b199e25f6b50a20250%7C0%7C0%7C638326427028053956%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=skMsKDmpBxiaXtBSqhsyckvVpTOkGqDsNJIYo22Ywps%3D&reserved=0 > > >> > >> Matt, I don't understand how the second derivative comes into play as a > >> length measure on anistropic meshes -- the second derivatives can be > >> uniformly zero and yet you still need a length measure. > >> > > > > I was talking about the usual SUPG where we just penalize the true > residual. > > I think you're focused on computing the strong diffusive flux (which can > be done using second derivatives or by a projection; the latter produces > somewhat better results). But you still need a length scale and that's most > naturally computed using the derivative of reference coordinates with > respect to physical (or equivalently, the associated metric tensor). > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bldenton at buffalo.edu Wed Oct 11 15:14:16 2023 From: bldenton at buffalo.edu (Brandon Denton) Date: Wed, 11 Oct 2023 20:14:16 +0000 Subject: Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: <87ttqx148f.fsf@jedbrown.org> <87o7h512ly.fsf@jedbrown.org> Message-ID: By natural coordinates, I am referring to the reference element coordinates. Usually these are represented as (xi, eta, zeta) in the literature. Yes. I would like to have the Jacobian and the derivatives of the map available within PetscDSSetResidual() f0 and f1 functions. I believe the DMPlexComputeCellGeometryFEM() function provides this information. Is there a way to get the cell shape functions as well? If not, can we talk about this more? I would like to understand how the shape functions are addressed within PETSc. Dr. Kirk's approach uses the shape function gradients in its SUPG parameter. I'd love to talk with you about this in more detail.
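For reference, a minimal sketch of pulling those mapping derivatives out cell by cell (this assumes a 3D mesh, and assumes that passing NULL for the quadrature returns the single-point, affine cell geometry; the metric in the comment is an illustration, not a PETSc call):

  PetscInt  cStart, cEnd, c;
  PetscReal v0[3], J[9], invJ[9], detJ;   /* sized for dim = 3 */

  PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd));
  for (c = cStart; c < cEnd; ++c) {
    /* J is dx/dxi and invJ is dxi/dx for the reference-to-physical map of cell c */
    PetscCall(DMPlexComputeCellGeometryFEM(dm, c, NULL, v0, J, invJ, &detJ));
    /* with row-major storage and dim = 3, the covariant metric discussed earlier is
       g[i][j] = sum_k invJ[k*3 + i] * invJ[k*3 + j] */
  }

Whether this is done once per cell in a setup pass or per quadrature point (by passing a PetscQuadrature instead of NULL) depends on how the stabilization parameter is to be evaluated.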
>> >> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Flibceed.org%2Fen%2Flatest%2Fexamples%2Ffluids%2F%23equation-eq-peclet&data=05%7C01%7Cbldenton%40buffalo.edu%7Cd9372f934b26455371a708dbca80dc8e%7C96464a8af8ed40b199e25f6b50a20250%7C0%7C0%7C638326427028053956%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=skMsKDmpBxiaXtBSqhsyckvVpTOkGqDsNJIYo22Ywps%3D&reserved=0 >> >> Matt, I don't understand how the second derivative comes into play as a >> length measure on anistropic meshes -- the second derivatives can be >> uniformly zero and yet you still need a length measure. >> > > I was talking about the usual SUPG where we just penalize the true residual. I think you're focused on computing the strong diffusive flux (which can be done using second derivatives or by a projection; the latter produces somewhat better results). But you still need a length scale and that's most naturally computed using the derivative of reference coordinates with respect to physical (or equivalently, the associated metric tensor). -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From erdemguer at proton.me Wed Oct 11 16:59:20 2023 From: erdemguer at proton.me (erdemguer) Date: Wed, 11 Oct 2023 21:59:20 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: Message-ID: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Thank you! That's exactly what I need. Sent with [Proton Mail](https://proton.me/) secure email. ------- Original Message ------- On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley wrote: > On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: > >> Hi again, > > I see the problem. FV ghosts mean extra boundary cells added in FV methods using DMPlexCreateGhostCells() in order to impose boundary conditions. They are not the "ghost" cells for overlapping parallel decompositions. I have changed your code to give you what you want. It is attached. 
> > Thanks, > > Matt > >> Here is my code: >> #include >> static char help[] = "dmplex"; >> >> int main(int argc, char **argv) >> { >> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >> DM dm, dm_dist; >> PetscSection section; >> PetscInt cStart, cEndInterior, cEnd, rank; >> PetscInt nc[3] = {3, 3, 3}; >> PetscReal upper[3] = {1, 1, 1}; >> >> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >> >> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, NULL, PETSC_TRUE, &dm); >> DMViewFromOptions(dm, NULL, "-dm1_view"); >> PetscCall(DMSetFromOptions(dm)); >> DMViewFromOptions(dm, NULL, "-dm2_view"); >> >> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >> DMPlexComputeCellTypes(dm); >> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, &cEndInterior, NULL)); >> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >> cEndInterior, cEnd); >> >> PetscInt nField = 1, nDof = 3, field = 0; >> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >> PetscSectionSetNumFields(section, nField); >> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >> for (PetscInt p = cStart; p < cEnd; p++) >> { >> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >> PetscCall(PetscSectionSetDof(section, p, nDof)); >> } >> >> PetscCall(PetscSectionSetUp(section)); >> >> DMSetLocalSection(dm, section); >> DMViewFromOptions(dm, NULL, "-dm3_view"); >> >> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >> DMViewFromOptions(dm, NULL, "-dm4_view"); >> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >> if (dm_dist) >> { >> DMDestroy(&dm); >> dm = dm_dist; >> } >> DMViewFromOptions(dm, NULL, "-dm5_view"); >> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >> DMPlexComputeCellTypes(dm); >> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL)); >> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >> cEndInterior, cEnd); >> >> DMDestroy(&dm); >> PetscCall(PetscFinalize());} >> >> This codes output is currently (on 2 processors) is: >> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >> >> DMView outputs: >> dm1_view (after creation): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 64 0 >> Number of 1-cells per rank: 144 0 >> Number of 2-cells per rank: 108 0 >> Number of 3-cells per rank: 27 0 >> Labels: >> marker: 1 strata with value/size (1 (218)) >> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 (9)) >> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >> >> dm2_view (after setfromoptions): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 40 46 >> Number of 1-cells per rank: 83 95 >> Number of 2-cells per rank: 57 64 >> Number of 3-cells per rank: 13 14 >> Labels: >> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >> marker: 1 strata with value/size (1 (109)) >> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >> >> dm3_view 
(after setting local section): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 40 46 >> Number of 1-cells per rank: 83 95 >> Number of 2-cells per rank: 57 64 >> Number of 3-cells per rank: 13 14 >> Labels: >> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >> marker: 1 strata with value/size (1 (109)) >> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >> Field Field_0: adjacency FEM >> >> dm4_view (after setting adjacency): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 40 46 >> Number of 1-cells per rank: 83 95 >> Number of 2-cells per rank: 57 64 >> Number of 3-cells per rank: 13 14 >> Labels: >> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >> marker: 1 strata with value/size (1 (109)) >> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >> Field Field_0: adjacency FVM++ >> >> dm5_view (after distribution): >> DM Object: Parallel Mesh 2 MPI processes >> type: plex >> Parallel Mesh in 3 dimensions: >> Number of 0-cells per rank: 64 60 >> Number of 1-cells per rank: 144 133 >> Number of 2-cells per rank: 108 98 >> Number of 3-cells per rank: 27 24 >> Labels: >> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >> marker: 1 strata with value/size (1 (218)) >> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 (9)) >> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >> Field Field_0: adjacency FVM++ >> >> Thanks, >> Guer. >> >> Sent with [Proton Mail](https://proton.me/) secure email. >> >> ------- Original Message ------- >> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley wrote: >> >>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>> >>>> Hi, >>>> Sorry for my late response. I tried with your suggestions and I think I made a progress. But I still got issues. Let me explain my latest mesh routine: >>>> >>>> - DMPlexCreateBoxMesh >>>> >>>> - DMSetFromOptions >>>> - PetscSectionCreate >>>> - PetscSectionSetNumFields >>>> - PetscSectionSetFieldDof >>>> >>>> - PetscSectionSetDof >>>> >>>> - PetscSectionSetUp >>>> - DMSetLocalSection >>>> - DMSetAdjacency >>>> - DMPlexDistribute >>>> >>>> It's still not working but it's promising, if I call DMPlexGetDepthStratum for cells, I can see that after distribution processors have more cells. >>> >>> Please send the output of DMPlexView() for each incarnation of the mesh. What I do is put >>> >>> DMViewFromOptions(dm, NULL, "-dm1_view") >>> >>> with a different string after each call. >>> >>>> But I couldn't figure out how to decide where the ghost/processor boundary cells start. >>> >>> Please send the actual code because the above is not specific enough. For example, you will not have >>> "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, >>> so each process gets a unique set. >>> >>> Thanks, >>> >>> Matt >>> >>>> In older mails I saw there is a function DMPlexGetHybridBounds but I think that function is deprecated. I tried to use, DMPlexGetCellTypeStratumas in ts/tutorials/ex11_sa.c but I'm getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. 
I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution but I think there is a better way I'm currently missing. >>>> >>>> Thanks again, >>>> Guer. >>>> >>>> ------- Original Message ------- >>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley wrote: >>>> >>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>> >>>>>> Create a DMPlex mesh from GMSH. >>>>>> Reorder it with DMPlexPermute. >>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>> Create field(s) with multi-dofs. >>>>>> Create residual vectors. >>>>>> Define a function to calculate the residual for each cell and, use SNES. >>>>>> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? >>>>> >>>>> The intention was that there is enough information in the manual to do this. >>>>> >>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. >>>>> >>>>> So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> Thank you. >>>>>> Guer. >>>>>> >>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>> >>>>> -- >>>>> >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>> >>> -- >>> >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 11 19:07:32 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Oct 2023 20:07:32 -0400 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: <87ttqx148f.fsf@jedbrown.org> <87o7h512ly.fsf@jedbrown.org> Message-ID: On Wed, Oct 11, 2023 at 4:15?PM Brandon Denton wrote: > By natural coordinates, I am referring to the reference element > coordinates. Usually these are represented as (xi, eta, zeta) in the > literature. > > Yes. 
I would like to have the Jacobian and the derivatives of the map > available within PetscDSSetResidual() f0 and f1 functions. > Yes, we can get these passed an aux data. > I believe DMPlexComputeCellGeometryFEM() function provides this > information. Is there a way to get the cell, shape functions as well? It > not, can we talk about this more? I would like to understand how the shape > functions are addressed within PETSc. Dr. Kirk's approach uses the shape > function gradients in its SUPG parameter. I'd love to talk with you about > this is more detail. > There should be a way to formulate this in a basis independent way. I would much prefer that to explicit inclusion of the basis. Thanks, Matt > *From:* Matthew Knepley > *Sent:* Wednesday, October 11, 2023 3:13 PM > *To:* Brandon Denton > *Cc:* Jed Brown ; petsc-users > *Subject:* Re: [petsc-users] FEM Implementation of NS with SUPG > Stabilization > > On Wed, Oct 11, 2023 at 2:09?PM Brandon Denton > wrote: > > Thank you for the discussion. > > Are we agreed then that the derivatives of the natural coordinates are > required for the described approach? If so, is this something PETSc can > currently do within the point-wise residual functions? > > > I am not sure what natural coordinates are. Do we just mean the Jacobian, > derivatives of the map between reference and real coordinates? If so, yes > the Jacobian is available. Right now I do not pass it > directly, but passing it is easy. > > Thanks, > > Matt > > > Matt - Thank you for the command line option for the 2nd derivatives. > Those will be needed to implement the discussed approach. Specifically in > the stabilization and shock capture parameters. (Ref.: B. Kirk's Thesis). > What is a good reference for the usual SUPG method you are referencing? > I've been looking through my textbooks but haven't found a good reference. > > Jed - Thank you for the link. I will review the information on it. > > Sorry about the attachment. I will upload it to this thread later (I'm at > work right now and I can't do it from here). > ------------------------------ > *From:* Jed Brown > *Sent:* Wednesday, October 11, 2023 1:38 PM > *To:* Matthew Knepley > *Cc:* Brandon Denton ; petsc-users < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] FEM Implementation of NS with SUPG > Stabilization > > Matthew Knepley writes: > > > On Wed, Oct 11, 2023 at 1:03?PM Jed Brown wrote: > > > >> I don't see an attachment, but his thesis used conservative variables > and > >> defined an effective length scale in a way that seemed to assume > constant > >> shape function gradients. I'm not aware of systematic literature > comparing > >> the covariant and contravariant length measures on anisotropic meshes, > but > >> I believe most people working in the Shakib/Hughes approach use the > >> covariant measure. Our docs have a brief discussion of this choice. 
> >> > >> > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Flibceed.org%2Fen%2Flatest%2Fexamples%2Ffluids%2F%23equation-eq-peclet&data=05%7C01%7Cbldenton%40buffalo.edu%7Cd9372f934b26455371a708dbca80dc8e%7C96464a8af8ed40b199e25f6b50a20250%7C0%7C0%7C638326427028053956%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=skMsKDmpBxiaXtBSqhsyckvVpTOkGqDsNJIYo22Ywps%3D&reserved=0 > > >> > >> Matt, I don't understand how the second derivative comes into play as a > >> length measure on anistropic meshes -- the second derivatives can be > >> uniformly zero and yet you still need a length measure. > >> > > > > I was talking about the usual SUPG where we just penalize the true > residual. > > I think you're focused on computing the strong diffusive flux (which can > be done using second derivatives or by a projection; the latter produces > somewhat better results). But you still need a length scale and that's most > naturally computed using the derivative of reference coordinates with > respect to physical (or equivalently, the associated metric tensor). > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bldenton at buffalo.edu Wed Oct 11 22:44:10 2023 From: bldenton at buffalo.edu (Brandon Denton) Date: Thu, 12 Oct 2023 03:44:10 +0000 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: <87ttqx148f.fsf@jedbrown.org> <87o7h512ly.fsf@jedbrown.org> Message-ID: How exactly does the aux data work? What is typically available there? Is it something the user can populate? ________________________________ From: Matthew Knepley Sent: Wednesday, October 11, 2023 8:07 PM To: Brandon Denton Cc: Jed Brown ; petsc-users Subject: Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization On Wed, Oct 11, 2023 at 4:15?PM Brandon Denton > wrote: By natural coordinates, I am referring to the reference element coordinates. Usually these are represented as (xi, eta, zeta) in the literature. Yes. I would like to have the Jacobian and the derivatives of the map available within PetscDSSetResidual() f0 and f1 functions. Yes, we can get these passed an aux data. I believe DMPlexComputeCellGeometryFEM() function provides this information. Is there a way to get the cell, shape functions as well? It not, can we talk about this more? I would like to understand how the shape functions are addressed within PETSc. Dr. Kirk's approach uses the shape function gradients in its SUPG parameter. I'd love to talk with you about this is more detail. There should be a way to formulate this in a basis independent way. I would much prefer that to explicit inclusion of the basis. Thanks, Matt From: Matthew Knepley > Sent: Wednesday, October 11, 2023 3:13 PM To: Brandon Denton > Cc: Jed Brown >; petsc-users > Subject: Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization On Wed, Oct 11, 2023 at 2:09?PM Brandon Denton > wrote: Thank you for the discussion. 
Are we agreed then that the derivatives of the natural coordinates are required for the described approach? If so, is this something PETSc can currently do within the point-wise residual functions? I am not sure what natural coordinates are. Do we just mean the Jacobian, derivatives of the map between reference and real coordinates? If so, yes the Jacobian is available. Right now I do not pass it directly, but passing it is easy. Thanks, Matt Matt - Thank you for the command line option for the 2nd derivatives. Those will be needed to implement the discussed approach. Specifically in the stabilization and shock capture parameters. (Ref.: B. Kirk's Thesis). What is a good reference for the usual SUPG method you are referencing? I've been looking through my textbooks but haven't found a good reference. Jed - Thank you for the link. I will review the information on it. Sorry about the attachment. I will upload it to this thread later (I'm at work right now and I can't do it from here). ________________________________ From: Jed Brown > Sent: Wednesday, October 11, 2023 1:38 PM To: Matthew Knepley > Cc: Brandon Denton >; petsc-users > Subject: Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization Matthew Knepley > writes: > On Wed, Oct 11, 2023 at 1:03?PM Jed Brown > wrote: > >> I don't see an attachment, but his thesis used conservative variables and >> defined an effective length scale in a way that seemed to assume constant >> shape function gradients. I'm not aware of systematic literature comparing >> the covariant and contravariant length measures on anisotropic meshes, but >> I believe most people working in the Shakib/Hughes approach use the >> covariant measure. Our docs have a brief discussion of this choice. >> >> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Flibceed.org%2Fen%2Flatest%2Fexamples%2Ffluids%2F%23equation-eq-peclet&data=05%7C01%7Cbldenton%40buffalo.edu%7Cd9372f934b26455371a708dbca80dc8e%7C96464a8af8ed40b199e25f6b50a20250%7C0%7C0%7C638326427028053956%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=skMsKDmpBxiaXtBSqhsyckvVpTOkGqDsNJIYo22Ywps%3D&reserved=0 >> >> Matt, I don't understand how the second derivative comes into play as a >> length measure on anistropic meshes -- the second derivatives can be >> uniformly zero and yet you still need a length measure. >> > > I was talking about the usual SUPG where we just penalize the true residual. I think you're focused on computing the strong diffusive flux (which can be done using second derivatives or by a projection; the latter produces somewhat better results). But you still need a length scale and that's most naturally computed using the derivative of reference coordinates with respect to physical (or equivalently, the associated metric tensor). -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Oct 12 13:12:08 2023 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Thu, 12 Oct 2023 20:12:08 +0200 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: References: <89E53665-4C0D-4583-9C90-13C4C108A4EA@dsic.upv.es> Message-ID: <442B3841-B668-4185-9C6F-D03CA481CA26@dsic.upv.es> I am attaching your example modified with the context stuff. With the PETSc branch that I indicated, now it works with NLEIGS, for instance: $ ./test_nep -nep_nleigs_ksp_type gmres -nep_nleigs_pc_type none -rg_interval_endpoints 0.2,1.1 -nep_target 0.8 -nep_nev 5 -n 400 -nep_monitor -nep_view -nep_error_relative ::ascii_info_detail And also other solvers such as SLP: $ ./test_nep -nep_type slp -nep_slp_ksp_type gmres -nep_slp_pc_type none -nep_target 0.8 -nep_nev 5 -n 400 -nep_monitor -nep_error_relative ::ascii_info_detail I will clean the example code an add it as a SLEPc example. Regards, Jose > El 11 oct 2023, a las 17:27, Kenneth C Hall escribi?: > > Jose, > > Thanks very much for your help with this. Greatly appreciated. I will look at the MR. Please let me know if you do get the Fortran example working. > > Thanks, and best regards, > Kenneth > -------------- next part -------------- A non-text attachment was scrubbed... Name: test_nep.F90 Type: application/octet-stream Size: 8471 bytes Desc: not available URL: From kenneth.c.hall at duke.edu Thu Oct 12 13:59:34 2023 From: kenneth.c.hall at duke.edu (Kenneth C Hall) Date: Thu, 12 Oct 2023 18:59:34 +0000 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: <442B3841-B668-4185-9C6F-D03CA481CA26@dsic.upv.es> References: <89E53665-4C0D-4583-9C90-13C4C108A4EA@dsic.upv.es> <442B3841-B668-4185-9C6F-D03CA481CA26@dsic.upv.es> Message-ID: Jose, Thanks very much for this. I will give it a try and let you know how it works. Best regards, Kenneth From: Jose E. Roman Date: Thursday, October 12, 2023 at 2:12 PM To: Kenneth C Hall Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) I am attaching your example modified with the context stuff. With the PETSc branch that I indicated, now it works with NLEIGS, for instance: $ ./test_nep -nep_nleigs_ksp_type gmres -nep_nleigs_pc_type none -rg_interval_endpoints 0.2,1.1 -nep_target 0.8 -nep_nev 5 -n 400 -nep_monitor -nep_view -nep_error_relative ::ascii_info_detail And also other solvers such as SLP: $ ./test_nep -nep_type slp -nep_slp_ksp_type gmres -nep_slp_pc_type none -nep_target 0.8 -nep_nev 5 -n 400 -nep_monitor -nep_error_relative ::ascii_info_detail I will clean the example code an add it as a SLEPc example. Regards, Jose > El 11 oct 2023, a las 17:27, Kenneth C Hall escribi?: > > Jose, > > Thanks very much for your help with this. Greatly appreciated. I will look at the MR. Please let me know if you do get the Fortran example working. > > Thanks, and best regards, > Kenneth > -------------- next part -------------- An HTML attachment was scrubbed... URL: From erdemguer at proton.me Fri Oct 13 06:26:39 2023 From: erdemguer at proton.me (erdemguer) Date: Fri, 13 Oct 2023 11:26:39 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: Hi, unfortunately it's me again. I have some weird troubles with creating matrix with DMPlex. 
Actually I might not need to create the matrix explicitly, but SNESSolve crashes there too. So, I updated the code you provided. When I first tried to use DMCreateMatrix(), I got the error "Unknown discretization type for field 0"; after I applied DMSetLocalSection() that error went away. But now when I run the code with multiple processors, I sometimes get output like:
Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27
Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0
[1] ghost cell 14
[1] ghost cell 15
[1] ghost cell 16
[1] ghost cell 17
[1] ghost cell 18
[1] ghost cell 19
[1] ghost cell 20
[1] ghost cell 21
[1] ghost cell 22
After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23
[0] ghost cell 13
[0] ghost cell 14
[0] ghost cell 15
[0] ghost cell 16
[0] ghost cell 17
[0] ghost cell 18
[0] ghost cell 19
[0] ghost cell 20
[0] ghost cell 21
[0] ghost cell 22
[0] ghost cell 23
After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24
Fatal error in internal_Waitall: Unknown error class, error stack:
internal_Waitall(82)......................: MPI_Waitall(count=1, array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed
MPIR_Waitall(1099)........................:
MPIR_Waitall_impl(1011)...................:
MPIR_Waitall_state(976)...................:
MPIDI_CH3i_Progress_wait(187).............: an error occurred while handling an event returned by MPIDI_CH3I_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(411):
ReadMoreData(744).........................: ch3|sock|immedread 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880
MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address)

Sometimes the error message doesn't appear, but, for example, printing the size of the matrix still doesn't work.
If necessary, my configure options are --download-mpich --download-hwloc --download-pastix --download-hypre --download-ml --download-ctetgen --download-triangle --download-exodusii --download-netcdf --download-zlib --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" --with-debugging=1

Version: Petsc Release Version 3.20.0

Thank you,
Guer

Sent with [Proton Mail](https://proton.me/) secure email.

------- Original Message -------
On Thursday, October 12th, 2023 at 12:59 AM, erdemguer wrote:
> Thank you! That's exactly what I need.
>
> Sent with [Proton Mail](https://proton.me/) secure email.
>
> ------- Original Message -------
> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley wrote:
>
>> On Wed, Oct 11, 2023 at 4:42 AM erdemguer wrote:
>>
>>> Hi again,
>>
>> I see the problem. FV ghosts mean extra boundary cells added in FV methods using DMPlexCreateGhostCells() in order to impose boundary conditions. They are not the "ghost" cells for overlapping parallel decompositions. I have changed your code to give you what you want. It is attached.
>> >> Thanks, >> >> Matt >> >>> Here is my code: >>> #include >>> static char help[] = "dmplex"; >>> >>> int main(int argc, char **argv) >>> { >>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>> DM dm, dm_dist; >>> PetscSection section; >>> PetscInt cStart, cEndInterior, cEnd, rank; >>> PetscInt nc[3] = {3, 3, 3}; >>> PetscReal upper[3] = {1, 1, 1}; >>> >>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>> >>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, NULL, PETSC_TRUE, &dm); >>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>> PetscCall(DMSetFromOptions(dm)); >>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>> >>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>> DMPlexComputeCellTypes(dm); >>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, &cEndInterior, NULL)); >>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>> cEndInterior, cEnd); >>> >>> PetscInt nField = 1, nDof = 3, field = 0; >>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>> PetscSectionSetNumFields(section, nField); >>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>> for (PetscInt p = cStart; p < cEnd; p++) >>> { >>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>> } >>> >>> PetscCall(PetscSectionSetUp(section)); >>> >>> DMSetLocalSection(dm, section); >>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>> >>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>> if (dm_dist) >>> { >>> DMDestroy(&dm); >>> dm = dm_dist; >>> } >>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>> DMPlexComputeCellTypes(dm); >>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL)); >>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>> cEndInterior, cEnd); >>> >>> DMDestroy(&dm); >>> PetscCall(PetscFinalize());} >>> >>> This codes output is currently (on 2 processors) is: >>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>> >>> DMView outputs: >>> dm1_view (after creation): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 64 0 >>> Number of 1-cells per rank: 144 0 >>> Number of 2-cells per rank: 108 0 >>> Number of 3-cells per rank: 27 0 >>> Labels: >>> marker: 1 strata with value/size (1 (218)) >>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 (9)) >>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >>> >>> dm2_view (after setfromoptions): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 40 46 >>> Number of 1-cells per rank: 83 95 >>> Number of 2-cells per rank: 57 64 >>> Number of 3-cells per rank: 13 14 >>> Labels: >>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>> marker: 1 strata with value/size (1 (109)) >>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 
(5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>> >>> dm3_view (after setting local section): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 40 46 >>> Number of 1-cells per rank: 83 95 >>> Number of 2-cells per rank: 57 64 >>> Number of 3-cells per rank: 13 14 >>> Labels: >>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>> marker: 1 strata with value/size (1 (109)) >>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>> Field Field_0: adjacency FEM >>> >>> dm4_view (after setting adjacency): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 40 46 >>> Number of 1-cells per rank: 83 95 >>> Number of 2-cells per rank: 57 64 >>> Number of 3-cells per rank: 13 14 >>> Labels: >>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>> marker: 1 strata with value/size (1 (109)) >>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>> Field Field_0: adjacency FVM++ >>> >>> dm5_view (after distribution): >>> DM Object: Parallel Mesh 2 MPI processes >>> type: plex >>> Parallel Mesh in 3 dimensions: >>> Number of 0-cells per rank: 64 60 >>> Number of 1-cells per rank: 144 133 >>> Number of 2-cells per rank: 108 98 >>> Number of 3-cells per rank: 27 24 >>> Labels: >>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>> marker: 1 strata with value/size (1 (218)) >>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 (9)) >>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>> Field Field_0: adjacency FVM++ >>> >>> Thanks, >>> Guer. >>> >>> Sent with [Proton Mail](https://proton.me/) secure email. >>> >>> ------- Original Message ------- >>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley wrote: >>> >>>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>>> >>>>> Hi, >>>>> Sorry for my late response. I tried with your suggestions and I think I made a progress. But I still got issues. Let me explain my latest mesh routine: >>>>> >>>>> - DMPlexCreateBoxMesh >>>>> >>>>> - DMSetFromOptions >>>>> - PetscSectionCreate >>>>> - PetscSectionSetNumFields >>>>> - PetscSectionSetFieldDof >>>>> >>>>> - PetscSectionSetDof >>>>> >>>>> - PetscSectionSetUp >>>>> - DMSetLocalSection >>>>> - DMSetAdjacency >>>>> - DMPlexDistribute >>>>> >>>>> It's still not working but it's promising, if I call DMPlexGetDepthStratum for cells, I can see that after distribution processors have more cells. >>>> >>>> Please send the output of DMPlexView() for each incarnation of the mesh. What I do is put >>>> >>>> DMViewFromOptions(dm, NULL, "-dm1_view") >>>> >>>> with a different string after each call. >>>> >>>>> But I couldn't figure out how to decide where the ghost/processor boundary cells start. >>>> >>>> Please send the actual code because the above is not specific enough. For example, you will not have >>>> "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, >>>> so each process gets a unique set. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> In older mails I saw there is a function DMPlexGetHybridBounds but I think that function is deprecated. 
I tried to use, DMPlexGetCellTypeStratumas in ts/tutorials/ex11_sa.c but I'm getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution but I think there is a better way I'm currently missing. >>>>> >>>>> Thanks again, >>>>> Guer. >>>>> >>>>> ------- Original Message ------- >>>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley wrote: >>>>> >>>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>>> >>>>>>> Create a DMPlex mesh from GMSH. >>>>>>> Reorder it with DMPlexPermute. >>>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>>> Create field(s) with multi-dofs. >>>>>>> Create residual vectors. >>>>>>> Define a function to calculate the residual for each cell and, use SNES. >>>>>>> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? >>>>>> >>>>>> The intention was that there is enough information in the manual to do this. >>>>>> >>>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. >>>>>> >>>>>> So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> Thank you. >>>>>>> Guer. >>>>>>> >>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>> >>>>>> -- >>>>>> >>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>> >>>> -- >>>> >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >> >> -- >> >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: ex1.c URL: From knepley at gmail.com Fri Oct 13 07:00:01 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 13 Oct 2023 08:00:01 -0400 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: > Hi, unfortunately it's me again. > > I have some weird troubles with creating matrix with DMPlex. Actually I > might not need to create matrix explicitly, but SNESSolve crashes at there > too. So, I updated the code you provided. When I tried to use > DMCreateMatrix() at first, I got an error "Unknown discretization type > for field 0" at first I applied DMSetLocalSection() and this error is gone. > But this time when I run the code with multiple processors, sometimes I got > an output like: > Some setup was out of order so the section size on proc1 was 0, and I was not good about checking this. I have fixed it and attached. Thanks, Matt Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 > Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 > [1] ghost cell 14 > [1] ghost cell 15 > [1] ghost cell 16 > [1] ghost cell 17 > [1] ghost cell 18 > [1] ghost cell 19 > [1] ghost cell 20 > [1] ghost cell 21 > [1] ghost cell 22 > After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 > [0] ghost cell 13 > [0] ghost cell 14 > [0] ghost cell 15 > [0] ghost cell 16 > [0] ghost cell 17 > [0] ghost cell 18 > [0] ghost cell 19 > [0] ghost cell 20 > [0] ghost cell 21 > [0] ghost cell 22 > [0] ghost cell 23 > After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 > Fatal error in internal_Waitall: Unknown error class, error stack: > internal_Waitall(82)......................: MPI_Waitall(count=1, > array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed > MPIR_Waitall(1099)........................: > MPIR_Waitall_impl(1011)...................: > MPIR_Waitall_state(976)...................: > MPIDI_CH3i_Progress_wait(187).............: an error occurred while > handling an event returned by MPIDI_CH3I_Sock_Wait() > MPIDI_CH3I_Progress_handle_sock_event(411): > ReadMoreData(744).........................: ch3|sock|immedread > 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880 > MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains > invalid memory (set=0,sock=1,errno=14:Bad address) > > Sometimes the error message isn't appearing but for example I'm trying to > print size of the matrix but it isn't working. > If necessary, my Configure options --download-mpich --download-hwloc > --download-pastix --download-hypre --download-ml --download-ctetgen > --download-triangle --download-exodusii --download-netcdf --download-zlib > --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 > --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g > -O2" --with-debugging=1 > > Version: Petsc Release Version 3.20.0 > > Thank you, > Guer > > Sent with Proton Mail secure email. > > ------- Original Message ------- > On Thursday, October 12th, 2023 at 12:59 AM, erdemguer < > erdemguer at proton.me> wrote: > > Thank you! That's exactly what I need. > > Sent with Proton Mail secure email. > > ------- Original Message ------- > On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley < > knepley at gmail.com> wrote: > > On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: > >> Hi again, >> > > I see the problem. 
FV ghosts mean extra boundary cells added in FV methods > using DMPlexCreateGhostCells() in order to impose boundary conditions. They > are not the "ghost" cells for overlapping parallel decompositions. I have > changed your code to give you what you want. It is attached. > > Thanks, > > Matt > >> Here is my code: >> #include >> static char help[] = "dmplex"; >> >> int main(int argc, char **argv) >> { >> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >> DM dm, dm_dist; >> PetscSection section; >> PetscInt cStart, cEndInterior, cEnd, rank; >> PetscInt nc[3] = {3, 3, 3}; >> PetscReal upper[3] = {1, 1, 1}; >> >> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >> >> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, >> NULL, PETSC_TRUE, &dm); >> DMViewFromOptions(dm, NULL, "-dm1_view"); >> PetscCall(DMSetFromOptions(dm)); >> DMViewFromOptions(dm, NULL, "-dm2_view"); >> >> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >> DMPlexComputeCellTypes(dm); >> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, >> &cEndInterior, NULL)); >> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, >> cEndInterior: %d, cEnd: %d\n", rank, cStart, >> cEndInterior, cEnd); >> >> PetscInt nField = 1, nDof = 3, field = 0; >> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >> PetscSectionSetNumFields(section, nField); >> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >> for (PetscInt p = cStart; p < cEnd; p++) >> { >> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >> PetscCall(PetscSectionSetDof(section, p, nDof)); >> } >> >> PetscCall(PetscSectionSetUp(section)); >> >> DMSetLocalSection(dm, section); >> DMViewFromOptions(dm, NULL, "-dm3_view"); >> >> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >> DMViewFromOptions(dm, NULL, "-dm4_view"); >> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >> if (dm_dist) >> { >> DMDestroy(&dm); >> dm = dm_dist; >> } >> DMViewFromOptions(dm, NULL, "-dm5_view"); >> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >> DMPlexComputeCellTypes(dm); >> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, >> &cEndInterior, NULL)); >> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, >> cEndInterior: %d, cEnd: %d\n", rank, cStart, >> cEndInterior, cEnd); >> >> DMDestroy(&dm); >> PetscCall(PetscFinalize()); >> } >> >> This codes output is currently (on 2 processors) is: >> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27 >> After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >> >> DMView outputs: >> dm1_view (after creation): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 64 0 >> Number of 1-cells per rank: 144 0 >> Number of 2-cells per rank: 108 0 >> Number of 3-cells per rank: 27 0 >> Labels: >> marker: 1 strata with value/size (1 (218)) >> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 >> (9)) >> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >> celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >> >> dm2_view (after setfromoptions): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 40 46 >> Number of 1-cells per rank: 83 95 >> Number of 2-cells per rank: 57 64 >> Number of 3-cells 
per rank: 13 14 >> Labels: >> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >> marker: 1 strata with value/size (1 (109)) >> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >> >> dm3_view (after setting local section): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 40 46 >> Number of 1-cells per rank: 83 95 >> Number of 2-cells per rank: 57 64 >> Number of 3-cells per rank: 13 14 >> Labels: >> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >> marker: 1 strata with value/size (1 (109)) >> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >> Field Field_0: >> adjacency FEM >> >> dm4_view (after setting adjacency): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 40 46 >> Number of 1-cells per rank: 83 95 >> Number of 2-cells per rank: 57 64 >> Number of 3-cells per rank: 13 14 >> Labels: >> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >> marker: 1 strata with value/size (1 (109)) >> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >> Field Field_0: >> adjacency FVM++ >> >> dm5_view (after distribution): >> DM Object: Parallel Mesh 2 MPI processes >> type: plex >> Parallel Mesh in 3 dimensions: >> Number of 0-cells per rank: 64 60 >> Number of 1-cells per rank: 144 133 >> Number of 2-cells per rank: 108 98 >> Number of 3-cells per rank: 27 24 >> Labels: >> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >> marker: 1 strata with value/size (1 (218)) >> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 >> (9)) >> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >> Field Field_0: >> adjacency FVM++ >> >> Thanks, >> Guer. >> Sent with Proton Mail secure email. >> >> ------- Original Message ------- >> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley < >> knepley at gmail.com> wrote: >> >> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >> >>> >>> Hi, >>> Sorry for my late response. I tried with your suggestions and I think I >>> made a progress. But I still got issues. Let me explain my latest mesh >>> routine: >>> >>> >>> 1. DMPlexCreateBoxMesh >>> 2. DMSetFromOptions >>> 3. PetscSectionCreate >>> 4. PetscSectionSetNumFields >>> 5. PetscSectionSetFieldDof >>> 6. PetscSectionSetDof >>> 7. PetscSectionSetUp >>> 8. DMSetLocalSection >>> 9. DMSetAdjacency >>> 10. DMPlexDistribute >>> >>> >>> It's still not working but it's promising, if I call >>> DMPlexGetDepthStratum for cells, I can see that after distribution >>> processors have more cells. >>> >> >> Please send the output of DMPlexView() for each incarnation of the mesh. >> What I do is put >> >> DMViewFromOptions(dm, NULL, "-dm1_view") >> >> >> with a different string after each call. >> >>> But I couldn't figure out how to decide where the ghost/processor >>> boundary cells start. >>> >> >> Please send the actual code because the above is not specific enough. For >> example, you will not have >> "ghost cells" unless you partition with overlap. This is because by >> default cells are the partitioned quantity, >> so each process gets a unique set. 
>> >> Thanks, >> >> Matt >> >>> In older mails I saw there is a function DMPlexGetHybridBounds but I >>> think that function is deprecated. I tried to use, >>> DMPlexGetCellTypeStratum as in ts/tutorials/ex11_sa.c but I'm getting >>> -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, >>> DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling >>> DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. >>> I think I can calculate the ghost cell indices using cStart/cEnd before & >>> after distribution but I think there is a better way I'm currently missing. >>> >>> Thanks again, >>> Guer. >>> >>> ------- Original Message ------- >>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < >>> knepley at gmail.com> wrote: >>> >>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> >>>> Hi, >>>> >>>> I am currently using DMPlex in my code. It runs serially at the moment, >>>> but I'm interested in adding parallel options. Here is my workflow: >>>> >>>> Create a DMPlex mesh from GMSH. >>>> Reorder it with DMPlexPermute. >>>> Create necessary pre-processing arrays related to the mesh/problem. >>>> Create field(s) with multi-dofs. >>>> Create residual vectors. >>>> Define a function to calculate the residual for each cell and, use SNES. >>>> As you can see, I'm not using FV or FE structures (most examples do). >>>> Now, I'm trying to implement this in parallel using a similar approach. >>>> However, I'm struggling to understand how to create corresponding vectors >>>> and how to obtain index sets for each processor. Is there a tutorial or >>>> paper that covers this topic? >>>> >>> >>> The intention was that there is enough information in the manual to do >>> this. >>> >>> Using PetscFE/PetscFV is not required. However, I strongly encourage you >>> to use PetscSection. Without this, it would be incredibly hard to do what >>> you want. Once the DM has a Section, it can do things like automatically >>> create vectors and matrices for you. It can redistribute them, subset them, >>> etc. The Section describes how dofs are assigned to pieces of the mesh >>> (mesh points). This is in the manual, and there are a few examples that do >>> it by hand. >>> >>> So I suggest changing your code to use PetscSection, and then letting us >>> know if things still do not work. >>> >>> Thanks, >>> >>> Matt >>> >>>> Thank you. >>>> Guer. >>>> >>>> Sent with Proton Mail secure email. >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1.c Type: application/octet-stream Size: 3127 bytes Desc: not available URL: From erdemguer at proton.me Mon Oct 16 05:54:17 2023 From: erdemguer at proton.me (erdemguer) Date: Mon, 16 Oct 2023 10:54:17 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: Hey again. This code outputs for example: After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 [0] m: 39 n: 39[1] m: 42 n: 42 Shouldn't it be 39 x 81 and 42 x 72 because of the overlapping cells on processor boundaries? P.S. It looks like I should use PetscFV or something like that at the first place. At first I thought, "I will just use SNES, I will compute only residual and jacobian on cells so why do bother with PetscFV?" So Thanks, E. Sent with [Proton Mail](https://proton.me/) secure email. ------- Original Message ------- On Friday, October 13th, 2023 at 3:00 PM, Matthew Knepley wrote: > On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: > >> Hi, unfortunately it's me again. >> >> I have some weird troubles with creating matrix with DMPlex. Actually I might not need to create matrix explicitly, but SNESSolve crashes at there too. So, I updated the code you provided. When I tried to use DMCreateMatrix() at first, I got an error "Unknown discretization type for field 0" at first I applied DMSetLocalSection() and this error is gone. But this time when I run the code with multiple processors, sometimes I got an output like: > > Some setup was out of order so the section size on proc1 was 0, and I was not good about checking this. > I have fixed it and attached. 
> > Thanks, > > Matt > >> Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 >> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >> [1] ghost cell 14 >> [1] ghost cell 15 >> [1] ghost cell 16 >> [1] ghost cell 17 >> [1] ghost cell 18 >> [1] ghost cell 19 >> [1] ghost cell 20 >> [1] ghost cell 21 >> [1] ghost cell 22 >> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 >> [0] ghost cell 13 >> [0] ghost cell 14 >> [0] ghost cell 15 >> [0] ghost cell 16 >> [0] ghost cell 17 >> [0] ghost cell 18 >> [0] ghost cell 19 >> [0] ghost cell 20 >> [0] ghost cell 21 >> [0] ghost cell 22 >> [0] ghost cell 23 >> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 >> Fatal error in internal_Waitall: Unknown error class, error stack: >> internal_Waitall(82)......................: MPI_Waitall(count=1, array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed >> MPIR_Waitall(1099)........................: >> MPIR_Waitall_impl(1011)...................: >> MPIR_Waitall_state(976)...................: >> MPIDI_CH3i_Progress_wait(187).............: an error occurred while handling an event returned by MPIDI_CH3I_Sock_Wait() >> MPIDI_CH3I_Progress_handle_sock_event(411): >> ReadMoreData(744).........................: ch3|sock|immedread 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address) >> >> Sometimes the error message isn't appearing but for example I'm trying to print size of the matrix but it isn't working. >> If necessary, my Configure options --download-mpich --download-hwloc --download-pastix --download-hypre --download-ml --download-ctetgen --download-triangle --download-exodusii --download-netcdf --download-zlib --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" --with-debugging=1 >> >> Version: Petsc Release Version 3.20.0 >> >> Thank you, >> Guer >> >> Sent with [Proton Mail](https://proton.me/) secure email. >> >> ------- Original Message ------- >> On Thursday, October 12th, 2023 at 12:59 AM, erdemguer wrote: >> >>> Thank you! That's exactly what I need. >>> >>> Sent with [Proton Mail](https://proton.me/) secure email. >>> >>> ------- Original Message ------- >>> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley wrote: >>> >>>> On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: >>>> >>>>> Hi again, >>>> >>>> I see the problem. FV ghosts mean extra boundary cells added in FV methods using DMPlexCreateGhostCells() in order to impose boundary conditions. They are not the "ghost" cells for overlapping parallel decompositions. I have changed your code to give you what you want. It is attached. 
>>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> Here is my code: >>>>> #include >>>>> static char help[] = "dmplex"; >>>>> >>>>> int main(int argc, char **argv) >>>>> { >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>>>> DM dm, dm_dist; >>>>> PetscSection section; >>>>> PetscInt cStart, cEndInterior, cEnd, rank; >>>>> PetscInt nc[3] = {3, 3, 3}; >>>>> PetscReal upper[3] = {1, 1, 1}; >>>>> >>>>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>>>> >>>>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, NULL, PETSC_TRUE, &dm); >>>>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>>>> PetscCall(DMSetFromOptions(dm)); >>>>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>>>> >>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>> DMPlexComputeCellTypes(dm); >>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, &cEndInterior, NULL)); >>>>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>> cEndInterior, cEnd); >>>>> >>>>> PetscInt nField = 1, nDof = 3, field = 0; >>>>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>>>> PetscSectionSetNumFields(section, nField); >>>>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>>>> for (PetscInt p = cStart; p < cEnd; p++) >>>>> { >>>>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>>>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>>>> } >>>>> >>>>> PetscCall(PetscSectionSetUp(section)); >>>>> >>>>> DMSetLocalSection(dm, section); >>>>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>>>> >>>>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>>>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>>>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>>>> if (dm_dist) >>>>> { >>>>> DMDestroy(&dm); >>>>> dm = dm_dist; >>>>> } >>>>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>> DMPlexComputeCellTypes(dm); >>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL)); >>>>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>> cEndInterior, cEnd); >>>>> >>>>> DMDestroy(&dm); >>>>> PetscCall(PetscFinalize());} >>>>> >>>>> This codes output is currently (on 2 processors) is: >>>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>>>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>>>> >>>>> DMView outputs: >>>>> dm1_view (after creation): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 64 0 >>>>> Number of 1-cells per rank: 144 0 >>>>> Number of 2-cells per rank: 108 0 >>>>> Number of 3-cells per rank: 27 0 >>>>> Labels: >>>>> marker: 1 strata with value/size (1 (218)) >>>>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 (9)) >>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >>>>> >>>>> dm2_view (after setfromoptions): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 40 46 >>>>> Number of 1-cells per rank: 83 95 >>>>> Number of 2-cells per rank: 57 64 >>>>> Number of 3-cells per rank: 13 14 >>>>> Labels: 
>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>> marker: 1 strata with value/size (1 (109)) >>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>> >>>>> dm3_view (after setting local section): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 40 46 >>>>> Number of 1-cells per rank: 83 95 >>>>> Number of 2-cells per rank: 57 64 >>>>> Number of 3-cells per rank: 13 14 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>> marker: 1 strata with value/size (1 (109)) >>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>> Field Field_0: adjacency FEM >>>>> >>>>> dm4_view (after setting adjacency): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 40 46 >>>>> Number of 1-cells per rank: 83 95 >>>>> Number of 2-cells per rank: 57 64 >>>>> Number of 3-cells per rank: 13 14 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>> marker: 1 strata with value/size (1 (109)) >>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>> Field Field_0: adjacency FVM++ >>>>> >>>>> dm5_view (after distribution): >>>>> DM Object: Parallel Mesh 2 MPI processes >>>>> type: plex >>>>> Parallel Mesh in 3 dimensions: >>>>> Number of 0-cells per rank: 64 60 >>>>> Number of 1-cells per rank: 144 133 >>>>> Number of 2-cells per rank: 108 98 >>>>> Number of 3-cells per rank: 27 24 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>>> marker: 1 strata with value/size (1 (218)) >>>>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 (9)) >>>>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>>>> Field Field_0: adjacency FVM++ >>>>> >>>>> Thanks, >>>>> Guer. >>>>> >>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>> >>>>> ------- Original Message ------- >>>>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley wrote: >>>>> >>>>>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>>>>> >>>>>>> Hi, >>>>>>> Sorry for my late response. I tried with your suggestions and I think I made a progress. But I still got issues. Let me explain my latest mesh routine: >>>>>>> >>>>>>> - DMPlexCreateBoxMesh >>>>>>> >>>>>>> - DMSetFromOptions >>>>>>> - PetscSectionCreate >>>>>>> - PetscSectionSetNumFields >>>>>>> - PetscSectionSetFieldDof >>>>>>> >>>>>>> - PetscSectionSetDof >>>>>>> >>>>>>> - PetscSectionSetUp >>>>>>> - DMSetLocalSection >>>>>>> - DMSetAdjacency >>>>>>> - DMPlexDistribute >>>>>>> >>>>>>> It's still not working but it's promising, if I call DMPlexGetDepthStratum for cells, I can see that after distribution processors have more cells. >>>>>> >>>>>> Please send the output of DMPlexView() for each incarnation of the mesh. What I do is put >>>>>> >>>>>> DMViewFromOptions(dm, NULL, "-dm1_view") >>>>>> >>>>>> with a different string after each call. >>>>>> >>>>>>> But I couldn't figure out how to decide where the ghost/processor boundary cells start. >>>>>> >>>>>> Please send the actual code because the above is not specific enough. 
For example, you will not have >>>>>> "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, >>>>>> so each process gets a unique set. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> In older mails I saw there is a function DMPlexGetHybridBounds but I think that function is deprecated. I tried to use, DMPlexGetCellTypeStratumas in ts/tutorials/ex11_sa.c but I'm getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution but I think there is a better way I'm currently missing. >>>>>>> >>>>>>> Thanks again, >>>>>>> Guer. >>>>>>> >>>>>>> ------- Original Message ------- >>>>>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley wrote: >>>>>>> >>>>>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>>>>> >>>>>>>>> Create a DMPlex mesh from GMSH. >>>>>>>>> Reorder it with DMPlexPermute. >>>>>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>>>>> Create field(s) with multi-dofs. >>>>>>>>> Create residual vectors. >>>>>>>>> Define a function to calculate the residual for each cell and, use SNES. >>>>>>>>> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? >>>>>>>> >>>>>>>> The intention was that there is enough information in the manual to do this. >>>>>>>> >>>>>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. >>>>>>>> >>>>>>>> So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>>> Thank you. >>>>>>>>> Guer. >>>>>>>>> >>>>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>>>> >>>>>> -- >>>>>> >>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>>>> -- Norbert Wiener >>>>>> >>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>> >>>> -- >>>> >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Oct 16 08:11:58 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Oct 2023 09:11:58 -0400 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: On Mon, Oct 16, 2023 at 6:54?AM erdemguer wrote: > Hey again. > > This code outputs for example: > > After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 > After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 > [0] m: 39 n: 39 > [1] m: 42 n: 42 > > Shouldn't it be 39 x 81 and 42 x 72 because of the overlapping cells on > processor boundaries? > Here is my output master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_refine 1 Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 Before Distribution Rank: 0, cStart: 0, cEndInterior: 32, cEnd: 32 After Distribution Rank: 1, cStart: 0, cEndInterior: 16, cEnd: 24 After Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 24 [0] m: 48 n: 48 [1] m: 48 n: 48 The mesh is 4x4 and also split into two triangles, so 32 triangles. Then we split it and have 8 overlap cells on each side. You can get quads using master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_plex_simplex 0 -dm_refine 1 -dm_view Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 Before Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 16 After Distribution Rank: 1, cStart: 0, cEndInterior: 8, cEnd: 12 After Distribution Rank: 0, cStart: 0, cEndInterior: 8, cEnd: 12 [0] m: 24 n: 24 [1] m: 24 n: 24 It is the same 4x4 mesh, but now with quads. Thanks, Matt P.S. It looks like I should use PetscFV or something like that at the first > place. At first I thought, "I will just use SNES, I will compute only > residual and jacobian on cells so why do bother with PetscFV?" So > > Thanks, > E. > Sent with Proton Mail secure email. > > ------- Original Message ------- > On Friday, October 13th, 2023 at 3:00 PM, Matthew Knepley < > knepley at gmail.com> wrote: > > On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: > >> Hi, unfortunately it's me again. >> >> I have some weird troubles with creating matrix with DMPlex. Actually I >> might not need to create matrix explicitly, but SNESSolve crashes at there >> too. So, I updated the code you provided. When I tried to use >> DMCreateMatrix() at first, I got an error "Unknown discretization type >> for field 0" at first I applied DMSetLocalSection() and this error is gone. 
>> But this time when I run the code with multiple processors, sometimes I got >> an output like: >> > > Some setup was out of order so the section size on proc1 was 0, and I was > not good about checking this. > I have fixed it and attached. > > Thanks, > > Matt > > Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 >> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >> [1] ghost cell 14 >> [1] ghost cell 15 >> [1] ghost cell 16 >> [1] ghost cell 17 >> [1] ghost cell 18 >> [1] ghost cell 19 >> [1] ghost cell 20 >> [1] ghost cell 21 >> [1] ghost cell 22 >> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 >> [0] ghost cell 13 >> [0] ghost cell 14 >> [0] ghost cell 15 >> [0] ghost cell 16 >> [0] ghost cell 17 >> [0] ghost cell 18 >> [0] ghost cell 19 >> [0] ghost cell 20 >> [0] ghost cell 21 >> [0] ghost cell 22 >> [0] ghost cell 23 >> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 >> Fatal error in internal_Waitall: Unknown error class, error stack: >> internal_Waitall(82)......................: MPI_Waitall(count=1, >> array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed >> MPIR_Waitall(1099)........................: >> MPIR_Waitall_impl(1011)...................: >> MPIR_Waitall_state(976)...................: >> MPIDI_CH3i_Progress_wait(187).............: an error occurred while >> handling an event returned by MPIDI_CH3I_Sock_Wait() >> MPIDI_CH3I_Progress_handle_sock_event(411): >> ReadMoreData(744).........................: ch3|sock|immedread >> 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880 >> MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains >> invalid memory (set=0,sock=1,errno=14:Bad address) >> >> Sometimes the error message isn't appearing but for example I'm trying to >> print size of the matrix but it isn't working. >> If necessary, my Configure options --download-mpich --download-hwloc >> --download-pastix --download-hypre --download-ml --download-ctetgen >> --download-triangle --download-exodusii --download-netcdf --download-zlib >> --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 >> --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g >> -O2" --with-debugging=1 >> >> Version: Petsc Release Version 3.20.0 >> >> Thank you, >> Guer >> >> Sent with Proton Mail secure email. >> >> ------- Original Message ------- >> On Thursday, October 12th, 2023 at 12:59 AM, erdemguer < >> erdemguer at proton.me> wrote: >> >> Thank you! That's exactly what I need. >> >> Sent with Proton Mail secure email. >> >> ------- Original Message ------- >> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley < >> knepley at gmail.com> wrote: >> >> On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: >> >>> Hi again, >>> >> >> I see the problem. FV ghosts mean extra boundary cells added in FV >> methods using DMPlexCreateGhostCells() in order to impose boundary >> conditions. They are not the "ghost" cells for overlapping parallel >> decompositions. I have changed your code to give you what you want. It is >> attached. 
>> >> Thanks, >> >> Matt >> >>> Here is my code: >>> #include >>> static char help[] = "dmplex"; >>> >>> int main(int argc, char **argv) >>> { >>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>> DM dm, dm_dist; >>> PetscSection section; >>> PetscInt cStart, cEndInterior, cEnd, rank; >>> PetscInt nc[3] = {3, 3, 3}; >>> PetscReal upper[3] = {1, 1, 1}; >>> >>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>> >>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, >>> NULL, PETSC_TRUE, &dm); >>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>> PetscCall(DMSetFromOptions(dm)); >>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>> >>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>> DMPlexComputeCellTypes(dm); >>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, >>> &cEndInterior, NULL)); >>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, >>> cEndInterior: %d, cEnd: %d\n", rank, cStart, >>> cEndInterior, cEnd); >>> >>> PetscInt nField = 1, nDof = 3, field = 0; >>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>> PetscSectionSetNumFields(section, nField); >>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>> for (PetscInt p = cStart; p < cEnd; p++) >>> { >>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>> } >>> >>> PetscCall(PetscSectionSetUp(section)); >>> >>> DMSetLocalSection(dm, section); >>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>> >>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>> if (dm_dist) >>> { >>> DMDestroy(&dm); >>> dm = dm_dist; >>> } >>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>> DMPlexComputeCellTypes(dm); >>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, >>> &cEndInterior, NULL)); >>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, >>> cEndInterior: %d, cEnd: %d\n", rank, cStart, >>> cEndInterior, cEnd); >>> >>> DMDestroy(&dm); >>> PetscCall(PetscFinalize()); >>> } >>> >>> This codes output is currently (on 2 processors) is: >>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27 >>> After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>> >>> DMView outputs: >>> dm1_view (after creation): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 64 0 >>> Number of 1-cells per rank: 144 0 >>> Number of 2-cells per rank: 108 0 >>> Number of 3-cells per rank: 27 0 >>> Labels: >>> marker: 1 strata with value/size (1 (218)) >>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), >>> 2 (9)) >>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>> celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >>> >>> dm2_view (after setfromoptions): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 40 46 >>> Number of 1-cells per rank: 83 95 >>> Number of 2-cells per rank: 57 64 >>> Number of 3-cells per rank: 13 14 >>> Labels: >>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>> marker: 1 strata with value/size (1 (109)) >>> Face Sets: 5 strata with 
value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>> >>> dm3_view (after setting local section): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 40 46 >>> Number of 1-cells per rank: 83 95 >>> Number of 2-cells per rank: 57 64 >>> Number of 3-cells per rank: 13 14 >>> Labels: >>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>> marker: 1 strata with value/size (1 (109)) >>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>> Field Field_0: >>> adjacency FEM >>> >>> dm4_view (after setting adjacency): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 40 46 >>> Number of 1-cells per rank: 83 95 >>> Number of 2-cells per rank: 57 64 >>> Number of 3-cells per rank: 13 14 >>> Labels: >>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>> marker: 1 strata with value/size (1 (109)) >>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>> Field Field_0: >>> adjacency FVM++ >>> >>> dm5_view (after distribution): >>> DM Object: Parallel Mesh 2 MPI processes >>> type: plex >>> Parallel Mesh in 3 dimensions: >>> Number of 0-cells per rank: 64 60 >>> Number of 1-cells per rank: 144 133 >>> Number of 2-cells per rank: 108 98 >>> Number of 3-cells per rank: 27 24 >>> Labels: >>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>> marker: 1 strata with value/size (1 (218)) >>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), >>> 6 (9)) >>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>> Field Field_0: >>> adjacency FVM++ >>> >>> Thanks, >>> Guer. >>> Sent with Proton Mail secure email. >>> >>> ------- Original Message ------- >>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley < >>> knepley at gmail.com> wrote: >>> >>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>> >>>> >>>> Hi, >>>> Sorry for my late response. I tried with your suggestions and I think I >>>> made a progress. But I still got issues. Let me explain my latest mesh >>>> routine: >>>> >>>> >>>> 1. DMPlexCreateBoxMesh >>>> 2. DMSetFromOptions >>>> 3. PetscSectionCreate >>>> 4. PetscSectionSetNumFields >>>> 5. PetscSectionSetFieldDof >>>> 6. PetscSectionSetDof >>>> 7. PetscSectionSetUp >>>> 8. DMSetLocalSection >>>> 9. DMSetAdjacency >>>> 10. DMPlexDistribute >>>> >>>> >>>> It's still not working but it's promising, if I call >>>> DMPlexGetDepthStratum for cells, I can see that after distribution >>>> processors have more cells. >>>> >>> >>> Please send the output of DMPlexView() for each incarnation of the mesh. >>> What I do is put >>> >>> DMViewFromOptions(dm, NULL, "-dm1_view") >>> >>> >>> with a different string after each call. >>> >>>> But I couldn't figure out how to decide where the ghost/processor >>>> boundary cells start. >>>> >>> >>> Please send the actual code because the above is not specific enough. >>> For example, you will not have >>> "ghost cells" unless you partition with overlap. This is because by >>> default cells are the partitioned quantity, >>> so each process gets a unique set. 
>>> >>> Thanks, >>> >>> Matt >>> >>>> In older mails I saw there is a function DMPlexGetHybridBounds but I >>>> think that function is deprecated. I tried to use, >>>> DMPlexGetCellTypeStratum as in ts/tutorials/ex11_sa.c but I'm getting >>>> -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, >>>> DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling >>>> DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. >>>> I think I can calculate the ghost cell indices using cStart/cEnd before & >>>> after distribution but I think there is a better way I'm currently missing. >>>> >>>> Thanks again, >>>> Guer. >>>> >>>> ------- Original Message ------- >>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < >>>> knepley at gmail.com> wrote: >>>> >>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users < >>>> petsc-users at mcs.anl.gov> wrote: >>>> >>>>> Hi, >>>>> >>>>> I am currently using DMPlex in my code. It runs serially at the >>>>> moment, but I'm interested in adding parallel options. Here is my workflow: >>>>> >>>>> Create a DMPlex mesh from GMSH. >>>>> Reorder it with DMPlexPermute. >>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>> Create field(s) with multi-dofs. >>>>> Create residual vectors. >>>>> Define a function to calculate the residual for each cell and, use >>>>> SNES. >>>>> As you can see, I'm not using FV or FE structures (most examples do). >>>>> Now, I'm trying to implement this in parallel using a similar approach. >>>>> However, I'm struggling to understand how to create corresponding vectors >>>>> and how to obtain index sets for each processor. Is there a tutorial or >>>>> paper that covers this topic? >>>>> >>>> >>>> The intention was that there is enough information in the manual to do >>>> this. >>>> >>>> Using PetscFE/PetscFV is not required. However, I strongly encourage >>>> you to use PetscSection. Without this, it would be incredibly hard to do >>>> what you want. Once the DM has a Section, it can do things like >>>> automatically create vectors and matrices for you. It can redistribute >>>> them, subset them, etc. The Section describes how dofs are assigned to >>>> pieces of the mesh (mesh points). This is in the manual, and there are a >>>> few examples that do it by hand. >>>> >>>> So I suggest changing your code to use PetscSection, and then letting >>>> us know if things still do not work. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> Thank you. >>>>> Guer. >>>>> >>>>> Sent with Proton Mail secure email. >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From erdemguer at proton.me Mon Oct 16 08:22:43 2023 From: erdemguer at proton.me (erdemguer) Date: Mon, 16 Oct 2023 13:22:43 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: Thank you for your responses many times. Looks like I'm missing something, sorry for my confusion, but let's take processor 0 on your first output. cEndInterior: 16 and cEnd: 24. I'm calculating jacobian for cell=14, dof=0 (row = 42) and cell=18, dof=2 (col = 56) have influence on it. (Cell 18 is on processor boundary) Shouldn't I have to write values on the (42,56)? Thanks, Guer Sent with [Proton Mail](https://proton.me/) secure email. ------- Original Message ------- On Monday, October 16th, 2023 at 4:11 PM, Matthew Knepley wrote: > On Mon, Oct 16, 2023 at 6:54?AM erdemguer wrote: > >> Hey again. >> >> This code outputs for example: >> >> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 >> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 >> [0] m: 39 n: 39[1] m: 42 n: 42 >> >> Shouldn't it be 39 x 81 and 42 x 72 because of the overlapping cells on processor boundaries? > > Here is my output > > master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_refine 1 > Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 > Before Distribution Rank: 0, cStart: 0, cEndInterior: 32, cEnd: 32 > After Distribution Rank: 1, cStart: 0, cEndInterior: 16, cEnd: 24 > After Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 24 > [0] m: 48 n: 48 > [1] m: 48 n: 48 > > The mesh is 4x4 and also split into two triangles, so 32 triangles. Then we split it and have 8 overlap cells on each side. You can get quads using > > master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_plex_simplex 0 -dm_refine 1 -dm_view > Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 > Before Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 16 > After Distribution Rank: 1, cStart: 0, cEndInterior: 8, cEnd: 12 > After Distribution Rank: 0, cStart: 0, cEndInterior: 8, cEnd: 12 > [0] m: 24 n: 24 > [1] m: 24 n: 24 > > It is the same 4x4 mesh, but now with quads. > > Thanks, > > Matt > >> P.S. It looks like I should use PetscFV or something like that at the first place. At first I thought, "I will just use SNES, I will compute only residual and jacobian on cells so why do bother with PetscFV?" So >> >> Thanks, >> E. >> >> Sent with [Proton Mail](https://proton.me/) secure email. >> >> ------- Original Message ------- >> On Friday, October 13th, 2023 at 3:00 PM, Matthew Knepley wrote: >> >>> On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: >>> >>>> Hi, unfortunately it's me again. >>>> >>>> I have some weird troubles with creating matrix with DMPlex. 
Actually I might not need to create matrix explicitly, but SNESSolve crashes at there too. So, I updated the code you provided. When I tried to use DMCreateMatrix() at first, I got an error "Unknown discretization type for field 0" at first I applied DMSetLocalSection() and this error is gone. But this time when I run the code with multiple processors, sometimes I got an output like: >>> >>> Some setup was out of order so the section size on proc1 was 0, and I was not good about checking this. >>> I have fixed it and attached. >>> >>> Thanks, >>> >>> Matt >>> >>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 >>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >>>> [1] ghost cell 14 >>>> [1] ghost cell 15 >>>> [1] ghost cell 16 >>>> [1] ghost cell 17 >>>> [1] ghost cell 18 >>>> [1] ghost cell 19 >>>> [1] ghost cell 20 >>>> [1] ghost cell 21 >>>> [1] ghost cell 22 >>>> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 >>>> [0] ghost cell 13 >>>> [0] ghost cell 14 >>>> [0] ghost cell 15 >>>> [0] ghost cell 16 >>>> [0] ghost cell 17 >>>> [0] ghost cell 18 >>>> [0] ghost cell 19 >>>> [0] ghost cell 20 >>>> [0] ghost cell 21 >>>> [0] ghost cell 22 >>>> [0] ghost cell 23 >>>> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 >>>> Fatal error in internal_Waitall: Unknown error class, error stack: >>>> internal_Waitall(82)......................: MPI_Waitall(count=1, array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed >>>> MPIR_Waitall(1099)........................: >>>> MPIR_Waitall_impl(1011)...................: >>>> MPIR_Waitall_state(976)...................: >>>> MPIDI_CH3i_Progress_wait(187).............: an error occurred while handling an event returned by MPIDI_CH3I_Sock_Wait() >>>> MPIDI_CH3I_Progress_handle_sock_event(411): >>>> ReadMoreData(744).........................: ch3|sock|immedread 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address) >>>> >>>> Sometimes the error message isn't appearing but for example I'm trying to print size of the matrix but it isn't working. >>>> If necessary, my Configure options --download-mpich --download-hwloc --download-pastix --download-hypre --download-ml --download-ctetgen --download-triangle --download-exodusii --download-netcdf --download-zlib --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" --with-debugging=1 >>>> >>>> Version: Petsc Release Version 3.20.0 >>>> >>>> Thank you, >>>> Guer >>>> >>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>> >>>> ------- Original Message ------- >>>> On Thursday, October 12th, 2023 at 12:59 AM, erdemguer wrote: >>>> >>>>> Thank you! That's exactly what I need. >>>>> >>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>> >>>>> ------- Original Message ------- >>>>> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley wrote: >>>>> >>>>>> On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: >>>>>> >>>>>>> Hi again, >>>>>> >>>>>> I see the problem. FV ghosts mean extra boundary cells added in FV methods using DMPlexCreateGhostCells() in order to impose boundary conditions. They are not the "ghost" cells for overlapping parallel decompositions. I have changed your code to give you what you want. It is attached. 
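One way to tell the parallel-overlap "ghosts" apart from FV boundary ghosts programmatically: cells that a rank only copies from a neighboring rank appear as leaves of the DM's point SF. A small helper along those lines (an illustration, not the attached file) could be:

    #include <petscdmplex.h>

    /* Count owned vs. overlap (unowned) cells: any cell point that is a leaf of
       the point SF is a copy of a cell owned by another rank. */
    static PetscErrorCode CountOwnedCells(DM dm, PetscInt *nOwned, PetscInt *nOverlap)
    {
      PetscSF         sf;
      PetscInt        cStart, cEnd, nleaves, l;
      const PetscInt *ilocal;

      PetscFunctionBeginUser;
      PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd));
      PetscCall(DMGetPointSF(dm, &sf));
      PetscCall(PetscSFGetGraph(sf, NULL, &nleaves, &ilocal, NULL));
      if (nleaves < 0) nleaves = 0; /* graph not set: purely local mesh */
      *nOverlap = 0;
      for (l = 0; l < nleaves; ++l) {
        const PetscInt p = ilocal ? ilocal[l] : l; /* NULL ilocal means leaves are 0..nleaves-1 */
        if (p >= cStart && p < cEnd) ++(*nOverlap);
      }
      *nOwned = (cEnd - cStart) - *nOverlap;
      PetscFunctionReturn(PETSC_SUCCESS);
    }

Whether one uses the SF or a cEndInterior-style split is a bookkeeping choice; the SF query works even if the overlap cells are not numbered contiguously.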
>>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> Here is my code: >>>>>>> #include >>>>>>> static char help[] = "dmplex"; >>>>>>> >>>>>>> int main(int argc, char **argv) >>>>>>> { >>>>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>>>>>> DM dm, dm_dist; >>>>>>> PetscSection section; >>>>>>> PetscInt cStart, cEndInterior, cEnd, rank; >>>>>>> PetscInt nc[3] = {3, 3, 3}; >>>>>>> PetscReal upper[3] = {1, 1, 1}; >>>>>>> >>>>>>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>>>>>> >>>>>>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, NULL, PETSC_TRUE, &dm); >>>>>>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>>>>>> PetscCall(DMSetFromOptions(dm)); >>>>>>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>>>>>> >>>>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>>>> DMPlexComputeCellTypes(dm); >>>>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, &cEndInterior, NULL)); >>>>>>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>>>> cEndInterior, cEnd); >>>>>>> >>>>>>> PetscInt nField = 1, nDof = 3, field = 0; >>>>>>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>>>>>> PetscSectionSetNumFields(section, nField); >>>>>>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>>>>>> for (PetscInt p = cStart; p < cEnd; p++) >>>>>>> { >>>>>>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>>>>>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>>>>>> } >>>>>>> >>>>>>> PetscCall(PetscSectionSetUp(section)); >>>>>>> >>>>>>> DMSetLocalSection(dm, section); >>>>>>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>>>>>> >>>>>>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>>>>>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>>>>>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>>>>>> if (dm_dist) >>>>>>> { >>>>>>> DMDestroy(&dm); >>>>>>> dm = dm_dist; >>>>>>> } >>>>>>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>>>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>>>> DMPlexComputeCellTypes(dm); >>>>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL)); >>>>>>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>>>> cEndInterior, cEnd); >>>>>>> >>>>>>> DMDestroy(&dm); >>>>>>> PetscCall(PetscFinalize());} >>>>>>> >>>>>>> This codes output is currently (on 2 processors) is: >>>>>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>>>>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>>>>>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>>>>>> >>>>>>> DMView outputs: >>>>>>> dm1_view (after creation): >>>>>>> DM Object: 2 MPI processes >>>>>>> type: plex >>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>> Number of 0-cells per rank: 64 0 >>>>>>> Number of 1-cells per rank: 144 0 >>>>>>> Number of 2-cells per rank: 108 0 >>>>>>> Number of 3-cells per rank: 27 0 >>>>>>> Labels: >>>>>>> marker: 1 strata with value/size (1 (218)) >>>>>>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 (9)) >>>>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >>>>>>> >>>>>>> dm2_view (after setfromoptions): >>>>>>> DM Object: 2 MPI processes >>>>>>> type: plex >>>>>>> DM_0x84000004_0 in 3 dimensions: 
>>>>>>> Number of 0-cells per rank: 40 46 >>>>>>> Number of 1-cells per rank: 83 95 >>>>>>> Number of 2-cells per rank: 57 64 >>>>>>> Number of 3-cells per rank: 13 14 >>>>>>> Labels: >>>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>>>> marker: 1 strata with value/size (1 (109)) >>>>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>>>> >>>>>>> dm3_view (after setting local section): >>>>>>> DM Object: 2 MPI processes >>>>>>> type: plex >>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>> Number of 0-cells per rank: 40 46 >>>>>>> Number of 1-cells per rank: 83 95 >>>>>>> Number of 2-cells per rank: 57 64 >>>>>>> Number of 3-cells per rank: 13 14 >>>>>>> Labels: >>>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>>>> marker: 1 strata with value/size (1 (109)) >>>>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>>>> Field Field_0: adjacency FEM >>>>>>> >>>>>>> dm4_view (after setting adjacency): >>>>>>> DM Object: 2 MPI processes >>>>>>> type: plex >>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>> Number of 0-cells per rank: 40 46 >>>>>>> Number of 1-cells per rank: 83 95 >>>>>>> Number of 2-cells per rank: 57 64 >>>>>>> Number of 3-cells per rank: 13 14 >>>>>>> Labels: >>>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>>>> marker: 1 strata with value/size (1 (109)) >>>>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>>>> Field Field_0: adjacency FVM++ >>>>>>> >>>>>>> dm5_view (after distribution): >>>>>>> DM Object: Parallel Mesh 2 MPI processes >>>>>>> type: plex >>>>>>> Parallel Mesh in 3 dimensions: >>>>>>> Number of 0-cells per rank: 64 60 >>>>>>> Number of 1-cells per rank: 144 133 >>>>>>> Number of 2-cells per rank: 108 98 >>>>>>> Number of 3-cells per rank: 27 24 >>>>>>> Labels: >>>>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>>>>> marker: 1 strata with value/size (1 (218)) >>>>>>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 (9)) >>>>>>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>>>>>> Field Field_0: adjacency FVM++ >>>>>>> >>>>>>> Thanks, >>>>>>> Guer. >>>>>>> >>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>>> >>>>>>> ------- Original Message ------- >>>>>>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley wrote: >>>>>>> >>>>>>>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> Sorry for my late response. I tried with your suggestions and I think I made a progress. But I still got issues. Let me explain my latest mesh routine: >>>>>>>>> >>>>>>>>> - DMPlexCreateBoxMesh >>>>>>>>> >>>>>>>>> - DMSetFromOptions >>>>>>>>> - PetscSectionCreate >>>>>>>>> - PetscSectionSetNumFields >>>>>>>>> - PetscSectionSetFieldDof >>>>>>>>> >>>>>>>>> - PetscSectionSetDof >>>>>>>>> >>>>>>>>> - PetscSectionSetUp >>>>>>>>> - DMSetLocalSection >>>>>>>>> - DMSetAdjacency >>>>>>>>> - DMPlexDistribute >>>>>>>>> >>>>>>>>> It's still not working but it's promising, if I call DMPlexGetDepthStratum for cells, I can see that after distribution processors have more cells. >>>>>>>> >>>>>>>> Please send the output of DMPlexView() for each incarnation of the mesh. 
What I do is put >>>>>>>> >>>>>>>> DMViewFromOptions(dm, NULL, "-dm1_view") >>>>>>>> >>>>>>>> with a different string after each call. >>>>>>>> >>>>>>>>> But I couldn't figure out how to decide where the ghost/processor boundary cells start. >>>>>>>> >>>>>>>> Please send the actual code because the above is not specific enough. For example, you will not have >>>>>>>> "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, >>>>>>>> so each process gets a unique set. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>>> In older mails I saw there is a function DMPlexGetHybridBounds but I think that function is deprecated. I tried to use, DMPlexGetCellTypeStratumas in ts/tutorials/ex11_sa.c but I'm getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution but I think there is a better way I'm currently missing. >>>>>>>>> >>>>>>>>> Thanks again, >>>>>>>>> Guer. >>>>>>>>> >>>>>>>>> ------- Original Message ------- >>>>>>>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley wrote: >>>>>>>>> >>>>>>>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>>>>>>> >>>>>>>>>>> Create a DMPlex mesh from GMSH. >>>>>>>>>>> Reorder it with DMPlexPermute. >>>>>>>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>>>>>>> Create field(s) with multi-dofs. >>>>>>>>>>> Create residual vectors. >>>>>>>>>>> Define a function to calculate the residual for each cell and, use SNES. >>>>>>>>>>> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? >>>>>>>>>> >>>>>>>>>> The intention was that there is enough information in the manual to do this. >>>>>>>>>> >>>>>>>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. >>>>>>>>>> >>>>>>>>>> So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>>> Thank you. >>>>>>>>>>> Guer. >>>>>>>>>>> >>>>>>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> >>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>>>> >>>>>> -- >>>>>> >>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>> >>> -- >>> >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Oct 16 08:26:14 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Oct 2023 09:26:14 -0400 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: On Mon, Oct 16, 2023 at 9:22?AM erdemguer wrote: > Thank you for your responses many times. Looks like I'm missing something, > sorry for my confusion, but let's take processor 0 on your first output. > cEndInterior: 16 and cEnd: 24. > I'm calculating jacobian for cell=14, dof=0 (row = 42) and cell=18, dof=2 > (col = 56) have influence on it. (Cell 18 is on processor boundary) > Shouldn't I have to write values on the (42,56)? > Imagine you are me getting this mail. When I mail you, I show you _exactly_ what I ran and which command line options I used. You do not. I provide you all the output. You do not. You can see that someone would only be guessing when replying to this email. Also note that you have two dofs per cell, so the cell numbers are not the row numbers for the Jacobian. Please send something reproducible when you want help on running. Thanks, Matt > Thanks, > Guer > > Sent with Proton Mail secure email. > > ------- Original Message ------- > On Monday, October 16th, 2023 at 4:11 PM, Matthew Knepley < > knepley at gmail.com> wrote: > > On Mon, Oct 16, 2023 at 6:54?AM erdemguer wrote: > >> Hey again. >> >> This code outputs for example: >> >> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 >> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 >> [0] m: 39 n: 39 >> [1] m: 42 n: 42 >> >> Shouldn't it be 39 x 81 and 42 x 72 because of the overlapping cells on >> processor boundaries? 
>> > > Here is my output > > master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 > -malloc_debug 0 -dm_refine 1 > Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 > Before Distribution Rank: 0, cStart: 0, cEndInterior: 32, cEnd: 32 > After Distribution Rank: 1, cStart: 0, cEndInterior: 16, cEnd: 24 > After Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 24 > [0] m: 48 n: 48 > [1] m: 48 n: 48 > > The mesh is 4x4 and also split into two triangles, so 32 triangles. Then > we split it and have 8 overlap cells on each side. You can get quads using > > master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 > -malloc_debug 0 -dm_plex_simplex 0 -dm_refine 1 -dm_view > Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 > Before Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 16 > After Distribution Rank: 1, cStart: 0, cEndInterior: 8, cEnd: 12 > After Distribution Rank: 0, cStart: 0, cEndInterior: 8, cEnd: 12 > [0] m: 24 n: 24 > [1] m: 24 n: 24 > It is the same 4x4 mesh, but now with quads. > > Thanks, > > Matt > > P.S. It looks like I should use PetscFV or something like that at the >> first place. At first I thought, "I will just use SNES, I will compute only >> residual and jacobian on cells so why do bother with PetscFV?" So >> >> Thanks, >> E. >> Sent with Proton Mail secure email. >> >> ------- Original Message ------- >> On Friday, October 13th, 2023 at 3:00 PM, Matthew Knepley < >> knepley at gmail.com> wrote: >> >> On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: >> >>> Hi, unfortunately it's me again. >>> >>> I have some weird troubles with creating matrix with DMPlex. Actually I >>> might not need to create matrix explicitly, but SNESSolve crashes at there >>> too. So, I updated the code you provided. When I tried to use >>> DMCreateMatrix() at first, I got an error "Unknown discretization type >>> for field 0" at first I applied DMSetLocalSection() and this error is gone. >>> But this time when I run the code with multiple processors, sometimes I got >>> an output like: >>> >> >> Some setup was out of order so the section size on proc1 was 0, and I was >> not good about checking this. >> I have fixed it and attached. 
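One ordering that avoids the empty-section pitfall is to build the cell-wise section on the mesh each rank actually ends up with, i.e. after distribution, and only then ask the DM for the Jacobian. A sketch of that ordering (an illustration, not the attached ex1.c; the single field and nDof are placeholders):

    #include <petscdmplex.h>

    static PetscErrorCode SetupAndCreateJacobian(DM dm, PetscInt nDof, Mat *J)
    {
      PetscSection s;
      PetscInt     cStart, cEnd, c, m, n, rStart, rEnd;

      PetscFunctionBeginUser;
      /* cell-wise section on the (already distributed) mesh: one field, nDof dofs per cell */
      PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd));
      PetscCall(PetscSectionCreate(PetscObjectComm((PetscObject)dm), &s));
      PetscCall(PetscSectionSetNumFields(s, 1));
      PetscCall(PetscSectionSetChart(s, cStart, cEnd));
      for (c = cStart; c < cEnd; ++c) {
        PetscCall(PetscSectionSetFieldDof(s, c, 0, nDof));
        PetscCall(PetscSectionSetDof(s, c, nDof));
      }
      PetscCall(PetscSectionSetUp(s));
      PetscCall(DMSetLocalSection(dm, s));
      PetscCall(PetscSectionDestroy(&s));
      /* cell-to-cell coupling through faces, as in the "FVM++" adjacency shown above */
      PetscCall(DMSetAdjacency(dm, 0, PETSC_TRUE, PETSC_TRUE));
      PetscCall(DMCreateMatrix(dm, J));
      /* local sizes count only owned dofs; overlap cells add no rows or columns */
      PetscCall(MatGetLocalSize(*J, &m, &n));
      PetscCall(MatGetOwnershipRange(*J, &rStart, &rEnd));
      PetscCall(PetscSynchronizedPrintf(PetscObjectComm((PetscObject)dm), "m: %d n: %d rows [%d, %d)\n", (int)m, (int)n, (int)rStart, (int)rEnd));
      PetscCall(PetscSynchronizedFlush(PetscObjectComm((PetscObject)dm), PETSC_STDOUT));
      PetscFunctionReturn(PETSC_SUCCESS);
    }

The printed local sizes count owned dofs only, which is why the m x n values reported above come out square rather than rectangular.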
>> >> Thanks, >> >> Matt >> >> Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 >>> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >>> [1] ghost cell 14 >>> [1] ghost cell 15 >>> [1] ghost cell 16 >>> [1] ghost cell 17 >>> [1] ghost cell 18 >>> [1] ghost cell 19 >>> [1] ghost cell 20 >>> [1] ghost cell 21 >>> [1] ghost cell 22 >>> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 >>> [0] ghost cell 13 >>> [0] ghost cell 14 >>> [0] ghost cell 15 >>> [0] ghost cell 16 >>> [0] ghost cell 17 >>> [0] ghost cell 18 >>> [0] ghost cell 19 >>> [0] ghost cell 20 >>> [0] ghost cell 21 >>> [0] ghost cell 22 >>> [0] ghost cell 23 >>> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 >>> Fatal error in internal_Waitall: Unknown error class, error stack: >>> internal_Waitall(82)......................: MPI_Waitall(count=1, >>> array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed >>> MPIR_Waitall(1099)........................: >>> MPIR_Waitall_impl(1011)...................: >>> MPIR_Waitall_state(976)...................: >>> MPIDI_CH3i_Progress_wait(187).............: an error occurred while >>> handling an event returned by MPIDI_CH3I_Sock_Wait() >>> MPIDI_CH3I_Progress_handle_sock_event(411): >>> ReadMoreData(744).........................: ch3|sock|immedread >>> 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880 >>> MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains >>> invalid memory (set=0,sock=1,errno=14:Bad address) >>> >>> Sometimes the error message isn't appearing but for example I'm trying >>> to print size of the matrix but it isn't working. >>> If necessary, my Configure options --download-mpich --download-hwloc >>> --download-pastix --download-hypre --download-ml --download-ctetgen >>> --download-triangle --download-exodusii --download-netcdf --download-zlib >>> --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 >>> --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g >>> -O2" --with-debugging=1 >>> >>> Version: Petsc Release Version 3.20.0 >>> >>> Thank you, >>> Guer >>> >>> Sent with Proton Mail secure email. >>> >>> ------- Original Message ------- >>> On Thursday, October 12th, 2023 at 12:59 AM, erdemguer < >>> erdemguer at proton.me> wrote: >>> >>> Thank you! That's exactly what I need. >>> >>> Sent with Proton Mail secure email. >>> >>> ------- Original Message ------- >>> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley < >>> knepley at gmail.com> wrote: >>> >>> On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: >>> >>>> Hi again, >>>> >>> >>> I see the problem. FV ghosts mean extra boundary cells added in FV >>> methods using DMPlexCreateGhostCells() in order to impose boundary >>> conditions. They are not the "ghost" cells for overlapping parallel >>> decompositions. I have changed your code to give you what you want. It is >>> attached. 
>>> >>> Thanks, >>> >>> Matt >>> >>>> Here is my code: >>>> #include >>>> static char help[] = "dmplex"; >>>> >>>> int main(int argc, char **argv) >>>> { >>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>>> DM dm, dm_dist; >>>> PetscSection section; >>>> PetscInt cStart, cEndInterior, cEnd, rank; >>>> PetscInt nc[3] = {3, 3, 3}; >>>> PetscReal upper[3] = {1, 1, 1}; >>>> >>>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>>> >>>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, >>>> NULL, PETSC_TRUE, &dm); >>>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>>> PetscCall(DMSetFromOptions(dm)); >>>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>>> >>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>> DMPlexComputeCellTypes(dm); >>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, >>>> &cEndInterior, NULL)); >>>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, >>>> cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>> cEndInterior, cEnd); >>>> >>>> PetscInt nField = 1, nDof = 3, field = 0; >>>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>>> PetscSectionSetNumFields(section, nField); >>>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>>> for (PetscInt p = cStart; p < cEnd; p++) >>>> { >>>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>>> } >>>> >>>> PetscCall(PetscSectionSetUp(section)); >>>> >>>> DMSetLocalSection(dm, section); >>>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>>> >>>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>>> if (dm_dist) >>>> { >>>> DMDestroy(&dm); >>>> dm = dm_dist; >>>> } >>>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>> DMPlexComputeCellTypes(dm); >>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, >>>> &cEndInterior, NULL)); >>>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, >>>> cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>> cEndInterior, cEnd); >>>> >>>> DMDestroy(&dm); >>>> PetscCall(PetscFinalize()); >>>> } >>>> >>>> This codes output is currently (on 2 processors) is: >>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27 >>>> After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>>> >>>> DMView outputs: >>>> dm1_view (after creation): >>>> DM Object: 2 MPI processes >>>> type: plex >>>> DM_0x84000004_0 in 3 dimensions: >>>> Number of 0-cells per rank: 64 0 >>>> Number of 1-cells per rank: 144 0 >>>> Number of 2-cells per rank: 108 0 >>>> Number of 3-cells per rank: 27 0 >>>> Labels: >>>> marker: 1 strata with value/size (1 (218)) >>>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), >>>> 2 (9)) >>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>> celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >>>> >>>> dm2_view (after setfromoptions): >>>> DM Object: 2 MPI processes >>>> type: plex >>>> DM_0x84000004_0 in 3 dimensions: >>>> Number of 0-cells per rank: 40 46 >>>> Number of 1-cells per rank: 83 95 >>>> Number of 2-cells per rank: 57 64 >>>> Number of 3-cells per rank: 13 14 >>>> Labels: >>>> depth: 4 strata with value/size (0 
(40), 1 (83), 2 (57), 3 (13)) >>>> marker: 1 strata with value/size (1 (109)) >>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>> >>>> dm3_view (after setting local section): >>>> DM Object: 2 MPI processes >>>> type: plex >>>> DM_0x84000004_0 in 3 dimensions: >>>> Number of 0-cells per rank: 40 46 >>>> Number of 1-cells per rank: 83 95 >>>> Number of 2-cells per rank: 57 64 >>>> Number of 3-cells per rank: 13 14 >>>> Labels: >>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>> marker: 1 strata with value/size (1 (109)) >>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>> Field Field_0: >>>> adjacency FEM >>>> >>>> dm4_view (after setting adjacency): >>>> DM Object: 2 MPI processes >>>> type: plex >>>> DM_0x84000004_0 in 3 dimensions: >>>> Number of 0-cells per rank: 40 46 >>>> Number of 1-cells per rank: 83 95 >>>> Number of 2-cells per rank: 57 64 >>>> Number of 3-cells per rank: 13 14 >>>> Labels: >>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>> marker: 1 strata with value/size (1 (109)) >>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>> Field Field_0: >>>> adjacency FVM++ >>>> >>>> dm5_view (after distribution): >>>> DM Object: Parallel Mesh 2 MPI processes >>>> type: plex >>>> Parallel Mesh in 3 dimensions: >>>> Number of 0-cells per rank: 64 60 >>>> Number of 1-cells per rank: 144 133 >>>> Number of 2-cells per rank: 108 98 >>>> Number of 3-cells per rank: 27 24 >>>> Labels: >>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>> marker: 1 strata with value/size (1 (218)) >>>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), >>>> 6 (9)) >>>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>>> Field Field_0: >>>> adjacency FVM++ >>>> >>>> Thanks, >>>> Guer. >>>> Sent with Proton Mail secure email. >>>> >>>> ------- Original Message ------- >>>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley < >>>> knepley at gmail.com> wrote: >>>> >>>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>>> >>>>> >>>>> Hi, >>>>> Sorry for my late response. I tried with your suggestions and I think >>>>> I made a progress. But I still got issues. Let me explain my latest mesh >>>>> routine: >>>>> >>>>> >>>>> 1. DMPlexCreateBoxMesh >>>>> 2. DMSetFromOptions >>>>> 3. PetscSectionCreate >>>>> 4. PetscSectionSetNumFields >>>>> 5. PetscSectionSetFieldDof >>>>> 6. PetscSectionSetDof >>>>> 7. PetscSectionSetUp >>>>> 8. DMSetLocalSection >>>>> 9. DMSetAdjacency >>>>> 10. DMPlexDistribute >>>>> >>>>> >>>>> It's still not working but it's promising, if I call >>>>> DMPlexGetDepthStratum for cells, I can see that after distribution >>>>> processors have more cells. >>>>> >>>> >>>> Please send the output of DMPlexView() for each incarnation of the >>>> mesh. What I do is put >>>> >>>> DMViewFromOptions(dm, NULL, "-dm1_view") >>>> >>>> >>>> with a different string after each call. >>>> >>>>> But I couldn't figure out how to decide where the ghost/processor >>>>> boundary cells start. >>>>> >>>> >>>> Please send the actual code because the above is not specific enough. >>>> For example, you will not have >>>> "ghost cells" unless you partition with overlap. 
This is because by >>>> default cells are the partitioned quantity, >>>> so each process gets a unique set. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> In older mails I saw there is a function DMPlexGetHybridBounds but I >>>>> think that function is deprecated. I tried to use, >>>>> DMPlexGetCellTypeStratum as in ts/tutorials/ex11_sa.c but I'm getting >>>>> -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, >>>>> DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling >>>>> DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. >>>>> I think I can calculate the ghost cell indices using cStart/cEnd before & >>>>> after distribution but I think there is a better way I'm currently missing. >>>>> >>>>> Thanks again, >>>>> Guer. >>>>> >>>>> ------- Original Message ------- >>>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < >>>>> knepley at gmail.com> wrote: >>>>> >>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users < >>>>> petsc-users at mcs.anl.gov> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I am currently using DMPlex in my code. It runs serially at the >>>>>> moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>> >>>>>> Create a DMPlex mesh from GMSH. >>>>>> Reorder it with DMPlexPermute. >>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>> Create field(s) with multi-dofs. >>>>>> Create residual vectors. >>>>>> Define a function to calculate the residual for each cell and, use >>>>>> SNES. >>>>>> As you can see, I'm not using FV or FE structures (most examples do). >>>>>> Now, I'm trying to implement this in parallel using a similar approach. >>>>>> However, I'm struggling to understand how to create corresponding vectors >>>>>> and how to obtain index sets for each processor. Is there a tutorial or >>>>>> paper that covers this topic? >>>>>> >>>>> >>>>> The intention was that there is enough information in the manual to do >>>>> this. >>>>> >>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage >>>>> you to use PetscSection. Without this, it would be incredibly hard to do >>>>> what you want. Once the DM has a Section, it can do things like >>>>> automatically create vectors and matrices for you. It can redistribute >>>>> them, subset them, etc. The Section describes how dofs are assigned to >>>>> pieces of the mesh (mesh points). This is in the manual, and there are a >>>>> few examples that do it by hand. >>>>> >>>>> So I suggest changing your code to use PetscSection, and then letting >>>>> us know if things still do not work. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> Thank you. >>>>>> Guer. >>>>>> >>>>>> Sent with Proton Mail secure email. >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. 
>>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From erdemguer at proton.me Mon Oct 16 09:10:11 2023 From: erdemguer at proton.me (erdemguer) Date: Mon, 16 Oct 2023 14:10:11 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: <90bHf8yDZXoFytUTk641jXgda3Mn6NMzpRaq3fL8oFuz05hAmy1THxKDvq7gKZZcy3ejvqnFZAaKLy5-WegRmL-TDPZVzjM_Y5-SR0FK3DY=@proton.me> I'm truly sorry for my bad. I set the nDof = 1 for simplicity. You can find my code in the attachments. In that code I tried to find an example of a cell which is neighbor to a cell in the another processor and print them. Here is my output: (base) ? build git:(main) ? /petsc/lib/petsc/bin/petscmpiexec -n 2 ./ex1_eg -dm_plex_dim 3 -dm_plex_simplex 0 -dm_plex_box_faces 3,3,3 Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 [0] m: 13 n: 13 [1] m: 14 n: 14 [1] Face: 94, Center Cell: 7, Ghost Neighbor Cell: 23[0] Face: 145, Center Cell: 12, Ghost Neighbor Cell: 20 For example, if I'm writing residual for cell 12 on rank 0, I thought I need to write on (12,20) on the matrix too. But looks like that isn't the case. Thanks, Guer Sent with [Proton Mail](https://proton.me/) secure email. ------- Original Message ------- On Monday, October 16th, 2023 at 4:26 PM, Matthew Knepley wrote: > On Mon, Oct 16, 2023 at 9:22?AM erdemguer wrote: > >> Thank you for your responses many times. Looks like I'm missing something, sorry for my confusion, but let's take processor 0 on your first output. cEndInterior: 16 and cEnd: 24. >> I'm calculating jacobian for cell=14, dof=0 (row = 42) and cell=18, dof=2 (col = 56) have influence on it. (Cell 18 is on processor boundary) >> Shouldn't I have to write values on the (42,56)? > > Imagine you are me getting this mail. When I mail you, I show you _exactly_ what I ran and which command line options I used. You do not. I provide you all the output. You do not. You can see that someone would only be guessing when replying to this email. Also note that you have two dofs per cell, so the cell numbers are not the row numbers for the Jacobian. Please send something reproducible when you want help on running. > > Thanks, > > Matt > >> Thanks, >> Guer >> >> Sent with [Proton Mail](https://proton.me/) secure email. 
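On the recurring row/column question: a (cell, component) pair is translated to a global matrix index by the DM's global section, not by the local cell number, and entries whose row or column is owned by another rank may still be set locally. A hypothetical helper (not the attached ex1_eg.c) showing the lookup:

    #include <petscdmplex.h>

    /* Global matrix indices for (cell, component): read the global section.
       Points owned by another rank store -(offset+1), hence the sign fix-up. */
    static PetscErrorCode AddCellCoupling(DM dm, Mat J, PetscInt cell, PetscInt nbr, PetscInt comp, PetscScalar v)
    {
      PetscSection gsection;
      PetscInt     rOff, cOff, row, col;

      PetscFunctionBeginUser;
      PetscCall(DMGetGlobalSection(dm, &gsection));
      PetscCall(PetscSectionGetOffset(gsection, cell, &rOff));
      PetscCall(PetscSectionGetOffset(gsection, nbr, &cOff));
      if (rOff < 0) rOff = -(rOff + 1); /* cell is an overlap copy owned elsewhere */
      if (cOff < 0) cOff = -(cOff + 1);
      row = rOff + comp;
      col = cOff + comp;
      /* entries destined for rows or columns owned elsewhere are stashed and communicated during assembly */
      PetscCall(MatSetValues(J, 1, &row, 1, &col, &v, ADD_VALUES));
      PetscFunctionReturn(PETSC_SUCCESS);
    }

So the coupling in the example above does not land at (12, 20); it lands at the global offsets of cells 12 and 20 plus the component index, and MatAssemblyBegin()/MatAssemblyEnd() with MAT_FINAL_ASSEMBLY after the insertion loop ships any entries whose rows live on another rank.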
>> >> ------- Original Message ------- >> On Monday, October 16th, 2023 at 4:11 PM, Matthew Knepley wrote: >> >>> On Mon, Oct 16, 2023 at 6:54?AM erdemguer wrote: >>> >>>> Hey again. >>>> >>>> This code outputs for example: >>>> >>>> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 >>>> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 >>>> [0] m: 39 n: 39[1] m: 42 n: 42 >>>> >>>> Shouldn't it be 39 x 81 and 42 x 72 because of the overlapping cells on processor boundaries? >>> >>> Here is my output >>> >>> master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_refine 1 >>> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >>> Before Distribution Rank: 0, cStart: 0, cEndInterior: 32, cEnd: 32 >>> After Distribution Rank: 1, cStart: 0, cEndInterior: 16, cEnd: 24 >>> After Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 24 >>> [0] m: 48 n: 48 >>> [1] m: 48 n: 48 >>> >>> The mesh is 4x4 and also split into two triangles, so 32 triangles. Then we split it and have 8 overlap cells on each side. You can get quads using >>> >>> master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_plex_simplex 0 -dm_refine 1 -dm_view >>> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >>> Before Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 16 >>> After Distribution Rank: 1, cStart: 0, cEndInterior: 8, cEnd: 12 >>> After Distribution Rank: 0, cStart: 0, cEndInterior: 8, cEnd: 12 >>> [0] m: 24 n: 24 >>> [1] m: 24 n: 24 >>> >>> It is the same 4x4 mesh, but now with quads. >>> >>> Thanks, >>> >>> Matt >>> >>>> P.S. It looks like I should use PetscFV or something like that at the first place. At first I thought, "I will just use SNES, I will compute only residual and jacobian on cells so why do bother with PetscFV?" So >>>> >>>> Thanks, >>>> E. >>>> >>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>> >>>> ------- Original Message ------- >>>> On Friday, October 13th, 2023 at 3:00 PM, Matthew Knepley wrote: >>>> >>>>> On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: >>>>> >>>>>> Hi, unfortunately it's me again. >>>>>> >>>>>> I have some weird troubles with creating matrix with DMPlex. Actually I might not need to create matrix explicitly, but SNESSolve crashes at there too. So, I updated the code you provided. When I tried to use DMCreateMatrix() at first, I got an error "Unknown discretization type for field 0" at first I applied DMSetLocalSection() and this error is gone. But this time when I run the code with multiple processors, sometimes I got an output like: >>>>> >>>>> Some setup was out of order so the section size on proc1 was 0, and I was not good about checking this. >>>>> I have fixed it and attached. 
>>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 >>>>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >>>>>> [1] ghost cell 14 >>>>>> [1] ghost cell 15 >>>>>> [1] ghost cell 16 >>>>>> [1] ghost cell 17 >>>>>> [1] ghost cell 18 >>>>>> [1] ghost cell 19 >>>>>> [1] ghost cell 20 >>>>>> [1] ghost cell 21 >>>>>> [1] ghost cell 22 >>>>>> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 >>>>>> [0] ghost cell 13 >>>>>> [0] ghost cell 14 >>>>>> [0] ghost cell 15 >>>>>> [0] ghost cell 16 >>>>>> [0] ghost cell 17 >>>>>> [0] ghost cell 18 >>>>>> [0] ghost cell 19 >>>>>> [0] ghost cell 20 >>>>>> [0] ghost cell 21 >>>>>> [0] ghost cell 22 >>>>>> [0] ghost cell 23 >>>>>> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 >>>>>> Fatal error in internal_Waitall: Unknown error class, error stack: >>>>>> internal_Waitall(82)......................: MPI_Waitall(count=1, array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed >>>>>> MPIR_Waitall(1099)........................: >>>>>> MPIR_Waitall_impl(1011)...................: >>>>>> MPIR_Waitall_state(976)...................: >>>>>> MPIDI_CH3i_Progress_wait(187).............: an error occurred while handling an event returned by MPIDI_CH3I_Sock_Wait() >>>>>> MPIDI_CH3I_Progress_handle_sock_event(411): >>>>>> ReadMoreData(744).........................: ch3|sock|immedread 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address) >>>>>> >>>>>> Sometimes the error message isn't appearing but for example I'm trying to print size of the matrix but it isn't working. >>>>>> If necessary, my Configure options --download-mpich --download-hwloc --download-pastix --download-hypre --download-ml --download-ctetgen --download-triangle --download-exodusii --download-netcdf --download-zlib --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" --with-debugging=1 >>>>>> >>>>>> Version: Petsc Release Version 3.20.0 >>>>>> >>>>>> Thank you, >>>>>> Guer >>>>>> >>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>> >>>>>> ------- Original Message ------- >>>>>> On Thursday, October 12th, 2023 at 12:59 AM, erdemguer wrote: >>>>>> >>>>>>> Thank you! That's exactly what I need. >>>>>>> >>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>>> >>>>>>> ------- Original Message ------- >>>>>>> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley wrote: >>>>>>> >>>>>>>> On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: >>>>>>>> >>>>>>>>> Hi again, >>>>>>>> >>>>>>>> I see the problem. FV ghosts mean extra boundary cells added in FV methods using DMPlexCreateGhostCells() in order to impose boundary conditions. They are not the "ghost" cells for overlapping parallel decompositions. I have changed your code to give you what you want. It is attached. 
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>>> Here is my code: >>>>>>>>> #include >>>>>>>>> static char help[] = "dmplex"; >>>>>>>>> >>>>>>>>> int main(int argc, char **argv) >>>>>>>>> { >>>>>>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>>>>>>>> DM dm, dm_dist; >>>>>>>>> PetscSection section; >>>>>>>>> PetscInt cStart, cEndInterior, cEnd, rank; >>>>>>>>> PetscInt nc[3] = {3, 3, 3}; >>>>>>>>> PetscReal upper[3] = {1, 1, 1}; >>>>>>>>> >>>>>>>>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>>>>>>>> >>>>>>>>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, NULL, PETSC_TRUE, &dm); >>>>>>>>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>>>>>>>> PetscCall(DMSetFromOptions(dm)); >>>>>>>>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>>>>>>>> >>>>>>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>>>>>> DMPlexComputeCellTypes(dm); >>>>>>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, &cEndInterior, NULL)); >>>>>>>>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>>>>>> cEndInterior, cEnd); >>>>>>>>> >>>>>>>>> PetscInt nField = 1, nDof = 3, field = 0; >>>>>>>>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>>>>>>>> PetscSectionSetNumFields(section, nField); >>>>>>>>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>>>>>>>> for (PetscInt p = cStart; p < cEnd; p++) >>>>>>>>> { >>>>>>>>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>>>>>>>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>>>>>>>> } >>>>>>>>> >>>>>>>>> PetscCall(PetscSectionSetUp(section)); >>>>>>>>> >>>>>>>>> DMSetLocalSection(dm, section); >>>>>>>>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>>>>>>>> >>>>>>>>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>>>>>>>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>>>>>>>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>>>>>>>> if (dm_dist) >>>>>>>>> { >>>>>>>>> DMDestroy(&dm); >>>>>>>>> dm = dm_dist; >>>>>>>>> } >>>>>>>>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>>>>>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>>>>>> DMPlexComputeCellTypes(dm); >>>>>>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL)); >>>>>>>>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>>>>>> cEndInterior, cEnd); >>>>>>>>> >>>>>>>>> DMDestroy(&dm); >>>>>>>>> PetscCall(PetscFinalize());} >>>>>>>>> >>>>>>>>> This codes output is currently (on 2 processors) is: >>>>>>>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>>>>>>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>>>>>>>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>>>>>>>> >>>>>>>>> DMView outputs: >>>>>>>>> dm1_view (after creation): >>>>>>>>> DM Object: 2 MPI processes >>>>>>>>> type: plex >>>>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>>>> Number of 0-cells per rank: 64 0 >>>>>>>>> Number of 1-cells per rank: 144 0 >>>>>>>>> Number of 2-cells per rank: 108 0 >>>>>>>>> Number of 3-cells per rank: 27 0 >>>>>>>>> Labels: >>>>>>>>> marker: 1 strata with value/size (1 (218)) >>>>>>>>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 (9)) >>>>>>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) celltype: 4 strata with value/size (7 (27), 0 
(64), 4 (108), 1 (144)) >>>>>>>>> >>>>>>>>> dm2_view (after setfromoptions): >>>>>>>>> DM Object: 2 MPI processes >>>>>>>>> type: plex >>>>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>>>> Number of 0-cells per rank: 40 46 >>>>>>>>> Number of 1-cells per rank: 83 95 >>>>>>>>> Number of 2-cells per rank: 57 64 >>>>>>>>> Number of 3-cells per rank: 13 14 >>>>>>>>> Labels: >>>>>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>>>>>> marker: 1 strata with value/size (1 (109)) >>>>>>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>>>>>> >>>>>>>>> dm3_view (after setting local section): >>>>>>>>> DM Object: 2 MPI processes >>>>>>>>> type: plex >>>>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>>>> Number of 0-cells per rank: 40 46 >>>>>>>>> Number of 1-cells per rank: 83 95 >>>>>>>>> Number of 2-cells per rank: 57 64 >>>>>>>>> Number of 3-cells per rank: 13 14 >>>>>>>>> Labels: >>>>>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>>>>>> marker: 1 strata with value/size (1 (109)) >>>>>>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>>>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>>>>>> Field Field_0: adjacency FEM >>>>>>>>> >>>>>>>>> dm4_view (after setting adjacency): >>>>>>>>> DM Object: 2 MPI processes >>>>>>>>> type: plex >>>>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>>>> Number of 0-cells per rank: 40 46 >>>>>>>>> Number of 1-cells per rank: 83 95 >>>>>>>>> Number of 2-cells per rank: 57 64 >>>>>>>>> Number of 3-cells per rank: 13 14 >>>>>>>>> Labels: >>>>>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>>>>>> marker: 1 strata with value/size (1 (109)) >>>>>>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>>>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>>>>>> Field Field_0: adjacency FVM++ >>>>>>>>> >>>>>>>>> dm5_view (after distribution): >>>>>>>>> DM Object: Parallel Mesh 2 MPI processes >>>>>>>>> type: plex >>>>>>>>> Parallel Mesh in 3 dimensions: >>>>>>>>> Number of 0-cells per rank: 64 60 >>>>>>>>> Number of 1-cells per rank: 144 133 >>>>>>>>> Number of 2-cells per rank: 108 98 >>>>>>>>> Number of 3-cells per rank: 27 24 >>>>>>>>> Labels: >>>>>>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>>>>>>> marker: 1 strata with value/size (1 (218)) >>>>>>>>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 (9)) >>>>>>>>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>>>>>>>> Field Field_0: adjacency FVM++ >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Guer. >>>>>>>>> >>>>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>>>>> >>>>>>>>> ------- Original Message ------- >>>>>>>>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley wrote: >>>>>>>>> >>>>>>>>>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> Sorry for my late response. I tried with your suggestions and I think I made a progress. But I still got issues. 
Let me explain my latest mesh routine: >>>>>>>>>>> >>>>>>>>>>> - DMPlexCreateBoxMesh >>>>>>>>>>> >>>>>>>>>>> - DMSetFromOptions >>>>>>>>>>> - PetscSectionCreate >>>>>>>>>>> - PetscSectionSetNumFields >>>>>>>>>>> - PetscSectionSetFieldDof >>>>>>>>>>> >>>>>>>>>>> - PetscSectionSetDof >>>>>>>>>>> >>>>>>>>>>> - PetscSectionSetUp >>>>>>>>>>> - DMSetLocalSection >>>>>>>>>>> - DMSetAdjacency >>>>>>>>>>> - DMPlexDistribute >>>>>>>>>>> >>>>>>>>>>> It's still not working but it's promising, if I call DMPlexGetDepthStratum for cells, I can see that after distribution processors have more cells. >>>>>>>>>> >>>>>>>>>> Please send the output of DMPlexView() for each incarnation of the mesh. What I do is put >>>>>>>>>> >>>>>>>>>> DMViewFromOptions(dm, NULL, "-dm1_view") >>>>>>>>>> >>>>>>>>>> with a different string after each call. >>>>>>>>>> >>>>>>>>>>> But I couldn't figure out how to decide where the ghost/processor boundary cells start. >>>>>>>>>> >>>>>>>>>> Please send the actual code because the above is not specific enough. For example, you will not have >>>>>>>>>> "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, >>>>>>>>>> so each process gets a unique set. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>>> In older mails I saw there is a function DMPlexGetHybridBounds but I think that function is deprecated. I tried to use, DMPlexGetCellTypeStratumas in ts/tutorials/ex11_sa.c but I'm getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution but I think there is a better way I'm currently missing. >>>>>>>>>>> >>>>>>>>>>> Thanks again, >>>>>>>>>>> Guer. >>>>>>>>>>> >>>>>>>>>>> ------- Original Message ------- >>>>>>>>>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley wrote: >>>>>>>>>>> >>>>>>>>>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>>>>>>>>> >>>>>>>>>>>>> Create a DMPlex mesh from GMSH. >>>>>>>>>>>>> Reorder it with DMPlexPermute. >>>>>>>>>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>>>>>>>>> Create field(s) with multi-dofs. >>>>>>>>>>>>> Create residual vectors. >>>>>>>>>>>>> Define a function to calculate the residual for each cell and, use SNES. >>>>>>>>>>>>> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? >>>>>>>>>>>> >>>>>>>>>>>> The intention was that there is enough information in the manual to do this. >>>>>>>>>>>> >>>>>>>>>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. 
The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. >>>>>>>>>>>> >>>>>>>>>>>> So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Matt >>>>>>>>>>>> >>>>>>>>>>>>> Thank you. >>>>>>>>>>>>> Guer. >>>>>>>>>>>>> >>>>>>>>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> >>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>> >>>>>>>>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> >>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>>> >>>>> -- >>>>> >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>> >>> -- >>> >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ex1_eg.c URL: From facklerpw at ornl.gov Mon Oct 16 09:33:01 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Mon, 16 Oct 2023 14:33:01 +0000 Subject: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Junchao, I've attached updated timing plots (red and blue are swapped from before; yellow is the new one). There is an improvement for the NE_3 case only with CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI cases, MatShift doesn't show up (I assume because we're using different preconditioner arguments). So, there must be some other primary culprit. I'll try to get updated profiling data to you soon. 
Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Fackler, Philip via Xolotl-psi-development Sent: Wednesday, October 11, 2023 11:31 To: Junchao Zhang Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Subject: Re: [Xolotl-psi-development] [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface I'm on it. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang Sent: Wednesday, October 11, 2023 10:14 To: Fackler, Philip Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net ; Blondel, Sophie Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface Hi, Philip, Could you try this branch jczhang/2023-10-05/feature-support-matshift-aijkokkos ? Thanks. --Junchao Zhang On Thu, Oct 5, 2023 at 4:52?PM Fackler, Philip > wrote: Aha! That makes sense. Thank you. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Thursday, October 5, 2023 17:29 To: Fackler, Philip > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net >; Blondel, Sophie > Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface Wait a moment, it seems it was because we do not have a GPU implementation of MatShift... Let me see how to add it. --Junchao Zhang On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang > wrote: Hi, Philip, I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() instead of the COO interface? MatSetValues() needs to copy the data from device to host and thus is expensive. Do you have profiling results with COO enabled? [Screenshot 2023-10-05 at 10.55.29?AM.png] --Junchao Zhang On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: Hi, Philip, I will look into the tarballs and get back to you. Thanks. --Junchao Zhang On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users > wrote: We finally have xolotl ported to use the new COO interface and the aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port to our previous version (using MatSetValuesStencil and the default Mat and Vec implementations), we expected to see an improvement in performance for both the "serial" and "cuda" builds (here I'm referring to the kokkos configuration). Attached are two plots that show timings for three different cases. All of these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on a single node). The CUDA cases were given one GPU per task (and used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible. The performance of RHSJacobian (where the bulk of computation happens in xolotl) behaved basically as expected (better than expected in the serial build). NE_3 case in CUDA was the only one that performed worse, but not surprisingly, since its workload for the GPUs is much smaller. 
We've still got more optimization to do on this. The real surprise was how much worse the overall solve times were. This seems to be due simply to switching to the kokkos-based implementation. I'm wondering if there are any changes we can make in configuration or runtime arguments to help with PETSc's performance here. Any help looking into this would be appreciated. The tarballs linked here and here are profiling databases which, once extracted, can be viewed with hpcviewer. I don't know how helpful that will be, but hopefully it can give you some direction. Thanks for your help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Total Solve Times.png Type: image/png Size: 15648 bytes Desc: Total Solve Times.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: RHSJacobian() calls.png Type: image/png Size: 15568 bytes Desc: RHSJacobian() calls.png URL: From knepley at gmail.com Mon Oct 16 09:35:57 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Oct 2023 10:35:57 -0400 Subject: [petsc-users] Parallel DMPlex In-Reply-To: <90bHf8yDZXoFytUTk641jXgda3Mn6NMzpRaq3fL8oFuz05hAmy1THxKDvq7gKZZcy3ejvqnFZAaKLy5-WegRmL-TDPZVzjM_Y5-SR0FK3DY=@proton.me> References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> <90bHf8yDZXoFytUTk641jXgda3Mn6NMzpRaq3fL8oFuz05hAmy1THxKDvq7gKZZcy3ejvqnFZAaKLy5-WegRmL-TDPZVzjM_Y5-SR0FK3DY=@proton.me> Message-ID: On Mon, Oct 16, 2023 at 10:10?AM erdemguer wrote: > I'm truly sorry for my bad. > I set the nDof = 1 for simplicity. You can find my code in the > attachments. In that code I tried to find an example of a cell which is > neighbor to a cell in the another processor and print them. > Here is my output: > (base) ? build git:(main) ? /petsc/lib/petsc/bin/petscmpiexec -n 2 > ./ex1_eg -dm_plex_dim 3 -dm_plex_simplex 0 -dm_plex_box_faces 3,3,3 > Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 > Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 > After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 > After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 > [0] m: 13 n: 13 > [1] m: 14 n: 14 > [1] Face: 94, Center Cell: 7, Ghost Neighbor Cell: 23 > [0] Face: 145, Center Cell: 12, Ghost Neighbor Cell: 20 > You can force us to have the same partition using -petscpartitioner_type simple, master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_plex_dim 3 -dm_plex_simplex 0 -dm_plex_box_faces 3,3,3 -petscpartitioner_type simple Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 After Distribution Rank: 0, cStart: 0, cEndInterior: 14, cEnd: 27 After Distribution Rank: 1, cStart: 0, cEndInterior: 13, cEnd: 26 [0] m: 14 n: 14 [1] m: 13 n: 13 [1] Face: 89, Center Cell: 0, Ghost Neighbor Cell: 25 [0] Face: 140, Center Cell: 13, Ghost Neighbor Cell: 14 > For example, if I'm writing residual for cell 12 on rank 0, I thought I > need to write on (12,20) on the matrix too. But looks like that isn't the > case. 
> There are two problems here: 1) 20 is the _local_ number of that cell, but matrices use global numbers. If you want to know that global number of that dof, it is two steps. First, you need the cell number on the other process. You can get this from the pointSF. If leaves[i] = 20, then remotes[i].index = Then you need the dof for that remote cell. However, this work has already been done by the global Section. So DMGetGlobalSection(dm, &gsec); PetscSectionGetOffset(gsec, 20, &off); off = -(off + 1); since dofs we do not own will be encoded as -(dof + 1). 2) You need to decide how you want to assemble. Do we assemble the contributions from the cells we own, or from the faces we own. Most FV people divide up the faces. Thanks, Matt > Thanks, > Guer > > Sent with Proton Mail secure email. > > ------- Original Message ------- > On Monday, October 16th, 2023 at 4:26 PM, Matthew Knepley < > knepley at gmail.com> wrote: > > On Mon, Oct 16, 2023 at 9:22?AM erdemguer wrote: > >> Thank you for your responses many times. Looks like I'm missing >> something, sorry for my confusion, but let's take processor 0 on your first >> output. cEndInterior: 16 and cEnd: 24. >> I'm calculating jacobian for cell=14, dof=0 (row = 42) and cell=18, dof=2 >> (col = 56) have influence on it. (Cell 18 is on processor boundary) >> Shouldn't I have to write values on the (42,56)? >> > > Imagine you are me getting this mail. When I mail you, I show you > _exactly_ what I ran and which command line options I used. You do not. I > provide you all the output. You do not. You can see that someone would only > be guessing when replying to this email. Also note that you have two dofs > per cell, so the cell numbers are not the row numbers for the Jacobian. > Please send something reproducible when you want help on running. > > Thanks, > > Matt > >> Thanks, >> Guer >> >> Sent with Proton Mail secure email. >> >> ------- Original Message ------- >> On Monday, October 16th, 2023 at 4:11 PM, Matthew Knepley < >> knepley at gmail.com> wrote: >> >> On Mon, Oct 16, 2023 at 6:54?AM erdemguer wrote: >> >>> Hey again. >>> >>> This code outputs for example: >>> >>> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 >>> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 >>> [0] m: 39 n: 39 >>> [1] m: 42 n: 42 >>> >>> Shouldn't it be 39 x 81 and 42 x 72 because of the overlapping cells on >>> processor boundaries? >>> >> >> Here is my output >> >> master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 >> -malloc_debug 0 -dm_refine 1 >> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >> Before Distribution Rank: 0, cStart: 0, cEndInterior: 32, cEnd: 32 >> After Distribution Rank: 1, cStart: 0, cEndInterior: 16, cEnd: 24 >> After Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 24 >> [0] m: 48 n: 48 >> [1] m: 48 n: 48 >> >> The mesh is 4x4 and also split into two triangles, so 32 triangles. Then >> we split it and have 8 overlap cells on each side. You can get quads using >> >> master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 >> -malloc_debug 0 -dm_plex_simplex 0 -dm_refine 1 -dm_view >> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >> Before Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 16 >> After Distribution Rank: 1, cStart: 0, cEndInterior: 8, cEnd: 12 >> After Distribution Rank: 0, cStart: 0, cEndInterior: 8, cEnd: 12 >> [0] m: 24 n: 24 >> [1] m: 24 n: 24 >> It is the same 4x4 mesh, but now with quads. 
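A minimal sketch of the lookup described above (illustrative names only: GetGlobalRow, c for the local point number of the overlap cell, d for the dof within it; it assumes the DM already carries its local Section). The pointSF route, DMGetPointSF() plus PetscSFGetGraph(), would additionally give the owning rank and the point number on that rank in remotes[i].rank / remotes[i].index, but as noted the global Section already encodes the global offset:

static PetscErrorCode GetGlobalRow(DM dm, PetscInt c, PetscInt d, PetscInt *row)
{
  PetscSection gsec;
  PetscInt     off;

  PetscFunctionBeginUser;
  PetscCall(DMGetGlobalSection(dm, &gsec));
  PetscCall(PetscSectionGetOffset(gsec, c, &off));
  if (off < 0) off = -(off + 1); /* points owned by another rank come back encoded as -(off + 1) */
  *row = off + d;                /* global row (or column) index of dof d on cell c */
  PetscFunctionReturn(PETSC_SUCCESS);
}

These global indices are what MatSetValues() expects, whichever way the assembly work is divided up.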
>> >> Thanks, >> >> Matt >> >> P.S. It looks like I should use PetscFV or something like that at the >>> first place. At first I thought, "I will just use SNES, I will compute only >>> residual and jacobian on cells so why do bother with PetscFV?" So >>> >>> Thanks, >>> E. >>> Sent with Proton Mail secure email. >>> >>> ------- Original Message ------- >>> On Friday, October 13th, 2023 at 3:00 PM, Matthew Knepley < >>> knepley at gmail.com> wrote: >>> >>> On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: >>> >>>> Hi, unfortunately it's me again. >>>> >>>> I have some weird troubles with creating matrix with DMPlex. Actually I >>>> might not need to create matrix explicitly, but SNESSolve crashes at there >>>> too. So, I updated the code you provided. When I tried to use >>>> DMCreateMatrix() at first, I got an error "Unknown discretization type >>>> for field 0" at first I applied DMSetLocalSection() and this error is gone. >>>> But this time when I run the code with multiple processors, sometimes I got >>>> an output like: >>>> >>> >>> Some setup was out of order so the section size on proc1 was 0, and I >>> was not good about checking this. >>> I have fixed it and attached. >>> >>> Thanks, >>> >>> Matt >>> >>> Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 >>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >>>> [1] ghost cell 14 >>>> [1] ghost cell 15 >>>> [1] ghost cell 16 >>>> [1] ghost cell 17 >>>> [1] ghost cell 18 >>>> [1] ghost cell 19 >>>> [1] ghost cell 20 >>>> [1] ghost cell 21 >>>> [1] ghost cell 22 >>>> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 >>>> [0] ghost cell 13 >>>> [0] ghost cell 14 >>>> [0] ghost cell 15 >>>> [0] ghost cell 16 >>>> [0] ghost cell 17 >>>> [0] ghost cell 18 >>>> [0] ghost cell 19 >>>> [0] ghost cell 20 >>>> [0] ghost cell 21 >>>> [0] ghost cell 22 >>>> [0] ghost cell 23 >>>> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 >>>> Fatal error in internal_Waitall: Unknown error class, error stack: >>>> internal_Waitall(82)......................: MPI_Waitall(count=1, >>>> array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed >>>> MPIR_Waitall(1099)........................: >>>> MPIR_Waitall_impl(1011)...................: >>>> MPIR_Waitall_state(976)...................: >>>> MPIDI_CH3i_Progress_wait(187).............: an error occurred while >>>> handling an event returned by MPIDI_CH3I_Sock_Wait() >>>> MPIDI_CH3I_Progress_handle_sock_event(411): >>>> ReadMoreData(744).........................: ch3|sock|immedread >>>> 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880 >>>> MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer >>>> contains invalid memory (set=0,sock=1,errno=14:Bad address) >>>> >>>> Sometimes the error message isn't appearing but for example I'm trying >>>> to print size of the matrix but it isn't working. >>>> If necessary, my Configure options --download-mpich --download-hwloc >>>> --download-pastix --download-hypre --download-ml --download-ctetgen >>>> --download-triangle --download-exodusii --download-netcdf --download-zlib >>>> --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 >>>> --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g >>>> -O2" --with-debugging=1 >>>> >>>> Version: Petsc Release Version 3.20.0 >>>> >>>> Thank you, >>>> Guer >>>> >>>> Sent with Proton Mail secure email. 
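For reference, a minimal sketch of the call order that avoids the "Unknown discretization type for field 0" error: the DM has to carry a PetscSection before DMCreateMatrix() is asked to preallocate. This is essentially the same setup as the code quoted further down in this thread (one field, nDof dofs per cell; dm, s, J and nDof are illustrative names, and adjacency/distribution are assumed to be set up as in that code):

PetscSection s;
Mat          J;
PetscInt     cStart, cEnd, nDof = 3;

PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd)); /* height 0 = cells */
PetscCall(PetscSectionCreate(PetscObjectComm((PetscObject)dm), &s));
PetscCall(PetscSectionSetNumFields(s, 1));
PetscCall(PetscSectionSetChart(s, cStart, cEnd));
for (PetscInt c = cStart; c < cEnd; ++c) {
  PetscCall(PetscSectionSetFieldDof(s, c, 0, nDof));
  PetscCall(PetscSectionSetDof(s, c, nDof));
}
PetscCall(PetscSectionSetUp(s));
PetscCall(DMSetLocalSection(dm, s));   /* after this the DM knows the dof layout */
PetscCall(PetscSectionDestroy(&s));    /* the DM keeps its own reference */
PetscCall(DMCreateMatrix(dm, &J));     /* preallocation now comes from the Section */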
>>>> >>>> ------- Original Message ------- >>>> On Thursday, October 12th, 2023 at 12:59 AM, erdemguer < >>>> erdemguer at proton.me> wrote: >>>> >>>> Thank you! That's exactly what I need. >>>> >>>> Sent with Proton Mail secure email. >>>> >>>> ------- Original Message ------- >>>> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley < >>>> knepley at gmail.com> wrote: >>>> >>>> On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: >>>> >>>>> Hi again, >>>>> >>>> >>>> I see the problem. FV ghosts mean extra boundary cells added in FV >>>> methods using DMPlexCreateGhostCells() in order to impose boundary >>>> conditions. They are not the "ghost" cells for overlapping parallel >>>> decompositions. I have changed your code to give you what you want. It is >>>> attached. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> Here is my code: >>>>> #include >>>>> static char help[] = "dmplex"; >>>>> >>>>> int main(int argc, char **argv) >>>>> { >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>>>> DM dm, dm_dist; >>>>> PetscSection section; >>>>> PetscInt cStart, cEndInterior, cEnd, rank; >>>>> PetscInt nc[3] = {3, 3, 3}; >>>>> PetscReal upper[3] = {1, 1, 1}; >>>>> >>>>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>>>> >>>>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, >>>>> NULL, PETSC_TRUE, &dm); >>>>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>>>> PetscCall(DMSetFromOptions(dm)); >>>>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>>>> >>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>> DMPlexComputeCellTypes(dm); >>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, >>>>> &cEndInterior, NULL)); >>>>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: >>>>> %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>> cEndInterior, cEnd); >>>>> >>>>> PetscInt nField = 1, nDof = 3, field = 0; >>>>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>>>> PetscSectionSetNumFields(section, nField); >>>>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>>>> for (PetscInt p = cStart; p < cEnd; p++) >>>>> { >>>>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>>>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>>>> } >>>>> >>>>> PetscCall(PetscSectionSetUp(section)); >>>>> >>>>> DMSetLocalSection(dm, section); >>>>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>>>> >>>>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>>>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>>>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>>>> if (dm_dist) >>>>> { >>>>> DMDestroy(&dm); >>>>> dm = dm_dist; >>>>> } >>>>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>> DMPlexComputeCellTypes(dm); >>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, >>>>> &cEndInterior, NULL)); >>>>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, >>>>> cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>> cEndInterior, cEnd); >>>>> >>>>> DMDestroy(&dm); >>>>> PetscCall(PetscFinalize()); >>>>> } >>>>> >>>>> This codes output is currently (on 2 processors) is: >>>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>>>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27 >>>>> After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>>>> >>>>> DMView outputs: >>>>> dm1_view (after creation): >>>>> DM Object: 2 
MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 64 0 >>>>> Number of 1-cells per rank: 144 0 >>>>> Number of 2-cells per rank: 108 0 >>>>> Number of 3-cells per rank: 27 0 >>>>> Labels: >>>>> marker: 1 strata with value/size (1 (218)) >>>>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 >>>>> (9), 2 (9)) >>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>>> celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >>>>> >>>>> dm2_view (after setfromoptions): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 40 46 >>>>> Number of 1-cells per rank: 83 95 >>>>> Number of 2-cells per rank: 57 64 >>>>> Number of 3-cells per rank: 13 14 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>> marker: 1 strata with value/size (1 (109)) >>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>> >>>>> dm3_view (after setting local section): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 40 46 >>>>> Number of 1-cells per rank: 83 95 >>>>> Number of 2-cells per rank: 57 64 >>>>> Number of 3-cells per rank: 13 14 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>> marker: 1 strata with value/size (1 (109)) >>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>> Field Field_0: >>>>> adjacency FEM >>>>> >>>>> dm4_view (after setting adjacency): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 40 46 >>>>> Number of 1-cells per rank: 83 95 >>>>> Number of 2-cells per rank: 57 64 >>>>> Number of 3-cells per rank: 13 14 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>> marker: 1 strata with value/size (1 (109)) >>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>> Field Field_0: >>>>> adjacency FVM++ >>>>> >>>>> dm5_view (after distribution): >>>>> DM Object: Parallel Mesh 2 MPI processes >>>>> type: plex >>>>> Parallel Mesh in 3 dimensions: >>>>> Number of 0-cells per rank: 64 60 >>>>> Number of 1-cells per rank: 144 133 >>>>> Number of 2-cells per rank: 108 98 >>>>> Number of 3-cells per rank: 27 24 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>>> marker: 1 strata with value/size (1 (218)) >>>>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 >>>>> (9), 6 (9)) >>>>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>>>> Field Field_0: >>>>> adjacency FVM++ >>>>> >>>>> Thanks, >>>>> Guer. >>>>> Sent with Proton Mail secure email. >>>>> >>>>> ------- Original Message ------- >>>>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley < >>>>> knepley at gmail.com> wrote: >>>>> >>>>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>>>> >>>>>> >>>>>> Hi, >>>>>> Sorry for my late response. I tried with your suggestions and I think >>>>>> I made a progress. But I still got issues. Let me explain my latest mesh >>>>>> routine: >>>>>> >>>>>> >>>>>> 1. DMPlexCreateBoxMesh >>>>>> 2. 
DMSetFromOptions >>>>>> 3. PetscSectionCreate >>>>>> 4. PetscSectionSetNumFields >>>>>> 5. PetscSectionSetFieldDof >>>>>> 6. PetscSectionSetDof >>>>>> 7. PetscSectionSetUp >>>>>> 8. DMSetLocalSection >>>>>> 9. DMSetAdjacency >>>>>> 10. DMPlexDistribute >>>>>> >>>>>> >>>>>> It's still not working but it's promising, if I call >>>>>> DMPlexGetDepthStratum for cells, I can see that after distribution >>>>>> processors have more cells. >>>>>> >>>>> >>>>> Please send the output of DMPlexView() for each incarnation of the >>>>> mesh. What I do is put >>>>> >>>>> DMViewFromOptions(dm, NULL, "-dm1_view") >>>>> >>>>> >>>>> with a different string after each call. >>>>> >>>>>> But I couldn't figure out how to decide where the ghost/processor >>>>>> boundary cells start. >>>>>> >>>>> >>>>> Please send the actual code because the above is not specific enough. >>>>> For example, you will not have >>>>> "ghost cells" unless you partition with overlap. This is because by >>>>> default cells are the partitioned quantity, >>>>> so each process gets a unique set. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> In older mails I saw there is a function DMPlexGetHybridBounds but I >>>>>> think that function is deprecated. I tried to use, >>>>>> DMPlexGetCellTypeStratum as in ts/tutorials/ex11_sa.c but I'm >>>>>> getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, >>>>>> DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling >>>>>> DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. >>>>>> I think I can calculate the ghost cell indices using cStart/cEnd before & >>>>>> after distribution but I think there is a better way I'm currently missing. >>>>>> >>>>>> Thanks again, >>>>>> Guer. >>>>>> >>>>>> ------- Original Message ------- >>>>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < >>>>>> knepley at gmail.com> wrote: >>>>>> >>>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users < >>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am currently using DMPlex in my code. It runs serially at the >>>>>>> moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>>> >>>>>>> Create a DMPlex mesh from GMSH. >>>>>>> Reorder it with DMPlexPermute. >>>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>>> Create field(s) with multi-dofs. >>>>>>> Create residual vectors. >>>>>>> Define a function to calculate the residual for each cell and, use >>>>>>> SNES. >>>>>>> As you can see, I'm not using FV or FE structures (most examples >>>>>>> do). Now, I'm trying to implement this in parallel using a similar >>>>>>> approach. However, I'm struggling to understand how to create corresponding >>>>>>> vectors and how to obtain index sets for each processor. Is there a >>>>>>> tutorial or paper that covers this topic? >>>>>>> >>>>>> >>>>>> The intention was that there is enough information in the manual to >>>>>> do this. >>>>>> >>>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage >>>>>> you to use PetscSection. Without this, it would be incredibly hard to do >>>>>> what you want. Once the DM has a Section, it can do things like >>>>>> automatically create vectors and matrices for you. It can redistribute >>>>>> them, subset them, etc. The Section describes how dofs are assigned to >>>>>> pieces of the mesh (mesh points). This is in the manual, and there are a >>>>>> few examples that do it by hand. 
>>>>>> >>>>>> So I suggest changing your code to use PetscSection, and then letting >>>>>> us know if things still do not work. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> Thank you. >>>>>>> Guer. >>>>>>> >>>>>>> Sent with Proton Mail secure email. >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Mon Oct 16 13:29:30 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 16 Oct 2023 18:29:30 +0000 Subject: [petsc-users] Using Sundials from PETSc Message-ID: Hi, we were wondering if it would be possible to call the latest version of Sundials from PETSc? We are interested in doing chemistry using GPUs and already have interfaces to PETSc from our code. Thanks, Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Oct 16 14:03:35 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Oct 2023 15:03:35 -0400 Subject: [petsc-users] Using Sundials from PETSc In-Reply-To: References: Message-ID: On Mon, Oct 16, 2023 at 2:29?PM Vanella, Marcos (Fed) via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, we were wondering if it would be possible to call the latest version > of Sundials from PETSc? > The short answer is, no. We are at v2.5 and they are at v6.5. There were no dates on the version history page, so I do not know how out of date we are. There have not been any requests for update until now. We would be happy to get an MR for the updates if you want to try it. > We are interested in doing chemistry using GPUs and already have > interfaces to PETSc from our code. 
> How does the GPU interest interact with the SUNDIALS version? Thanks, Matt > Thanks, > Marcos > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Oct 16 14:11:37 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 16 Oct 2023 14:11:37 -0500 (CDT) Subject: [petsc-users] Using Sundials from PETSc In-Reply-To: References: Message-ID: <459f4b88-e5da-2123-9fcd-b5ab9653c0b6@mcs.anl.gov> I'll note - current sundials release has some interfaces to petsc functionality Satish On Mon, 16 Oct 2023, Matthew Knepley wrote: > On Mon, Oct 16, 2023 at 2:29?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > Hi, we were wondering if it would be possible to call the latest version > > of Sundials from PETSc? > > > > The short answer is, no. We are at v2.5 and they are at v6.5. There were no > dates on the version history page, so I do not know how out of date we are. > There have not been any requests for update until now. > > We would be happy to get an MR for the updates if you want to try it. > > > > We are interested in doing chemistry using GPUs and already have > > interfaces to PETSc from our code. > > > > How does the GPU interest interact with the SUNDIALS version? > > Thanks, > > Matt > > > > Thanks, > > Marcos > > > > > From junchao.zhang at gmail.com Mon Oct 16 14:24:28 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 16 Oct 2023 14:24:28 -0500 Subject: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Hi, Philip, That branch was merged to petsc/main today. Let me know once you have new profiling results. Thanks. --Junchao Zhang On Mon, Oct 16, 2023 at 9:33?AM Fackler, Philip wrote: > Junchao, > > I've attached updated timing plots (red and blue are swapped from before; > yellow is the new one). There is an improvement for the NE_3 case only with > CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI > cases, MatShift doesn't show up (I assume because we're using different > preconditioner arguments). So, there must be some other primary culprit. > I'll try to get updated profiling data to you soon. > > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Fackler, Philip via Xolotl-psi-development < > xolotl-psi-development at lists.sourceforge.net> > *Sent:* Wednesday, October 11, 2023 11:31 > *To:* Junchao Zhang > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Subject:* Re: [Xolotl-psi-development] [EXTERNAL] Re: [petsc-users] > Unexpected performance losses switching to COO interface > > I'm on it. 
> > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang > *Sent:* Wednesday, October 11, 2023 10:14 > *To:* Fackler, Philip > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; Blondel, Sophie < > sblondel at utk.edu> > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses > switching to COO interface > > Hi, Philip, > Could you try this branch > jczhang/2023-10-05/feature-support-matshift-aijkokkos ? > > Thanks. > --Junchao Zhang > > > On Thu, Oct 5, 2023 at 4:52?PM Fackler, Philip wrote: > > Aha! That makes sense. Thank you. > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang > *Sent:* Thursday, October 5, 2023 17:29 > *To:* Fackler, Philip > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; Blondel, Sophie < > sblondel at utk.edu> > *Subject:* [EXTERNAL] Re: [petsc-users] Unexpected performance losses > switching to COO interface > > Wait a moment, it seems it was because we do not have a GPU implementation > of MatShift... > Let me see how to add it. > --Junchao Zhang > > > On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang > wrote: > > Hi, Philip, > I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() > instead of the COO interface? MatSetValues() needs to copy the data from > device to host and thus is expensive. > Do you have profiling results with COO enabled? > > [image: Screenshot 2023-10-05 at 10.55.29?AM.png] > > > --Junchao Zhang > > > On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: > > Hi, Philip, > I will look into the tarballs and get back to you. > Thanks. > --Junchao Zhang > > > On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > We finally have xolotl ported to use the new COO interface and the > aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port > to our previous version (using MatSetValuesStencil and the default Mat and > Vec implementations), we expected to see an improvement in performance for > both the "serial" and "cuda" builds (here I'm referring to the kokkos > configuration). > > Attached are two plots that show timings for three different cases. All of > these were run on Ascent (the Summit-like training system) with 6 MPI tasks > (on a single node). The CUDA cases were given one GPU per task (and used > CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases > we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent > as possible. > > The performance of RHSJacobian (where the bulk of computation happens in > xolotl) behaved basically as expected (better than expected in the serial > build). NE_3 case in CUDA was the only one that performed worse, but not > surprisingly, since its workload for the GPUs is much smaller. We've still > got more optimization to do on this. > > The real surprise was how much worse the overall solve times were. This > seems to be due simply to switching to the kokkos-based implementation. 
I'm > wondering if there are any changes we can make in configuration or runtime > arguments to help with PETSc's performance here. Any help looking into this > would be appreciated. > > The tarballs linked here > > and here > > are profiling databases which, once extracted, can be viewed with > hpcviewer. I don't know how helpful that will be, but hopefully it can give > you some direction. > > Thanks for your help, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Mon Oct 16 15:07:58 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 16 Oct 2023 20:07:58 +0000 Subject: [petsc-users] Using Sundials from PETSc In-Reply-To: References: Message-ID: Hi Mathew, we have code that time splits the combustion step from the chemical species transport, so on each computational cell for each fluid flow time step, once transport is done we have the mixture chemical composition as initial condition. We are looking into doing finite rate chemistry with skeletal combustion models (20+ equations) in each cell for each fluid time step. Sundials provides the CVODE solver for the time integration of these, and would be interesting to see if we can make use of GPU acceleration. From their User Guide for Version 6.6.0 there are several GPU implementations for building RHS and using linear, nonlinear and stiff ODE solvers. Thank you Satish for the comment. Might be better at this point to first get an idea on what the implementation in our code using Sundials directly would look like. Then, we can see if it is possible and makes sense to access it through PETSc. We have things working in CPU making use of and older version of CVODE. BTW after some changes in our code we are starting running larger cases using GPU accelerated iterative solvers from PETSc, so we have PETSc interfaced already. Thanks! ________________________________ From: Matthew Knepley Sent: Monday, October 16, 2023 3:03 PM To: Vanella, Marcos (Fed) Cc: petsc-users at mcs.anl.gov ; Paul, Chandan (IntlAssoc) Subject: Re: [petsc-users] Using Sundials from PETSc On Mon, Oct 16, 2023 at 2:29?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi, we were wondering if it would be possible to call the latest version of Sundials from PETSc? The short answer is, no. We are at v2.5 and they are at v6.5. There were no dates on the version history page, so I do not know how out of date we are. There have not been any requests for update until now. We would be happy to get an MR for the updates if you want to try it. We are interested in doing chemistry using GPUs and already have interfaces to PETSc from our code. How does the GPU interest interact with the SUNDIALS version? Thanks, Matt Thanks, Marcos -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Mon Oct 16 15:31:14 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Oct 2023 16:31:14 -0400 Subject: [petsc-users] Using Sundials from PETSc In-Reply-To: References: Message-ID: On Mon, Oct 16, 2023 at 4:08?PM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > Hi Mathew, we have code that time splits the combustion step from the > chemical species transport, so on each computational cell for each fluid > flow time step, once transport is done we have the mixture chemical > composition as initial condition. We are looking into doing finite rate > chemistry with skeletal combustion models (20+ equations) in each cell for > each fluid time step. Sundials provides the CVODE solver for the time > integration of these, and would be interesting to see if we can make use of > GPU acceleration. From their User Guide for Version 6.6.0 there are several > GPU implementations for building RHS and using linear, nonlinear and stiff > ODE solvers. > We are doing a similar thing in CHREST (https://www.buffalo.edu/chrest.html). Since we normally use hundreds of species and thousands of reactions for the reduced mechanism, we are using TChem2 to build and solve the system in each cell. Since these systems are so small, you are likely to need some way of batching them within a warp. Do you have an idea for this already? Thanks, Matt > Thank you Satish for the comment. Might be better at this point to first > get an idea on what the implementation in our code using Sundials directly > would look like. Then, we can see if it is possible and makes sense to > access it through PETSc. > We have things working in CPU making use of and older version of CVODE. > > BTW after some changes in our code we are starting running larger cases > using GPU accelerated iterative solvers from PETSc, so we have PETSc > interfaced already. > > Thanks! > > ------------------------------ > *From:* Matthew Knepley > *Sent:* Monday, October 16, 2023 3:03 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov ; Paul, Chandan > (IntlAssoc) > *Subject:* Re: [petsc-users] Using Sundials from PETSc > > On Mon, Oct 16, 2023 at 2:29?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, we were wondering if it would be possible to call the latest version > of Sundials from PETSc? > > > The short answer is, no. We are at v2.5 and they are at v6.5. There were > no dates on the version history page, so I do not know how out of date we > are. There have not been any requests for update until now. > > We would be happy to get an MR for the updates if you want to try it. > > > We are interested in doing chemistry using GPUs and already have > interfaces to PETSc from our code. > > > How does the GPU interest interact with the SUNDIALS version? > > Thanks, > > Matt > > > Thanks, > Marcos > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marcos.vanella at nist.gov Mon Oct 16 16:15:26 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 16 Oct 2023 21:15:26 +0000 Subject: [petsc-users] Using Sundials from PETSc In-Reply-To: References: Message-ID: Hi Matt, very interesting project you are working on. We haven't gone deep on how we would do this in GPUs and are starting to look at options. We will explore if it is possible to batch work needed for several cells within a thread group on the gpu. We use a single Cartesian mesh per MPI process (usually with 40^3 to 50^3 cells). Something I implemented to avoid the MPI process over-subscription of GPU with PETSc solvers was to cluster several MPI Processes per GPU on resource sets. Then, the processes in the set would pass matrix (at setup) and RHS to a single process (set master) which communicates with the GPU. The GPU solution is then brought back to the set master which distributes it to the MPI processes in the set as needed. So, only a set of processes as large as the number of GPUs in the calculation (with their own MPI communicator) call the PETSc matrix and vector building, and solve routines. The neat thing is that all MPI communications are local to the node. This idea is not new, it was developed by the researchers at GWU that interfaced PETSc to AMGx back when there were no native GPU solvers in PETSc, HYPRE and other libs (~2016). Best, Marcos ________________________________ From: Matthew Knepley Sent: Monday, October 16, 2023 4:31 PM To: Vanella, Marcos (Fed) Cc: petsc-users at mcs.anl.gov ; Paul, Chandan (IntlAssoc) Subject: Re: [petsc-users] Using Sundials from PETSc On Mon, Oct 16, 2023 at 4:08?PM Vanella, Marcos (Fed) > wrote: Hi Mathew, we have code that time splits the combustion step from the chemical species transport, so on each computational cell for each fluid flow time step, once transport is done we have the mixture chemical composition as initial condition. We are looking into doing finite rate chemistry with skeletal combustion models (20+ equations) in each cell for each fluid time step. Sundials provides the CVODE solver for the time integration of these, and would be interesting to see if we can make use of GPU acceleration. From their User Guide for Version 6.6.0 there are several GPU implementations for building RHS and using linear, nonlinear and stiff ODE solvers. We are doing a similar thing in CHREST (https://www.buffalo.edu/chrest.html). Since we normally use hundreds of species and thousands of reactions for the reduced mechanism, we are using TChem2 to build and solve the system in each cell. Since these systems are so small, you are likely to need some way of batching them within a warp. Do you have an idea for this already? Thanks, Matt Thank you Satish for the comment. Might be better at this point to first get an idea on what the implementation in our code using Sundials directly would look like. Then, we can see if it is possible and makes sense to access it through PETSc. We have things working in CPU making use of and older version of CVODE. BTW after some changes in our code we are starting running larger cases using GPU accelerated iterative solvers from PETSc, so we have PETSc interfaced already. Thanks! 
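For what it's worth, a minimal sketch of that resource-set grouping (purely illustrative, not the actual implementation: build_resource_sets, ngpus_per_node, set_comm and master_comm are made-up names, and ngpus_per_node is assumed to be known, e.g. from the resource manager):

#include <mpi.h>

/* Split COMM_WORLD into node-local sets, one per GPU. Rank 0 of each set is the
   "set master"; only masters end up in master_comm, which is what PETSc would be
   initialized on (e.g. by assigning it to PETSC_COMM_WORLD before PetscInitialize).
   The remaining ranks get MPI_COMM_NULL and only exchange matrix/RHS/solution data
   with their master over set_comm. */
static void build_resource_sets(int ngpus_per_node, MPI_Comm *set_comm, MPI_Comm *master_comm)
{
  MPI_Comm node_comm;
  int      world_rank, node_rank, set_rank;

  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);
  MPI_Comm_rank(node_comm, &node_rank);
  MPI_Comm_split(node_comm, node_rank % ngpus_per_node, node_rank, set_comm);
  MPI_Comm_rank(*set_comm, &set_rank);
  MPI_Comm_split(MPI_COMM_WORLD, set_rank == 0 ? 0 : MPI_UNDEFINED, world_rank, master_comm);
  MPI_Comm_free(&node_comm);
}

All of the gather/scatter traffic then stays on set_comm, i.e. within the node.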
________________________________ From: Matthew Knepley > Sent: Monday, October 16, 2023 3:03 PM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov >; Paul, Chandan (IntlAssoc) > Subject: Re: [petsc-users] Using Sundials from PETSc On Mon, Oct 16, 2023 at 2:29?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi, we were wondering if it would be possible to call the latest version of Sundials from PETSc? The short answer is, no. We are at v2.5 and they are at v6.5. There were no dates on the version history page, so I do not know how out of date we are. There have not been any requests for update until now. We would be happy to get an MR for the updates if you want to try it. We are interested in doing chemistry using GPUs and already have interfaces to PETSc from our code. How does the GPU interest interact with the SUNDIALS version? Thanks, Matt Thanks, Marcos -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Oct 17 13:31:15 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 17 Oct 2023 20:31:15 +0200 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: References: <89E53665-4C0D-4583-9C90-13C4C108A4EA@dsic.upv.es> <442B3841-B668-4185-9C6F-D03CA481CA26@dsic.upv.es> Message-ID: Kenneth, I have worked a bit more on your example and put it in SLEPc https://gitlab.com/slepc/slepc/-/merge_requests/596 This version also has MATOP_DESTROY to avoid memory leaks. Thanks. Jose > El 12 oct 2023, a las 20:59, Kenneth C Hall escribi?: > > Jose, > > Thanks very much for this. I will give it a try and let you know how it works. > > Best regards, > Kenneth > > From: Jose E. Roman > Date: Thursday, October 12, 2023 at 2:12 PM > To: Kenneth C Hall > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) > > I am attaching your example modified with the context stuff. > > With the PETSc branch that I indicated, now it works with NLEIGS, for instance: > > $ ./test_nep -nep_nleigs_ksp_type gmres -nep_nleigs_pc_type none -rg_interval_endpoints 0.2,1.1 -nep_target 0.8 -nep_nev 5 -n 400 -nep_monitor -nep_view -nep_error_relative ::ascii_info_detail > > And also other solvers such as SLP: > > $ ./test_nep -nep_type slp -nep_slp_ksp_type gmres -nep_slp_pc_type none -nep_target 0.8 -nep_nev 5 -n 400 -nep_monitor -nep_error_relative ::ascii_info_detail > > I will clean the example code an add it as a SLEPc example. > > Regards, > Jose > > > > El 11 oct 2023, a las 17:27, Kenneth C Hall escribi?: > > > > Jose, > > > > Thanks very much for your help with this. Greatly appreciated. I will look at the MR. Please let me know if you do get the Fortran example working. 
> > > > Thanks, and best regards, > > Kenneth > > From degregori at dkrz.de Wed Oct 18 04:54:43 2023 From: degregori at dkrz.de (Enrico) Date: Wed, 18 Oct 2023 11:54:43 +0200 Subject: [petsc-users] Coordinate format internal reordering Message-ID: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> Hello, I'm trying to use Petsc to solve a linear system in an application. I'm using the coordinate format to define the matrix and the vector (it should work better on GPU but at the moment every test is on CPU). After the call to VecSetValuesCOO, I've noticed that the vector is storing the data in a different way from my application. For example with two processes in the application process 0 owns cells 2, 3, 4 process 1 owns cells 0, 1, 5 But in the vector data structure of Petsc process 0 owns cells 0, 1, 2 process 1 owns cells 3, 4, 5 This is in principle not a big issue, but after solving the linear system I get the solution vector x and I want to get the values in the correct processes. Is there a way to get vector values from other processes or to get a mapping so that I can do it myself? Cheers, Enrico Degregori From yc17470 at connect.um.edu.mo Wed Oct 18 05:06:38 2023 From: yc17470 at connect.um.edu.mo (Gong Yujie) Date: Wed, 18 Oct 2023 10:06:38 +0000 Subject: [petsc-users] Error when installing PETSc Message-ID: Dear PETSc developers, I got an error message when installing PETSc with a clang compiler. Could you please help me find the problem? The configure.log is attached. Best Regards, Yujie Here is the detail of the error: ============================================================================================= Configuring PETSc to compile on your system ============================================================================================= ============================================================================================= ***** WARNING: Using default optimization C flags -g -O3 You might consider manually setting optimal optimization flags for your system with COPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for examples ============================================================================================= ============================================================================================= ***** WARNING: Using default Cxx optimization flags -g -O3 You might consider manually setting optimal optimization flags for your system with CXXOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for examples ============================================================================================= ============================================================================================= ***** WARNING: Using default FORTRAN optimization flags -O You might consider manually setting optimal optimization flags for your system with FOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for examples ============================================================================================= ============================================================================================= Trying to download https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz for ============================================================================================= ============================================================================================= Running configure on OPENMPI; this may take several minutes ============================================================================================= 
============================================================================================= Running make on OPENMPI; this may take several minutes ============================================================================================= ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Error running make; make install on OPENMPI ******************************************************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 1016438 bytes Desc: configure.log URL: From knepley at gmail.com Wed Oct 18 06:39:29 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 18 Oct 2023 07:39:29 -0400 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> Message-ID: On Wed, Oct 18, 2023 at 5:55?AM Enrico wrote: > Hello, > > I'm trying to use Petsc to solve a linear system in an application. I'm > using the coordinate format to define the matrix and the vector (it > should work better on GPU but at the moment every test is on CPU). After > the call to VecSetValuesCOO, I've noticed that the vector is storing the > data in a different way from my application. For example with two > processes in the application > > process 0 owns cells 2, 3, 4 > > process 1 owns cells 0, 1, 5 > > But in the vector data structure of Petsc > > process 0 owns cells 0, 1, 2 > > process 1 owns cells 3, 4, 5 > > This is in principle not a big issue, but after solving the linear > system I get the solution vector x and I want to get the values in the > correct processes. Is there a way to get vector values from other > processes or to get a mapping so that I can do it myself? > By definition, PETSc vectors and matrices own contiguous row blocks. If you want to have another, global ordering, we support that with https://petsc.org/main/manualpages/AO/ Thanks, Matt > Cheers, > Enrico Degregori > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 18 06:41:27 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 18 Oct 2023 07:41:27 -0400 Subject: [petsc-users] Error when installing PETSc In-Reply-To: References: Message-ID: On Wed, Oct 18, 2023 at 6:07?AM Gong Yujie wrote: > Dear PETSc developers, > > I got an error message when installing PETSc with a clang compiler. Could > you please help me find the problem? The configure.log is attached. 
> Your compiler segfaulted when compiling OpenMPI: Making all in mca/crs make[2]: Entering directory '/home/tt/petsc-3.16.0/optamd/externalpackages/openmpi-4.1.0/opal/mca/crs' GENERATE opal_crs.7 CC base/crs_base_open.lo CC base/crs_base_close.lo CC base/crs_base_select.lo CC base/crs_base_fns.lo make[2]: Leaving directory '/home/tt/petsc-3.16.0/optamd/externalpackages/openmpi-4.1.0/opal/mca/crs' make[1]: Leaving directory '/home/tt/petsc-3.16.0/optamd/externalpackages/openmpi-4.1.0/opal'/bin/sh: line 7: 6327 Illegal instruction (core dumped) ../../../config/ make_manpage.pl --package-name='Open MPI' --package-version='4.1.0' --ompi-date='Dec 18, 2020' --opal-date='Dec 18, 2020' --orte-date='Dec 18, 2020' --input=opal_crs.7in --output=opal_crs.7 make[2]: *** [Makefile:2215: opal_crs.7] Error 132 make[2]: *** Waiting for unfinished jobs.... make[1]: *** [Makefile:2383: all-recursive] Error 1 make: *** [Makefile:1901: all-recursive] Error 1 I suggest compiling MPICH instead. Thanks, Matt > Best Regards, > Yujie > > Here is the detail of the error: > > ============================================================================================= > > Configuring PETSc to compile on your system > > ============================================================================================= > ============================================================================================= > > ***** WARNING: Using default optimization C flags -g -O3 > You might consider manually setting optimal optimization flags for your > system with > COPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for > examples > ============================================================================================= > > ============================================================================================= > > ***** WARNING: Using default Cxx optimization flags -g -O3 > > You might consider manually setting optimal optimization flags for your > system with > CXXOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for > examples > ============================================================================================= > > ============================================================================================= > > ***** WARNING: Using default FORTRAN optimization flags -O > > You might consider manually setting optimal optimization flags for your > system with > FOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for > examples > ============================================================================================= > > ============================================================================================= > > Trying to download > https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz > for > ============================================================================================= > > ============================================================================================= > > Running configure on OPENMPI; this may take several minutes > > ============================================================================================= > > ============================================================================================= > > Running make on OPENMPI; this may take several minutes > > ============================================================================================= > > > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > 
details): > > ------------------------------------------------------------------------------- > Error running make; make install on OPENMPI > > ******************************************************************************* > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Oct 18 07:15:36 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 18 Oct 2023 08:15:36 -0400 Subject: [petsc-users] About recent changes in GAMG In-Reply-To: References: Message-ID: Hi Jeremy, I hope you don't mind putting this on the list (w/o data), but this is documentation and you are the second user that found regressions. Sorry for the churn. There is a lot here so we can iterate, but here is a pass at your questions. *** Using MIS-2 instead of square graph was motivated by setup cost/performance but on GPUs with some recent fixes in Kokkos (in a branch) square graph seems OK. My experience was that square graph is better in terms of quality and we have a power user, like you all, that found this also. So I switched the default back to square graph. Interesting that you found that MIS-2 (new method) could be faster, but it might be because the two methods coarsen at different rates and that can make a big difference. (the way to test would be to adjust parameters to get similar coarsen rates, but I digress) It's hard to understand the differences between these two methods in terms of aggregate quality so we need to just experiment and have options. *** As far as your thermal problem. There was a complaint that the eigen estimates for chebyshev smoother were not recomputed for nonlinear problems and I added an option to do that and turned it on by default: Use '-pc_gamg_recompute_esteig false' to get back to the original. (I should have turned it off by default) Now, if your problem is symmetric and you use CG to compute the eigen estimates there should be no difference. If you use CG to compute the eigen estimates in GAMG (and have GAMG give them to cheby, the default) that when you recompute the eigen estimates the cheby eigen estimator is used and that will use gmres by default unless you set the SPD property in your matrix. So if you set '-pc_gamg_esteig_ksp_type cg' you want to also set '-mg_levels_esteig_ksp_type cg' (verify with -ksp_view and -options_left) CG is a much better estimator for SPD. And I found that the cheby eigen estimator uses an LAPACK *eigen* method to compute the eigen bounds and GAMG uses a *singular value* method. The two give very different results on the lid driven cavity test (ex19). eigen is lower, which is safer but not optimal if it is too low. I have a branch to have cheby use the singular value method, but I don't plan on merging it (enough churn and I don't understand these differences). *** '-pc_gamg_low_memory_threshold_filter false' recovers the old filtering method. This is the default now because there is a bug in the (new) low memory filter. This bug is very rare and catastrophic. We are working on it and will turn it on by default when it's fixed. This does not affect the semantics of the solver, just work and memory complexity. *** As far as tet4 vs tet10, I would guess that tet4 wants more aggressive coarsening. The default is to do aggressive on one (1) level. 
You might want more levels for tet4. And the new MIS-k coarsening can use any k (default is 2) wth '-mat_coarsen_misk_distance k' (eg, k=3) I have not added hooks to have a more complex schedule to specify the method on each level. Thanks, Mark On Tue, Oct 17, 2023 at 9:33?PM Jeremy Theler (External) < jeremy.theler-ext at ansys.com> wrote: > Hey Mark > > Regarding the changes in the coarsening algorithm in 3.20 with respect to > 3.19 in general we see that for some problems the MIS strategy gives and > overall performance which is slightly better and for some others it is > slightly worse than the "baseline" from 3.19. > We also saw that current main has switched back to the old square > coarsening algorithm by default, which again, in some cases is better and > in others is worse than 3.19 without any extra command-line option. > > Now what seems weird to us is that we have a test case which is a heat > conduction problem with radiation boundary conditions (so it is non linear) > using tet10 and we see > > 1. that in parallel v3.20 is way worse than v3.19, although the memory > usage is similar > 2. that petsc main (with no extra flags, just the defaults) recover > the 3.19 performance but memory usage is significantly larger > > > I tried using the -pc_gamg_low_memory_threshold_filter flag and the > results were the same. > > Find attached the log and snes views of 3.19, 3.20 and main using 4 MPI > ranks. > Is there any explanation about these two points we are seeing? > Another weird finding is that if we use tet4 instead of tet10, v3.20 is > only 10% slower than the other two and main does not need more memory than > the other two. > > BTW, I have dozens of other log view outputs comparing 3.19, 3.20 and main > should you be interested. > > Let me know if it is better to move this discussion into the PETSc mailing > list. > > Regards, > jeremy theler > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Oct 18 09:33:33 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 18 Oct 2023 09:33:33 -0500 (CDT) Subject: [petsc-users] Error when installing PETSc In-Reply-To: References: Message-ID: <38ad7e4e-1dc8-6dbf-c34d-1bbc6b8180fa@mcs.anl.gov> > Working directory: /home/tt/petsc-3.16.0 use latest petsc release - 3.20 > --with-fc=flang I don't think this ever worked. Use --with-fc=gfortran instead /opt/ohpc/pub/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.3.0/m4-1.4.19-lwqcw3hzoxoia5q6nzolylxaf5zevluk/bin/m4: internal error detected; please report this bug to : Illegal instruction You might need to report this to your admin who installed this spack package. They might need to rebuild spack for 'x86_64' instead of 'skylake_avx512' Or use a different m4 - say from /usr/bin - if you have it there. Satish On Wed, 18 Oct 2023, Matthew Knepley wrote: > On Wed, Oct 18, 2023 at 6:07?AM Gong Yujie > wrote: > > > Dear PETSc developers, > > > > I got an error message when installing PETSc with a clang compiler. Could > > you please help me find the problem? The configure.log is attached. 
> > > > Your compiler segfaulted when compiling OpenMPI: > > Making all in mca/crs > make[2]: Entering directory > '/home/tt/petsc-3.16.0/optamd/externalpackages/openmpi-4.1.0/opal/mca/crs' > GENERATE opal_crs.7 > CC base/crs_base_open.lo > CC base/crs_base_close.lo > CC base/crs_base_select.lo > CC base/crs_base_fns.lo > make[2]: Leaving directory > '/home/tt/petsc-3.16.0/optamd/externalpackages/openmpi-4.1.0/opal/mca/crs' > make[1]: Leaving directory > '/home/tt/petsc-3.16.0/optamd/externalpackages/openmpi-4.1.0/opal'/bin/sh: > line 7: 6327 Illegal instruction (core dumped) ../../../config/ > make_manpage.pl --package-name='Open MPI' --package-version='4.1.0' > --ompi-date='Dec 18, 2020' --opal-date='Dec 18, 2020' --orte-date='Dec 18, > 2020' --input=opal_crs.7in --output=opal_crs.7 > make[2]: *** [Makefile:2215: opal_crs.7] Error 132 > make[2]: *** Waiting for unfinished jobs.... > make[1]: *** [Makefile:2383: all-recursive] Error 1 > make: *** [Makefile:1901: all-recursive] Error 1 > > I suggest compiling MPICH instead. > > Thanks, > > Matt > > > > Best Regards, > > Yujie > > > > Here is the detail of the error: > > > > ============================================================================================= > > > > Configuring PETSc to compile on your system > > > > ============================================================================================= > > ============================================================================================= > > > > ***** WARNING: Using default optimization C flags -g -O3 > > You might consider manually setting optimal optimization flags for your > > system with > > COPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for > > examples > > ============================================================================================= > > > > ============================================================================================= > > > > ***** WARNING: Using default Cxx optimization flags -g -O3 > > > > You might consider manually setting optimal optimization flags for your > > system with > > CXXOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for > > examples > > ============================================================================================= > > > > ============================================================================================= > > > > ***** WARNING: Using default FORTRAN optimization flags -O > > > > You might consider manually setting optimal optimization flags for your > > system with > > FOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for > > examples > > ============================================================================================= > > > > ============================================================================================= > > > > Trying to download > > https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz > > for > > ============================================================================================= > > > > ============================================================================================= > > > > Running configure on OPENMPI; this may take several minutes > > > > ============================================================================================= > > > > ============================================================================================= > > > > Running make on OPENMPI; this may take several minutes > > > > 
============================================================================================= > > > > > > ******************************************************************************* > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > > details): > > > > ------------------------------------------------------------------------------- > > Error running make; make install on OPENMPI > > > > ******************************************************************************* > > > > > > From degregori at dkrz.de Thu Oct 19 05:51:39 2023 From: degregori at dkrz.de (Enrico) Date: Thu, 19 Oct 2023 12:51:39 +0200 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> Message-ID: <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> Hello, if I create an application ordering using AOCreateBasic, should I provide the same array for const PetscInt myapp[] and const PetscInt mypetsc[] in order to get the same ordering of the application within PETSC? And once I define the ordering so that the local vector and matrix are defined in PETSC as in my application, how can I use it to create the actual vector and matrix? Thanks in advance for the help. Cheers, Enrico On 18/10/2023 13:39, Matthew Knepley wrote: > On Wed, Oct 18, 2023 at 5:55?AM Enrico > wrote: > > Hello, > > I'm trying to use Petsc to solve a linear system in an application. I'm > using the coordinate format to define the matrix and the vector (it > should work better on GPU but at the moment every test is on CPU). > After > the call to VecSetValuesCOO, I've noticed that the vector is storing > the > data in a different way from my application. For example with two > processes in the application > > process 0 owns cells 2, 3, 4 > > process 1 owns cells 0, 1, 5 > > But in the vector data structure of Petsc > > process 0 owns cells 0, 1, 2 > > process 1 owns cells 3, 4, 5 > > This is in principle not a big issue, but after solving the linear > system I get the solution vector x and I want to get the values in the > correct processes. Is there a way to get vector values from other > processes or to get a mapping so that I can do it myself? > > > By definition, PETSc vectors and matrices own contiguous row blocks. If > you want to have another, > global ordering, we support that with > https://petsc.org/main/manualpages/AO/ > > > ? Thanks, > > ? ? ?Matt > > Cheers, > Enrico Degregori > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Thu Oct 19 07:50:46 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 19 Oct 2023 08:50:46 -0400 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> Message-ID: On Thu, Oct 19, 2023 at 6:51?AM Enrico wrote: > Hello, > > if I create an application ordering using AOCreateBasic, should I > provide the same array for const PetscInt myapp[] and const PetscInt > mypetsc[] in order to get the same ordering of the application within > PETSC? > Are you asking if the identity permutation can be constructed using the same array twice? Yes. 
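For concreteness, a minimal sketch of that call (the index values are made up
for illustration; this is not code from your application):

  #include <petscao.h>

  int main(int argc, char **argv)
  {
    AO       ao;
    /* application (global) indices of the rows this rank owns */
    PetscInt myapp[3]   = {2, 3, 4};
    /* the PETSc indices they map to; passing myapp here as well
       gives the identity permutation */
    PetscInt mypetsc[3] = {0, 1, 2};

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(AOCreateBasic(PETSC_COMM_WORLD, 3, myapp, mypetsc, &ao));
    PetscCall(AOView(ao, PETSC_VIEWER_STDOUT_WORLD));
    /* AOApplicationToPetsc(ao, n, idx) rewrites application indices into
       PETSc indices in place; AOPetscToApplication goes the other way */
    PetscCall(AODestroy(&ao));
    PetscCall(PetscFinalize());
    return 0;
  }
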
> And once I define the ordering so that the local vector and matrix are > defined in PETSC as in my application, how can I use it to create the > actual vector and matrix? > The vectors and matrices do not change. The AO is a permutation. You can use it to permute a vector into another order, or to convert on index to another. Thanks, Matt > Thanks in advance for the help. > > Cheers, > Enrico > > On 18/10/2023 13:39, Matthew Knepley wrote: > > On Wed, Oct 18, 2023 at 5:55?AM Enrico > > wrote: > > > > Hello, > > > > I'm trying to use Petsc to solve a linear system in an application. > I'm > > using the coordinate format to define the matrix and the vector (it > > should work better on GPU but at the moment every test is on CPU). > > After > > the call to VecSetValuesCOO, I've noticed that the vector is storing > > the > > data in a different way from my application. For example with two > > processes in the application > > > > process 0 owns cells 2, 3, 4 > > > > process 1 owns cells 0, 1, 5 > > > > But in the vector data structure of Petsc > > > > process 0 owns cells 0, 1, 2 > > > > process 1 owns cells 3, 4, 5 > > > > This is in principle not a big issue, but after solving the linear > > system I get the solution vector x and I want to get the values in > the > > correct processes. Is there a way to get vector values from other > > processes or to get a mapping so that I can do it myself? > > > > > > By definition, PETSc vectors and matrices own contiguous row blocks. If > > you want to have another, > > global ordering, we support that with > > https://petsc.org/main/manualpages/AO/ > > > > > > Thanks, > > > > Matt > > > > Cheers, > > Enrico Degregori > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ < > http://www.cse.buffalo.edu/~knepley/> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From degregori at dkrz.de Thu Oct 19 07:57:39 2023 From: degregori at dkrz.de (Enrico) Date: Thu, 19 Oct 2023 14:57:39 +0200 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> Message-ID: <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> Maybe I wasn't clear enough. I would like to completely get rid of Petsc ordering because I don't want extra communication between processes to construct the vector and the matrix (since I have to fill them every time step because I'm just using the linear solver with a Mat and a Vec data structure). I don't understand how I can do that. My initial idea was to create another global index ordering within my application to use only for the Petsc interface but then I think that the ghost cells are wrong. On 19/10/2023 14:50, Matthew Knepley wrote: > On Thu, Oct 19, 2023 at 6:51?AM Enrico > wrote: > > Hello, > > if I create an application ordering using AOCreateBasic, should I > provide the same array for const PetscInt myapp[] and const PetscInt > mypetsc[] in order to get the same ordering of the application > within PETSC? 
> > > Are you asking if the identity permutation can be constructed using the > same array twice? Yes. > > And once I define the ordering so that the local vector and matrix are > defined in PETSC as in my application, how can I use it to create the > actual vector and matrix? > > > The vectors and matrices do not change. The AO is a permutation. You can > use it to permute > a vector into another order, or to convert on index to another. > > ? Thanks, > > ? ? ? Matt > > Thanks in advance for the help. > > Cheers, > Enrico > > On 18/10/2023 13:39, Matthew Knepley wrote: > > On Wed, Oct 18, 2023 at 5:55?AM Enrico > > >> wrote: > > > >? ? ?Hello, > > > >? ? ?I'm trying to use Petsc to solve a linear system in an > application. I'm > >? ? ?using the coordinate format to define the matrix and the > vector (it > >? ? ?should work better on GPU but at the moment every test is on > CPU). > >? ? ?After > >? ? ?the call to VecSetValuesCOO, I've noticed that the vector is > storing > >? ? ?the > >? ? ?data in a different way from my application. For example with two > >? ? ?processes in the application > > > >? ? ?process 0 owns cells 2, 3, 4 > > > >? ? ?process 1 owns cells 0, 1, 5 > > > >? ? ?But in the vector data structure of Petsc > > > >? ? ?process 0 owns cells 0, 1, 2 > > > >? ? ?process 1 owns cells 3, 4, 5 > > > >? ? ?This is in principle not a big issue, but after solving the > linear > >? ? ?system I get the solution vector x and I want to get the > values in the > >? ? ?correct processes. Is there a way to get vector values from other > >? ? ?processes or to get a mapping so that I can do it myself? > > > > > > By definition, PETSc vectors and matrices own contiguous row > blocks. If > > you want to have another, > > global ordering, we support that with > > https://petsc.org/main/manualpages/AO/ > > > > > > > >? ? Thanks, > > > >? ? ? ?Matt > > > >? ? ?Cheers, > >? ? ?Enrico Degregori > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Thu Oct 19 08:25:40 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 19 Oct 2023 09:25:40 -0400 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> Message-ID: On Thu, Oct 19, 2023 at 8:57?AM Enrico wrote: > Maybe I wasn't clear enough. I would like to completely get rid of Petsc > ordering because I don't want extra communication between processes to > construct the vector and the matrix (since I have to fill them every > time step because I'm just using the linear solver with a Mat and a Vec > data structure). I don't understand how I can do that. > Any program you write to do linear algebra will have contiguous storage because it is so much faster. Contiguous indexing makes sense for contiguous storage. If you want to use non-contiguous indexing for contiguous storage, you would need some translation layer. 
The AO is such a translation, but you could do this any way you want. Thanks, Matt > My initial idea was to create another global index ordering within my > application to use only for the Petsc interface but then I think that > the ghost cells are wrong. > > On 19/10/2023 14:50, Matthew Knepley wrote: > > On Thu, Oct 19, 2023 at 6:51?AM Enrico > > wrote: > > > > Hello, > > > > if I create an application ordering using AOCreateBasic, should I > > provide the same array for const PetscInt myapp[] and const PetscInt > > mypetsc[] in order to get the same ordering of the application > > within PETSC? > > > > > > Are you asking if the identity permutation can be constructed using the > > same array twice? Yes. > > > > And once I define the ordering so that the local vector and matrix > are > > defined in PETSC as in my application, how can I use it to create the > > actual vector and matrix? > > > > > > The vectors and matrices do not change. The AO is a permutation. You can > > use it to permute > > a vector into another order, or to convert on index to another. > > > > Thanks, > > > > Matt > > > > Thanks in advance for the help. > > > > Cheers, > > Enrico > > > > On 18/10/2023 13:39, Matthew Knepley wrote: > > > On Wed, Oct 18, 2023 at 5:55?AM Enrico > > > > >> wrote: > > > > > > Hello, > > > > > > I'm trying to use Petsc to solve a linear system in an > > application. I'm > > > using the coordinate format to define the matrix and the > > vector (it > > > should work better on GPU but at the moment every test is on > > CPU). > > > After > > > the call to VecSetValuesCOO, I've noticed that the vector is > > storing > > > the > > > data in a different way from my application. For example with > two > > > processes in the application > > > > > > process 0 owns cells 2, 3, 4 > > > > > > process 1 owns cells 0, 1, 5 > > > > > > But in the vector data structure of Petsc > > > > > > process 0 owns cells 0, 1, 2 > > > > > > process 1 owns cells 3, 4, 5 > > > > > > This is in principle not a big issue, but after solving the > > linear > > > system I get the solution vector x and I want to get the > > values in the > > > correct processes. Is there a way to get vector values from > other > > > processes or to get a mapping so that I can do it myself? > > > > > > > > > By definition, PETSc vectors and matrices own contiguous row > > blocks. If > > > you want to have another, > > > global ordering, we support that with > > > https://petsc.org/main/manualpages/AO/ > > > > > > > > > > > > > Thanks, > > > > > > Matt > > > > > > Cheers, > > > Enrico Degregori > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to > which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ < > http://www.cse.buffalo.edu/~knepley/> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From degregori at dkrz.de Thu Oct 19 09:51:41 2023 From: degregori at dkrz.de (Enrico) Date: Thu, 19 Oct 2023 16:51:41 +0200 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> Message-ID: <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de> In the application the storage is contiguous but the global indexing is not. I would like to use AO as a translation layer but I don't understand it. My case is actually simple even if it is in a large application, I have Mat A, Vec b and Vec x After calling KSPSolve, I use VecGetArrayReadF90 to get a pointer to the data and they are in the wrong ordering, so for example the first element of the solution array on process 0 belongs to process 1 in the application. Is it at this point that I should use the AO translation layer? This would be quite bad, it means to build Mat A and Vec b there is MPI communication and also to get the data of Vec x back in the application. Anyway, I've tried to use AOPetscToApplicationPermuteReal on the solution array but it doesn't work as I would like. Is this function suppose to do MPI communication between processes and fetch the values of the application ordering? Cheers, Enrico On 19/10/2023 15:25, Matthew Knepley wrote: > On Thu, Oct 19, 2023 at 8:57?AM Enrico > wrote: > > Maybe I wasn't clear enough. I would like to completely get rid of > Petsc > ordering because I don't want extra communication between processes to > construct the vector and the matrix (since I have to fill them every > time step because I'm just using the linear solver with a Mat and a Vec > data structure). I don't understand how I can do that. > > > Any program you write to do linear algebra will have contiguous storage > because it > is so much faster. Contiguous indexing makes sense for contiguous > storage. If you > want to use non-contiguous indexing for contiguous storage, you would > need some > translation layer. The AO is such a translation, but you could do this > any way you want. > > ? Thanks, > > ? ? ?Matt > > My initial idea was to create another global index ordering within my > application to use only for the Petsc interface but then I think that > the ghost cells are wrong. > > On 19/10/2023 14:50, Matthew Knepley wrote: > > On Thu, Oct 19, 2023 at 6:51?AM Enrico > > >> wrote: > > > >? ? ?Hello, > > > >? ? ?if I create an application ordering using AOCreateBasic, should I > >? ? ?provide the same array for const PetscInt myapp[] and const > PetscInt > >? ? ?mypetsc[] in order to get the same ordering of the application > >? ? ?within PETSC? > > > > > > Are you asking if the identity permutation can be constructed > using the > > same array twice? Yes. > > > >? ? ?And once I define the ordering so that the local vector and > matrix are > >? ? ?defined in PETSC as in my application, how can I use it to > create the > >? ? ?actual vector and matrix? > > > > > > The vectors and matrices do not change. The AO is a permutation. > You can > > use it to permute > > a vector into another order, or to convert on index to another. > > > >? ? Thanks, > > > >? ? ? ? Matt > > > >? ? ?Thanks in advance for the help. > > > >? ? ?Cheers, > >? ? ?Enrico > > > >? ? ?On 18/10/2023 13:39, Matthew Knepley wrote: > >? ? ? > On Wed, Oct 18, 2023 at 5:55?AM Enrico > >? ? ?> > >? ? ? > > >>> wrote: > >? ? ? > > >? ? ? >? ? ?Hello, > >? ? ? > > >? ? ? >? ? 
?I'm trying to use Petsc to solve a linear system in an > >? ? ?application. I'm > >? ? ? >? ? ?using the coordinate format to define the matrix and the > >? ? ?vector (it > >? ? ? >? ? ?should work better on GPU but at the moment every test > is on > >? ? ?CPU). > >? ? ? >? ? ?After > >? ? ? >? ? ?the call to VecSetValuesCOO, I've noticed that the > vector is > >? ? ?storing > >? ? ? >? ? ?the > >? ? ? >? ? ?data in a different way from my application. For > example with two > >? ? ? >? ? ?processes in the application > >? ? ? > > >? ? ? >? ? ?process 0 owns cells 2, 3, 4 > >? ? ? > > >? ? ? >? ? ?process 1 owns cells 0, 1, 5 > >? ? ? > > >? ? ? >? ? ?But in the vector data structure of Petsc > >? ? ? > > >? ? ? >? ? ?process 0 owns cells 0, 1, 2 > >? ? ? > > >? ? ? >? ? ?process 1 owns cells 3, 4, 5 > >? ? ? > > >? ? ? >? ? ?This is in principle not a big issue, but after > solving the > >? ? ?linear > >? ? ? >? ? ?system I get the solution vector x and I want to get the > >? ? ?values in the > >? ? ? >? ? ?correct processes. Is there a way to get vector values > from other > >? ? ? >? ? ?processes or to get a mapping so that I can do it myself? > >? ? ? > > >? ? ? > > >? ? ? > By definition, PETSc vectors and matrices own contiguous row > >? ? ?blocks. If > >? ? ? > you want to have another, > >? ? ? > global ordering, we support that with > >? ? ? > https://petsc.org/main/manualpages/AO/ > > >? ? ? > > >? ? ? > > >? ? ? >> > >? ? ? > > >? ? ? >? ? Thanks, > >? ? ? > > >? ? ? >? ? ? ?Matt > >? ? ? > > >? ? ? >? ? ?Cheers, > >? ? ? >? ? ?Enrico Degregori > >? ? ? > > >? ? ? > > >? ? ? > > >? ? ? > -- > >? ? ? > What most experimenters take for granted before they begin > their > >? ? ? > experiments is infinitely more interesting than any > results to which > >? ? ? > their experiments lead. > >? ? ? > -- Norbert Wiener > >? ? ? > > >? ? ? > https://www.cse.buffalo.edu/~knepley/ > > >? ? ? > > >? ? ? > >? ? ? >> > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Thu Oct 19 10:21:33 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 19 Oct 2023 11:21:33 -0400 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de> References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de> Message-ID: On Thu, Oct 19, 2023 at 10:51?AM Enrico wrote: > In the application the storage is contiguous but the global indexing is > not. I would like to use AO as a translation layer but I don't > understand it. > Why would you choose to index differently from your storage? > My case is actually simple even if it is in a large application, I have > > Mat A, Vec b and Vec x > > After calling KSPSolve, I use VecGetArrayReadF90 to get a pointer to the > data and they are in the wrong ordering, so for example the first > element of the solution array on process 0 belongs to process 1 in the > application. 
> Again, this seems to be a poor choice of layout. What we typically do is to partition the data into chunks owned by each process first. > Is it at this point that I should use the AO translation layer? This > would be quite bad, it means to build Mat A and Vec b there is MPI > communication and also to get the data of Vec x back in the application. > If you want to store data that process i updates on process j, this will need communication. > Anyway, I've tried to use AOPetscToApplicationPermuteReal on the > solution array but it doesn't work as I would like. Is this function > suppose to do MPI communication between processes and fetch the values > of the application ordering? > There is no communication here. That function call just changes one integer into another. If you want to update values on another process, we recommend using VecScatter() or MatSetValues(), both of which take global indices and do communication if necessary. Thanks, Matt > Cheers, > Enrico > > On 19/10/2023 15:25, Matthew Knepley wrote: > > On Thu, Oct 19, 2023 at 8:57?AM Enrico > > wrote: > > > > Maybe I wasn't clear enough. I would like to completely get rid of > > Petsc > > ordering because I don't want extra communication between processes > to > > construct the vector and the matrix (since I have to fill them every > > time step because I'm just using the linear solver with a Mat and a > Vec > > data structure). I don't understand how I can do that. > > > > > > Any program you write to do linear algebra will have contiguous storage > > because it > > is so much faster. Contiguous indexing makes sense for contiguous > > storage. If you > > want to use non-contiguous indexing for contiguous storage, you would > > need some > > translation layer. The AO is such a translation, but you could do this > > any way you want. > > > > Thanks, > > > > Matt > > > > My initial idea was to create another global index ordering within my > > application to use only for the Petsc interface but then I think that > > the ghost cells are wrong. > > > > On 19/10/2023 14:50, Matthew Knepley wrote: > > > On Thu, Oct 19, 2023 at 6:51?AM Enrico > > > > >> wrote: > > > > > > Hello, > > > > > > if I create an application ordering using AOCreateBasic, > should I > > > provide the same array for const PetscInt myapp[] and const > > PetscInt > > > mypetsc[] in order to get the same ordering of the application > > > within PETSC? > > > > > > > > > Are you asking if the identity permutation can be constructed > > using the > > > same array twice? Yes. > > > > > > And once I define the ordering so that the local vector and > > matrix are > > > defined in PETSC as in my application, how can I use it to > > create the > > > actual vector and matrix? > > > > > > > > > The vectors and matrices do not change. The AO is a permutation. > > You can > > > use it to permute > > > a vector into another order, or to convert on index to another. > > > > > > Thanks, > > > > > > Matt > > > > > > Thanks in advance for the help. > > > > > > Cheers, > > > Enrico > > > > > > On 18/10/2023 13:39, Matthew Knepley wrote: > > > > On Wed, Oct 18, 2023 at 5:55?AM Enrico > > > > > > > > > > > >>> wrote: > > > > > > > > Hello, > > > > > > > > I'm trying to use Petsc to solve a linear system in an > > > application. I'm > > > > using the coordinate format to define the matrix and > the > > > vector (it > > > > should work better on GPU but at the moment every test > > is on > > > CPU). 
> > > > After > > > > the call to VecSetValuesCOO, I've noticed that the > > vector is > > > storing > > > > the > > > > data in a different way from my application. For > > example with two > > > > processes in the application > > > > > > > > process 0 owns cells 2, 3, 4 > > > > > > > > process 1 owns cells 0, 1, 5 > > > > > > > > But in the vector data structure of Petsc > > > > > > > > process 0 owns cells 0, 1, 2 > > > > > > > > process 1 owns cells 3, 4, 5 > > > > > > > > This is in principle not a big issue, but after > > solving the > > > linear > > > > system I get the solution vector x and I want to get > the > > > values in the > > > > correct processes. Is there a way to get vector values > > from other > > > > processes or to get a mapping so that I can do it > myself? > > > > > > > > > > > > By definition, PETSc vectors and matrices own contiguous > row > > > blocks. If > > > > you want to have another, > > > > global ordering, we support that with > > > > https://petsc.org/main/manualpages/AO/ > > > > > > > > > > > > > > > > >> > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > Cheers, > > > > Enrico Degregori > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin > > their > > > > experiments is infinitely more interesting than any > > results to which > > > > their experiments lead. > > > > -- Norbert Wiener > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > >> > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to > which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ < > http://www.cse.buffalo.edu/~knepley/> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From degregori at dkrz.de Thu Oct 19 10:33:16 2023 From: degregori at dkrz.de (Enrico) Date: Thu, 19 Oct 2023 17:33:16 +0200 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de> Message-ID: The layout is not poor, just the global indices are not contiguous,this has nothing to do with the local memory layout which is extremely optimized for different architectures. I can not change the layout anyway because it's a climate model with a million lines of code. I don't understand why Petsc is doing all this MPI communication under the hood. I mean, it is changing the layout of the application and doing a lot of communication. Is there no way to force the same layout and provide info about how to do the halo exchange? In this way I can have the same memory layout and there is no communication when I fill or fetch the vectors and the matrix. 
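To make it concrete, something along these lines is what I have in mind
(only a rough sketch with made-up sizes and ghost indices, assuming a
ghosted vector is the right tool on the PETSc side):

  #include <petscvec.h>

  int main(int argc, char **argv)
  {
    Vec         v, vlocal;
    PetscMPIInt rank;
    PetscInt    nlocal = 3;   /* cells this rank owns */
    PetscInt    ghosts[1];    /* global indices of the halo cells */

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL)); /* assumes at least 2 ranks */
    PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
    ghosts[0] = rank ? 2 : 3; /* one off-process halo cell per rank, made up */

    PetscCall(VecCreateGhost(PETSC_COMM_WORLD, nlocal, PETSC_DETERMINE, 1, ghosts, &v));
    PetscCall(VecSet(v, (PetscScalar)(rank + 1)));

    /* the halo exchange: owned values are copied into the ghost slots */
    PetscCall(VecGhostUpdateBegin(v, INSERT_VALUES, SCATTER_FORWARD));
    PetscCall(VecGhostUpdateEnd(v, INSERT_VALUES, SCATTER_FORWARD));

    /* local form = owned entries followed by the ghosts, one contiguous array */
    PetscCall(VecGhostGetLocalForm(v, &vlocal));
    PetscCall(VecGhostRestoreLocalForm(v, &vlocal));

    PetscCall(VecDestroy(&v));
    PetscCall(PetscFinalize());
    return 0;
  }

The ghost indices here are still in PETSc's contiguous global numbering, so
I guess this only answers the halo-exchange part and the renumbering is a
separate issue.
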
Cheers, Enrico On 19/10/2023 17:21, Matthew Knepley wrote: > On Thu, Oct 19, 2023 at 10:51?AM Enrico > wrote: > > In the application the storage is contiguous but the global indexing is > not. I would like to use AO as a translation layer but I don't > understand it. > > > Why would you choose to index differently from your storage? > > My case is actually simple even if it is in a large application, I have > > Mat A, Vec b and Vec x > > After calling KSPSolve, I use VecGetArrayReadF90 to get a pointer to > the > data and they are in the wrong ordering, so for example the first > element of the solution array on process 0 belongs to process 1 in the > application. > > > Again, this seems to be a poor choice of layout. What we typically do is > to partition > the data into chunks owned by each process first. > > Is it at this point that I should use the AO translation layer? This > would be quite bad, it means to build Mat A and Vec b there is MPI > communication and also to get the data of Vec x back in the application. > > > If you want to store data that process i updates on process j, this will > need communication. > > Anyway, I've tried to use AOPetscToApplicationPermuteReal on the > solution array but it doesn't work as I would like. Is this function > suppose to do MPI communication between processes and fetch the values > of the application ordering? > > > There is no communication here. That function call just changes one > integer into another. > If you want to update values on another process, we recommend using > VecScatter() or > MatSetValues(), both of which take global indices and do communication > if necessary. > > ? Thanks, > > ? ? Matt > > Cheers, > Enrico > > On 19/10/2023 15:25, Matthew Knepley wrote: > > On Thu, Oct 19, 2023 at 8:57?AM Enrico > > >> wrote: > > > >? ? ?Maybe I wasn't clear enough. I would like to completely get > rid of > >? ? ?Petsc > >? ? ?ordering because I don't want extra communication between > processes to > >? ? ?construct the vector and the matrix (since I have to fill > them every > >? ? ?time step because I'm just using the linear solver with a Mat > and a Vec > >? ? ?data structure). I don't understand how I can do that. > > > > > > Any program you write to do linear algebra will have contiguous > storage > > because it > > is so much faster. Contiguous indexing makes sense for contiguous > > storage. If you > > want to use non-contiguous indexing for contiguous storage, you > would > > need some > > translation layer. The AO is such a translation, but you could do > this > > any way you want. > > > >? ? Thanks, > > > >? ? ? ?Matt > > > >? ? ?My initial idea was to create another global index ordering > within my > >? ? ?application to use only for the Petsc interface but then I > think that > >? ? ?the ghost cells are wrong. > > > >? ? ?On 19/10/2023 14:50, Matthew Knepley wrote: > >? ? ? > On Thu, Oct 19, 2023 at 6:51?AM Enrico > >? ? ?> > >? ? ? > > >>> wrote: > >? ? ? > > >? ? ? >? ? ?Hello, > >? ? ? > > >? ? ? >? ? ?if I create an application ordering using > AOCreateBasic, should I > >? ? ? >? ? ?provide the same array for const PetscInt myapp[] and > const > >? ? ?PetscInt > >? ? ? >? ? ?mypetsc[] in order to get the same ordering of the > application > >? ? ? >? ? ?within PETSC? > >? ? ? > > >? ? ? > > >? ? ? > Are you asking if the identity permutation can be constructed > >? ? ?using the > >? ? ? > same array twice? Yes. > >? ? ? > > >? ? ? >? ? ?And once I define the ordering so that the local > vector and > >? ? ?matrix are > >? 
? ? >? ? ?defined in PETSC as in my application, how can I use it to > >? ? ?create the > >? ? ? >? ? ?actual vector and matrix? > >? ? ? > > >? ? ? > > >? ? ? > The vectors and matrices do not change. The AO is a > permutation. > >? ? ?You can > >? ? ? > use it to permute > >? ? ? > a vector into another order, or to convert on index to > another. > >? ? ? > > >? ? ? >? ? Thanks, > >? ? ? > > >? ? ? >? ? ? ? Matt > >? ? ? > > >? ? ? >? ? ?Thanks in advance for the help. > >? ? ? > > >? ? ? >? ? ?Cheers, > >? ? ? >? ? ?Enrico > >? ? ? > > >? ? ? >? ? ?On 18/10/2023 13:39, Matthew Knepley wrote: > >? ? ? >? ? ? > On Wed, Oct 18, 2023 at 5:55?AM Enrico > > >? ? ?> > >? ? ? >? ? ? > >> > >? ? ? >? ? ? > > > >? ? ? > >>>> wrote: > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?Hello, > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?I'm trying to use Petsc to solve a linear > system in an > >? ? ? >? ? ?application. I'm > >? ? ? >? ? ? >? ? ?using the coordinate format to define the > matrix and the > >? ? ? >? ? ?vector (it > >? ? ? >? ? ? >? ? ?should work better on GPU but at the moment > every test > >? ? ?is on > >? ? ? >? ? ?CPU). > >? ? ? >? ? ? >? ? ?After > >? ? ? >? ? ? >? ? ?the call to VecSetValuesCOO, I've noticed that the > >? ? ?vector is > >? ? ? >? ? ?storing > >? ? ? >? ? ? >? ? ?the > >? ? ? >? ? ? >? ? ?data in a different way from my application. For > >? ? ?example with two > >? ? ? >? ? ? >? ? ?processes in the application > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?process 0 owns cells 2, 3, 4 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?process 1 owns cells 0, 1, 5 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?But in the vector data structure of Petsc > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?process 0 owns cells 0, 1, 2 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?process 1 owns cells 3, 4, 5 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?This is in principle not a big issue, but after > >? ? ?solving the > >? ? ? >? ? ?linear > >? ? ? >? ? ? >? ? ?system I get the solution vector x and I want > to get the > >? ? ? >? ? ?values in the > >? ? ? >? ? ? >? ? ?correct processes. Is there a way to get vector > values > >? ? ?from other > >? ? ? >? ? ? >? ? ?processes or to get a mapping so that I can do > it myself? > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > By definition, PETSc vectors and matrices own > contiguous row > >? ? ? >? ? ?blocks. If > >? ? ? >? ? ? > you want to have another, > >? ? ? >? ? ? > global ordering, we support that with > >? ? ? >? ? ? > https://petsc.org/main/manualpages/AO/ > > >? ? ? > > >? ? ? >? ? ? > >? ? ? >> > >? ? ? >? ? ? > > >? ? ? > > >? ? ? >? ? ? > >? ? ? >>> > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? Thanks, > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ?Matt > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?Cheers, > >? ? ? >? ? ? >? ? ?Enrico Degregori > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > -- > >? ? ? >? ? ? > What most experimenters take for granted before > they begin > >? ? ?their > >? ? ? >? ? ? > experiments is infinitely more interesting than any > >? ? ?results to which > >? ? ? >? ? ? > their experiments lead. > >? ? ? >? ? ? > -- Norbert Wiener > >? ? ? >? ? ? > > >? ? ? >? ? ? > https://www.cse.buffalo.edu/~knepley/ > > >? ? ? > > >? ? ? >? ? ? > >? ? ? >> > >? ? ? >? ? ? > >? ? ? > > >? ? ? >? ? ? > >? ? ? >>> > >? ? ? > > >? ? ? > > >? ? ? > > >? ? ? > -- > >? ? ? > What most experimenters take for granted before they begin > their > >? ? ? > experiments is infinitely more interesting than any > results to which > >? ? ? 
> their experiments lead. > >? ? ? > -- Norbert Wiener > >? ? ? > > >? ? ? > https://www.cse.buffalo.edu/~knepley/ > > >? ? ? > > >? ? ? > >? ? ? >> > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Thu Oct 19 10:36:45 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 19 Oct 2023 11:36:45 -0400 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de> Message-ID: On Thu, Oct 19, 2023 at 11:33?AM Enrico wrote: > The layout is not poor, just the global indices are not contiguous,this > has nothing to do with the local memory layout which is extremely > optimized for different architectures. I can not change the layout > anyway because it's a climate model with a million lines of code. > > I don't understand why Petsc is doing all this MPI communication under > the hood. I don't think we are communicating under the hood. > I mean, it is changing the layout of the application and doing > a lot of communication. We do not create the layout. The user creates the data layout when they create a vector or matrix. > Is there no way to force the same layout and > provide info about how to do the halo exchange? In this way I can have > the same memory layout and there is no communication when I fill or > fetch the vectors and the matrix. > Yes, you tell the vector/matrix your data layout when you create it. Thanks, Matt > Cheers, > Enrico > > On 19/10/2023 17:21, Matthew Knepley wrote: > > On Thu, Oct 19, 2023 at 10:51?AM Enrico > > wrote: > > > > In the application the storage is contiguous but the global indexing > is > > not. I would like to use AO as a translation layer but I don't > > understand it. > > > > > > Why would you choose to index differently from your storage? > > > > My case is actually simple even if it is in a large application, I > have > > > > Mat A, Vec b and Vec x > > > > After calling KSPSolve, I use VecGetArrayReadF90 to get a pointer to > > the > > data and they are in the wrong ordering, so for example the first > > element of the solution array on process 0 belongs to process 1 in > the > > application. > > > > > > Again, this seems to be a poor choice of layout. What we typically do is > > to partition > > the data into chunks owned by each process first. > > > > Is it at this point that I should use the AO translation layer? This > > would be quite bad, it means to build Mat A and Vec b there is MPI > > communication and also to get the data of Vec x back in the > application. > > > > > > If you want to store data that process i updates on process j, this will > > need communication. > > > > Anyway, I've tried to use AOPetscToApplicationPermuteReal on the > > solution array but it doesn't work as I would like. Is this function > > suppose to do MPI communication between processes and fetch the > values > > of the application ordering? > > > > > > There is no communication here. 
That function call just changes one > > integer into another. > > If you want to update values on another process, we recommend using > > VecScatter() or > > MatSetValues(), both of which take global indices and do communication > > if necessary. > > > > Thanks, > > > > Matt > > > > Cheers, > > Enrico > > > > On 19/10/2023 15:25, Matthew Knepley wrote: > > > On Thu, Oct 19, 2023 at 8:57?AM Enrico > > > > >> wrote: > > > > > > Maybe I wasn't clear enough. I would like to completely get > > rid of > > > Petsc > > > ordering because I don't want extra communication between > > processes to > > > construct the vector and the matrix (since I have to fill > > them every > > > time step because I'm just using the linear solver with a Mat > > and a Vec > > > data structure). I don't understand how I can do that. > > > > > > > > > Any program you write to do linear algebra will have contiguous > > storage > > > because it > > > is so much faster. Contiguous indexing makes sense for contiguous > > > storage. If you > > > want to use non-contiguous indexing for contiguous storage, you > > would > > > need some > > > translation layer. The AO is such a translation, but you could do > > this > > > any way you want. > > > > > > Thanks, > > > > > > Matt > > > > > > My initial idea was to create another global index ordering > > within my > > > application to use only for the Petsc interface but then I > > think that > > > the ghost cells are wrong. > > > > > > On 19/10/2023 14:50, Matthew Knepley wrote: > > > > On Thu, Oct 19, 2023 at 6:51?AM Enrico > > > > > > > > > > > >>> wrote: > > > > > > > > Hello, > > > > > > > > if I create an application ordering using > > AOCreateBasic, should I > > > > provide the same array for const PetscInt myapp[] and > > const > > > PetscInt > > > > mypetsc[] in order to get the same ordering of the > > application > > > > within PETSC? > > > > > > > > > > > > Are you asking if the identity permutation can be > constructed > > > using the > > > > same array twice? Yes. > > > > > > > > And once I define the ordering so that the local > > vector and > > > matrix are > > > > defined in PETSC as in my application, how can I use > it to > > > create the > > > > actual vector and matrix? > > > > > > > > > > > > The vectors and matrices do not change. The AO is a > > permutation. > > > You can > > > > use it to permute > > > > a vector into another order, or to convert on index to > > another. > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > Thanks in advance for the help. > > > > > > > > Cheers, > > > > Enrico > > > > > > > > On 18/10/2023 13:39, Matthew Knepley wrote: > > > > > On Wed, Oct 18, 2023 at 5:55?AM Enrico > > > > > > > > > > > > >> > > > > > > > > > > > > > >>>> wrote: > > > > > > > > > > Hello, > > > > > > > > > > I'm trying to use Petsc to solve a linear > > system in an > > > > application. I'm > > > > > using the coordinate format to define the > > matrix and the > > > > vector (it > > > > > should work better on GPU but at the moment > > every test > > > is on > > > > CPU). > > > > > After > > > > > the call to VecSetValuesCOO, I've noticed that > the > > > vector is > > > > storing > > > > > the > > > > > data in a different way from my application. 
For > > > example with two > > > > > processes in the application > > > > > > > > > > process 0 owns cells 2, 3, 4 > > > > > > > > > > process 1 owns cells 0, 1, 5 > > > > > > > > > > But in the vector data structure of Petsc > > > > > > > > > > process 0 owns cells 0, 1, 2 > > > > > > > > > > process 1 owns cells 3, 4, 5 > > > > > > > > > > This is in principle not a big issue, but after > > > solving the > > > > linear > > > > > system I get the solution vector x and I want > > to get the > > > > values in the > > > > > correct processes. Is there a way to get vector > > values > > > from other > > > > > processes or to get a mapping so that I can do > > it myself? > > > > > > > > > > > > > > > By definition, PETSc vectors and matrices own > > contiguous row > > > > blocks. If > > > > > you want to have another, > > > > > global ordering, we support that with > > > > > https://petsc.org/main/manualpages/AO/ > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > Thanks, > > > > > > > > > > Matt > > > > > > > > > > Cheers, > > > > > Enrico Degregori > > > > > > > > > > > > > > > > > > > > -- > > > > > What most experimenters take for granted before > > they begin > > > their > > > > > experiments is infinitely more interesting than any > > > results to which > > > > > their experiments lead. > > > > > -- Norbert Wiener > > > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin > > their > > > > experiments is infinitely more interesting than any > > results to which > > > > their experiments lead. > > > > -- Norbert Wiener > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > >> > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to > which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ < > http://www.cse.buffalo.edu/~knepley/> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From degregori at dkrz.de Thu Oct 19 11:28:17 2023 From: degregori at dkrz.de (Enrico) Date: Thu, 19 Oct 2023 18:28:17 +0200 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de> Message-ID: I make an example. If I have a vector with global indices {0,1,2,3,4,5} and process 0 owns {2,3,4} while process 1 owns {0,1,5}, the resulting vector data structure on Petsc on process 0 owns {0,1,2}. This means that the points {0,1} has been sent from process 1 to process 0. I would like to have {2,3,4} on process 0 also in Petsc. Is it more clear now? 
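If I understand the AO correctly, the renumbering would have to look
something like this (just a sketch using the numbers above, not tested):

  #include <petscao.h>

  int main(int argc, char **argv)
  {
    AO          ao;
    PetscMPIInt rank;
    /* application numbering of the locally owned cells (the example above) */
    PetscInt app0[3] = {2, 3, 4}, app1[3] = {0, 1, 5};
    /* the contiguous PETSc rows each process owns instead */
    PetscInt pet0[3] = {0, 1, 2}, pet1[3] = {3, 4, 5};
    PetscInt idx[3];

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL)); /* run with mpiexec -n 2 */
    PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

    PetscCall(AOCreateBasic(PETSC_COMM_WORLD, 3, rank ? app1 : app0, rank ? pet1 : pet0, &ao));

    /* translate application cell numbers into PETSc rows before calling
       MatSetValues()/VecSetValues(); the cell data itself stays on its rank */
    for (PetscInt i = 0; i < 3; i++) idx[i] = rank ? app1[i] : app0[i];
    PetscCall(AOApplicationToPetsc(ao, 3, idx));
    PetscCall(PetscPrintf(PETSC_COMM_SELF, "[%d] PETSc rows %d %d %d\n", rank, (int)idx[0], (int)idx[1], (int)idx[2]));

    PetscCall(AODestroy(&ao));
    PetscCall(PetscFinalize());
    return 0;
  }

With that, process 0 would keep cells 2, 3, 4 in its memory, they would
just be called rows 0, 1, 2 on the PETSc side.
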
On 19/10/2023 17:36, Matthew Knepley wrote:
> On Thu, Oct 19, 2023 at 11:33 AM Enrico wrote:
>
>     The layout is not poor, just the global indices are not contiguous;
>     this has nothing to do with the local memory layout, which is
>     extremely optimized for different architectures. I can not change
>     the layout anyway because it's a climate model with a million lines
>     of code.
>
>     I don't understand why Petsc is doing all this MPI communication
>     under the hood.
>
> I don't think we are communicating under the hood.
>
>     I mean, it is changing the layout of the application and doing
>     a lot of communication.
>
> We do not create the layout. The user creates the data layout when they
> create a vector or matrix.
>
>     Is there no way to force the same layout and provide info about how
>     to do the halo exchange? In this way I can have the same memory
>     layout and there is no communication when I fill or fetch the
>     vectors and the matrix.
>
> Yes, you tell the vector/matrix your data layout when you create it.
>
>   Thanks,
>
>      Matt
From degregori at dkrz.de  Thu Oct 19 12:00:44 2023
From: degregori at dkrz.de (Enrico)
Date: Thu, 19 Oct 2023 19:00:44 +0200
Subject: [petsc-users] Coordinate format internal reordering
In-Reply-To: 
References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de>
 <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de>
 <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de>
 <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de>
Message-ID: <84025e0f-62d8-0fd8-b9cb-1f279e22703c@dkrz.de>

Here is a very very simple reproducer of my problem. It is a Fortran
program and it has to run with 2 processes.

The output is:

  process 0 : xx_v( 1 ) = 0.000000000000000
  process 0 : xx_v( 2 ) = 1.000000000000000
  process 0 : xx_v( 3 ) = 2.000000000000000
  process 1 : xx_v( 1 ) = 3.000000000000000
  process 1 : xx_v( 2 ) = 4.000000000000000
  process 1 : xx_v( 3 ) = 5.000000000000000

and I would like to have:

  process 0 : xx_v( 1 ) = 2.000000000000000
  process 0 : xx_v( 2 ) = 3.000000000000000
  process 0 : xx_v( 3 ) = 4.000000000000000
  process 1 : xx_v( 1 ) = 0.000000000000000
  process 1 : xx_v( 2 ) = 1.000000000000000
  process 1 : xx_v( 3 ) = 5.000000000000000

How can I do that?

program main
#include <petsc/finclude/petscksp.h>
    use petscksp
    implicit none

    PetscErrorCode ierr
    PetscInt  :: Psize = 6
    integer   :: Lsize
    PetscInt  :: work_size
    PetscInt  :: work_rank
    Vec :: b
    integer, allocatable, dimension(:) :: glb_index
    double precision, allocatable, dimension(:) :: array
    PetscScalar, pointer :: xx_v(:)
    integer :: i
    PetscCount :: csize

    CALL PetscInitialize(ierr)

    Lsize = 3
    csize = Lsize

    allocate(glb_index(0:Lsize-1), array(0:Lsize-1))

    CALL MPI_Comm_size(PETSC_COMM_WORLD, work_size, ierr);
    CALL MPI_Comm_rank(PETSC_COMM_WORLD, work_rank, ierr);
    if (work_rank == 0) then
      glb_index(0) = 2
      glb_index(1) = 3
      glb_index(2) = 4
      array(0) = 2
      array(1) = 3
      array(2) = 4
    else if (work_rank == 1) then
      glb_index(0) = 0
      glb_index(1) = 1
      glb_index(2) = 5
      array(0) = 0
      array(1) = 1
      array(2) = 5
    end if

    ! Create and fill rhs vector
    CALL VecCreate(PETSC_COMM_WORLD, b, ierr);
    CALL VecSetSizes(b, Lsize, Psize, ierr);
    CALL VecSetType(b, VECMPI, ierr);

    CALL VecSetPreallocationCOO(b, csize, glb_index, ierr)
    CALL VecSetValuesCOO(b, array, INSERT_VALUES, ierr)

    CALL VecGetArrayReadF90(b, xx_v, ierr)

    do i=1,Lsize
      write(*,*) 'process ', work_rank, ': xx_v(',i,') = ', xx_v(i)
    end do

    CALL VecRestoreArrayReadF90(b, xx_v, ierr)

    deallocate(glb_index, array)
    CALL VecDestroy(b,ierr)

    CALL PetscFinalize(ierr)

end program main
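The contiguous ordering in the output above comes from the vector's row
ownership, not from the COO indices: each rank owns the block of rows given
by the local size passed to VecSetSizes (or VecCreateMPI), so with 3 local
entries per rank the ownership is rows 0-2 on rank 0 and rows 3-5 on rank 1,
whatever indices are later handed to VecSetPreallocationCOO. A minimal
illustrative sketch of such a check, run with 2 processes like the
reproducer (the variable names rstart and rend are arbitrary):

program ownership
#include <petsc/finclude/petscvec.h>
    use petscvec
    implicit none

    PetscErrorCode :: ierr
    PetscInt       :: Lsize = 3, Psize = 6
    PetscInt       :: rstart, rend
    PetscMPIInt    :: rank
    Vec            :: b

    call PetscInitialize(ierr)
    call MPI_Comm_rank(PETSC_COMM_WORLD, rank, ierr)

    ! Same sizes as in the reproducer: 3 local rows, 6 global rows.
    call VecCreateMPI(PETSC_COMM_WORLD, Lsize, Psize, b, ierr)

    ! Each rank owns a contiguous block of rows, [0,3) and [3,6) here,
    ! independent of the indices later given to VecSetPreallocationCOO.
    call VecGetOwnershipRange(b, rstart, rend, ierr)
    write(*,*) 'rank', rank, 'owns rows', rstart, 'to', rend - 1

    call VecDestroy(b, ierr)
    call PetscFinalize(ierr)
end program ownership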
From knepley at gmail.com  Thu Oct 19 12:43:17 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 19 Oct 2023 13:43:17 -0400
Subject: [petsc-users] Coordinate format internal reordering
In-Reply-To: <84025e0f-62d8-0fd8-b9cb-1f279e22703c@dkrz.de>
References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de>
 <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de>
 <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de>
 <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de>
 <84025e0f-62d8-0fd8-b9cb-1f279e22703c@dkrz.de>
Message-ID: 

On Thu, Oct 19, 2023 at 1:00 PM Enrico wrote:

> Here is a very very simple reproducer of my problem. It is a Fortran
> program and it has to run with 2 processes.

You seem to be saying that you start with one partition of your data, but
you would like another partition. For this, you have to initially
communicate. For this I would use VecScatter. However, since most data is
generated, I would consider not generating my data in that initial
distribution.

There are many examples in the repository. In the discretization of a PDE,
we first divide the domain, then number up each piece, then assemble the
linear algebra objects.

  Thanks,

     Matt

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
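A minimal sketch of the VecScatter approach suggested above, applied to the
six-entry reproducer: an index set holding each rank's application-global
indices is used to scatter from the parallel vector into a rank-local
vector, so the values come back in the application's own ordering. This is
an illustrative sketch only; the names is_app, b_app and ctx are made up,
and glb_index is declared as PetscInt here so it can be passed to
ISCreateGeneral.

program coo_scatter
#include <petsc/finclude/petscvec.h>
    use petscvec
    implicit none

    PetscErrorCode       :: ierr
    PetscInt             :: Lsize = 3, Psize = 6
    PetscInt             :: i
    PetscInt             :: glb_index(3)
    PetscScalar          :: vals(3)
    PetscScalar, pointer :: app_v(:)
    PetscCount           :: csize
    PetscMPIInt          :: rank
    Vec                  :: b, b_app
    IS                   :: is_app
    VecScatter           :: ctx

    call PetscInitialize(ierr)
    call MPI_Comm_rank(PETSC_COMM_WORLD, rank, ierr)
    csize = Lsize

    ! Application ownership from the reproducer: rank 0 has cells 2,3,4
    ! and rank 1 has cells 0,1,5; the value of each cell is its index.
    if (rank == 0) then
       glb_index = [2, 3, 4]
    else
       glb_index = [0, 1, 5]
    end if
    do i = 1, Lsize
       vals(i) = glb_index(i)
    end do

    ! Parallel vector filled through the COO interface; internally PETSc
    ! still owns contiguous row blocks ([0,3) and [3,6)).
    call VecCreateMPI(PETSC_COMM_WORLD, Lsize, Psize, b, ierr)
    call VecSetPreallocationCOO(b, csize, glb_index, ierr)
    call VecSetValuesCOO(b, vals, INSERT_VALUES, ierr)

    ! Scatter b(glb_index(i)) into entry i of a local vector, i.e. bring
    ! each rank's application-owned values back in application order.
    call ISCreateGeneral(PETSC_COMM_WORLD, Lsize, glb_index, &
                         PETSC_COPY_VALUES, is_app, ierr)
    call VecCreateSeq(PETSC_COMM_SELF, Lsize, b_app, ierr)
    call VecScatterCreate(b, is_app, b_app, PETSC_NULL_IS, ctx, ierr)
    call VecScatterBegin(ctx, b, b_app, INSERT_VALUES, SCATTER_FORWARD, ierr)
    call VecScatterEnd(ctx, b, b_app, INSERT_VALUES, SCATTER_FORWARD, ierr)

    ! Prints 2,3,4 on rank 0 and 0,1,5 on rank 1.
    call VecGetArrayReadF90(b_app, app_v, ierr)
    do i = 1, Lsize
       write(*,*) 'rank', rank, ': app_v(', i, ') =', app_v(i)
    end do
    call VecRestoreArrayReadF90(b_app, app_v, ierr)

    call VecScatterDestroy(ctx, ierr)
    call ISDestroy(is_app, ierr)
    call VecDestroy(b_app, ierr)
    call VecDestroy(b, ierr)
    call PetscFinalize(ierr)
end program coo_scatter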
From degregori at dkrz.de  Thu Oct 19 12:46:04 2023
From: degregori at dkrz.de (Enrico)
Date: Thu, 19 Oct 2023 19:46:04 +0200
Subject: [petsc-users] Coordinate format internal reordering
In-Reply-To: 
References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de>
 <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de>
 <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de>
 <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de>
 <84025e0f-62d8-0fd8-b9cb-1f279e22703c@dkrz.de>
Message-ID: <3e38435f-0f57-1f1f-20ca-a66d529a0387@dkrz.de>

Sorry, but I don't want another partition; PETSc internally is changing
the partition. I would like to have the same partition that I have in
the application. Is the example not clear?
From knepley at gmail.com  Thu Oct 19 13:41:43 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 19 Oct 2023 14:41:43 -0400
Subject: [petsc-users] Coordinate format internal reordering
In-Reply-To: <3e38435f-0f57-1f1f-20ca-a66d529a0387@dkrz.de>
References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de>
 <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de>
 <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de>
 <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de>
 <84025e0f-62d8-0fd8-b9cb-1f279e22703c@dkrz.de>
 <3e38435f-0f57-1f1f-20ca-a66d529a0387@dkrz.de>
Message-ID: 

On Thu, Oct 19, 2023 at 1:46 PM Enrico wrote:

> Sorry but I don't want another partition, Petsc internally is changing
> the partition. I would like to have the same partition that I have in
> the application. Is the example not clear?

"Petsc internally is changing the partition": this is not correct. PETSc
does not prescribe partitions. I refer to the documentation for the
creation of a Vec:

  https://petsc.org/main/manualpages/Vec/VecCreateMPI/

Here the user sets the local and global sizes. Since data is contiguous,
these completely define the vector, and they are under user control.

  Thanks,

     Matt

> On 19/10/2023 19:43, Matthew Knepley wrote:
> > On Thu, Oct 19, 2023 at 1:00 PM Enrico wrote:
> >
> >     Here is a very very simple reproducer of my problem. It is a
> >     Fortran program and it has to run with 2 processes.
> >
> > You seem to be saying that you start with one partition of your data,
> > but you would like another partition. For this, you have to initially
> > communicate. For this I would use VecScatter. However, since most data
> > is generated, I would consider not generating my data in that initial
> > distribution.
> >
> > There are many examples in the repository. In the discretization of a
> > PDE, we first divide the domain, then number up each piece, then
> > assemble the linear algebra objects.
> >
> >   Thanks,
> >
> >      Matt
> >
> > The output is:
> >
> >   process 0 : xx_v( 1 ) = 0.000000000000000
> >   process 0 : xx_v( 2 ) = 1.000000000000000
> >   process 0 : xx_v( 3 ) = 2.000000000000000
> >   process 1 : xx_v( 1 ) = 3.000000000000000
> >   process 1 : xx_v( 2 ) = 4.000000000000000
> >   process 1 : xx_v( 3 ) = 5.000000000000000
> >
> > and I would like to have:
> >
> >   process 0 : xx_v( 1 ) = 2.000000000000000
> >   process 0 : xx_v( 2 ) = 3.000000000000000
> >   process 0 : xx_v( 3 ) = 4.000000000000000
> >   process 1 : xx_v( 1 ) = 0.000000000000000
> >   process 1 : xx_v( 2 ) = 1.000000000000000
> >   process 1 : xx_v( 3 ) = 5.000000000000000
> >
> > How can I do that?
> >     program main
> >     #include <petsc/finclude/petscksp.h>
> >           use petscksp
> >           implicit none
> >
> >           PetscErrorCode ierr
> >           PetscInt  :: Psize = 6
> >           integer   :: Lsize
> >           PetscInt  :: work_size
> >           PetscInt  :: work_rank
> >           Vec :: b
> >           integer, allocatable, dimension(:) :: glb_index
> >           double precision, allocatable, dimension(:) :: array
> >           PetscScalar, pointer :: xx_v(:)
> >           integer :: i
> >           PetscCount :: csize
> >
> >           CALL PetscInitialize(ierr)
> >
> >           Lsize = 3
> >           csize = Lsize
> >
> >           allocate(glb_index(0:Lsize-1), array(0:Lsize-1))
> >
> >           CALL MPI_Comm_size(PETSC_COMM_WORLD, work_size, ierr)
> >           CALL MPI_Comm_rank(PETSC_COMM_WORLD, work_rank, ierr)
> >           if (work_rank == 0) then
> >             glb_index(0) = 2
> >             glb_index(1) = 3
> >             glb_index(2) = 4
> >             array(0) = 2
> >             array(1) = 3
> >             array(2) = 4
> >           else if (work_rank == 1) then
> >             glb_index(0) = 0
> >             glb_index(1) = 1
> >             glb_index(2) = 5
> >             array(0) = 0
> >             array(1) = 1
> >             array(2) = 5
> >           end if
> >
> >           ! Create and fill rhs vector
> >           CALL VecCreate(PETSC_COMM_WORLD, b, ierr)
> >           CALL VecSetSizes(b, Lsize, Psize, ierr)
> >           CALL VecSetType(b, VECMPI, ierr)
> >
> >           CALL VecSetPreallocationCOO(b, csize, glb_index, ierr)
> >           CALL VecSetValuesCOO(b, array, INSERT_VALUES, ierr)
> >
> >           CALL VecGetArrayReadF90(b, xx_v, ierr)
> >
> >           do i=1,Lsize
> >             write(*,*) 'process ', work_rank, ': xx_v(',i,') = ', xx_v(i)
> >           end do
> >
> >           CALL VecRestoreArrayReadF90(b, xx_v, ierr)
> >
> >           deallocate(glb_index, array)
> >           CALL VecDestroy(b,ierr)
> >
> >           CALL PetscFinalize(ierr)
> >
> >     end program main
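A minimal sketch in C of the VecScatter approach mentioned in the quoted reply
above: scatter the PETSc-ordered solution into a second vector whose local
block follows the application's own global numbering. The sizes and index
lists are hypothetical and simply mirror the Fortran reproducer; this is an
illustration, not code from the thread.

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec         x_petsc, x_app;
  IS          from, to;
  VecScatter  scat;
  PetscInt    lsize = 3, gsize = 6, rstart;
  /* Application's global indices on each rank (mirroring the reproducer:
     rank 0 works on entries 2,3,4 and rank 1 on entries 0,1,5). */
  PetscInt    idx0[3] = {2, 3, 4}, idx1[3] = {0, 1, 5};
  PetscMPIInt rank;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

  /* x_petsc: the solution in PETSc's contiguous layout (e.g. from KSPSolve). */
  PetscCall(VecCreateMPI(PETSC_COMM_WORLD, lsize, gsize, &x_petsc));
  /* x_app: same sizes, but its local block will be filled in application order. */
  PetscCall(VecCreateMPI(PETSC_COMM_WORLD, lsize, gsize, &x_app));
  PetscCall(VecGetOwnershipRange(x_app, &rstart, NULL));

  /* from: which global entries of x_petsc this rank wants;
     to:   where they land in x_app (this rank's own contiguous range). */
  PetscCall(ISCreateGeneral(PETSC_COMM_WORLD, lsize, rank ? idx1 : idx0, PETSC_COPY_VALUES, &from));
  PetscCall(ISCreateStride(PETSC_COMM_WORLD, lsize, rstart, 1, &to));

  PetscCall(VecScatterCreate(x_petsc, from, x_app, to, &scat));
  PetscCall(VecScatterBegin(scat, x_petsc, x_app, INSERT_VALUES, SCATTER_FORWARD));
  PetscCall(VecScatterEnd(scat, x_petsc, x_app, INSERT_VALUES, SCATTER_FORWARD));

  PetscCall(VecScatterDestroy(&scat));
  PetscCall(ISDestroy(&from));
  PetscCall(ISDestroy(&to));
  PetscCall(VecDestroy(&x_app));
  PetscCall(VecDestroy(&x_petsc));
  PetscCall(PetscFinalize());
  return 0;
}

The scatter context can be created once and reused after every solve.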
-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jorgenin at mit.edu  Thu Oct 19 14:46:35 2023
From: jorgenin at mit.edu (Jorge Nin)
Date: Thu, 19 Oct 2023 19:46:35 +0000
Subject: [petsc-users] Performance of Conda Binary vs Self Compiled Version
Message-ID: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>

Hi,
I was playing around with a self-compiled version and the Conda binary of
PETSc on the same problem, on my M1 Mac. Interestingly, I found that the
Conda binary solves the problem 2-3 times slower than the self-compiled
version. (For context, I'm using the petsc4py python interface.)

I've attached two log views to show the comparison.

I was mostly curious about the possible cause for this.
I was also curious how I could use my own compiled version of PETSc in my
Conda install?

Best,
Jorge
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Selfcompiled.txt
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Conda Version.txt
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1862 bytes
Desc: not available
URL: 

From knepley at gmail.com  Thu Oct 19 15:00:47 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 19 Oct 2023 16:00:47 -0400
Subject: [petsc-users] Performance of Conda Binary vs Self Compiled Version
In-Reply-To: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>
References: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>
Message-ID:

On Thu, Oct 19, 2023 at 3:54 PM Jorge Nin wrote:

> Hi,
> I was playing around with a self-compiled version and the Conda binary of
> PETSc on the same problem, on my M1 Mac. Interestingly, I found that the
> Conda binary solves the problem 2-3 times slower than the self-compiled
> version. (For context, I'm using the petsc4py python interface.)
>
> I've attached two log views to show the comparison.
>
> I was mostly curious about the possible cause for this.
>

All the time is in the LU numeric factorization. I don't know if your
matrix is sparse or dense. I am guessing it is dense and different LAPACK
implementations are linked. If it is sparse, then the compiler options are
different between builds, but I would be surprised if it made this much
difference.

  Thanks,

    Matt

> I was also curious how I could use my own compiled version of PETSc in my
> Conda install?
>
>
> Best,
> Jorge
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
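A small C sketch of one way to answer the sparse-or-dense question above:
query the assembled matrix for its nonzero count with MatGetInfo and compare
against the dense size. The helper name is made up for illustration and is
not taken from the attached logs; petsc4py exposes the same information
through its Mat object.

#include <petscmat.h>

/* Report the nonzero fraction of an assembled Mat A (A is assumed to come
   from the user's own assembly code, which is not shown here). */
static PetscErrorCode ReportFillRatio(Mat A)
{
  MatInfo  info;
  PetscInt M, N;

  PetscFunctionBeginUser;
  PetscCall(MatGetSize(A, &M, &N));
  PetscCall(MatGetInfo(A, MAT_GLOBAL_SUM, &info));
  PetscCall(PetscPrintf(PetscObjectComm((PetscObject)A),
                        "nonzeros used: %g of %g entries (%.3f %%)\n",
                        info.nz_used, (double)M * (double)N,
                        100.0 * info.nz_used / ((double)M * (double)N)));
  PetscFunctionReturn(PETSC_SUCCESS);
}

Running with -ksp_view also shows which factorization package and matrix
type end up being used in each build.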
From jorgenin at mit.edu  Thu Oct 19 19:35:12 2023
From: jorgenin at mit.edu (Jorge Nin)
Date: Fri, 20 Oct 2023 00:35:12 +0000
Subject: [petsc-users] Performance of Conda Binary vs Self Compiled Version
In-Reply-To:
References: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>
Message-ID: <92F64F88-3F08-46AE-BDDF-6CD3602AE5D4@Mit.edu>

Hi Matthew,

Thanks for the response. It actually seems like the matrix is very sparse
(only about 0.99% of the entries are nonzero, from what I'm measuring).
It's an FEA solver, so it would make sense.
My current guess is that the optimization flags are making a large
difference for the M1 Mac, but I am also surprised it makes such a huge
difference.

It's why I was asking if there was a resource or some other way to use my
own version of PETSc with Conda.
I believe a 2-3x speed up is worth the hassle.

Best,
Jorge
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1862 bytes
Desc: not available
URL: 

From degregori at dkrz.de  Fri Oct 20 02:05:41 2023
From: degregori at dkrz.de (Enrico)
Date: Fri, 20 Oct 2023 09:05:41 +0200
Subject: [petsc-users] Coordinate format internal reordering
In-Reply-To:
References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de>
 <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de>
 <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de>
 <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de>
 <84025e0f-62d8-0fd8-b9cb-1f279e22703c@dkrz.de>
 <3e38435f-0f57-1f1f-20ca-a66d529a0387@dkrz.de>
Message-ID: <17f581e3-ebd1-0c2e-4cb8-4810ba811ac2@dkrz.de>

Yes, but in the documentation it should be clear that if the global indices
are not contiguous in each process, even if the memory is locally contiguous
in each process, then there will be communication to build the matrix and
the vector.

Cheers,
Enrico
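A short C sketch of the point above, with a hypothetical 6-entry vector using
the same index lists as earlier in this thread: when a rank sets values for
global indices it does not own, the MPI traffic happens inside the assembly
(or COO set) calls. This is an illustration only, not code from the thread.

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec         b;
  PetscMPIInt rank;
  /* Hypothetical application ordering: rank 0 holds global entries 2,3,4
     and rank 1 holds 0,1,5 (PETSc owns 0,1,2 and 3,4,5 respectively). */
  PetscInt    idx0[3] = {2, 3, 4}, idx1[3] = {0, 1, 5};
  PetscScalar val0[3] = {2, 3, 4}, val1[3] = {0, 1, 5};

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

  PetscCall(VecCreateMPI(PETSC_COMM_WORLD, 3, 6, &b));

  /* Each rank inserts with its own (non-contiguous) global indices;
     entries owned by another rank are stashed locally ... */
  PetscCall(VecSetValues(b, 3, rank ? idx1 : idx0, rank ? val1 : val0, INSERT_VALUES));
  /* ... and moved to their owners here: this is where the communication occurs. */
  PetscCall(VecAssemblyBegin(b));
  PetscCall(VecAssemblyEnd(b));

  PetscCall(VecDestroy(&b));
  PetscCall(PetscFinalize());
  return 0;
}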
From knepley at gmail.com  Fri Oct 20 06:12:30 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 20 Oct 2023 07:12:30 -0400
Subject: [petsc-users] Performance of Conda Binary vs Self Compiled Version
In-Reply-To: <92F64F88-3F08-46AE-BDDF-6CD3602AE5D4@Mit.edu>
References: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>
 <92F64F88-3F08-46AE-BDDF-6CD3602AE5D4@Mit.edu>
Message-ID:

On Thu, Oct 19, 2023 at 8:35 PM Jorge Nin wrote:

> Hi Matthew,
>
> Thanks for the response. It actually seems like the matrix is very sparse
> (only about 0.99% of the entries are nonzero, from what I'm measuring).
> It's an FEA solver, so it would make sense.
> My current guess is that the optimization flags are making a large
> difference for the M1 Mac, but I am also surprised it makes such a huge
> difference.
>
> It's why I was asking if there was a resource or some other way to use my
> own version of PETSc with Conda.
>

We do not know how Conda works unfortunately.

  Thanks,

    Matt

> I believe a 2-3x speed up is worth the hassle.
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From bsmith at petsc.dev  Fri Oct 20 09:11:43 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Fri, 20 Oct 2023 10:11:43 -0400
Subject: [petsc-users] Performance of Conda Binary vs Self Compiled Version
In-Reply-To:
References: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>
 <92F64F88-3F08-46AE-BDDF-6CD3602AE5D4@Mit.edu>
Message-ID:

> On Oct 20, 2023, at 7:12 AM, Matthew Knepley wrote:
>
> On Thu, Oct 19, 2023 at 8:35 PM Jorge Nin wrote:
>> Hi Matthew,
>>
>> Thanks for the response. It actually seems like the matrix is very sparse
>> (only about 0.99% of the entries are nonzero, from what I'm measuring).
>> It's an FEA solver, so it would make sense.
>> My current guess is that the optimization flags are making a large
>> difference for the M1 Mac, but I am also surprised it makes such a huge
>> difference.
>>
>> It's why I was asking if there was a resource or some other way to use my
>> own version of PETSc with Conda.

   What do you mean by "with Conda"? You can enter the Conda environment,
configure and compile PETSc, and then link your code against this PETSc
library (instead of the one provided by Conda). By being in the Conda
environment it means it is using the Conda Python, the Conda compilers etc.

   Barry

Some users have difficulty configuring PETSc inside the Conda environment;
if your ./configure or make of PETSc fails just send configure.log (and
make.log) to petsc-maint at mcs.anl.gov and we'll figure out how to get it
compiled.

> We do not know how Conda works unfortunately.
>
>   Thanks,
>
>     Matt
>
>> I believe a 2-3x speed up is worth the hassle.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From miaodi1987 at gmail.com  Fri Oct 20 15:19:27 2023
From: miaodi1987 at gmail.com (Di Miao)
Date: Fri, 20 Oct 2023 13:19:27 -0700
Subject: [petsc-users] use MATSEQAIJMKL in 64-bit indices
Message-ID:

Hi,

I found that when compiled with the '--with-64-bit-indices=1' option, the
following three definitions are removed from petscconf.h:

#define PETSC_HAVE_MKL_SPARSE 1
#define PETSC_HAVE_MKL_SPARSE_OPTIMIZE 1
#define PETSC_HAVE_MKL_SPARSE_SP2M_FEATURE 1

I believe MKL can also use 64-bit indices (libmkl_intel_ilp64). I tried to
add '--with-mkl_sparse=1 --with-mkl_sparse_optimize=1' to the configuration,
but that did not succeed.

Could you tell me whether it is possible to use the MATSEQAIJMKL matrix type
in 64-bit mode?

Regards,
Di
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From balay at mcs.anl.gov  Fri Oct 20 15:52:11 2023
From: balay at mcs.anl.gov (Satish Balay)
Date: Fri, 20 Oct 2023 15:52:11 -0500 (CDT)
Subject: [petsc-users] use MATSEQAIJMKL in 64-bit indices
In-Reply-To:
References:
Message-ID: <5c512d72-fc62-84c3-c7a6-b2848f352c01@mcs.anl.gov>

Try using the additional option --with-64-bit-blas-indices=1

Satish

On Fri, 20 Oct 2023, Di Miao wrote:

> Hi,
>
> I found that when compiled with the '--with-64-bit-indices=1' option, the
> following three definitions are removed from petscconf.h:
>
> #define PETSC_HAVE_MKL_SPARSE 1
> #define PETSC_HAVE_MKL_SPARSE_OPTIMIZE 1
> #define PETSC_HAVE_MKL_SPARSE_SP2M_FEATURE 1
>
> I believe MKL can also use 64-bit indices (libmkl_intel_ilp64). I tried to
> add '--with-mkl_sparse=1 --with-mkl_sparse_optimize=1' to the
> configuration, but that did not succeed.
>
> Could you tell me whether it is possible to use the MATSEQAIJMKL matrix
> type in 64-bit mode?
>
> Regards,
> Di
>

From miaodi1987 at gmail.com  Fri Oct 20 16:50:29 2023
From: miaodi1987 at gmail.com (Di Miao)
Date: Fri, 20 Oct 2023 14:50:29 -0700
Subject: [petsc-users] use MATSEQAIJMKL in 64-bit indices
In-Reply-To: <5c512d72-fc62-84c3-c7a6-b2848f352c01@mcs.anl.gov>
References: <5c512d72-fc62-84c3-c7a6-b2848f352c01@mcs.anl.gov>
Message-ID:

Thanks, it worked.

Di

On Fri, Oct 20, 2023 at 1:52 PM Satish Balay wrote:

> Try using the additional option --with-64-bit-blas-indices=1
>
> Satish
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From andrsd at gmail.com  Sat Oct 21 07:36:06 2023
From: andrsd at gmail.com (David Andrs)
Date: Sat, 21 Oct 2023 06:36:06 -0600
Subject: [petsc-users] Performance of Conda Binary vs Self Compiled Version
In-Reply-To: <92F64F88-3F08-46AE-BDDF-6CD3602AE5D4@Mit.edu>
References: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>
 <92F64F88-3F08-46AE-BDDF-6CD3602AE5D4@Mit.edu>
Message-ID:

Hi Jorge.

If you are using PETSc from conda-forge, then you can go look at the
https://github.com/conda-forge/petsc-feedstock repo. That is where the
conda recipe (i.e. the description of how to build the conda package) is
located. Under the `recipe` directory, you will see the `build.sh` script,
which has the configure line, etc. included.

You can adapt the recipe to your needs and then deploy your own version to
your own conda channel (it would be hosted on anaconda.org - you will need
an account there). The whole process is quite complex and beyond the scope
of this email. conda has a lot of documentation for this, though. This
could be a good starting point in case you want to dive into this:
https://docs.conda.io/projects/conda-build/en/stable/concepts/recipe.html

--
David

> On Oct 19, 2023, at 18:35, Jorge Nin wrote:
>
> It's why I was asking if there was a resource or some other way to use my
> own version of PETSc with Conda.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mfadams at lbl.gov  Mon Oct 23 07:15:44 2023
From: mfadams at lbl.gov (Mark Adams)
Date: Mon, 23 Oct 2023 08:15:44 -0400
Subject: [petsc-users] MatSetValue problem (in Fortran)
Message-ID:

I have a Fortran user that is getting a segv in MatSetValues_MPIAIJ in
v3.19 and v3.20, and it works with v3.17.
They are trying to get a line number, but I was thinking it might be worth
trying the old dumb MatSetValues.
Is that possible?
Thanks,
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From junchao.zhang at gmail.com  Mon Oct 23 09:09:07 2023
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Mon, 23 Oct 2023 09:09:07 -0500
Subject: [petsc-users] MatSetValue problem (in Fortran)
In-Reply-To:
References:
Message-ID:

Copy the code block of MatSetValues_MPIAIJ() in 3.17 to 3.20. If it works,
then it is possible :)

--Junchao Zhang

On Mon, Oct 23, 2023 at 7:16 AM Mark Adams wrote:

> I have a Fortran user that is getting a segv in MatSetValues_MPIAIJ in
> v3.19 and v3.20, and it works with v3.17.
> They are trying to get a line number, but I was thinking it might be
> worth trying the old dumb MatSetValues.
> Is that possible?
>
> Thanks,
> Mark
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Mon Oct 23 09:18:33 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 23 Oct 2023 10:18:33 -0400
Subject: [petsc-users] MatSetValue problem (in Fortran)
In-Reply-To:
References:
Message-ID: <83DB8B24-922C-41F0-9C77-6C2E66D776D2@petsc.dev>

   Are they then not using any preallocation? The "old dumb MatSetValues"
used default preallocation. To reproduce that they can call
MatXAIJSetPreallocation() with 10 and 3 as the local and nonlocal number of
nonzeros per row.

   Best to use -start_in_debugger or -on_error_attach_debugger to find the
details of the crash.

> On Oct 23, 2023, at 8:15 AM, Mark Adams wrote:
>
> I have a Fortran user that is getting a segv in MatSetValues_MPIAIJ in
> v3.19 and v3.20, and it works with v3.17.
> They are trying to get a line number, but I was thinking it might be
> worth trying the old dumb MatSetValues.
> Is that possible?
>
> Thanks,
> Mark

From mfadams at lbl.gov  Mon Oct 23 10:22:57 2023
From: mfadams at lbl.gov (Mark Adams)
Date: Mon, 23 Oct 2023 11:22:57 -0400
Subject: [petsc-users] MatSetValue problem (in Fortran)
In-Reply-To: <83DB8B24-922C-41F0-9C77-6C2E66D776D2@petsc.dev>
References: <83DB8B24-922C-41F0-9C77-6C2E66D776D2@petsc.dev>
Message-ID:

On Mon, Oct 23, 2023 at 10:18 AM Barry Smith wrote:

>    Are they then not using any preallocation?

I asked but have not gotten a response.

>    The "old dumb MatSetValues" used default preallocation. To reproduce
> that they can call MatXAIJSetPreallocation() with 10 and 3 as the local
> and nonlocal number of nonzeros per row.

I am sure they use some sort of preallocation, or they would have seen bad
performance problems, and they are pretty mature PETSc users, but we have
to wait and see.

>    Best to use -start_in_debugger or -on_error_attach_debugger to find
> the details of the crash.

They did give me a ddt --offline .hdftm file, attached
(nice when you can't open a window, eg, Frontier).
I asked them to try to get a line number.

Thanks,
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
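A minimal C sketch of the explicit-preallocation idea Barry describes above,
with made-up sizes. It uses MatMPIAIJSetPreallocation rather than the
MatXAIJSetPreallocation() call he names, because it accepts scalar per-row
defaults (10 in the diagonal block, 3 off-process) directly; it is an
illustration only, not the user's actual code.

#include <petscmat.h>

/* Sketch: create an MPIAIJ matrix with explicit preallocation, assuming
   nloc local rows/columns per rank. */
static PetscErrorCode CreatePreallocatedMat(MPI_Comm comm, PetscInt nloc, Mat *A)
{
  PetscFunctionBeginUser;
  PetscCall(MatCreate(comm, A));
  PetscCall(MatSetSizes(*A, nloc, nloc, PETSC_DETERMINE, PETSC_DETERMINE));
  PetscCall(MatSetType(*A, MATMPIAIJ));
  /* At most 10 nonzeros per row in the local diagonal block and 3 in the
     off-process block; NULL means "use the scalar default for every row". */
  PetscCall(MatMPIAIJSetPreallocation(*A, 10, NULL, 3, NULL));
  /* The matrix is now ready for MatSetValues(); exceeding the preallocation
     is either slow or an error, depending on MAT_NEW_NONZERO_ALLOCATION_ERR. */
  PetscFunctionReturn(PETSC_SUCCESS);
}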
URL: From bsmith at petsc.dev Mon Oct 23 11:45:06 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 23 Oct 2023 12:45:06 -0400 Subject: [petsc-users] MatSetValue problem (in Fortran) In-Reply-To: References: <83DB8B24-922C-41F0-9C77-6C2E66D776D2@petsc.dev> Message-ID: It's nice to see DOE is still buying computers for 100s of millions of dollars that do not support a debugger. > On Oct 23, 2023, at 11:22?AM, Mark Adams wrote: > > > > On Mon, Oct 23, 2023 at 10:18?AM Barry Smith > wrote: >> >> Are they then not using any preallocation? > > I asked but have not gotten a response. > >> The "old dumb MatSetValues" used default preallocation. To reproduce that they can call MatXAIJSetPreallocation() with 10 and 3 as the local and nonlocal number of nonzeros per row. >> > > I am sure they use some sort of preallocation, or they would have seen bad performance problems and they are pretty mature PETSc users, but we have to wait and see. > >> Best to use -start_in_debugger or -on_error_attach_debugger to find the details of the crash >> > > They did give me a ddt --offline .hdftm file, attached > (nice when you can't open a window, eg, Frontier). > I asked them to try to get a line number. > > Thanks, > Mark > >> >> >> > On Oct 23, 2023, at 8:15?AM, Mark Adams > wrote: >> > >> > I have a Fortran user that is getting a segv in MatSetValues_MPIAIJ in v3.19 and v3.20 and it works with v3.17. >> > They are trying to get a line number but I was thinking it might be worth trying the old dumb MatSetValues. >> > Is that possible? >> > >> > Thanks, >> > Mark >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rene.chenard at me.com Mon Oct 23 18:16:06 2023 From: rene.chenard at me.com (Rene Chenard) Date: Mon, 23 Oct 2023 19:16:06 -0400 Subject: [petsc-users] Seeking Clarification on SNES Solver Behavior Message-ID: <2A793409-82F7-4A70-A2E4-882D39B6ABE0@me.com> Hi! We have recently noticed some inconsistencies in the behavior of the SNES solver when using different solver types, and we would greatly appreciate your insights in resolving this matter. While working with SNESSolve in parallel, we have encountered a discrepancy in the behavior of the evaluation functions for the ComputeFunction and the JacobianFunction. Specifically, there seems to be an inconsistency in whether Vec x receives automatic updates to its ghosts or if manual updates are required (with calls to VecGhostUpdateBegin/End). For instance, when using the ngmres solver, the ghosts of Vec x are adequately updated. However, when employing the nrichardson solver, it appears that manual updates to the ghosts are necessary. It's important to note that we do not utilize the DM object in our implementation, as we have developed our own solution to manage models and discretization. To better understand the root cause of this behavior, we kindly request your assistance in determining if we may be overlooking something in our implementation, or if there are inherent inconsistencies in the SNES solver itself. Your expertise in this matter would be invaluable to us, and we thank you in advance for your consideration and support. Warm regards, ?Ren? Chenard Research Professional at Universit? 
Laval rene.chenard.1 at ulaval.ca From knepley at gmail.com Tue Oct 24 07:23:17 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 24 Oct 2023 08:23:17 -0400 Subject: [petsc-users] Seeking Clarification on SNES Solver Behavior In-Reply-To: <2A793409-82F7-4A70-A2E4-882D39B6ABE0@me.com> References: <2A793409-82F7-4A70-A2E4-882D39B6ABE0@me.com> Message-ID: On Mon, Oct 23, 2023 at 10:23?PM Rene Chenard via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi! > > We have recently noticed some inconsistencies in the behavior of the SNES > solver when using different solver types, and we would greatly appreciate > your insights in resolving this matter. > > While working with SNESSolve in parallel, we have encountered a > discrepancy in the behavior of the evaluation functions for the > ComputeFunction and the JacobianFunction. Specifically, there seems to be > an inconsistency in whether Vec x receives automatic updates to its ghosts > or if manual updates are required (with calls to VecGhostUpdateBegin/End). > > For instance, when using the ngmres solver, the ghosts of Vec x are > adequately updated. However, when employing the nrichardson solver, it > appears that manual updates to the ghosts are necessary. > > It's important to note that we do not utilize the DM object in our > implementation, as we have developed our own solution to manage models and > discretization. > > To better understand the root cause of this behavior, we kindly request > your assistance in determining if we may be overlooking something in our > implementation, or if there are inherent inconsistencies in the SNES solver > itself. > > Your expertise in this matter would be invaluable to us, and we thank you > in advance for your consideration and support. > Since you are not using DM, does that mean that you register a callback with https://petsc.org/main/manualpages/SNES/DMSNESSetFunction/ If so, we do not do any kind of local-to-global calls. I am not sure how NGMRES would populate local vectors for you. My guess is that you have a ghost update call somewhere in your callback, and this gets hit in the NGMRES because it has extra residual evaluations. We can be more specific with more details about the code. Thanks, Matt > Warm regards, > > ?Ren? Chenard > Research Professional at Universit? Laval > rene.chenard.1 at ulaval.ca > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From damynchipman at u.boisestate.edu Tue Oct 24 18:53:25 2023 From: damynchipman at u.boisestate.edu (Damyn Chipman) Date: Tue, 24 Oct 2023 17:53:25 -0600 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators Message-ID: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> Hi PETSc developers, In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. 
From what I can tell, I can only copy a PETSc object onto the same communicator. My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more "elegant" approach within PETSc.

Thanks in advance,

Damyn Chipman
Boise State University
PhD Candidate
Computational Sciences and Engineering
damynchipman at u.boisestate.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jed at jedbrown.org Tue Oct 24 22:51:46 2023
From: jed at jedbrown.org (Jed Brown)
Date: Tue, 24 Oct 2023 21:51:46 -0600
Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators
In-Reply-To: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu>
References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu>
Message-ID: <87h6mfmka5.fsf@jedbrown.org>

You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks.

That said, you usually have a function that assembles the matrix and you can just call that on the new communicator.

Damyn Chipman writes:

> Hi PETSc developers,
>
> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another?
>
> The more detailed scenario is this: I'm working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node's data (i.e., a Mat object) from a sibling's communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator.
>
> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more "elegant" approach within PETSc.
>
> Thanks in advance,
>
> Damyn Chipman
> Boise State University
> PhD Candidate
> Computational Sciences and Engineering
> damynchipman at u.boisestate.edu

From joauma.marichal at uclouvain.be Wed Oct 25 07:31:43 2023
From: joauma.marichal at uclouvain.be (Joauma Marichal)
Date: Wed, 25 Oct 2023 12:31:43 +0000
Subject: [petsc-users] DMSwarm on multiple processors
Message-ID: 

Hello,

I am using the DMSwarm library in some Eulerian-Lagrangian approach to have vapor bubbles in water.
I have obtained nice results recently and wanted to perform bigger simulations.
Unfortunately, when I increase the number of processors used to run the simulation, I get the following error: free(): invalid size [cns136:590327] *** Process received signal *** [cns136:590327] Signal: Aborted (6) [cns136:590327] Signal code: (-6) [cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20] [cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f] [cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05] [cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037] [cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c] [cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac] [cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64] [cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(+0x841642)[0x7f56cea83642] [cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e] [cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde] [cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8] [cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448] [cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20] [cns136:590327] [13] ./cobpor[0x4418dc] [cns136:590327] [14] ./cobpor[0x408b63] [cns136:590327] [15] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f56cd4b5cf3] [cns136:590327] [16] ./cobpor[0x40bdee] [cns136:590327] *** End of error message *** -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited on signal 6 (Aborted). -------------------------------------------------------------------------- When I reduce the number of processors the error disappears and when I run my code without the vapor bubbles it also works. The problem seems to take place at this moment: DMCreate(PETSC_COMM_WORLD,swarm); DMSetType(*swarm,DMSWARM); DMSetDimension(*swarm,3); DMSwarmSetType(*swarm,DMSWARM_PIC); DMSwarmSetCellDM(*swarm,*dmcell); Thanks a lot for your help. Best regards, Joauma -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 25 07:45:38 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 25 Oct 2023 08:45:38 -0400 Subject: [petsc-users] [petsc-maint] DMSwarm on multiple processors In-Reply-To: References: Message-ID: On Wed, Oct 25, 2023 at 8:32?AM Joauma Marichal via petsc-maint < petsc-maint at mcs.anl.gov> wrote: > Hello, > > > > I am using the DMSwarm library in some Eulerian-Lagrangian approach to > have vapor bubbles in water. > > I have obtained nice results recently and wanted to perform bigger > simulations. 
Unfortunately, when I increase the number of processors used > to run the simulation, I get the following error: > > > > free(): invalid size > > [cns136:590327] *** Process received signal *** > > [cns136:590327] Signal: Aborted (6) > > [cns136:590327] Signal code: (-6) > > [cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20] > > [cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f] > > [cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05] > > [cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037] > > [cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c] > > [cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac] > > [cns136:590327] [ 6] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64] > > [cns136:590327] [ 7] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(+0x841642)[0x7f56cea83642] > > [cns136:590327] [ 8] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e] > > [cns136:590327] [ 9] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde] > > [cns136:590327] [10] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8] > > [cns136:590327] [11] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448] > > [cns136:590327] [12] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20] > > [cns136:590327] [13] ./cobpor[0x4418dc] > > [cns136:590327] [14] ./cobpor[0x408b63] > > [cns136:590327] [15] > /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f56cd4b5cf3] > > [cns136:590327] [16] ./cobpor[0x40bdee] > > [cns136:590327] *** End of error message *** > > -------------------------------------------------------------------------- > > Primary job terminated normally, but 1 process returned > > a non-zero exit code. Per user-direction, the job has been aborted. > > -------------------------------------------------------------------------- > > -------------------------------------------------------------------------- > > mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited > on signal 6 (Aborted). > > -------------------------------------------------------------------------- > > > > When I reduce the number of processors the error disappears and when I run > my code without the vapor bubbles it also works. > > The problem seems to take place at this moment: > > > > DMCreate(PETSC_COMM_WORLD,swarm); > > DMSetType(*swarm,DMSWARM); > > DMSetDimension(*swarm,3); > > DMSwarmSetType(*swarm,DMSWARM_PIC); > > DMSwarmSetCellDM(*swarm,*dmcell); > > > > > > Thanks a lot for your help. > Things that would help us track this down: 1) The smallest example where it fails 2) The smallest number of processes where it fails 3) A stack trace of the failure 4) A simple example that we can run that also fails Thanks, Matt > Best regards, > > > > Joauma > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
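(As a starting point for the simple standalone example requested in 4), here is a bare-bones skeleton that exercises only the swarm setup quoted above; the DMDA sizes and every other choice are placeholders rather than the actual application code.)

    #include <petscdmda.h>
    #include <petscdmswarm.h>

    int main(int argc, char **argv)
    {
      DM da, swarm;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

      /* Small background cell DM standing in for the application's DMDA (sizes are placeholders) */
      PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                             DMDA_STENCIL_BOX, 32, 32, 32, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                             1, 1, NULL, NULL, NULL, &da));
      PetscCall(DMSetFromOptions(da));
      PetscCall(DMSetUp(da));

      /* Swarm setup, matching the calls quoted above */
      PetscCall(DMCreate(PETSC_COMM_WORLD, &swarm));
      PetscCall(DMSetType(swarm, DMSWARM));
      PetscCall(DMSetDimension(swarm, 3));
      PetscCall(DMSwarmSetType(swarm, DMSWARM_PIC));
      PetscCall(DMSwarmSetCellDM(swarm, da));
      PetscCall(DMSwarmFinalizeFieldRegister(swarm));

      PetscCall(DMDestroy(&swarm));
      PetscCall(DMDestroy(&da));
      PetscCall(PetscFinalize());
      return 0;
    }

Running such a skeleton with the same number of MPI ranks at which the full code fails would show whether the problem can be reproduced independently of the rest of the application.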
From qiyuelu1 at gmail.com Wed Oct 25 10:09:08 2023
From: qiyuelu1 at gmail.com (Qiyue Lu)
Date: Wed, 25 Oct 2023 10:09:08 -0500
Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules
Message-ID: 

Hello,
I have an in-house code enabled OpenMP and it works. Now I am trying to incorporate PETSc as the linear solver and build together using the building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the OpenMP part doesn't work anymore.
Should I re-configure the petsc installation with --with-openmp=1 option? I wonder are the building rules affected by this missing option?

Thanks,
Qiyue Lu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev Wed Oct 25 10:14:38 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 25 Oct 2023 11:14:38 -0400
Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules
In-Reply-To: 
References: 
Message-ID: <3A18A24D-5445-4F2F-A932-75B2D1C6B6CD@petsc.dev>

   To have OpenMP available from the PETSc make system you need to have --with-openmp among the PETSc ./configure options

> On Oct 25, 2023, at 11:09 AM, Qiyue Lu wrote:
>
> Hello,
> I have an in-house code enabled OpenMP and it works. Now I am trying to incorporate PETSc as the linear solver and build together using the building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the OpenMP part doesn't work anymore.
> Should I re-configure the petsc installation with --with-openmp=1 option? I wonder are the building rules affected by this missing option?
>
> Thanks,
> Qiyue Lu

From knepley at gmail.com Wed Oct 25 10:48:07 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 25 Oct 2023 11:48:07 -0400
Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules
In-Reply-To: 
References: 
Message-ID: 

On Wed, Oct 25, 2023 at 11:12 AM Qiyue Lu wrote:

> Hello,
> I have an in-house code enabled OpenMP and it works. Now I am trying to
> incorporate PETSc as the linear solver and build together using the
> building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the
> OpenMP part doesn't work anymore.
> Should I re-configure the petsc installation with --with-openmp=1 option?
> I wonder are the building rules affected by this missing option?
>

There are parts of PETSc that are not threadsafe unless you configure using --with-threadsafety. If you plan to call PETSc methods from different threads, you need this.

  Thanks,

     Matt

> Thanks,
> Qiyue Lu
>

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From balay at mcs.anl.gov Wed Oct 25 10:54:24 2023
From: balay at mcs.anl.gov (Satish Balay)
Date: Wed, 25 Oct 2023 10:54:24 -0500 (CDT)
Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules
In-Reply-To: 
References: 
Message-ID: 

On Wed, 25 Oct 2023, Qiyue Lu wrote:

> Hello,
> I have an in-house code enabled OpenMP and it works. Now I am trying to
> incorporate PETSc as the linear solver and build together using the
> building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the
> OpenMP part doesn't work anymore.

If you are looking at building only your sources with openmp - using petsc formatted makefile [using petsc build rules], you can specify it via CFLAGS - either in makefile - or on command line.
>>>>>>> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the corresponding make fules] [balay at pj01 tutorials]$ make ex2 mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib -L/home/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lquadmath -o ex2 [balay at pj01 tutorials]$ make clean [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib -L/home/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lquadmath -o ex2 [balay at pj01 tutorials]$ <<<<< Satish > Should I re-configure the petsc installation with --with-openmp=1 option? I > wonder are the building rules affected by this missing option? > > Thanks, > Qiyue Lu > From qiyuelu1 at gmail.com Wed Oct 25 11:06:14 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Wed, 25 Oct 2023 11:06:14 -0500 Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules In-Reply-To: References: Message-ID: Thanks for your reply, using this configurations: *--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-f2cblaslapack=1 --with-cudac=nvcc --with-cuda=1 --with-openmp=1 --with-threadsafety=1* However, I got an error like: *nvcc fatal : Unknown option '-fopenmp'* Previously, when I don't have --with-openmp for the configuration, the PETSc make system can build my *.cu code using nvcc and g++, of course, OpenMP doesn't work. Now with this --with-openmp option, it cannot even build. The interesting thing is, I got this error even after removing the *-fopenmp* from *CXXFLAGS* contents: CXXFLAGS=-std=c++17 LDFLAGS= CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules Thanks, Qiyue Lu On Wed, Oct 25, 2023 at 10:54?AM Satish Balay wrote: > > On Wed, 25 Oct 2023, Qiyue Lu wrote: > > > Hello, > > I have an in-house code enabled OpenMP and it works. Now I am trying to > > incorporate PETSc as the linear solver and build together using the > > building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the > > OpenMP part doesn't work anymore. > > If you are looking at building only your sources with openmp - using petsc > formatted makefile [using petsc build rules], > you can specify it via CFLAGS - either in makefile - or on command line. 
> > >>>>>>> > For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the > corresponding make fules] > > [balay at pj01 tutorials]$ make ex2 > mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include > -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > -L/home/balay/petsc/arch-linux-c-debug/lib > -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 > -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > -lquadmath -o ex2 > [balay at pj01 tutorials]$ make clean > [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp > mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include > -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > -L/home/balay/petsc/arch-linux-c-debug/lib > -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 > -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > -lquadmath -o ex2 > [balay at pj01 tutorials]$ > <<<<< > > Satish > > > > Should I re-configure the petsc installation with --with-openmp=1 > option? I > > wonder are the building rules affected by this missing option? > > > > Thanks, > > Qiyue Lu > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qiyuelu1 at gmail.com Wed Oct 25 11:35:35 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Wed, 25 Oct 2023 11:35:35 -0500 Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules In-Reply-To: References: Message-ID: Even with CXXFLAGS=-Xcompiler -fopenmp -std=c++17 LDFLAGS= -Xcompiler -fopenmp CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules won't work. On Wed, Oct 25, 2023 at 11:06?AM Qiyue Lu wrote: > Thanks for your reply, using this configurations: > > *--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > --download-f2cblaslapack=1 --with-cudac=nvcc --with-cuda=1 --with-openmp=1 > --with-threadsafety=1* > However, I got an error like: > *nvcc fatal : Unknown option '-fopenmp'* > Previously, when I don't have --with-openmp for the configuration, the > PETSc make system can build my *.cu code using nvcc and g++, of course, > OpenMP doesn't work. Now with this --with-openmp option, it cannot even > build. The interesting thing is, I got this error even after removing the > *-fopenmp* from *CXXFLAGS* contents: > CXXFLAGS=-std=c++17 > LDFLAGS= > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common > include ${PETSC_DIR}/lib/petsc/conf/variables > include ${PETSC_DIR}/lib/petsc/conf/rules > > > > Thanks, > Qiyue Lu > > On Wed, Oct 25, 2023 at 10:54?AM Satish Balay wrote: > >> >> On Wed, 25 Oct 2023, Qiyue Lu wrote: >> >> > Hello, >> > I have an in-house code enabled OpenMP and it works. Now I am trying to >> > incorporate PETSc as the linear solver and build together using the >> > building rules in $PETSC_HOME/lib/petsc/conf/rules. 
However, I found the >> > OpenMP part doesn't work anymore. >> >> If you are looking at building only your sources with openmp - using >> petsc formatted makefile [using petsc build rules], >> you can specify it via CFLAGS - either in makefile - or on command line. >> >> >>>>>>> >> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the >> corresponding make fules] >> >> [balay at pj01 tutorials]$ make ex2 >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib >> -L/home/balay/petsc/arch-linux-c-debug/lib >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ >> -lquadmath -o ex2 >> [balay at pj01 tutorials]$ make clean >> [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib >> -L/home/balay/petsc/arch-linux-c-debug/lib >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ >> -lquadmath -o ex2 >> [balay at pj01 tutorials]$ >> <<<<< >> >> Satish >> >> >> > Should I re-configure the petsc installation with --with-openmp=1 >> option? I >> > wonder are the building rules affected by this missing option? >> > >> > Thanks, >> > Qiyue Lu >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Wed Oct 25 11:44:06 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 25 Oct 2023 11:44:06 -0500 (CDT) Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules In-Reply-To: References: Message-ID: I guess the flag you are looking for is CUDAFLAGS >>> balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ make ex100 CUDAFLAGS="-Xcompiler -fopenmp" LDFLAGS=-fopenmp /usr/local/cuda/bin/nvcc -o ex100.o -c -I/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/include -ccbin mpicxx -std=c++17 -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode arch=compute_86,code=sm_86 -Xcompiler -fopenmp -I/scratch/balay/petsc/include -I/scratch/balay/petsc/arch-linux-c-debug/include -I/usr/local/cuda/include `pwd`/ex100.cu mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -fopenmp -Wl,-export-dynamic ex100.o -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib -L/scratch/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -Wl,-rpath,/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib -L/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lquadmath -o ex100 rm ex100.o balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ <<< Satish On Wed, 25 Oct 2023, Qiyue Lu wrote: > Even with > CXXFLAGS=-Xcompiler -fopenmp -std=c++17 > LDFLAGS= -Xcompiler -fopenmp > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common > include ${PETSC_DIR}/lib/petsc/conf/variables > include ${PETSC_DIR}/lib/petsc/conf/rules > > won't work. > > On Wed, Oct 25, 2023 at 11:06?AM Qiyue Lu wrote: > > > Thanks for your reply, using this configurations: > > > > *--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > > --download-f2cblaslapack=1 --with-cudac=nvcc --with-cuda=1 --with-openmp=1 > > --with-threadsafety=1* > > However, I got an error like: > > *nvcc fatal : Unknown option '-fopenmp'* > > Previously, when I don't have --with-openmp for the configuration, the > > PETSc make system can build my *.cu code using nvcc and g++, of course, > > OpenMP doesn't work. Now with this --with-openmp option, it cannot even > > build. The interesting thing is, I got this error even after removing the > > *-fopenmp* from *CXXFLAGS* contents: > > CXXFLAGS=-std=c++17 > > LDFLAGS= > > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common > > include ${PETSC_DIR}/lib/petsc/conf/variables > > include ${PETSC_DIR}/lib/petsc/conf/rules > > > > > > > > Thanks, > > Qiyue Lu > > > > On Wed, Oct 25, 2023 at 10:54?AM Satish Balay wrote: > > > >> > >> On Wed, 25 Oct 2023, Qiyue Lu wrote: > >> > >> > Hello, > >> > I have an in-house code enabled OpenMP and it works. Now I am trying to > >> > incorporate PETSc as the linear solver and build together using the > >> > building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the > >> > OpenMP part doesn't work anymore. > >> > >> If you are looking at building only your sources with openmp - using > >> petsc formatted makefile [using petsc build rules], > >> you can specify it via CFLAGS - either in makefile - or on command line. 
> >> > >> >>>>>>> > >> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the > >> corresponding make fules] > >> > >> [balay at pj01 tutorials]$ make ex2 > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > >> -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include > >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > >> -L/home/balay/petsc/arch-linux-c-debug/lib > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > >> -lquadmath -o ex2 > >> [balay at pj01 tutorials]$ make clean > >> [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > >> -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include > >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > >> -L/home/balay/petsc/arch-linux-c-debug/lib > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > >> -lquadmath -o ex2 > >> [balay at pj01 tutorials]$ > >> <<<<< > >> > >> Satish > >> > >> > >> > Should I re-configure the petsc installation with --with-openmp=1 > >> option? I > >> > wonder are the building rules affected by this missing option? > >> > > >> > Thanks, > >> > Qiyue Lu > >> > > >> > >> > From qiyuelu1 at gmail.com Wed Oct 25 11:47:53 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Wed, 25 Oct 2023 11:47:53 -0500 Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules In-Reply-To: References: Message-ID: Thanks, however, CUDAFLAGS doesn't work. Even I removed all -fopenmp string from the make file, it still complains nvcc doesn't know -fopenmp. It seems by enabling --with-openmp, there is a background -fopenmp option added without -Xcompiler to any compiler. 
On Wed, Oct 25, 2023 at 11:44?AM Satish Balay wrote: > I guess the flag you are looking for is CUDAFLAGS > > >>> > balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ make ex100 > CUDAFLAGS="-Xcompiler -fopenmp" LDFLAGS=-fopenmp > /usr/local/cuda/bin/nvcc -o ex100.o -c > -I/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/include -ccbin mpicxx > -std=c++17 -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo > -gencode arch=compute_86,code=sm_86 -Xcompiler -fopenmp > -I/scratch/balay/petsc/include > -I/scratch/balay/petsc/arch-linux-c-debug/include > -I/usr/local/cuda/include `pwd`/ex100.cu > mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -g3 -O0 -fopenmp -Wl,-export-dynamic ex100.o > -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib > -L/scratch/balay/petsc/arch-linux-c-debug/lib > -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 > -L/usr/local/cuda/lib64/stubs > -Wl,-rpath,/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib > -L/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 > -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -llapack -lblas -lm -lcudart > -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 > -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > -lquadmath -o ex100 > rm ex100.o > balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ > <<< > > Satish > > On Wed, 25 Oct 2023, Qiyue Lu wrote: > > > Even with > > CXXFLAGS=-Xcompiler -fopenmp -std=c++17 > > LDFLAGS= -Xcompiler -fopenmp > > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common > > include ${PETSC_DIR}/lib/petsc/conf/variables > > include ${PETSC_DIR}/lib/petsc/conf/rules > > > > won't work. > > > > On Wed, Oct 25, 2023 at 11:06?AM Qiyue Lu wrote: > > > > > Thanks for your reply, using this configurations: > > > > > > *--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > > > --download-f2cblaslapack=1 --with-cudac=nvcc --with-cuda=1 > --with-openmp=1 > > > --with-threadsafety=1* > > > However, I got an error like: > > > *nvcc fatal : Unknown option '-fopenmp'* > > > Previously, when I don't have --with-openmp for the configuration, the > > > PETSc make system can build my *.cu code using nvcc and g++, of course, > > > OpenMP doesn't work. Now with this --with-openmp option, it cannot even > > > build. The interesting thing is, I got this error even after removing > the > > > *-fopenmp* from *CXXFLAGS* contents: > > > CXXFLAGS=-std=c++17 > > > LDFLAGS= > > > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common > > > include ${PETSC_DIR}/lib/petsc/conf/variables > > > include ${PETSC_DIR}/lib/petsc/conf/rules > > > > > > > > > > > > Thanks, > > > Qiyue Lu > > > > > > On Wed, Oct 25, 2023 at 10:54?AM Satish Balay > wrote: > > > > > >> > > >> On Wed, 25 Oct 2023, Qiyue Lu wrote: > > >> > > >> > Hello, > > >> > I have an in-house code enabled OpenMP and it works. Now I am > trying to > > >> > incorporate PETSc as the linear solver and build together using the > > >> > building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I > found the > > >> > OpenMP part doesn't work anymore. > > >> > > >> If you are looking at building only your sources with openmp - using > > >> petsc formatted makefile [using petsc build rules], > > >> you can specify it via CFLAGS - either in makefile - or on command > line. 
> > >> > > >> >>>>>>> > > >> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with > the > > >> corresponding make fules] > > >> > > >> [balay at pj01 tutorials]$ make ex2 > > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > > >> -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include > > >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > > >> -L/home/balay/petsc/arch-linux-c-debug/lib > > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm > -lX11 > > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath > -lstdc++ > > >> -lquadmath -o ex2 > > >> [balay at pj01 tutorials]$ make clean > > >> [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp > > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > > >> -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include > > >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > > >> -L/home/balay/petsc/arch-linux-c-debug/lib > > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm > -lX11 > > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath > -lstdc++ > > >> -lquadmath -o ex2 > > >> [balay at pj01 tutorials]$ > > >> <<<<< > > >> > > >> Satish > > >> > > >> > > >> > Should I re-configure the petsc installation with --with-openmp=1 > > >> option? I > > >> > wonder are the building rules affected by this missing option? > > >> > > > >> > Thanks, > > >> > Qiyue Lu > > >> > > > >> > > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Oct 25 11:57:45 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 25 Oct 2023 12:57:45 -0400 Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules In-Reply-To: References: Message-ID: I apologize, my suggestion of adding the --with-openmp fails in your environment. You should follow Satish's recommendation of adding the flags in your makefile exactly as needed. If you are looking at building only your sources with openmp - using petsc formatted makefile [using petsc build rules], you can specify it via CFLAGS - either in makefile - or on command line. 
>>>>>>> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the corresponding make fules] [balay at pj01 tutorials]$ make ex2 mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib -L/home/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lquadmath -o ex2 [balay at pj01 tutorials]$ make clean [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib -L/home/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lquadmath -o ex2 [balay at pj01 tutorials]$ > On Oct 25, 2023, at 12:47?PM, Qiyue Lu wrote: > > Thanks, however, CUDAFLAGS doesn't work. > Even I removed all -fopenmp string from the make file, it still complains nvcc doesn't know -fopenmp. It seems by enabling --with-openmp, there is a background -fopenmp option added without -Xcompiler to any compiler. 
> > On Wed, Oct 25, 2023 at 11:44?AM Satish Balay > wrote: >> I guess the flag you are looking for is CUDAFLAGS >> >> >>> >> balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ make ex100 CUDAFLAGS="-Xcompiler -fopenmp" LDFLAGS=-fopenmp >> /usr/local/cuda/bin/nvcc -o ex100.o -c -I/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/include -ccbin mpicxx -std=c++17 -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode arch=compute_86,code=sm_86 -Xcompiler -fopenmp -I/scratch/balay/petsc/include -I/scratch/balay/petsc/arch-linux-c-debug/include -I/usr/local/cuda/include `pwd`/ex100.cu >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -fopenmp -Wl,-export-dynamic ex100.o -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib -L/scratch/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -Wl,-rpath,/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib -L/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lquadmath -o ex100 >> rm ex100.o >> balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ >> <<< >> >> Satish >> >> On Wed, 25 Oct 2023, Qiyue Lu wrote: >> >> > Even with >> > CXXFLAGS=-Xcompiler -fopenmp -std=c++17 >> > LDFLAGS= -Xcompiler -fopenmp >> > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common >> > include ${PETSC_DIR}/lib/petsc/conf/variables >> > include ${PETSC_DIR}/lib/petsc/conf/rules >> > >> > won't work. >> > >> > On Wed, Oct 25, 2023 at 11:06?AM Qiyue Lu > wrote: >> > >> > > Thanks for your reply, using this configurations: >> > > >> > > *--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 >> > > --download-f2cblaslapack=1 --with-cudac=nvcc --with-cuda=1 --with-openmp=1 >> > > --with-threadsafety=1* >> > > However, I got an error like: >> > > *nvcc fatal : Unknown option '-fopenmp'* >> > > Previously, when I don't have --with-openmp for the configuration, the >> > > PETSc make system can build my *.cu code using nvcc and g++, of course, >> > > OpenMP doesn't work. Now with this --with-openmp option, it cannot even >> > > build. The interesting thing is, I got this error even after removing the >> > > *-fopenmp* from *CXXFLAGS* contents: >> > > CXXFLAGS=-std=c++17 >> > > LDFLAGS= >> > > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common >> > > include ${PETSC_DIR}/lib/petsc/conf/variables >> > > include ${PETSC_DIR}/lib/petsc/conf/rules >> > > >> > > >> > > >> > > Thanks, >> > > Qiyue Lu >> > > >> > > On Wed, Oct 25, 2023 at 10:54?AM Satish Balay > wrote: >> > > >> > >> >> > >> On Wed, 25 Oct 2023, Qiyue Lu wrote: >> > >> >> > >> > Hello, >> > >> > I have an in-house code enabled OpenMP and it works. Now I am trying to >> > >> > incorporate PETSc as the linear solver and build together using the >> > >> > building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the >> > >> > OpenMP part doesn't work anymore. >> > >> >> > >> If you are looking at building only your sources with openmp - using >> > >> petsc formatted makefile [using petsc build rules], >> > >> you can specify it via CFLAGS - either in makefile - or on command line. 
>> > >> >> > >> >>>>>>> >> > >> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the >> > >> corresponding make fules] >> > >> >> > >> [balay at pj01 tutorials]$ make ex2 >> > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> > >> -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include >> > >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic >> > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib >> > >> -L/home/balay/petsc/arch-linux-c-debug/lib >> > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib >> > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 >> > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 >> > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ >> > >> -lquadmath -o ex2 >> > >> [balay at pj01 tutorials]$ make clean >> > >> [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp >> > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> > >> -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include >> > >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic >> > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib >> > >> -L/home/balay/petsc/arch-linux-c-debug/lib >> > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib >> > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 >> > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 >> > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ >> > >> -lquadmath -o ex2 >> > >> [balay at pj01 tutorials]$ >> > >> <<<<< >> > >> >> > >> Satish >> > >> >> > >> >> > >> > Should I re-configure the petsc installation with --with-openmp=1 >> > >> option? I >> > >> > wonder are the building rules affected by this missing option? >> > >> > >> > >> > Thanks, >> > >> > Qiyue Lu >> > >> > >> > >> >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qiyuelu1 at gmail.com Wed Oct 25 12:02:36 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Wed, 25 Oct 2023 12:02:36 -0500 Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules In-Reply-To: References: Message-ID: NO, NO, NO, Any try or suggestions are meaningful. I appreciate help from all of you. Have a nice day. Qiyue Lu On Wed, Oct 25, 2023 at 11:57?AM Barry Smith wrote: > > I apologize, my suggestion of adding the --with-openmp fails in your > environment. You should follow Satish's recommendation of adding the flags > in your makefile exactly as needed. > > If you are looking at building only your sources with openmp - using petsc > formatted makefile [using petsc build rules], > you can specify it via CFLAGS - either in makefile - or on command line. 
> > > For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the > corresponding make fules] > > [balay at pj01 tutorials]$ make ex2 > mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include > -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > -L/home/balay/petsc/arch-linux-c-debug/lib > -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 > -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > -lquadmath -o ex2 > [balay at pj01 tutorials]$ make clean > [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp > mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include > -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic ex2.c > -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > -L/home/balay/petsc/arch-linux-c-debug/lib > -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 > -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > -lquadmath -o ex2 > [balay at pj01 tutorials]$ > > > On Oct 25, 2023, at 12:47?PM, Qiyue Lu wrote: > > Thanks, however, CUDAFLAGS doesn't work. > Even I removed all -fopenmp string from the make file, it still complains > nvcc doesn't know -fopenmp. It seems by enabling --with-openmp, there is a > background -fopenmp option added without -Xcompiler to any compiler. 
> > On Wed, Oct 25, 2023 at 11:44?AM Satish Balay wrote: > >> I guess the flag you are looking for is CUDAFLAGS >> >> >>> >> balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ make ex100 >> CUDAFLAGS="-Xcompiler -fopenmp" LDFLAGS=-fopenmp >> /usr/local/cuda/bin/nvcc -o ex100.o -c >> -I/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/include -ccbin mpicxx >> -std=c++17 -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo >> -gencode arch=compute_86,code=sm_86 -Xcompiler -fopenmp >> -I/scratch/balay/petsc/include >> -I/scratch/balay/petsc/arch-linux-c-debug/include >> -I/usr/local/cuda/include `pwd`/ex100.cu >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> -fvisibility=hidden -g3 -O0 -fopenmp -Wl,-export-dynamic ex100.o >> -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib >> -L/scratch/balay/petsc/arch-linux-c-debug/lib >> -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 >> -L/usr/local/cuda/lib64/stubs >> -Wl,-rpath,/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib >> -L/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib >> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 >> -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -llapack -lblas -lm -lcudart >> -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ >> -lquadmath -o ex100 >> rm ex100.o >> balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ >> <<< >> >> Satish >> >> On Wed, 25 Oct 2023, Qiyue Lu wrote: >> >> > Even with >> > CXXFLAGS=-Xcompiler -fopenmp -std=c++17 >> > LDFLAGS= -Xcompiler -fopenmp >> > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common >> > include ${PETSC_DIR}/lib/petsc/conf/variables >> > include ${PETSC_DIR}/lib/petsc/conf/rules >> > >> > won't work. >> > >> > On Wed, Oct 25, 2023 at 11:06?AM Qiyue Lu wrote: >> > >> > > Thanks for your reply, using this configurations: >> > > >> > > *--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 >> > > --download-f2cblaslapack=1 --with-cudac=nvcc --with-cuda=1 >> --with-openmp=1 >> > > --with-threadsafety=1* >> > > However, I got an error like: >> > > *nvcc fatal : Unknown option '-fopenmp'* >> > > Previously, when I don't have --with-openmp for the configuration, the >> > > PETSc make system can build my *.cu code using nvcc and g++, of >> course, >> > > OpenMP doesn't work. Now with this --with-openmp option, it cannot >> even >> > > build. The interesting thing is, I got this error even after removing >> the >> > > *-fopenmp* from *CXXFLAGS* contents: >> > > CXXFLAGS=-std=c++17 >> > > LDFLAGS= >> > > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common >> > > include ${PETSC_DIR}/lib/petsc/conf/variables >> > > include ${PETSC_DIR}/lib/petsc/conf/rules >> > > >> > > >> > > >> > > Thanks, >> > > Qiyue Lu >> > > >> > > On Wed, Oct 25, 2023 at 10:54?AM Satish Balay >> wrote: >> > > >> > >> >> > >> On Wed, 25 Oct 2023, Qiyue Lu wrote: >> > >> >> > >> > Hello, >> > >> > I have an in-house code enabled OpenMP and it works. Now I am >> trying to >> > >> > incorporate PETSc as the linear solver and build together using the >> > >> > building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I >> found the >> > >> > OpenMP part doesn't work anymore. >> > >> >> > >> If you are looking at building only your sources with openmp - using >> > >> petsc formatted makefile [using petsc build rules], >> > >> you can specify it via CFLAGS - either in makefile - or on command >> line. 
>> > >> >> > >> >>>>>>> >> > >> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with >> the >> > >> corresponding make fules] >> > >> >> > >> [balay at pj01 tutorials]$ make ex2 >> > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> > >> -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include >> > >> -I/home/balay/petsc/arch-linux-c-debug/include >> -Wl,-export-dynamic >> > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib >> > >> -L/home/balay/petsc/arch-linux-c-debug/lib >> > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib >> > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 >> > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm >> -lX11 >> > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath >> -lstdc++ >> > >> -lquadmath -o ex2 >> > >> [balay at pj01 tutorials]$ make clean >> > >> [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp >> > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> > >> -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include >> > >> -I/home/balay/petsc/arch-linux-c-debug/include >> -Wl,-export-dynamic >> > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib >> > >> -L/home/balay/petsc/arch-linux-c-debug/lib >> > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib >> > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 >> > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm >> -lX11 >> > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath >> -lstdc++ >> > >> -lquadmath -o ex2 >> > >> [balay at pj01 tutorials]$ >> > >> <<<<< >> > >> >> > >> Satish >> > >> >> > >> >> > >> > Should I re-configure the petsc installation with --with-openmp=1 >> > >> option? I >> > >> > wonder are the building rules affected by this missing option? >> > >> > >> > >> > Thanks, >> > >> > Qiyue Lu >> > >> > >> > >> >> > >> >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From damynchipman at u.boisestate.edu Wed Oct 25 14:38:10 2023 From: damynchipman at u.boisestate.edu (Damyn Chipman) Date: Wed, 25 Oct 2023 13:38:10 -0600 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: <87h6mfmka5.fsf@jedbrown.org> References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> Message-ID: <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> Great thanks, that seemed to work well. This is something my algorithm will do fairly often (?elevating? a node?s communicator to a communicator that includes siblings). The matrices formed are dense but low rank. With MatCreateSubMatrix, it appears I do a lot of copying from one Mat to another. Is there a way to do it with array copying or pointer movement instead of copying entries? -Damyn > On Oct 24, 2023, at 9:51?PM, Jed Brown wrote: > > You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks. > > That said, you usually have a function that assembles the matrix and you can just call that on the new communicator. > > Damyn Chipman writes: > >> Hi PETSc developers, >> >> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? 
>> >> The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator. >> >> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more ?elegant? approach within PETSc. >> >> Thanks in advance, >> >> Damyn Chipman >> Boise State University >> PhD Candidate >> Computational Sciences and Engineering >> damynchipman at u.boisestate.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Oct 25 14:47:09 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 25 Oct 2023 15:47:09 -0400 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> Message-ID: If the matrices are stored as dense it is likely new code is the best way to go. What pieces live on the sub communicator? Is it an m by N matrix where m is the number of rows (on that rank) and N is the total number of columns in the final matrix? Or are they smaller "chunks" that need to be combined together? Barry > On Oct 25, 2023, at 3:38?PM, Damyn Chipman wrote: > > Great thanks, that seemed to work well. This is something my algorithm will do fairly often (?elevating? a node?s communicator to a communicator that includes siblings). The matrices formed are dense but low rank. With MatCreateSubMatrix, it appears I do a lot of copying from one Mat to another. Is there a way to do it with array copying or pointer movement instead of copying entries? > > -Damyn > >> On Oct 24, 2023, at 9:51?PM, Jed Brown wrote: >> >> You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks. >> >> That said, you usually have a function that assembles the matrix and you can just call that on the new communicator. >> >> Damyn Chipman writes: >> >>> Hi PETSc developers, >>> >>> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? >>> >>> The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator. >>> >>> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more ?elegant? approach within PETSc. 
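A minimal sketch of the MatCreateSubMatrix route described above: the data is first placed in a matrix that lives on the larger communicator but has every row on one rank, and MatCreateSubMatrix() then hands each rank its slice. The sizes, the dense type, and the even row split are illustrative assumptions, not part of the original discussion.

```c
#include <petscmat.h>

int main(int argc, char **argv)
{
  MPI_Comm    comm;
  PetscMPIInt rank, size;
  PetscInt    M = 16, N = 16, mloc;
  Mat         Aowned, Aredist;
  IS          rows;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  comm = PETSC_COMM_WORLD; /* stand-in for the "union" communicator */
  PetscCallMPI(MPI_Comm_rank(comm, &rank));
  PetscCallMPI(MPI_Comm_size(comm, &size));

  /* A parallel matrix on comm whose rows all start out on rank 0 */
  PetscCall(MatCreate(comm, &Aowned));
  PetscCall(MatSetSizes(Aowned, rank ? 0 : M, PETSC_DECIDE, M, N));
  PetscCall(MatSetType(Aowned, MATDENSE));
  PetscCall(MatSetUp(Aowned));
  if (!rank) { /* fill rank 0's rows; the values are arbitrary for the sketch */
    for (PetscInt i = 0; i < M; i++)
      for (PetscInt j = 0; j < N; j++) PetscCall(MatSetValue(Aowned, i, j, (PetscScalar)(i * N + j), INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(Aowned, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(Aowned, MAT_FINAL_ASSEMBLY));

  /* Redistribute: each rank asks for a contiguous slice of the rows */
  mloc = M / size; /* assumes size divides M evenly */
  PetscCall(ISCreateStride(comm, mloc, rank * mloc, 1, &rows));
  PetscCall(MatCreateSubMatrix(Aowned, rows, NULL, MAT_INITIAL_MATRIX, &Aredist));
  PetscCall(MatView(Aredist, PETSC_VIEWER_STDOUT_WORLD));

  PetscCall(ISDestroy(&rows));
  PetscCall(MatDestroy(&Aowned));
  PetscCall(MatDestroy(&Aredist));
  PetscCall(PetscFinalize());
  return 0;
}
```

The design point is that the matrix already lives on the larger communicator from the start; MatCreateSubMatrix() only changes which rank owns which rows, which is what the suggestion above relies on.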
>>> >>> Thanks in advance, >>> >>> Damyn Chipman >>> Boise State University >>> PhD Candidate >>> Computational Sciences and Engineering >>> damynchipman at u.boisestate.edu > From damynchipman at u.boisestate.edu Wed Oct 25 15:38:44 2023 From: damynchipman at u.boisestate.edu (Damyn Chipman) Date: Wed, 25 Oct 2023 14:38:44 -0600 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> Message-ID: <0500A373-0274-42C2-B127-5D964D203F44@u.boisestate.edu> More like smaller pieces that need to be combined. Combining them (merging) means sharing the actual data across a sibling communicator and doing some linear algebra to compute the merged matrices (it involves computing a Schur complement of a combined system from the sibling matrices). The solver is based on the Hierarchical Poincar?-Steklov (HPS) method, a direct method for solving elliptic PDEs. I had a conversation with Richard at this year?s ATPESC2023 about this idea. For some more context, here?s the test routine I wrote based on the MatCreateSubMatrix idea. The actual implementation would be part of a recursive merge up a quadtree. Each node's communicator would be a sub-communicator of its parent, and so on. I want to spread the data and compute across any ranks that are involved in that node?s merging. The sizes involved start ?small? at each leaf node (say, no more than 256x256), then are essentially doubled up the tree to the root node. ``` void TEST_petsc_matrix_comm() { // Create local matrices on local communicator int M_local = 4; int N_local = 4; Mat mat_local; MatCreate(MPI_COMM_SELF, &mat_local); // Note the MPI_COMM_SELF as a substitute for a sub-communicator of MPI_COMM_WORLD MatSetSizes(mat_local, PETSC_DECIDE, PETSC_DECIDE, M_local, N_local); MatSetFromOptions(mat_local); // Set values in local matrix int* row_indices = (int*) malloc(M_local*sizeof(int)); for (int i = 0; i < M_local; i++) { row_indices[i] = i; } int* col_indices = (int*) malloc(N_local*sizeof(int)); for (int j = 0; j < M_local; j++) {; col_indices[j] = j; } double* values = (double*) malloc(M_local*N_local*sizeof(double)); int v = M_local*N_local*rank; for (int j = 0; j < N_local; j++) { for (int i = 0; i < M_local; i++) { values[i + j*N_local] = (double) v; v++; } } MatSetValues(mat_local, M_local, row_indices, N_local, col_indices, values, INSERT_VALUES); MatAssemblyBegin(mat_local, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(mat_local, MAT_FINAL_ASSEMBLY); // Create local matrices on global communicator Mat mat_global; IS is_row; int idx[4] = {0, 1, 2, 3}; ISCreateGeneral(MPI_COMM_WORLD, M_local, idx, PETSC_COPY_VALUES, &is_row); MatCreateSubMatrix(mat_local, is_row, NULL, MAT_INITIAL_MATRIX, &mat_global); // View each local mat on global communicator (sleep for `rank` seconds so output is ordered) sleep(rank); MatView(mat_global, 0); // Create merged mat on global communicator // For this test, I just put the four locally computed matrices on the diagonal of the merged matrix // In the 4-to-1 merge, this would compute T_merged from T_alpha, T_beta, T_gamma, and T_omega (children) int M_merged = M_local*size; int N_merged = N_local*size; Mat mat_merged; MatCreate(MPI_COMM_WORLD, &mat_merged); MatSetSizes(mat_merged, PETSC_DECIDE, PETSC_DECIDE, M_merged, N_merged); MatSetFromOptions(mat_merged); // Get values of local matrix to put on diagonal double* values_diag = (double*) 
malloc(M_local*N_local*sizeof(double)); MatGetValues(mat_global, M_local, row_indices, N_local, col_indices, values_diag); // Put local matrix contributions into merged matrix (placeholder for computing merged matrix) for (int i = 0; i < M_local; i++) { row_indices[i] = i + M_local*rank; } for (int j = 0; j < N_local; j++) { col_indices[j] = j + N_local*rank; } MatSetValues(mat_merged, M_local, row_indices, N_local, col_indices, values_diag, INSERT_VALUES); MatAssemblyBegin(mat_merged, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(mat_merged, MAT_FINAL_ASSEMBLY); // View merged mat on global communicator sleep(rank); MatView(mat_merged, 0); // Clean up free(row_indices); free(col_indices); free(values); free(values_diag); MatDestroy(&mat_local); MatDestroy(&mat_global); MatDestroy(&mat_merged); } ``` With the following output : ``` (base) ? mpi git:(feature-parallel) ? mpirun -n 4 ./mpi_matrix Mat Object: 1 MPI process type: seqaij row 0: (0, 0.) (1, 1.) (2, 2.) (3, 3.) row 1: (0, 4.) (1, 5.) (2, 6.) (3, 7.) row 2: (0, 8.) (1, 9.) (2, 10.) (3, 11.) row 3: (0, 12.) (1, 13.) (2, 14.) (3, 15.) Mat Object: 1 MPI process type: seqaij row 0: (0, 16.) (1, 17.) (2, 18.) (3, 19.) row 1: (0, 20.) (1, 21.) (2, 22.) (3, 23.) row 2: (0, 24.) (1, 25.) (2, 26.) (3, 27.) row 3: (0, 28.) (1, 29.) (2, 30.) (3, 31.) Mat Object: 1 MPI process type: seqaij row 0: (0, 32.) (1, 33.) (2, 34.) (3, 35.) row 1: (0, 36.) (1, 37.) (2, 38.) (3, 39.) row 2: (0, 40.) (1, 41.) (2, 42.) (3, 43.) row 3: (0, 44.) (1, 45.) (2, 46.) (3, 47.) Mat Object: 1 MPI process type: seqaij row 0: (0, 48.) (1, 49.) (2, 50.) (3, 51.) row 1: (0, 52.) (1, 53.) (2, 54.) (3, 55.) row 2: (0, 56.) (1, 57.) (2, 58.) (3, 59.) row 3: (0, 60.) (1, 61.) (2, 62.) (3, 63.) Mat Object: 4 MPI processes type: mpiaij row 0: (0, 0.) (1, 1.) (2, 2.) (3, 3.) row 1: (0, 4.) (1, 5.) (2, 6.) (3, 7.) row 2: (0, 8.) (1, 9.) (2, 10.) (3, 11.) row 3: (0, 12.) (1, 13.) (2, 14.) (3, 15.) row 4: (4, 16.) (5, 17.) (6, 18.) (7, 19.) row 5: (4, 20.) (5, 21.) (6, 22.) (7, 23.) row 6: (4, 24.) (5, 25.) (6, 26.) (7, 27.) row 7: (4, 28.) (5, 29.) (6, 30.) (7, 31.) row 8: (8, 32.) (9, 33.) (10, 34.) (11, 35.) row 9: (8, 36.) (9, 37.) (10, 38.) (11, 39.) row 10: (8, 40.) (9, 41.) (10, 42.) (11, 43.) row 11: (8, 44.) (9, 45.) (10, 46.) (11, 47.) row 12: (12, 48.) (13, 49.) (14, 50.) (15, 51.) row 13: (12, 52.) (13, 53.) (14, 54.) (15, 55.) row 14: (12, 56.) (13, 57.) (14, 58.) (15, 59.) row 15: (12, 60.) (13, 61.) (14, 62.) (15, 63.) ``` -Damyn > On Oct 25, 2023, at 1:47?PM, Barry Smith wrote: > > > If the matrices are stored as dense it is likely new code is the best way to go. > > What pieces live on the sub communicator? Is it an m by N matrix where m is the number of rows (on that rank) and N is the total number of columns in the final matrix? Or are they smaller "chunks" that need to be combined together? > > Barry > > >> On Oct 25, 2023, at 3:38?PM, Damyn Chipman wrote: >> >> Great thanks, that seemed to work well. This is something my algorithm will do fairly often (?elevating? a node?s communicator to a communicator that includes siblings). The matrices formed are dense but low rank. With MatCreateSubMatrix, it appears I do a lot of copying from one Mat to another. Is there a way to do it with array copying or pointer movement instead of copying entries? 
>> >> -Damyn >> >>> On Oct 24, 2023, at 9:51?PM, Jed Brown wrote: >>> >>> You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks. >>> >>> That said, you usually have a function that assembles the matrix and you can just call that on the new communicator. >>> >>> Damyn Chipman writes: >>> >>>> Hi PETSc developers, >>>> >>>> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? >>>> >>>> The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator. >>>> >>>> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more ?elegant? approach within PETSc. >>>> >>>> Thanks in advance, >>>> >>>> Damyn Chipman >>>> Boise State University >>>> PhD Candidate >>>> Computational Sciences and Engineering >>>> damynchipman at u.boisestate.edu >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qiyuelu1 at gmail.com Thu Oct 26 07:58:14 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Thu, 26 Oct 2023 07:58:14 -0500 Subject: [petsc-users] alternative for MatCreateSeqAIJWithArrays Message-ID: Hello, I am trying to incorporate PETSc as a linear solver to compute Ax=b in my code. Currently, the sequential version works. 1) I have the global matrix A in CSR format and they are stored in three 1-dimensional arrays: row_ptr[ ], col_idx[ ], values[ ], and I am using MatCreateSeqAIJWithArrays to get the PETSc format matrix. This works. 2) I am trying to use multicores, and when I use "srun -n 6", I got the error *Comm must be of size 1* from the MatCreateSeqAIJWithArrays. Saying I cannot use SEQ function in a parallel context. 3) I don't think MatCreateMPIAIJWithArrays and MatMPIAIJSetPreallocationCSR are good options for me, since I already have the global matrix as a whole. I wonder, from the global CSR format data, how can I reach the PETSc format matrix for parallel KSP computation. Are the MatSetValue, MatSetValues what I need? Thanks, Qiyue Lu -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Oct 26 09:08:50 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 26 Oct 2023 09:08:50 -0500 Subject: [petsc-users] alternative for MatCreateSeqAIJWithArrays In-Reply-To: References: Message-ID: On Thu, Oct 26, 2023 at 8:21?AM Qiyue Lu wrote: > Hello, > I am trying to incorporate PETSc as a linear solver to compute Ax=b in my > code. Currently, the sequential version works. > 1) I have the global matrix A in CSR format and they are stored in three > 1-dimensional arrays: row_ptr[ ], col_idx[ ], values[ ], and I am using > MatCreateSeqAIJWithArrays to get the PETSc format matrix. This works. > 2) I am trying to use multicores, and when I use "srun -n 6", I got the > error *Comm must be of size 1* from the MatCreateSeqAIJWithArrays. 
Saying > I cannot use SEQ function in a parallel context. > 3) I don't think MatCreateMPIAIJWithArrays and > MatMPIAIJSetPreallocationCSR are good options for me, since I already have > the global matrix as a whole. > > I wonder, from the global CSR format data, how can I reach the PETSc > format matrix for parallel KSP computation. Are the MatSetValue, > MatSetValues what I need? > Yes, MatSetValues on each row. Your matrix data is originally on one process, which is not efficient. You could try to distribute it at the beginning. > > Thanks, > Qiyue Lu > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Oct 26 09:30:20 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 26 Oct 2023 10:30:20 -0400 Subject: [petsc-users] alternative for MatCreateSeqAIJWithArrays In-Reply-To: References: Message-ID: <9593F9C1-178E-4CC3-8BFC-2EDAD29ABC05@petsc.dev> Is your code sequential (with possibly OpenMP) or MPI parallel? Do you plan to make your part of the code MPI parallel? If it is sequential or OpenMP parallel you might consider using the new feature https://petsc.org/release/manualpages/PC/PCMPI/#pcmpi Depending on your system it is an easy way to run linear solver in parallel while the code is sequential and can give some reasonable speedup. > On Oct 26, 2023, at 8:58?AM, Qiyue Lu wrote: > > Hello, > I am trying to incorporate PETSc as a linear solver to compute Ax=b in my code. Currently, the sequential version works. > 1) I have the global matrix A in CSR format and they are stored in three 1-dimensional arrays: row_ptr[ ], col_idx[ ], values[ ], and I am using MatCreateSeqAIJWithArrays to get the PETSc format matrix. This works. > 2) I am trying to use multicores, and when I use "srun -n 6", I got the error Comm must be of size 1 from the MatCreateSeqAIJWithArrays. Saying I cannot use SEQ function in a parallel context. > 3) I don't think MatCreateMPIAIJWithArrays and MatMPIAIJSetPreallocationCSR are good options for me, since I already have the global matrix as a whole. > > I wonder, from the global CSR format data, how can I reach the PETSc format matrix for parallel KSP computation. Are the MatSetValue, MatSetValues what I need? > > Thanks, > Qiyue Lu -------------- next part -------------- An HTML attachment was scrubbed... URL: From joauma.marichal at uclouvain.be Thu Oct 26 09:35:37 2023 From: joauma.marichal at uclouvain.be (Joauma Marichal) Date: Thu, 26 Oct 2023 14:35:37 +0000 Subject: [petsc-users] [petsc-maint] DMSwarm on multiple processors In-Reply-To: References: Message-ID: Hello, Here is a very simple version where I have issues. Which I run as follows: cd Grid_generation make clean make all ./grid_generation cd .. 
make clean make all ./cobpor # on 1 proc # OR mpiexec ./cobpor -ksp_type cg -pc_type pfmg -dm_mat_type hyprestruct -pc_pfmg_skip_relax 1 -pc_pfmg_rap_time non-Galerkin # on multiple procs The error that I get is the following: munmap_chunk(): invalid pointer [cns266:2552391] *** Process received signal *** [cns266:2552391] Signal: Aborted (6) [cns266:2552391] Signal code: (-6) [cns266:2552391] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7fd7fd194b20] [cns266:2552391] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7fd7fd194a9f] [cns266:2552391] [ 2] /lib64/libc.so.6(abort+0x127)[0x7fd7fd167e05] [cns266:2552391] [ 3] /lib64/libc.so.6(+0x91037)[0x7fd7fd1d7037] [cns266:2552391] [ 4] /lib64/libc.so.6(+0x9819c)[0x7fd7fd1de19c] [cns266:2552391] [ 5] /lib64/libc.so.6(+0x9844c)[0x7fd7fd1de44c] [cns266:2552391] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscFreeAlign+0xe)[0x7fd7fe63d50e] [cns266:2552391] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetMatType+0x3d)[0x7fd7feab87ad] [cns266:2552391] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetFromOptions+0x109)[0x7fd7feab8b59] [cns266:2552391] [ 9] ./cobpor[0x402df9] [cns266:2552391] [10] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7fd7fd180cf3] [cns266:2552391] [11] ./cobpor[0x40304e] [cns266:2552391] *** End of error message *** Thanks a lot for your help. Best regards, Joauma De : Matthew Knepley Date : mercredi, 25 octobre 2023 ? 14:45 ? : Joauma Marichal Cc : petsc-maint at mcs.anl.gov , petsc-users at mcs.anl.gov Objet : Re: [petsc-maint] DMSwarm on multiple processors On Wed, Oct 25, 2023 at 8:32?AM Joauma Marichal via petsc-maint > wrote: Hello, I am using the DMSwarm library in some Eulerian-Lagrangian approach to have vapor bubbles in water. I have obtained nice results recently and wanted to perform bigger simulations. 
Unfortunately, when I increase the number of processors used to run the simulation, I get the following error: free(): invalid size [cns136:590327] *** Process received signal *** [cns136:590327] Signal: Aborted (6) [cns136:590327] Signal code: (-6) [cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20] [cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f] [cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05] [cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037] [cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c] [cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac] [cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64] [cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(+0x841642)[0x7f56cea83642] [cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e] [cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde] [cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8] [cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448] [cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20] [cns136:590327] [13] ./cobpor[0x4418dc] [cns136:590327] [14] ./cobpor[0x408b63] [cns136:590327] [15] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f56cd4b5cf3] [cns136:590327] [16] ./cobpor[0x40bdee] [cns136:590327] *** End of error message *** -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited on signal 6 (Aborted). -------------------------------------------------------------------------- When I reduce the number of processors the error disappears and when I run my code without the vapor bubbles it also works. The problem seems to take place at this moment: DMCreate(PETSC_COMM_WORLD,swarm); DMSetType(*swarm,DMSWARM); DMSetDimension(*swarm,3); DMSwarmSetType(*swarm,DMSWARM_PIC); DMSwarmSetCellDM(*swarm,*dmcell); Thanks a lot for your help. Things that would help us track this down: 1) The smallest example where it fails 2) The smallest number of processes where it fails 3) A stack trace of the failure 4) A simple example that we can run that also fails Thanks, Matt Best regards, Joauma -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From joauma.marichal at uclouvain.be Thu Oct 26 09:36:31 2023 From: joauma.marichal at uclouvain.be (Joauma Marichal) Date: Thu, 26 Oct 2023 14:36:31 +0000 Subject: [petsc-users] =?iso-8859-1?q?Joauma_Marichal_a_partag=E9_le_doss?= =?iso-8859-1?q?ier_=AB=A0marha=A0=BB_avec_vous?= Message-ID: [Partager l'image] Joauma Marichal a partag? 
un dossier avec vous Joauma Marichal a partag? ce dossier avec vous. [icon] marha [permission globe icon] Ce lien ne fonctionne que pour les destinataires directs de ce message. Ouvrir [Microsoft logo] [cid:faf45f49-2eb0-45c1-831d-d87e9a739e5c] D?claration de confidentialit? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AttachedImage Type: image/png Size: 2877 bytes Desc: AttachedImage URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AttachedImage Type: image/png Size: 560 bytes Desc: AttachedImage URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AttachedImage Type: image/png Size: 2133 bytes Desc: AttachedImage URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AttachedImage Type: image/png Size: 5135 bytes Desc: AttachedImage URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AttachedImage Type: image/png Size: 3404 bytes Desc: AttachedImage URL: From bsmith at petsc.dev Thu Oct 26 09:58:55 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 26 Oct 2023 10:58:55 -0400 Subject: [petsc-users] [petsc-maint] DMSwarm on multiple processors In-Reply-To: References: Message-ID: <595BCCE5-C4DD-4B7E-B7E7-9D07961DD24D@petsc.dev> Please run with -malloc_debug option or even better run under Valgrind https://petsc.org/release/faq/ > On Oct 26, 2023, at 10:35?AM, Joauma Marichal via petsc-users wrote: > > Hello, > > Here is a very simple version where I have issues. > > Which I run as follows: > > cd Grid_generation > make clean > make all > ./grid_generation > cd .. > make clean > make all > ./cobpor # on 1 proc > # OR > mpiexec ./cobpor -ksp_type cg -pc_type pfmg -dm_mat_type hyprestruct -pc_pfmg_skip_relax 1 -pc_pfmg_rap_time non-Galerkin # on multiple procs > > The error that I get is the following: > munmap_chunk(): invalid pointer > [cns266:2552391] *** Process received signal *** > [cns266:2552391] Signal: Aborted (6) > [cns266:2552391] Signal code: (-6) > [cns266:2552391] [ 0] /lib64/libc.so .6(+0x4eb20)[0x7fd7fd194b20] > [cns266:2552391] [ 1] /lib64/libc.so .6(gsignal+0x10f)[0x7fd7fd194a9f] > [cns266:2552391] [ 2] /lib64/libc.so .6(abort+0x127)[0x7fd7fd167e05] > [cns266:2552391] [ 3] /lib64/libc.so .6(+0x91037)[0x7fd7fd1d7037] > [cns266:2552391] [ 4] /lib64/libc.so .6(+0x9819c)[0x7fd7fd1de19c] > [cns266:2552391] [ 5] /lib64/libc.so .6(+0x9844c)[0x7fd7fd1de44c] > [cns266:2552391] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(PetscFreeAlign+0xe)[0x7fd7fe63d50e] > [cns266:2552391] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(DMSetMatType+0x3d)[0x7fd7feab87ad] > [cns266:2552391] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(DMSetFromOptions+0x109)[0x7fd7feab8b59] > [cns266:2552391] [ 9] ./cobpor[0x402df9] > [cns266:2552391] [10] /lib64/libc.so .6(__libc_start_main+0xf3)[0x7fd7fd180cf3] > [cns266:2552391] [11] ./cobpor[0x40304e] > [cns266:2552391] *** End of error message *** > > > Thanks a lot for your help. > > Best regards, > > Joauma > > > > De : Matthew Knepley > > Date : mercredi, 25 octobre 2023 ? 14:45 > ? 
: Joauma Marichal > > Cc : petsc-maint at mcs.anl.gov >, petsc-users at mcs.anl.gov > > Objet : Re: [petsc-maint] DMSwarm on multiple processors > > On Wed, Oct 25, 2023 at 8:32?AM Joauma Marichal via petsc-maint > wrote: > Hello, > > I am using the DMSwarm library in some Eulerian-Lagrangian approach to have vapor bubbles in water. > I have obtained nice results recently and wanted to perform bigger simulations. Unfortunately, when I increase the number of processors used to run the simulation, I get the following error: > > free(): invalid size > > [cns136:590327] *** Process received signal *** > > [cns136:590327] Signal: Aborted (6) > > [cns136:590327] Signal code: (-6) > > [cns136:590327] [ 0] /lib64/libc.so .6(+0x4eb20)[0x7f56cd4c9b20] > > [cns136:590327] [ 1] /lib64/libc.so .6(gsignal+0x10f)[0x7f56cd4c9a9f] > > [cns136:590327] [ 2] /lib64/libc.so .6(abort+0x127)[0x7f56cd49ce05] > > [cns136:590327] [ 3] /lib64/libc.so .6(+0x91037)[0x7f56cd50c037] > > [cns136:590327] [ 4] /lib64/libc.so .6(+0x9819c)[0x7f56cd51319c] > > [cns136:590327] [ 5] /lib64/libc.so .6(+0x99aac)[0x7f56cd514aac] > > [cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64] > > [cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(+0x841642)[0x7f56cea83642] > > [cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e] > > [cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde] > > [cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8] > > [cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448] > > [cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(DMSetUp+0x20)[0x7f56cededa20] > > [cns136:590327] [13] ./cobpor[0x4418dc] > > [cns136:590327] [14] ./cobpor[0x408b63] > > [cns136:590327] [15] /lib64/libc.so .6(__libc_start_main+0xf3)[0x7f56cd4b5cf3] > > [cns136:590327] [16] ./cobpor[0x40bdee] > > [cns136:590327] *** End of error message *** > > -------------------------------------------------------------------------- > > Primary job terminated normally, but 1 process returned > > a non-zero exit code. Per user-direction, the job has been aborted. > > -------------------------------------------------------------------------- > > -------------------------------------------------------------------------- > > mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited on signal 6 (Aborted). > > -------------------------------------------------------------------------- > > > When I reduce the number of processors the error disappears and when I run my code without the vapor bubbles it also works. > The problem seems to take place at this moment: > > DMCreate(PETSC_COMM_WORLD,swarm); > DMSetType(*swarm,DMSWARM); > DMSetDimension(*swarm,3); > DMSwarmSetType(*swarm,DMSWARM_PIC); > DMSwarmSetCellDM(*swarm,*dmcell); > > > Thanks a lot for your help. 
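Independent of where the invalid pointer comes from, checking every PETSc call usually turns this kind of abort into a readable PETSc error with a stack trace, which complements the -malloc_debug / Valgrind advice above. A sketch of the swarm setup from the quoted code with error checking added; dmcell is assumed to be a cell DM (for example a DMDA) that has already been created and set up.

```c
#include <petscdmswarm.h>

/* Error-checked version of the swarm setup quoted in this thread;
 * dmcell is assumed to be an existing, fully set-up cell DM. */
static PetscErrorCode CreateSwarm(DM dmcell, DM *swarm)
{
  PetscFunctionBeginUser;
  PetscCall(DMCreate(PETSC_COMM_WORLD, swarm));
  PetscCall(DMSetType(*swarm, DMSWARM));
  PetscCall(DMSetDimension(*swarm, 3));
  PetscCall(DMSwarmSetType(*swarm, DMSWARM_PIC));
  PetscCall(DMSwarmSetCellDM(*swarm, dmcell));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```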
> > Things that would help us track this down: > > 1) The smallest example where it fails > > 2) The smallest number of processes where it fails > > 3) A stack trace of the failure > > 4) A simple example that we can run that also fails > > Thanks, > > Matt > > Best regards, > > Joauma > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 26 11:01:42 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 26 Oct 2023 12:01:42 -0400 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> Message-ID: On Wed, Oct 25, 2023 at 11:55?PM Damyn Chipman < damynchipman at u.boisestate.edu> wrote: > Great thanks, that seemed to work well. This is something my algorithm > will do fairly often (?elevating? a node?s communicator to a communicator > that includes siblings). The matrices formed are dense but low rank. With > MatCreateSubMatrix, it appears I do a lot of copying from one Mat to > another. Is there a way to do it with array copying or pointer movement > instead of copying entries? > We could make a fast path for dense that avoids MatSetValues(). Can you make an issue for this? The number one thing that would make this faster is to contribute a small test. Then we could run it continually when putting in the fast path to make sure we are preserving correctness. Thanks, Matt > -Damyn > > On Oct 24, 2023, at 9:51?PM, Jed Brown wrote: > > You can place it in a parallel Mat (that has rows or columns on only one > rank or a subset of ranks) and then MatCreateSubMatrix with all new > rows/columns on a different rank or subset of ranks. > > That said, you usually have a function that assembles the matrix and you > can just call that on the new communicator. > > Damyn Chipman writes: > > Hi PETSc developers, > > In short, my question is this: Does PETSc provide a way to move or copy an > object (say a Mat) from one communicator to another? > > The more detailed scenario is this: I?m working on a linear algebra solver > on quadtree meshes (i.e., p4est). I use communicator subsets in order to > facilitate communication between siblings or nearby neighbors. When > performing linear algebra across siblings (a group of 4), I need to copy a > node?s data (i.e., a Mat object) from a sibling?s communicator to the > communicator that includes the four siblings. From what I can tell, I can > only copy a PETSc object onto the same communicator. > > My current approach will be to copy the raw data from the Mat on one > communicator to a new Mat on the new communicator, but I wanted to see if > there is a more ?elegant? approach within PETSc. > > Thanks in advance, > > Damyn Chipman > Boise State University > PhD Candidate > Computational Sciences and Engineering > damynchipman at u.boisestate.edu > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Thu Oct 26 14:34:49 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 26 Oct 2023 15:34:49 -0400 Subject: [petsc-users] [petsc-maint] DMSwarm on multiple processors In-Reply-To: <595BCCE5-C4DD-4B7E-B7E7-9D07961DD24D@petsc.dev> References: <595BCCE5-C4DD-4B7E-B7E7-9D07961DD24D@petsc.dev> Message-ID: Okay, there were a few problems: 1) You overwrote the bounds on string loc_grid_gen[] 2) You destroyed the coordinate DA I fixed these and it runs for me fine on several processes. I am including my revised source since I check a lot more error values. I converted it to C because that is easier for me, although C has a problem with your sqrt() in a compile-time constant. Thanks, Matt On Thu, Oct 26, 2023 at 10:59?AM Barry Smith wrote: > > Please run with -malloc_debug option or even better run under Valgrind > https://petsc.org/release/faq/ > > > > On Oct 26, 2023, at 10:35?AM, Joauma Marichal via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello, > > Here is a very simple version where I have issues. > > Which I run as follows: > > cd Grid_generation > make clean > make all > ./grid_generation > cd .. > make clean > make all > ./cobpor # on 1 proc > # OR > mpiexec ./cobpor -ksp_type cg -pc_type pfmg -dm_mat_type hyprestruct > -pc_pfmg_skip_relax 1 -pc_pfmg_rap_time non-Galerkin # on multiple procs > > The error that I get is the following: > munmap_chunk(): invalid pointer > [cns266:2552391] *** Process received signal *** > [cns266:2552391] Signal: Aborted (6) > [cns266:2552391] Signal code: (-6) > [cns266:2552391] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7fd7fd194b20] > [cns266:2552391] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7fd7fd194a9f] > [cns266:2552391] [ 2] /lib64/libc.so.6(abort+0x127)[0x7fd7fd167e05] > [cns266:2552391] [ 3] /lib64/libc.so.6(+0x91037)[0x7fd7fd1d7037] > [cns266:2552391] [ 4] /lib64/libc.so.6(+0x9819c)[0x7fd7fd1de19c] > [cns266:2552391] [ 5] /lib64/libc.so.6(+0x9844c)[0x7fd7fd1de44c] > [cns266:2552391] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(PetscFreeAlign+0xe)[0x7fd7fe63d50e] > [cns266:2552391] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(DMSetMatType+0x3d)[0x7fd7feab87ad] > [cns266:2552391] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(DMSetFromOptions+0x109)[0x7fd7feab8b59] > [cns266:2552391] [ 9] ./cobpor[0x402df9] > [cns266:2552391] [10] /lib64/libc.so > .6(__libc_start_main+0xf3)[0x7fd7fd180cf3] > [cns266:2552391] [11] ./cobpor[0x40304e] > [cns266:2552391] *** End of error message *** > > > Thanks a lot for your help. > > Best regards, > > Joauma > > > > > *De : *Matthew Knepley > *Date : *mercredi, 25 octobre 2023 ? 14:45 > *? : *Joauma Marichal > *Cc : *petsc-maint at mcs.anl.gov , > petsc-users at mcs.anl.gov > *Objet : *Re: [petsc-maint] DMSwarm on multiple processors > On Wed, Oct 25, 2023 at 8:32?AM Joauma Marichal via petsc-maint < > petsc-maint at mcs.anl.gov> wrote: > > Hello, > > I am using the DMSwarm library in some Eulerian-Lagrangian approach to > have vapor bubbles in water. > I have obtained nice results recently and wanted to perform bigger > simulations. 
Unfortunately, when I increase the number of processors used > to run the simulation, I get the following error: > > > free(): invalid size > > [cns136:590327] *** Process received signal *** > > [cns136:590327] Signal: Aborted (6) > > [cns136:590327] Signal code: (-6) > > [cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20] > > [cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f] > > [cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05] > > [cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037] > > [cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c] > > [cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac] > > [cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64] > > [cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(+0x841642)[0x7f56cea83642] > > [cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e] > > [cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde] > > [cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8] > > [cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448] > > [cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20] > > [cns136:590327] [13] ./cobpor[0x4418dc] > > [cns136:590327] [14] ./cobpor[0x408b63] > > [cns136:590327] [15] /lib64/libc.so > .6(__libc_start_main+0xf3)[0x7f56cd4b5cf3] > > [cns136:590327] [16] ./cobpor[0x40bdee] > > [cns136:590327] *** End of error message *** > > -------------------------------------------------------------------------- > > Primary job terminated normally, but 1 process returned > > a non-zero exit code. Per user-direction, the job has been aborted. > > -------------------------------------------------------------------------- > > -------------------------------------------------------------------------- > > mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited > on signal 6 (Aborted). > > -------------------------------------------------------------------------- > > When I reduce the number of processors the error disappears and when I run > my code without the vapor bubbles it also works. > The problem seems to take place at this moment: > > DMCreate(PETSC_COMM_WORLD,swarm); > DMSetType(*swarm,DMSWARM); > DMSetDimension(*swarm,3); > DMSwarmSetType(*swarm,DMSWARM_PIC); > DMSwarmSetCellDM(*swarm,*dmcell); > > > Thanks a lot for your help. > > > Things that would help us track this down: > > 1) The smallest example where it fails > > 2) The smallest number of processes where it fails > > 3) A stack trace of the failure > > 4) A simple example that we can run that also fails > > Thanks, > > Matt > > > Best regards, > > Joauma > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: global.h Type: application/octet-stream Size: 11245 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: global_evap.h Type: application/octet-stream Size: 1099 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: main.c Type: application/octet-stream Size: 16931 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makefile_matt Type: application/octet-stream Size: 135 bytes Desc: not available URL: From damian.marek at mail.utoronto.ca Fri Oct 27 08:12:34 2023 From: damian.marek at mail.utoronto.ca (Damian Marek) Date: Fri, 27 Oct 2023 13:12:34 +0000 Subject: [petsc-users] MatDenseSetLDA Documentation Clarification Message-ID: Hello, I found a minor issue with the documentation for MatDenseSetLDA, which ended up causing my program to idle unexpectedly. In the documentation it is specified as "Not Collective". However, in the MPIDense implementation it is possible for PetscLayoutSetUp to be called, which is a collective. (Lines 201-202: https://petsc.org/main/src/mat/impls/dense/mpi/mpidense.c.html#MatDenseSetLDA_MPIDense) Regards, Damian -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 27 10:11:59 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 27 Oct 2023 11:11:59 -0400 Subject: [petsc-users] MatDenseSetLDA Documentation Clarification In-Reply-To: References: Message-ID: <61F224FC-8C7E-430D-A00C-FC0167C54F38@petsc.dev> Damian, Thanks for the report. Fixing in https://gitlab.com/petsc/petsc/-/merge_requests/6972 Barry > On Oct 27, 2023, at 9:12?AM, Damian Marek wrote: > > Hello, > > I found a minor issue with the documentation for MatDenseSetLDA, which ended up causing my program to idle unexpectedly. In the documentation it is specified as "Not Collective". However, in the MPIDense implementation it is possible for PetscLayoutSetUp to be called, which is a collective. (Lines 201-202: https://petsc.org/main/src/mat/impls/dense/mpi/mpidense.c.html#MatDenseSetLDA_MPIDense) > > Regards, > Damian -------------- next part -------------- An HTML attachment was scrubbed... URL: From damynchipman at u.boisestate.edu Fri Oct 27 14:53:56 2023 From: damynchipman at u.boisestate.edu (Damyn Chipman) Date: Fri, 27 Oct 2023 13:53:56 -0600 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> Message-ID: <53D5A2A5-6958-4EC9-ABA5-CBBE1FB5D65C@u.boisestate.edu> Yeah, I?ll make an issue and use a modified version of this test routine. Does anything change if I will be using MATSCALAPACK matrices instead of the built in MATDENSE? Like I said, I will be computing Schur complements and need to use a parallel and dense matrix format. -Damyn > On Oct 26, 2023, at 10:01?AM, Matthew Knepley wrote: > > On Wed, Oct 25, 2023 at 11:55?PM Damyn Chipman > wrote: >> Great thanks, that seemed to work well. This is something my algorithm will do fairly often (?elevating? a node?s communicator to a communicator that includes siblings). 
The matrices formed are dense but low rank. With MatCreateSubMatrix, it appears I do a lot of copying from one Mat to another. Is there a way to do it with array copying or pointer movement instead of copying entries? > > We could make a fast path for dense that avoids MatSetValues(). Can you make an issue for this? The number one thing that would make this faster is to contribute a small test. Then we could run it continually when putting in the fast path to make sure we are preserving correctness. > > Thanks, > > Matt > >> -Damyn >> >>> On Oct 24, 2023, at 9:51?PM, Jed Brown > wrote: >>> >>> You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks. >>> >>> That said, you usually have a function that assembles the matrix and you can just call that on the new communicator. >>> >>> Damyn Chipman > writes: >>> >>>> Hi PETSc developers, >>>> >>>> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? >>>> >>>> The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator. >>>> >>>> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more ?elegant? approach within PETSc. >>>> >>>> Thanks in advance, >>>> >>>> Damyn Chipman >>>> Boise State University >>>> PhD Candidate >>>> Computational Sciences and Engineering >>>> damynchipman at u.boisestate.edu >> > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Sat Oct 28 02:35:33 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sat, 28 Oct 2023 09:35:33 +0200 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: <53D5A2A5-6958-4EC9-ABA5-CBBE1FB5D65C@u.boisestate.edu> References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> <53D5A2A5-6958-4EC9-ABA5-CBBE1FB5D65C@u.boisestate.edu> Message-ID: <7281ED63-0D4B-4A56-97DA-40781D27D856@dsic.upv.es> Currently MATSCALAPACK does not support MatCreateSubMatrix(). I guess it would not be difficult to implement. Jose > El 27 oct 2023, a las 21:53, Damyn Chipman escribi?: > > Yeah, I?ll make an issue and use a modified version of this test routine. > > Does anything change if I will be using MATSCALAPACK matrices instead of the built in MATDENSE? Like I said, I will be computing Schur complements and need to use a parallel and dense matrix format. > > -Damyn > >> On Oct 26, 2023, at 10:01?AM, Matthew Knepley wrote: >> >> On Wed, Oct 25, 2023 at 11:55?PM Damyn Chipman wrote: >> Great thanks, that seemed to work well. This is something my algorithm will do fairly often (?elevating? 
a node?s communicator to a communicator that includes siblings). The matrices formed are dense but low rank. With MatCreateSubMatrix, it appears I do a lot of copying from one Mat to another. Is there a way to do it with array copying or pointer movement instead of copying entries? >> >> We could make a fast path for dense that avoids MatSetValues(). Can you make an issue for this? The number one thing that would make this faster is to contribute a small test. Then we could run it continually when putting in the fast path to make sure we are preserving correctness. >> >> Thanks, >> >> Matt >> >> -Damyn >> >>> On Oct 24, 2023, at 9:51?PM, Jed Brown wrote: >>> >>> You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks. >>> >>> That said, you usually have a function that assembles the matrix and you can just call that on the new communicator. >>> >>> Damyn Chipman writes: >>> >>>> Hi PETSc developers, >>>> >>>> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? >>>> >>>> The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator. >>>> >>>> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more ?elegant? approach within PETSc. >>>> >>>> Thanks in advance, >>>> >>>> Damyn Chipman >>>> Boise State University >>>> PhD Candidate >>>> Computational Sciences and Engineering >>>> damynchipman at u.boisestate.edu >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > From qiyuelu1 at gmail.com Sat Oct 28 09:20:16 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Sat, 28 Oct 2023 09:20:16 -0500 Subject: [petsc-users] alternative for MatCreateSeqAIJWithArrays In-Reply-To: <9593F9C1-178E-4CC3-8BFC-2EDAD29ABC05@petsc.dev> References: <9593F9C1-178E-4CC3-8BFC-2EDAD29ABC05@petsc.dev> Message-ID: Yes, this is exactly what I need. And it works now. For the record to other potential users: 1) PetscInitialize and PCMPIServerBegin looping start 2) Sequential code part, and as Junchao said, MatSetValues on each row to create matrices. 3) parallel KSPSolve looping end 4) PCMPIServerEnd and PetscFinalize Thank you all for these suggestions. Have a good weekend. Qiyue Lu On Thu, Oct 26, 2023 at 9:30?AM Barry Smith wrote: > > Is your code sequential (with possibly OpenMP) or MPI parallel? Do you > plan to make your part of the code MPI parallel? > > If it is sequential or OpenMP parallel you might consider using the > new feature https://petsc.org/release/manualpages/PC/PCMPI/#pcmpi Depending > on your system it is an easy way to run linear solver in parallel while the > code is sequential and can give some reasonable speedup. 
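For other readers, a rough sketch of the row-by-row assembly Junchao suggested earlier (step 2 in the summary above). It assumes the global CSR arrays row_ptr, col_idx, and values are visible on every rank and use PetscInt/PetscScalar-compatible types, and that n is the global size; if only one rank holds the CSR data, that rank can set all rows instead and the assembly will communicate the values. With the PCMPI server route the same loop simply stays on PETSC_COMM_SELF on rank 0.

```c
/* Sketch: build a parallel AIJ matrix row by row from global CSR arrays and
 * solve with KSP. Preallocation is skipped for brevity; for real problems add
 * MatMPIAIJSetPreallocation() or pass the nonzero counts up front. */
Mat      A;
Vec      x, b;
KSP      ksp;
PetscInt rstart, rend;

PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
PetscCall(MatSetFromOptions(A));
PetscCall(MatSetUp(A));
PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
for (PetscInt i = rstart; i < rend; i++) {
  PetscInt ncols = row_ptr[i + 1] - row_ptr[i];
  PetscCall(MatSetValues(A, 1, &i, ncols, &col_idx[row_ptr[i]], &values[row_ptr[i]], INSERT_VALUES));
}
PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

PetscCall(MatCreateVecs(A, &x, &b));
/* ... fill b with the right-hand side ... */
PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
PetscCall(KSPSetOperators(ksp, A, A));
PetscCall(KSPSetFromOptions(ksp));
PetscCall(KSPSolve(ksp, b, x));
```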
> > On Oct 26, 2023, at 8:58?AM, Qiyue Lu wrote: > > Hello, > I am trying to incorporate PETSc as a linear solver to compute Ax=b in my > code. Currently, the sequential version works. > 1) I have the global matrix A in CSR format and they are stored in three > 1-dimensional arrays: row_ptr[ ], col_idx[ ], values[ ], and I am using > MatCreateSeqAIJWithArrays to get the PETSc format matrix. This works. > 2) I am trying to use multicores, and when I use "srun -n 6", I got the > error *Comm must be of size 1* from the MatCreateSeqAIJWithArrays. Saying > I cannot use SEQ function in a parallel context. > 3) I don't think MatCreateMPIAIJWithArrays and > MatMPIAIJSetPreallocationCSR are good options for me, since I already have > the global matrix as a whole. > > I wonder, from the global CSR format data, how can I reach the PETSc > format matrix for parallel KSP computation. Are the MatSetValue, > MatSetValues what I need? > > Thanks, > Qiyue Lu > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Oct 28 09:56:47 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 28 Oct 2023 10:56:47 -0400 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: <53D5A2A5-6958-4EC9-ABA5-CBBE1FB5D65C@u.boisestate.edu> References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> <53D5A2A5-6958-4EC9-ABA5-CBBE1FB5D65C@u.boisestate.edu> Message-ID: On Fri, Oct 27, 2023 at 3:54?PM Damyn Chipman wrote: > Yeah, I?ll make an issue and use a modified version of this test routine. > > Does anything change if I will be using MATSCALAPACK matrices instead of > the built in MATDENSE? > No, that is likely worse. > Like I said, I will be computing Schur complements and need to use a > parallel and dense matrix format. > I do not understand the communication pattern, but it is possible that Elemental would be slightly faster since it has some cool built-in communication operations, however it might be more programming. Thanks, Matt > -Damyn > > On Oct 26, 2023, at 10:01?AM, Matthew Knepley wrote: > > On Wed, Oct 25, 2023 at 11:55?PM Damyn Chipman < > damynchipman at u.boisestate.edu> wrote: > >> Great thanks, that seemed to work well. This is something my algorithm >> will do fairly often (?elevating? a node?s communicator to a communicator >> that includes siblings). The matrices formed are dense but low rank. With >> MatCreateSubMatrix, it appears I do a lot of copying from one Mat to >> another. Is there a way to do it with array copying or pointer movement >> instead of copying entries? >> > > We could make a fast path for dense that avoids MatSetValues(). Can you > make an issue for this? The number one thing that would make this faster is > to contribute a small test. Then we could run it continually when putting > in the fast path to make sure we are preserving correctness. > > Thanks, > > Matt > > >> -Damyn >> >> On Oct 24, 2023, at 9:51?PM, Jed Brown wrote: >> >> You can place it in a parallel Mat (that has rows or columns on only one >> rank or a subset of ranks) and then MatCreateSubMatrix with all new >> rows/columns on a different rank or subset of ranks. >> >> That said, you usually have a function that assembles the matrix and you >> can just call that on the new communicator. 
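One concrete shape for that second suggestion is to make the assembly routine take the communicator as an argument, so "moving" an operator to the sibling group is just re-running assembly there. The routine name, the dense type, and the NULL data argument below are placeholders for illustration.

```c
/* Sketch: parameterize assembly by communicator, so the same routine can build
 * the operator on a single sibling's communicator or on the merged one. */
static PetscErrorCode AssembleNodeMatrix(MPI_Comm comm, PetscInt M, PetscInt N, Mat *T)
{
  PetscFunctionBeginUser;
  PetscCall(MatCreateDense(comm, PETSC_DECIDE, PETSC_DECIDE, M, N, NULL, T));
  /* ... fill with MatSetValues() from the node's local data ... */
  PetscCall(MatAssemblyBegin(*T, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(*T, MAT_FINAL_ASSEMBLY));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```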
>> >> Damyn Chipman writes: >> >> Hi PETSc developers, >> >> In short, my question is this: Does PETSc provide a way to move or copy >> an object (say a Mat) from one communicator to another? >> >> The more detailed scenario is this: I?m working on a linear algebra >> solver on quadtree meshes (i.e., p4est). I use communicator subsets in >> order to facilitate communication between siblings or nearby neighbors. >> When performing linear algebra across siblings (a group of 4), I need to >> copy a node?s data (i.e., a Mat object) from a sibling?s communicator to >> the communicator that includes the four siblings. From what I can tell, I >> can only copy a PETSc object onto the same communicator. >> >> My current approach will be to copy the raw data from the Mat on one >> communicator to a new Mat on the new communicator, but I wanted to see if >> there is a more ?elegant? approach within PETSc. >> >> Thanks in advance, >> >> Damyn Chipman >> Boise State University >> PhD Candidate >> Computational Sciences and Engineering >> damynchipman at u.boisestate.edu >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From onur.notonur at proton.me Mon Oct 30 04:37:21 2023 From: onur.notonur at proton.me (onur.notonur) Date: Mon, 30 Oct 2023 09:37:21 +0000 Subject: [petsc-users] Advices on creating DMPlex from custom input format Message-ID: Hi, I hope this message finds you all in good health and high spirits. I wanted to discuss an approach problem input file reading/processing in a solver which is using PETSc DMPlex. In our team we have a range of solvers, they are not built on PETSc except this one, but they all share a common problem input format. This format includes essential data such as node coordinates, element connectivity, boundary conditions based on elements, and specific metadata related to the problem. I create an array for boundary points on each rank and utilize them in our computations, I am doing it hardcoded currently but I need to start reading those input files, But I am not sure about the approach. Here's what I have in mind: - - Begin by reading the node coordinates and connectivity on a single core. - Utilize the DMPlexCreateFromCellListPetsc() function to construct the DMPlex. - Distribute the mesh across processors. - Proceed to read and process the boundary conditions on each processor. If the global index of the boundary element corresponds to that processor, we process it; otherwise, we pass. Additionally, maybe I need to reorder the mesh. In that case I think I can use the point permutation IS obtained from the DMPlexGetOrdering() function while processing boundary conditions. Also I have another approach in my mind but I don't know if it's possible: Read/construct DMPlex on single core including boundary conditions. Store BC related data in Vec or another appropriate data structure. Then distribute this BC holding data structure too as well as DMPlex. I would greatly appreciate your thoughts and any suggestions you might have regarding this approach. 
Looking forward to hearing your insights. Best regards, Onur -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Oct 30 11:34:51 2023 From: jed at jedbrown.org (Jed Brown) Date: Mon, 30 Oct 2023 10:34:51 -0600 Subject: [petsc-users] Advices on creating DMPlex from custom input format In-Reply-To: References: Message-ID: <874ji8jcgk.fsf@jedbrown.org> It's probably easier to apply boundary conditions when you have the serial mesh. You may consider contributing the reader if it's a format that others use. "onur.notonur via petsc-users" writes: > Hi, > > I hope this message finds you all in good health and high spirits. > > I wanted to discuss an approach problem input file reading/processing in a solver which is using PETSc DMPlex. In our team we have a range of solvers, they are not built on PETSc except this one, but they all share a common problem input format. This format includes essential data such as node coordinates, element connectivity, boundary conditions based on elements, and specific metadata related to the problem. I create an array for boundary points on each rank and utilize them in our computations, I am doing it hardcoded currently but I need to start reading those input files, But I am not sure about the approach. > > Here's what I have in mind: > > - - Begin by reading the node coordinates and connectivity on a single core. > - Utilize the DMPlexCreateFromCellListPetsc() function to construct the DMPlex. > - Distribute the mesh across processors. > - Proceed to read and process the boundary conditions on each processor. If the global index of the boundary element corresponds to that processor, we process it; otherwise, we pass. > > Additionally, maybe I need to reorder the mesh. In that case I think I can use the point permutation IS obtained from the DMPlexGetOrdering() function while processing boundary conditions. > > Also I have another approach in my mind but I don't know if it's possible: Read/construct DMPlex on single core including boundary conditions. Store BC related data in Vec or another appropriate data structure. Then distribute this BC holding data structure too as well as DMPlex. > > I would greatly appreciate your thoughts and any suggestions you might have regarding this approach. Looking forward to hearing your insights. > > Best regards, > > Onur From damynchipman at u.boisestate.edu Mon Oct 30 11:42:42 2023 From: damynchipman at u.boisestate.edu (Damyn Chipman) Date: Mon, 30 Oct 2023 10:42:42 -0600 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> <53D5A2A5-6958-4EC9-ABA5-CBBE1FB5D65C@u.boisestate.edu> Message-ID: Sounds good, thanks. I?ve also been looking into Elemental, but the documentation seems outdated and I can?t find good examples on how to use it. I have the LLNL fork installed. Thanks, -Damyn > On Oct 28, 2023, at 8:56?AM, Matthew Knepley wrote: > > On Fri, Oct 27, 2023 at 3:54?PM Damyn Chipman > wrote: >> Yeah, I?ll make an issue and use a modified version of this test routine. >> >> Does anything change if I will be using MATSCALAPACK matrices instead of the built in MATDENSE? > > No, that is likely worse. > >> Like I said, I will be computing Schur complements and need to use a parallel and dense matrix format. 
> > I do not understand the communication pattern, but it is possible that Elemental would be slightly faster since it has some cool built-in communication operations, however it might be more programming. > > Thanks, > > Matt > >> -Damyn >> >>> On Oct 26, 2023, at 10:01?AM, Matthew Knepley > wrote: >>> >>> On Wed, Oct 25, 2023 at 11:55?PM Damyn Chipman > wrote: >>>> Great thanks, that seemed to work well. This is something my algorithm will do fairly often (?elevating? a node?s communicator to a communicator that includes siblings). The matrices formed are dense but low rank. With MatCreateSubMatrix, it appears I do a lot of copying from one Mat to another. Is there a way to do it with array copying or pointer movement instead of copying entries? >>> >>> We could make a fast path for dense that avoids MatSetValues(). Can you make an issue for this? The number one thing that would make this faster is to contribute a small test. Then we could run it continually when putting in the fast path to make sure we are preserving correctness. >>> >>> Thanks, >>> >>> Matt >>> >>>> -Damyn >>>> >>>>> On Oct 24, 2023, at 9:51?PM, Jed Brown > wrote: >>>>> >>>>> You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks. >>>>> >>>>> That said, you usually have a function that assembles the matrix and you can just call that on the new communicator. >>>>> >>>>> Damyn Chipman > writes: >>>>> >>>>>> Hi PETSc developers, >>>>>> >>>>>> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? >>>>>> >>>>>> The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator. >>>>>> >>>>>> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more ?elegant? approach within PETSc. >>>>>> >>>>>> Thanks in advance, >>>>>> >>>>>> Damyn Chipman >>>>>> Boise State University >>>>>> PhD Candidate >>>>>> Computational Sciences and Engineering >>>>>> damynchipman at u.boisestate.edu >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Mon Oct 30 12:16:56 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 30 Oct 2023 13:16:56 -0400 Subject: [petsc-users] Advices on creating DMPlex from custom input format In-Reply-To: References: Message-ID: On Mon, Oct 30, 2023 at 5:37?AM onur.notonur via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I hope this message finds you all in good health and high spirits. > > I wanted to discuss an approach problem input file reading/processing in a > solver which is using PETSc DMPlex. In our team we have a range of solvers, > they are not built on PETSc except this one, but they all share a common > problem input format. This format includes essential data such as node > coordinates, element connectivity, boundary conditions based on elements, > and specific metadata related to the problem. I create an array for > boundary points on each rank and utilize them in our computations, I am > doing it hardcoded currently but I need to start reading those input files, > But I am not sure about the approach. > > Here's what I have in mind: > > 1. - Begin by reading the node coordinates and connectivity on a > single core. > - Utilize the DMPlexCreateFromCellListPetsc() function to construct > the DMPlex. > - Distribute the mesh across processors. > - Proceed to read and process the boundary conditions on each > processor. If the global index of the boundary element corresponds to that > processor, we process it; otherwise, we pass. > > Additionally, maybe I need to reorder the mesh. In that case I think I can > use the point permutation IS obtained from the DMPlexGetOrdering() function > while processing boundary conditions. > > Also I have another approach in my mind but I don't know if it's possible: > Read/construct DMPlex on single core including boundary conditions. Store > BC related data in Vec or another appropriate data structure. Then > distribute this BC holding data structure too as well as DMPlex. > This is by far the easier approach. If you do not have meshes that are too big to load in serial, I would do this. Here is what you do: - Read in the mesh onto 1 process - Mark the boundary conditions, probably with a DMLabel - Make a Section over the mesh indicating what data you have for BC - Create a Vec from this Section and fill it with boundary values (DMCreateGlobalVector) - Distribute the mesh, and keep the point SF (DMPlexDIstribute) - Create a BC SF from the points SF (PetscSFCreateSectionSF) - DIstribute the BC values using the BC SF (PetscSFBcast) Thanks, Matt > I would greatly appreciate your thoughts and any suggestions you might > have regarding this approach. Looking forward to hearing your insights. > > Best regards, > > Onur > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From onur.notonur at proton.me Tue Oct 31 04:59:27 2023 From: onur.notonur at proton.me (onur.notonur) Date: Tue, 31 Oct 2023 09:59:27 +0000 Subject: [petsc-users] Advices on creating DMPlex from custom input format In-Reply-To: References: Message-ID: Dear Matt and Jed, Thank you so much for your insights. Jed, as far as I know, the format is custom internal structure. I will double-check this. If it is used outside, I'm more than willing to contribute the reader. 
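Just to check that I am reading the recipe correctly, here is my rough translation of those steps into calls (untested; I use a plain array instead of a Vec for brevity, and bcSection / bcSerial[] are assumed to have been built while marking the boundary on the serial mesh):

#include <petscdmplex.h>
#include <petscsf.h>

/* My (untested) reading of the suggested workflow. bcSection lays out the BC values
 * over the points of the serial mesh and bcSerial[] holds those values in that layout.
 * Assumes we actually run in parallel, so DMPlexDistribute() really creates dmDist. */
static PetscErrorCode DistributeBCData(DM dmSerial, PetscSection bcSection, const PetscScalar bcSerial[],
                                       DM *dmDist, PetscSection *bcSectionDist, PetscScalar **bcDist)
{
  PetscSF   pointSF, bcSF;
  PetscInt *remoteOffsets = NULL;
  PetscInt  n;

  PetscFunctionBeginUser;
  /* Distribute the mesh and keep the point SF mapping serial points to their new owners */
  PetscCall(DMPlexDistribute(dmSerial, 0, &pointSF, dmDist));
  /* Rebuild the BC layout over the distributed mesh */
  PetscCall(PetscSectionCreate(PetscObjectComm((PetscObject)dmSerial), bcSectionDist));
  PetscCall(PetscSFDistributeSection(pointSF, bcSection, &remoteOffsets, *bcSectionDist));
  /* Turn the point SF into an SF over the BC dofs */
  PetscCall(PetscSFCreateSectionSF(pointSF, bcSection, remoteOffsets, *bcSectionDist, &bcSF));
  /* Push the serial BC values out to the owning ranks */
  PetscCall(PetscSectionGetStorageSize(*bcSectionDist, &n));
  PetscCall(PetscMalloc1(n, bcDist));
  PetscCall(PetscSFBcastBegin(bcSF, MPIU_SCALAR, bcSerial, *bcDist, MPI_REPLACE));
  PetscCall(PetscSFBcastEnd(bcSF, MPIU_SCALAR, bcSerial, *bcDist, MPI_REPLACE));
  PetscCall(PetscFree(remoteOffsets));
  PetscCall(PetscSFDestroy(&bcSF));
  PetscCall(PetscSFDestroy(&pointSF));
  PetscFunctionReturn(PETSC_SUCCESS);
}

If I understand correctly, DMPlexDistributeField() wraps the last few of these steps for the Vec case, so maybe I can use that directly.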
Best, Onur Sent with [Proton Mail](https://proton.me/) secure email. ------- Original Message ------- On Monday, October 30th, 2023 at 8:16 PM, Matthew Knepley wrote: > On Mon, Oct 30, 2023 at 5:37?AM onur.notonur via petsc-users wrote: > >> Hi, >> >> I hope this message finds you all in good health and high spirits. >> >> I wanted to discuss an approach problem input file reading/processing in a solver which is using PETSc DMPlex. In our team we have a range of solvers, they are not built on PETSc except this one, but they all share a common problem input format. This format includes essential data such as node coordinates, element connectivity, boundary conditions based on elements, and specific metadata related to the problem. I create an array for boundary points on each rank and utilize them in our computations, I am doing it hardcoded currently but I need to start reading those input files, But I am not sure about the approach. >> >> Here's what I have in mind: >> >> - - Begin by reading the node coordinates and connectivity on a single core. >> - Utilize the DMPlexCreateFromCellListPetsc() function to construct the DMPlex. >> - Distribute the mesh across processors. >> - Proceed to read and process the boundary conditions on each processor. If the global index of the boundary element corresponds to that processor, we process it; otherwise, we pass. >> >> Additionally, maybe I need to reorder the mesh. In that case I think I can use the point permutation IS obtained from the DMPlexGetOrdering() function while processing boundary conditions. >> >> Also I have another approach in my mind but I don't know if it's possible: Read/construct DMPlex on single core including boundary conditions. Store BC related data in Vec or another appropriate data structure. Then distribute this BC holding data structure too as well as DMPlex. > > This is by far the easier approach. If you do not have meshes that are too big to load in serial, I would do > this. Here is what you do: > > - Read in the mesh onto 1 process > - Mark the boundary conditions, probably with a DMLabel > - Make a Section over the mesh indicating what data you have for BC > - Create a Vec from this Section and fill it with boundary values (DMCreateGlobalVector) > - Distribute the mesh, and keep the point SF (DMPlexDIstribute) > - Create a BC SF from the points SF (PetscSFCreateSectionSF) > - DIstribute the BC values using the BC SF (PetscSFBcast) > > Thanks, > > Matt > >> I would greatly appreciate your thoughts and any suggestions you might have regarding this approach. Looking forward to hearing your insights. >> >> Best regards, >> >> Onur > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Oct 31 06:00:00 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 31 Oct 2023 07:00:00 -0400 Subject: [petsc-users] Boundary integral problem In-Reply-To: <1621606128.5024918.1695630786287@mail.yahoo.com> References: <1621606128.5024918.1695630786287.ref@mail.yahoo.com> <1621606128.5024918.1695630786287@mail.yahoo.com> Message-ID: On Mon, Sep 25, 2023 at 8:58?AM Azeddine Messikh via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear developers > > I tried to run ex24.c > https://petsc.org/release/src/snes/tutorials/ex24.c.html using the > following command line > > ./ex24 -sol_type quadratic -dm_plex_simplex 0 -field_petscspace_degree 1 > -potential_petscspace_degree 1 -dm_plex_box_faces 2,1 > > I discovered that at > > 254: PetscCall (PetscWeakFormSetIndexBdResidual(wf, label, 1, 0, 0, 0, f0_bd_quadratic_q, 0, NULL)); > > reverses the value of the integrals at the top only. That is > The boundary integral corresponding to node 5 becomes that of 4 and > vise-versa. > Same thing for nodes 5 and 6. > I apologize for taking so long to reply. This email fell out of my Inbox. I believe the problem is understanding the ordering of unknowns in Plex. For all shapes, I orient the boundary to have outward normals. This means that for quads, the vertex ordering would be 1--2--5--4 and 2--3--6--5 Does this make sense? Thanks, Matt > The mesh index is as follows > *4---*5---*6 > | | | > | | | > *1--- *2---*3 > > However, if I use -dm_plex_simplex 1 there is no problem. > > The model is in the form Au = b > > the value of b with "-dm_plex_simplex 0" is > [0.25 > 0.0104167 > 0. > 0. > 0.145833 > 0. > -0.583333 > 0.177083 > 0. > 0.0833333 > -0.28125 > 0. > 0. > -0.6875 > 0. > -0.75 > -0.364583 > 0.] > > and for -dm_plex_simplex 1 > [0.0833333 > 0.0104167 > 0. > 0. > 0.145833 > 0. > -0.583333 > 0.177083 > 0. > 0.25 > -0.260417 > 0. > 1.43404e-16 > -0.645833 > 0. > -0.75 > -0.427083 > 0.] > > you can see that the value at node 1 =0.25 and node 4 = 0.0833333 ( > simplex 0) > which is reversed, that is, node 4 =0.25 and node 1 = 0.0833333 > (simplex 1) > > So, my own calculation shows that at node 1 should be 0.083333 not 0.25. > The -dm_plex_simplex 1 gives the correct answer but -dm_plex_simplex 0 > gives wrong answer. > > > Would you please help me in this matter. > Sincerely yours > Azeddine M > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Oct 31 08:29:59 2023 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 31 Oct 2023 09:29:59 -0400 Subject: [petsc-users] Kokkos PtAP error Message-ID: I am getting this error. This is in GAMG/HEM setup. PtAP for the coarse grid construction works, but I call this in a graph routine (/global/u2/m/madams/petsc/src/mat/coarsen/impls/hem/hem.c:1043). Also, this PtAP does not need to be on the GPU anyway because P is extremely sparse ... can I pin, say P, to the CPU to keep this all on the host? Thanks, Mark [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Petsc has generated inconsistent data [0]PETSC ERROR: Unspecified symbolic phase for product AB with A mpiaijkokkos, B mpiaij. Call MatProductSetFromOptions() first [0]PETSC ERROR: WARNING! 
There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-ksp_converged_reason (no value) source: command line [0]PETSC ERROR: Option left: name:-ksp_viewxx (no value) source: command line [0]PETSC ERROR: Option left: name:-log_view_gpu_timexxx (no value) source: command line [0]PETSC ERROR: Option left: name:-options_left (no value) source: command line [0]PETSC ERROR: Option left: name:-pc_gamg_use_aggressive_square_graph value: true source: command line [0]PETSC ERROR: Option left: name:-pc_gamg_use_minimum_degree_ordering value: false source: command line [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.20.0-168-ga7898f52c39 GIT Date: 2023-10-28 10:07:38 -0500 [0]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tests/./ex13 on a arch-perlmutter-dbg-gcc-kokkos-cuda named nid001680 by madams Tue Oct 31 06:21:25 2023 [0]PETSC ERROR: Configure options --CFLAGS=" -g" --CXXFLAGS=" -g" --CUDAFLAGS="-g -Xcompiler -rdynamic" --with-cc=cc --with-cxx=CC --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91 --FFLAGS=" -g " --COPTFLAGS=" -O0" --CXXOPTFLAGS=" -O0" --FOPTFLAGS=" -O0" --download-triangle=1 --with-debugging=1 --with-cuda=1 --with-cuda-arch=80 --with-mpiexec="srun -G4" --with-batch=0 --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 --with-make-np=8 PETSC_ARCH=arch-perlmutter-dbg-gcc-kokkos-cuda [0]PETSC ERROR: #1 MatProductSymbolic() at /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:807 [0]PETSC ERROR: #2 MatProductSymbolic_PtAP_Unsafe() at /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:73 [0]PETSC ERROR: #3 MatProductSymbolic_Unsafe() at /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:185 [0]PETSC ERROR: #4 MatProductSymbolic() at /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:795 [0]PETSC ERROR: #5 MatPtAP() at /global/u2/m/madams/petsc/src/mat/interface/matrix.c:9938 [0]PETSC ERROR: #6 MatCoarsenApply_HEM_private() at /global/u2/m/madams/petsc/src/mat/coarsen/impls/hem/hem.c:1043 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Oct 31 13:24:52 2023 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 31 Oct 2023 14:24:52 -0400 Subject: [petsc-users] Kokkos PtAP error In-Reply-To: References: Message-ID: Correction, I get the same message with -mat_type aijcusparse. Thanks, Mark On Tue, Oct 31, 2023 at 9:29?AM Mark Adams wrote: > I am getting this error. > This is in GAMG/HEM setup. PtAP for the coarse grid construction works, > but I call this in a graph routine > (/global/u2/m/madams/petsc/src/mat/coarsen/impls/hem/hem.c:1043). > > Also, this PtAP does not need to be on the GPU anyway because P is > extremely sparse ... can I pin, say P, to the CPU to keep this all on the > host? > > Thanks, > Mark > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: Unspecified symbolic phase for product AB with A > mpiaijkokkos, B mpiaij. Call MatProductSetFromOptions() first > [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! 
> [0]PETSC ERROR: Option left: name:-ksp_converged_reason (no value) > source: command line > [0]PETSC ERROR: Option left: name:-ksp_viewxx (no value) source: command > line > [0]PETSC ERROR: Option left: name:-log_view_gpu_timexxx (no value) > source: command line > [0]PETSC ERROR: Option left: name:-options_left (no value) source: > command line > [0]PETSC ERROR: Option left: name:-pc_gamg_use_aggressive_square_graph > value: true source: command line > [0]PETSC ERROR: Option left: name:-pc_gamg_use_minimum_degree_ordering > value: false source: command line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.20.0-168-ga7898f52c39 > GIT Date: 2023-10-28 10:07:38 -0500 > [0]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tests/./ex13 on a > arch-perlmutter-dbg-gcc-kokkos-cuda named nid001680 by madams Tue Oct 31 > 06:21:25 2023 > [0]PETSC ERROR: Configure options --CFLAGS=" -g" --CXXFLAGS=" -g" > --CUDAFLAGS="-g -Xcompiler -rdynamic" --with-cc=cc --with-cxx=CC > --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91 --FFLAGS=" -g " --COPTFLAGS=" > -O0" --CXXOPTFLAGS=" -O0" --FOPTFLAGS=" -O0" --download-triangle=1 > --with-debugging=1 --with-cuda=1 --with-cuda-arch=80 --with-mpiexec="srun > -G4" --with-batch=0 --download-kokkos --download-kokkos-kernels > --with-kokkos-kernels-tpl=0 --with-make-np=8 > PETSC_ARCH=arch-perlmutter-dbg-gcc-kokkos-cuda > [0]PETSC ERROR: #1 MatProductSymbolic() at > /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:807 > [0]PETSC ERROR: #2 MatProductSymbolic_PtAP_Unsafe() at > /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:73 > [0]PETSC ERROR: #3 MatProductSymbolic_Unsafe() at > /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:185 > [0]PETSC ERROR: #4 MatProductSymbolic() at > /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:795 > [0]PETSC ERROR: #5 MatPtAP() at > /global/u2/m/madams/petsc/src/mat/interface/matrix.c:9938 > [0]PETSC ERROR: #6 MatCoarsenApply_HEM_private() at > /global/u2/m/madams/petsc/src/mat/coarsen/impls/hem/hem.c:1043 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Oct 31 14:03:57 2023 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 31 Oct 2023 15:03:57 -0400 Subject: [petsc-users] Kokkos PtAP error In-Reply-To: References: Message-ID: In reading the error message I see that I did not clone A, to get P, so P was the wrong type with a device. Thanks, Mark On Tue, Oct 31, 2023 at 2:24?PM Mark Adams wrote: > Correction, I get the same message with -mat_type aijcusparse. > > Thanks, > Mark > > On Tue, Oct 31, 2023 at 9:29?AM Mark Adams wrote: > >> I am getting this error. >> This is in GAMG/HEM setup. PtAP for the coarse grid construction works, >> but I call this in a graph routine >> (/global/u2/m/madams/petsc/src/mat/coarsen/impls/hem/hem.c:1043). >> >> Also, this PtAP does not need to be on the GPU anyway because P is >> extremely sparse ... can I pin, say P, to the CPU to keep this all on the >> host? >> >> Thanks, >> Mark >> >> >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Petsc has generated inconsistent data >> [0]PETSC ERROR: Unspecified symbolic phase for product AB with A >> mpiaijkokkos, B mpiaij. Call MatProductSetFromOptions() first >> [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the >> program crashed before usage or a spelling mistake, etc! 
>> [0]PETSC ERROR: Option left: name:-ksp_converged_reason (no value) >> source: command line >> [0]PETSC ERROR: Option left: name:-ksp_viewxx (no value) source: >> command line >> [0]PETSC ERROR: Option left: name:-log_view_gpu_timexxx (no value) >> source: command line >> [0]PETSC ERROR: Option left: name:-options_left (no value) source: >> command line >> [0]PETSC ERROR: Option left: name:-pc_gamg_use_aggressive_square_graph >> value: true source: command line >> [0]PETSC ERROR: Option left: name:-pc_gamg_use_minimum_degree_ordering >> value: false source: command line >> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. >> [0]PETSC ERROR: Petsc Development GIT revision: v3.20.0-168-ga7898f52c39 >> GIT Date: 2023-10-28 10:07:38 -0500 >> [0]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tests/./ex13 on a >> arch-perlmutter-dbg-gcc-kokkos-cuda named nid001680 by madams Tue Oct 31 >> 06:21:25 2023 >> [0]PETSC ERROR: Configure options --CFLAGS=" -g" --CXXFLAGS=" -g" >> --CUDAFLAGS="-g -Xcompiler -rdynamic" --with-cc=cc --with-cxx=CC >> --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91 --FFLAGS=" -g " --COPTFLAGS=" >> -O0" --CXXOPTFLAGS=" -O0" --FOPTFLAGS=" -O0" --download-triangle=1 >> --with-debugging=1 --with-cuda=1 --with-cuda-arch=80 --with-mpiexec="srun >> -G4" --with-batch=0 --download-kokkos --download-kokkos-kernels >> --with-kokkos-kernels-tpl=0 --with-make-np=8 >> PETSC_ARCH=arch-perlmutter-dbg-gcc-kokkos-cuda >> [0]PETSC ERROR: #1 MatProductSymbolic() at >> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:807 >> [0]PETSC ERROR: #2 MatProductSymbolic_PtAP_Unsafe() at >> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:73 >> [0]PETSC ERROR: #3 MatProductSymbolic_Unsafe() at >> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:185 >> [0]PETSC ERROR: #4 MatProductSymbolic() at >> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:795 >> [0]PETSC ERROR: #5 MatPtAP() at >> /global/u2/m/madams/petsc/src/mat/interface/matrix.c:9938 >> [0]PETSC ERROR: #6 MatCoarsenApply_HEM_private() at >> /global/u2/m/madams/petsc/src/mat/coarsen/impls/hem/hem.c:1043 >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victoria.rolandi93 at gmail.com Tue Oct 31 22:30:56 2023 From: victoria.rolandi93 at gmail.com (Victoria Rolandi) Date: Tue, 31 Oct 2023 20:30:56 -0700 Subject: [petsc-users] Error using Metis with PETSc installed with MUMPS Message-ID: Hi, I'm solving a large sparse linear system in parallel and I am using PETSc with MUMPS. I am trying to test different options, like the ordering of the matrix. Everything works if I use the *-mat_mumps_icntl_7 2 *or *-mat_mumps_icntl_7 0 *options (with the first one, AMF, performing better than AMD), however when I test METIS *-mat_mumps_icntl_7** 5 *I get an error (reported at the end of the email). I have configured PETSc with the following options: --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-scalar-type=complex --with-debugging=0 --with-precision=single --download-mumps --download-scalapack --download-parmetis --download-metis and the installation didn't give any problems. Could you help me understand why metis is not working? Thank you in advance, Victoria Error: ****** ANALYSIS STEP ******** ** Maximum transversal (ICNTL(6)) not allowed because matrix is distributed Processing a graph of size: 699150 with 69238690 edges Ordering based on METIS 510522 37081376 [100] [10486 699150] Error! 
Unknown CType: -1
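For completeness, my understanding is that the ordering choice I am passing on the command line corresponds to setting ICNTL(7) on the factored matrix, roughly like this (an untested sketch of how I read the API; UseMetisOrdering() and the ksp argument are just placeholders for my solver setup):

#include <petscksp.h>

/* Sketch (untested): select the MUMPS ordering from code, equivalent to
 * -mat_mumps_icntl_7 5. Assumes ksp already has its operators set. */
static PetscErrorCode UseMetisOrdering(KSP ksp)
{
  PC  pc;
  Mat F;

  PetscFunctionBeginUser;
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCLU));
  PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS));
  PetscCall(PCFactorSetUpMatSolverType(pc)); /* create the MUMPS factor matrix so its options can be set */
  PetscCall(PCFactorGetMatrix(pc, &F));
  PetscCall(MatMumpsSetIcntl(F, 7, 5));      /* ICNTL(7): ordering; 5 = METIS */
  PetscFunctionReturn(PETSC_SUCCESS);
}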